m4: Changeword
1
1 8.4 Changing the lexical structure of words
1 ===========================================
1
1 The macro 'changeword' and all associated functionality are
1 experimental. They are available only if the '--enable-changeword'
1 option was given to 'configure' at GNU 'm4' installation time.
1 The functionality will go away in the future, to be replaced by
1 other new features that provide the same capabilities more
1 efficiently. _Do not rely on it_. Report comments about it the
1 same way you would report bugs.
1
1 A file being processed by 'm4' is split into quoted strings, words
1 (potential macro names) and simple tokens (any other single character).
1 Initially a word is defined by the following regular expression:
1
1 [_a-zA-Z][_a-zA-Z0-9]*
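
To make the split into words and simple tokens concrete, here is a minimal Python sketch, not 'm4' itself. It ignores quoted strings and comments entirely, and the names 'DEFAULT_WORD' and 'split_tokens' are hypothetical, chosen only for this illustration.

```python
import re

# Default word pattern quoted in the text above.
DEFAULT_WORD = r"[_a-zA-Z][_a-zA-Z0-9]*"

def split_tokens(text, word_pattern=DEFAULT_WORD):
    """Split text into words and single-character tokens (quoting ignored)."""
    regex = re.compile(word_pattern)
    tokens, pos = [], 0
    while pos < len(text):
        match = regex.match(text, pos)
        if match and match.end() > pos:
            tokens.append(match.group(0))  # a word, i.e. a potential macro name
            pos = match.end()
        else:
            tokens.append(text[pos])  # any other single character
            pos += 1
    return tokens
```

Under this model, 'split_tokens("define(x1, 2)")' returns the words 'define' and 'x1', with every other character, including the digit '2', as a single-character token.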
1
1 Using 'changeword', you can change this regular expression:
1
1 -- Optional builtin: changeword (REGEX)
1 Changes the regular expression for recognizing macro names to be
1 REGEX. If REGEX is empty, use '[_a-zA-Z][_a-zA-Z0-9]*'. REGEX
1 must obey the constraint that every prefix of the desired final
1 pattern is also accepted by the regular expression. If REGEX
1 contains grouping parentheses, the macro invoked is the portion
1 that matched the first group, rather than the entire matching
1 string.
1
1 The expansion of 'changeword' is void. The macro 'changeword' is
1 recognized only with parameters.
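
The prefix constraint in the description of REGEX can be checked mechanically before a pattern is handed to 'changeword'. The following Python sketch is one way to do so, under the assumption that the pattern has been translated to Python syntax (GNU-style '\|' and '\(' become '|' and '('). The helper name 'rejected_prefixes' and the sample patterns are hypothetical.

```python
import re

def rejected_prefixes(name, pattern):
    """Return the proper prefixes of name that the pattern fails to accept."""
    regex = re.compile(pattern)
    return [name[:i] for i in range(1, len(name))
            if regex.fullmatch(name[:i]) is None]

# The pattern 'foo' accepts the full name but none of its shorter prefixes.
print(rejected_prefixes("foo", r"foo"))
# The pattern 'fo*' accepts every prefix, so the list is empty.
print(rejected_prefixes("foo", r"fo*"))
```

An empty result suggests that the desired name can be built up one character at a time; a non-empty result lists the prefixes the pattern still needs to accept.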
1
1 Relaxing the lexical rules of 'm4' might be useful (for example) if
1 you wanted to apply translations to a file of numbers:
1
1 ifdef(`changeword', `', `errprint(` skipping: no changeword support
1 ')m4exit(`77')')dnl
1 changeword(`[_a-zA-Z0-9]+')
1 =>
1 define(`1', `0')1
1 =>0
1
1 Tightening the lexical rules is less useful, because it will
1 generally make some of the builtins unavailable. You could use it to
1 prevent accidental calls to builtins, for example:
1
1 ifdef(`changeword', `', `errprint(` skipping: no changeword support
1 ')m4exit(`77')')dnl
1 define(`_indir', defn(`indir'))
1 =>
1 changeword(`_[_a-zA-Z0-9]*')
1 =>
1 esyscmd(`foo')
1 =>esyscmd(foo)
1 _indir(`esyscmd', `echo hi')
1 =>hi
1 =>
1
1 Because 'm4' constructs its words a character at a time, there is a
1 restriction on the regular expressions that may be passed to
1 'changeword': if your regular expression accepts 'foo', it must also
1 accept 'f' and 'fo'.
1
1 ifdef(`changeword', `', `errprint(` skipping: no changeword support
1 ')m4exit(`77')')dnl
1 define(`foo
1 ', `bar
1 ')
1 =>
1 dnl This example wants to recognize changeword, dnl, and `foo\n'.
1 dnl First, we check that our regexp will match.
1 regexp(`changeword', `[cd][a-z]*\|foo[
1 ]')
1 =>0
1 regexp(`foo
1 ', `[cd][a-z]*\|foo[
1 ]')
1 =>0
1 regexp(`f', `[cd][a-z]*\|foo[
1 ]')
1 =>-1
1 foo
1 =>foo
1 changeword(`[cd][a-z]*\|foo[
1 ]')
1 =>
1 dnl Even though `foo\n' matches, we forgot to allow `f'.
1 foo
1 =>foo
1 changeword(`[cd][a-z]*\|fo*[
1 ]?')
1 =>
1 dnl Now we can call `foo\n'.
1 foo
1 =>bar
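
The behavior in the example above can be modeled in a few lines of Python. This is a simplified sketch of character-at-a-time word construction, not the actual 'm4' scanner: it assumes a candidate word is extended only while the whole text collected so far is accepted by the pattern. The function name 'scan_word' is hypothetical, and the patterns are written in Python syntax ('|' instead of '\|').

```python
import re

def scan_word(text, pattern):
    """Return the leading word built one character at a time, or None."""
    regex = re.compile(pattern)
    length = 0
    # Extend the candidate only while the entire prefix is accepted.
    while length < len(text) and regex.fullmatch(text, 0, length + 1):
        length += 1
    return text[:length] if length else None

first_try = r"[cd][a-z]*|foo[\n]"    # accepts 'foo\n' but not 'f'
second_try = r"[cd][a-z]*|fo*[\n]?"  # accepts 'f', 'fo', 'foo' and 'foo\n'

print(repr(scan_word("foo\n", first_try)))
print(repr(scan_word("foo\n", second_try)))
```

With the first pattern the scan stops at 'f', so no word is found and the macro is never looked up. With the second pattern every prefix is accepted and the whole of 'foo' plus the newline is collected as one word.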
1
1 'changeword' has another function. If the regular expression
1 supplied contains any grouped subexpressions, then text outside the
1 first of these is discarded before symbol lookup. For example:
1
1 ifdef(`changeword', `', `errprint(` skipping: no changeword support
1 ')m4exit(`77')')dnl
1 ifdef(`__unix__', ,
1 `errprint(` skipping: syscmd does not have unix semantics
1 ')m4exit(`77')')dnl
1 changecom(`/*', `*/')dnl
1 define(`foo', `bar')dnl
1 changeword(`#\([_a-zA-Z0-9]*\)')
1 =>
1 #esyscmd(`echo foo \#foo')
1 =>foo bar
1 =>
1
1 'm4' now requires a '#' mark at the beginning of every macro
1 invocation, so one can use 'm4' to preprocess plain text without losing
1 various words like 'divert'.
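
The effect of the first group on symbol lookup can be sketched in Python as follows. This is an illustrative model only: it performs a single substitution pass with no rescanning and no argument collection, the pattern is written in Python syntax ('(' instead of '\('), and the name 'expand_once' and the macro table are hypothetical.

```python
import re

def expand_once(text, macros, pattern):
    """Replace each word whose first group names a macro (no rescanning)."""
    def substitute(match):
        name = match.group(1)  # text outside the first group is discarded
        return macros.get(name, match.group(0))
    return re.sub(pattern, substitute, text)

macros = {"foo": "bar", "divert": ""}
hash_words = r"#([_a-zA-Z0-9]*)"

print(expand_once("plain divert and #foo", macros, hash_words))
```

Here 'divert' without a leading '#' is not a word at all, so it passes through unchanged, while '#foo' is looked up under the name 'foo'.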
1
1 In 'm4', macro substitution is based on text, while in TeX, it is
1 based on tokens. 'changeword' can throw this difference into relief.
1 For example, here is the same idea represented in TeX and 'm4'. First,
1 the TeX version:
1
1 \def\a{\message{Hello}}
1 \catcode`\@=0
1 \catcode`\\=12
1 @a
1 @bye
1 =>Hello
1
1 Then, the 'm4' version:
1
1 ifdef(`changeword', `', `errprint(` skipping: no changeword support
1 ')m4exit(`77')')dnl
1 define(`a', `errprint(`Hello')')dnl
1 changeword(`@\([_a-zA-Z0-9]*\)')
1 =>
1 @a
1 =>errprint(Hello)
1
1 In the TeX example, the first line defines a macro 'a' to print the
1 message 'Hello'. The second line defines <@> to be usable instead of
1 <\> as an escape character. The third line defines <\> to be a normal
1 printing character, not an escape. The fourth line invokes the macro
1 'a'. So, when TeX is run on this file, it displays the message 'Hello'.
1
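1 When the 'm4' example is passed through 'm4', it outputs
1 'errprint(Hello)'. The reason is that TeX performs lexical analysis
1 of a macro definition when the macro is _defined_, whereas 'm4' just
1 stores the text and postpones the lexical analysis until the macro is
1 _used_. By the time 'a' is expanded, 'changeword' has taken effect,
1 so 'errprint' without a leading '@' is no longer recognized as a
1 macro name.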
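
The consequence of analyzing stored text at use time can be imitated with a small Python model. It is a sketch under simplifying assumptions, not 'm4': macros take no arguments, quoting is ignored, and expansions are pushed back onto the input and rescanned with whatever word pattern is current. The names 'expand_rescan', 'a' and 'greet' are hypothetical stand-ins; 'greet' plays the role that 'errprint' plays in the example.

```python
import re

def expand_rescan(text, macros, pattern):
    """Expand macros, rescanning each expansion with the current pattern."""
    regex = re.compile(pattern)
    out, pos = [], 0
    while pos < len(text):
        match = regex.match(text, pos)
        if match and match.end() > pos:
            name = match.group(1)
            if name in macros:
                # Push the stored text back onto the input and rescan it.
                text = macros[name] + text[match.end():]
                pos = 0
                continue
            out.append(match.group(0))
            pos = match.end()
        else:
            out.append(text[pos])
            pos += 1
    return "".join(out)

macros = {"a": "greet", "greet": "Hello"}

# Default words: the stored text 'greet' is itself recognized and expanded.
print(expand_rescan("a", macros, r"([_a-zA-Z][_a-zA-Z0-9]*)"))
# Words need a leading '@': the stored text 'greet' is no longer a word.
print(expand_rescan("@a", macros, r"@([_a-zA-Z0-9]*)"))
```

The stored definition of 'a' is identical in both calls; only the pattern in force when the text is rescanned differs, which is why the second call leaves 'greet' unexpanded.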
1
1 Note that 'changeword' slows 'm4' down by a factor of about seven
1 once the word definition is changed to something other than the
1 default regular expression. You can invoke 'changeword' with the empty
1 string to restore the default word definition and regain the parsing
1 speed.
1