m4: Changeword

1 
1 8.4 Changing the lexical structure of words
1 ===========================================
1 
1      The macro 'changeword' and all associated functionality are
1      experimental.  They are only available if the '--enable-changeword'
1      option was given to 'configure' at GNU 'm4' installation time.
1      The functionality will go away in the future, to be replaced by
1      other new features that are more efficient at providing the same
1      capabilities.  _Do not rely on it_.  Please report any comments
1      about it the same way you would report bugs.
1 
1    A file being processed by 'm4' is split into quoted strings, words
1 (potential macro names) and simple tokens (any other single character).
1 Initially a word is defined by the following regular expression:
1 
1      [_a-zA-Z][_a-zA-Z0-9]*
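As a rough illustration of this splitting, here is a simplified Python model. It is a sketch, not how 'm4' itself is implemented. It assumes the default quote characters (` and '), ignores comments, and assumes every quoted string is closed. The name 'tokenize' is hypothetical. The default word expression means the same thing in Python's 're' syntax as it does in 'm4'.

```python
import re

# Default m4 word syntax; it means the same thing in Python's re syntax.
DEFAULT_WORD = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]*")


def tokenize(text):
    """Split text into (kind, text) pairs, kind being 'string', 'word'
    or 'simple'.  Simplified model: default quotes ` and ', no comments,
    and every quoted string is assumed to be closed."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i] == "`":
            # Quoted string: track nesting and keep only the inner text.
            depth, j = 1, i + 1
            while depth:
                if text[j] == "`":
                    depth += 1
                elif text[j] == "'":
                    depth -= 1
                j += 1
            tokens.append(("string", text[i + 1:j - 1]))
            i = j
            continue
        m = DEFAULT_WORD.match(text, i)
        if m:
            # Greedy match: the longest word starting at position i.
            tokens.append(("word", m.group()))
            i = m.end()
        else:
            # Any other single character is a simple token.
            tokens.append(("simple", text[i]))
            i += 1
    return tokens
```

For instance, 'tokenize("1abc")' returns '[('simple', '1'), ('word', 'abc')]'. A leading digit cannot start a word, so it becomes a simple token, and the word begins at the letter that follows.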
1 
1    Using 'changeword', you can change this regular expression:
1 
1  -- Optional builtin: changeword (REGEX)
1      Changes the regular expression for recognizing macro names to be
1      REGEX.  If REGEX is empty, the default '[_a-zA-Z][_a-zA-Z0-9]*' is
1      restored.  REGEX must also accept every prefix of any word it
1      accepts, because words are built one character at a time.  If REGEX
1      contains grouping parentheses, the macro invoked is the portion
1      that matched the first group, rather than the entire matching
1      string.
1 
1      The expansion of 'changeword' is void.  The macro 'changeword' is
1      recognized only with parameters.
1 
1    Relaxing the lexical rules of 'm4' might be useful, for example, if
1 you want to apply translations to a file of numbers:
1 
1      ifdef(`changeword', `', `errprint(` skipping: no changeword support
1      ')m4exit(`77')')dnl
1      changeword(`[_a-zA-Z0-9]+')
1      =>
1      define(`1', `0')1
1      =>0
1 
1    Tightening the lexical rules is less useful, because it will
1 generally make some of the builtins unavailable.  You could use it to
1 prevent accidental calls of builtins, for example:
1 
1      ifdef(`changeword', `', `errprint(` skipping: no changeword support
1      ')m4exit(`77')')dnl
1      define(`_indir', defn(`indir'))
1      =>
1      changeword(`_[_a-zA-Z0-9]*')
1      =>
1      esyscmd(`foo')
1      =>esyscmd(foo)
1      _indir(`esyscmd', `echo hi')
1      =>hi
1      =>
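Both effects can be approximated by a small Python model of word recognition under an arbitrary pattern. This is a hypothetical sketch: 'scan_words' is an illustrative name, quotes and comments are ignored, and it assumes that a word may start only where the single character at that position matches the pattern by itself. The three patterns are written in Python's 're' syntax, and here they mean the same as in 'm4'.

```python
import re


def scan_words(pattern, text):
    """Return the words found in text under a changeword-style pattern.

    Simplified model: a word starts where the single character there
    matches the whole pattern, then grows one character at a time for
    as long as the entire accumulated string still matches.
    """
    regex = re.compile(pattern)
    words = []
    i = 0
    while i < len(text):
        if not regex.fullmatch(text, i, i + 1):
            i += 1  # Not a word start: this character is a simple token.
            continue
        end = i + 1
        while end < len(text) and regex.fullmatch(text, i, end + 1):
            end += 1
        words.append(text[i:end])
        i = end
    return words


sample = "esyscmd _indir 1"
for name, pattern in [
    ("default", r"[_a-zA-Z][_a-zA-Z0-9]*"),
    ("relaxed", r"[_a-zA-Z0-9]+"),
    ("tightened", r"_[_a-zA-Z0-9]*"),
]:
    print(name, scan_words(pattern, sample))
# default ['esyscmd', '_indir']
# relaxed ['esyscmd', '_indir', '1']
# tightened ['_indir']
```

Relaxing the pattern turns '1' into a word that can be defined as a macro. Tightening it means 'esyscmd' is no longer seen as a word, so only names such as '_indir' can still be called.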
1 
1    Because 'm4' constructs its words a character at a time, there is a
1 restriction on the regular expressions that may be passed to
1 'changeword': if your regular expression accepts 'foo', it must also
1 accept 'f' and 'fo'.
1 
1      ifdef(`changeword', `', `errprint(` skipping: no changeword support
1      ')m4exit(`77')')dnl
1      define(`foo
1      ', `bar
1      ')
1      =>
1      dnl This example wants to recognize changeword, dnl, and `foo\n'.
1      dnl First, we check that our regexp will match.
1      regexp(`changeword', `[cd][a-z]*\|foo[
1      ]')
1      =>0
1      regexp(`foo
1      ', `[cd][a-z]*\|foo[
1      ]')
1      =>0
1      regexp(`f', `[cd][a-z]*\|foo[
1      ]')
1      =>-1
1      foo
1      =>foo
1      changeword(`[cd][a-z]*\|foo[
1      ]')
1      =>
1      dnl Even though `foo\n' matches, we forgot to allow `f'.
1      foo
1      =>foo
1      changeword(`[cd][a-z]*\|fo*[
1      ]?')
1      =>
1      dnl Now we can call `foo\n'.
1      foo
1      =>bar
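The restriction can be checked mechanically before 'changeword' is called. The following Python sketch uses a hypothetical helper, 'accepts_all_prefixes', to test whether a pattern matches a word and every non-empty prefix of it. The two patterns are the ones from the example above, translated into Python's 're' syntax: 'm4''s '\|' is written '|', and the bracketed newline is written '\n'.

```python
import re


def accepts_all_prefixes(pattern, word):
    """True if pattern fully matches word and each non-empty prefix."""
    regex = re.compile(pattern)
    return all(regex.fullmatch(word[:n]) for n in range(1, len(word) + 1))


first_try = r"[cd][a-z]*|foo\n"    # rejects the prefixes 'f' and 'fo'
second_try = r"[cd][a-z]*|fo*\n?"  # accepts 'f', 'fo', 'foo', 'foo\n'

print(accepts_all_prefixes(first_try, "foo\n"))   # False
print(accepts_all_prefixes(second_try, "foo\n"))  # True
```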
1 
1    'changeword' has another function.  If the regular expression
1 supplied contains any grouped subexpressions, then text outside the
1 first of these is discarded before symbol lookup.  So:
1 
1      ifdef(`changeword', `', `errprint(` skipping: no changeword support
1      ')m4exit(`77')')dnl
1      ifdef(`__unix__', ,
1            `errprint(` skipping: syscmd does not have unix semantics
1      ')m4exit(`77')')dnl
1      changecom(`/*', `*/')dnl
1      define(`foo', `bar')dnl
1      changeword(`#\([_a-zA-Z0-9]*\)')
1      =>
1      #esyscmd(`echo foo \#foo')
1      =>foo bar
1      =>
1 
1    After this call to 'changeword', 'm4' requires a '#' mark at the
1 beginning of every macro invocation, so one can use 'm4' to preprocess
1 plain text without losing ordinary words like 'divert'.
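The following Python sketch shows how the first group affects symbol lookup. It is an illustrative model built around a hypothetical helper, 'macro_name'. Python writes groups as '(...)', whereas 'm4' writes '\(...\)'. If the pattern has no groups, or the first group did not take part in the match, this sketch falls back to the whole word.

```python
import re


def macro_name(pattern, word):
    """Return the name looked up for word under a changeword-style
    pattern (Python re syntax), or None if the pattern rejects word."""
    m = re.fullmatch(pattern, word)
    if m is None:
        return None
    # With a group that took part in the match, only its text counts;
    # otherwise this sketch uses the whole word.
    if m.re.groups and m.group(1) is not None:
        return m.group(1)
    return m.group(0)


# m4's #\([_a-zA-Z0-9]*\) written in Python syntax.
HASH_WORD = r"#([_a-zA-Z0-9]*)"

print(macro_name(HASH_WORD, "#foo"))    # foo
print(macro_name(HASH_WORD, "divert"))  # None
```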
1 
1    In 'm4', macro substitution is based on text, while in TeX, it is
1 based on tokens.  'changeword' can throw this difference into relief.
1 For example, here is the same idea represented in TeX and 'm4'.  First,
1 the TeX version:
1 
1      \def\a{\message{Hello}}
1      \catcode`\@=0
1      \catcode`\\=12
1      @a
1      @bye
1      =>Hello
1 
1 Then, the 'm4' version:
1 
1      ifdef(`changeword', `', `errprint(` skipping: no changeword support
1      ')m4exit(`77')')dnl
1      define(`a', `errprint(`Hello')')dnl
1      changeword(`@\([_a-zA-Z0-9]*\)')
1      =>
1      @a
1      =>errprint(Hello)
1 
1    In the TeX example, the first line defines a macro 'a' to print the
1 message 'Hello'.  The second line defines <@> to be usable instead of
1 <\> as an escape character.  The third line defines <\> to be a normal
1 printing character, not an escape.  The fourth line invokes the macro
1 'a'.  So, when TeX is run on this file, it displays the message 'Hello'.
1 
1    When the 'm4' example is passed through 'm4', it outputs
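1 'errprint(Hello)'.  The reason is that TeX performs lexical analysis of
1 a macro's definition when the macro is _defined_, whereas 'm4' just
1 stores the text, postponing the lexical analysis until the macro is
1 _used_.  By the time 'a' is used, words must begin with '@', so the
1 'errprint' in its expansion is not recognized as a macro name.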
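The difference can be modeled with a toy expander in Python. This is a simplified sketch with several assumptions: 'expand' is a hypothetical name, macros take no arguments, builtins and comments are not modeled, and the default quotes are used. The key point is that a macro's body is stored as plain text and pushed back onto the input. It is therefore lexed with whatever word pattern is in effect when the macro is used.

```python
import re


def expand(text, pattern, macros):
    """Expand argument-less macros in text (simplified m4-like model).

    Words grow one character at a time while the whole word matches
    pattern; the first group, if any, is the name looked up.  A macro's
    body is pushed back onto the input and rescanned, so lexing happens
    at use time.  Quoted strings lose one level of quotes.
    """
    regex = re.compile(pattern)
    out = []
    i = 0
    while i < len(text):
        if text[i] == "`":
            # Quoted string: strip the outer quotes, do not rescan it.
            depth, j = 1, i + 1
            while depth:
                if text[j] == "`":
                    depth += 1
                elif text[j] == "'":
                    depth -= 1
                j += 1
            out.append(text[i + 1:j - 1])
            i = j
            continue
        if not regex.fullmatch(text, i, i + 1):
            out.append(text[i])  # Simple token: copied to the output.
            i += 1
            continue
        end = i + 1
        while end < len(text) and regex.fullmatch(text, i, end + 1):
            end += 1
        m = regex.fullmatch(text, i, end)
        use_group = regex.groups and m.group(1) is not None
        name = m.group(1) if use_group else m.group(0)
        if name in macros:
            # Push the stored body back onto the input and rescan it.
            text = macros[name] + text[end:]
            i = 0
        else:
            out.append(text[i:end])
            i = end
    return "".join(out)


at_word = r"@([_a-zA-Z0-9]*)"
print(repr(expand("@a\n", at_word, {"a": "errprint(`Hello')"})))
# 'errprint(Hello)\n'
```

The output mirrors the 'm4' example. Under the '@' pattern, the rescanned body contains no '@', so 'errprint' is passed through as ordinary text.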
1 
1    Note that once 'changeword' has been used to set anything other
1 than the default regular expression, 'm4' runs about seven times
1 slower.  You can invoke 'changeword' with the empty string to restore
1 the default word definition and regain the parsing speed.
1