gawk: GNU Regexp Operators


3.7 'gawk'-Specific Regexp Operators
====================================

GNU software that deals with regular expressions provides a number of
additional regexp operators.  These operators are described in this
minor node and are specific to 'gawk'; they are not available in other
'awk' implementations.  Most of the additional operators deal with word
matching.  For our purposes, a "word" is a sequence of one or more
letters, digits, or underscores ('_'):

'\s'
     Matches any whitespace character.  Think of it as shorthand for
     '[[:space:]]'.

'\S'
     Matches any character that is not whitespace.  Think of it as
     shorthand for '[^[:space:]]'.

'\w'
     Matches any word-constituent character, that is, any letter,
     digit, or underscore.  Think of it as shorthand for
     '[[:alnum:]_]'.

'\W'
     Matches any character that is not word-constituent.  Think of it as
     shorthand for '[^[:alnum:]_]'.

'\<'
     Matches the empty string at the beginning of a word.  For example,
     '/\<away/' matches 'away' but not 'stowaway'.

'\>'
     Matches the empty string at the end of a word.  For example,
     '/stow\>/' matches 'stow' but not 'stowaway'.

'\y'
     Matches the empty string at either the beginning or the end of a
     word (i.e., the word boundar*y*).  For example, '\yballs?\y'
     matches either 'ball' or 'balls', as a separate word.

'\B'
     Matches the empty string that occurs between two word-constituent
     characters.  For example, '/\Brat\B/' matches 'crate', but it does
     not match 'dirty rat'.  '\B' is essentially the opposite of '\y'.

   There are two other operators that work on buffers.  In Emacs, a
"buffer" is, naturally, an Emacs buffer.  Other GNU programs, including
'gawk', treat the entire string being matched as the buffer.  The
operators are:

'\`'
     Matches the empty string at the beginning of a buffer (string).

'\''
     Matches the empty string at the end of a buffer (string).

   Because '^' and '$' always work in terms of the beginning and end of
strings (in 'awk' they never match at a newline embedded in a string),
these operators don't add any new capabilities for 'awk'.  They are
provided for compatibility with other GNU software.

   In other GNU software, the word-boundary operator is '\b'.  However,
that conflicts with the 'awk' language's definition of '\b' as
backspace, so 'gawk' uses a different letter.  An alternative method
would have been to require two backslashes in the GNU operators, but
this was deemed too confusing.  The current method of using '\y' for the
GNU '\b' appears to be the lesser of two evils.

   The various command-line options (see Options) control how 'gawk'
interprets characters in regexps:

No options
     In the default case, 'gawk' provides all the facilities of POSIX
     regexps (see Regexp Operators) and the GNU regexp operators
     described above.

'--posix'
     Match only POSIX regexps; the GNU operators are not special (e.g.,
     '\w' matches a literal 'w').  Interval expressions are allowed.

'--traditional'
     Match traditional Unix 'awk' regexps.  The GNU operators are not
     special, and interval expressions are not available.  Because BWK
     'awk' supports them, the POSIX character classes ('[[:alnum:]]',
     etc.) are available.  Characters described by octal and
     hexadecimal escape sequences are treated literally, even if they
     represent regexp metacharacters.

'--re-interval'
     Allow interval expressions in regexps, if '--traditional' has been
     provided.  Otherwise, interval expressions are available by
     default.