gawk: Standard Regexp Constants


6.1.2.1 Standard Regular Expression Constants
.............................................

When used on the righthand side of the '~' or '!~' operators, a regexp
constant merely stands for the regexp that is to be matched.  However,
regexp constants (such as '/foo/') may be used like simple expressions.
When a regexp constant appears by itself, it has the same meaning as if
it appeared in a pattern (i.e., '($0 ~ /foo/)').  (d.c.)
⇒Expression Patterns.  This means that the following two code segments:

     if ($0 ~ /barfly/ || $0 ~ /camelot/)
         print "found"

and:

     if (/barfly/ || /camelot/)
         print "found"

are exactly equivalent.  One rather bizarre consequence of this rule is
that the following Boolean expression is valid, but does not do what its
author probably intended:

     # Note that /foo/ is on the left of the ~
     if (/foo/ ~ $1) print "found foo"

This code is "obviously" testing '$1' for a match against the regexp
'/foo/'.  But in fact, the expression '/foo/ ~ $1' really means '($0 ~
/foo/) ~ $1'.  In other words, first match the input record against the
regexp '/foo/'.  The result is one if the match succeeds and zero if it
fails.  That result is then matched against the first field in the
record, which is treated as a regexp.  Because it is unlikely that you
would ever really want to make this kind of test, 'gawk' issues a
warning when it sees this construct in a program.  Another consequence
of this rule is that the assignment statement:

     matches = /foo/

assigns either zero or one to the variable 'matches', depending upon the
contents of the current input record.
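
   The two consequences described above can be observed from the shell.
The following is a minimal sketch, not part of the original text: the
function names 'demo_matches' and 'demo_left_regexp' and the sample
input records are made up for illustration, and it assumes an 'awk'
(such as 'gawk') is available on the search path.

```shell
# Hypothetical helper: shows that a bare regexp constant in an
# assignment yields 1 or 0 for each input record.
demo_matches() {
    printf 'foo 1\n1 bar\n' | awk '{
        matches = /foo/        # same as: matches = ($0 ~ /foo/)
        print "matches=" matches
    }'
}

# Hypothetical helper: shows that /foo/ ~ $1 is evaluated as
# ($0 ~ /foo/) ~ $1, so it does not test $1 against /foo/.
demo_left_regexp() {
    printf 'foo 1\n1 foo\n' | awk '{
        if (/foo/ ~ $1) print "found foo in: " $0
    }'
}

demo_matches
demo_left_regexp
```

   The first record of the second input has 'foo' as its first field,
yet nothing is printed for it.  The second record is reported instead,
because the match result '1' is tested against the regexp taken from
its first field, which is also '1'.  With 'gawk', the warning for the
second program appears on standard error.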

   Constant regular expressions are also used as the first argument for
the 'gensub()', 'sub()', and 'gsub()' functions, as the second argument
of the 'match()' function, and as the third argument of the 'split()'
and 'patsplit()' functions (⇒String Functions).  Modern
implementations of 'awk', including 'gawk', allow the third argument of
'split()' to be a regexp constant, but some older implementations do
not.  (d.c.)  Because some built-in functions accept regexp constants as
arguments, confusion can arise when attempting to use regexp constants
as arguments to user-defined functions (⇒User-defined).  For
example:

     function mysub(pat, repl, str, global)
     {
         if (global)
             gsub(pat, repl, str)
         else
             sub(pat, repl, str)
         return str
     }

     {
         ...
         text = "hi! hi yourself!"
         mysub(/hi/, "howdy", text, 1)
         ...
     }

   In this example, the programmer wants to pass a regexp constant to
the user-defined function 'mysub()', which in turn passes it on to
either 'sub()' or 'gsub()'.  However, what really happens is that the
'pat' parameter is assigned a value of either one or zero, depending
upon whether or not '$0' matches '/hi/'.  'gawk' issues a warning when
it sees a regexp constant used as a parameter to a user-defined
function, because passing a truth value in this way is probably not what
was intended.
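
   The effect can be seen by running 'mysub()' from the example above on
a record that matches '/hi/'.  The following is a minimal sketch under
stated assumptions: the wrapper name 'demo_mysub' and the input record
are invented for illustration, and the second call, which passes the
pattern as a string so that it is used as a dynamic regexp, is one
possible workaround that this section does not itself describe.

```shell
# Hypothetical wrapper around the mysub() function shown above.
demo_mysub() {
    echo 'hi there' | awk '
    function mysub(pat, repl, str, global)
    {
        if (global)
            gsub(pat, repl, str)
        else
            sub(pat, repl, str)
        return str
    }

    {
        text = "hi! hi yourself!"
        # Regexp constant: pat receives 1, because $0 matches /hi/.
        print mysub(/hi/, "howdy", text, 1)
        # String (assumed workaround): pat receives "hi" and is used
        # as a dynamic regexp by gsub().
        print mysub("hi", "howdy", text, 1)
    }'
}

demo_mysub
```

   The first call prints the text unchanged, because 'gsub()' searches
it for '1' rather than for 'hi'.  The second call prints the text with
both occurrences of 'hi' replaced.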