gawk: Standard Regexp Constants

6.1.2.1 Standard Regular Expression Constants
.............................................

When used on the righthand side of the '~' or '!~' operators, a regexp
constant merely stands for the regexp that is to be matched. However,
regexp constants (such as '/foo/') may be used like simple expressions.
When a regexp constant appears by itself, it has the same meaning as if
it appeared in a pattern (i.e., '($0 ~ /foo/)'). (d.c.) ⇒Expression
Patterns. This means that the following two code segments:

     if ($0 ~ /barfly/ || $0 ~ /camelot/)
         print "found"

and:

     if (/barfly/ || /camelot/)
         print "found"

are exactly equivalent. One rather bizarre consequence of this rule is
that the following Boolean expression is valid, but does not do what its
author probably intended:

     # Note that /foo/ is on the left of the ~
     if (/foo/ ~ $1) print "found foo"

This code is "obviously" testing '$1' for a match against the regexp
'/foo/'. But in fact, the expression '/foo/ ~ $1' really means '($0 ~
/foo/) ~ $1'. In other words, first match the input record against the
regexp '/foo/'. The result is either zero or one, depending upon the
success or failure of the match. That result is then matched against
the first field in the record. Because it is unlikely that you would
ever really want to make this kind of test, 'gawk' issues a warning when
it sees this construct in a program. Another consequence of this rule
is that the assignment statement:

     matches = /foo/

assigns either zero or one to the variable 'matches', depending upon the
contents of the current input record.
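
As a rough illustration of the two consequences just described, the
following shell sketch feeds three made-up records to 'awk'. The sample
records and the shell variable names ('records', 'values', 'odd') are
assumptions for this example only. The second command uses the
spelled-out form '($0 ~ /foo/) ~ $1', which the text gives as the meaning
of '/foo/ ~ $1', so it should behave the same way without triggering
the warning.

```shell
# Hypothetical sample records: the first has foo in $1, the second has "1" as $1.
records='foo bar
1 foo
baz qux'

# A regexp constant used as a value yields the result of matching $0 (1 or 0).
values=$(printf '%s\n' "$records" | awk '{ matches = /foo/; print matches }')

# Spelled-out form of (/foo/ ~ $1): the 0-or-1 match result is itself matched
# against $1, which is treated as a dynamic regexp.
odd=$(printf '%s\n' "$records" | awk '{ if (($0 ~ /foo/) ~ $1) print $0 }')

printf '%s\n' "$values"
printf '%s\n' "$odd"
```

With these records, 'values' holds 1, 1, and 0, and 'odd' holds only the
record '1 foo'. Its first field contains no 'foo', but the match result
'1' matches the regexp '1' taken from '$1'. The record 'foo bar', whose
first field does contain 'foo', is not selected.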

Constant regular expressions are also used as the first argument for
the 'gensub()', 'sub()', and 'gsub()' functions, as the second argument
of the 'match()' function, and as the third argument of the 'split()'
and 'patsplit()' functions (⇒String Functions). Modern
implementations of 'awk', including 'gawk', allow the third argument of
'split()' to be a regexp constant, but some older implementations do
not. (d.c.) Because some built-in functions accept regexp constants as
arguments, confusion can arise when attempting to use regexp constants
as arguments to user-defined functions (⇒User-defined). For
example:

     function mysub(pat, repl, str, global)
     {
         if (global)
             gsub(pat, repl, str)
         else
             sub(pat, repl, str)
         return str
     }

     {
         ...
         text = "hi! hi yourself!"
         mysub(/hi/, "howdy", text, 1)
         ...
     }

In this example, the programmer wants to pass a regexp constant to
the user-defined function 'mysub()', which in turn passes it on to
either 'sub()' or 'gsub()'. However, what really happens is that the
'pat' parameter is assigned a value of either one or zero, depending
upon whether or not '$0' matches '/hi/'. 'gawk' issues a warning when
it sees a regexp constant used as a parameter to a user-defined
function, because passing a truth value in this way is probably not what
was intended.
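
The effect can be observed with a small shell sketch. It reuses
'mysub()' from the example above, prints the return value so the
result is visible, and supplies a made-up input record ('hi there') so
that '$0' matches '/hi/'. The second call passes the string '"hi"'
instead of a regexp constant. This is one possible workaround that
relies on strings being usable as dynamic regexps; it is an assumption
of this sketch, not something this section prescribes.

```shell
# Hypothetical one-record input chosen so that $0 matches /hi/.
out=$(echo 'hi there' | awk '
function mysub(pat, repl, str, global)
{
    if (global)
        gsub(pat, repl, str)
    else
        sub(pat, repl, str)
    return str
}

{
    text = "hi! hi yourself!"
    print mysub(/hi/, "howdy", text, 1)   # pat receives 1 or 0, not the regexp
    print mysub("hi", "howdy", text, 1)   # pat is a string used as a dynamic regexp
}')
printf '%s\n' "$out"
```

Under these assumptions, the first line printed is the unchanged text
'hi! hi yourself!', because 'pat' is '1' and the text contains no '1'.
The second line is 'howdy! howdy yourself!'. When run with 'gawk', the
first call should also produce the warning described above on standard
error.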