gawk: Leftmost Longest

3.5 How Much Text Matches?
==========================

Consider the following:

     echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'

This example uses the 'sub()' function to make a change to the input
record.  ('sub()' replaces the first instance of any text matched by the
first argument with the string provided as the second argument;
⇒String Functions.)  Here, the regexp '/a+/' indicates "one or more 'a'
characters," and the replacement text is '<A>'.

The input contains four 'a' characters.  'awk' (and POSIX) regular
expressions always match the leftmost, _longest_ sequence of input
characters that can match.  Thus, all four 'a' characters are replaced
with '<A>' in this example:

     $ echo aaaabcd | awk '{ sub(/a+/, "<A>"); print }'
     -| <A>bcd

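As a further illustration, the following sketch uses 'match()', which
sets 'RSTART' and 'RLENGTH', to report exactly which text matched.  The
second input and the shell variable names are chosen only for this
illustration.  It is meant to show that "leftmost" takes priority over
"longest": a short match that starts earlier wins over a longer one
that starts later.

```shell
# Report where /a+/ matches and how long the match is.
# For 'aaaabcd' the match starts at position 1 and covers all four 'a's.
extent=$(echo aaaabcd | awk '{ if (match($0, /a+/)) print RSTART, RLENGTH }')
echo "$extent"

# "Leftmost" beats "longest": the single leading 'a' is matched,
# even though a longer run of 'a's appears later in the record.
leftmost=$(echo abaaaa | awk '{ sub(/a+/, "<A>"); print }')
echo "$leftmost"
```

The first command prints '1 4', and the second prints '<A>baaaa'.
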
For simple match/no-match tests, it makes no difference how much text
matched.  But it matters a great deal when doing text matching and
substitutions with the 'match()', 'sub()', 'gsub()', and 'gensub()'
functions, because the extent of the match determines what is reported
or replaced.  See ⇒String Functions, for more information on these
functions.  Understanding this principle is also important for
regexp-based record and field splitting (⇒Records, and also ⇒Field
Separators).

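The effect on 'gsub()' and on field splitting can be sketched as
follows.  The inputs and shell variable names here are illustrative
assumptions, not examples from the manual; only POSIX 'awk' features
are used.

```shell
# gsub() returns the number of substitutions made.  Because /a+/ takes
# the longest run, the four 'a's count as one match, not four.
one=$(echo aaaabcd | awk '{ n = gsub(/a+/, "<A>"); print n, $0 }')
echo "$one"

# Without the '+', each 'a' is a separate match.
four=$(echo aaaabcd | awk '{ n = gsub(/a/, "<A>"); print n, $0 }')
echo "$four"

# A regexp field separator also matches the longest run, so the four
# colons act as a single separator and the record has two fields.
nf=$(echo 'a::::b' | awk -F':+' '{ print NF }')
echo "$nf"
```

The three commands print '1 <A>bcd', '4 <A><A><A><A>bcd', and '2',
respectively.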