gawk: Egrep Program

11.2.2 Searching for Regular Expressions in Files
-------------------------------------------------

The 'egrep' utility searches files for patterns. It uses regular
expressions that are almost identical to those available in 'awk'
(⇒Regexp). You invoke it as follows:

     egrep [OPTIONS] 'PATTERN' FILES ...

The PATTERN is a regular expression. In typical usage, the regular
expression is quoted to prevent the shell from expanding any of the
special characters as file name wildcards. Normally, 'egrep' prints the
lines that matched. If multiple file names are provided on the command
line, each output line is preceded by the name of the file and a colon.

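The invocation and the file name prefix can be tried with a short shell
sketch. It assumes that 'grep -E' is available as the POSIX spelling of
'egrep'; the function name 'search_demo' and the file names 'a.txt' and
'b.txt' are made up for illustration:

```shell
# Assumption: `grep -E` stands in for `egrep`; all names here are made up.
search_demo() {
    tmp=$(mktemp -d) || return 1
    printf 'apple\nbanana\n'   > "$tmp/a.txt"
    printf 'cherry\napricot\n' > "$tmp/b.txt"
    (
        cd "$tmp" || exit 1
        # The quotes keep the shell from interpreting ( | ) itself.
        echo "one file:"
        grep -E '^a(p|n)' a.txt          # matching lines only
        echo "two files:"
        grep -E '^a(p|n)' a.txt b.txt    # lines prefixed with "NAME:"
    )
    rm -rf "$tmp"
}

search_demo
```

With a single file only the matching line is printed; with two files
each matching line carries its file name and a colon.
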
The options to 'egrep' are as follows:

'-c'
     Print out a count of the lines that matched the pattern, instead of
     the lines themselves.

'-s'
     Be silent. No output is produced and the exit value indicates
     whether the pattern was matched.

'-v'
     Invert the sense of the test. 'egrep' prints the lines that do
     _not_ match the pattern and exits successfully if the pattern is
     not matched.

'-i'
     Ignore case distinctions in both the pattern and the input data.

'-l'
     Only print (list) the names of the files that matched, not the
     lines that matched.

'-e PATTERN'
     Use PATTERN as the regexp to match. The purpose of the '-e' option
     is to allow patterns that start with a '-'.

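Several of these options can be exercised on a small input. This is
only a sketch: it assumes 'grep -E' in place of 'egrep', and the helper
'sample' is made up. The '-s' option is left out because POSIX 'grep'
spells the silent mode '-q' and uses '-s' to suppress error messages:

```shell
# Sample input; the first line deliberately starts with a dash.
sample() { printf '%s\n' '-v flag' 'Verbose' 'quiet'; }

sample | grep -E -c 'e'     # count of lines containing "e"
sample | grep -E -v 'e'     # lines that do not contain "e"
sample | grep -E -i '^v'    # case-insensitive match at line start
sample | grep -E -e '-v'    # -e lets the pattern itself begin with "-"
```

Without '-e', the last command would treat '-v' as an option and not as
the pattern.
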
This version uses the 'getopt()' library function (⇒Getopt Function)
and the file transition library program (⇒Filetrans Function).

The program begins with a descriptive comment and then a 'BEGIN' rule
that processes the command-line arguments with 'getopt()'. The '-i'
(ignore case) option is particularly easy with 'gawk'; we just use the
'IGNORECASE' predefined variable (⇒Built-in Variables):

     # egrep.awk --- simulate egrep in awk
     #
     # Options:
     #    -c    count of lines
     #    -s    silent - use exit value
     #    -v    invert test, success if no match
     #    -i    ignore case
     #    -l    print filenames only
     #    -e    argument is pattern
     #
     # Requires getopt and file transition library functions

     BEGIN {
         while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
             if (c == "c")
                 count_only++
             else if (c == "s")
                 no_print++
             else if (c == "v")
                 invert++
             else if (c == "i")
                 IGNORECASE = 1
             else if (c == "l")
                 filenames_only++
             else if (c == "e")
                 pattern = Optarg
             else
                 usage()
         }

Next comes the code that handles the 'egrep'-specific behavior. If
no pattern is supplied with '-e', the first nonoption on the command
line is used. The 'awk' command-line arguments up to 'ARGV[Optind]' are
cleared, so that 'awk' won't try to process them as files. If no files
are specified, the standard input is used, and if multiple files are
specified, we make sure to note this so that the file names can precede
the matched lines in the output:

         if (pattern == "")
             pattern = ARGV[Optind++]

         for (i = 1; i < Optind; i++)
             ARGV[i] = ""
         if (Optind >= ARGC) {
             ARGV[1] = "-"
             ARGC = 2
         } else if (ARGC - Optind > 1)
             do_filenames++

     #    if (IGNORECASE)
     #        pattern = tolower(pattern)
     }

The last two lines are commented out, as they are not needed in
'gawk'. They should be uncommented if you have to use another version
of 'awk'.

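The argument handling can be tried in isolation. The following is a
reduced sketch, not the program's own code: it skips 'getopt()'
entirely, always takes the first argument as the pattern, and wraps the
'awk' program in a made-up shell function named 'mini_match':

```shell
# Hypothetical stand-in for egrep.awk: the first argument is the pattern,
# any remaining arguments are files.
mini_match() {
    awk 'BEGIN {
        pattern = ARGV[1]
        ARGV[1] = ""          # empty ARGV elements are not opened as files
        if (ARGC <= 2) {      # no file operands: read standard input
            ARGV[1] = "-"
            ARGC = 2
        }
    }
    $0 ~ pattern' "$@"
}

printf 'foo\nbar\n' | mini_match 'ba'
```

Setting an element of 'ARGV' to the empty string is what keeps 'awk'
from trying to open the pattern as a data file.
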
The next set of lines should be uncommented if you are not using
'gawk'. This rule translates all the characters in the input line into
lowercase if the '-i' option is specified.(1) The rule is commented out
as it is not necessary with 'gawk':

     #{
     #    if (IGNORECASE)
     #        $0 = tolower($0)
     #}

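For versions of 'awk' without 'IGNORECASE', one possible alternative is
to compare a lowercased copy of the record and leave '$0' untouched, so
that the original line is what gets printed. This is a sketch under that
assumption, with a made-up helper name 'imatch'; note that lowercasing
the pattern can change the meaning of a regexp that contains
case-sensitive escapes or character class names:

```shell
# Hypothetical helper: case-insensitive match on standard input,
# using only POSIX awk features.
imatch() {
    awk -v pattern="$1" '
    BEGIN { pattern = tolower(pattern) }
    tolower($0) ~ pattern      # default action prints the unmodified $0
    '
}

printf 'Hello World\ngoodbye\n' | imatch 'WORLD'
```

The matching line is printed with its original capitalization.
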
The 'beginfile()' function is called by the rule in 'ftrans.awk' when
each new file is processed. In this case, it is very simple; all it
does is initialize a variable 'fcount' to zero. 'fcount' tracks how
many lines in the current file matched the pattern. Naming the
parameter 'junk' shows we know that 'beginfile()' is called with a
parameter, but that we're not interested in its value:

     function beginfile(junk)
     {
         fcount = 0
     }

The 'endfile()' function is called after each file has been
processed. It affects the output only when the user wants a count of
the number of lines that matched. 'no_print' is true only if the exit
status is desired. 'count_only' is true if line counts are desired.
'egrep' therefore only prints line counts if printing and counting are
enabled. The output format must be adjusted depending upon the number
of files to process. Finally, 'fcount' is added to 'total', so that we
know the total number of lines that matched the pattern:

     function endfile(file)
     {
         if (! no_print && count_only) {
             if (do_filenames)
                 print file ":" fcount
             else
                 print fcount
         }

         total += fcount
     }

The 'BEGINFILE' and 'ENDFILE' special patterns (⇒BEGINFILE/ENDFILE)
could be used, but then the program would be 'gawk'-specific.
Additionally, this example was written before 'gawk' acquired
'BEGINFILE' and 'ENDFILE'.

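To see how per-file counting can work without 'BEGINFILE' and
'ENDFILE', here is a self-contained sketch that watches for a change in
'FILENAME'. It only illustrates the idea and is not the code in
'ftrans.awk'; the names 'count_matches' and 'prev' are made up, and a
file with no records at all is never reported:

```shell
# Hypothetical helper: print "NAME:COUNT" for each file, like -c.
count_matches() {
    pattern=$1
    shift
    awk -v pattern="$pattern" '
    FILENAME != prev {                # first record of a new file
        if (prev != "")
            print prev ":" fcount     # plays the role of endfile()
        prev = FILENAME
        fcount = 0                    # plays the role of beginfile()
    }
    $0 ~ pattern { fcount++ }
    END {
        if (prev != "")
            print prev ":" fcount     # report the last file
    }' "$@"
}

tmp=$(mktemp -d)
printf 'apple\nbanana\navocado\n' > "$tmp/a.txt"
printf 'cherry\n' > "$tmp/b.txt"
(cd "$tmp" && count_matches '^a' a.txt b.txt)
rm -rf "$tmp"
```

The last file has to be reported from the 'END' rule, because no later
record arrives to signal the transition.
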
The following rule does most of the work of matching lines. The
variable 'matches' is true if the line matched the pattern. If the user
wants lines that did not match, the sense of 'matches' is inverted using
the '!' operator. 'fcount' is incremented with the value of 'matches',
which is either one or zero, depending upon a successful or unsuccessful
match. If the line does not match, the 'next' statement just moves on
to the next record.

A number of additional tests are made, but they are only done if we
are not counting lines. First, if the user only wants the exit status
('no_print' is true), then it is enough to know that _one_ line in this
file matched, and we can skip on to the next file with 'nextfile'.
Similarly, if we are only printing file names, we can print the file
name, and then skip to the next file with 'nextfile'. Finally, each
line is printed, with a leading file name and colon if necessary:

     {
         matches = ($0 ~ pattern)
         if (invert)
             matches = ! matches

         fcount += matches    # 1 or 0

         if (! matches)
             next

         if (! count_only) {
             if (no_print)
                 nextfile

             if (filenames_only) {
                 print FILENAME
                 nextfile
             }

             if (do_filenames)
                 print FILENAME ":" $0
             else
                 print
         }
     }

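The counting idiom relies on a match expression yielding one or zero.
The following sketch isolates that part of the rule; the function name
'count_lines' and its second argument (0 or 1, standing in for the '-v'
option) are made up for illustration:

```shell
# Hypothetical helper: count_lines PATTERN INVERT   (INVERT is 0 or 1)
count_lines() {
    awk -v pattern="$1" -v invert="$2" '
    {
        matches = ($0 ~ pattern)    # 1 if the record matches, else 0
        if (invert)
            matches = ! matches     # flip 1 and 0
        fcount += matches
    }
    END { print fcount + 0 }'
}

printf 'ab\ncd\nae\n' | count_lines a 0    # records containing "a"
printf 'ab\ncd\nae\n' | count_lines a 1    # records without "a"
```

Adding the zero-or-one result directly avoids a separate 'if' around the
increment.
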
The 'END' rule takes care of producing the correct exit status. If
there are no matches, the exit status is one; otherwise, it is zero:

     END {
         exit (total == 0)
     }

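The exit status can be checked from the shell. This is a minimal sketch
with a made-up function 'has_match' that keeps only the counting and the
'END' rule:

```shell
# Hypothetical helper: succeeds (status 0) only if some record matches.
has_match() {
    awk -v pattern="$1" '
    $0 ~ pattern { total++ }
    END { exit (total == 0) }'    # 1 when nothing matched, else 0
}

printf 'foo\n' | has_match 'fo' && echo "matched"
printf 'foo\n' | has_match 'zz' || echo "no match"
```

Because zero means success to the shell, the comparison 'total == 0'
yields exactly the inverted value that is needed.
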
The 'usage()' function prints a usage message in case of invalid
options, and then exits:

     function usage()
     {
         print("Usage: egrep [-csvil] [-e pat] [files ...]") > "/dev/stderr"
         print("\n\tegrep [-csvil] pat [files ...]") > "/dev/stderr"
         exit 1
     }

---------- Footnotes ----------

(1) It also introduces a subtle bug; if a match happens, we output
the translated line, not the original.