gawk: Simple Sed

11.3.8 A Simple Stream Editor
-----------------------------

The 'sed' utility is a "stream editor", a program that reads a stream of
data, makes changes to it, and passes it on. It is often used to make
global changes to a large file or to a stream of data generated by a
pipeline of commands. Although 'sed' is a complicated program in its
own right, its most common use is to perform global substitutions in the
middle of a pipeline:

     COMMAND1 < orig.data | sed 's/old/new/g' | COMMAND2 > result

Here, 's/old/new/g' tells 'sed' to look for the regexp 'old' on each
input line and replace every occurrence on that line with the text
'new'. This is similar to 'awk''s 'gsub()' function
(⇒String Functions).
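
For comparison, the same substitution can be written with 'gsub()'.
The following is a minimal sketch, assuming a POSIX shell with 'sed' and
'awk' on the search path; the sample input line and the shell variable
names are made up for illustration:

```shell
# Hypothetical sample input; any line containing 'old' would do.
input='old hat, old shoes'

# Global substitution with sed.
with_sed=$(printf '%s\n' "$input" | sed 's/old/new/g')

# The same substitution with awk: gsub() modifies $0 in place,
# and print writes out the modified record.
with_awk=$(printf '%s\n' "$input" | awk '{ gsub(/old/, "new"); print }')

printf 'sed: %s\nawk: %s\n' "$with_sed" "$with_awk"
```

Both commands should print the line with every 'old' replaced by 'new'.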

The following program, 'awksed.awk', accepts at least two
command-line arguments: the pattern to look for and the text to replace
it with. Any additional arguments are treated as data file names to
process. If none are provided, the standard input is used:

     # awksed.awk --- do s/foo/bar/g using just print
     #    Thanks to Michael Brennan for the idea

     function usage()
     {
         print "usage: awksed pat repl [files...]" > "/dev/stderr"
         exit 1
     }

     BEGIN {
         # validate arguments
         if (ARGC < 3)
             usage()

         RS = ARGV[1]
         ORS = ARGV[2]

         # don't use arguments as files
         ARGV[1] = ARGV[2] = ""
     }

     # look ma, no hands!
     {
         if (RT == "")
             printf "%s", $0
         else
             print
     }

The program relies on 'gawk''s ability to have 'RS' be a regexp, as
well as on the setting of 'RT' to the actual text that terminates the
record (⇒Records).

The idea is to have 'RS' be the pattern to look for. 'gawk'
automatically sets '$0' to the text between matches of the pattern.
This is text that we want to keep, unmodified. Then, by setting 'ORS'
to the replacement text, a simple 'print' statement outputs the text we
want to keep, followed by the replacement text.
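
To see the record splitting on its own, the following sketch prints
each record together with the value of 'RT'. It assumes 'gawk' is
installed under that name; the sample input and the regexp '[0-9]+' are
invented for illustration:

```shell
# Split the input wherever a run of digits occurs, and show what
# gawk puts in $0 and in RT for each record.
records=$(printf 'one1two22three' |
    gawk -v RS='[0-9]+' '{ printf "record=[%s] RT=[%s]\n", $0, RT }')

printf '%s\n' "$records"
```

Each record is the text between matches, and 'RT' holds the digits that
ended it. 'RT' is empty for the last record because the sample input
does not end with a match.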

There is one wrinkle to this scheme: what to do if the last record
doesn't end with text that matches 'RS'. An unconditional 'print'
would still append the replacement text after that record, which is not
correct. However, when the input does not end in text that matches
'RS', 'gawk' sets 'RT' to the null string for the final record. In that
case, we print '$0' with 'printf' instead, so that nothing is appended
(⇒Printf).
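
The difference is easy to observe with input that does not end in a
match. This sketch assumes 'gawk' is available; the input 'a,b,c' and
the choice of ',' and ';' as pattern and replacement are arbitrary:

```shell
# Unconditional print: ORS is appended after the last record too.
naive=$(printf 'a,b,c' | gawk -v RS=',' -v ORS=';' '{ print }')

# Guarded version: skip ORS when RT is empty, that is, when the
# record was ended by end of input rather than by a match of RS.
guarded=$(printf 'a,b,c' |
    gawk -v RS=',' -v ORS=';' \
         '{ if (RT == "") printf "%s", $0; else print }')

printf 'naive:   %s\nguarded: %s\n' "$naive" "$guarded"
```

The naive version yields 'a;b;c;' with a spurious trailing ';', while
the guarded version yields 'a;b;c'.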

The 'BEGIN' rule handles the setup, checking for the right number of
arguments and calling 'usage()' if there is a problem. Then it sets
'RS' and 'ORS' from the command-line arguments and sets 'ARGV[1]' and
'ARGV[2]' to the null string, so that they are not treated as file names
(⇒ARGC and ARGV).
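
The effect of clearing the 'ARGV' elements can be checked in isolation.
In this sketch (again assuming 'gawk' is installed), 'old' and 'new' are
placeholder arguments, not files that exist:

```shell
# The two arguments are counted in ARGC (ARGV[0] is the program name),
# but once they are set to the null string gawk skips them and, with
# no file names left, reads standard input.
out=$(printf 'from stdin\n' |
    gawk 'BEGIN { printf "ARGC=%d\n", ARGC; ARGV[1] = ARGV[2] = "" }
          { print "read:", $0 }' old new)

printf '%s\n' "$out"
```

Without the assignment in the 'BEGIN' rule, 'gawk' would try to open
'old' and 'new' as data files instead of reading the piped input.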

The 'usage()' function prints an error message and exits. Finally,
the single rule handles the printing scheme outlined earlier, using
'print' or 'printf' as appropriate, depending upon the value of 'RT'.
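
A sample run ties the pieces together. This sketch writes a copy of
the program to a temporary file and runs it on standard input. The use
of 'mktemp', the sample input, and the shell variable names are
assumptions for illustration, and 'gawk' is assumed to be installed:

```shell
# Write a copy of awksed.awk to a temporary file.
prog=$(mktemp)
cat > "$prog" <<'EOF'
function usage()
{
    print "usage: awksed pat repl [files...]" > "/dev/stderr"
    exit 1
}

BEGIN {
    if (ARGC < 3)
        usage()

    RS = ARGV[1]
    ORS = ARGV[2]
    ARGV[1] = ARGV[2] = ""
}

{
    if (RT == "")
        printf "%s", $0
    else
        print
}
EOF

# Replace every 'old' with 'new' in text read from standard input.
result=$(printf 'the old hat and the old shoes\n' |
    gawk -f "$prog" old new)
printf '%s\n' "$result"

# With too few arguments, the program prints its usage message
# on standard error and exits with status 1.
status=0
gawk -f "$prog" old < /dev/null 2> /dev/null || status=$?
printf 'exit status with one argument: %s\n' "$status"

rm -f "$prog"
```

The first command should print the input line with both occurrences of
'old' replaced, and the second should report an exit status of 1.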