gawk: Simple Sed


11.3.8 A Simple Stream Editor
-----------------------------

The 'sed' utility is a "stream editor", a program that reads a stream of
data, makes changes to it, and passes it on.  It is often used to make
global changes to a large file or to a stream of data generated by a
pipeline of commands.  Although 'sed' is a complicated program in its
own right, its most common use is to perform global substitutions in the
middle of a pipeline:

     COMMAND1 < orig.data | sed 's/old/new/g' | COMMAND2 > result

   Here, 's/old/new/g' tells 'sed' to look for the regexp 'old' on each
input line and replace every occurrence on that line with the text
'new'.  This is similar to 'awk''s 'gsub()' function
(⇒String Functions).
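
   For comparison, the same substitution can be sketched with 'gsub()'
directly.  This is a minimal illustration and not part of
'awksed.awk'; the sample input and the shell variable name 'result' are
made up for the example:

```shell
# Replace every match of the regexp "old" on each line, then print the line.
# The input text and the variable name are illustrative only.
result=$(printf 'old gold\nnothing here\n' | awk '{ gsub(/old/, "new"); print }')
echo "$result"
```

   The first line comes out as 'new gnew'; the second line contains no
match and passes through unchanged.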

   The following program, 'awksed.awk', accepts at least two
command-line arguments: the pattern to look for and the text to replace
it with.  Any additional arguments are treated as data file names to
process.  If none are provided, the standard input is used:

     # awksed.awk --- do s/foo/bar/g using just print
     #    Thanks to Michael Brennan for the idea

     function usage()
     {
         print "usage: awksed pat repl [files...]" > "/dev/stderr"
         exit 1
     }

     BEGIN {
         # validate arguments
         if (ARGC < 3)
             usage()

         RS = ARGV[1]
         ORS = ARGV[2]

         # don't use arguments as files
         ARGV[1] = ARGV[2] = ""
     }

     # look ma, no hands!
     {
         if (RT == "")
             printf "%s", $0
         else
             print
     }

   The program relies on 'gawk''s ability to have 'RS' be a regexp, as
well as on the setting of 'RT' to the actual text that terminates the
record (⇒Records).

   The idea is to have 'RS' be the pattern to look for.  'gawk'
automatically sets '$0' to the text between matches of the pattern.
This is text that we want to keep, unmodified.  Then, by setting 'ORS'
to the replacement text, a simple 'print' statement outputs the text we
want to keep, followed by the replacement text.

   There is one wrinkle to this scheme, which is what to do if the last
record doesn't end with text that matches 'RS'.  Using a 'print'
statement unconditionally prints the replacement text, which is not
correct.  However, if the file did not end in text that matches 'RS',
'RT' is set to the null string.  In this case, we can print '$0' using
'printf' (⇒Printf), which does not append 'ORS'.

   The 'BEGIN' rule handles the setup, checking for the right number of
arguments and calling 'usage()' if there is a problem.  Then it sets
'RS' and 'ORS' from the command-line arguments and sets 'ARGV[1]' and
'ARGV[2]' to the null string, so that they are not treated as file names
(⇒ARGC and ARGV).
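
   The effect of blanking 'ARGV' elements can be checked in isolation.
In this sketch, the two operands 'no-such-pat' and 'no-such-repl' are
hypothetical names that are not expected to exist as files, and the
variable name 'result' is likewise just for illustration:

```shell
# The two operands are hypothetical names that do not exist as files.
# Because the BEGIN rule blanks them, awk skips them and reads stdin.
result=$(printf 'from stdin\n' |
    awk 'BEGIN { ARGV[1] = ARGV[2] = "" } { print }' no-such-pat no-such-repl)
echo "$result"
```

   Without the assignment in the 'BEGIN' rule, 'awk' would try to open
both operands as input files and report an error.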

   The 'usage()' function prints an error message and exits.  Finally,
the single rule handles the printing scheme outlined earlier, using
'print' or 'printf' as appropriate, depending upon the value of 'RT'.