11.3.7 Extracting Programs from Texinfo Source Files
----------------------------------------------------

The nodes ⇒Library Functions, and ⇒Sample Programs, are
the top-level nodes for a large number of 'awk' programs. If you want
to experiment with these programs, it is tedious to type them in by
hand. Here we present a program that can extract parts of a Texinfo
input file into separate files.

This Info file is written in Texinfo
(https://www.gnu.org/software/texinfo/), the GNU Project's document
formatting language. A single Texinfo source file can be used to
produce both printed documentation, with TeX, and online documentation.
(The Texinfo language is described fully in 'Texinfo---The GNU
Documentation Format'; ⇒(texinfo)Top.)

For our purposes, it is enough to know three things about Texinfo
input files:

   * The "at" symbol ('@') is special in Texinfo, much as the backslash
     ('\') is in C or 'awk'. Literal '@' symbols are represented in
     Texinfo source files as '@@'.

   * Comments start with either '@c' or '@comment'. The file-extraction
     program works by using special comments that start at the beginning
     of a line.

   * Lines containing '@group' and '@end group' commands bracket example
     text that should not be split across a page boundary.
     (Unfortunately, TeX isn't always smart enough to do things exactly
     right, so we have to give it some help.)

The following program, 'extract.awk', reads through a Texinfo source
file and does two things, based on the special comments. Upon seeing
'@c system ...', it runs a command, by extracting the command text from
the control line and passing it on to the 'system()' function
(⇒I/O Functions). Upon seeing '@c file FILENAME', each subsequent
line is sent to the file FILENAME, until '@c endfile' is encountered.
The rules in 'extract.awk' match either '@c' or '@comment' by letting
the 'omment' part be optional. Lines containing '@group' and
'@end group' are simply removed. 'extract.awk' uses the 'join()'
library function (⇒Join Function).
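As a rough illustration of how the optional 'omment' part works, the
following sketch classifies a few lines with the same kind of patterns.
The shell function name 'classify' and the output labels are made up
for this illustration and are not part of 'extract.awk'. The sketch
sticks to POSIX 'awk' features, so the directives must be lowercase
here.

```shell
# Hypothetical helper for illustration only; not part of extract.awk.
classify() {
    awk '
    /^@c(omment)?[ \t]+file/    { print "file " $3; next }
    /^@c(omment)?[ \t]+endfile/ { print "endfile";  next }
    /^@c(omment)?[ \t]+system/  { print "system";   next }
                                { print "other" }'
}

printf '%s\n' \
    '@c file examples/messages.awk' \
    '@comment endfile' \
    '@c system mkdir examples' \
    '@code{BEGIN} is not a comment' | classify
```

The last input line starts with '@c' but is followed by neither
whitespace nor 'omment', so none of the directive patterns match it.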
1
The example programs in the online Texinfo source for 'GAWK:
Effective AWK Programming' ('gawktexi.in') have all been bracketed
inside 'file' and 'endfile' lines. The 'gawk' distribution uses a copy
of 'extract.awk' to extract the sample programs and install many of them
in a standard directory where 'gawk' can find them. The Texinfo file
looks something like this:

     ...
     This program has a @code{BEGIN} rule
     that prints a nice message:

     @example
     @c file examples/messages.awk
     BEGIN @{ print "Don't panic!" @}
     @c endfile
     @end example

     It also prints some final advice:

     @example
     @c file examples/messages.awk
     END @{ print "Always avoid bored archaeologists!" @}
     @c endfile
     @end example
     ...

'extract.awk' begins by setting 'IGNORECASE' to one, so that mixed
upper- and lowercase letters in the directives won't matter.

The first rule handles calling 'system()', checking that a command is
given ('NF' is at least three) and also checking that the command exits
with a zero exit status, signifying OK:

     # extract.awk --- extract files and run programs from Texinfo files

     BEGIN { IGNORECASE = 1 }

     /^@c(omment)?[ \t]+system/ {
         if (NF < 3) {
             e = ("extract: " FILENAME ":" FNR)
             e = (e ": badly formed `system' line")
             print e > "/dev/stderr"
             next
         }
         $1 = ""
         $2 = ""
         stat = system($0)
         if (stat != 0) {
             e = ("extract: " FILENAME ":" FNR)
             e = (e ": warning: system returned " stat)
             print e > "/dev/stderr"
         }
     }

The variable 'e' is used so that the rule fits nicely on the screen.
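To see what string the first rule hands to 'system()', the following
sketch clears the first two fields in the same way but prints the
result between brackets instead of running it. The function name
'show_command' is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: show the command text instead of executing it.
show_command() {
    awk '/^@c(omment)?[ \t]+system/ {
        $1 = ""              # drop "@c" (or "@comment")
        $2 = ""              # drop "system"
        print "[" $0 "]"     # $0 has been rebuilt from the fields
    }'
}

printf '%s\n' '@c system mkdir -p examples' | show_command
# prints: [  mkdir -p examples]
```

Assigning to a field makes 'awk' rebuild '$0' with single spaces
between fields. The two leading spaces are harmless to the shell, but
a run of blanks inside the command, even within quotes, is reduced to
one space by the same rebuilding.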
1
The second rule handles moving data into files. It verifies that a
file name is given in the directive. If the file named is not the
current file, then the current file is closed. Keeping the current file
open until a new file is encountered allows the use of the '>'
redirection for printing the contents, keeping open-file management
simple.

The 'for' loop does the work. It reads lines using 'getline'
(⇒Getline). For an unexpected end-of-file, it calls the
'unexpected_eof()' function. If the line is an "endfile" line, then it
breaks out of the loop. If the line is an '@group' or '@end group'
line, then it ignores it and goes on to the next line. Similarly,
comments within examples are also ignored.
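The shape of that loop can be tried on its own. The sketch below is a
cut-down variant, not the rule from 'extract.awk': it copies the body
of a 'file' block to standard output rather than to a named file, and
the function name 'copy_body' and its error message are assumptions
made for this illustration.

```shell
# Hypothetical, simplified variant of the loop; writes to stdout.
copy_body() {
    awk '
    /^@c(omment)?[ \t]+file/ {
        for (;;) {
            # getline returns 1 for a line, 0 at end of input, -1 on error
            if ((getline line) <= 0) {
                print "unexpected EOF or error"
                exit 1
            }
            if (line ~ /^@c(omment)?[ \t]+endfile/)
                break
            else if (line ~ /^@(end[ \t]+)?group/)
                continue
            print line
        }
    }'
}

printf '%s\n' 'prose' '@c file x' '@group' 'keep me' '@end group' \
    '@c endfile' 'more prose' | copy_body
```

Testing for '<= 0' covers both end of input and a read error with one
comparison, so a 'file' block that lacks its 'endfile' line ends the
run with a nonzero exit status instead of looping forever.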
1
Most of the work is in the following few lines. If the line has no
'@' symbols, the program can print it directly. Otherwise, each leading
'@' must be stripped off. To remove the '@' symbols, the line is split
into separate elements of the array 'a', using the 'split()' function
(⇒String Functions). The '@' symbol is used as the separator
character. Each element of 'a' that is empty indicates two successive
'@' symbols in the original line. For each two empty elements ('@@' in
the original file), we have to add a single '@' symbol back in.

When the processing of the array is finished, 'join()' is called with
the value of 'SUBSEP' (⇒Multidimensional), to rejoin the pieces
back into a single line. That line is then printed to the output file:

     /^@c(omment)?[ \t]+file/ {
         if (NF != 3) {
             e = ("extract: " FILENAME ":" FNR ": badly formed `file' line")
             print e > "/dev/stderr"
             next
         }
         if ($3 != curfile) {
             if (curfile != "")
                 close(curfile)
             curfile = $3
         }

         for (;;) {
             if ((getline line) <= 0)
                 unexpected_eof()
             if (line ~ /^@c(omment)?[ \t]+endfile/)
                 break
             else if (line ~ /^@(end[ \t]+)?group/)
                 continue
             else if (line ~ /^@c(omment)?[ \t]+/)
                 continue
             if (index(line, "@") == 0) {
                 print line > curfile
                 continue
             }
             n = split(line, a, "@")
             # if a[1] == "", means leading @,
             # don't add one back in.
             for (i = 2; i <= n; i++) {
                 if (a[i] == "") { # was an @@
                     a[i] = "@"
                     if (a[i+1] == "")
                         i++
                 }
             }
             print join(a, 1, n, SUBSEP) > curfile
         }
     }
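The '@' handling in the loop above can be exercised in isolation. The
following sketch applies the same 'split()' and rejoin steps to each
input line and prints the result. The function name 'unescape_at' is
hypothetical, and the 'join()' shown is a stand-in written for this
example on the assumption that the library version treats 'SUBSEP' as
"no separator"; see ⇒Join Function for the real one.

```shell
# Hypothetical helper: undo Texinfo @-escapes one line at a time.
unescape_at() {
    awk '
    # Stand-in for the library join(); SUBSEP means no separator at all
    function join(array, start, end, sep,    result, i)
    {
        if (sep == "")
            sep = " "
        else if (sep == SUBSEP)
            sep = ""
        result = array[start]
        for (i = start + 1; i <= end; i++)
            result = result sep array[i]
        return result
    }
    {
        if (index($0, "@") == 0) {   # nothing to undo
            print
            next
        }
        n = split($0, a, "@")
        for (i = 2; i <= n; i++) {
            if (a[i] == "") {        # was an @@
                a[i] = "@"
                if (a[i+1] == "")
                    i++
            }
        }
        print join(a, 1, n, SUBSEP)
    }'
}

printf '%s\n' 'BEGIN @{ print "hi" @}' 'user@@example.com' | unescape_at
# prints: BEGIN { print "hi" }
# prints: user@example.com
```

Splitting on '@' removes every '@' from the line, so '@{' and '@}'
lose their escapes for free; only the empty elements left behind by
'@@' need a literal '@' put back.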
1
An important thing to note is the use of the '>' redirection. Output
done with '>' opens the file only once; the file stays open and
subsequent output is appended to it (⇒Redirection). This makes it
easy to mix program text and explanatory prose for the same sample
source file (as has been done here!) without any hassle. The file is
closed only when a new data file name is encountered or at the end of
the input file.
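This behavior of '>' is easy to check with a scratch file. The sketch
below is only a demonstration: the function name 'demo_redirect' and
the variable 'out' are made up for it, and it assumes a 'mktemp'
command is available.

```shell
# Hypothetical demonstration of how awk treats repeated '>' output.
demo_redirect() {
    tmp=$(mktemp) || return 1
    echo "old contents" > "$tmp"
    awk -v out="$tmp" 'BEGIN {
        print "first"  > out    # first use: opens and truncates the file
        print "second" > out    # same open file: output is appended
    }'
    cat "$tmp"
    rm -f "$tmp"
}

demo_redirect
# prints: first
# prints: second
```

The old contents are discarded once, when the file is first opened,
and both lines written afterward are kept.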
1
Finally, the function 'unexpected_eof()' prints an appropriate error
message and then exits. The 'END' rule handles the final cleanup,
closing the open file:

     function unexpected_eof()
     {
         printf("extract: %s:%d: unexpected EOF or error\n",
                          FILENAME, FNR) > "/dev/stderr"
         exit 1
     }

     END {
         if (curfile)
             close(curfile)
     }