gawk: I/O Functions

9.1.4 Input/Output Functions
----------------------------

The following functions relate to input/output (I/O). Optional
parameters are enclosed in square brackets ([ ]):

'close('FILENAME [',' HOW]')'
     Close the file FILENAME for input or output. Alternatively, the
     argument may be a shell command that was used for creating a
     coprocess, or for redirecting to or from a pipe; then the coprocess
     or pipe is closed. See 'Close Files And Pipes' for more
     information.

     When closing a coprocess, it is occasionally useful to first close
     one end of the two-way pipe and then to close the other. This is
     done by providing a second argument to 'close()'. This second
     argument (HOW) should be one of the two string values '"to"' or
     '"from"', indicating which end of the pipe to close. Case in the
     string does not matter. See 'Two-way I/O', which discusses this
     feature in more detail and gives an example.

     Note that the second argument to 'close()' is a 'gawk' extension;
     it is not available in compatibility mode (see 'Options').
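The two-step close can be sketched as follows. This is a minimal example that assumes 'gawk' and 'sort' are installed. Both the '|&' coprocess operator and the HOW argument to 'close()' are 'gawk' extensions, so it will not run under other 'awk' implementations. The shell function name 'sort_via_coprocess' is hypothetical. Closing the '"to"' end first matters here because 'sort' cannot produce any output until it sees end-of-file on its input.

```shell
#!/bin/sh
# sort_via_coprocess: hypothetical helper name.
# Requires gawk: '|&' and the two-argument close() are gawk extensions.
sort_via_coprocess() {
    gawk 'BEGIN {
        cmd = "sort"
        print "pear"  |& cmd          # write lines to the coprocess
        print "apple" |& cmd
        close(cmd, "to")              # close the write end: sort now sees EOF
        while ((cmd |& getline line) > 0)
            print "sorted:", line     # read the results from the other end
        close(cmd)                    # finally close the read end as well
    }'
}

sort_via_coprocess
```

Run as shown, this should print 'sorted: apple' followed by 'sorted: pear'.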

'fflush('[FILENAME]')'
     Flush any buffered output associated with FILENAME, which is either
     a file opened for writing or a shell command for redirecting output
     to a pipe or coprocess.

     Many utility programs "buffer" their output (i.e., they hold
     information destined for a disk file or the screen in memory until
     there is enough of it to be worth sending to the output device).
     This is often more efficient than writing every little bit of
     information as soon as it is ready. However, sometimes it is
     necessary to force a program to "flush" its buffers (i.e., write
     the information to its destination, even if a buffer is not full).
     This is the purpose of the 'fflush()' function: 'gawk' also buffers
     its output, and 'fflush()' forces 'gawk' to flush its buffers.
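As an illustration of 'fflush(FILENAME)', the following sketch writes a line to a temporary file and then reads the file back through a separate 'cat' process. It assumes an 'awk' that supports 'fflush()', plus the 'mktemp' and 'cat' utilities. The helper name 'write_then_read_back' is hypothetical. Without the 'fflush(f)' call, the line might still be sitting in the 'awk' output buffer when 'cat' opens the file, so 'cat' could see an empty file.

```shell
#!/bin/sh
# write_then_read_back: hypothetical helper showing fflush(FILENAME).
write_then_read_back() {
    tmp=$(mktemp) || return 1
    awk -v f="$tmp" 'BEGIN {
        print "first line" > f        # may still be held in the output buffer
        fflush(f)                     # force it out to the file now
        cmd = "cat " f
        while ((cmd | getline line) > 0)
            print "read back:", line  # a separate process sees the data
        close(cmd)
        close(f)
    }'
    rm -f "$tmp"
}

write_then_read_back
```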

     Brian Kernighan added 'fflush()' to his 'awk' in April 1992. For
     two decades, it was a common extension. In December 2012, it was
     accepted for inclusion into the POSIX standard. See the Austin
     Group website (http://austingroupbugs.net/view.php?id=634).

     POSIX standardizes 'fflush()' as follows: if there is no argument,
     or if the argument is the null string ('""'), then 'awk' flushes
     the buffers for _all_ open output files and pipes.

          NOTE: Prior to version 4.0.2, 'gawk' would flush only the
          standard output if there was no argument, and flush all
          output files and pipes if the argument was the null string.
          This was changed in order to be compatible with Brian
          Kernighan's 'awk', in the hope that standardizing this
          feature in POSIX would then be easier (which indeed proved
          to be the case).

     With 'gawk', you can use 'fflush("/dev/stdout")' if you wish to
     flush only the standard output.

     'fflush()' returns zero if the buffer is successfully flushed;
     otherwise, it returns a nonzero value. ('gawk' returns -1.) When
     all buffers are flushed, the return value is zero only if every
     buffer was flushed successfully. Otherwise, it is -1, and 'gawk'
     warns about the FILENAME that caused the problem.

     'gawk' also issues a warning message if you attempt to flush a file
     or pipe that was opened for reading (such as with 'getline'), or if
     FILENAME is not an open file, pipe, or coprocess. In such a case,
     'fflush()' returns -1, as well.
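A small sketch of these return values follows. It assumes 'gawk', because the warning and the -1 result for an unopened name are the 'gawk' behavior described above; other implementations may differ. The helper name 'fflush_return_values' is hypothetical, and 'no-such-file' is simply a name that was never opened. The warning goes to standard error, which the example discards.

```shell
#!/bin/sh
# fflush_return_values: hypothetical helper; relies on gawk behavior.
fflush_return_values() {
    tmp=$(mktemp) || return 1
    gawk -v f="$tmp" 'BEGIN {
        print "data" > f
        print fflush(f)               # an open output file: 0
        print fflush()                # all open output files and pipes: 0
        print fflush("no-such-file")  # never opened: -1, plus a warning
        close(f)
    }'
    rm -f "$tmp"
}

fflush_return_values 2>/dev/null
```

The three lines printed should be '0', '0', and '-1'.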

Interactive Versus Noninteractive Buffering

As a side point, buffering issues can be even more confusing if your
program is "interactive" (i.e., communicating with a user sitting at a
keyboard).(1)

Interactive programs generally "line buffer" their output (i.e., they
write out every line as soon as it is complete). Noninteractive programs
wait until they have a full buffer, which may be many lines of output.
Here is an example of the difference:

     $ awk '{ print $1 + $2 }'
     1 1
     -| 2
     2 3
     -| 5
     Ctrl-d

Each line of output is printed immediately. Compare that behavior with
this example:

     $ awk '{ print $1 + $2 }' | cat
     1 1
     2 3
     Ctrl-d
     -| 2
     -| 5

Here, no output is printed until after the 'Ctrl-d' is typed, because it
is all buffered and sent down the pipe to 'cat' in one shot.
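One way to get line-at-a-time output even when standard output is a pipe is to call 'fflush()' after each 'print'. In this minimal sketch, the shell function name 'sum_pairs' is hypothetical. The input comes from 'printf' so that the example can run unattended. This hides the timing difference but shows that the results are unchanged. If you type the input interactively into 'sum_pairs | cat', each sum should appear as soon as you enter its line.

```shell
#!/bin/sh
# sum_pairs: hypothetical helper. fflush() after each print pushes every
# result down the pipe at once, even though stdout is not a terminal.
sum_pairs() {
    awk '{ print $1 + $2; fflush() }'
}

printf '1 1\n2 3\n' | sum_pairs | cat
```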

'system(COMMAND)'
     Execute the operating system command COMMAND and then return to the
     'awk' program. Return COMMAND's exit status (see further on).

     For example, if the following fragment of code is put in your 'awk'
     program:

          END {
              system("date | mail -s 'awk run done' root")
          }

     the system administrator is sent mail when the 'awk' program
     finishes processing input and begins its end-of-input processing.
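The following minimal sketch shows the return value in the two simplest cases. It assumes 'gawk' in its default mode, or another 'awk' that returns the plain exit status for a normal exit. The helper name 'system_status' is hypothetical.

```shell
#!/bin/sh
# system_status: hypothetical helper that prints what system() returns.
system_status() {
    awk 'BEGIN {
        print system("true")          # command succeeded: 0
        print system("exit 3")        # normal exit: the exit status, 3
    }'
}

system_status
```

It should print '0' and then '3'. Under '--posix', 'gawk' would instead return the full C 'system()' value for the second command, which is 768 (3 * 256) with the usual status layout.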

     Note that redirecting 'print' or 'printf' into a pipe is often
     enough to accomplish your task. If you need to run many commands,
     it is more efficient to simply print them down a pipeline to the
     shell:

          while (MORE STUFF TO DO)
              print COMMAND | "/bin/sh"
          close("/bin/sh")
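Here is a concrete, runnable version of the loop above, assuming that '/bin/sh' exists. The helper name 'run_commands_via_sh' and the 'echo' commands are for illustration only. All three commands go to a single shell process, and 'close()' waits for that shell to finish.

```shell
#!/bin/sh
# run_commands_via_sh: hypothetical helper; one shell runs every command.
run_commands_via_sh() {
    awk 'BEGIN {
        for (i = 1; i <= 3; i++)
            print "echo command number " i | "/bin/sh"   # send a command line
        close("/bin/sh")              # end the input and wait for the shell
    }'
}

run_commands_via_sh
```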

     However, if your 'awk' program is interactive, 'system()' is useful
     for running large self-contained programs, such as a shell or an
     editor. Some operating systems cannot implement the 'system()'
     function; on those systems, calling 'system()' causes a fatal error.

          NOTE: When '--sandbox' is specified, the 'system()' function
          is disabled (see 'Options').

     On most POSIX systems, a command's exit status is a 16-bit number.
     The exit value passed to the C 'exit()' function is held in the
     high-order eight bits. The low-order bits record whether the
     process was killed by a signal and, if so, the guilty signal number
     (bits 0-6); bit 7 is set if the process also dumped core.
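To make the bit layout concrete, the following sketch decodes a few raw status values with plain arithmetic. It assumes the traditional layout described above. Real C programs should use the 'WIFEXITED()', 'WEXITSTATUS()', 'WIFSIGNALED()', and 'WTERMSIG()' macros instead (a 'WCOREDUMP()' macro is also widely available, but as an extension rather than part of POSIX). The helper 'decode_status' and the sample values are made up for illustration.

```shell
#!/bin/sh
# decode_status: hypothetical helper; decodes raw 16-bit statuses using
# the traditional layout, not a portable API.
decode_status() {
    awk 'function describe(status,    sig, core, suffix) {
        sig  = status % 128               # bits 0-6: signal number
        core = int(status / 128) % 2      # bit 7: core-dump flag
        if (sig == 0)
            return "exit status " int(status / 256)   # bits 8-15
        suffix = core ? " (core dumped)" : ""
        return "killed by signal " sig suffix
    }
    BEGIN {
        print describe(768)   # 3 * 256
        print describe(9)     # signal 9, no core dump
        print describe(139)   # 128 + 11
    }'
}

decode_status
```

This should print 'exit status 3', 'killed by signal 9', and 'killed by signal 11 (core dumped)'.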

     Traditionally, 'awk''s 'system()' function has simply returned the
     exit status value divided by 256. In the normal case this gives
     the exit status, but in the case of death-by-signal it yields a
     fractional floating-point value.(2) POSIX states that 'awk''s
     'system()' should return the full 16-bit value.

     'gawk' steers a middle ground. The return values are summarized in
     Table 9.5.

     Situation                     Return value from 'system()'
     --------------------------------------------------------------------
     '--traditional'               C 'system()''s value divided by 256
     '--posix'                     C 'system()''s value
     Normal exit of command        Command's exit status
     Death by signal of command    256 + number of murderous signal
     Death by signal of command    512 + number of murderous signal
       with core dump
     Some kind of error            -1

     Table 9.5: Return values from 'system()'
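The table can be modeled with a little arithmetic. This sketch is not 'gawk''s actual implementation. It only recomputes the documented values from raw 16-bit statuses, and the helper names 'system_return_modes' and 'gawk_default' are hypothetical. The sample statuses 768, 2, and 139 stand for a normal exit with status 3, death by signal 2, and death by signal 11 with a core dump.

```shell
#!/bin/sh
# system_return_modes: hypothetical model of Table 9.5. It imitates the
# documented return values; it is not taken from gawk's source code.
system_return_modes() {
    awk 'function gawk_default(s,    sig) {
        sig = s % 128                     # signal number, if any
        if (sig == 0)
            return int(s / 256)           # normal exit: the exit status
        return (int(s / 128) % 2 ? 512 : 256) + sig   # 512 if core dumped
    }
    BEGIN {
        n = split("768 2 139", st, " ")
        for (i = 1; i <= n; i++)
            printf "%d: traditional=%g posix=%d default=%d\n",
                st[i], st[i] / 256, st[i], gawk_default(st[i])
    }'
}

system_return_modes
```

Note the fractional '--traditional' results for the two signal cases (0.0078125 and 0.542969). These fractions are the oddity described above.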

Controlling Output Buffering with 'system()'

The 'fflush()' function provides explicit control over output
buffering for individual files and pipes. However, its use is not
portable to many older 'awk' implementations. An alternative method to
flush output buffers is to call 'system()' with a null string as its
argument:

     system("")   # flush output

'gawk' treats this use of the 'system()' function as a special case and
is smart enough not to run a shell (or other command interpreter) with
the empty command. Therefore, with 'gawk', this idiom is not only
useful, it is also efficient. Although this method should work with
other 'awk' implementations, it does not necessarily avoid starting an
unnecessary shell. (Other implementations may only flush the buffer
associated with the standard output and not necessarily all buffered
output.)
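The following sketch shows the idiom at work. It assumes an 'awk' that flushes its output on 'system("")', as described above. The helper name 'flush_before_child' is hypothetical. The final '| cat' makes standard output a pipe, so the output of 'awk' is fully buffered. The 'system("")' call pushes '"awk: before"' out before 'cat', a separate process sharing the same standard output, writes its own line.

```shell
#!/bin/sh
# flush_before_child: hypothetical helper. system("") flushes the buffered
# awk output so it appears before output written by a child process.
flush_before_child() {
    awk 'BEGIN {
        print "awk: before"
        system("")                    # flush; gawk starts no shell for this
        print "child: hello" | "cat"  # cat writes to the same stdout
        close("cat")                  # wait for cat to finish
        print "awk: after"
    }' | cat
}

flush_before_child
```

This should print 'awk: before', 'child: hello', and 'awk: after', in that order.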

If you think about what a programmer expects, it makes sense that
'system()' should flush any pending output. The following program:

     BEGIN {
         print "first print"
         system("echo system echo")
         print "second print"
     }

must print:

     first print
     system echo
     second print

and not:

     system echo
     first print
     second print

If 'awk' did not flush its buffers before calling 'system()', you
could see the latter (undesirable) output, for example when standard
output is a pipe or a file rather than a terminal.

---------- Footnotes ----------

(1) A program is interactive if the standard output is connected to a
terminal device. On modern systems, this means your keyboard and
screen.

(2) In private correspondence, Dr. Kernighan has indicated to me that
the way this was done was probably a mistake.