gawk: I/O Functions

9.1.4 Input/Output Functions
----------------------------

The following functions relate to input/output (I/O).  Optional
parameters are enclosed in square brackets ([ ]):

'close(FILENAME [, HOW])'
     Close the file FILENAME for input or output.  Alternatively, the
     argument may be a shell command that was used for creating a
     coprocess, or for redirecting to or from a pipe; then the coprocess
     or pipe is closed.  See "Close Files And Pipes" for more
     information.
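
     One common reason to call 'close()' is to read the same file
     twice.  After 'close()', the next 'getline' from that file reopens
     it at the beginning.  The following is a minimal sketch that
     assumes a POSIX shell with 'mktemp'.  The temporary file and its
     two lines are illustrative only.

```shell
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
awk -v f="$tmp" 'BEGIN {
    while ((getline line < f) > 0) n1++   # first pass reads both lines
    close(f)                              # close f so the next read restarts
    while ((getline line < f) > 0) n2++   # second pass reopens f at the top
    print n1, n2                          # prints "2 2"
}'
rm -f "$tmp"
```

     Without the 'close(f)' call, the second loop would immediately hit
     end-of-file, 'n2' would stay empty, and the output would be '2 '.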

     When closing a coprocess, it is occasionally useful to first close
     one end of the two-way pipe and then to close the other.  This is
     done by providing a second argument to 'close()'.  This second
     argument (HOW) should be one of the two string values '"to"' or
     '"from"', indicating which end of the pipe to close.  Case in the
     string does not matter.  See "Two-way I/O", which discusses this
     feature in more detail and gives an example.

     Note that the second argument to 'close()' is a 'gawk' extension;
     it is not available in compatibility mode (see "Options").

'fflush([FILENAME])'
     Flush any buffered output associated with FILENAME, which is either
     a file opened for writing or a shell command for redirecting output
     to a pipe or coprocess.

     Many utility programs "buffer" their output (i.e., they keep data
     destined for a disk file or the screen in memory until there is
     enough of it to be worth sending to the output device).  This is
     often more efficient than writing every little bit of information
     as soon as it is ready.  However, sometimes it is necessary to
     force a program to "flush" its buffers (i.e., write the information
     to its destination, even if a buffer is not full).  This is the
     purpose of the 'fflush()' function: 'gawk' also buffers its output,
     and 'fflush()' forces 'gawk' to flush its buffers.

     Brian Kernighan added 'fflush()' to his 'awk' in April 1992.  For
     two decades, it was a common extension.  In December 2012, it was
     accepted for inclusion into the POSIX standard.  See the Austin
     Group website (http://austingroupbugs.net/view.php?id=634).

     POSIX standardizes 'fflush()' as follows: if there is no argument,
     or if the argument is the null string ('""'), then 'awk' flushes
     the buffers for _all_ open output files and pipes.

          NOTE: Prior to version 4.0.2, 'gawk' would flush only the
          standard output if there was no argument, and flush all output
          files and pipes if the argument was the null string.  This was
          changed in order to be compatible with Brian Kernighan's
          'awk', in the hope that standardizing this feature in POSIX
          would then be easier (which indeed proved to be the case).

          With 'gawk', you can use 'fflush("/dev/stdout")' if you wish
          to flush only the standard output.

     'fflush()' returns zero if the buffer is successfully flushed;
     otherwise, it returns a nonzero value.  ('gawk' returns -1.)  When
     all buffers are flushed, the return value is zero only if every
     buffer was flushed successfully.  Otherwise, it is -1, and 'gawk'
     warns about the file that caused the problem.

     'gawk' also issues a warning message if you attempt to flush a file
     or pipe that was opened for reading (such as with 'getline'), or if
     FILENAME is not an open file, pipe, or coprocess.  In such a case,
     'fflush()' returns -1, as well.

              Interactive Versus Noninteractive Buffering

   As a side point, buffering issues can be even more confusing if your
program is "interactive" (i.e., communicating with a user sitting at a
keyboard).(1)

   Interactive programs generally "line buffer" their output (i.e., they
write out each line as soon as it is complete).  Noninteractive programs
wait until they have a full buffer, which may be many lines of output.
Here is an example of the difference:

     $ awk '{ print $1 + $2 }'
     1 1
     -| 2
     2 3
     -| 5
     Ctrl-d

Each line of output is printed immediately.  Compare that behavior with
this example:

     $ awk '{ print $1 + $2 }' | cat
     1 1
     2 3
     Ctrl-d
     -| 2
     -| 5

Here, no output is printed until after the 'Ctrl-d' is typed, because it
is all buffered and sent down the pipe to 'cat' in one shot.

'system(COMMAND)'
     Execute the operating system command COMMAND and then return to the
     'awk' program.  Return COMMAND's exit status (see further on).

     For example, if the following fragment of code is put in your 'awk'
     program:

          END {
               system("date | mail -s 'awk run done' root")
          }

     the system administrator is sent mail when the 'awk' program
     finishes processing input and begins its end-of-input processing.
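
     Because 'system()' returns the command's exit status, it can also
     serve as a test.  The following minimal sketch assumes a POSIX
     'test' command.  The directory '/no/such/dir' is a hypothetical
     path that should not exist.

```shell
awk 'BEGIN {
    if (system("test -d /") == 0)             # exit status 0 means success
        print "/ is a directory"
    if (system("test -d /no/such/dir") != 0)  # nonzero means failure
        print "/no/such/dir is missing"
}'
```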

     Note that redirecting 'print' or 'printf' into a pipe is often
     enough to accomplish your task.  If you need to run many commands,
     it is more efficient to print them down a pipeline to a single
     shell, which then runs them all, than to start a new shell with
     each call to 'system()':

          while (MORE STUFF TO DO)
              print COMMAND | "/bin/sh"
          close("/bin/sh")
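
     Here is a runnable version of that pattern, given as a sketch.  The
     list of commands built with 'split()' is purely illustrative.

```shell
awk 'BEGIN {
    n = split("one two three", words)        # illustrative work items
    for (i = 1; i <= n; i++)
        print "echo " words[i] | "/bin/sh"   # one shell runs every command
    close("/bin/sh")                         # wait for the shell to finish
}'
```

     A single '/bin/sh' process reads and runs all three 'echo'
     commands, so only one shell is started in total.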

     However, if your 'awk' program is interactive, 'system()' is useful
     for running large self-contained programs, such as a shell or an
     editor.  Some operating systems cannot implement the 'system()'
     function.  'system()' causes a fatal error if it is not supported.

          NOTE: When '--sandbox' is specified, the 'system()' function
          is disabled (see "Options").
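
     The following quick check confirms this.  It assumes 'gawk',
     because the '--sandbox' option is specific to 'gawk'.  The sketch
     tests only the exit status and not the exact error message, which
     may differ between versions.

```shell
if gawk --sandbox 'BEGIN { system("true") }' 2>/dev/null; then
    echo "system() ran"
else
    echo "system() was rejected"    # expected under --sandbox
fi
```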

     On POSIX systems, a command's exit status is a 16-bit number.  The
     exit value passed to the C 'exit()' function is held in the
     high-order eight bits.  In the low-order eight bits, bits 0-6 hold
     the number of the signal that killed the process (zero if it was
     not killed by a signal), and bit 7 is set if the process also
     dumped core.
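
     You can decode this layout with plain arithmetic.  In the following
     sketch, the function name 'decode' is hypothetical.  The three
     status values are made-up samples and do not come from real
     commands.

```shell
awk 'function decode(s,   sig) {
    sig = s % 128                       # bits 0-6: signal number, 0 if none
    if (sig == 0)
        return "exit " int(s / 256)     # bits 8-15: the exit() value
    return "signal " sig (int(s / 128) % 2 ? " (core dumped)" : "")  # bit 7
}
BEGIN {
    print decode(768)    # 3 * 256: normal exit with status 3
    print decode(15)     # killed by signal 15, no core dump
    print decode(134)    # 128 + 6: signal 6, with a core dump
}'
```

     This prints 'exit 3', 'signal 15', and 'signal 6 (core dumped)'.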

     Traditionally, 'awk''s 'system()' function has simply returned the
     exit status value divided by 256.  In the normal case this gives
     the exit status, but in the case of death-by-signal it yields a
     fractional floating-point value.(2)  POSIX states that 'awk''s
     'system()' should return the full 16-bit value.

     'gawk' steers a middle ground.  The return values are summarized in
     Table 9.5.

     Situation                     Return value from 'system()'
     --------------------------------------------------------------------------
     '--traditional'               C 'system()''s value divided by 256
     '--posix'                     C 'system()''s value
     Normal exit of command        Command's exit status
     Death by signal of command    256 + number of murderous signal
     Death by signal of command    512 + number of murderous signal
       with core dump
     Some kind of error            -1

     Table 9.5: Return values from 'system()'

             Controlling Output Buffering with 'system()'

   The 'fflush()' function provides explicit control over output
buffering for individual files and pipes.  However, its use is not
portable to many older 'awk' implementations.  An alternative method to
flush output buffers is to call 'system()' with a null string as its
argument:

     system("")   # flush output

'gawk' treats this use of the 'system()' function as a special case and
is smart enough not to run a shell (or other command interpreter) with
the empty command.  Therefore, with 'gawk', this idiom is not only
useful, it is also efficient.  Although this method should work with
other 'awk' implementations, it does not necessarily avoid starting an
unnecessary shell.  (Other implementations may only flush the buffer
associated with the standard output and not necessarily all buffered
output.)
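
   The following minimal sketch shows the portable form.  It flushes
after every output line, so each sum reaches 'cat' right away and not
only when the input ends, as happened in the 'Ctrl-d' example above.
Here 'printf' stands in for typed input.  The program assumes only that
the local 'awk' flushes its output when 'system("")' is called:

```shell
# system("") flushes pending output after each record
printf '1 1\n2 3\n' | awk '{ print $1 + $2; system("") }' | cat
```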

   If you think about what a programmer expects, it makes sense that
'system()' should flush any pending output.  The following program:

     BEGIN {
          print "first print"
          system("echo system echo")
          print "second print"
     }

must print:

     first print
     system echo
     second print

and not:

     system echo
     first print
     second print

   If 'awk' did not flush its buffers before calling 'system()', you
would see the latter (undesirable) output whenever the standard output
is fully buffered, for example when it is redirected to a file or a
pipe.

   ---------- Footnotes ----------

   (1) A program is interactive if its standard output is connected to a
terminal device.  On modern systems, this means your keyboard and
screen.

   (2) In private correspondence, Dr. Kernighan has indicated to me that
the way this was done was probably a mistake.
1