gawk: Close Files And Pipes

5.9 Closing Input and Output Redirections
=========================================

If the same file name or the same shell command is used with 'getline'
more than once during the execution of an 'awk' program (⇒ Getline),
the file is opened (or the command is executed) the first time only.
At that time, the first record of input is read from that file or
command.  The next time the same file or command is used with
'getline', another record is read from it, and so on.

   Similarly, when a file or pipe is opened for output, 'awk' remembers
the file name or command associated with it, and subsequent writes to
the same file or command are appended to the previous writes.  The file
or pipe stays open until 'awk' exits.

   This implies that special steps are necessary in order to read the
same file again from the beginning, or to rerun a shell command (rather
than reading more output from the same command).  The 'close()' function
makes these things possible:

     close(FILENAME)

or:

     close(COMMAND)

   The argument FILENAME or COMMAND can be any expression.  Its value
must _exactly_ match the string that was used to open the file or start
the command (spaces and other "irrelevant" characters included).  For
example, if you open a pipe with this:

     "sort -r names" | getline foo

then you must close it with this:

     close("sort -r names")

   Once this function call is executed, the next 'getline' from that
file or command, or the next 'print' or 'printf' to that file or
command, reopens the file or reruns the command.  Because the expression
that you use to close a file or pipeline must exactly match the
expression used to open the file or run the command, it is good practice
to use a variable to store the file name or command.  The previous
example becomes the following:

     sortcom = "sort -r names"
     sortcom | getline foo
     ...
     close(sortcom)

This helps avoid hard-to-find typographical errors in your 'awk'
programs.  Here are some of the reasons for closing an output file:

   * To write a file and read it back later on in the same 'awk'
     program.  Close the file after writing it, then begin reading it
     with 'getline'.

   * To write numerous files, successively, in the same 'awk' program.
     If the files aren't closed, eventually 'awk' may exceed a system
     limit on the number of open files in one process.  It is best to
     close each one when the program has finished writing it.

   * To make a command finish.  When output is redirected through a
     pipe, the command reading the pipe normally continues to try to
     read input as long as the pipe is open.  Often this means the
     command cannot really do its work until the pipe is closed.  For
     example, if output is redirected to the 'mail' program, the message
     is not actually sent until the pipe is closed.

   * To run the same program a second time, with the same arguments.
     This is not the same thing as giving more input to the first run!

     For example, suppose a program pipes output to the 'mail' program.
     If it outputs several lines redirected to this pipe without closing
     it, they make a single message of several lines.  By contrast, if
     the program closes the pipe after each line of output, then each
     line makes a separate message.

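   The first reason in the list above (writing a file and reading it
back in the same program) can be sketched as follows.  This is only an
illustration under stated assumptions: the scratch file name held in
'tmpfile' and the variable names 'tmp', 'line', and 'n' are hypothetical
and do not come from the manual.

```shell
# Hypothetical scratch file; any writable path would do.
tmpfile="${TMPDIR:-/tmp}/close_demo.$$"

out=$(awk -v tmp="$tmpfile" 'BEGIN {
    # Write three records to the file named by tmp.
    for (i = 1; i <= 3; i++)
        print "record " i > tmp

    # Close the file so that buffered output is flushed and the next
    # getline starts reading from the beginning of the file.
    close(tmp)

    # Read the records back and count them.
    n = 0
    while ((getline line < tmp) > 0)
        n++
    close(tmp)

    print "read back " n " records"
}')
rm -f "$tmpfile"
printf '%s\n' "$out"
```

   Storing the file name in a variable, as recommended earlier, keeps
the string passed to 'close()' identical to the one used in the
redirections.
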
   If you use more files than the system allows you to have open, 'gawk'
attempts to multiplex the available open files among your data files.
'gawk''s ability to do this depends upon the facilities of your
operating system, so it may not always work.  It is therefore both good
practice and good portability advice to always use 'close()' on your
files when you are done with them.  In fact, if you are using a lot of
pipes, it is essential that you close commands when done.  For example,
consider something like this:

     {
         ...
         command = ("grep " $1 " /some/file | my_prog -q " $3)
         while ((command | getline) > 0) {
             PROCESS OUTPUT OF command
         }
         # need close(command) here
     }

   This example creates a new pipeline based on data in _each_ record.
Without the call to 'close()' indicated in the comment, 'awk' creates
child processes to run the commands, until it eventually runs out of
file descriptors for more pipelines.

   Even though each command has finished (as indicated by the
end-of-file return status from 'getline'), the child process is not
terminated;(1) more importantly, the file descriptor for the pipe is not
closed and released until 'close()' is called or 'awk' exits.

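   A small runnable variant of the per-record pipeline, with the
'close()' call in place, might look like the following.  The command
built here ('echo' piped to 'tr') is a hypothetical stand-in for the
'grep' and 'my_prog' pipeline, and the variable 'result' is likewise an
assumption made for this sketch.  Because the record text is pasted into
a shell command, the sketch is suitable only for trusted input.

```shell
out=$(printf 'alpha\nbeta\nalpha\n' | awk '{
    # Build a new pipeline from the data in each record.
    command = "echo " $1 " | tr a-z A-Z"

    # Read everything the pipeline produces.
    while ((command | getline result) > 0)
        print result

    # Release the pipe.  This also lets the repeated "alpha" record
    # rerun the command instead of hitting end-of-file immediately.
    close(command)
}')
printf '%s\n' "$out"
```

   The input deliberately repeats a record: without the 'close()' call,
the second 'alpha' would find the command already at end-of-file and
produce no output.
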
   'close()' silently does nothing if given an argument that does not
represent a file, pipe, or coprocess that was opened with a redirection.
In such a case, it returns a negative value, indicating an error.  In
addition, 'gawk' sets 'ERRNO' to a string indicating the error.

   Note also that 'close(FILENAME)' has no "magic" effects on the
implicit loop that reads through the files named on the command line.
Such a call is, most likely, a close of a file that was never opened
with a redirection, so 'awk' silently does nothing, except return a
negative value.

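   The behavior for a name that was never opened can be checked with a
sketch like the following.  The string "no-such-redirection" and the
variable 'ret' are hypothetical; the sketch tests only for a negative
return value, as described above, and does not print 'ERRNO' because
that variable is specific to 'gawk'.

```shell
out=$(awk 'BEGIN {
    # "no-such-redirection" is a hypothetical name that this program
    # never opened with a redirection.
    ret = close("no-such-redirection")

    if (ret < 0)
        print "close() reported an error"
    else
        print "close() returned " ret
}')
printf '%s\n' "$out"
```
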
   When using the '|&' operator to communicate with a coprocess, it is
occasionally useful to be able to close one end of the two-way pipe
without closing the other.  This is done by supplying a second argument
to 'close()'.  As in any other call to 'close()', the first argument is
the name of the command or special file used to start the coprocess.
The second argument should be a string, with either of the values '"to"'
or '"from"'.  Case does not matter.  As this is an advanced feature,
discussion is delayed until ⇒Two-way I/O, which describes it in
more detail and gives an example.

                    Using 'close()''s Return Value

   In many older versions of Unix 'awk', the 'close()' function is
actually a statement.  (d.c.)  It is a syntax error to try to use the
return value from 'close()':

     command = "..."
     command | getline info
     retval = close(command)  # syntax error in many Unix awks

   'gawk' treats 'close()' as a function.  The return value is -1 if the
argument names something that was never opened with a redirection, or if
there is a system problem closing the file or process.  In these cases,
'gawk' sets the predefined variable 'ERRNO' to a string describing the
problem.

   In 'gawk', starting with version 4.2, when closing a pipe or
coprocess (input or output), the return value is the exit status of the
command, as described in ⇒Table 5.1.(2)  Otherwise, it is the return
value from the system's 'close()' or 'fclose()' C functions when closing
input or output files, respectively.  This value is zero if the close
succeeds, or -1 if it fails.

Situation                            Return value from 'close()'
--------------------------------------------------------------------------
Normal exit of command               Command's exit status
Death by signal of command           256 + number of murderous signal
Death by signal of command with      512 + number of murderous signal
core dump
Some kind of error                   -1

Table 5.1: Return values from 'close()' of a pipe

   The POSIX standard is very vague; it says that 'close()' returns zero
on success and a nonzero value otherwise.  In general, different
implementations vary in what they report when closing pipes; thus, the
return value cannot be used portably.  (d.c.)  In POSIX mode (⇒
Options), 'gawk' just returns zero when closing a pipe.

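   As a hedged illustration of the first row of Table 5.1, the following
sketch closes a pipe whose command exits with status 3.  The command
string "exit 3" and the variable names 'line' and 'status' are
assumptions made for this example.  With 'gawk' 4.2 or later (not in
POSIX mode) the value reported should be the command's exit status;
other 'awk' implementations may report something different, as noted
above.

```shell
out=$(awk 'BEGIN {
    # Hypothetical command that produces no output and exits with 3.
    command = "exit 3"

    # Read until end-of-file so that the command has finished.
    while ((command | getline line) > 0)
        ;

    # Capture the value that close() returns for the pipe.
    status = close(command)
    print "close() returned " status
}')
printf '%s\n' "$out"
```
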
   ---------- Footnotes ----------

   (1) The technical terminology is rather morbid.  The finished child
is called a "zombie," and cleaning up after it is referred to as
"reaping."

   (2) Prior to version 4.2, the return value from closing a pipe or
coprocess was the full 16-bit exit value as defined by the 'wait()'
system call.