gawk: Close Files And Pipes
1
1 5.9 Closing Input and Output Redirections
1 =========================================
1
1 If the same file name or the same shell command is used with 'getline'
more than once during the execution of an 'awk' program (⇒Getline),
the file is opened (or the command is executed) the first
1 time only. At that time, the first record of input is read from that
1 file or command. The next time the same file or command is used with
1 'getline', another record is read from it, and so on.
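
As a minimal sketch of this behavior, the following shell snippet runs
an 'awk' program that reads twice from the same command. The command
string 'echo one; echo two' and the variable names are illustrative
assumptions, not taken from the text above. The command is started
once, and each 'getline' returns its next record:

```shell
# Sketch: two getline calls on the same command string share one pipe.
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"   # hypothetical command, started only once
    cmd | getline first          # starts the command, reads record 1
    cmd | getline second         # same pipe: reads record 2
    print first, second
}')
printf '%s\n' "$out"
```

If the command were restarted for each 'getline', both variables would
hold the first record instead.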
1
1 Similarly, when a file or pipe is opened for output, 'awk' remembers
1 the file name or command associated with it, and subsequent writes to
1 the same file or command are appended to the previous writes. The file
1 or pipe stays open until 'awk' exits.
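
The following sketch illustrates the output side. It assumes that the
'mktemp' utility is available for creating a scratch file; the file and
variable names are hypothetical. Both 'print' statements use the same
file name, so the second write lands after the first instead of
replacing it:

```shell
# Sketch: repeated "print > f" writes to the one file awk already has open.
tmp=$(mktemp)                    # assumes mktemp exists on this system
awk -v f="$tmp" 'BEGIN {
    print "first"  > f           # opens the file, truncating it once
    print "second" > f           # same open file: written after "first"
}'
out=$(cat "$tmp")                # read the file after awk has exited
rm -f "$tmp"
printf '%s\n' "$out"
```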
1
1 This implies that special steps are necessary in order to read the
1 same file again from the beginning, or to rerun a shell command (rather
1 than reading more output from the same command). The 'close()' function
1 makes these things possible:
1
1 close(FILENAME)
1
1 or:
1
1 close(COMMAND)
1
1 The argument FILENAME or COMMAND can be any expression. Its value
1 must _exactly_ match the string that was used to open the file or start
1 the command (spaces and other "irrelevant" characters included). For
1 example, if you open a pipe with this:
1
1 "sort -r names" | getline foo
1
1 then you must close it with this:
1
1 close("sort -r names")
1
1 Once this function call is executed, the next 'getline' from that
1 file or command, or the next 'print' or 'printf' to that file or
1 command, reopens the file or reruns the command. Because the expression
1 that you use to close a file or pipeline must exactly match the
1 expression used to open the file or run the command, it is good practice
1 to use a variable to store the file name or command. The previous
1 example becomes the following:
1
1 sortcom = "sort -r names"
1 sortcom | getline foo
1 ...
1 close(sortcom)
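
A runnable sketch of the same pattern follows. The command string is a
hypothetical stand-in for 'sort -r names', chosen so that the snippet
needs no input file. Because the pipe is closed between the two reads,
the command is rerun and the second 'getline' sees the first record
again:

```shell
# Sketch: close() makes the next getline rerun the command from the start.
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"   # hypothetical command held in a variable
    cmd | getline before         # first run: reads "one"
    close(cmd)                   # same string that opened the pipe
    cmd | getline after          # command is rerun: reads "one" again
    close(cmd)
    print before, after
}')
printf '%s\n' "$out"
```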
1
Storing the file name or command in a variable helps avoid hard-to-find
typographical errors in your 'awk' programs.

Here are some of the reasons for closing an output file:
1
1 * To write a file and read it back later on in the same 'awk'
1 program. Close the file after writing it, then begin reading it
1 with 'getline'.
1
1 * To write numerous files, successively, in the same 'awk' program.
1 If the files aren't closed, eventually 'awk' may exceed a system
1 limit on the number of open files in one process. It is best to
1 close each one when the program has finished writing it.
1
1 * To make a command finish. When output is redirected through a
1 pipe, the command reading the pipe normally continues to try to
1 read input as long as the pipe is open. Often this means the
1 command cannot really do its work until the pipe is closed. For
1 example, if output is redirected to the 'mail' program, the message
1 is not actually sent until the pipe is closed.
1
1 * To run the same program a second time, with the same arguments.
1 This is not the same thing as giving more input to the first run!
1
1 For example, suppose a program pipes output to the 'mail' program.
1 If it outputs several lines redirected to this pipe without closing
1 it, they make a single message of several lines. By contrast, if
1 the program closes the pipe after each line of output, then each
1 line makes a separate message.
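
The difference between one run and several runs can be sketched without
sending any mail. The example below assumes 'grep -c .' as a harmless
stand-in for 'mail': it prints the number of nonempty lines that it
received. The variable name is hypothetical:

```shell
# Sketch: closing an output pipe ends one run of the command;
# the next write to the same command string starts a new run.
out=$(awk 'BEGIN {
    counter = "grep -c ."        # hypothetical stand-in for mail
    print "line 1" | counter
    print "line 2" | counter
    close(counter)               # one run received both lines

    print "line 1" | counter
    close(counter)               # this run received a single line
    print "line 2" | counter
    close(counter)               # a separate run, again a single line
}')
printf '%s\n' "$out"
```

The first run reports two lines; each of the later runs reports one,
just as closing the pipe after every line would produce separate
messages.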
1
1 If you use more files than the system allows you to have open, 'gawk'
1 attempts to multiplex the available open files among your data files.
1 'gawk''s ability to do this depends upon the facilities of your
1 operating system, so it may not always work. It is therefore both good
1 practice and good portability advice to always use 'close()' on your
1 files when you are done with them. In fact, if you are using a lot of
1 pipes, it is essential that you close commands when done. For example,
1 consider something like this:
1
{
    ...
    command = ("grep " $1 " /some/file | my_prog -q " $3)
    while ((command | getline) > 0) {
        PROCESS OUTPUT OF command
    }
    # need close(command) here
}
1
1 This example creates a new pipeline based on data in _each_ record.
1 Without the call to 'close()' indicated in the comment, 'awk' creates
1 child processes to run the commands, until it eventually runs out of
1 file descriptors for more pipelines.
1
1 Even though each command has finished (as indicated by the
1 end-of-file return status from 'getline'), the child process is not
1 terminated;(1) more importantly, the file descriptor for the pipe is not
1 closed and released until 'close()' is called or 'awk' exits.
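
A self-contained sketch of the corrected loop follows. It replaces the
'grep' pipeline with a hypothetical 'echo' command so that it runs
anywhere, and it feeds three records to 'awk' with 'printf'. The call
to 'close()' releases each pipe before the next record builds a new
command:

```shell
# Sketch: one command per input record, closed as soon as it is drained.
out=$(printf 'a\nb\nc\n' | awk '{
    command = "echo item-" $1                # hypothetical per-record command
    while ((command | getline line) > 0)
        result = result (result == "" ? "" : " ") line
    close(command)                           # release the pipe and reap the child
}
END { print result }')
printf '%s\n' "$out"
```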
1
1 'close()' silently does nothing if given an argument that does not
1 represent a file, pipe, or coprocess that was opened with a redirection.
1 In such a case, it returns a negative value, indicating an error. In
1 addition, 'gawk' sets 'ERRNO' to a string indicating the error.
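
The following sketch shows this case. It requires 'gawk', because
'ERRNO' is a 'gawk' extension; the guard around the program and the
redirection name are assumptions made for illustration:

```shell
# Sketch: close() on a name that was never opened with a redirection.
if command -v gawk >/dev/null 2>&1; then
    out=$(gawk 'BEGIN {
        r = close("no-such-redirection")   # hypothetical name, never opened
        print r, (ERRNO != "")             # 1 means ERRNO holds a message
    }')
else
    out="gawk not found"
fi
printf '%s\n' "$out"
```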
1
1 Note also that 'close(FILENAME)' has no "magic" effects on the
1 implicit loop that reads through the files named on the command line.
1 It is, more likely, a close of a file that was never opened with a
1 redirection, so 'awk' silently does nothing, except return a negative
1 value.
1
1 When using the '|&' operator to communicate with a coprocess, it is
1 occasionally useful to be able to close one end of the two-way pipe
1 without closing the other. This is done by supplying a second argument
1 to 'close()'. As in any other call to 'close()', the first argument is
1 the name of the command or special file used to start the coprocess.
1 The second argument should be a string, with either of the values '"to"'
1 or '"from"'. Case does not matter. As this is an advanced feature,
1 discussion is delayed until ⇒Two-way I/O, which describes it in
1 more detail and gives an example.
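
As a brief sketch only, and assuming 'gawk' is installed, the following
uses 'sort' as the coprocess. 'sort' cannot produce output until it
sees end-of-file on its input, so the program closes the '"to"' end
before reading the results. The variable names are hypothetical:

```shell
# Sketch: close only the writing end of a two-way pipe (gawk only).
if command -v gawk >/dev/null 2>&1; then
    out=$(gawk 'BEGIN {
        sorter = "sort"
        print "pear"  |& sorter
        print "apple" |& sorter
        close(sorter, "to")              # end of input for sort
        while ((sorter |& getline line) > 0)
            result = result (result == "" ? "" : " ") line
        close(sorter)                    # now close the reading end too
        print result
    }')
else
    out="gawk not found"
fi
printf '%s\n' "$out"
```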
1
1 Using 'close()''s Return Value
1
In many older versions of Unix 'awk', the 'close()' function is
actually a statement. (d.c.) In those versions, it is a syntax error to
try to use the return value from 'close()':
1
1 command = "..."
1 command | getline info
1 retval = close(command) # syntax error in many Unix awks
1
1 'gawk' treats 'close()' as a function. The return value is -1 if the
1 argument names something that was never opened with a redirection, or if
1 there is a system problem closing the file or process. In these cases,
1 'gawk' sets the predefined variable 'ERRNO' to a string describing the
1 problem.
1
1 In 'gawk', starting with version 4.2, when closing a pipe or
1 coprocess (input or output), the return value is the exit status of the
command, as described in Table 5.1.(2) Otherwise, it is the return value
1 from the system's 'close()' or 'fclose()' C functions when closing input
1 or output files, respectively. This value is zero if the close
1 succeeds, or -1 if it fails.
1
Situation                              Return value from 'close()'
--------------------------------------------------------------------------
Normal exit of command                 Command's exit status
Death by signal of command             256 + number of murderous signal
Death by signal of command with        512 + number of murderous signal
core dump
Some kind of error                     -1
1
1 Table 5.1: Return values from 'close()' of a pipe
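
The first two rows of the table can be checked with a short sketch. It
assumes 'gawk' version 4.2 or later, running without POSIX mode; the
two command strings are hypothetical. The second command makes the
shell that runs it kill itself with signal 9:

```shell
# Sketch: close() return values for a pipe, per Table 5.1 (gawk 4.2+).
if command -v gawk >/dev/null 2>&1; then
    out=$(gawk 'BEGIN {
        cmd = "exit 3"
        cmd | getline            # no output: getline just hits end-of-file
        normal = close(cmd)      # normal exit: the exit status

        cmd = "kill -9 $$"       # the shell running the command kills itself
        cmd | getline
        killed = close(cmd)      # death by signal 9: 256 + 9
        print normal, killed
    }')
else
    out="gawk not found"
fi
printf '%s\n' "$out"
```

Other 'awk' implementations, and 'gawk' in POSIX mode, may report
different values for the same commands.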
1
1 The POSIX standard is very vague; it says that 'close()' returns zero
1 on success and a nonzero value otherwise. In general, different
1 implementations vary in what they report when closing pipes; thus, the
return value cannot be used portably. (d.c.) In POSIX mode (⇒Options),
'gawk' just returns zero when closing a pipe.
1
1 ---------- Footnotes ----------
1
1 (1) The technical terminology is rather morbid. The finished child
1 is called a "zombie," and cleaning up after it is referred to as
1 "reaping."
1
(2) Prior to version 4.2, the return value from closing a pipe or
coprocess was the full 16-bit exit value as defined by the 'wait()'
system call.
1