gawk: Filetrans Function

10.3.1 Noting Data File Boundaries
----------------------------------

The 'BEGIN' and 'END' rules are each executed exactly once, at the
beginning and end of your 'awk' program, respectively (⇒BEGIN/END).
We (the 'gawk' authors) once had a user who mistakenly thought that the
'BEGIN' rules were executed at the beginning of each data file and the
'END' rules were executed at the end of each data file.

   When informed that this was not the case, the user requested that we
add new special patterns to 'gawk', named 'BEGIN_FILE' and 'END_FILE',
that would have the desired behavior.  He even supplied us with the code
to do so.

   Adding these special patterns to 'gawk' wasn't necessary; the job can
be done cleanly in 'awk' itself, as illustrated by the following library
program.  It arranges to call two user-supplied functions, 'beginfile()'
and 'endfile()', at the beginning and end of each data file.  Besides
solving the problem in only nine(!) lines of code, it does so
_portably_; this works with any implementation of 'awk':

     # transfile.awk
     #
     # Give the user a hook for filename transitions
     #
     # The user must supply functions beginfile() and endfile()
     # that each take the name of the file being started or
     # finished, respectively.

     FILENAME != _oldfilename {
         if (_oldfilename != "")
             endfile(_oldfilename)
         _oldfilename = FILENAME
         beginfile(FILENAME)
     }

     END { endfile(FILENAME) }

   This file must be loaded before the user's "main" program, so that
the rule it supplies is executed first.
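
   As a minimal sketch of this arrangement, the following shell session
saves the library rule as 'transfile.awk', writes a small main program,
and names the library first on the command line.  The main program
'main.awk', the data files 'a.txt' and 'b.txt', and the bodies given to
'beginfile()' and 'endfile()' are assumptions made up for illustration;
they are not part of the library:

```shell
# Work in a scratch directory so that no existing file is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# The library rule shown above.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END { endfile(FILENAME) }
EOF

# Hypothetical main program: report each transition and count records.
cat > main.awk <<'EOF'
function beginfile(name) { count = 0; print "begin " name }
function endfile(name)   { print "end " name ", records: " count }
{ count++ }
EOF

printf 'one\ntwo\n' > a.txt
printf 'three\n'    > b.txt

# The library is named first, so its rule runs before the main rule.
out=$(awk -f transfile.awk -f main.awk a.txt b.txt)
printf '%s\n' "$out"
```

   With these two data files, the run should report 'begin a.txt',
'end a.txt, records: 2', 'begin b.txt', and 'end b.txt, records: 1', in
that order.  If 'main.awk' were named first, its counting rule would run
before 'beginfile()' reset the counter, and the first record of each
file would be lost from the count.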

   This rule relies on 'awk''s 'FILENAME' variable, which automatically
changes for each new data file.  The current file name is saved in a
private variable, '_oldfilename'.  If 'FILENAME' does not equal
'_oldfilename', then a new data file is being processed and it is
necessary to call 'endfile()' for the old file.  Because 'endfile()'
should only be called if a file has been processed, the program first
checks to make sure that '_oldfilename' is not the null string.  The
program then assigns the current file name to '_oldfilename' and calls
'beginfile()' for the file.  Because, like all 'awk' variables,
'_oldfilename' is initialized to the null string, this rule executes
correctly even for the first data file.

   The program also supplies an 'END' rule to do the final processing
for the last file.  Because this 'END' rule comes before any 'END' rules
supplied in the "main" program, 'endfile()' is called first.  Once
again, the value of multiple 'BEGIN' and 'END' rules should be clear.

   If the same data file occurs twice in a row on the command line, then
'FILENAME' does not change between the two passes, so 'endfile()' and
'beginfile()' are not executed at the end of the first pass and at the
beginning of the second pass.  The following version solves the problem
by testing 'FNR', which equals one for the first record of each data
file:

     # ftrans.awk --- handle datafile transitions
     #
     # user supplies beginfile() and endfile() functions

     FNR == 1 {
         if (_filename_ != "")
             endfile(_filename_)
         _filename_ = FILENAME
         beginfile(FILENAME)
     }

     END { endfile(_filename_) }
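
   The difference between the two versions can be seen by running each
over one data file named twice in a row.  The sketch below does so; the
reporting program 'report.awk' and the data file 'data.txt' are
hypothetical names chosen for illustration:

```shell
# Work in a scratch directory so that no existing file is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# First version: detects a transition when FILENAME changes.
cat > transfile.awk <<'EOF'
FILENAME != _oldfilename {
    if (_oldfilename != "")
        endfile(_oldfilename)
    _oldfilename = FILENAME
    beginfile(FILENAME)
}

END { endfile(FILENAME) }
EOF

# Second version: detects a transition at the first record of a file.
cat > ftrans.awk <<'EOF'
FNR == 1 {
    if (_filename_ != "")
        endfile(_filename_)
    _filename_ = FILENAME
    beginfile(FILENAME)
}

END { endfile(_filename_) }
EOF

# Hypothetical main program that only reports the transitions.
cat > report.awk <<'EOF'
function beginfile(name) { print "begin " name }
function endfile(name)   { print "end " name }
EOF

printf 'x\ny\n' > data.txt

# Name the same data file twice in a row with each version.
old=$(awk -f transfile.awk -f report.awk data.txt data.txt)
new=$(awk -f ftrans.awk    -f report.awk data.txt data.txt)

printf 'transfile.awk:\n%s\n\nftrans.awk:\n%s\n' "$old" "$new"
```

   Here 'transfile.awk' should report a single 'begin'/'end' pair,
because 'FILENAME' never changes, while 'ftrans.awk' should report two
pairs, one for each pass over 'data.txt'.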

   ⇒Wc Program shows how this library function can be used and how it
simplifies writing the main program.

          So Why Does 'gawk' Have 'BEGINFILE' and 'ENDFILE'?

   You are probably wondering, if 'beginfile()' and 'endfile()'
functions can do the job, why does 'gawk' have 'BEGINFILE' and 'ENDFILE'
patterns?

   Good question.  Normally, if 'awk' cannot open a file, this causes an
immediate fatal error.  In this case, there is no way for a user-defined
function to deal with the problem, as the mechanism for calling it
relies on the file being open and at the first record.  Thus, the main
reason for 'BEGINFILE' is to give you a "hook" to catch files that
cannot be processed.  'ENDFILE' exists for symmetry, and because it
provides an easy way to do per-file cleanup processing.  For more
information, refer to ⇒BEGINFILE/ENDFILE.
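
   As a rough sketch of such a hook, the following 'gawk'-only program
skips any file that cannot be opened instead of stopping with a fatal
error.  It assumes a version of 'gawk' that provides 'BEGINFILE', and it
relies on 'ERRNO' being non-empty inside 'BEGINFILE' when the open
failed.  The names 'skipbad.awk', 'good.txt', and 'missing.txt' are
hypothetical:

```shell
# Work in a scratch directory so that no existing file is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

cat > skipbad.awk <<'EOF'
# BEGINFILE runs before the first record of each data file is read.
# A non-empty ERRNO at that point means the file could not be opened.
BEGINFILE {
    if (ERRNO != "") {
        print "skipping " FILENAME ": " ERRNO > "/dev/stderr"
        nextfile    # go on to the next file instead of dying
    }
}

{ print FILENAME ": " $0 }
EOF

printf 'hello\n' > good.txt

# missing.txt is deliberately not created; the warning goes to stderr.
out=$(gawk -f skipbad.awk missing.txt good.txt)
printf '%s\n' "$out"
```

   A portable 'beginfile()' function cannot do this, because the rule
that calls it is never reached for a file that failed to open.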