gawk: Auto-set

7.5.2 Built-in Variables That Convey Information
------------------------------------------------

The following is an alphabetical list of variables that 'awk' sets
automatically on certain occasions in order to provide information to
your program.

   The variables that are specific to 'gawk' are marked with a pound
sign ('#').  These variables are 'gawk' extensions.  In other 'awk'
implementations or if 'gawk' is in compatibility mode (⇒Options),
they are not special:

'ARGC', 'ARGV'
     The command-line arguments available to 'awk' programs are stored
     in an array called 'ARGV'.  'ARGC' is the number of command-line
     arguments present.  ⇒Other Arguments.  Unlike most 'awk'
     arrays, 'ARGV' is indexed from 0 to 'ARGC' - 1.  In the following
     example:

          $ awk 'BEGIN {
          >         for (i = 0; i < ARGC; i++)
          >             print ARGV[i]
          >      }' inventory-shipped mail-list
          -| awk
          -| inventory-shipped
          -| mail-list

     'ARGV[0]' contains 'awk', 'ARGV[1]' contains 'inventory-shipped',
     and 'ARGV[2]' contains 'mail-list'.  The value of 'ARGC' is three,
     one more than the index of the last element in 'ARGV', because the
     elements are numbered from zero.

     The names 'ARGC' and 'ARGV', as well as the convention of indexing
     the array from 0 to 'ARGC' - 1, are derived from the C language's
     method of accessing command-line arguments.

     The value of 'ARGV[0]' can vary from system to system.  Also, you
     should note that the program text is _not_ included in 'ARGV', nor
     are any of 'awk''s command-line options.  ⇒ARGC and ARGV for
     information about how 'awk' uses these variables.  (d.c.)

'ARGIND #'
     The index in 'ARGV' of the current file being processed.  Every
     time 'gawk' opens a new data file for processing, it sets 'ARGIND'
     to the index in 'ARGV' of the file name.  When 'gawk' is processing
     the input files, 'FILENAME == ARGV[ARGIND]' is always true.

     This variable is useful in file processing; it allows you to tell
     how far along you are in the list of data files as well as to
     distinguish between successive instances of the same file name on
     the command line.

     While you can change the value of 'ARGIND' within your 'awk'
     program, 'gawk' automatically sets it to a new value when it opens
     the next file.

'ENVIRON'
     An associative array containing the values of the environment.  The
     array indices are the environment variable names; the elements are
     the values of the particular environment variables.  For example,
     'ENVIRON["HOME"]' might be '/home/arnold'.

     For POSIX 'awk', changing this array does not affect the
     environment passed on to any programs that 'awk' may spawn via
     redirection or the 'system()' function.

     However, beginning with version 4.2, if not in POSIX compatibility
     mode, 'gawk' does update its own environment when 'ENVIRON' is
     changed, thus changing the environment seen by programs that it
     creates.  You should therefore be especially careful if you modify
     'ENVIRON["PATH"]', which is the search path for finding executable
     programs.

     This can also affect the running 'gawk' program, since some of the
     built-in functions may pay attention to certain environment
     variables.  The most notable instance of this is 'mktime()' (⇒
     Time Functions), which pays attention to the value of the 'TZ'
     environment variable on many systems.

     Some operating systems may not have environment variables.  On such
     systems, the 'ENVIRON' array is empty (except for
     'ENVIRON["AWKPATH"]' and 'ENVIRON["AWKLIBPATH"]'; ⇒AWKPATH
     Variable and ⇒AWKLIBPATH Variable).
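
     The following sketch shows both directions: reading a variable
     through 'ENVIRON', and changing one so that a child process sees
     the new value.  'GREETING' is a made-up variable name used only
     for this illustration.  The second command relies on the 'gawk'
     4.2 behavior described above, so its output depends on the
     version and mode in use.

```shell
# Reading: GREETING (a hypothetical variable) is placed in the environment
# of this one awk invocation and read back through ENVIRON.
environ_out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
printf '%s\n' "$environ_out"

# Writing: with gawk 4.2 or later, outside POSIX mode, the assignment is
# expected to be visible to the shell that system() starts.
gawk 'BEGIN { ENVIRON["GREETING"] = "changed"; system("echo \"$GREETING\"") }'
```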

'ERRNO #'
     If a system error occurs during a redirection for 'getline', during
     a read for 'getline', or during a 'close()' operation, then 'ERRNO'
     contains a string describing the error.

     In addition, 'gawk' clears 'ERRNO' before opening each command-line
     input file.  This enables checking if the file is readable inside a
     'BEGINFILE' pattern (⇒BEGINFILE/ENDFILE).

     Otherwise, 'ERRNO' works similarly to the C variable 'errno'.
     Except for the case just mentioned, 'gawk' _never_ clears it (sets
     it to zero or '""').  Thus, you should only expect its value to be
     meaningful when an I/O operation returns a failure value, such as
     'getline' returning -1.  You are, of course, free to clear it
     yourself before doing an I/O operation.

     If the value of 'ERRNO' corresponds to a system error in the C
     'errno' variable, then 'PROCINFO["errno"]' will be set to the value
     of 'errno'.  For non-system errors, 'PROCINFO["errno"]' will be
     zero.
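
     A minimal sketch of the pattern described above: clear 'ERRNO',
     attempt the I/O operation, and consult 'ERRNO' only when the
     operation reports failure.  The path '/nonexistent/file' is an
     assumption; it is presumed not to exist on the system running the
     example.  The exact message text varies by system and locale.

```shell
# /nonexistent/file is assumed not to exist.
errno_out=$(gawk 'BEGIN {
    ERRNO = ""                                 # clear it ourselves first
    ret = (getline line < "/nonexistent/file") # redirection fails, ret is -1
    print "getline returned " ret
    if (ret < 0)
        print "ERRNO: " ERRNO                  # system-dependent message
}')
printf '%s\n' "$errno_out"
```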

'FILENAME'
     The name of the current input file.  When no data files are listed
     on the command line, 'awk' reads from the standard input and
     'FILENAME' is set to '"-"'.  'FILENAME' changes each time a new
     file is read (⇒Reading Files).  Inside a 'BEGIN' rule, the
     value of 'FILENAME' is '""', because there are no input files being
     processed yet.(1)  (d.c.)  Note, though, that using 'getline'
     (⇒Getline) inside a 'BEGIN' rule can give 'FILENAME' a
     value.

'FNR'
     The current record number in the current file.  'awk' increments
     'FNR' each time it reads a new record (⇒Records).  'awk'
     resets 'FNR' to zero each time it starts a new input file.
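
     To illustrate the reset, the following sketch prints 'FNR' next to
     'NR' (the running total of records, described later in this list)
     while reading two files.  Both temporary files and their contents
     are invented for the example.

```shell
# Two hypothetical data files: two records, then one record.
a=$(mktemp); b=$(mktemp)
printf 'a1\na2\n' > "$a"
printf 'b1\n'     > "$b"

# NR keeps counting across files; FNR starts over for the second file.
fnr_out=$(awk '{ print NR, FNR, $0 }' "$a" "$b")
printf '%s\n' "$fnr_out"

rm -f "$a" "$b"
```

     On the third line of output 'NR' is 3 while 'FNR' is back to 1,
     because that record is the first one read from the second file.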

'NF'
     The number of fields in the current input record.  'NF' is set each
     time a new record is read, when a new field is created, or when
     '$0' changes (⇒Fields).

     Unlike most of the variables described in this node, assigning a
     value to 'NF' has the potential to affect 'awk''s internal
     workings.  In particular, assignments to 'NF' can be used to create
     fields in or remove fields from the current record.  ⇒Changing
     Fields.
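
     A short sketch of both effects follows.  The input lines are
     invented for the example.  The results noted in the comments are
     what 'gawk' produces; other 'awk' implementations are assumed, not
     guaranteed, to agree.

```shell
# Removing fields: lowering NF drops the trailing fields and rebuilds $0.
nf_shrink=$(echo 'a b c d' | awk '{ NF = 2; print }')

# Creating fields: raising NF adds empty fields; $0 is rebuilt using OFS.
nf_grow=$(echo 'a b c' | awk 'BEGIN { OFS = ":" } { NF = 5; print }')

printf '%s\n%s\n' "$nf_shrink" "$nf_grow"
```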

'FUNCTAB #'
     An array whose indices and corresponding values are the names of
     all the built-in, user-defined, and extension functions in the
     program.

          NOTE: Attempting to use the 'delete' statement with the
          'FUNCTAB' array causes a fatal error.  Any attempt to assign
          to an element of 'FUNCTAB' also causes a fatal error.

'NR'
     The number of input records 'awk' has processed since the beginning
     of the program's execution (⇒Records).  'awk' increments
     'NR' each time it reads a new record.

'PROCINFO #'
     The elements of this array provide access to information about the
     running 'awk' program.  The following elements (listed
     alphabetically) are guaranteed to be available:

     'PROCINFO["argv"]'
          The 'PROCINFO["argv"]' array contains all of the command-line
          arguments (after glob expansion and redirection processing on
          platforms where that must be done manually by the program)
          with subscripts ranging from 0 through 'argc' - 1.  For
          example, 'PROCINFO["argv"][0]' will contain the name by which
          'gawk' was invoked.  Here is an example of how this feature
          may be used:

               gawk '
               BEGIN {
                       for (i = 0; i < length(PROCINFO["argv"]); i++)
                               print i, PROCINFO["argv"][i]
               }'

          Please note that this differs from the standard 'ARGV' array
          which does not include command-line arguments that have
          already been processed by 'gawk' (⇒ARGC and ARGV).

     'PROCINFO["egid"]'
          The value of the 'getegid()' system call.

     'PROCINFO["errno"]'
          The value of the C 'errno' variable when 'ERRNO' is set to the
          associated error message.

     'PROCINFO["euid"]'
          The value of the 'geteuid()' system call.

     'PROCINFO["FS"]'
          This is '"FS"' if field splitting with 'FS' is in effect,
          '"FIELDWIDTHS"' if field splitting with 'FIELDWIDTHS' is in
          effect, '"FPAT"' if field matching with 'FPAT' is in effect,
          or '"API"' if field splitting is controlled by an API input
          parser.

     'PROCINFO["gid"]'
          The value of the 'getgid()' system call.

     'PROCINFO["identifiers"]'
          A subarray, indexed by the names of all identifiers used in
          the text of the 'awk' program.  An "identifier" is simply the
          name of a variable (be it scalar or array), built-in function,
          user-defined function, or extension function.  For each
          identifier, the value of the element is one of the following:

          '"array"'
               The identifier is an array.

          '"builtin"'
               The identifier is a built-in function.

          '"extension"'
               The identifier is an extension function loaded via
               '@load' or '-l'.

          '"scalar"'
               The identifier is a scalar.

          '"untyped"'
               The identifier is untyped (could be used as a scalar or
               an array; 'gawk' doesn't know yet).

          '"user"'
               The identifier is a user-defined function.

          The values indicate what 'gawk' knows about the identifiers
          after it has finished parsing the program; they are _not_
          updated while the program runs.

     'PROCINFO["pgrpid"]'
          The process group ID of the current process.

     'PROCINFO["pid"]'
          The process ID of the current process.

     'PROCINFO["ppid"]'
          The parent process ID of the current process.

     'PROCINFO["strftime"]'
          The default time format string for 'strftime()'.  Assigning a
          new value to this element changes the default.  ⇒Time
          Functions.

     'PROCINFO["uid"]'
          The value of the 'getuid()' system call.

     'PROCINFO["version"]'
          The version of 'gawk'.

     The following additional elements in the array are available to
     provide information about the MPFR and GMP libraries if your
     version of 'gawk' supports arbitrary-precision arithmetic
     (⇒Arbitrary Precision Arithmetic):

     'PROCINFO["gmp_version"]'
          The version of the GNU MP library.

     'PROCINFO["mpfr_version"]'
          The version of the GNU MPFR library.

     'PROCINFO["prec_max"]'
          The maximum precision supported by MPFR.

     'PROCINFO["prec_min"]'
          The minimum precision required by MPFR.

     The following additional elements in the array are available to
     provide information about the version of the extension API, if your
     version of 'gawk' supports dynamic loading of extension functions
     (⇒Dynamic Extensions):

     'PROCINFO["api_major"]'
          The major version of the extension API.

     'PROCINFO["api_minor"]'
          The minor version of the extension API.

     On some systems, there may be elements in the array, '"group1"'
     through '"groupN"' for some N.  N is the number of supplementary
     groups that the process has.  Use the 'in' operator to test for
     these elements (⇒Reference to Elements).

     The following elements allow you to change 'gawk''s behavior:

     'PROCINFO["NONFATAL"]'
          If this element exists, then I/O errors for all redirections
          become nonfatal.  ⇒Nonfatal.

     'PROCINFO["NAME", "NONFATAL"]'
          Make I/O errors for NAME be nonfatal.  ⇒Nonfatal.

     'PROCINFO["COMMAND", "pty"]'
          For two-way communication to COMMAND, use a pseudo-tty instead
          of setting up a two-way pipe.  ⇒Two-way I/O for more
          information.

     'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]'
          Set a timeout for reading from input redirection INPUT_NAME.
          ⇒Read Timeout for more information.

     'PROCINFO["INPUT_NAME", "RETRY"]'
          If an I/O error that may be retried occurs when reading data
          from INPUT_NAME, and this array entry exists, then 'getline'
          returns -2 instead of following the default behavior of
          returning -1 and configuring INPUT_NAME to return no further
          data.  An I/O error that may be retried is one where 'errno'
          has the value 'EAGAIN', 'EWOULDBLOCK', 'EINTR', or
          'ETIMEDOUT'.  This may be useful in conjunction with
          'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]' or situations where a
          file descriptor has been configured to behave in a
          non-blocking fashion.  ⇒Retrying Input for more
          information.

     'PROCINFO["sorted_in"]'
          If this element exists in 'PROCINFO', its value controls the
          order in which array indices will be processed by 'for (INDX
          in ARRAY)' loops.  This is an advanced feature, so we defer
          the full description until later; see ⇒Scanning an
          Array.

'RLENGTH'
     The length of the substring matched by the 'match()' function
     (⇒String Functions).  'RLENGTH' is set by invoking the
     'match()' function.  Its value is the length of the matched string,
     or -1 if no match is found.

'RSTART'
     The start index in characters of the substring that is matched by
     the 'match()' function (⇒String Functions).  'RSTART' is set
     by invoking the 'match()' function.  Its value is the position of
     the string where the matched substring starts, or zero if no match
     was found.
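
     The following sketch shows 'RSTART' and 'RLENGTH' together, first
     for a successful match and then for a failed one.  The string and
     the patterns are invented for the example.

```shell
match_out=$(awk 'BEGIN {
    s = "foobar"

    # Successful match: "oo" starts at character 2 and is 2 characters long.
    if (match(s, /o+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)

    # Failed match: RSTART becomes 0 and RLENGTH becomes -1.
    match(s, /z/)
    print RSTART, RLENGTH
}')
printf '%s\n' "$match_out"
```

     Passing 'RSTART' and 'RLENGTH' to 'substr()' is a common way to
     extract the text that matched.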

'RT #'
     The input text that matched the text denoted by 'RS', the record
     separator.  It is set every time a record is read.

'SYMTAB #'
     An array whose indices are the names of all defined global
     variables and arrays in the program.  'SYMTAB' makes 'gawk''s
     symbol table visible to the 'awk' programmer.  It is built as
     'gawk' parses the program and is complete before the program starts
     to run.

     The array may be used for indirect access to read or write the
     value of a variable:

          foo = 5
          SYMTAB["foo"] = 4
          print foo    # prints 4

     The 'isarray()' function (⇒Type Functions) may be used to
     test if an element in 'SYMTAB' is an array.  Also, you may not use
     the 'delete' statement with the 'SYMTAB' array.

     You may use an index for 'SYMTAB' that is not a predefined
     identifier:

          SYMTAB["xxx"] = 5
          print SYMTAB["xxx"]

     This works as expected: in this case 'SYMTAB' acts just like a
     regular array.  The only difference is that you can't then delete
     'SYMTAB["xxx"]'.

     The 'SYMTAB' array is more interesting than it looks.  Andrew
     Schorr points out that it effectively gives 'awk' data pointers.
     Consider his example:

          # Indirect multiply of any variable by amount, return result

          function multiply(variable, amount)
          {
              return SYMTAB[variable] *= amount
          }

     You would use it like this:

          BEGIN {
              answer = 10.5
              multiply("answer", 4)
              print "The answer is", answer
          }

     When run, this produces:

          $ gawk -f answer.awk
          -| The answer is 42

          NOTE: In order to avoid severe time-travel paradoxes,(2)
          neither 'FUNCTAB' nor 'SYMTAB' is available as an element
          within the 'SYMTAB' array.

                        Changing 'NR' and 'FNR'

   'awk' increments 'NR' and 'FNR' each time it reads a record, instead
of setting them to the absolute value of the number of records read.
This means that a program can change these variables and their new
values are incremented for each record.  (d.c.)  The following example
shows this:

     $ echo '1
     > 2
     > 3
     > 4' | awk 'NR == 2 { NR = 17 }
     > { print NR }'
     -| 1
     -| 17
     -| 18
     -| 19

Before 'FNR' was added to the 'awk' language (⇒V7/SVR3.1), many
'awk' programs used this feature to track the number of records in a
file by resetting 'NR' to zero when 'FILENAME' changed.

   ---------- Footnotes ----------

   (1) Some early implementations of Unix 'awk' initialized 'FILENAME'
to '"-"', even if there were data files to be processed.  This behavior
was incorrect and should not be relied upon in your programs.

   (2) Not to mention difficult implementation issues.