gawk: Auto-set

7.5.2 Built-in Variables That Convey Information
------------------------------------------------

The following is an alphabetical list of variables that 'awk' sets
automatically on certain occasions in order to provide information to
your program.

The variables that are specific to 'gawk' are marked with a pound
sign ('#'). These variables are 'gawk' extensions. In other 'awk'
implementations or if 'gawk' is in compatibility mode (⇒Options),
they are not special:

'ARGC', 'ARGV'
     The command-line arguments available to 'awk' programs are stored
     in an array called 'ARGV'. 'ARGC' is the number of command-line
     arguments present. ⇒Other Arguments. Unlike most 'awk'
     arrays, 'ARGV' is indexed from 0 to 'ARGC' - 1. In the following
     example:

          $ awk 'BEGIN {
          >     for (i = 0; i < ARGC; i++)
          >         print ARGV[i]
          > }' inventory-shipped mail-list
          -| awk
          -| inventory-shipped
          -| mail-list

     'ARGV[0]' contains 'awk', 'ARGV[1]' contains 'inventory-shipped',
     and 'ARGV[2]' contains 'mail-list'. The value of 'ARGC' is three,
     one more than the index of the last element in 'ARGV', because the
     elements are numbered from zero.

     The names 'ARGC' and 'ARGV', as well as the convention of indexing
     the array from 0 to 'ARGC' - 1, are derived from the C language's
     method of accessing command-line arguments.

     The value of 'ARGV[0]' can vary from system to system. Also, you
     should note that the program text is _not_ included in 'ARGV', nor
     are any of 'awk''s command-line options. ⇒ARGC and ARGV for
     information about how 'awk' uses these variables. (d.c.)
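
As a small sketch of the point that neither options nor the program text
appear in 'ARGV', the following shell fragment passes a '-v' option along
with two operands. The variable name 'debug' is made up for this
illustration, and the two file names are never opened because the
program has only a 'BEGIN' rule:

```shell
# The -v option and the program text are consumed by awk itself, so only
# the remaining operands show up in ARGV[1] through ARGV[ARGC - 1].
# "debug" is a hypothetical variable name used only for this sketch.
args=$(awk -v debug=1 'BEGIN {
    print "ARGC =", ARGC
    for (i = 1; i < ARGC; i++)    # skip ARGV[0], which varies by system
        print i, ARGV[i]
}' inventory-shipped mail-list)
printf '%s\n' "$args"
```

The loop starts at 1 on purpose, so the result does not depend on the
system-specific value of 'ARGV[0]'.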

'ARGIND #'
     The index in 'ARGV' of the current file being processed. Every
     time 'gawk' opens a new data file for processing, it sets 'ARGIND'
     to the index in 'ARGV' of the file name. When 'gawk' is processing
     the input files, 'FILENAME == ARGV[ARGIND]' is always true.

     This variable is useful in file processing; it allows you to tell
     how far along you are in the list of data files as well as to
     distinguish between successive instances of the same file name on
     the command line.

     While you can change the value of 'ARGIND' within your 'awk'
     program, 'gawk' automatically sets it to a new value when it opens
     the next file.

'ENVIRON'
     An associative array containing the values of the environment. The
     array indices are the environment variable names; the elements are
     the values of the particular environment variables. For example,
     'ENVIRON["HOME"]' might be '/home/arnold'.

     For POSIX 'awk', changing this array does not affect the
     environment passed on to any programs that 'awk' may spawn via
     redirection or the 'system()' function.

     However, beginning with version 4.2, if not in POSIX compatibility
     mode, 'gawk' does update its own environment when 'ENVIRON' is
     changed, thus changing the environment seen by programs that it
     creates. You should therefore be especially careful if you modify
     'ENVIRON["PATH"]', which is the search path for finding executable
     programs.

     This can also affect the running 'gawk' program, since some of the
     built-in functions may pay attention to certain environment
     variables. The most notable instance of this is 'mktime()'
     (⇒Time Functions), which pays attention to the value of the 'TZ'
     environment variable on many systems.

     Some operating systems may not have environment variables. On such
     systems, the 'ENVIRON' array is empty (except for
     'ENVIRON["AWKPATH"]' and 'ENVIRON["AWKLIBPATH"]'; ⇒AWKPATH
     Variable and ⇒AWKLIBPATH Variable).

'ERRNO #'
     If a system error occurs during a redirection for 'getline', during
     a read for 'getline', or during a 'close()' operation, then 'ERRNO'
     contains a string describing the error.

     In addition, 'gawk' clears 'ERRNO' before opening each command-line
     input file. This enables checking if the file is readable inside a
     'BEGINFILE' pattern (⇒BEGINFILE/ENDFILE).

     Otherwise, 'ERRNO' works similarly to the C variable 'errno'.
     Except for the case just mentioned, 'gawk' _never_ clears it (sets
     it to zero or '""'). Thus, you should only expect its value to be
     meaningful when an I/O operation returns a failure value, such as
     'getline' returning -1. You are, of course, free to clear it
     yourself before doing an I/O operation.

     If the value of 'ERRNO' corresponds to a system error in the C
     'errno' variable, then 'PROCINFO["errno"]' will be set to the value
     of 'errno'. For non-system errors, 'PROCINFO["errno"]' will be
     zero.

'FILENAME'
     The name of the current input file. When no data files are listed
     on the command line, 'awk' reads from the standard input and
     'FILENAME' is set to '"-"'. 'FILENAME' changes each time a new
     file is read (⇒Reading Files). Inside a 'BEGIN' rule, the
     value of 'FILENAME' is '""', because there are no input files being
     processed yet.(1) (d.c.) Note, though, that using 'getline'
     (⇒Getline) inside a 'BEGIN' rule can give 'FILENAME' a
     value.

'FNR'
     The current record number in the current file. 'awk' increments
     'FNR' each time it reads a new record (⇒Records). 'awk'
     resets 'FNR' to zero each time it starts a new input file.
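
To make the difference between the per-file and the overall record
counts concrete, here is a small sketch that prints 'FILENAME', 'FNR',
and 'NR' side by side. The two files are created only for the example:

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n'    > "$tmpdir/one.txt"
printf 'c\nd\ne\n' > "$tmpdir/two.txt"

# FNR restarts with each file, while NR keeps counting across files.
counts=$(cd "$tmpdir" && awk '{ print FILENAME, FNR, NR }' one.txt two.txt)
printf '%s\n' "$counts"

rm -rf "$tmpdir"
```

The first record of 'two.txt' is reported with an 'FNR' of 1 and an 'NR'
of 3.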

'NF'
     The number of fields in the current input record. 'NF' is set each
     time a new record is read, when a new field is created, or when
     '$0' changes (⇒Fields).

     Unlike most of the variables described in this node, assigning a
     value to 'NF' has the potential to affect 'awk''s internal
     workings. In particular, assignments to 'NF' can be used to create
     fields in or remove fields from the current record. ⇒Changing
     Fields.
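
A minimal sketch of both directions follows. 'OFS' is set to '-' only to
make the rebuilt record easy to read. Note that some older 'awk'
implementations do not rebuild '$0' when 'NF' is decreased, so the first
command is best treated as behavior of current implementations rather
than a guarantee:

```shell
# Decreasing NF drops the trailing fields and rebuilds $0 using OFS.
shorter=$(echo 'a b c d' | awk 'BEGIN { OFS = "-" } { NF = 2; print }')
printf '%s\n' "$shorter"

# Increasing NF appends empty fields, again joined with OFS.
longer=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { NF = 5; print }')
printf '%s\n' "$longer"
```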

'FUNCTAB #'
     An array whose indices and corresponding values are the names of
     all the built-in, user-defined, and extension functions in the
     program.

          NOTE: Attempting to use the 'delete' statement with the
          'FUNCTAB' array causes a fatal error. Any attempt to assign
          to an element of 'FUNCTAB' also causes a fatal error.

'NR'
     The number of input records 'awk' has processed since the beginning
     of the program's execution (⇒Records). 'awk' increments
     'NR' each time it reads a new record.

'PROCINFO #'
     The elements of this array provide access to information about the
     running 'awk' program. The following elements (listed
     alphabetically) are guaranteed to be available:

     'PROCINFO["argv"]'
          The 'PROCINFO["argv"]' array contains all of the command-line
          arguments (after glob expansion and redirection processing on
          platforms where that must be done manually by the program)
          with subscripts ranging from 0 through 'argc' - 1. For
          example, 'PROCINFO["argv"][0]' will contain the name by which
          'gawk' was invoked. Here is an example of how this feature
          may be used:

               gawk '
               BEGIN {
                   for (i = 0; i < length(PROCINFO["argv"]); i++)
                       print i, PROCINFO["argv"][i]
               }'

          Please note that this differs from the standard 'ARGV' array
          which does not include command-line arguments that have
          already been processed by 'gawk' (⇒ARGC and ARGV).

     'PROCINFO["egid"]'
          The value of the 'getegid()' system call.

     'PROCINFO["errno"]'
          The value of the C 'errno' variable when 'ERRNO' is set to the
          associated error message.

     'PROCINFO["euid"]'
          The value of the 'geteuid()' system call.

     'PROCINFO["FS"]'
          This is '"FS"' if field splitting with 'FS' is in effect,
          '"FIELDWIDTHS"' if field splitting with 'FIELDWIDTHS' is in
          effect, '"FPAT"' if field matching with 'FPAT' is in effect,
          or '"API"' if field splitting is controlled by an API input
          parser.

     'PROCINFO["gid"]'
          The value of the 'getgid()' system call.

     'PROCINFO["identifiers"]'
          A subarray, indexed by the names of all identifiers used in
          the text of the 'awk' program. An "identifier" is simply the
          name of a variable (be it scalar or array), built-in function,
          user-defined function, or extension function. For each
          identifier, the value of the element is one of the following:

          '"array"'
               The identifier is an array.

          '"builtin"'
               The identifier is a built-in function.

          '"extension"'
               The identifier is an extension function loaded via
               '@load' or '-l'.

          '"scalar"'
               The identifier is a scalar.

          '"untyped"'
               The identifier is untyped (could be used as a scalar or
               an array; 'gawk' doesn't know yet).

          '"user"'
               The identifier is a user-defined function.

          The values indicate what 'gawk' knows about the identifiers
          after it has finished parsing the program; they are _not_
          updated while the program runs.

     'PROCINFO["pgrpid"]'
          The process group ID of the current process.

     'PROCINFO["pid"]'
          The process ID of the current process.

     'PROCINFO["ppid"]'
          The parent process ID of the current process.

     'PROCINFO["strftime"]'
          The default time format string for 'strftime()'. Assigning a
          new value to this element changes the default. ⇒Time
          Functions.

     'PROCINFO["uid"]'
          The value of the 'getuid()' system call.

     'PROCINFO["version"]'
          The version of 'gawk'.

     The following additional elements in the array are available to
     provide information about the MPFR and GMP libraries if your
     version of 'gawk' supports arbitrary-precision arithmetic
     (⇒Arbitrary Precision Arithmetic):

     'PROCINFO["gmp_version"]'
          The version of the GNU MP library.

     'PROCINFO["mpfr_version"]'
          The version of the GNU MPFR library.

     'PROCINFO["prec_max"]'
          The maximum precision supported by MPFR.

     'PROCINFO["prec_min"]'
          The minimum precision required by MPFR.

     The following additional elements in the array are available to
     provide information about the version of the extension API, if your
     version of 'gawk' supports dynamic loading of extension functions
     (⇒Dynamic Extensions):

     'PROCINFO["api_major"]'
          The major version of the extension API.

     'PROCINFO["api_minor"]'
          The minor version of the extension API.

     On some systems, there may be elements in the array, '"group1"'
     through '"groupN"' for some N. N is the number of supplementary
     groups that the process has. Use the 'in' operator to test for
     these elements (⇒Reference to Elements).

     The following elements allow you to change 'gawk''s behavior:

     'PROCINFO["NONFATAL"]'
          If this element exists, then I/O errors for all redirections
          become nonfatal. ⇒Nonfatal.

     'PROCINFO["NAME", "NONFATAL"]'
          Make I/O errors for NAME be nonfatal. ⇒Nonfatal.

     'PROCINFO["COMMAND", "pty"]'
          For two-way communication to COMMAND, use a pseudo-tty instead
          of setting up a two-way pipe. ⇒Two-way I/O for more
          information.

     'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]'
          Set a timeout for reading from input redirection INPUT_NAME.
          ⇒Read Timeout for more information.

     'PROCINFO["INPUT_NAME", "RETRY"]'
          If an I/O error that may be retried occurs when reading data
          from INPUT_NAME, and this array entry exists, then 'getline'
          returns -2 instead of following the default behavior of
          returning -1 and configuring INPUT_NAME to return no further
          data. An I/O error that may be retried is one where 'errno'
          has the value 'EAGAIN', 'EWOULDBLOCK', 'EINTR', or
          'ETIMEDOUT'. This may be useful in conjunction with
          'PROCINFO["INPUT_NAME", "READ_TIMEOUT"]' or situations where a
          file descriptor has been configured to behave in a
          non-blocking fashion. ⇒Retrying Input for more
          information.

     'PROCINFO["sorted_in"]'
          If this element exists in 'PROCINFO', its value controls the
          order in which array indices will be processed by 'for (INDX
          in ARRAY)' loops. This is an advanced feature, so we defer
          the full description until later; see ⇒Scanning an
          Array.

'RLENGTH'
     The length of the substring matched by the 'match()' function
     (⇒String Functions). 'RLENGTH' is set by invoking the
     'match()' function. Its value is the length of the matched string,
     or -1 if no match is found.

'RSTART'
     The start index in characters of the substring that is matched by
     the 'match()' function (⇒String Functions). 'RSTART' is set
     by invoking the 'match()' function. Its value is the position of
     the string where the matched substring starts, or zero if no match
     was found.
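
The two variables are usually used together, for instance to extract
the matched text with 'substr()'. A minimal sketch, with a sample string
chosen only for illustration:

```shell
# A successful match: RSTART is the 1-based position, RLENGTH the length.
found=$(awk 'BEGIN {
    s = "foobar"
    if (match(s, /o+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}')
printf '%s\n' "$found"

# A failed match sets RSTART to zero and RLENGTH to -1.
missing=$(awk 'BEGIN { match("foobar", /z/); print RSTART, RLENGTH }')
printf '%s\n' "$missing"
```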

'RT #'
     The input text that matched the text denoted by 'RS', the record
     separator. It is set every time a record is read.

'SYMTAB #'
     An array whose indices are the names of all defined global
     variables and arrays in the program. 'SYMTAB' makes 'gawk''s
     symbol table visible to the 'awk' programmer. It is built as
     'gawk' parses the program and is complete before the program starts
     to run.

     The array may be used for indirect access to read or write the
     value of a variable:

          foo = 5
          SYMTAB["foo"] = 4
          print foo    # prints 4

     The 'isarray()' function (⇒Type Functions) may be used to
     test if an element in 'SYMTAB' is an array. Also, you may not use
     the 'delete' statement with the 'SYMTAB' array.

     You may use an index for 'SYMTAB' that is not a predefined
     identifier:

          SYMTAB["xxx"] = 5
          print SYMTAB["xxx"]

     This works as expected: in this case 'SYMTAB' acts just like a
     regular array. The only difference is that you can't then delete
     'SYMTAB["xxx"]'.

     The 'SYMTAB' array is more interesting than it looks. Andrew
     Schorr points out that it effectively gives 'awk' data pointers.
     Consider his example:

          # Indirect multiply of any variable by amount, return result

          function multiply(variable, amount)
          {
              return SYMTAB[variable] *= amount
          }

     You would use it like this:

          BEGIN {
              answer = 10.5
              multiply("answer", 4)
              print "The answer is", answer
          }

     When run, this produces:

          $ gawk -f answer.awk
          -| The answer is 42

          NOTE: In order to avoid severe time-travel paradoxes,(2)
          neither 'FUNCTAB' nor 'SYMTAB' is available as an element
          within the 'SYMTAB' array.

Changing 'NR' and 'FNR'

'awk' increments 'NR' and 'FNR' each time it reads a record, instead
of setting them to the absolute value of the number of records read.
This means that a program can change these variables and their new
values are incremented for each record. (d.c.) The following example
shows this:

     $ echo '1
     > 2
     > 3
     > 4' | awk 'NR == 2 { NR = 17 }
     > { print NR }'
     -| 1
     -| 17
     -| 18
     -| 19

Before 'FNR' was added to the 'awk' language (⇒V7/SVR3.1), many
'awk' programs used this feature to track the number of records in a
file by resetting 'NR' to zero when 'FILENAME' changed.
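
The historical idiom can be approximated as shown below. This is only a
sketch of the idea, not a reconstruction of any particular old program:
the variable 'prev' is hypothetical, and 'NR' is set to 1 rather than to
zero because the first record of the new file has already been read by
the time the rule notices the new 'FILENAME':

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\nd\n' > "$tmpdir/two.txt"

# Restart NR whenever the input file changes, imitating FNR.
perfile=$(cd "$tmpdir" && awk '
FILENAME != prev { NR = 1; prev = FILENAME }   # first record of a new file
{ print FILENAME, NR }' one.txt two.txt)
printf '%s\n' "$perfile"

rm -rf "$tmpdir"
```

In new programs, 'FNR' gives the same information without disturbing
'NR'.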

---------- Footnotes ----------

(1) Some early implementations of Unix 'awk' initialized 'FILENAME'
to '"-"', even if there were data files to be processed. This behavior
was incorrect and should not be relied upon in your programs.

(2) Not to mention difficult implementation issues.