gawk: Getopt Function

10.4 Processing Command-Line Options
====================================

Most utilities on POSIX-compatible systems take options on the command
line that can be used to change the way a program behaves. 'awk' is an
example of such a program (⇒Options). Often, options take
"arguments" (i.e., data that the program needs to correctly obey the
command-line option). For example, 'awk''s '-F' option requires a
string to use as the field separator. The first occurrence on the
command line of either '--' or a string that does not begin with '-'
ends the options.

Modern Unix systems provide a C function named 'getopt()' for
processing command-line arguments. The programmer provides a string
describing the one-letter options. If an option requires an argument,
it is followed in the string with a colon. 'getopt()' is also passed
the count and values of the command-line arguments and is called in a
loop. 'getopt()' processes the command-line arguments for option
letters. Each time around the loop, it returns a single character
representing the next option letter that it finds, or '?' if it finds an
invalid option. When it returns -1, there are no options left on the
command line.

When using 'getopt()', options that do not take arguments can be
grouped together. Furthermore, options that take arguments require that
the argument be present. The argument can immediately follow the option
letter, or it can be a separate command-line argument.

Given a hypothetical program that takes three command-line options,
'-a', '-b', and '-c', where '-b' requires an argument, all of the
following are valid ways of invoking the program:

     prog -a -b foo -c data1 data2 data3
     prog -ac -bfoo -- data1 data2 data3
     prog -acbfoo data1 data2 data3

Notice that when the argument is grouped with its option, the rest of
the argument is considered to be the option's argument. In this
example, '-acbfoo' indicates that all of the '-a', '-b', and '-c'
options were supplied, and that 'foo' is the argument to the '-b'
option.
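
The grouping rule can be tried in isolation. The following is only a
sketch, not the 'getopt()' function itself: it walks the single
argument '-acbfoo' against the option string "ab:c" (the same
colon convention that 'getopt()' uses) and reports how each letter is
interpreted. The variable names are made up for this illustration.

```shell
out=$(awk 'BEGIN {
    options = "ab:c"     # a colon marks an option that takes an argument
    arg = "-acbfoo"      # the grouped argument from the example above
    for (i = 2; i <= length(arg); i++) {    # position 1 holds the "-"
        c = substr(arg, i, 1)
        p = index(options, c)
        if (p == 0) {
            printf "-%s (not in the option string)\n", c
            continue
        }
        if (substr(options, p + 1, 1) == ":") {
            # the rest of the argument belongs to this option
            printf "-%s with argument <%s>\n", c, substr(arg, i + 1)
            break
        }
        printf "-%s\n", c
    }
}')
printf '%s\n' "$out"
```

This prints '-a', then '-c', and finally '-b with argument <foo>'.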

'getopt()' provides four external variables that the programmer can
use:

'optind'
     The index in the argument value array ('argv') where the first
     nonoption command-line argument can be found.

'optarg'
     The string value of the argument to an option.

'opterr'
     Usually 'getopt()' prints an error message when it finds an invalid
     option. Setting 'opterr' to zero disables this feature. (An
     application might want to print its own error message.)

'optopt'
     The letter representing the command-line option.

The following C fragment shows how 'getopt()' might process
command-line arguments for 'awk':

     int
     main(int argc, char *argv[])
     {
         ...
         /* print our own message */
         opterr = 0;
         while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
             switch (c) {
             case 'f':    /* file */
                 ...
                 break;
             case 'F':    /* field separator */
                 ...
                 break;
             case 'v':    /* variable assignment */
                 ...
                 break;
             case 'W':    /* extension */
                 ...
                 break;
             case '?':
             default:
                 usage();
                 break;
             }
         }
         ...
     }

As a side point, 'gawk' actually uses the GNU 'getopt_long()'
function to process both normal and GNU-style long options
(⇒Options).

The abstraction provided by 'getopt()' is very useful and is quite
handy in 'awk' programs as well. Following is an 'awk' version of
'getopt()'. This function highlights one of the greatest weaknesses in
'awk', which is that it is very poor at manipulating single characters.
Repeated calls to 'substr()' are necessary for accessing individual
characters (⇒String Functions).(1)
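
As a small illustration of that point, here is a sketch of visiting
each character of a string with 'substr()'. The string '-abc' and the
variable names are chosen only for this example.

```shell
out=$(awk 'BEGIN {
    s = "-abc"
    n = length(s)
    # awk has no character type, so each character is a
    # one-character substring fetched by position (starting at 1)
    for (i = 1; i <= n; i++)
        printf "%d:%s\n", i, substr(s, i, 1)
}')
printf '%s\n' "$out"
```

Each output line pairs a position with the character found there,
beginning with '1:-'.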

The discussion that follows walks through the code a bit at a time:

     # getopt.awk --- Do C library getopt(3) function in awk

     # External variables:
     #    Optind -- index in ARGV of first nonoption argument
     #    Optarg -- string value of argument to current option
     #    Opterr -- if nonzero, print our own diagnostic
     #    Optopt -- current option letter

     # Returns:
     #    -1     at end of options
     #    "?"    for unrecognized option
     #    <c>    a character representing the current option

     # Private Data:
     #    _opti  -- index in multiflag option, e.g., -abc

The function starts out with comments presenting a list of the global
variables it uses, what the return values are, what they mean, and any
global variables that are "private" to this library function. Such
documentation is essential for any program, and particularly for library
functions.

The 'getopt()' function first checks that it was indeed called with a
string of options (the 'options' parameter). If 'options' has a zero
length, 'getopt()' immediately returns -1:

     function getopt(argc, argv, options,    thisopt, i)
     {
         if (length(options) == 0)    # no options given
             return -1

         if (argv[Optind] == "--") {  # all done
             Optind++
             _opti = 0
             return -1
         } else if (argv[Optind] !~ /^-[^:[:space:]]/) {
             _opti = 0
             return -1
         }

The next thing to check for is the end of the options. A '--' ends
the command-line options, as does any command-line argument that does
not begin with a '-'. 'Optind' is used to step through the array of
command-line arguments; it retains its value across calls to 'getopt()',
because it is a global variable.
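
The two end-of-options tests in the code above can be tried on their
own. The following sketch applies them, in the same order, to a few
sample strings. The sample strings and the 'kind' labels are
assumptions made for this illustration, and the bracket expression
requires an 'awk' that supports POSIX character classes.

```shell
out=$(awk 'BEGIN {
    n = split("-a -xyz -- - -: plain", args, " ")
    for (i = 1; i <= n; i++) {
        if (args[i] == "--")
            kind = "end-of-options marker"
        else if (args[i] !~ /^-[^:[:space:]]/)
            kind = "not an option"
        else
            kind = "option"
        printf "<%s> %s\n", args[i], kind
    }
}')
printf '%s\n' "$out"
```

Only '-a' and '-xyz' are reported as options. A lone '-' and '-:' are
not, because nothing acceptable follows the '-'.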

The regular expression that is used, '/^-[^:[:space:]]/', checks for a
'-' followed by anything that is not whitespace and not a colon. If the
current command-line argument does not match this pattern, it is not an
option, and it ends option processing. Continuing on:

         if (_opti == 0)
             _opti = 2
         thisopt = substr(argv[Optind], _opti, 1)
         Optopt = thisopt
         i = index(options, thisopt)
         if (i == 0) {
             if (Opterr)
                 printf("%c -- invalid option\n", thisopt) > "/dev/stderr"
             if (_opti >= length(argv[Optind])) {
                 Optind++
                 _opti = 0
             } else
                 _opti++
             return "?"
         }

The '_opti' variable tracks the position in the current command-line
argument ('argv[Optind]'). If multiple options are grouped together
with one '-' (e.g., '-abx'), it is necessary to return them to the user
one at a time.
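
The idea of keeping a position in a global variable between calls can
be sketched separately from the rest of the function. In the following,
'next_letter()' and '_pos' are hypothetical names invented for this
illustration. They are not part of 'getopt.awk', although '_pos' plays
the same role as '_opti'.

```shell
out=$(awk '
# next_letter --- return one option letter per call, remembering
# the current position in the global variable _pos
function next_letter(arg,    c)
{
    if (_pos == 0)
        _pos = 2        # skip the leading "-"
    c = substr(arg, _pos, 1)
    if (_pos >= length(arg))
        _pos = 0        # argument used up; start over on the next call
    else
        _pos++
    return c
}

BEGIN {
    arg = "-abx"
    do {
        printf "%s\n", next_letter(arg)
    } while (_pos != 0)
}')
printf '%s\n' "$out"
```

Three calls return 'a', 'b', and 'x' in turn, after which '_pos' is
back at zero.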

If '_opti' is equal to zero, it is set to two, which is the index in
the string of the next character to look at (we skip the '-', which is
at position one). The variable 'thisopt' holds the character, obtained
with 'substr()'. It is saved in 'Optopt' for the main program to use.

If 'thisopt' is not in the 'options' string, then it is an invalid
option. If 'Opterr' is nonzero, 'getopt()' prints an error message on
the standard error that is similar to the message from the C version of
'getopt()'.

Because the option is invalid, it is necessary to skip it and move on
to the next option character. If '_opti' is greater than or equal to
the length of the current command-line argument, it is necessary to move
on to the next argument, so 'Optind' is incremented and '_opti' is reset
to zero. Otherwise, 'Optind' is left alone and '_opti' is merely
incremented.

In any case, because the option is invalid, 'getopt()' returns '"?"'.
The main program can examine 'Optopt' if it needs to know what the
invalid option letter actually is. Continuing on:

         if (substr(options, i + 1, 1) == ":") {
             # get option argument
             if (length(substr(argv[Optind], _opti + 1)) > 0)
                 Optarg = substr(argv[Optind], _opti + 1)
             else
                 Optarg = argv[++Optind]
             _opti = 0
         } else
             Optarg = ""

If the option requires an argument, the option letter is followed by
a colon in the 'options' string. If there are remaining characters in
the current command-line argument ('argv[Optind]'), then the rest of
that string is assigned to 'Optarg'. Otherwise, the next command-line
argument is used ('-xFOO' versus '-x FOO'). In either case, '_opti' is
reset to zero, because there are no more characters left to examine in
the current command-line argument. Continuing:

         if (_opti == 0 || _opti >= length(argv[Optind])) {
             Optind++
             _opti = 0
         } else
             _opti++
         return thisopt
     }

Finally, if '_opti' is either zero or greater than or equal to the
length of the current command-line argument, it means this element in
'argv' is through being processed, so 'Optind' is incremented to point
to the next element in 'argv'. If neither condition is true, then only
'_opti' is incremented, so that the next option letter can be processed
on the next call to 'getopt()'.
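
To see the pieces working together, the following sketch repeats the
'getopt()' function from this section and drives it from a small
'BEGIN' rule. The driver is an assumption made for illustration: it
builds its own 'argv' array for a hypothetical command line instead of
using 'ARGV', sets 'Opterr' to zero, and reports invalid options itself
by examining 'Optopt'.

```shell
out=$(awk '
function getopt(argc, argv, options,    thisopt, i)
{
    if (length(options) == 0)    # no options given
        return -1

    if (argv[Optind] == "--") {  # all done
        Optind++
        _opti = 0
        return -1
    } else if (argv[Optind] !~ /^-[^:[:space:]]/) {
        _opti = 0
        return -1
    }
    if (_opti == 0)
        _opti = 2
    thisopt = substr(argv[Optind], _opti, 1)
    Optopt = thisopt
    i = index(options, thisopt)
    if (i == 0) {
        if (Opterr)
            printf("%c -- invalid option\n", thisopt) > "/dev/stderr"
        if (_opti >= length(argv[Optind])) {
            Optind++
            _opti = 0
        } else
            _opti++
        return "?"
    }
    if (substr(options, i + 1, 1) == ":") {
        # get option argument
        if (length(substr(argv[Optind], _opti + 1)) > 0)
            Optarg = substr(argv[Optind], _opti + 1)
        else
            Optarg = argv[++Optind]
        _opti = 0
    } else
        Optarg = ""
    if (_opti == 0 || _opti >= length(argv[Optind])) {
        Optind++
        _opti = 0
    } else
        _opti++
    return thisopt
}

BEGIN {
    # Hypothetical command line: prog -ac -x -bfoo -b bar data1
    argc = split("-ac -x -bfoo -b bar data1", tmp, " ") + 1
    argv[0] = "prog"
    for (k = 1; k < argc; k++)
        argv[k] = tmp[k]

    Opterr = 0      # suppress the built-in message; report errors ourselves
    Optind = 1      # skip argv[0]
    while ((c = getopt(argc, argv, "ab:c")) != -1) {
        if (c == "?")
            printf "unknown option -%s\n", Optopt
        else if (Optarg != "")
            printf "-%s <%s>\n", c, Optarg
        else
            printf "-%s\n", c
    }
    for (; Optind < argc; Optind++)
        printf "operand <%s>\n", argv[Optind]
}')
printf '%s\n' "$out"
```

The output shows '-a' and '-c' returned separately from the grouped
'-ac', the invalid '-x' reported through 'Optopt', and both the
attached ('-bfoo') and separate ('-b bar') argument forms.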

The 'BEGIN' rule initializes both 'Opterr' and 'Optind' to one.
'Opterr' is set to one, because the default behavior is for 'getopt()'
to print a diagnostic message upon seeing an invalid option. 'Optind'
is set to one, because there's no reason to look at the program name,
which is in 'ARGV[0]':

     BEGIN {
         Opterr = 1    # default is to diagnose
         Optind = 1    # skip ARGV[0]

         # test program
         if (_getopt_test) {
             while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
                 printf("c = <%c>, Optarg = <%s>\n",
                                            _go_c, Optarg)
             printf("non-option arguments:\n")
             for (; Optind < ARGC; Optind++)
                 printf("\tARGV[%d] = <%s>\n",
                                         Optind, ARGV[Optind])
         }
     }

The rest of the 'BEGIN' rule is a simple test program. Here are the
results of two sample runs of the test program:

     $ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
     -| c = <a>, Optarg = <>
     -| c = <c>, Optarg = <>
     -| c = <b>, Optarg = <ARG>
     -| non-option arguments:
     -|         ARGV[3] = <bax>
     -|         ARGV[4] = <-x>

     $ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
     -| c = <a>, Optarg = <>
     error-> x -- invalid option
     -| c = <?>, Optarg = <>
     -| non-option arguments:
     -|         ARGV[4] = <xyz>
     -|         ARGV[5] = <abc>

In both runs, the first '--' terminates the arguments to 'awk', so
that it does not try to interpret the '-a', etc., as its own options.

     NOTE: After 'getopt()' is through, user-level code must clear out
     all the elements of 'ARGV' from 1 up to, but not including,
     'Optind', so that 'awk' does not try to process the command-line
     options as file names.
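
A minimal sketch of that cleanup follows. It does not call 'getopt()'.
Instead, as an assumption for this example, 'Optind' is simply set to
3, the value option parsing would leave behind for the hypothetical
arguments '-a -b FILE'. The temporary file stands in for a real data
file.

```shell
tmp=$(mktemp)
printf 'line one\nline two\n' > "$tmp"
out=$(awk -- '
BEGIN {
    # Assumption: option parsing has already finished and left Optind
    # at 3, the index of the first nonoption argument
    Optind = 3
    for (i = 1; i < Optind; i++)
        ARGV[i] = ""    # awk skips empty elements of ARGV
}
{ printf "%d: %s\n", NR, $0 }
' -a -b "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

With the two option elements cleared, 'awk' reads only the data file.
Without the loop, it would try to open a file named '-a'.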

Using '#!' with the '-E' option may help avoid conflicts between your
program's options and 'gawk''s options, as '-E' causes 'gawk' to abandon
processing of further options (⇒Executable Scripts and ⇒Options).

Several of the sample programs presented in ⇒Sample Programs,
use 'getopt()' to process their arguments.

---------- Footnotes ----------

(1) This function was written before 'gawk' acquired the ability to
split strings into single characters using '""' as the separator. We
have left it alone, as using 'substr()' is more portable.
