gawk: Indirect Calls

9.3 Indirect Function Calls
===========================

This section describes an advanced, 'gawk'-specific extension.

Often, you may wish to defer the choice of function to call until
runtime. For example, you may have different kinds of records, each of
which should be processed differently.

Normally, you would have to use a series of 'if'-'else' statements to
decide which function to call. By using "indirect" function calls, you
can specify the name of the function to call as a string variable, and
then call the function. Let's look at an example.

Suppose you have a file with your test scores for the classes you are
taking, and you wish to get the sum and the average of your test scores.
The first field is the class name. The following fields are the
functions to call to process the data, up to a "marker" field 'data:'.
Following the marker, to the end of the record, are the various numeric
test scores.

Here is the initial file:

     Biology_101 sum average data: 87.0 92.4 78.5 94.9
     Chemistry_305 sum average data: 75.2 98.3 94.7 88.2
     English_401 sum average data: 100.0 95.6 87.1 93.4

To process the data, you might write initially:

     {
         class = $1
         for (i = 2; $i != "data:"; i++) {
             if ($i == "sum")
                 sum()   # processes the whole record
             else if ($i == "average")
                 average()
             ...           # and so on
         }
     }

This style of programming works, but can be awkward. With "indirect"
function calls, you tell 'gawk' to use the _value_ of a variable as the
_name_ of the function to call.

The syntax is similar to that of a regular function call: an
identifier immediately followed by an opening parenthesis, any
arguments, and then a closing parenthesis, with the addition of a
leading '@' character:

     the_func = "sum"
     result = @the_func()   # calls the sum() function

Here is a full program that processes the previously shown data,
using indirect function calls:

     # indirectcall.awk --- Demonstrate indirect function calls

     # average --- return the average of the values in fields $first - $last

     function average(first, last,   sum, i)
     {
         sum = 0
         for (i = first; i <= last; i++)
             sum += $i

         return sum / (last - first + 1)
     }

     # sum --- return the sum of the values in fields $first - $last

     function sum(first, last,   ret, i)
     {
         ret = 0
         for (i = first; i <= last; i++)
             ret += $i

         return ret
     }

These two functions expect to work on fields; thus, the parameters
'first' and 'last' indicate where in the fields to start and end.
Otherwise, they perform the expected computations and are not unusual:

     # For each record, print the class name and the requested statistics
     {
         class_name = $1
         gsub(/_/, " ", class_name)  # Replace _ with spaces

         # find start
         for (i = 1; i <= NF; i++) {
             if ($i == "data:") {
                 start = i + 1
                 break
             }
         }

         printf("%s:\n", class_name)
         for (i = 2; $i != "data:"; i++) {
             the_function = $i
             printf("\t%s: <%s>\n", $i, @the_function(start, NF) "")
         }
         print ""
     }

This is the main processing for each record. It prints the class
name (with underscores replaced with spaces). It then finds the start
of the actual data, saving it in 'start'. The last part of the code
loops through each function name (from '$2' up to the marker, 'data:'),
calling the function named by the field. The indirect function call
itself occurs as a parameter in the call to 'printf'. (The 'printf'
format string uses '%s' as the format specifier so that we can use
functions that return strings, as well as numbers. Note that the result
from the indirect call is concatenated with the empty string, in order
to force it to be a string value.)

Here is the result of running the program:

     $ gawk -f indirectcall.awk class_data1
     -| Biology 101:
     -|     sum: <352.8>
     -|     average: <88.2>
     -|
     -| Chemistry 305:
     -|     sum: <356.4>
     -|     average: <89.1>
     -|
     -| English 401:
     -|     sum: <376.1>
     -|     average: <94.025>

The ability to use indirect function calls is more powerful than you
may think at first. The C and C++ languages provide "function
pointers," which are a mechanism for calling a function chosen at
runtime. One of the best-known uses of this ability is the C
'qsort()' function, which sorts an array using the famous "quicksort"
algorithm (see the Wikipedia article
(https://en.wikipedia.org/wiki/Quicksort) for more information). To use
this function, you supply a pointer to a comparison function. This
mechanism allows you to sort arbitrary data in an arbitrary fashion.

We can do something similar using 'gawk', like this:

     # quicksort.awk --- Quicksort algorithm, with user-supplied
     #                   comparison function

     # quicksort --- C.A.R. Hoare's quicksort algorithm. See Wikipedia
     #               or almost any algorithms or computer science text.

     function quicksort(data, left, right, less_than,    i, last)
     {
         if (left >= right)  # do nothing if array contains fewer
             return          # than two elements

         quicksort_swap(data, left, int((left + right) / 2))
         last = left
         for (i = left + 1; i <= right; i++)
             if (@less_than(data[i], data[left]))
                 quicksort_swap(data, ++last, i)
         quicksort_swap(data, left, last)
         quicksort(data, left, last - 1, less_than)
         quicksort(data, last + 1, right, less_than)
     }

     # quicksort_swap --- helper function for quicksort, should really be inline

     function quicksort_swap(data, i, j,      temp)
     {
         temp = data[i]
         data[i] = data[j]
         data[j] = temp
     }

The 'quicksort()' function receives the 'data' array, the starting
and ending indices to sort ('left' and 'right'), and the name of a
function that performs a "less than" comparison. It then implements the
quicksort algorithm.

To make use of the sorting function, we return to our previous
example. The first thing to do is write some comparison functions:

     # num_lt --- do a numeric less than comparison

     function num_lt(left, right)
     {
         return ((left + 0) < (right + 0))
     }

     # num_ge --- do a numeric greater than or equal to comparison

     function num_ge(left, right)
     {
         return ((left + 0) >= (right + 0))
     }

The 'num_ge()' function is needed to perform a descending sort; when
used to perform a "less than" test, it actually does the opposite
(greater than or equal to), which yields data sorted in descending
order.

Next comes a sorting function. It is parameterized with the starting
and ending field numbers and the comparison function. It builds an
array with the data and calls 'quicksort()' appropriately, and then
formats the results as a single string:

     # do_sort --- sort the data according to `compare'
     #             and return it as a string

     function do_sort(first, last, compare,      data, i, retval)
     {
         delete data
         for (i = 1; first <= last; first++) {
             data[i] = $first
             i++
         }

         quicksort(data, 1, i-1, compare)

         retval = data[1]
         for (i = 2; i in data; i++)
             retval = retval " " data[i]

         return retval
     }

Finally, the two sorting functions call 'do_sort()', passing in the
names of the two comparison functions:

     # sort --- sort the data in ascending order and return it as a string

     function sort(first, last)
     {
         return do_sort(first, last, "num_lt")
     }

     # rsort --- sort the data in descending order and return it as a string

     function rsort(first, last)
     {
         return do_sort(first, last, "num_ge")
     }

Here is an extended version of the data file:

     Biology_101 sum average sort rsort data: 87.0 92.4 78.5 94.9
     Chemistry_305 sum average sort rsort data: 75.2 98.3 94.7 88.2
     English_401 sum average sort rsort data: 100.0 95.6 87.1 93.4

Finally, here are the results when the enhanced program is run:

     $ gawk -f quicksort.awk -f indirectcall.awk class_data2
     -| Biology 101:
     -|     sum: <352.8>
     -|     average: <88.2>
     -|     sort: <78.5 87.0 92.4 94.9>
     -|     rsort: <94.9 92.4 87.0 78.5>
     -|
     -| Chemistry 305:
     -|     sum: <356.4>
     -|     average: <89.1>
     -|     sort: <75.2 88.2 94.7 98.3>
     -|     rsort: <98.3 94.7 88.2 75.2>
     -|
     -| English 401:
     -|     sum: <376.1>
     -|     average: <94.025>
     -|     sort: <87.1 93.4 95.6 100.0>
     -|     rsort: <100.0 95.6 93.4 87.1>

Another example where indirect function calls are useful can be
found in processing arrays. This is described in Walking Arrays.

Remember that you must supply a leading '@' in front of an indirect
function call.

Starting with version 4.1.2 of 'gawk', indirect function calls may
also be used with built-in functions and with extension functions (see
Dynamic Extensions). There are some limitations when calling built-in
functions indirectly, as follows.

   * You cannot pass a regular expression constant to a built-in
     function through an indirect function call.(1) This applies to the
     'sub()', 'gsub()', 'gensub()', 'match()', 'split()' and
     'patsplit()' functions.

   * If calling 'sub()' or 'gsub()', you may only pass two arguments,
     since those functions are unusual in that they update their third
     argument. This means that '$0' will be updated.

'gawk' does its best to make indirect function calls efficient. For
example, in the following case:

     for (i = 1; i <= n; i++)
         @the_func()

'gawk' looks up the actual function to call only once.

---------- Footnotes ----------

(1) This may change in a future version; recheck the documentation
that comes with your version of 'gawk' to see if it has.
