gawk: Indirect Calls


9.3 Indirect Function Calls
===========================

This section describes an advanced, 'gawk'-specific extension.

   Often, you may wish to defer the choice of function to call until
runtime.  For example, you may have different kinds of records, each of
which should be processed differently.

   Normally, you would have to use a series of 'if'-'else' statements to
decide which function to call.  By using "indirect" function calls, you
can store the name of the function to call in a string variable and
then call the function through that variable.  Let's look at an example.

   Suppose you have a file with your test scores for the classes you are
taking, and you wish to get the sum and the average of your test scores.
The first field is the class name.  The following fields, up to a
"marker" field 'data:', are the names of the functions to call to
process the data.  Following the marker, to the end of the record, are
the various numeric test scores.

   Here is the initial file:

     Biology_101 sum average data: 87.0 92.4 78.5 94.9
     Chemistry_305 sum average data: 75.2 98.3 94.7 88.2
     English_401 sum average data: 100.0 95.6 87.1 93.4

   To process the data, you might write initially:

     {
         class = $1
         for (i = 2; $i != "data:"; i++) {
             if ($i == "sum")
                 sum()   # processes the whole record
             else if ($i == "average")
                 average()
             # ... and so on, one test per function name
         }
     }

This style of programming works, but can be awkward.  With "indirect"
function calls, you tell 'gawk' to use the _value_ of a variable as the
_name_ of the function to call.

   The syntax is similar to that of a regular function call: an
identifier immediately followed by an opening parenthesis, any
arguments, and then a closing parenthesis, with the addition of a
leading '@' character:

     the_func = "sum"
     result = @the_func()   # calls the sum() function

   Here is a full program that processes the previously shown data,
using indirect function calls:

     # indirectcall.awk --- Demonstrate indirect function calls

     # average --- return the average of the values in fields $first - $last

     function average(first, last,   total, i)
     {
         total = 0
         for (i = first; i <= last; i++)
             total += $i

         return total / (last - first + 1)
     }

     # sum --- return the sum of the values in fields $first - $last

     function sum(first, last,   ret, i)
     {
         ret = 0
         for (i = first; i <= last; i++)
             ret += $i

         return ret
     }

   These two functions expect to work on fields; thus, the parameters
'first' and 'last' give the numbers of the first and last fields to
process.  Otherwise, they perform the expected computations and are not
unusual.  Next comes the rule that processes each record:

     # For each record, print the class name and the requested statistics
     {
         class_name = $1
         gsub(/_/, " ", class_name)  # Replace _ with spaces

         # find start
         for (i = 1; i <= NF; i++) {
             if ($i == "data:") {
                 start = i + 1
                 break
             }
         }

         printf("%s:\n", class_name)
         for (i = 2; $i != "data:"; i++) {
             the_function = $i
             printf("\t%s: <%s>\n", $i, @the_function(start, NF) "")
         }
         print ""
     }

   This is the main processing for each record.  It prints the class
name (with underscores replaced with spaces).  It then finds the start
of the actual data, saving it in 'start'.  The last part of the code
loops through each function name (from '$2' up to the marker, 'data:'),
calling the function named by the field.  The indirect function call
itself occurs as an argument in the call to 'printf'.  (The 'printf'
format string uses '%s' as the format specifier so that we can use
functions that return strings, as well as numbers.  Note that the result
from the indirect call is concatenated with the empty string, in order
to force it to be a string value.)

   Here is the result of running the program:

     $ gawk -f indirectcall.awk class_data1
     -| Biology 101:
     -|     sum: <352.8>
     -|     average: <88.2>
     -|
     -| Chemistry 305:
     -|     sum: <356.4>
     -|     average: <89.1>
     -|
     -| English 401:
     -|     sum: <376.1>
     -|     average: <94.025>

   The ability to use indirect function calls is more powerful than you
may think at first.  The C and C++ languages provide "function
pointers," which are a mechanism for calling a function chosen at
runtime.  One of the best-known uses of this ability is the C 'qsort()'
function, which sorts an array and was traditionally implemented with
the famous "quicksort" algorithm (see the Wikipedia article
(https://en.wikipedia.org/wiki/Quicksort) for more information),
although the C standard does not require any particular algorithm.  To
use this function, you supply a pointer to a comparison function.  This
mechanism allows you to sort arbitrary data in an arbitrary fashion.

   We can do something similar using 'gawk', like this:

     # quicksort.awk --- Quicksort algorithm, with user-supplied
     #                   comparison function

     # quicksort --- C.A.R. Hoare's quicksort algorithm. See Wikipedia
     #               or almost any algorithms or computer science text.

     function quicksort(data, left, right, less_than,    i, last)
     {
         if (left >= right)  # do nothing if array contains fewer
             return          # than two elements

         quicksort_swap(data, left, int((left + right) / 2))
         last = left
         for (i = left + 1; i <= right; i++)
             if (@less_than(data[i], data[left]))
                 quicksort_swap(data, ++last, i)
         quicksort_swap(data, left, last)
         quicksort(data, left, last - 1, less_than)
         quicksort(data, last + 1, right, less_than)
     }

     # quicksort_swap --- helper function for quicksort, should really be inline

     function quicksort_swap(data, i, j,      temp)
     {
         temp = data[i]
         data[i] = data[j]
         data[j] = temp
     }

   The 'quicksort()' function receives the 'data' array, the starting
and ending indices to sort ('left' and 'right'), and the name of a
function that performs a "less than" comparison.  It then implements the
quicksort algorithm.

   To make use of the sorting function, we return to our previous
example.  The first thing to do is write some comparison functions:

     # num_lt --- do a numeric less than comparison

     function num_lt(left, right)
     {
         return ((left + 0) < (right + 0))
     }

     # num_ge --- do a numeric greater than or equal to comparison

     function num_ge(left, right)
     {
         return ((left + 0) >= (right + 0))
     }

   The 'num_ge()' function is needed to perform a descending sort; when
used to perform a "less than" test, it actually does the opposite
(greater than or equal to), which yields data sorted in descending
order.

   Next comes a sorting function.  It is parameterized with the starting
and ending field numbers and the comparison function.  It builds an
array with the data and calls 'quicksort()' appropriately, and then
formats the results as a single string:

     # do_sort --- sort the data according to 'compare'
     #             and return it as a string

     function do_sort(first, last, compare,      data, i, retval)
     {
         delete data
         for (i = 1; first <= last; first++) {
             data[i] = $first
             i++
         }

         quicksort(data, 1, i-1, compare)

         retval = data[1]
         for (i = 2; i in data; i++)
             retval = retval " " data[i]

         return retval
     }

   Finally, the two sorting functions call 'do_sort()', passing in the
names of the two comparison functions:

     # sort --- sort the data in ascending order and return it as a string

     function sort(first, last)
     {
         return do_sort(first, last, "num_lt")
     }

     # rsort --- sort the data in descending order and return it as a string

     function rsort(first, last)
     {
         return do_sort(first, last, "num_ge")
     }

   Here is an extended version of the data file:

     Biology_101 sum average sort rsort data: 87.0 92.4 78.5 94.9
     Chemistry_305 sum average sort rsort data: 75.2 98.3 94.7 88.2
     English_401 sum average sort rsort data: 100.0 95.6 87.1 93.4

   Here are the results when the enhanced program is run:

     $ gawk -f quicksort.awk -f indirectcall.awk class_data2
     -| Biology 101:
     -|     sum: <352.8>
     -|     average: <88.2>
     -|     sort: <78.5 87.0 92.4 94.9>
     -|     rsort: <94.9 92.4 87.0 78.5>
     -|
     -| Chemistry 305:
     -|     sum: <356.4>
     -|     average: <89.1>
     -|     sort: <75.2 88.2 94.7 98.3>
     -|     rsort: <98.3 94.7 88.2 75.2>
     -|
     -| English 401:
     -|     sum: <376.1>
     -|     average: <94.025>
     -|     sort: <87.1 93.4 95.6 100.0>
     -|     rsort: <100.0 95.6 93.4 87.1>

   Another example where indirect function calls are useful can be
found in processing arrays.  This is described in ⇒Walking Arrays.

   Remember that you must supply a leading '@' in front of an indirect
function call.

   Starting with version 4.1.2 of 'gawk', indirect function calls may
also be used with built-in functions and with extension functions
(⇒Dynamic Extensions).  There are some limitations when calling
built-in functions indirectly, as follows.

   * You cannot pass a regular expression constant to a built-in
     function through an indirect function call.(1)  This applies to the
     'sub()', 'gsub()', 'gensub()', 'match()', 'split()', and
     'patsplit()' functions.

   * If calling 'sub()' or 'gsub()', you may only pass two arguments,
     since those functions are unusual in that they update their third
     argument.  This means that the substitution is always applied to
     '$0', which is updated in place.

   'gawk' does its best to make indirect function calls efficient.  For
example, in the following case:

     for (i = 1; i <= n; i++)
         @the_func()

'gawk' looks up the actual function to call only once.

   ---------- Footnotes ----------

   (1) This may change in a future version; recheck the documentation
that comes with your version of 'gawk' to see if it has.
