gawk: Controlling Scanning

8.1.6 Using Predefined Array Scanning Orders with 'gawk'
--------------------------------------------------------

This node describes a feature that is specific to 'gawk'.

   By default, when a 'for' loop traverses an array, the order is
undefined, meaning that the 'awk' implementation determines the order in
which the array is traversed.  This order is usually based on the
internal implementation of arrays and will vary from one version of
'awk' to the next.

   Often, though, you may wish to do something simple, such as "traverse
the array by comparing the indices in ascending order," or "traverse the
array by comparing the values in descending order."  'gawk' provides two
mechanisms that give you this control:

   * Set 'PROCINFO["sorted_in"]' to one of a set of predefined values.
     We describe this now.

   * Set 'PROCINFO["sorted_in"]' to the name of a user-defined function
     to use for comparison of array elements.  This advanced feature is
     described later in ⇒Array Sorting.

   The following special values for 'PROCINFO["sorted_in"]' are
available:

'"@unsorted"'
     Array elements are processed in arbitrary order, which is the
     default 'awk' behavior.

'"@ind_str_asc"'
     Order by indices in ascending order compared as strings; this is
     the most basic sort.  (Internally, array indices are always
     strings, so with 'a[2*5] = 1' the index is '"10"' rather than
     numeric 10.)

'"@ind_num_asc"'
     Order by indices in ascending order but force them to be treated as
     numbers in the process.  Any index with a non-numeric value will
     end up positioned as if it were zero.

'"@val_type_asc"'
     Order by element values in ascending order (rather than by
     indices).  Ordering is by the type assigned to the element
     (⇒Typing and Comparison).  All numeric values come before all
     string values, which in turn come before all subarrays.  (Subarrays
     have not been described yet; ⇒Arrays of Arrays.)

'"@val_str_asc"'
     Order by element values in ascending order (rather than by
     indices).  Scalar values are compared as strings.  Subarrays, if
     present, come out last.

'"@val_num_asc"'
     Order by element values in ascending order (rather than by
     indices).  Scalar values are compared as numbers.  Subarrays, if
     present, come out last.  When numeric values are equal, the string
     values are used to provide an ordering: this guarantees consistent
     results across different versions of the C 'qsort()' function,(1)
     which 'gawk' uses internally to perform the sorting.

'"@ind_str_desc"'
     Like '"@ind_str_asc"', but the string indices are ordered from high
     to low.

'"@ind_num_desc"'
     Like '"@ind_num_asc"', but the numeric indices are ordered from
     high to low.

'"@val_type_desc"'
     Like '"@val_type_asc"', but the element values, based on type, are
     ordered from high to low.  Subarrays, if present, come out first.

'"@val_str_desc"'
     Like '"@val_str_asc"', but the element values, treated as strings,
     are ordered from high to low.  Subarrays, if present, come out
     first.

'"@val_num_desc"'
     Like '"@val_num_asc"', but the element values, treated as numbers,
     are ordered from high to low.  Subarrays, if present, come out
     first.

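   The array traversal order is determined before the 'for' loop starts
to run.  Changing 'PROCINFO["sorted_in"]' in the loop body does not
affect the loop.  The following example shows a traversal in the default
order, followed by the same traversal with 'PROCINFO["sorted_in"]' set
before the loop starts:
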
     $ gawk '
     > BEGIN {
     >    a[4] = 4
     >    a[3] = 3
     >    for (i in a)
     >        print i, a[i]
     > }'
     -| 4 4
     -| 3 3
     $ gawk '
     > BEGIN {
     >    PROCINFO["sorted_in"] = "@ind_str_asc"
     >    a[4] = 4
     >    a[3] = 3
     >    for (i in a)
     >        print i, a[i]
     > }'
     -| 3 3
     -| 4 4

   When sorting an array by element values, if a value happens to be a
subarray then it is considered to be greater than any string or numeric
value, regardless of what the subarray itself contains, and all
subarrays are treated as being equal to each other.  Their order
relative to each other is determined by their index strings.

   Here are some additional things to bear in mind about sorted array
traversal:

   * The value of 'PROCINFO["sorted_in"]' is global.  That is, it
     affects all array traversal 'for' loops.  If you need to change it
     within your own code, check whether it is already defined, save any
     existing value, and restore the previous state when you are done:

          ...
          had_sorted = ("sorted_in" in PROCINFO)
          if (had_sorted)
              save_sorted = PROCINFO["sorted_in"]
          PROCINFO["sorted_in"] = "@val_str_desc" # or whatever
          ...
          if (had_sorted)
              PROCINFO["sorted_in"] = save_sorted
          else
              delete PROCINFO["sorted_in"]

   * As already mentioned, the default array traversal order is
     represented by '"@unsorted"'.  You can also get the default
     behavior by assigning the null string to 'PROCINFO["sorted_in"]' or
     by just deleting the '"sorted_in"' element from the 'PROCINFO'
     array with the 'delete' statement.  (The 'delete' statement hasn't
     been described yet; ⇒Delete.)

   In addition, 'gawk' provides built-in functions for sorting arrays;
see ⇒Array Sorting Functions.

   ---------- Footnotes ----------

   (1) When two elements compare as equal, the C 'qsort()' function does
not guarantee that they will maintain their original relative order
after sorting.  Using the string value to provide a unique ordering when
the numeric values are equal ensures that 'gawk' behaves consistently
across different environments.