gawk: Controlling Scanning

8.1.6 Using Predefined Array Scanning Orders with 'gawk'
--------------------------------------------------------

This node describes a feature that is specific to 'gawk'.

By default, when a 'for' loop traverses an array, the order is
undefined, meaning that the 'awk' implementation determines the order in
which the array is traversed. This order is usually based on the
internal implementation of arrays and will vary from one version of
'awk' to the next.

Often, though, you may wish to do something simple, such as "traverse
the array by comparing the indices in ascending order," or "traverse the
array by comparing the values in descending order." 'gawk' provides two
mechanisms that give you this control:

* Set 'PROCINFO["sorted_in"]' to one of a set of predefined values.
  We describe this now.

* Set 'PROCINFO["sorted_in"]' to the name of a user-defined function
  to use for comparison of array elements. This advanced feature is
  described later in ⇒Array Sorting.

The following special values for 'PROCINFO["sorted_in"]' are
available:

'"@unsorted"'
     Array elements are processed in arbitrary order, which is the
     default 'awk' behavior.

'"@ind_str_asc"'
     Order by indices in ascending order compared as strings; this is
     the most basic sort. (Internally, array indices are always
     strings, so with 'a[2*5] = 1' the index is '"10"' rather than
     numeric 10.)

'"@ind_num_asc"'
     Order by indices in ascending order but force them to be treated as
     numbers in the process. Any index with a non-numeric value will
     end up positioned as if it were zero.

'"@val_type_asc"'
     Order by element values in ascending order (rather than by
     indices). Ordering is by the type assigned to the element
     (⇒Typing and Comparison). All numeric values come before all
     string values, which in turn come before all subarrays. (Subarrays
     have not been described yet; ⇒Arrays of Arrays.)

'"@val_str_asc"'
     Order by element values in ascending order (rather than by
     indices). Scalar values are compared as strings. Subarrays, if
     present, come out last.

'"@val_num_asc"'
     Order by element values in ascending order (rather than by
     indices). Scalar values are compared as numbers. Subarrays, if
     present, come out last. When numeric values are equal, the string
     values are used to provide an ordering: this guarantees consistent
     results across different versions of the C 'qsort()' function,(1)
     which 'gawk' uses internally to perform the sorting.

'"@ind_str_desc"'
     Like '"@ind_str_asc"', but the string indices are ordered from high
     to low.

'"@ind_num_desc"'
     Like '"@ind_num_asc"', but the numeric indices are ordered from
     high to low.

'"@val_type_desc"'
     Like '"@val_type_asc"', but the element values, based on type, are
     ordered from high to low. Subarrays, if present, come out first.

'"@val_str_desc"'
     Like '"@val_str_asc"', but the element values, treated as strings,
     are ordered from high to low. Subarrays, if present, come out
     first.

'"@val_num_desc"'
     Like '"@val_num_asc"', but the element values, treated as numbers,
     are ordered from high to low. Subarrays, if present, come out
     first.

The array traversal order is determined before the 'for' loop starts
to run. Changing 'PROCINFO["sorted_in"]' in the loop body does not
affect the loop that is already in progress. The following example
contrasts the default order (whatever this particular 'gawk' happens
to produce) with '"@ind_str_asc"':

     $ gawk '
     > BEGIN {
     >    a[4] = 4
     >    a[3] = 3
     >    for (i in a)
     >        print i, a[i]
     > }'
     -| 4 4
     -| 3 3
     $ gawk '
     > BEGIN {
     >    PROCINFO["sorted_in"] = "@ind_str_asc"
     >    a[4] = 4
     >    a[3] = 3
     >    for (i in a)
     >        print i, a[i]
     > }'
     -| 3 3
     -| 4 4

When sorting an array by element values, if a value happens to be a
subarray then it is considered to be greater than any string or numeric
value, regardless of what the subarray itself contains, and all
subarrays are treated as being equal to each other. Their order
relative to each other is determined by their index strings.

Here are some additional things to bear in mind about sorted array
traversal:

* The value of 'PROCINFO["sorted_in"]' is global. That is, it
  affects all array traversal 'for' loops. If you need to change it
  within your own code, you should check whether it is defined, then
  save and restore the value:

       ...
       if ("sorted_in" in PROCINFO)
           save_sorted = PROCINFO["sorted_in"]

       # Set the new order whether or not an old value existed
       PROCINFO["sorted_in"] = "@val_str_desc" # or whatever
       ...
       if (save_sorted)
           PROCINFO["sorted_in"] = save_sorted
       else
           delete PROCINFO["sorted_in"]   # nothing saved: back to default

* As already mentioned, the default array traversal order is
  represented by '"@unsorted"'. You can also get the default
  behavior by assigning the null string to 'PROCINFO["sorted_in"]' or
  by just deleting the '"sorted_in"' element from the 'PROCINFO'
  array with the 'delete' statement. (The 'delete' statement hasn't
  been described yet; ⇒Delete.)

In addition, 'gawk' provides built-in functions for sorting arrays;
see ⇒Array Sorting Functions.

---------- Footnotes ----------

(1) When two elements compare as equal, the C 'qsort()' function does
not guarantee that they will maintain their original relative order
after sorting. Using the string value to provide a unique ordering when
the numeric values are equal ensures that 'gawk' behaves consistently
across different environments.