gawk: Scanning an Array

8.1.5 Scanning All Elements of an Array
---------------------------------------

In programs that use arrays, it is often necessary to use a loop that
executes once for each element of an array.  In other languages, where
arrays are contiguous and indices are limited to nonnegative integers,
this is easy: all the valid indices can be found by counting from the
lowest index up to the highest.  This technique won't do the job in
'awk', because any number or string can be an array index.  So 'awk' has
a special kind of 'for' statement for scanning an array:

     for (VAR in ARRAY)
         BODY

This loop executes BODY once for each index in ARRAY that the program
has previously used, with the variable VAR set to that index.
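
   As a minimal sketch of this loop, the following shell function runs a
small 'awk' program that scans an array indexed by strings and by a
number.  The function name 'scan_demo', the array 'price', and its
contents are made up for illustration; only the 'for (VAR in ARRAY)'
form itself comes from the text above.

```shell
# Hypothetical example: scan an array whose indices are arbitrary values.
scan_demo() {
    awk 'BEGIN {
        price["apple"] = 3
        price["kiwi"] = 5
        price[42] = 7            # a numeric subscript works as well
        for (item in price) {    # item is set to each index in turn
            total += price[item]
            count++
        }
        # The visiting order is unspecified, but the totals do not depend on it.
        print count, total
    }'
}
scan_demo
```

Because the program only counts and sums, its result does not depend on
the order in which the indices are visited.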

   The following program uses this form of the 'for' statement.  The
first rule scans the input records and notes which words appear (at
least once) in the input, by storing a one into the array 'used' with
the word as the index.  The second rule scans the elements of 'used' to
find all the distinct words that appear in the input.  It prints each
word that is more than 10 characters long and also prints the number of
such words.  See String Functions for more information on the
built-in function 'length()'.

     # Record a 1 for each word that is used at least once
     {
         for (i = 1; i <= NF; i++)
             used[$i] = 1
     }

     # Find number of distinct words more than 10 characters long
     END {
         for (x in used) {
             if (length(x) > 10) {
                 ++num_long_words
                 print x
             }
         }
         # Adding 0 prints "0" rather than an empty string when no word qualifies
         print num_long_words + 0, "words longer than 10 characters"
     }

See Word Sorting for a more detailed example of this type.

   The order in which elements of the array are accessed by this
statement is determined by the internal arrangement of the array
elements within 'awk' and in standard 'awk' cannot be controlled or
changed.  This can lead to problems if new elements are added to ARRAY
by statements in the loop body; it is not predictable whether the 'for'
loop will reach them.  Similarly, changing VAR inside the loop may
produce unexpected results.  It is best to avoid both.
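
   Since standard 'awk' gives no control over the scanning order, one
common workaround is to pipe the output of the loop through an external
sorting program.  The sketch below assumes that a 'sort' utility is
available on the system; the function name 'sorted_keys' and the sample
input are hypothetical.

```shell
# Hypothetical example: print the distinct words of the input in sorted order.
sorted_keys() {
    awk '{
        for (i = 1; i <= NF; i++)
            used[$i] = 1         # remember each word once
    }
    END {
        for (x in used)
            print x | "sort"     # awk visits in any order; sort fixes the output order
        close("sort")            # wait for sort to finish writing
    }'
}
printf 'pear apple\nfig apple\n' | sorted_keys
```

The loop itself still visits the indices in an unspecified order; only
the printed result is ordered.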

   As a point of information, 'gawk' sets up the list of elements to be
iterated over before the loop starts, and does not change it.  But not
all 'awk' versions do so.  Consider this program, named 'loopcheck.awk':

     BEGIN {
         a["here"] = "here"
         a["is"] = "is"
         a["a"] = "a"
         a["loop"] = "loop"
         for (i in a) {
             j++
             a[j] = j
             print i
         }
     }

   Here is what happens when the program is run with 'gawk' (and 'mawk'):

     $ gawk -f loopcheck.awk
     -| here
     -| loop
     -| a
     -| is

   Contrast this with BWK 'awk', whose loop also reaches an element
added inside the loop body:

     $ nawk -f loopcheck.awk
     -| loop
     -| here
     -| is
     -| a
     -| 1
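
   One way to get the same result from every 'awk' version is to copy
the indices into a second array first and then loop over that copy, so
the array being modified is never the one being scanned.  This is only
a sketch; the names 'loopcheck_portable', 'keys', 'n', and 'k' are
hypothetical and do not appear in 'loopcheck.awk'.

```shell
# Hypothetical variant of loopcheck.awk that does not depend on how the
# awk implementation builds its iteration list.
loopcheck_portable() {
    awk 'BEGIN {
        a["here"] = "here"
        a["is"] = "is"
        a["a"] = "a"
        a["loop"] = "loop"
        # First pass: snapshot the indices; a is not modified here.
        n = 0
        for (i in a)
            keys[++n] = i
        # Second pass: walk the snapshot, so adding elements to a is safe.
        for (k = 1; k <= n; k++) {
            j++
            a[j] = j
            print keys[k]
        }
    }'
}
loopcheck_portable | sort
```

The four original indices are printed exactly once each.  Their order
still depends on the implementation, which is why the usage line pipes
the output through 'sort'.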