gawk: Scanning an Array

8.1.5 Scanning All Elements of an Array
---------------------------------------

In programs that use arrays, it is often necessary to use a loop that
executes once for each element of an array. In other languages, where
arrays are contiguous and indices are limited to nonnegative integers,
this is easy: all the valid indices can be found by counting from the
lowest index up to the highest. This technique won't do the job in
'awk', because any number or string can be an array index. So 'awk' has
a special kind of 'for' statement for scanning an array:

     for (VAR in ARRAY)
         BODY

This loop executes BODY once for each index in ARRAY that the program
has previously used, with the variable VAR set to that index.
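As a minimal sketch, assuming a POSIX 'awk' and a POSIX shell, the
hypothetical shell function 'count_elements' below stores every input
line as an array index and then uses 'for (k in seen)' to count the
distinct indices. Counting does not depend on the order in which the
loop visits the elements, so the result is the same in every 'awk'.

```sh
# count_elements: hypothetical helper (not part of gawk). Reads lines on
# stdin, uses each line as an array index, and prints how many distinct
# indices were stored.
count_elements() {
    awk '
    { seen[$0] = 1 }            # any string, such as "10" or "x y", is a valid index
    END {
        n = 0
        for (k in seen)         # visits each stored index exactly once
            n++
        print n
    }'
}
```

Piping the four lines 'a', 'b', 'a' and '10' into it prints 3, because
the repeated 'a' reuses an element that already exists.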

The following program uses this form of the 'for' statement. The
first rule scans the input records and notes which words appear (at
least once) in the input by storing a 1 in the array 'used' with the
word as the index. The second rule scans the elements of 'used' to find
all the distinct words that appear in the input. It prints each word
that is more than 10 characters long and also prints the number of such
words. *Note String Functions:: for more information on the built-in
function 'length()'.

     # Record a 1 for each word that is used at least once
     {
         for (i = 1; i <= NF; i++)
             used[$i] = 1
     }

     # Find number of distinct words more than 10 characters long
     END {
         for (x in used) {
             if (length(x) > 10) {
                 ++num_long_words
                 print x
             }
         }
         # "+ 0" prints 0, not an empty string, when no word qualifies
         print num_long_words + 0, "words longer than 10 characters"
     }

*Note Word Sorting:: for a more detailed example of this type.
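The following sketch is a hypothetical variation on the program above
('word_counts' is not part of 'gawk'), assuming a POSIX 'awk' and the
standard 'sort' utility. It increments 'count[$i]' instead of storing a
1, so the 'END' rule can report how often each distinct word occurred.
Because 'for (w in count)' visits the words in an unspecified order, the
output is piped through 'sort' to make it predictable.

```sh
# word_counts: hypothetical helper. Counts how often each word occurs on
# stdin and prints "word count" lines in a stable, sorted order.
word_counts() {
    awk '
    {
        for (i = 1; i <= NF; i++)
            count[$i]++          # increment instead of storing a 1
    }
    END {
        for (w in count)         # visiting order is unspecified
            print w, count[w]
    }' | LC_ALL=C sort           # sort for a byte-wise, repeatable listing
}
```

For the input lines 'b a b' and 'c b', it prints 'a 1', 'b 3' and
'c 1', one per line.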

The order in which elements of the array are accessed by this
statement is determined by the internal arrangement of the array
elements within 'awk', and in standard 'awk' it cannot be controlled or
changed. This can lead to problems if new elements are added to ARRAY
by statements in the loop body; it is not predictable whether the 'for'
loop will reach them. Similarly, changing VAR inside the loop may
produce strange results. It is best to avoid such things.
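One way to avoid adding elements while a 'for (VAR in ARRAY)' loop is
running is to copy the indices into a separate, integer-indexed array
first, and then modify the original array while looping over that copy
with an ordinary counting 'for' loop. The sketch below assumes a POSIX
'awk'; the function name 'add_suffixed_copies' and the '_copy' suffix
are hypothetical.

```sh
# add_suffixed_copies: hypothetical helper. For each line on stdin, stores
# the line as an index, then adds a second index with "_copy" appended,
# without adding elements to the array during a for-in loop over it.
add_suffixed_copies() {
    awk '
    { a[$0] = 1 }
    END {
        n = 0
        for (k in a)              # pass 1: only read the array
            keys[++n] = k
        for (i = 1; i <= n; i++)  # pass 2: modify it, looping over the snapshot
            a[keys[i] "_copy"] = 1
        for (k in a)
            print k
    }' | LC_ALL=C sort
}
```

With the input lines 'x' and 'y', every 'awk' ends up with the same
four indices: 'x', 'x_copy', 'y' and 'y_copy'.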

As a point of information, 'gawk' sets up the list of elements to be
iterated over before the loop starts, and does not change it. But not
all 'awk' versions do so. Consider this program, named 'loopcheck.awk':

     BEGIN {
         a["here"] = "here"
         a["is"] = "is"
         a["a"] = "a"
         a["loop"] = "loop"
         for (i in a) {
             j++
             a[j] = j
             print i
         }
     }

Here is what happens when it is run with 'gawk' (and 'mawk'):

     $ gawk -f loopcheck.awk
     -| here
     -| loop
     -| a
     -| is

Contrast this with BWK 'awk':

     $ nawk -f loopcheck.awk
     -| loop
     -| here
     -| is
     -| a
     -| 1

The final '1' is the element 'a[1]', added by the loop body, which BWK
'awk' then visited; 'gawk' did not.
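If a program must behave the same way in every 'awk', the loop body can
put new elements into a second array and merge them into 'a' only after
the loop ends. The following is a sketch of that idea applied to
'loopcheck.awk', assuming a POSIX 'awk'; the function name
'loopcheck_portable' and the array name 'b' are hypothetical. The output
is still sorted, because the visiting order differs between
implementations.

```sh
# loopcheck_portable: hypothetical variant of loopcheck.awk. New elements
# go into a second array, so the loop over a sees only the original indices.
loopcheck_portable() {
    awk 'BEGIN {
        a["here"] = "here"
        a["is"] = "is"
        a["a"] = "a"
        a["loop"] = "loop"
        for (i in a) {
            j++
            b[j] = j             # do not add to a while scanning it
            print i
        }
        for (k in b)             # merge the new elements after the loop
            a[k] = b[k]
    }' | LC_ALL=C sort
}
```

Whichever 'awk' runs it, the function prints exactly the four lines 'a',
'here', 'is' and 'loop', and 'a' ends up with eight elements.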