gawk: Array Intro


8.1.1 Introduction to Arrays
----------------------------

     Doing linear scans over an associative array is like trying to club
     someone to death with a loaded Uzi.
                            -- _Larry Wall_

   The 'awk' language provides one-dimensional arrays for storing groups
of related strings or numbers.  Every 'awk' array must have a name.
Array names have the same syntax as variable names; any valid variable
name would also be a valid array name.  But one name cannot be used in
both ways (as an array and as a variable) in the same 'awk' program.
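
   As a minimal sketch (assuming 'gawk', or another 'awk', run from a
POSIX shell; the names 'count' and 'total' are purely illustrative),
the first command below uses 'count' only as an array and 'total' only
as a scalar variable.  The second command uses 'total' both ways, which
'gawk' rejects with a fatal error.

```shell
# 'count' is used only as an array, 'total' only as a scalar.
awk 'BEGIN {
    count["apples"] = 3      # array element, created on first assignment
    count["pears"]  = 5
    total = count["apples"] + count["pears"]
    print total              # prints 8
}'

# Using one name as both a scalar and an array is an error:
# gawk reports it and exits with a nonzero status.
awk 'BEGIN { total = 1; total[1] = 2 }' ||
    echo "error: total cannot be both a scalar and an array"
```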

   Arrays in 'awk' superficially resemble arrays in other programming
languages, but there are fundamental differences.  In 'awk', it isn't
necessary to specify the size of an array before starting to use it.
Additionally, any number or string, not just consecutive integers, may
be used as an array index.
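
   A short sketch of both points: no declaration or size is given, and
the indices below are an integer, a string, a non-integer number, and a
negative number.  The array name 'a' is only illustrative.

```shell
# Elements come into existence when first assigned; the index
# may be any number or string.
awk 'BEGIN {
    a[1]     = "one"
    a["two"] = 2
    a[3.5]   = "three and a half"
    a[-7]    = "negative"
    print a[1], a["two"], a[3.5], a[-7]
}'
```

This prints 'one 2 three and a half negative'.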

   In most other languages, arrays must be "declared" before use,
including a specification of how many elements or components they
contain.  In such languages, the declaration causes a contiguous block
of memory to be allocated for that many elements.  Usually, an index in
the array must be a nonnegative integer.  For example, the index zero
specifies the first element in the array, which is actually stored at
the beginning of the block of memory.  Index one specifies the second
element, which is stored in memory right after the first element, and so
on.  It is impossible to add more elements to the array, because it has
room only for as many elements as given in the declaration.  (Some
languages allow arbitrary starting and ending indices--e.g.,
'15 .. 27'--but the size of the array is still fixed when the array is
declared.)

   A contiguous array of four elements might look like Figure 8.1,
conceptually, if the element values are eight, '"foo"', '""', and 30.

+---------+---------+--------+---------+
|    8    |  "foo"  |   ""   |    30   |    Value
+---------+---------+--------+---------+
     0         1         2         3        Index

Figure 8.1: A contiguous array

Only the values are stored; the indices are implicit from the order of
the values.  Here, eight is the value at index zero, because eight
appears in the position with zero elements before it.

   Arrays in 'awk' are different--they are "associative".  This means
that each array is a collection of pairs--an index and its corresponding
array element value:

        Index   Value
------------------------
        '3'     '30'
        '1'     '"foo"'
        '0'     '8'
        '2'     '""'

The pairs are shown in jumbled order because their order is
irrelevant.(1)
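
   To see that only the index matters, this sketch stores the four pairs
in the same jumbled order as the table and reads them back by index.
Because a 'for (k in a)' loop visits indices in an unspecified order,
the sketch loops over the known indices 0 through 3 instead.

```shell
# Store the pairs in "jumbled" order; retrieval is by index,
# so the order of insertion does not matter.
awk 'BEGIN {
    a[3] = 30; a[1] = "foo"; a[0] = 8; a[2] = ""
    # Loop over the known indices to get a predictable order.
    for (i = 0; i <= 3; i++)
        printf "%d: [%s]\n", i, a[i]
}'
```

The brackets make the empty string at index 2 visible in the output:
'0: [8]', '1: [foo]', '2: []', and '3: [30]', one per line.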

   One advantage of associative arrays is that new pairs can be added at
any time.  For example, suppose an element with index 10 and value
'"number ten"' is added to the array.  The result is:

        Index   Value
-------------------------------
        '10'    '"number ten"'
        '3'     '30'
        '1'     '"foo"'
        '0'     '8'
        '2'     '""'

Now the array is "sparse", which just means some indices are missing.
It has elements 0-3 and 10, but doesn't have elements 4, 5, 6, 7, 8, or
9.
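
   The sparse array can be sketched the same way.  Counting the elements
with a 'for (k in a)' loop shows that only five are present, and the
'in' operator tests whether a given index exists.

```shell
# Add index 10 to the four-element array; indices 4-9 stay absent.
awk 'BEGIN {
    a[0] = 8; a[1] = "foo"; a[2] = ""; a[3] = 30
    a[10] = "number ten"
    n = 0
    for (k in a)            # visits only the indices that exist
        n++
    has10 = (10 in a)       # 1: index 10 is present
    has5  = (5 in a)        # 0: "in" tests without creating the element
    print n, has10, has5
}'
```

This prints '5 1 0'.  Testing with an expression such as 'a[5] == ""'
instead would create an empty element at index 5 as a side effect.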

   Another consequence of associative arrays is that the indices don't
have to be nonnegative integers.  Any number, or even a string, can be
an index.  For example, the following is an array that translates words
from English to French:

        Index   Value
------------------------
        '"dog"' '"chien"'
        '"cat"' '"chat"'
        '"one"' '"un"'
        '1'     '"un"'

Here we decided to translate the number one in both spelled-out and
numeric form--thus illustrating that a single array can have both
numbers and strings as indices.  (In fact, array subscripts are always
strings.  There are some subtleties to how numbers work when used as
array subscripts; this is discussed in more detail in *note Numeric
Array Subscripts::.)  Here, the number '1' isn't double-quoted, because
'awk' automatically converts it to a string.
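
   The translation table can be built directly.  This sketch (the array
name 'fr' is only illustrative) also shows that the numeric subscript
'1' and the string subscript '"1"' name the same element, because the
number is converted to a string.

```shell
awk 'BEGIN {
    fr["dog"] = "chien"
    fr["cat"] = "chat"
    fr["one"] = "un"
    fr[1]     = "un"             # numeric subscript 1 becomes the string "1"
    print fr["dog"], fr["cat"]   # chien chat
    fr["1"] = "un (again)"       # same element as fr[1], so this overwrites it
    print fr[1]                  # un (again)
    n = 0
    for (k in fr)
        n++
    print n                      # 4 elements, not 5
}'
```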

   The value of 'IGNORECASE' has no effect upon array subscripting.  The
identical string value used to store an array element must be used to
retrieve it.  When 'awk' creates an array (e.g., with the 'split()'
built-in function), that array's indices are consecutive integers
starting at one.  (*Note String Functions::.)
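
   Both points can be checked with a short sketch.  'IGNORECASE' is a
'gawk' extension; other 'awk' implementations treat it as an ordinary
variable, so the output is the same with either.  The names 'fr' and
'parts' are only illustrative.

```shell
awk 'BEGIN {
    IGNORECASE = 1
    fr["Dog"] = "chien"
    found = ("dog" in fr)        # 0: subscripts must match exactly
    print found

    n = split("alpha beta gamma", parts)   # split() numbers from 1
    first = (1 in parts)         # 1: the first piece is parts[1]
    zero  = (0 in parts)         # 0: there is no parts[0]
    print n, parts[1], parts[n], first, zero
}'
```

This prints '0' on the first line and '3 alpha gamma 1 0' on the second.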

   'awk''s arrays are efficient--the time to access an element is, on
average, independent of the number of elements in the array.
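
   This is also the point of the epigraph at the top of this section: to
find out whether a value is present, make it an index and test for it
with 'in', rather than scanning the elements one by one.  In this
sketch, 'colors' and 'known' are illustrative names.

```shell
# Build an index once, then test membership directly with "in"
# instead of scanning a list element by element.
awk 'BEGIN {
    n = split("red green blue", colors)
    for (i = 1; i <= n; i++)
        known[colors[i]] = 1     # the color name itself becomes the index
    r1 = ("green" in known)      # 1
    r2 = ("purple" in known)     # 0
    print r1, r2
}'
```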

   ---------- Footnotes ----------

   (1) The ordering will vary among 'awk' implementations, which
typically use hash tables to store array elements and values.