gawk: Array Intro
1
1 8.1.1 Introduction to Arrays
1 ----------------------------
1
1 Doing linear scans over an associative array is like trying to club
1 someone to death with a loaded Uzi.
1 -- _Larry Wall_
1
1 The 'awk' language provides one-dimensional arrays for storing groups
1 of related strings or numbers. Every 'awk' array must have a name.
1 Array names have the same syntax as variable names; any valid variable
1 name would also be a valid array name. But one name cannot be used in
1 both ways (as an array and as a variable) in the same 'awk' program.
1
1 Arrays in 'awk' superficially resemble arrays in other programming
1 languages, but there are fundamental differences. In 'awk', it isn't
1 necessary to specify the size of an array before starting to use it.
1 Additionally, any number or string, not just consecutive integers, may
1 be used as an array index.
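
As a minimal sketch of these two points, the following stores elements without declaring the array or its size, using a mix of index types. The shell function name 'any_index_demo' and the array name 'data' are hypothetical and exist only for this illustration; the function is just a wrapper so the 'awk' program can be run from a POSIX shell.

```shell
# Hypothetical wrapper; the awk program inside is the actual example.
any_index_demo() {
    awk 'BEGIN {
        # No declaration or size is needed; assignment creates each element.
        data[0]      = 8
        data["name"] = "foo"     # a string index
        data[-2.5]   = 30        # a negative, non-integer index

        print data[0]
        print data["name"]
        print data[-2.5]
    }'
}
any_index_demo
```

Run as written, this should print '8', 'foo', and '30' on separate lines.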
1
1 In most other languages, arrays must be "declared" before use,
1 including a specification of how many elements or components they
1 contain. In such languages, the declaration causes a contiguous block
1 of memory to be allocated for that many elements. Usually, an index in
1 the array must be a nonnegative integer. For example, the index zero
1 specifies the first element in the array, which is actually stored at
1 the beginning of the block of memory. Index one specifies the second
1 element, which is stored in memory right after the first element, and so
1 on. It is impossible to add more elements to the array, because it has
1 room only for as many elements as given in the declaration. (Some
1 languages allow arbitrary starting and ending indices--e.g., '15 ..
1 27'--but the size of the array is still fixed when the array is
1 declared.)
1
A contiguous array of four elements might look like Figure 8.1,
conceptually, if the element values are eight, '"foo"', '""', and 30.
1
+---------+---------+--------+---------+
|    8    |  "foo"  |   ""   |   30    |    Value
+---------+---------+--------+---------+
     0         1        2         3         Index
1
1 Figure 8.1: A contiguous array
1
1 Only the values are stored; the indices are implicit from the order of
1 the values. Here, eight is the value at index zero, because eight
1 appears in the position with zero elements before it.
1
1 Arrays in 'awk' are different--they are "associative". This means
1 that each array is a collection of pairs--an index and its corresponding
1 array element value:
1
Index   Value
------------------------
'3'     '30'
'1'     '"foo"'
'0'     '8'
'2'     '""'
1
1 The pairs are shown in jumbled order because their order is
1 irrelevant.(1)
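
The pairs in the table can be stored in that same jumbled order, as in the sketch below. It assumes a POSIX 'awk'; the names 'pairs_demo' and 'arr' are hypothetical. Because the order in which 'for (i in arr)' visits the pairs is unspecified, the example only counts them rather than printing them in sequence.

```shell
# Hypothetical wrapper; the awk program inside is the actual example.
pairs_demo() {
    awk 'BEGIN {
        # Store the pairs in the same jumbled order as the table.
        arr[3] = 30
        arr[1] = "foo"
        arr[0] = 8
        arr[2] = ""

        # Lookup is by index, so the order of assignment does not matter.
        print arr[0]

        # Count the pairs; the traversal order of for-in is unspecified.
        n = 0
        for (i in arr)
            n++
        print n
    }'
}
pairs_demo
```

Run as written, this should print '8' and then '4'.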
1
One advantage of associative arrays is that new pairs can be added at
any time. For example, suppose an element with index 10 is added to the
array, with the value '"number ten"'. The result is:
1
Index   Value
-------------------------------
'10'    '"number ten"'
'3'     '30'
'1'     '"foo"'
'0'     '8'
'2'     '""'
1
1 Now the array is "sparse", which just means some indices are missing.
1 It has elements 0-3 and 10, but doesn't have elements 4, 5, 6, 7, 8, or
1 9.
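
One way to see the missing indices is to test each candidate with the 'in' operator, which checks whether an index exists without creating an element. The sketch below assumes a POSIX 'awk'; the names 'sparse_demo', 'arr', 'present', and 'sep' are hypothetical.

```shell
# Hypothetical wrapper; the awk program inside is the actual example.
sparse_demo() {
    awk 'BEGIN {
        arr[0] = 8; arr[1] = "foo"; arr[2] = ""; arr[3] = 30
        arr[10] = "number ten"

        # Collect the indices between 0 and 10 that actually exist.
        # "i in arr" tests for existence and does not create arr[i].
        present = ""
        sep = ""
        for (i = 0; i <= 10; i++) {
            if (i in arr) {
                present = present sep i
                sep = " "
            }
        }
        print present
    }'
}
sparse_demo
```

Run as written, this should print '0 1 2 3 10'; indices 4 through 9 are absent.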
1
1 Another consequence of associative arrays is that the indices don't
1 have to be nonnegative integers. Any number, or even a string, can be
1 an index. For example, the following is an array that translates words
1 from English to French:
1
Index     Value
------------------------
'"dog"'   '"chien"'
'"cat"'   '"chat"'
'"one"'   '"un"'
'1'       '"un"'
1
Here we decided to translate the number one in both spelled-out and
numeric form--thus illustrating that a single array can have both
numbers and strings as indices. (In fact, array subscripts are always
strings. There are some subtleties to how numbers work when used as
array subscripts; this is discussed in more detail in ⇒Numeric Array
Subscripts.) Here, the number '1' isn't double-quoted, because 'awk'
automatically converts it to a string.
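
The translation table might be built as in the following sketch, which also checks that the numeric subscript '1' and the string subscript '"1"' refer to the same element. It assumes a POSIX 'awk'; the names 'translate_demo' and 'french' are hypothetical.

```shell
# Hypothetical wrapper; the awk program inside is the actual example.
translate_demo() {
    awk 'BEGIN {
        french["dog"] = "chien"
        french["cat"] = "chat"
        french["one"] = "un"
        french[1]     = "un"      # the number 1 is converted to the string "1"

        print french["cat"]

        # Because subscripts are strings, "1" finds the element stored as 1.
        if ("1" in french)
            print "same element"
        else
            print "different element"
        print french["1"]
    }'
}
translate_demo
```

Run as written, this should print 'chat', 'same element', and 'un'.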
1
1 The value of 'IGNORECASE' has no effect upon array subscripting. The
1 identical string value used to store an array element must be used to
1 retrieve it. When 'awk' creates an array (e.g., with the 'split()'
1 built-in function), that array's indices are consecutive integers
1 starting at one. (⇒String Functions.)
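
Both points can be checked with a short sketch. The names 'split_demo', 'parts', and 'seen' are hypothetical. 'IGNORECASE' is specific to 'gawk'; in other 'awk' implementations it is an ordinary variable, so the result should be the same either way.

```shell
# Hypothetical wrapper; the awk program inside is the actual example.
split_demo() {
    awk 'BEGIN {
        # split() fills the array with indices 1, 2, 3, ... and returns the count.
        n = split("alpha beta gamma", parts)
        print n
        print parts[1]
        if (0 in parts)
            print "has index 0"
        else
            print "no index 0"

        # Subscript lookup stays case-sensitive even with IGNORECASE set.
        IGNORECASE = 1
        seen["Dog"] = 1
        if ("dog" in seen)
            print "found"
        else
            print "not found"
    }'
}
split_demo
```

Run as written, this should print '3', 'alpha', 'no index 0', and 'not found'.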
1
1 'awk''s arrays are efficient--the time to access an element is
1 independent of the number of elements in the array.
1
1 ---------- Footnotes ----------
1
1 (1) The ordering will vary among 'awk' implementations, which
1 typically use hash tables to store array elements and values.
1