gawk: Variable Typing
1
1 6.3.2.1 String Type versus Numeric Type
1 .......................................
1
1 Scalar objects in 'awk' (variables, array elements, and fields) are
1 _dynamically_ typed. This means their type can change as the program
1 runs, from "untyped" before any use,(1) to string or number, and then
1 from string to number or number to string, as the program progresses.
1 ('gawk' also provides regexp-typed scalars, but let's ignore that for
1 now; ⇒Strong Regexp Constants.)
1
1 You can't do much with untyped variables, other than tell that they
1 are untyped. The following program tests 'a' against '""' and '0'; the
1 test succeeds when 'a' has never been assigned a value. It also uses
1 the built-in 'typeof()' function (⇒Type Functions) to show 'a''s type:
1
1 $ gawk 'BEGIN { print (a == "" && a == 0 ?
1 > "a is untyped" : "a has a type!") ; print typeof(a) }'
1 -| a is untyped
1 -| unassigned
1
1 A scalar has numeric type when assigned a numeric value, such as from
1 a numeric constant, or from another scalar with numeric type:
1
1 $ gawk 'BEGIN { a = 42 ; print typeof(a)
1 > b = a ; print typeof(b) }'
1 -| number
1 -| number
1
1 Similarly, a scalar has string type when assigned a string value,
1 such as from a string constant, or from another scalar with string type:
1
1 $ gawk 'BEGIN { a = "forty two" ; print typeof(a)
1 > b = a ; print typeof(b) }'
1 -| string
1 -| string
1
1 So far, this is all simple and straightforward. What happens,
1 though, when 'awk' has to process data from a user? Let's start with
1 field data. What should the following command produce as output?
1
1 echo hello | awk '{ printf("%s %s < 42\n", $1,
1 ($1 < 42 ? "is" : "is not")) }'
1
1 Since 'hello' is alphabetic data, 'awk' can only do a string comparison.
1 Internally, it converts '42' into '"42"' and compares the two string
1 values '"hello"' and '"42"'. Here's the result:
1
1 $ echo hello | awk '{ printf("%s %s < 42\n", $1,
1 > ($1 < 42 ? "is" : "is not")) }'
1 -| hello is not < 42
1
1 However, what happens when data from a user _looks like_ a number?
1 On the one hand, in reality, the input data consists of characters, not
1 binary numeric values. But, on the other hand, the data looks numeric,
1 and 'awk' really ought to treat it as such. And indeed, it does:
1
1 $ echo 37 | awk '{ printf("%s %s < 42\n", $1,
1 > ($1 < 42 ? "is" : "is not")) }'
1 -| 37 is < 42
1
1 Here are the rules for when 'awk' treats data as a number, and for
1 when it treats data as a string.
1
1 The POSIX standard uses the term "numeric string" for input data that
1 looks numeric. The '37' in the previous example is a numeric string.
1 So what is the type of a numeric string? Answer: numeric.
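
To see why the type of a numeric string matters, here is a minimal sketch
(not part of the original text) that compares two numeric-looking fields,
first directly and then after concatenating each with the empty string,
which yields string values. The shell variable name 'cmp_out' is only
illustrative:

```shell
# Both fields look numeric, so the first comparison is numeric:
# 10 < 9 is false. Concatenating with "" produces strings, so the
# second comparison is character by character: "1" sorts before "9".
cmp_out=$(echo '10 9' | awk '{
    print "numeric comparison:", ($1 < $2)
    print "string comparison:", (($1 "") < ($2 ""))
}')
printf '%s\n' "$cmp_out"
```

Under these assumptions the first line should report '0' and the second
'1', since only the string comparison considers '"10"' to be less than
'"9"'.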
1
1 The type of a variable is important because the types of two
1 variables determine how they are compared. Variable typing follows
1 these definitions and rules:
1
1 * A numeric constant or the result of a numeric operation has the
1 "numeric" attribute.
1
1 * A string constant or the result of a string operation has the
1 "string" attribute.
1
1 * Fields, 'getline' input, 'FILENAME', 'ARGV' elements, 'ENVIRON'
1 elements, and the elements of an array created by 'match()',
1 'split()', and 'patsplit()' that are numeric strings have the
1 "strnum" attribute.(2) Otherwise, they have the "string"
1 attribute. Uninitialized variables also have the "strnum"
1 attribute.
1
1 * Attributes propagate across assignments but are not changed by any
1 use.
1
1 The last rule is particularly important. In the following program,
1 'a' has numeric type, even though it is later used in a string
1 operation:
1
1 BEGIN {
1 a = 12.345
1 b = a " is a cute number"
1 print b
1 }
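
One way to observe that 'a' keeps its numeric attribute is to compare it
after the string operation. The following is a small sketch along those
lines; the value '10' and the names 'c' and 'attr_out' are chosen here for
illustration and do not come from the program above:

```shell
attr_out=$(awk 'BEGIN {
    a = 10                     # numeric constant: a has the numeric attribute
    b = a " is a cute number"  # using a in a string operation does not retype it
    c = a ""                   # the result of a concatenation is a string
    print (a < 9)              # numeric comparison: 10 < 9 is false
    print (c < 9)              # string comparison: "10" < "9" is true
}')
printf '%s\n' "$attr_out"
```

The first comparison should print '0' and the second '1': 'a' is still
compared as a number, while 'c', the result of a string operation, is
compared as a string.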
1
1 When two operands are compared, either string comparison or numeric
1 comparison may be used. This depends upon the attributes of the
1 operands, according to the following symmetric matrix:
1
1         +----------------------------------------------
1         |       STRING          NUMERIC         STRNUM
1 --------+----------------------------------------------
1         |
1 STRING  |       string          string          string
1         |
1 NUMERIC |       string          numeric         numeric
1         |
1 STRNUM  |       string          numeric         numeric
1 --------+----------------------------------------------
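
The matrix can be exercised one cell at a time. The sketch below is an
illustration rather than part of the original text: it feeds the two
fields '10' and '10.0' (both numeric strings) to 'awk' and compares them
with each other and with numeric and string constants. The values '10'
and '10.0' are numerically equal but differ as strings, so each line
prints '1' for a numeric comparison and '0' for a string comparison. The
name 'matrix_out' is hypothetical:

```shell
matrix_out=$(echo '10 10.0' | awk '{
    print "strnum  vs strnum :", ($1 == $2)         # numeric
    print "strnum  vs numeric:", ($1 == 10.0)       # numeric
    print "strnum  vs string :", ($1 == "10.0")     # string
    print "numeric vs numeric:", (10 == 10.0)       # numeric
    print "numeric vs string :", (10 == "10.0")     # string
    print "string  vs string :", ("10" == "10.0")   # string
}')
printf '%s\n' "$matrix_out"
```

Because the matrix is symmetric, swapping the operands of any comparison
should not change the result.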
1
1 The basic idea is that user input that looks numeric--and _only_ user
1 input--should be treated as numeric, even though it is actually made of
1 characters and is therefore also a string. Thus, for example, the
1 string constant '" +3.14"', when it appears in program source code, is a
1 string--even though it looks numeric--and is _never_ treated as a number
1 for comparison purposes.
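
As a brief sketch of this point (the variable names 's' and 'const_out'
are illustrative), the following compares a value assigned from that
string constant with the number '3.14', and then repeats the comparison
after adding zero. Adding zero is a numeric operation, so its result has
the numeric attribute:

```shell
const_out=$(awk 'BEGIN {
    s = " +3.14"            # assigned from a string constant: string attribute
    print (s == 3.14)       # string comparison: " +3.14" versus "3.14"
    print (s + 0 == 3.14)   # s + 0 is numeric, so the comparison is numeric
}')
printf '%s\n' "$const_out"
```

The first comparison should print '0' and the second '1'.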
1
1 In short, when one operand is a "pure" string, such as a string
1 constant, then a string comparison is performed. Otherwise, a numeric
1 comparison is performed. (The primary difference between a number and a
1 strnum is that for strnums 'gawk' preserves the original string value
1 that the scalar had when it came in.)
1
1 This point bears additional emphasis: Input that looks numeric _is_
1 numeric. All other input is treated as strings.
1
1 Thus, the six-character input string ' +3.14' receives the strnum
1 attribute. In contrast, the eight characters '" +3.14"' appearing in
1 program text comprise a string constant. The following examples print
1 '1' when the comparison between the input and the constant is true, and
1 '0' otherwise:
1
1 $ echo ' +3.14' | awk '{ print($0 == " +3.14") }'    True
1 -| 1
1 $ echo ' +3.14' | awk '{ print($0 == "+3.14") }'     False
1 -| 0
1 $ echo ' +3.14' | awk '{ print($0 == "3.14") }'      False
1 -| 0
1 $ echo ' +3.14' | awk '{ print($0 == 3.14) }'        True
1 -| 1
1 $ echo ' +3.14' | awk '{ print($1 == " +3.14") }'    False
1 -| 0
1 $ echo ' +3.14' | awk '{ print($1 == "+3.14") }'     True
1 -| 1
1 $ echo ' +3.14' | awk '{ print($1 == "3.14") }'      False
1 -| 0
1 $ echo ' +3.14' | awk '{ print($1 == 3.14) }'        True
1 -| 1
1
1 You can see the type of an input field (or other user input) using
1 'typeof()':
1
1 $ echo hello 37 | gawk '{ print typeof($1), typeof($2) }'
1 -| string strnum
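
In an 'awk' without 'typeof()', the same distinction can be observed
indirectly through comparison behavior. The sketch below assumes nothing
beyond the rules given earlier: an element created by 'split()' from
numeric-looking text is a strnum, while a variable assigned from a string
constant is a string. The names 'x', 'parts', and 'split_out' are
illustrative:

```shell
split_out=$(awk 'BEGIN {
    x = "37.0"                  # assigned from a string constant: string
    split("37.0 hello", parts)  # parts[1] is a numeric string: strnum
    print (x == 37)             # string comparison: "37.0" versus "37"
    print (parts[1] == 37)      # numeric comparison: 37 equals 37
}')
printf '%s\n' "$split_out"
```

The first comparison should print '0' and the second '1', even though
both operands hold the same four characters.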
1
1 ---------- Footnotes ----------
1
1 (1) 'gawk' calls this "unassigned", as the following example shows.
1
1 (2) Thus, a POSIX numeric string and 'gawk''s strnum are the same
1 thing.
1