gawk: Variable Typing

6.3.2.1 String Type versus Numeric Type
.......................................

Scalar objects in 'awk' (variables, array elements, and fields) are
_dynamically_ typed.  This means their type can change as the program
runs, from "untyped" before any use,(1) to string or number, and then
from string to number or number to string, as the program progresses.
('gawk' also provides regexp-typed scalars, but let's ignore that for
now; see Strong Regexp Constants.)

   You can't do much with untyped variables, other than tell that they
are untyped.  The following program tests 'a' against '""' and '0'; the
test succeeds when 'a' has never been assigned a value.  It also uses
the built-in 'typeof()' function (not presented yet; see Type
Functions) to show 'a''s type:

     $ gawk 'BEGIN { print (a == "" && a == 0 ?
     > "a is untyped" : "a has a type!") ; print typeof(a) }'
     -| a is untyped
     -| unassigned

   A scalar has numeric type when assigned a numeric value, such as from
a numeric constant, or from another scalar with numeric type:

     $ gawk 'BEGIN { a = 42 ; print typeof(a)
     > b = a ; print typeof(b) }'
     -| number
     -| number

   Similarly, a scalar has string type when assigned a string value,
such as from a string constant, or from another scalar with string type:

     $ gawk 'BEGIN { a = "forty two" ; print typeof(a)
     > b = a ; print typeof(b) }'
     -| string
     -| string

   So far, this is all simple and straightforward.  What happens,
though, when 'awk' has to process data from a user?  Let's start with
field data.  What should the following command produce as output?

     echo hello | awk '{ printf("%s %s < 42\n", $1,
                                ($1 < 42 ? "is" : "is not")) }'

Since 'hello' is alphabetic data, 'awk' can only do a string comparison.
Internally, it converts '42' into '"42"' and compares the two string
values '"hello"' and '"42"'.  Here's the result:

     $ echo hello | awk '{ printf("%s %s < 42\n", $1,
     >                            ($1 < 42 ? "is" : "is not")) }'
     -| hello is not < 42

   However, what happens when data from a user _looks like_ a number?
On the one hand, in reality, the input data consists of characters, not
binary numeric values.  But, on the other hand, the data looks numeric,
and 'awk' really ought to treat it as such.  And indeed, it does:

     $ echo 37 | awk '{ printf("%s %s < 42\n", $1,
     >                         ($1 < 42 ? "is" : "is not")) }'
     -| 37 is < 42
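
   Strictly speaking, the previous example can't tell the two kinds of
comparison apart, because '"37"' also sorts before '"42"' as a string.
A value whose string and numeric orderings disagree makes the difference
visible.  The following sketch reuses the same program with the input
'100', which is an arbitrary choice for illustration:

```shell
# Numerically, 100 is not less than 42; as strings, "100" sorts before
# "42" because '1' < '4'.  The output shows which comparison awk used
# for the numeric-looking field $1.
echo 100 | awk '{ printf("%s %s < 42\n", $1,
                         ($1 < 42 ? "is" : "is not")) }'
```

It prints '100 is not < 42', so the field was compared as a number.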

   Here are the rules for when 'awk' treats data as a number, and for
when it treats data as a string.

   The POSIX standard uses the term "numeric string" for input data that
looks numeric.  The '37' in the previous example is a numeric string.
So what is the type of a numeric string?  Answer: numeric.

   The type of a variable is important because the types of two
variables determine how they are compared.  Variable typing follows
these definitions and rules:

   * A numeric constant or the result of a numeric operation has the
     "numeric" attribute.

   * A string constant or the result of a string operation has the
     "string" attribute.

   * Fields, 'getline' input, 'FILENAME', 'ARGV' elements, 'ENVIRON'
     elements, and the elements of an array created by 'match()',
     'split()', and 'patsplit()' that are numeric strings have the
     "strnum" attribute.(2)  Otherwise, they have the "string"
     attribute.  Uninitialized variables also have the "strnum"
     attribute.

   * Attributes propagate across assignments but are not changed by any
     use.

   The last rule is particularly important.  In the following program,
'a' has numeric type, even though it is later used in a string
operation:

     BEGIN {
          a = 12.345
          b = a " is a cute number"
          print b
     }

   When two operands are compared, either string comparison or numeric
comparison may be used.  This depends upon the attributes of the
operands, according to the following symmetric matrix:

        +----------------------------------------------
        |       STRING          NUMERIC         STRNUM
--------+----------------------------------------------
        |
STRING  |       string          string          string
        |
NUMERIC |       string          numeric         numeric
        |
STRNUM  |       string          numeric         numeric
--------+----------------------------------------------

   The basic idea is that user input that looks numeric--and _only_ user
input--should be treated as numeric, even though it is actually made of
characters and is therefore also a string.  Thus, for example, the
string constant '" +3.14"', when it appears in program source code, is a
string--even though it looks numeric--and is _never_ treated as a number
for comparison purposes.

   In short, when one operand is a "pure" string, such as a string
constant, then a string comparison is performed.  Otherwise, a numeric
comparison is performed.  (The primary difference between a number and a
strnum is that for strnums 'gawk' preserves the original string value
that the scalar had when it came in.)

   This point bears additional emphasis: Input that looks numeric _is_
numeric.  All other input is treated as strings.

   Thus, the six-character input string ' +3.14' receives the strnum
attribute.  In contrast, the eight characters '" +3.14"' appearing in
program text comprise a string constant.  The following examples print
'1' when the comparison between the input and the constant is true, and
'0' otherwise:

     $ echo ' +3.14' | awk '{ print($0 == " +3.14") }'    # True
     -| 1
     $ echo ' +3.14' | awk '{ print($0 == "+3.14") }'     # False
     -| 0
     $ echo ' +3.14' | awk '{ print($0 == "3.14") }'      # False
     -| 0
     $ echo ' +3.14' | awk '{ print($0 == 3.14) }'        # True
     -| 1
     $ echo ' +3.14' | awk '{ print($1 == " +3.14") }'    # False
     -| 0
     $ echo ' +3.14' | awk '{ print($1 == "+3.14") }'     # True
     -| 1
     $ echo ' +3.14' | awk '{ print($1 == "3.14") }'      # False
     -| 0
     $ echo ' +3.14' | awk '{ print($1 == 3.14) }'        # True
     -| 1

   You can see the type of an input field (or other user input) using
'typeof()':

     $ echo hello 37 | gawk '{ print typeof($1), typeof($2) }'
     -| string strnum

   ---------- Footnotes ----------

   (1) 'gawk' calls this "unassigned", as the following example shows.

   (2) Thus, a POSIX numeric string and 'gawk''s strnum are the same
thing.