gawk: POSIX Floating Point Problems

15.7 Standards Versus Existing Practice
=======================================

Historically, 'awk' has converted any nonnumeric-looking string to the
numeric value zero, when required. Furthermore, the original definition
of the language and the original POSIX standards specified that 'awk'
only understands decimal numbers (base 10), and not octal (base 8) or
hexadecimal numbers (base 16).
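
The historical conversion rule can be checked with a short shell sketch.
It assumes that 'gawk' is on the PATH and is run without '--posix' (and
without 'POSIXLY_CORRECT' set in the environment). The name 'to_num' is
a hypothetical helper chosen for illustration, not part of 'gawk'.

```shell
# Sketch only: to_num is a hypothetical helper, not part of gawk.
# It forces its argument to a number the way awk does for input data.
to_num() {
    printf '%s\n' "$1" | gawk '{ print $1 + 0 }'
}

to_num abc     # not numeric-looking: prints 0
to_num 010     # read as decimal, not octal: prints 10
to_num 0x11    # hexadecimal is not recognized: prints 0
```

Other 'awk' implementations may convert the last two strings differently,
which is why the sketch names 'gawk' explicitly.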

Changes in the language of the 2001 and 2004 POSIX standards can be
interpreted to imply that 'awk' should support additional features.
These features are:

   * Interpretation of floating-point data values specified in
     hexadecimal notation (e.g., '0xDEADBEEF'). (Note: data values,
     _not_ source code constants.)

   * Support for the special IEEE 754 floating-point values "not a
     number" (NaN), positive infinity ("inf"), and negative infinity
     ("-inf"). In particular, the format for these values is as
     specified by the ISO 1999 C standard, which ignores case, can
     allow implementation-dependent additional characters after
     'nan', and accepts either 'inf' or 'infinity'.
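
The spellings covered by this reading of the standard can be
approximated with a shell pattern match. The sketch below is not
'strtod()' itself; it is only a prefix classifier that mirrors the
description above. The name 'classify_special' is hypothetical, and the
hexadecimal pattern is a simplification that assumes a hex digit directly
after '0x' and ignores fractions and exponents.

```shell
# Sketch only: classify_special is a hypothetical helper. It mimics, by
# prefix matching, which data strings the C99 spellings described above
# would cover; it does not call strtod() and does no numeric conversion.
classify_special() {
    # Fold to lowercase, since the C99 spellings ignore case.
    s=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # Drop one optional leading sign.
    case $s in
        [+-]*) s=${s#?} ;;
    esac
    case $s in
        inf*)         echo infinity ;;  # 'inf' or 'infinity'
        nan*)         echo nan ;;       # 'nan', possibly with more characters
        0x[0-9a-f]*)  echo hex ;;       # simplified: a hex digit right after 0x
        *)            echo other ;;
    esac
}

classify_special 0xDEADBEEF   # prints hex
classify_special -Infinity    # prints infinity
classify_special NaN          # prints nan
classify_special 42           # prints other
```

Because the match is on a prefix, a word such as 'nanny' is classified
as 'nan', which is the same kind of surprise that a prefix-parsing
library routine can produce.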

The first problem is that both of these are clear changes to
historical practice:

   * The 'gawk' maintainer feels that supporting hexadecimal
     floating-point values, in particular, is ugly, and was never
     intended by the original designers to be part of the language.

   * Allowing completely alphabetic strings to have valid numeric values
     is also a very severe departure from historical practice.

The second problem is that the 'gawk' maintainer feels that this
interpretation of the standard, which required a certain amount of
"language lawyering" to arrive at in the first place, was not even
intended by the standard developers. In other words, "We see how you
got where you are, but we don't think that that's where you want to be."

Recognizing these issues, but attempting to provide compatibility
with the earlier versions of the standard, the 2008 POSIX standard added
explicit wording to allow, but not require, that 'awk' support
hexadecimal floating-point values and special values for "not a number"
and infinity.

Although the 'gawk' maintainer continues to feel that providing those
features is inadvisable, on systems that support IEEE floating point it
seems reasonable to provide _some_ way to support NaN and infinity
values. The solution implemented in 'gawk' is as follows:

   * With the '--posix' command-line option, 'gawk' becomes "hands off."
     String values are passed directly to the system library's
     'strtod()' function, and if it successfully returns a numeric
     value, that is what's used.(1) By definition, the results are not
     portable across different systems. They are also a little
     surprising:

          $ echo nanny | gawk --posix '{ print $1 + 0 }'
          -| nan
          $ echo 0xDeadBeef | gawk --posix '{ print $1 + 0 }'
          -| 3735928559

   * Without '--posix', 'gawk' interprets the four string values '+inf',
     '-inf', '+nan', and '-nan' specially, producing the corresponding
     special numeric values. The leading sign acts as a signal to 'gawk'
     (and the user) that the value is really numeric. Hexadecimal
     floating point is not supported (unless you also use
     '--non-decimal-data', which is _not_ recommended). For example:

          $ echo nanny | gawk '{ print $1 + 0 }'
          -| 0
          $ echo +nan | gawk '{ print $1 + 0 }'
          -| nan
          $ echo 0xDeadBeef | gawk '{ print $1 + 0 }'
          -| 0

'gawk' ignores case in the four special values. Thus, '+nan' and
'+NaN' are the same.
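
One way to see the effect of the leading sign is to force a data string
to a number and compare it against a large finite value. The following
is a minimal sketch, assuming that 'gawk' is on the PATH and runs
without '--posix' (and without 'POSIXLY_CORRECT' set). The name
'infinity_sign' is a hypothetical helper, and NaN inputs are
deliberately out of scope.

```shell
# Sketch only: infinity_sign is a hypothetical helper, not part of gawk.
# It reports whether gawk (without --posix) converts a data string to an
# infinite value. NaN inputs are not handled here.
infinity_sign() {
    printf '%s\n' "$1" | gawk '{
        x = $1 + 0
        if (x > 1e308)        print "positive"
        else if (x < -1e308)  print "negative"
        else                  print "finite"
    }'
}

infinity_sign +inf    # signed spelling is recognized: prints positive
infinity_sign -INF    # case is ignored: prints negative
infinity_sign inf     # no leading sign, so treated as 0: prints finite
```

Comparing against a finite bound avoids depending on how a particular
'gawk' version prints infinite values.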

   ---------- Footnotes ----------

   (1) You asked for it, you got it.