gawk: POSIX Floating Point Problems

1 
1 15.7 Standards Versus Existing Practice
1 =======================================
1 
1 Historically, 'awk' has converted any nonnumeric-looking string to the
1 numeric value zero, when required.  Furthermore, the original definition
1 of the language and the original POSIX standards specified that 'awk'
1 only understands decimal numbers (base 10), and not octal (base 8) or
1 hexadecimal numbers (base 16).
1 
1    Changes in the language of the 2001 and 2004 POSIX standards can be
1 interpreted to imply that 'awk' should support additional features.
1 These features are:
1 
1    * Interpretation of floating-point data values specified in
1      hexadecimal notation (e.g., '0xDEADBEEF').  (Note: data values,
1      _not_ source code constants.)
1 
1    * Support for the special IEEE 754 floating-point values "not a
1      number" (NaN), positive infinity ("inf"), and negative infinity
1      ("-inf").  In particular, the format for these values is as
1      specified by the ISO 1999 C standard, which ignores case, can
1      allow implementation-dependent additional characters after the
1      'nan', and allows either 'inf' or 'infinity'.
1 
1    The first problem is that both of these are clear changes to
1 historical practice:
1 
1    * The 'gawk' maintainer feels that supporting hexadecimal
1      floating-point values, in particular, is ugly, and was never
1      intended by the original designers to be part of the language.
1 
1    * Allowing completely alphabetic strings to have valid numeric values
1      is also a very severe departure from historical practice.
1 
1    The second problem is that the 'gawk' maintainer feels that this
1 interpretation of the standard, which required a certain amount of
1 "language lawyering" to arrive at in the first place, was not even
1 intended by the standard developers.  In other words, "We see how you
1 got where you are, but we don't think that that's where you want to be."
1 
1    Recognizing these issues, but attempting to provide compatibility
1 with the earlier versions of the standard, the 2008 POSIX standard added
1 explicit wording to allow, but not require, that 'awk' support
1 hexadecimal floating-point values and special values for "not a number"
1 and infinity.
1 
1    Although the 'gawk' maintainer continues to feel that providing those
1 features is inadvisable, on systems that support IEEE floating point it
1 seems reasonable to provide _some_ way to support NaN and infinity
1 values.  The solution implemented in 'gawk' is as follows:
1 
1    * With the '--posix' command-line option, 'gawk' becomes "hands off."
1      String values are passed directly to the system library's
1      'strtod()' function, and if it successfully returns a numeric
1      value, that is what's used.(1)  By definition, the results are not
1      portable across different systems.  They are also a little
1      surprising:
1 
1           $ echo nanny | gawk --posix '{ print $1 + 0 }'
1           -| nan
1           $ echo 0xDeadBeef | gawk --posix '{ print $1 + 0 }'
1           -| 3735928559
1 
1    * Without '--posix', 'gawk' interprets the four string values '+inf',
1      '-inf', '+nan', and '-nan' specially, producing the corresponding
1      special numeric values.  The leading sign acts as a signal to 'gawk'
1      (and the user) that the value is really numeric.  Hexadecimal
1      floating point is not supported (unless you also use
1      '--non-decimal-data', which is _not_ recommended).  For example:
1 
1           $ echo nanny | gawk '{ print $1 + 0 }'
1           -| 0
1           $ echo +nan | gawk '{ print $1 + 0 }'
1           -| nan
1           $ echo 0xDeadBeef | gawk '{ print $1 + 0 }'
1           -| 0
1 
1      'gawk' ignores case in the four special values.  Thus, '+nan' and
1      '+NaN' are the same.
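
The two modes described above can be compared side by side with a small
shell script.  This is only a sketch: it assumes 'gawk' is on the PATH,
and the helper name 'to_num' and its mode names 'default' and 'posix'
are invented here for illustration.  No expected output is shown,
because the '--posix' results depend on the system's 'strtod()' and the
exact spelling of the printed special values (for instance, whether a
sign is shown) may differ between 'gawk' versions.

```shell
#!/bin/sh
# Sketch: compare how gawk converts the same input strings to numbers
# in its default mode and with --posix.  Assumes gawk is installed.
command -v gawk >/dev/null 2>&1 || { echo "gawk not found; skipping" >&2; exit 0; }

# to_num MODE STRING: print the value of STRING + 0 as computed by gawk.
# MODE is "default" or "posix" (hypothetical names used only in this sketch).
to_num() {
    if [ "$1" = posix ]; then
        printf '%s\n' "$2" | gawk --posix '{ print $1 + 0 }'
    else
        printf '%s\n' "$2" | gawk '{ print $1 + 0 }'
    fi
}

# Signed special values, unsigned look-alikes, and a hexadecimal string.
for s in +inf -INF +NaN -nan inf nan nanny 0xDeadBeef; do
    printf '%-12s default=%-8s posix=%s\n' "$s" \
        "$(to_num default "$s")" "$(to_num posix "$s")"
done
```

In the default column, only the four signed spellings should come back
as special values; the unsigned strings and the hexadecimal string
should yield zero, as in the examples above.  The '--posix' column shows
whatever the local 'strtod()' makes of each string.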
1 
1    ---------- Footnotes ----------
1 
1    (1) You asked for it, you got it.
1