gawk: Locale influences conversions

6.1.4.2 Locales Can Influence Conversion
........................................

Where you are can matter when it comes to converting between numbers and
strings.  The local character set and language--the "locale"--can affect
numeric formats.  In particular, for 'awk' programs, it affects the
decimal point character and the thousands-separator character.  The
'"C"' locale, and most English-language locales, use the period
character ('.') as the decimal point and don't have a thousands
separator.  However, many (if not most) European and non-English locales
use the comma (',') as the decimal point character.  European locales
often use either a space or a period as the thousands separator, if they
have one.

   The POSIX standard says that 'awk' always uses the period as the
decimal point when reading the 'awk' program source code, and for
command-line variable assignments (⇒Other Arguments).  However,
when interpreting input data, for 'print' and 'printf' output, and for
number-to-string conversion, the local decimal point character is used.
(d.c.)  In all cases, numbers in source code and in input data cannot
have a thousands separator.  Here are some examples indicating the
difference in behavior, on a GNU/Linux system:

     $ export POSIXLY_CORRECT=1                        # Force POSIX behavior
     $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3.14159
     $ LC_ALL=en_DK.utf-8 gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3,14159
     $ echo 4,321 | gawk '{ print $1 + 1 }'
     -| 5
     $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
     -| 5,321

The 'en_DK.utf-8' locale is for English in Denmark, where the comma acts
as the decimal point.  In the normal '"C"' locale, 'gawk' treats '4,321'
as 4, because conversion stops at the comma, while in the Danish locale,
it's treated as the full number including the fractional part, 4.321.
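
   When input uses a comma as the decimal point but the program must
give the same result in every locale, one option is to pin the locale to
'"C"' and rewrite the comma before doing arithmetic.  The following is a
minimal sketch, not something the manual prescribes; it assumes that the
field holds at most one comma and no thousands separator:

```shell
# Pin the locale so the result does not depend on the caller's environment.
# Assumption: the field contains at most one comma, used as the decimal point.
result=$(echo 4,321 | LC_ALL=C gawk '{ sub(/,/, ".", $1); print $1 + 1 }')
echo "$result"    # prints 5.321
```

Because the comma is replaced before the field is used as a number, the
locale's decimal point character never takes part in the conversion.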

   Some earlier versions of 'gawk' fully complied with this aspect of
the standard.  However, many users in non-English locales complained
about this behavior because their data used a period as the decimal
point.  The default behavior was therefore restored to use a period as
the decimal point character.  You can use the '--use-lc-numeric' option
(⇒Options) to force 'gawk' to use the locale's decimal point
character.  ('gawk' also uses the locale's decimal point character when
in POSIX mode, either via '--posix' or the 'POSIXLY_CORRECT' environment
variable, as shown previously.)
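
   The effect of '--use-lc-numeric' can be checked with a short test
such as the one below.  This is a sketch under assumptions: it presumes
that the 'en_DK.utf-8' locale from the earlier examples is installed.
Where it is not, the program typically stays in the '"C"' locale and
both commands print a period.

```shell
# Baseline: in the "C" locale the option changes nothing, because the
# locale's decimal point is already a period.
c_out=$(LC_ALL=C gawk --use-lc-numeric 'BEGIN { printf "%g\n", 3.1415927 }')
echo "$c_out"     # prints 3.14159

# Assumption: en_DK.utf-8 is installed.  With the option, gawk should use
# that locale's decimal point character for the output.
dk_out=$(LC_ALL=en_DK.utf-8 gawk --use-lc-numeric 'BEGIN { printf "%g\n", 3.1415927 }')
echo "$dk_out"
```

Running the second command again without '--use-lc-numeric' (and with
'POSIXLY_CORRECT' unset) gives a direct comparison with the default
behavior.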

   ⇒Table 6.1 describes the cases in which the locale's decimal point
character is used and those in which a period is used.  Some of these
features have not been described yet.

Feature        Default        '--posix' or
                              '--use-lc-numeric'
------------------------------------------------------------
'%'g'          Use locale     Use locale
'%g'           Use period     Use locale
Input          Use period     Use locale
'strtonum()'   Use period     Use locale

Table 6.1: Locale decimal point versus a period
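
   One way to see which column of the table applies on a given system
is to run the same small probe with and without '--use-lc-numeric',
under the locale of interest.  The sketch below is illustrative only:
the shell variable name 'prog' is made up for this example, and '\047'
is the octal escape for a single quote, used so that the '%'g' format
can be written inside a single-quoted shell string.  In the '"C"' locale
the decimal point is always a period, so '4,321' is read as 4:

```shell
# Hypothetical probe: one statement per row of the table.
prog='{
    printf "%\047g\n", 1234.5       # row 1: %g with the quote flag
    printf "%g\n", 1234.5           # row 2: plain %g
    print $1 + 1                    # row 3: input data
    print strtonum("4,321") + 1     # row 4: strtonum()
}'

# Baseline run in the "C" locale.
c_probe=$(echo 4,321 | LC_ALL=C gawk --use-lc-numeric "$prog")
echo "$c_probe"
# prints:
# 1234.5
# 1234.5
# 5
# 5
```

Replacing 'LC_ALL=C' with another installed locale, and dropping or
keeping '--use-lc-numeric', shows how each row behaves there.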

   Finally, modern-day formal standards and the IEEE standard
floating-point representation can have an unusual but important effect
on the way 'gawk' converts some special string values to numbers.  The
details are presented in ⇒POSIX Floating Point Problems.