gawk: Locale influences conversions

6.1.4.2 Locales Can Influence Conversion
........................................

Where you are can matter when it comes to converting between numbers and
strings. The local character set and language--the "locale"--can affect
numeric formats. In particular, for 'awk' programs, it affects the
decimal point character and the thousands-separator character. The
'"C"' locale, and most English-language locales, use the period
character ('.') as the decimal point and don't have a thousands
separator. However, many (if not most) European and non-English locales
use the comma (',') as the decimal point character. European locales
often use either a space or a period as the thousands separator, if they
have one.
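To see which characters a given locale uses, the POSIX 'locale' utility
can print the relevant 'LC_NUMERIC' keywords. The following is a minimal
sketch; the function name 'show_numeric_chars' is made up for
illustration, and the output for any locale other than '"C"' depends on
which locales are installed on the system:

```shell
# show_numeric_chars is a hypothetical helper, not part of gawk.
# Print the decimal point (first line) and the thousands separator
# (second line, possibly empty) for the locale named in $1.
show_numeric_chars() {
    LC_ALL="$1" locale decimal_point thousands_sep
}

show_numeric_chars C
```

In the '"C"' locale the first line is a period and the second is empty.
Running 'locale -a' lists the locale names that are available to try.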

The POSIX standard says that 'awk' always uses the period as the
decimal point when reading the 'awk' program source code, and for
command-line variable assignments (see Other Arguments). However,
when interpreting input data, for 'print' and 'printf' output, and for
number-to-string conversion, the local decimal point character is used.
(d.c.) In all cases, numbers in source code and in input data cannot
have a thousands separator. Here are some examples indicating the
difference in behavior, on a GNU/Linux system:

     $ export POSIXLY_CORRECT=1                        Force POSIX behavior
     $ gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3.14159
     $ LC_ALL=en_DK.utf-8 gawk 'BEGIN { printf "%g\n", 3.1415927 }'
     -| 3,14159
     $ echo 4,321 | gawk '{ print $1 + 1 }'
     -| 5
     $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
     -| 5,321

The 'en_DK.utf-8' locale is for English in Denmark, where the comma acts
as the decimal point separator. In the normal '"C"' locale, 'gawk'
treats '4,321' as 4, while in the Danish locale, it's treated as the
full number including the fractional part, 4.321.

Some earlier versions of 'gawk' fully complied with this aspect of
the standard. However, many users in non-English locales complained
about this behavior, because their data used a period as the decimal
point. The default behavior was therefore restored to use a period as
the decimal point character. You can use the '--use-lc-numeric' option
(see Options) to force 'gawk' to use the locale's decimal point
character. ('gawk' also uses the locale's decimal point character when
in POSIX mode, either via '--posix' or the 'POSIXLY_CORRECT' environment
variable, as shown previously.)
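As a rough sketch of the difference, the commands below run the same
program with and without '--use-lc-numeric'. The variable names are made
up for illustration, and the example assumes that the 'en_DK.utf-8'
locale used earlier is installed; if it is not, both results are
expected to be the same.

```shell
# Make sure POSIX mode is not switched on through the environment.
unset POSIXLY_CORRECT

# Assumption: this locale is installed (check with "locale -a").
loc=en_DK.utf-8

# Default: input is parsed with a period as the decimal point,
# so "4,321" is read as 4.
default_out=$(echo 4,321 | LC_ALL=$loc gawk '{ print $1 + 1 }')

# With --use-lc-numeric, the locale's decimal point is used instead.
lc_out=$(echo 4,321 | LC_ALL=$loc gawk --use-lc-numeric '{ print $1 + 1 }')

printf 'default:               %s\n' "$default_out"
printf 'with --use-lc-numeric: %s\n' "$lc_out"
```

With the locale available, the second line would be expected to show
'5,321', matching the POSIX-mode example shown earlier.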

Table 6.1 describes the cases in which the locale's decimal point
character is used and when a period is used. Some of these features
have not been described yet.

Feature       Default       '--posix' or '--use-lc-numeric'
------------------------------------------------------------
'%'g'         Use locale    Use locale
'%g'          Use period    Use locale
Input         Use period    Use locale
'strtonum()'  Use period    Use locale

Table 6.1: Locale decimal point versus a period
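Most rows of the table can be tried out with one small program. This is
only a sketch: the variable names are invented for illustration, '\047'
is the octal escape for a single quote (which avoids shell quoting
trouble with '%'g'), and the 'en_DK.utf-8' locale is assumed to be
installed. The "Input" row is not exercised, because the program runs
entirely in a 'BEGIN' rule.

```shell
# Make sure POSIX mode is not switched on through the environment.
unset POSIXLY_CORRECT

# Assumption: this locale is installed (check with "locale -a").
loc=en_DK.utf-8

# One output field per tested table row: %'g, %g, and strtonum().
prog='BEGIN { printf "%\047g %g %s\n", 1.5, 1.5, strtonum("1,5") }'

c_out=$(LC_ALL=C gawk "$prog")
default_out=$(LC_ALL=$loc gawk "$prog")
lc_out=$(LC_ALL=$loc gawk --use-lc-numeric "$prog")

printf 'C locale:              %s\n' "$c_out"
printf 'locale, default:       %s\n' "$default_out"
printf 'with --use-lc-numeric: %s\n' "$lc_out"
```

In the '"C"' locale this prints '1.5 1.5 1'. According to the table, the
default run in a comma-using locale should differ only in the first
field, while the '--use-lc-numeric' run should show a comma in all three.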

Finally, modern-day formal standards and the IEEE standard
floating-point representation can have an unusual but important effect
on the way 'gawk' converts some special string values to numbers. The
details are presented in POSIX Floating Point Problems.