gawk: POSIX String Comparison

6.3.2.3 String Comparison Based on Locale Collating Order
.........................................................

The POSIX standard used to say that all string comparisons are performed
based on the locale's "collating order". This is the order in which
characters sort, as defined by the locale (for more discussion, ⇒Locales).
This order is usually very different from the results
obtained when doing straight byte-by-byte comparison.(1)

Because this behavior differs considerably from existing practice,
'gawk' only implemented it when in POSIX mode (⇒Options). Here
is an example to illustrate the difference, in an 'en_US.UTF-8' locale:

     $ gawk 'BEGIN { printf("ABC < abc = %s\n",
     >                      ("ABC" < "abc" ? "TRUE" : "FALSE")) }'
     -| ABC < abc = TRUE
     $ gawk --posix 'BEGIN { printf("ABC < abc = %s\n",
     >                      ("ABC" < "abc" ? "TRUE" : "FALSE")) }'
     -| ABC < abc = FALSE

Fortunately, as of August 2016, comparison based on locale collating
order is no longer required for the '==' and '!=' operators.(2)
However, comparison based on locales is still required for '<', '<=',
'>', and '>='. POSIX thus recommends as follows:

     Since the '==' operator checks whether strings are identical, not
     whether they collate equally, applications needing to check whether
     strings collate equally can use:

          a <= b && a >= b

As of version 4.2, 'gawk' continues to use locale collating order for
'<', '<=', '>', and '>=' only in POSIX mode.

   ---------- Footnotes ----------

   (1) Technically, string comparison is supposed to behave the same way
as if the strings were compared with the C 'strcoll()' function.

   (2) See the Austin Group website
(http://austingroupbugs.net/view.php?id=1070).