gawk: Case-sensitivity
 
3.8 Case Sensitivity in Matching
================================
 
Case is normally significant in regular expressions, both when matching
ordinary characters (i.e., not metacharacters) and inside bracket
expressions. Thus, a 'w' in a regular expression matches only a
lowercase 'w' and not an uppercase 'W'.
 
The simplest way to do a case-independent match is to use a bracket
expression--for example, '[Ww]'. However, this can be cumbersome if you
need to use it often, and it can make the regular expressions harder to
read. There are two alternatives that you might prefer.
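 
As a small illustration of the bracket-expression approach, the sketch
below filters three sample lines. The shell function name 'match_word'
and the input are made up for this example; it should behave the same
way in any POSIX 'awk'.

```shell
# Hypothetical helper: print lines that contain "word" or "Word".
# Only the first letter is bracketed, so the rest stays case sensitive.
match_word() {
    awk '/[Ww]ord/ { print }'
}

printf 'word\nWord\nWORD\n' | match_word
```

The line 'WORD' is not printed. Matching it as well would require
bracketing every letter ('[Ww][Oo][Rr][Dd]'), which shows how quickly
this approach becomes cumbersome.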
 
One way to perform a case-insensitive match at a particular point in
the program is to convert the data to a single case, using the
'tolower()' or 'toupper()' built-in string functions (which we haven't
discussed yet; ⇒String Functions). For example:

     tolower($1) ~ /foo/ { ... }

converts the first field to lowercase before matching against it. This
works in any POSIX-compliant 'awk'.
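 
A runnable sketch of the same idea follows. The shell function name
'first_field_has_foo' and the sample input are assumptions made for
illustration only.

```shell
# Hypothetical helper: print the first field of every record whose
# first field contains "foo" in any mixture of upper- and lowercase.
first_field_has_foo() {
    awk 'tolower($1) ~ /foo/ { print $1 }'
}

printf 'FooBar x\nbar FOO\nfOO y\n' | first_field_has_foo
```

The printed fields keep their original capitalization, because
'tolower()' returns a new string and does not modify '$1'. The second
record is skipped: its 'FOO' is in the second field, not the first.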
 
Another method, specific to 'gawk', is to set the variable
'IGNORECASE' to a nonzero value (⇒Built-in Variables). When
'IGNORECASE' is not zero, _all_ regexp and string operations ignore
case.
 
Changing the value of 'IGNORECASE' dynamically controls the case
sensitivity of the program as it runs. Case is significant by default
because 'IGNORECASE' (like most variables) is initialized to zero:

     x = "aB"
     if (x ~ /ab/) ...   # this test will fail

     IGNORECASE = 1
     if (x ~ /ab/) ...   # now it will succeed
 
In general, you cannot use 'IGNORECASE' to make certain rules case
insensitive and other rules case sensitive, as there is no
straightforward way to set 'IGNORECASE' just for the pattern of a
particular rule.(1) To mix case-sensitive and case-insensitive rules,
use either bracket expressions or 'tolower()'. However, one thing that
only 'IGNORECASE' lets you do is dynamically turn case sensitivity on
or off for all the rules at once.
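 
The sketch below illustrates switching case sensitivity for all rules
while the program runs. It requires 'gawk'. The control words 'nocase'
and 'case' in the input, and the shell function name 'toggle_demo', are
invented for this example.

```shell
# Hypothetical helper: a record consisting of "nocase" or "case"
# changes IGNORECASE for every rule from that point on.
toggle_demo() {
    gawk '
        $0 == "nocase" { IGNORECASE = 1; next }   # all rules now ignore case
        $0 == "case"   { IGNORECASE = 0; next }   # back to case-sensitive matching
        /foo/          { print }
    '
}

# Run the demonstration only where gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    printf 'FOO 1\nnocase\nFOO 2\ncase\nFOO 3\nfoo 4\n' | toggle_demo
fi
```

Only 'FOO 2' and 'foo 4' are printed: the uppercase records match
'/foo/' only while 'IGNORECASE' is nonzero.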
 
'IGNORECASE' can be set on the command line or in a 'BEGIN' rule
(⇒Other Arguments; also ⇒Using BEGIN/END). Setting
'IGNORECASE' from the command line is a way to make a program case
insensitive without having to edit it.
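 
As a sketch of the command-line approach, the following runs one
unmodified program three ways. It requires 'gawk'. The program text in
'prog', the helper 'sample_input', and the other shell variable names
are assumptions made for illustration.

```shell
# Stand-in for an existing program that should not need editing:
# count the lines that contain "error".
prog='/error/ { n++ } END { print n + 0 }'

# Hypothetical sample data.
sample_input() { printf 'Error\nERROR\nerror\nok\n'; }

# Run the demonstration only where gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    sensitive=$(sample_input | gawk "$prog")                   # default behavior
    with_v=$(sample_input | gawk -v IGNORECASE=1 "$prog")      # assignment with -v
    as_operand=$(sample_input | gawk "$prog" IGNORECASE=1 -)   # assignment operand, then stdin
    echo "$sensitive $with_v $as_operand"
fi
```

The default run counts one line, while both runs that set
'IGNORECASE' count three. In the last form, the assignment is an
ordinary command-line argument that takes effect before the following
input ('-', standard input) is read.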
 
In multibyte locales, the equivalences between upper- and lowercase
characters are tested based on the wide-character values of the locale's
character set. Otherwise, the characters are tested based on the
ISO-8859-1 (ISO Latin-1) character set. This character set is a
superset of the traditional 128 ASCII characters and also provides a
number of characters suitable for use with European languages.(2)
 
The value of 'IGNORECASE' has no effect if 'gawk' is in compatibility
mode (⇒Options). Case is always significant in compatibility
mode.
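 
A short sketch of this behavior, assuming compatibility mode is
selected with the '--traditional' option; the shell function name
'compat_demo' is made up for the example.

```shell
# Hypothetical helper: in compatibility mode IGNORECASE is just an
# ordinary variable, so setting it does not change matching.
compat_demo() {
    gawk --traditional -v IGNORECASE=1 '/foo/ { print }'
}

# Run the demonstration only where gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    printf 'FOO\nfoo\n' | compat_demo
fi
```

Only the lowercase 'foo' line is printed, even though 'IGNORECASE' is
set to one.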
 
---------- Footnotes ----------

(1) Experienced C and C++ programmers will note that this is possible,
using something like 'IGNORECASE = 1 && /foObAr/ { ... }' and
'IGNORECASE = 0 || /foobar/ { ... }'. However, this is somewhat obscure
and we don't recommend it.

(2) If you don't understand this, don't worry about it; it just means
that 'gawk' does the right thing.