gawk: Case-sensitivity

1 
1 3.8 Case Sensitivity in Matching
1 ================================
1 
1 Case is normally significant in regular expressions, both when matching
1 ordinary characters (i.e., not metacharacters) and inside bracket
1 expressions.  Thus, a 'w' in a regular expression matches only a
1 lowercase 'w' and not an uppercase 'W'.
1 
1    The simplest way to do a case-independent match is to use a bracket
1 expression--for example, '[Ww]'.  However, this can be cumbersome if you
1 need to use it often, and it can make the regular expressions harder to
1 read.  There are two alternatives that you might prefer.
1 
1    One way to perform a case-insensitive match at a particular point in
1 the program is to convert the data to a single case, using the
1 'tolower()' or 'toupper()' built-in string functions (which we haven't
1 discussed yet; ⇒String Functions).  For example:
1 
1      tolower($1) ~ /foo/  { ... }
1 
1 converts the first field to lowercase before matching against it.  This
1 works in any POSIX-compliant 'awk'.
1 
1    Another method, specific to 'gawk', is to set the variable
1 'IGNORECASE' to a nonzero value (⇒Built-in Variables).  When
1 'IGNORECASE' is not zero, _all_ regexp and string operations ignore
1 case.
1 
1    Changing the value of 'IGNORECASE' dynamically controls the case
1 sensitivity of the program as it runs.  Case is significant by default
1 because 'IGNORECASE' (like most variables) is initialized to zero:
1 
1      x = "aB"
1      if (x ~ /ab/) ...   # this test will fail
1 
1      IGNORECASE = 1
1      if (x ~ /ab/) ...   # now it will succeed
1 
1    In general, you cannot use 'IGNORECASE' to make certain rules case
1 insensitive and other rules case sensitive, as there is no
1 straightforward way to set 'IGNORECASE' just for the pattern of a
1 particular rule.(1)  To do this, use either bracket expressions or
1 'tolower()'.  However, one thing you can do with 'IGNORECASE' only is
1 dynamically turn case sensitivity on or off for all the rules at once.
1 
1    'IGNORECASE' can be set on the command line or in a 'BEGIN' rule
1 (⇒Other Arguments; also ⇒Using BEGIN/END).  Setting
1 'IGNORECASE' from the command line is a way to make a program case
1 insensitive without having to edit it.
1 
1    In multibyte locales, the equivalences between upper- and lowercase
1 characters are tested based on the wide-character values of the locale's
1 character set.  Otherwise, the characters are tested based on the
1 ISO-8859-1 (ISO Latin-1) character set.  This character set is a
1 superset of the traditional 128 ASCII characters, which also provides a
1 number of characters suitable for use with European languages.(2)
1 
1    The value of 'IGNORECASE' has no effect if 'gawk' is in compatibility
1 mode (⇒Options).  Case is always significant in compatibility
1 mode.
1 
1    ---------- Footnotes ----------
1 
1    (1) Experienced C and C++ programmers will note that it is possible,
1 using something like 'IGNORECASE = 1 && /foObAr/ { ... }' and
1 'IGNORECASE = 0 || /foobar/ { ... }'.  However, this is somewhat obscure
1 and we don't recommend it.
1 
1    (2) If you don't understand this, don't worry about it; it just means
1 that 'gawk' does the right thing.
1