grep: Character Classes and Bracket Expressions

3.2 Character Classes and Bracket Expressions
=============================================

A “bracket expression” is a list of characters enclosed by ‘[’ and ‘]’.
It matches any single character in that list; if the first character of
the list is the caret ‘^’, then it matches any character *not* in the
list.  For example, the regular expression ‘[0123456789]’ matches any
single digit.
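
   As a rough illustration of the plain and negated forms described
above, the following sketch counts matching lines.  The sample input and
the variable names are invented for this example.

```shell
# Hypothetical sample input: four lines, three of which contain a digit.
sample='abc
a1c
42
7'

# Count lines that contain at least one digit.
with_digit=$(printf '%s\n' "$sample" | grep -c '[0123456789]')

# Count lines that contain at least one character that is not a digit.
with_nondigit=$(printf '%s\n' "$sample" | grep -c '[^0123456789]')

echo "$with_digit $with_nondigit"
```

   Note that the negated form still has to match a character, so the
lines ‘42’ and ‘7’, which consist only of digits, are not counted by the
second command.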

   Within a bracket expression, a “range expression” consists of two
characters separated by a hyphen.  It matches any single character that
sorts between the two characters, inclusive.  In the default C locale,
the sorting sequence is the native character order; for example, ‘[a-d]’
is equivalent to ‘[abcd]’.  In other locales, the sorting sequence is
not specified, and ‘[a-d]’ might be equivalent to ‘[abcd]’ or to
‘[aBbCcDd]’, or it might fail to match any character, or the set of
characters that it matches might even be erratic.  To obtain the
traditional interpretation of bracket expressions, you can use the ‘C’
locale by setting the ‘LC_ALL’ environment variable to the value ‘C’.
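
   A minimal sketch of forcing the traditional interpretation of a
range, assuming a POSIX shell; the sample input and the variable name
are hypothetical.

```shell
# Hypothetical sample input: a lowercase letter inside the range, its
# uppercase counterpart, and a lowercase letter outside the range.
sample='b
B
z'

# Setting LC_ALL=C for this one command makes '[a-d]' equivalent to
# '[abcd]', so only the line "b" is selected.
in_range=$(printf '%s\n' "$sample" | LC_ALL=C grep '[a-d]')

echo "$in_range"
```

   Setting the variable on the command line, as here, affects only that
invocation of ‘grep’ and leaves the locale of the rest of the session
unchanged.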

   Finally, certain named classes of characters are predefined within
bracket expressions, as follows.  Their interpretation depends on the
‘LC_CTYPE’ locale; for example, ‘[[:alnum:]]’ means the character class
of numbers and letters in the current locale.

‘[:alnum:]’
     Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’
     locale and ASCII character encoding, this is the same as
     ‘[0-9A-Za-z]’.

‘[:alpha:]’
     Alphabetic characters: ‘[:lower:]’ and ‘[:upper:]’; in the ‘C’
     locale and ASCII character encoding, this is the same as
     ‘[A-Za-z]’.

‘[:blank:]’
     Blank characters: space and tab.

‘[:cntrl:]’
     Control characters.  In ASCII, these characters have octal codes
     000 through 037, and 177 (DEL).  In other character sets, these are
     the equivalent characters, if any.

‘[:digit:]’
     Digits: ‘0 1 2 3 4 5 6 7 8 9’.

‘[:graph:]’
     Graphical characters: ‘[:alnum:]’ and ‘[:punct:]’.

‘[:lower:]’
     Lower-case letters; in the ‘C’ locale and ASCII character encoding,
     this is ‘a b c d e f g h i j k l m n o p q r s t u v w x y z’.

‘[:print:]’
     Printable characters: ‘[:alnum:]’, ‘[:punct:]’, and space.

‘[:punct:]’
     Punctuation characters; in the ‘C’ locale and ASCII character
     encoding, this is ‘! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \
     ] ^ _ ` { | } ~’.

‘[:space:]’
     Space characters; in the ‘C’ locale, this is tab, newline, vertical
     tab, form feed, carriage return, and space.  See Usage for more
     discussion of matching newlines.

‘[:upper:]’
     Upper-case letters; in the ‘C’ locale and ASCII character encoding,
     this is ‘A B C D E F G H I J K L M N O P Q R S T U V W X Y Z’.

‘[:xdigit:]’
     Hexadecimal digits: ‘0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f’.

   Note that the brackets in these class names are part of the symbolic
names, and must be included in addition to the brackets delimiting the
bracket expression.
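
   The following sketch shows named classes written with both sets of
brackets, on their own, combined in one list, and negated.  The sample
input and the variable names are hypothetical, and ‘LC_ALL=C’ is set
only to keep the results independent of the current locale.

```shell
# Hypothetical sample input.
sample='Hello
world
beef42
a b'

# One class inside a bracket expression: lines with an upper-case letter.
upper_count=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[[:upper:]]')

# Two classes in the same list: lines with an upper-case letter or a digit.
mixed_count=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[[:upper:][:digit:]]')

# A negated list: lines with at least one non-alphanumeric character.
nonalnum_count=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[^[:alnum:]]')

echo "$upper_count $mixed_count $nonalnum_count"
```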

   If you mistakenly omit the outer brackets, and search for, say,
‘[:upper:]’, GNU ‘grep’ prints a diagnostic and exits with status 2, on
the assumption that you did not intend to search for the nominally
equivalent regular expression: ‘[:epru]’.  Set the ‘POSIXLY_CORRECT’
environment variable to disable this feature.
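
   A small sketch of how a script might observe this, assuming GNU
‘grep’ and a POSIX shell; other ‘grep’ implementations may simply treat
the pattern as an ordinary bracket expression.  The variable names are
hypothetical.

```shell
# Run the mistaken pattern in a subshell with POSIXLY_CORRECT unset, so
# the default behavior applies.  The diagnostic goes to standard error
# and is discarded here; only the exit status is kept.
status=0
(unset POSIXLY_CORRECT
 printf 'Upper\n' | grep '[:upper:]' >/dev/null 2>&1) || status=$?

# The intended pattern, with both sets of brackets, matches normally.
fixed=$(printf 'Upper\n' | LC_ALL=C grep -c '[[:upper:]]')

echo "status=$status fixed=$fixed"
```

   With GNU ‘grep’, ‘status’ is expected to be 2 here, which a script
can distinguish from status 1 (no lines selected).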

   Most meta-characters lose their special meaning inside bracket
expressions.  The following are the exceptions.

‘]’
     ends the bracket expression if it’s not the first list item.  So,
     if you want to make the ‘]’ character a list item, you must put it
     first.

‘[.’
     represents the open collating symbol.

‘.]’
     represents the close collating symbol.

‘[=’
     represents the open equivalence class.

‘=]’
     represents the close equivalence class.

‘[:’
     represents the open character class symbol, and should be followed
     by a valid character class name.

‘:]’
     represents the close character class symbol.

‘-’
     represents the range if it’s not first or last in a list or the
     ending point of a range.

‘^’
     represents the characters not in the list.  If you want to make the
     ‘^’ character a list item, place it anywhere but first.
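
   The placement rules above for ‘]’, ‘-’, and ‘^’ can be sketched as
follows.  The sample input and the variable names are hypothetical.

```shell
# Hypothetical sample input; the last line consists only of ']', '^', '-'.
sample='a]b
c^d
e-f
xyz
uvw
]^-'

# ']' comes first, '^' is not first, and '-' is last, so all three are
# ordinary list items: this counts lines containing ']', '^', or '-'.
literal_count=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[]^-]')

# A leading '^' negates the same list: this counts lines containing at
# least one character other than ']', '^', and '-'.
negated_count=$(printf '%s\n' "$sample" | LC_ALL=C grep -c '[^]^-]')

echo "$literal_count $negated_count"
```

   In the negated form, ‘]’ still counts as the first list item because
it immediately follows the leading ‘^’.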