sed: Character Classes and Bracket Expressions

5.5 Character Classes and Bracket Expressions
=============================================

A "bracket expression" is a list of characters enclosed by '[' and ']'.
It matches any single character in that list; if the first character of
the list is the caret '^', then it matches any character *not* in the
list. For example, the following command replaces the first occurrence
of 'gray' or 'grey' on each line with 'blue':

     sed 's/gr[ae]y/blue/'
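
As a small sketch of the two behaviors described above, the commands
below apply a plain list and a negated list to sample input. The input
strings, the variable names, and the added 'g' flag are illustrative
assumptions, not part of the original example.

```shell
# [ae] matches exactly one character, so both spellings are caught.
# The 'g' flag (added here) replaces every match on the line, not just the first.
both=$(printf '%s\n' 'gray cat, grey dog, groy' | sed 's/gr[ae]y/blue/g')
echo "$both"    # blue cat, blue dog, groy

# A leading '^' negates the list: delete everything that is not a, b, or c.
kept=$(printf '%s\n' 'aXbYcZ' | sed 's/[^abc]//g')
echo "$kept"    # abc
```

'groy' is left alone because 'o' is not in the list.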

Bracket expressions can be used in both basic (see BRE syntax) and
extended (see ERE syntax) regular expressions (that is, with or
without the '-E'/'-r' options).

Within a bracket expression, a "range expression" consists of two
characters separated by a hyphen. It matches any single character that
sorts between the two characters, inclusive. In the default C locale,
the sorting sequence is the native character order; for example, '[a-d]'
is equivalent to '[abcd]'.
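
The equivalence of '[a-d]' and '[abcd]' in the C locale can be checked
with a sketch like the following. The sample input and variable names
are assumptions; 'LC_ALL=C' is set explicitly so that the result does
not depend on the caller's locale.

```shell
# LC_ALL=C pins the collation order so that [a-d] means exactly a, b, c, d.
range=$(printf '%s\n' 'abcdef ABCDEF' | LC_ALL=C sed 's/[a-d]/X/g')
echo "$range"   # XXXXef ABCDEF

# Spelling the range out as a plain list gives the same result.
list=$(printf '%s\n' 'abcdef ABCDEF' | LC_ALL=C sed 's/[abcd]/X/g')
echo "$list"    # XXXXef ABCDEF
```

In other locales the sorting sequence may differ, so a range can match
characters that the plain list does not.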

Finally, certain named classes of characters are predefined within
bracket expressions, as follows.

These named classes must themselves be used _inside_ a bracket
expression. Correct usage:
     $ echo 1 | sed 's/[[:digit:]]/X/'
     X

Incorrect usage is rejected by newer 'sed' versions. Older versions
accepted it but treated it as an ordinary bracket expression (equivalent
to '[dgit:]', that is, matching only the characters 'd', 'g', 'i', 't',
and ':'):
     # current GNU sed versions - incorrect usage rejected
     $ echo 1 | sed 's/[:digit:]/X/'
     sed: character class syntax is [[:space:]], not [:space:]

     # older GNU sed versions
     $ echo 1 | sed 's/[:digit:]/X/'
     1

'[:alnum:]'
     Alphanumeric characters: '[:alpha:]' and '[:digit:]'; in the 'C'
     locale and ASCII character encoding, this is the same as
     '[0-9A-Za-z]'.

'[:alpha:]'
     Alphabetic characters: '[:lower:]' and '[:upper:]'; in the 'C'
     locale and ASCII character encoding, this is the same as
     '[A-Za-z]'.

'[:blank:]'
     Blank characters: space and tab.

'[:cntrl:]'
     Control characters. In ASCII, these characters have octal codes
     000 through 037, and 177 (DEL). In other character sets, these are
     the equivalent characters, if any.

'[:digit:]'
     Digits: '0 1 2 3 4 5 6 7 8 9'.

'[:graph:]'
     Graphical characters: '[:alnum:]' and '[:punct:]'.

'[:lower:]'
     Lower-case letters; in the 'C' locale and ASCII character encoding,
     this is 'a b c d e f g h i j k l m n o p q r s t u v w x y z'.

'[:print:]'
     Printable characters: '[:alnum:]', '[:punct:]', and space.

'[:punct:]'
     Punctuation characters; in the 'C' locale and ASCII character
     encoding, this is '! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \
     ] ^ _ ` { | } ~'.

'[:space:]'
     Space characters: in the 'C' locale, this is tab, newline, vertical
     tab, form feed, carriage return, and space.

'[:upper:]'
     Upper-case letters: in the 'C' locale and ASCII character encoding,
     this is 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z'.

'[:xdigit:]'
     Hexadecimal digits: '0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f'.

Note that the brackets in these class names are part of the symbolic
names, and must be included in addition to the brackets delimiting the
bracket expression.
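
Because each class name is just one item in the list, several classes
can share a single bracket expression, alongside ordinary characters and
a leading '^'. The following is a minimal sketch of that; the sample
strings and variable names are assumptions chosen for illustration, and
'LC_ALL=C' is set so the classes have the ASCII meanings listed above.

```shell
# The inner [:name:] is one item in the list; the outer [ ] delimit the list.
# Mask upper-case letters and digits in one bracket expression.
masked=$(printf '%s\n' 'foo_bar-42 BAZ' | LC_ALL=C sed 's/[[:upper:][:digit:]]/#/g')
echo "$masked"     # foo_bar-## ###

# Classes mix with ordinary list items and with negation:
# turn anything that is not alphanumeric or '_' into a space.
words=$(printf '%s\n' 'foo_bar-42 BAZ' | LC_ALL=C sed 's/[^[:alnum:]_]/ /g')
echo "$words"      # foo_bar 42 BAZ

# [:blank:] covers space and tab only: squeeze runs of them to one space.
squeezed=$(printf 'a \t  b\n' | LC_ALL=C sed 's/[[:blank:]][[:blank:]]*/ /g')
echo "$squeezed"   # a b
```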

Most meta-characters lose their special meaning inside bracket
expressions. The following characters are still treated specially,
depending on their position:

']'
     ends the bracket expression if it's not the first list item. So,
     if you want to make the ']' character a list item, you must put it
     first.

'-'
     represents a range if it's not first or last in a list or the
     ending point of a range.

'^'
     represents the characters not in the list when it is the first
     character. If you want to make the '^' character a list item,
     place it anywhere but first.
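
The placement rules above can be combined in one list. The sketch below
puts ']' first, '^' in a non-initial position, and '-' last, so that all
three are matched literally; the input strings and variable names are
illustrative assumptions.

```shell
# ']' first, '^' not first, '-' last: all three are literal list items.
literal=$(printf '%s\n' 'a]b-c^d' | sed 's/[]^-]/_/g')
echo "$literal"    # a_b_c_d

# In a negated list, ']' is still literal when it directly follows the '^'.
negated=$(printf '%s\n' 'x]y' | sed 's/[^]]/./g')
echo "$negated"    # .].
```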

The characters '$', '*', '.', '[', and '\' are normally not special
within a bracket expression. For example, '[\*]' matches either '\' or
'*', because the '\' is not special here. However, strings like
'[.ch.]', '[=a=]', and '[:space:]' are special within a bracket
expression and represent collating symbols, equivalence classes, and
character classes, respectively, and '[' is therefore special within a
bracket expression when it is followed by '.', '=', or ':'. Also, when
not in 'POSIXLY_CORRECT' mode, special escapes like '\n' and '\t' are
recognized within a bracket expression (see Escapes).
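
A short sketch of these rules follows. The first two commands rely only
on the behavior described above; the last one assumes GNU 'sed' running
outside 'POSIXLY_CORRECT' mode, so its result is not asserted here. The
input strings and variable names are illustrative assumptions.

```shell
# '\' is an ordinary list item here, so [\*] matches a backslash or a star.
plain=$(printf '%s\n' 'a\b*c' | sed 's/[\*]/_/g')
echo "$plain"      # a_b_c

# '.' and '$' are literal inside the list as well.
dots=$(printf '%s\n' 'cost: $4.50' | sed 's/[.$]/_/g')
echo "$dots"       # cost: _4_50

# Assumes GNU sed outside POSIXLY_CORRECT mode: '\t' inside the list is a tab.
tabs=$(printf 'a\tb\n' | sed 's/[\t]/_/')
echo "$tabs"
```

With another 'sed', or with 'POSIXLY_CORRECT' set, the last list may
instead match a literal backslash or the letter 't'.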

'[.'
     represents the open collating symbol.

'.]'
     represents the close collating symbol.

'[='
     represents the open equivalence class.

'=]'
     represents the close equivalence class.

'[:'
     represents the open character class symbol, and should be followed
     by a valid character class name.

':]'
     represents the close character class symbol.
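
As a hedged illustration of the delimiters listed above, the sketch
below uses an equivalence class and a collating symbol in the C locale,
where each stands for a single character. It assumes a 'sed' whose
regular expression engine implements these POSIX constructs, as GNU
'sed' does; the input strings and variable names are made up for the
example.

```shell
# In the C locale an equivalence class holds just the named character.
equiv=$(printf '%s\n' 'banana' | LC_ALL=C sed 's/[[=a=]]/A/g')
echo "$equiv"      # bAnAnA

# A collating symbol names one collating element; '[.-.]' is a literal hyphen
# wherever it appears in the list.
hyphen=$(printf '%s\n' 'a-b-c' | LC_ALL=C sed 's/[[.-.]]/_/g')
echo "$hyphen"     # a_b_c
```

In locales that define richer collation rules, '[=a=]' may also match
accented variants of 'a'.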