coreutils: Character sets

9.1.1 Specifying sets of characters
-----------------------------------

The format of the SET1 and SET2 arguments resembles the format of
regular expressions; however, they are not regular expressions, only
lists of characters. Most characters simply represent themselves in
these strings, but the strings can contain the shorthands listed below,
for convenience. Some of them can be used only in SET1 or SET2, as
noted below.
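As a quick illustration, here is a sketch assuming GNU ‘tr’ and a POSIX shell. With a plain SET1 and SET2, each character of SET1 is replaced by the character at the same position in SET2, and everything else passes through unchanged:

```shell
# Assumes GNU tr and a POSIX shell.
# 'a' -> 'x', 'b' -> 'y', 'c' -> 'z'; the '.' is not in SET1, so it is kept.
printf 'cab.\n' | tr 'abc' 'xyz'    # prints: zxy.
```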

Backslash escapes

The following backslash escape sequences are recognized:

‘\a’
    Bell (BEL, Control-G).
‘\b’
    Backspace (BS, Control-H).
‘\f’
    Form feed (FF, Control-L).
‘\n’
    Newline (LF, Control-J).
‘\r’
    Carriage return (CR, Control-M).
‘\t’
    Horizontal tab (HT, Control-I).
‘\v’
    Vertical tab (VT, Control-K).
‘\OOO’
    The 8-bit character with the value given by OOO, which is 1 to
    3 octal digits. Because a single byte cannot exceed octal 377,
    ‘\400’ is interpreted as the two-byte sequence ‘\040’ ‘0’.
‘\\’
    A backslash.
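A few of these escapes in action, as a sketch assuming GNU ‘tr’ and a POSIX shell. Note that ‘printf’ expands the escapes in its own format string, while ‘tr’ expands the ones in SET1 and SET2:

```shell
# Assumes GNU tr and a POSIX shell.

# '\t' in SET1 matches a tab; replace it with a space.
printf 'a\tb\n' | tr '\t' ' '                # prints: a b

# '\\' in SET1 matches a literal backslash.
printf 'C:\\dir\n' | tr '\\' '/'             # prints: C:/dir

# '\052' is octal for '*'.
printf 'a+b\n' | tr '+' '\052'               # prints: a*b

# '\400' means '\040' (space) followed by '0', so both are deleted.
# GNU tr may warn about the ambiguous escape on stderr; it is discarded here.
printf 'a b0c\n' | tr -d '\400' 2>/dev/null  # prints: abc
```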

A backslash followed by any character not listed above stands for
that character itself. The backslash also removes any special
meaning the character would otherwise have, which makes it useful
for escaping ‘[’, ‘]’, ‘*’, and ‘-’.
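For instance, assuming GNU ‘tr’ and a POSIX shell, escaping ‘-’ stops it from forming a range, and escaping ‘[’ stops ‘[:x:]’ from being read as a character class:

```shell
# Assumes GNU tr and a POSIX shell.

# Unescaped, 'a-z' is a range, so every lowercase letter is deleted.
printf 'a-m-z\n' | tr -d 'a-z'       # prints: --

# Escaped, '\-' is a literal hyphen: only 'a', '-', and 'z' are deleted.
printf 'a-m-z\n' | tr -d 'a\-z'      # prints: m

# '\[' is a literal bracket, so '\[:x:]' is just the characters [ : x : ].
printf 'a[:x:]b\n' | tr -d '\[:x:]'  # prints: ab
```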

Ranges

The notation ‘M-N’ expands to all of the characters from M through
N, in ascending order. M should collate before N; if it doesn’t,
an error results. As an example, ‘0-9’ is the same as
‘0123456789’.
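A short sketch of ranges, assuming GNU ‘tr’ and a POSIX shell; the last command shows that a reversed range is rejected with a nonzero exit status:

```shell
# Assumes GNU tr and a POSIX shell.
printf 'hello123\n' | tr -d '0-9'    # prints: hello
printf 'abc\n' | tr 'a-c' 'A-C'      # prints: ABC

# '9' does not collate before '0', so tr reports an error and fails.
tr -d '9-0' </dev/null 2>/dev/null || echo 'reversed range rejected'
```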

GNU ‘tr’ does not support the System V syntax that uses square
brackets to enclose ranges. Translations specified in that format
sometimes work as expected, since the brackets are often
transliterated to themselves. However, they should be avoided
because they sometimes behave unexpectedly. For example, ‘tr -d
'[0-9]'’ deletes brackets as well as digits.
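The bracket pitfall can be seen directly, assuming GNU ‘tr’ and a POSIX shell:

```shell
# Assumes GNU tr and a POSIX shell.
# System V style: '[' and ']' are ordinary characters, so they are deleted too.
printf '[a1]\n' | tr -d '[0-9]'    # prints: a
# Plain range: only the digit is deleted.
printf '[a1]\n' | tr -d '0-9'      # prints: [a]
```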

Many historically common and even accepted uses of ranges are not
portable. For example, on EBCDIC hosts the range ‘A-Z’ does not do
what most would expect, because ‘A’ through ‘Z’ are not contiguous
there as they are in ASCII. If you can rely on a POSIX-compliant
version of ‘tr’, the best way to work around this is to use
character classes (see below). Otherwise, the most portable (and
ugliest) approach is to enumerate the members of the ranges.
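Three ways to upcase text, as a sketch assuming a POSIX shell on an ASCII host. All three print the same result there, but only the class and enumeration forms avoid the EBCDIC range problem:

```shell
# Range: fine on ASCII hosts, not portable to EBCDIC.
printf 'Hello\n' | tr 'a-z' 'A-Z'               # prints: HELLO
# Character classes: portable to any POSIX-compliant tr.
printf 'Hello\n' | tr '[:lower:]' '[:upper:]'   # prints: HELLO
# Enumeration: the most portable, and the most verbose.
printf 'Hello\n' | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ  # prints: HELLO
```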

Repeated characters

The notation ‘[C*N]’ in SET2 expands to N copies of character C.
Thus, ‘[y*6]’ is the same as ‘yyyyyy’. The notation ‘[C*]’ in SET2
expands to as many copies of C as are needed to make SET2 as long
as SET1. If N begins with ‘0’, it is interpreted in octal,
otherwise in decimal.
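A sketch of both repeat forms and of the octal rule, assuming GNU ‘tr’ and a POSIX shell:

```shell
# Assumes GNU tr and a POSIX shell.
# '[y*6]' is six 'y's, one for each of 'a' through 'f'.
printf 'abcdef\n' | tr 'a-f' '[y*6]'       # prints: yyyyyy
# A leading 0 makes N octal: '[y*010]' is eight 'y's.
printf 'abcdefgh\n' | tr 'a-h' '[y*010]'   # prints: yyyyyyyy
# '[x*]' pads SET2 to the length of SET1: 'A' plus five 'x's.
printf 'abcdef\n' | tr 'a-f' 'A[x*]'       # prints: Axxxxx
```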

Character classes

The notation ‘[:CLASS:]’ expands to all of the characters in the
(predefined) class CLASS. The characters expand in no particular
order, except for the ‘upper’ and ‘lower’ classes, which expand in
ascending order. When the ‘--delete’ (‘-d’) and
‘--squeeze-repeats’ (‘-s’) options are both given, any character
class can be used in SET2. Otherwise, only the character classes
‘lower’ and ‘upper’ are accepted in SET2, and then only if the
corresponding character class (‘upper’ and ‘lower’, respectively)
is specified in the same relative position in SET1. Such a pairing
specifies case conversion. The class names are given below; an
error results when an invalid class name is given.

‘alnum’
    Letters and digits.
‘alpha’
    Letters.
‘blank’
    Horizontal whitespace.
‘cntrl’
    Control characters.
‘digit’
    Digits.
‘graph’
    Printable characters, not including space.
‘lower’
    Lowercase letters.
‘print’
    Printable characters, including space.
‘punct’
    Punctuation characters.
‘space’
    Horizontal or vertical whitespace.
‘upper’
    Uppercase letters.
‘xdigit’
    Hexadecimal digits.
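Some class usages, as a sketch assuming GNU ‘tr’ and a POSIX shell. With both ‘-d’ and ‘-s’, SET1 names the characters to delete and SET2 the characters whose repeated runs are squeezed to one:

```shell
# Assumes GNU tr and a POSIX shell.

# 'lower' in SET1 paired with 'upper' in SET2: case conversion.
printf 'Hello, World\n' | tr '[:lower:]' '[:upper:]'  # prints: HELLO, WORLD

# Delete every digit.
printf 'r2d2\n' | tr -d '[:digit:]'                   # prints: rd

# With -d and -s, any class may appear in SET2: digits are deleted,
# then runs of blanks are squeezed to a single blank.
printf 'a1  b22\n' | tr -ds '[:digit:]' '[:blank:]'   # prints: a b

# Without -d and -s, 'digit' is not accepted in SET2, so tr fails.
tr 'a' '[:digit:]' </dev/null 2>/dev/null || echo 'rejected'
```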

Equivalence classes

The syntax ‘[=C=]’ expands to all of the characters that are
equivalent to C, in no particular order. Equivalence classes are a
relatively recent invention intended to support non-English
alphabets. But there seems to be no standard way to define them or
determine their contents. Therefore, they are not fully
implemented in GNU ‘tr’; each character’s equivalence class
consists only of that character, which is of no particular use.
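Because each equivalence class holds only its own character, ‘[=e=]’ behaves exactly like a plain ‘e’ in GNU ‘tr’; a sketch assuming a POSIX shell:

```shell
# Assumes GNU tr and a POSIX shell.
# '[=e=]' contains only 'e' itself, so this is the same as: tr -d 'e'
printf 'eEe\n' | tr -d '[=e=]'    # prints: E
```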