sed: BRE syntax
1
1 5.3 Overview of basic regular expression syntax
1 ===============================================
1
1 Here is a brief description of regular expression syntax as used in
1 'sed'.
1
1 'CHAR'
1 A single ordinary character matches itself.
1
1 '*'
1 Matches a sequence of zero or more instances of matches for the
1 preceding regular expression, which must be an ordinary character,
1 a special character preceded by '\', a '.', a grouped regexp (see
1 below), or a bracket expression. As a GNU extension, a postfixed
1 regular expression can also be followed by '*'; for example, 'a**'
1 is equivalent to 'a*'. POSIX 1003.1-2001 says that '*' stands for
1 itself when it appears at the start of a regular expression or
1 subexpression, but many non-GNU implementations do not support this,
1 and portable scripts should instead use '\*' in these contexts.
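
As a small illustration of these rules (a sketch that assumes a POSIX
shell; the sample inputs are invented for this example):

```shell
# 'a*' can match zero characters, so the first match is the empty
# string at the very start of the line.
echo 'baaac' | sed 's/a*/X/'     # prints: Xbaaac

# Attaching the repetition to a preceding 'b' consumes the run of 'a's.
echo 'baaac' | sed 's/ba*/X/'    # prints: Xc

# '\*' matches a literal asterisk.
echo 'a*b' | sed 's/\*/STAR/'    # prints: aSTARb
```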
1
1 '.'
1 Matches any character, including newline.
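
The newline case typically arises only when pattern space holds more
than one line, for example after an 'N' command. A minimal sketch,
assuming GNU 'sed' (the input is invented for illustration):

```shell
# N appends the next input line, so pattern space holds "a", a newline,
# and "b"; the '.' matches that embedded newline.
printf 'a\nb\n' | sed 'N;s/a.b/X/'    # prints: X
```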
1
1 '^'
1 Matches the null string at beginning of the pattern space, i.e.
1 what appears after the circumflex must appear at the beginning of
1 the pattern space.
1
1 In most scripts, pattern space is initialized to the content of
1 each line (see How 'sed' works: Execution Cycle). So, it is a
1 useful simplification to think of '^#include' as matching only
1 lines where '#include' is the first thing on the line; if there are
1 spaces before it, for example, the match fails. This simplification
1 is valid as long as the original content of pattern space is not
1 modified, for example with an 's' command.
1
1 '^' acts as a special character only at the beginning of the
1 regular expression or subexpression (that is, after '\(' or '\|').
1 Portable scripts should avoid '^' at the beginning of a
1 subexpression, though, as POSIX allows implementations that treat
1 '^' as an ordinary character in that context.
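
The behavior described above can be sketched as follows (assuming GNU
'sed'; the header names are made up for the example):

```shell
# Only the line that starts with '#include' is printed; the indented
# one does not match.
printf '#include <a.h>\n  #include <b.h>\n' | sed -n '/^#include/p'
# prints: #include <a.h>

# Once an 's' command has changed pattern space, '^' refers to the
# start of the modified text, not of the original line.
echo '  #include <b.h>' | sed -n 's/^ *//; /^#include/p'
# prints: #include <b.h>

# In the middle of a regular expression, '^' is an ordinary character.
echo 'a^b' | sed 's/a^b/X/'    # prints: X
```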
1
1 '$'
1 It is the same as '^', but refers to end of pattern space. '$'
1 also acts as a special character only at the end of the regular
1 expression or subexpression (that is, before '\)' or '\|'), and its
1 use at the end of a subexpression is not portable.
1
1 '[LIST]'
1 '[^LIST]'
1 Matches any single character in LIST: for example, '[aeiou]'
1 matches all vowels. With a leading '^', matches any single
1 character not in LIST. A list may include sequences like
1 'CHAR1-CHAR2', which matches any character between (inclusive)
1 CHAR1 and CHAR2. See Character Classes and Bracket Expressions.
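
A few bracket expressions in action (a sketch with invented input; a
list that begins with '^' matches any character not in the list, and
the range example assumes the C locale):

```shell
# Replace every vowel.
echo 'hello world' | sed 's/[aeiou]/_/g'    # prints: h_ll_ w_rld

# Replace every character that is NOT a lowercase letter.
echo 'abc-123' | LC_ALL=C sed 's/[^a-z]/_/g'    # prints: abc____

# A range matches any single character between its endpoints.
echo 'x7y' | LC_ALL=C sed 's/[0-9]/#/'    # prints: x#y
```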
1
1 '\+'
1 As '*', but matches one or more. It is a GNU extension.
1
1 '\?'
1 As '*', but only matches zero or one. It is a GNU extension.
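
The two GNU extensions can be compared side by side. This is a sketch
assuming GNU 'sed'; the last command shows one possible portable
spelling of '\+' that uses only '*':

```shell
# 'a\?b' matches 'b' or 'ab'; in 'aab' only the trailing 'ab' matches.
echo 'b ab aab' | sed 's/a\?b/X/g'    # prints: X X aX

# 'a\+b' needs at least one 'a', so the lone 'b' is left alone.
echo 'b ab aab' | sed 's/a\+b/X/g'    # prints: b X X

# 'aa*b' is equivalent to 'a\+b' and does not rely on the extension.
echo 'b ab aab' | sed 's/aa*b/X/g'    # prints: b X X
```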
1
1 '\{I\}'
1 As '*', but matches exactly I sequences (I is a decimal integer;
1 for portability, keep it between 0 and 255 inclusive).
1
1 '\{I,J\}'
1 Matches between I and J sequences, inclusive.
1
1 '\{I,\}'
1 Matches I or more sequences.
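
The three interval forms can be contrasted on the same input (a
sketch; the input 'aaaa' is chosen only for illustration):

```shell
# Exactly two 'a's are replaced; the other two remain.
echo 'aaaa' | sed 's/a\{2\}/X/'      # prints: Xaa

# Between two and three: the matcher takes three, leaving one.
echo 'aaaa' | sed 's/a\{2,3\}/X/'    # prints: Xa

# Two or more: all four are consumed.
echo 'aaaa' | sed 's/a\{2,\}/X/'     # prints: X
```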
1
1 '\(REGEXP\)'
1 Groups the inner REGEXP as a whole. This is used to:
1
1 * Apply postfix operators, like '\(abcd\)*': this will search
1 for zero or more whole sequences of 'abcd', while 'abcd*'
1 would search for 'abc' followed by zero or more occurrences of
1 'd'. Note that support for '\(abcd\)*' is required by POSIX
1 1003.1-2001, but many non-GNU implementations do not support
1 it and hence it is not universally portable.
1
1 * Use back references (see below).
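
The difference between '\(abcd\)*' and 'abcd*' can be made visible by
bracketing the matched text with '&' (a sketch assuming GNU 'sed';
the inputs are invented):

```shell
# The group is repeated as a unit: both copies of 'abcd' are matched.
echo 'abcdabcdX' | sed 's/\(abcd\)*/[&]/'    # prints: [abcdabcd]X

# Without grouping, '*' applies only to the final 'd'.
echo 'abcddd' | sed 's/abcd*/[&]/'           # prints: [abcddd]
```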
1
1 'REGEXP1\|REGEXP2'
1 Matches either REGEXP1 or REGEXP2. Use parentheses to group
1 complex alternatives. The matching process tries each
1 alternative in turn, from left to right, and the first one that
1 succeeds is used. It is a GNU extension.
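
A short sketch of alternation and of why grouping matters (this
assumes GNU 'sed', since '\|' is an extension; the words are
arbitrary):

```shell
# Either alternative is replaced.
echo 'cat dog bird' | sed 's/cat\|dog/pet/g'    # prints: pet pet bird

# Without parentheses the trailing 's' belongs only to 'dog'.
echo 'cats dogs' | sed 's/cat\|dogs/X/g'        # prints: Xs X

# With parentheses the 's' follows whichever alternative matched.
echo 'cats dogs' | sed 's/\(cat\|dog\)s/\1/g'   # prints: cat dog
```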
1
1 'REGEXP1REGEXP2'
1 Matches the concatenation of REGEXP1 and REGEXP2. Concatenation
1 binds more tightly than '\|', '^', and '$', but less tightly than
1 the other regular expression operators.
1
1 '\DIGIT'
1 Matches the DIGIT-th '\(...\)' parenthesized subexpression in the
1 regular expression. This is called a "back reference".
1 Subexpressions are implicitly numbered by counting occurrences of
1 '\(' left-to-right.
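
Back references can be used both inside the regular expression and in
the replacement of an 's' command. A minimal sketch with invented
input:

```shell
# '\1' in the regexp must repeat what the first group matched, so this
# collapses a doubled word.
echo 'hello hello world' | sed 's/\([a-z]*\) \1/\1/'
# prints: hello world

# Groups are numbered by their opening '\(' from left to right.
echo 'abab cdcd' | sed 's/\(.\)\(.\)\1\2/<\2\1>/g'
# prints: <ba> <dc>
```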
1
1 '\n'
1 Matches the newline character.
1
1 '\CHAR'
1 Matches CHAR, where CHAR is one of '$', '*', '.', '[', '\', or '^'.
1 Note that the only C-like backslash sequences that you can portably
1 assume to be interpreted are '\n' and '\\'; in particular '\t' is
1 not portable, and matches a 't' under most implementations of
1 'sed', rather than a tab character.
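
The effect of escaping can be sketched as follows (the inputs are
invented; single quotes keep the shell from touching the
backslashes):

```shell
# Escaped '$' and '.' match themselves.
echo 'price: $5.00' | sed 's/\$5\.00/FREE/'    # prints: price: FREE

# An unescaped '.' matches any character, here the 'x'.
echo '5x00' | sed 's/5.00/MATCH/'     # prints: MATCH

# With the '.' escaped there is no match and the line is unchanged.
echo '5x00' | sed 's/5\.00/MATCH/'    # prints: 5x00

# '\n' matches the newline that N places between the two lines.
printf 'a\nb\n' | sed 'N;s/\n/+/'     # prints: a+b
```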
1
1 Note that the regular expression matcher is greedy, i.e., matches are
1 attempted from left to right and, if two or more matches are possible
1 starting at the same character, it selects the longest.
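
Both halves of this rule (leftmost first, then longest) can be seen
in a short sketch; the inputs are invented for illustration:

```shell
# Longest: '.*' runs to the last '>', so the whole line is replaced.
echo '<a><b>' | sed 's/<.*>/X/'        # prints: X

# A bracket expression that excludes '>' is a common way to stop at
# the first closing bracket instead.
echo '<a><b>' | sed 's/<[^>]*>/X/'     # prints: X<b>

# Leftmost: the single 'a' is chosen even though a longer run of 'a's
# appears later in the line.
echo 'xaxaa' | sed 's/aa*/[&]/'        # prints: x[a]xaa
```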
1
1 Examples:
1
1 'abcdef'
1 Matches 'abcdef'.
1
1 'a*b'
1 Matches zero or more 'a's followed by a single 'b'. For example,
1 'b' or 'aaaaab'.
1
1 'a\?b'
1 Matches 'b' or 'ab'.
1
1 'a\+b\+'
1 Matches one or more 'a's followed by one or more 'b's: 'ab' is the
1 shortest possible match, but other examples are 'aaaab' or 'abbbbb'
1 or 'aaaaaabbbbbbb'.
1
1 '.*'
1 '.\+'
1 These two both match all the characters in a string; however, the
1 first matches every string (including the empty string), while the
1 second matches only strings containing at least one character.
1
1 '^main.*(.*)'
1 This matches a string starting with 'main', followed by an opening
1 and closing parenthesis. The 'n', '(' and ')' need not be
1 adjacent.
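
Note that unescaped parentheses are ordinary characters here. A
sketch with made-up input lines:

```shell
# The second line is skipped because it does not start with 'main'.
printf 'main(void)\nint main()\nmain_loop (x)\n' | sed -n '/^main.*(.*)/p'
# prints:
#   main(void)
#   main_loop (x)
```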
1
1 '^#'
1 This matches a string beginning with '#'.
1
1 '\\$'
1 This matches a string ending with a single backslash. The regexp
1 contains two backslashes for escaping.
1
1 '\$'
1 In contrast, this matches a literal dollar sign anywhere in the
1 string, because the '$' is escaped.
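
The two patterns can be compared on the same input (a sketch; the
three sample lines are invented, and 'printf' with '%s' is used so
that the shell passes the backslash through unchanged):

```shell
# '\\$': a literal backslash at the end of the line.
printf '%s\n' 'path\' 'cost$' 'plain' | sed -n '/\\$/p'    # prints: path\

# '\$': a literal dollar sign; the '$' no longer anchors anything.
printf '%s\n' 'path\' 'cost$' 'plain' | sed -n '/\$/p'     # prints: cost$
```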
1
1 '[a-zA-Z0-9]'
1 In the C locale, this matches any ASCII letters or digits.
1
1 '[^ TAB]\+'
1 (Here TAB stands for a single tab character.) This matches a
1 string of one or more characters, none of which is a space or a
1 tab. Usually this means a word.
1
1 '^\(.*\)\n\1$'
1 This matches a string consisting of two equal substrings separated
1 by a newline.
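
Since pattern space normally holds a single line, this pattern is
useful only after a command such as 'N' has joined two lines. A
minimal sketch, assuming GNU 'sed':

```shell
# The two lines are identical, so the joined pattern space matches
# and is printed.
printf 'same\nsame\n' | sed -n 'N;/^\(.*\)\n\1$/p'
# prints:
#   same
#   same

# Different lines: nothing is printed.
printf 'one\ntwo\n' | sed -n 'N;/^\(.*\)\n\1$/p'
```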
1
1 '.\{9\}A$'
1 This matches nine characters followed by an 'A' at the end of a
1 line.
1
1 '^.\{15\}A'
1 This matches the start of a string that contains 16 characters, the
1 last of which is an 'A'.
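
The last two examples can be checked by bracketing the matched text
with '&' (a sketch; the digit strings are only there to make the
character counts easy to see):

```shell
# Nine characters and then 'A' at the end of the line.
echo '123456789A' | sed 's/.\{9\}A$/[&]/'    # prints: [123456789A]

# Only eight characters precede the 'A': no match, line unchanged.
echo '12345678A' | sed 's/.\{9\}A$/[&]/'     # prints: 12345678A

# Fifteen characters from the start, then 'A'; the rest is untouched.
echo '123456789012345A rest' | sed 's/^.\{15\}A/[&]/'
# prints: [123456789012345A] rest
```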
1