sed: BRE syntax

5.3 Overview of basic regular expression syntax
===============================================

Here is a brief description of regular expression syntax as used in
'sed'.

'CHAR'
     A single ordinary character matches itself.

'*'
     Matches a sequence of zero or more matches of the preceding
     regular expression, which must be an ordinary character, a special
     character preceded by '\', a '.', a grouped regexp (see below), or
     a bracket expression.  As a GNU extension, a postfixed regular
     expression can also be followed by '*'; for example, 'a**' is
     equivalent to 'a*'.  POSIX 1003.1-2001 says that '*' stands for
     itself when it appears at the start of a regular expression or
     subexpression, but many non-GNU implementations do not support
     this, and portable scripts should instead use '\*' in these
     contexts.

'.'
     Matches any character, including newline.
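'sed' normally removes the trailing newline from each input line, so the easiest way to see '.' match a newline is to join two lines in the pattern space with the 'N' command. This is a sketch that assumes GNU 'sed'.

```shell
# 'N' appends the next input line to the pattern space, separated by a
# newline; the '.' in 'a.b' then matches that embedded newline.
printf 'a\nb\n' | sed 'N;s/a.b/X/'    # prints: X
```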

'^'
     Matches the null string at the beginning of the pattern space,
     i.e. what appears after the circumflex must appear at the
     beginning of the pattern space.

     In most scripts, pattern space is initialized to the content of
     each line (see How 'sed' works).  So, it is a useful
     simplification to think of '^#include' as matching only lines
     where '#include' is the first thing on the line; if there are
     spaces before it, for example, the match fails.  This
     simplification is valid as long as the original content of
     pattern space is not modified, for example with an 's' command.

     '^' acts as a special character only at the beginning of the
     regular expression or subexpression (that is, after '\(' or '\|').
     Portable scripts should avoid '^' at the beginning of a
     subexpression, though, as POSIX allows implementations that treat
     '^' as an ordinary character in that context.
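The following commands, which assume GNU 'sed', show both the simplification and where it stops applying. In the second command, an 's' command first deletes the leading spaces. '^' then anchors at the start of the modified pattern space, not at the start of the original line.

```shell
# Only the first line starts with '#include', so only it is printed.
printf '#include <a.h>\n  #include <b.h>\n' | sed -n '/^#include/p'
# prints: #include <a.h>

# After 's/^ *//' strips the leading spaces, '^#include' matches.
printf '  #include <b.h>\n' | sed -n 's/^ *//; /^#include/p'
# prints: #include <b.h>
```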

'$'
     Like '^', but matches the null string at the end of the pattern
     space.  '$' also acts as a special character only at the end of
     the regular expression or subexpression (that is, before '\)' or
     '\|'), and its use at the end of a subexpression is not portable.
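Here is a minimal illustration, assuming GNU 'sed'. '/;$/' selects only the lines whose last character is a semicolon.

```shell
# 'baz; qux' contains ';' but does not end with it, so it is skipped.
printf 'foo;\nbar\nbaz; qux\n' | sed -n '/;$/p'    # prints: foo;
```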

'[LIST]'
'[^LIST]'
     Matches any single character in LIST: for example, '[aeiou]'
     matches all vowels.  '[^LIST]' matches any single character that
     is not in LIST.  A list may include sequences like 'CHAR1-CHAR2',
     which matches any character between (inclusive) CHAR1 and CHAR2.
     See Character Classes and Bracket Expressions.
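The commands below sketch a plain list and a negated list, assuming GNU 'sed'. The 'g' flag makes 's' replace every match rather than only the first.

```shell
# '[aeiou]' matches any one vowel.
printf '%s\n' 'hello world' | sed 's/[aeiou]/_/g'   # prints: h_ll_ w_rld
# '[^0-9]' matches any one character that is not a digit.
printf '%s\n' 'abc123' | sed 's/[^0-9]//g'          # prints: 123
```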

'\+'
     As '*', but matches one or more.  It is a GNU extension.

'\?'
     As '*', but only matches zero or one.  It is a GNU extension.
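Both operators are GNU extensions, so this sketch assumes GNU 'sed'. Other implementations may treat '\+' and '\?' differently.

```shell
# '\+': one or more 'a's.
printf '%s\n' 'aaab' | sed 's/a\+/X/'                # prints: Xb
# '\?': an optional 'u', so both spellings match.
printf '%s\n' 'color colour' | sed 's/colou\?r/C/g'  # prints: C C
```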

'\{I\}'
     As '*', but matches exactly I sequences (I is a decimal integer;
     for portability, keep it between 0 and 255 inclusive).

'\{I,J\}'
     Matches between I and J sequences, inclusive.

'\{I,\}'
     Matches I or more sequences.
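These commands illustrate the three interval forms and assume GNU 'sed'. In the 'a\{2,3\}' case the matcher takes the longest allowed run, three 'a's.

```shell
# Exactly two 'a's, replaced globally: two matches, one 'a' left over.
printf '%s\n' 'aaaaa' | sed 's/a\{2\}/X/g'     # prints: XXa
# Between two and three 'a's: the longest possible match is used.
printf '%s\n' 'aaaaa' | sed 's/a\{2,3\}/X/'    # prints: Xaa
# Two or more 'b's.
printf '%s\n' 'abbbbc' | sed 's/b\{2,\}/X/'    # prints: aXc
```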

'\(REGEXP\)'
     Groups the inner REGEXP as a whole.  This is used to:

        * Apply postfix operators, like '\(abcd\)*': this will search
          for zero or more whole sequences of 'abcd', while 'abcd*'
          would search for 'abc' followed by zero or more occurrences of
          'd'.  Note that support for '\(abcd\)*' is required by POSIX
          1003.1-2001, but many non-GNU implementations do not support
          it and hence it is not universally portable.

        * Use back references (see below).
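Running both patterns on the same input shows the difference. This sketch assumes GNU 'sed', which supports '\(abcd\)*'.

```shell
# The group repeats as a unit, so both copies of 'abcd' are consumed.
printf '%s\n' 'abcdabcd' | sed 's/\(abcd\)*/X/'   # prints: X
# Here '*' applies only to the final 'd', so only the first 'abcd' matches.
printf '%s\n' 'abcdabcd' | sed 's/abcd*/X/'       # prints: Xabcd
```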

'REGEXP1\|REGEXP2'
     Matches either REGEXP1 or REGEXP2.  Use '\(...\)' to limit an
     alternation to part of a larger regular expression.  The matching
     process tries each alternative in turn, from left to right, and
     the first one that succeeds is used.  It is a GNU extension.
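Here is a short sketch of alternation with and without grouping. '\|' is a GNU extension, so GNU 'sed' is assumed.

```shell
# Either 'cat' or 'dog', replaced everywhere.
printf '%s\n' 'cat dog bird' | sed 's/cat\|dog/pet/g'   # prints: pet pet bird
# The group confines the alternation to the middle letter.
printf '%s\n' 'grey gray' | sed 's/gr\(e\|a\)y/G/g'      # prints: G G
```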

'REGEXP1REGEXP2'
     Matches the concatenation of REGEXP1 and REGEXP2.  Concatenation
     binds more tightly than '\|', '^', and '$', but less tightly than
     the other regular expression operators.
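One consequence of these precedence rules is that '^ab\|cd' means "'ab' at the start, or 'cd' anywhere". A sketch, assuming GNU 'sed':

```shell
# Parsed as '^ab' OR 'cd': 'xcd' contains 'cd', so it is printed.
printf '%s\n' 'xcd' | sed -n '/^ab\|cd/p'       # prints: xcd
# With grouping, '^' anchors both alternatives, so nothing is printed.
printf '%s\n' 'xcd' | sed -n '/^\(ab\|cd\)/p'
```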

'\DIGIT'
     Matches the same text that was matched by the DIGIT-th '\(...\)'
     parenthesized subexpression in the regular expression.  This is
     called a "back reference".  Subexpressions are implicitly numbered
     by counting occurrences of '\(' left-to-right.
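A common use of back references is finding a repeated word. This sketch assumes GNU 'sed', since it also uses the GNU '\+' operator.

```shell
# '\([a-z]\+\)' captures a word; '\1' requires the same text again after
# a space, so only the line with a doubled word is printed.
printf 'the the cat\nthe cat\n' | sed -n '/\([a-z]\+\) \1/p'   # prints: the the cat
```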

'\n'
     Matches the newline character.
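A newline appears in the pattern space only after lines are joined, for example with 'N'. This sketch assumes GNU 'sed'.

```shell
# 'N' joins two lines; '\n' matches the newline between them.
printf 'a\nb\n' | sed 'N;s/\n/,/'    # prints: a,b
```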

'\CHAR'
     Matches CHAR, where CHAR is one of '$', '*', '.', '[', '\', or '^'.
     Note that the only C-like backslash sequences that you can portably
     assume to be interpreted are '\n' and '\\'; in particular '\t' is
     not portable, and matches a 't' under most implementations of
     'sed', rather than a tab character.
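Escaping turns a special character back into an ordinary one. A minimal example, assuming GNU 'sed':

```shell
# '\.' and '\*' match a literal period and a literal asterisk.
printf '%s\n' 'a.b*c' | sed 's/\./DOT/;s/\*/STAR/'   # prints: aDOTbSTARc
```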

   Note that the regular expression matcher is greedy, i.e., matches are
attempted from left to right and, if two or more matches are possible
starting at the same character, it selects the longest.
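The effect of greediness is easy to see when a pattern could stop at more than one place. This sketch assumes GNU 'sed'.

```shell
# Matches could end at either 'X'; the longest one, ending at the last
# 'X', is selected.
printf '%s\n' 'aXbXc' | sed 's/a.*X/Y/'    # prints: Yc
```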

Examples:
'abcdef'
     Matches 'abcdef'.

'a*b'
     Matches zero or more 'a's followed by a single 'b'.  For example,
     'b' or 'aaaaab'.

'a\?b'
     Matches 'b' or 'ab'.

'a\+b\+'
     Matches one or more 'a's followed by one or more 'b's: 'ab' is the
     shortest possible match, but other examples are 'aaaab' or 'abbbbb'
     or 'aaaaaabbbbbbb'.

'.*'
'.\+'
     These two both match all the characters in a string; however, the
     first matches every string (including the empty string), while the
     second matches only strings containing at least one character.

'^main.*(.*)'
     This matches a string starting with 'main', followed by an opening
     and a closing parenthesis.  The 'n', '(' and ')' need not be
     adjacent.

'^#'
     This matches a string beginning with '#'.

'\\$'
     This matches a string ending with a single backslash.  The regexp
     contains two backslashes for escaping.

'\$'
     Instead, this matches a literal dollar sign, because the '$' is
     escaped and therefore does not anchor the match to the end of the
     string.

'[a-zA-Z0-9]'
     In the C locale, this matches any ASCII letter or digit.

'[^ 'tab']\+'
     (Here 'tab' stands for a single tab character.)  This matches a
     string of one or more characters, none of which is a space or a
     tab.  Usually this means a word.

'^\(.*\)\n\1$'
     This matches a string consisting of two equal substrings separated
     by a newline.

'.\{9\}A$'
     This matches nine characters followed by an 'A' at the end of a
     line.

'^.\{15\}A'
     This matches the first 16 characters of a string whose 16th
     character is an 'A'.
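Several of the examples above can be tried directly. The commands below assume GNU 'sed' and a POSIX shell. '\1' again needs 'N' so that the pattern space contains a newline.

```shell
# '^\(.*\)\n\1$': two equal lines joined by 'N' (prints foo twice).
printf 'foo\nfoo\n' | sed -n 'N;/^\(.*\)\n\1$/p'
# '\\$': only the line that ends with a backslash is printed.
printf '%s\n' 'path\' 'path' | sed -n '/\\$/p'        # prints: path\
# '\$': a literal dollar sign anywhere in the line.
printf '%s\n' 'cost: $5' | sed 's/\$/USD /'           # prints: cost: USD 5
# '^.\{15\}A': the 16th character must be 'A'.
printf '%s\n' '123456789012345A' | sed -n '/^.\{15\}A/p'
```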