gawk: Escape Sequences

3.2 Escape Sequences
====================

Some characters cannot be included literally in string constants
('"foo"') or regexp constants ('/foo/').  Instead, they should be
represented with "escape sequences", which are character sequences
beginning with a backslash ('\').  One use of an escape sequence is to
include a double-quote character in a string constant.  Because a plain
double quote ends the string, you must use '\"' to represent an actual
double-quote character as a part of the string.  For example:

     $ awk 'BEGIN { print "He said \"hi!\" to her." }'
     -| He said "hi!" to her.

   The backslash character itself is another character that cannot be
included normally; you must write '\\' to put one backslash in the
string or regexp.  Thus, the string whose contents are the two
characters '"' and '\' must be written '"\"\\"'.
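   As a quick check of the quoting rules above, the following sketch
prints that two-character string and then its length.  The shell
variable names are illustrative only, and the commands assume a POSIX
shell with some 'awk' on the search path:

```shell
# The awk string constant "\"\\" holds two characters:
# a double quote followed by a backslash.
two_chars=$(awk 'BEGIN { print "\"\\" }')
printf '%s\n' "$two_chars"      # prints a double quote, then a backslash

# length() confirms that the four escape characters collapsed to two.
count=$(awk 'BEGIN { print length("\"\\") }')
printf '%s\n' "$count"          # prints 2
```

   The whole 'awk' program sits inside single quotes, so the shell
passes the backslashes through untouched and only 'awk' interprets them.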

   Other escape sequences represent unprintable characters such as TAB
or newline.  There is nothing to stop you from entering most unprintable
characters directly in a string constant or regexp constant, but they
may look ugly.

   The following list presents all the escape sequences used in 'awk'
and what they represent.  Unless noted otherwise, all these escape
sequences apply to both string constants and regexp constants:

'\\'
     A literal backslash, '\'.

'\a'
     The "alert" character, 'Ctrl-g', ASCII code 7 (BEL).  (This often
     makes some sort of audible noise.)

'\b'
     Backspace, 'Ctrl-h', ASCII code 8 (BS).

'\f'
     Formfeed, 'Ctrl-l', ASCII code 12 (FF).

'\n'
     Newline, 'Ctrl-j', ASCII code 10 (LF).

'\r'
     Carriage return, 'Ctrl-m', ASCII code 13 (CR).

'\t'
     Horizontal TAB, 'Ctrl-i', ASCII code 9 (HT).

'\v'
     Vertical TAB, 'Ctrl-k', ASCII code 11 (VT).

'\NNN'
     The octal value NNN, where NNN stands for 1 to 3 digits between '0'
     and '7'.  For example, the code for the ASCII ESC (escape)
     character is '\033'.

'\xHH...'
     The hexadecimal value HH, where HH stands for a sequence of
     hexadecimal digits ('0'-'9', and either 'A'-'F' or 'a'-'f').  A
     maximum of two digits is allowed after the '\x'.  Any further
     hexadecimal digits are treated as simple letters or numbers.
     (c.e.)  (The '\x' escape sequence is not allowed in POSIX awk.)

          CAUTION: In ISO C, the escape sequence continues until the
          first non-hexadecimal digit is seen.  For many years, 'gawk'
          would continue incorporating hexadecimal digits into the value
          until a non-hexadecimal digit or the end of the string was
          encountered.  However, using more than two hexadecimal digits
          produced undefined results.  As of version 4.2, only two
          digits are processed.

'\/'
     A literal slash (necessary for regexp constants only).  This
     sequence is used when you want to write a regexp constant that
     contains a slash (such as '/.*:\/home\/[[:alnum:]]+:.*/'; the
     '[[:alnum:]]' notation is discussed in ⇒Bracket Expressions).
     Because the regexp is delimited by slashes, you need to escape any
     slash that is part of the pattern, in order to tell 'awk' to keep
     processing the rest of the regexp.

'\"'
     A literal double quote (necessary for string constants only).  This
     sequence is used when you want to write a string constant that
     contains a double quote (such as '"He said \"hi!\" to her."').
     Because the string is delimited by double quotes, you need to
     escape any quote that is part of the string, in order to tell 'awk'
     to keep processing the rest of the string.
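   A few of the sequences in the list can be tried out as follows.
This is only a sketch: the variable names and the sample input line are
made up for illustration, and it sticks to escapes that POSIX 'awk'
defines (so no '\x'):

```shell
# \t and \n in a printf format: a TAB between the words, a newline at the end.
tabbed=$(awk 'BEGIN { printf "name\tvalue\n" }')
printf '%s\n' "$tabbed"

# Octal escapes: \101, \102 and \103 are the ASCII codes for A, B and C.
octal=$(awk 'BEGIN { print "\101\102\103" }')
printf '%s\n' "$octal"          # ABC

# \/ keeps a slash from ending the regexp constant early.
slash=$(printf '%s\n' 'root:/home/root:sh' |
    awk '/:\/home\// { print "has home dir" }')
printf '%s\n' "$slash"          # has home dir
```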

   In 'gawk', a number of additional two-character sequences that begin
with a backslash have special meaning in regexps.  ⇒GNU Regexp
Operators.

   In a regexp, a backslash before any character that is not in the
previous list and not listed in ⇒GNU Regexp Operators means that
the next character should be taken literally, even if it would normally
be a regexp operator.  For example, '/a\+b/' matches the three
characters 'a+b'.

   For complete portability, do not use a backslash before any character
that is neither shown in the previous list nor a regexp operator.
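   The '/a\+b/' case can be compared with its unescaped form using a
short sketch like the one below.  The two input lines and the variable
names are invented for the illustration:

```shell
# \+ makes the plus sign literal, so only the text "a+b" matches.
literal=$(printf '%s\n' 'a+b' 'aab' |
    awk '/a\+b/ { print "matched: " $0 }')
printf '%s\n' "$literal"        # matched: a+b

# Without the backslash, + is the "one or more" operator,
# so the pattern needs an "a" directly followed by "b".
operator=$(printf '%s\n' 'a+b' 'aab' |
    awk '/a+b/ { print "matched: " $0 }')
printf '%s\n' "$operator"       # matched: aab
```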

                  Backslash Before Regular Characters

   If you place a backslash in a string constant before something that
is not one of the characters previously listed, POSIX 'awk' purposely
leaves what happens as undefined.  There are two choices:

Strip the backslash out
     This is what BWK 'awk' and 'gawk' both do.  For example, '"a\qc"'
     is the same as '"aqc"'.  (Because this is such an easy bug both to
     introduce and to miss, 'gawk' warns you about it.)  Consider 'FS =
     "[ \t]+\|[ \t]+"', intended to use vertical bars surrounded by
     whitespace as the field separator.  With the backslash stripped,
     the bar becomes the alternation operator instead.  There should be
     two backslashes in the string: 'FS = "[ \t]+\\|[ \t]+"'.

Leave the backslash alone
     Some other 'awk' implementations do this.  In such implementations,
     typing '"a\qc"' is the same as typing '"a\\qc"'.
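   The two-backslash form of the field separator can be exercised with
a sketch such as this one.  The input record and the variable name are
assumptions made for the example; only the 'FS' value comes from the
text above:

```shell
# Inside the string, \\ yields a single backslash, so the regexp that
# awk finally uses is [ \t]+\|[ \t]+ and the bar is matched literally.
second=$(printf '%s\n' 'alpha | beta | gamma' |
    awk 'BEGIN { FS = "[ \t]+\\|[ \t]+" } { print $2 }')
printf '%s\n' "$second"         # beta
```

   The single-backslash form is deliberately left out of the sketch,
because its result depends on which of the two choices an
implementation makes.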

   To summarize:

   * The escape sequences in the preceding list are always processed
     first, for both string constants and regexp constants.  This
     happens very early, as soon as 'awk' reads your program.

   * 'gawk' processes both regexp constants and dynamic regexps
     (⇒Computed Regexps), for the special operators listed in ⇒GNU
     Regexp Operators.

   * A backslash before any other character means to treat that
     character literally.
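   One practical consequence of this ordering is that a string used as
a dynamic regexp is escape-processed before the regexp matcher sees it.
The sketch below, with invented input and variable names, matches a
literal dot both ways:

```shell
# Regexp constant: one backslash is enough to make the dot literal.
constant=$(printf '%s\n' 'a.b' 'axb' | awk '$0 ~ /a\.b/ { print $0 }')
printf '%s\n' "$constant"       # a.b

# Dynamic regexp: string escape processing turns "\\" into one
# backslash, and the regexp matcher then sees a backslash and a dot.
dynamic=$(printf '%s\n' 'a.b' 'axb' | awk '$0 ~ "a\\.b" { print $0 }')
printf '%s\n' "$dynamic"        # a.b
```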

                  Escape Sequences for Metacharacters

   Suppose you use an octal or hexadecimal escape to represent a regexp
metacharacter.  (See ⇒Regexp Operators.)  Does 'awk' treat the
character as a literal character or as a regexp operator?

   Historically, such characters were taken literally.  (d.c.)  However,
the POSIX standard indicates that they should be treated as real
metacharacters, which is what 'gawk' does.  In compatibility mode
(⇒Options), 'gawk' treats the characters represented by octal and
hexadecimal escape sequences literally when used in regexp constants.
Thus, in compatibility mode, '/a\52b/' is equivalent to '/a\*b/'.