gawk: Escape Sequences
1
1 3.2 Escape Sequences
1 ====================
1
1 Some characters cannot be included literally in string constants
1 ('"foo"') or regexp constants ('/foo/'). Instead, they should be
1 represented with "escape sequences", which are character sequences
1 beginning with a backslash ('\'). One use of an escape sequence is to
1 include a double-quote character in a string constant. Because a plain
1 double quote ends the string, you must use '\"' to represent an actual
1 double-quote character as a part of the string. For example:
1
1 $ awk 'BEGIN { print "He said \"hi!\" to her." }'
1 -| He said "hi!" to her.
1
1 The backslash character itself is another character that cannot be
1 included normally; you must write '\\' to put one backslash in the
1 string or regexp. Thus, the string whose contents are the two
1 characters '"' and '\' must be written '"\"\\"'.
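
As a quick check of the two-character string just described, the following sketch prints its length and its contents. The variable names 's' and 'out' are only illustrative and do not come from the manual.

```shell
out=$(awk 'BEGIN {
    s = "\"\\"          # escaped double quote, then escaped backslash
    print length(s), s
}')
printf '%s\n' "$out"
```

With the escapes handled as described, this should report a length of 2 followed by the double-quote and backslash characters themselves.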
1
1 Other escape sequences represent unprintable characters such as TAB
1 or newline. There is nothing to stop you from entering most unprintable
1 characters directly in a string constant or regexp constant, but they
1 may look ugly.
1
1 The following list presents all the escape sequences used in 'awk'
1 and what they represent. Unless noted otherwise, all these escape
1 sequences apply to both string constants and regexp constants:
1
1 '\\'
1 A literal backslash, '\'.
1
1 '\a'
1 The "alert" character, 'Ctrl-g', ASCII code 7 (BEL). (This often
1 makes some sort of audible noise.)
1
1 '\b'
1 Backspace, 'Ctrl-h', ASCII code 8 (BS).
1
1 '\f'
1 Formfeed, 'Ctrl-l', ASCII code 12 (FF).
1
1 '\n'
1 Newline, 'Ctrl-j', ASCII code 10 (LF).
1
1 '\r'
1 Carriage return, 'Ctrl-m', ASCII code 13 (CR).
1
1 '\t'
1 Horizontal TAB, 'Ctrl-i', ASCII code 9 (HT).
1
1 '\v'
1 Vertical TAB, 'Ctrl-k', ASCII code 11 (VT).
1
1 '\NNN'
1 The octal value NNN, where NNN stands for 1 to 3 digits between '0'
1 and '7'. For example, the code for the ASCII ESC (escape)
1 character is '\033'.
1
'\xHH...'
The hexadecimal value HH, where HH stands for a sequence of
hexadecimal digits ('0'-'9', and either 'A'-'F' or 'a'-'f'). A
maximum of two digits is allowed after the '\x'. Any further
hexadecimal digits are treated as simple letters or numbers.
(c.e.) (The '\x' escape sequence is not allowed in POSIX awk.)
1
1 CAUTION: In ISO C, the escape sequence continues until the
1 first nonhexadecimal digit is seen. For many years, 'gawk'
1 would continue incorporating hexadecimal digits into the value
1 until a non-hexadecimal digit or the end of the string was
1 encountered. However, using more than two hexadecimal digits
1 produced undefined results. As of version 4.2, only two
1 digits are processed.
1
1 '\/'
1 A literal slash (necessary for regexp constants only). This
1 sequence is used when you want to write a regexp constant that
1 contains a slash (such as '/.*:\/home\/[[:alnum:]]+:.*/'; the
'[[:alnum:]]' notation is discussed in ⇒Bracket Expressions).
Because the regexp is delimited by slashes, you
1 need to escape any slash that is part of the pattern, in order to
1 tell 'awk' to keep processing the rest of the regexp.
1
1 '\"'
1 A literal double quote (necessary for string constants only). This
1 sequence is used when you want to write a string constant that
1 contains a double quote (such as '"He said \"hi!\" to her."').
1 Because the string is delimited by double quotes, you need to
1 escape any quote that is part of the string, in order to tell 'awk'
1 to keep processing the rest of the string.
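
A few entries from the list above can be checked directly. This is a minimal sketch that uses only the octal, TAB, and slash escapes. The sample input line and the names 'slash' and 'out' are made up for illustration, and the bracket expression is simplified to '[a-z]' so the example does not depend on character-class support.

```shell
out=$(printf 'alice:/home/alice:sh\n' | awk '
/:\/home\/[a-z]+:/ { slash = 1 }        # \/ keeps the slash from ending the regexp
END {
    print ("\101" == "A")              # octal 101 is the ASCII code for "A"
    print ("\t" == "\011")             # TAB is ASCII 9, octal 011
    print length("\033[0m")            # ESC plus three ordinary characters
    print slash
}')
printf '%s\n' "$out"
```

Each comparison prints 1 when the two forms denote the same character, and the length line counts the ESC written as '\033' as a single character.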
1
In 'gawk', a number of additional two-character sequences that begin
with a backslash have special meaning in regexps. See ⇒GNU Regexp
Operators.
1
1 In a regexp, a backslash before any character that is not in the
1 previous list and not listed in ⇒GNU Regexp Operators means that
1 the next character should be taken literally, even if it would normally
1 be a regexp operator. For example, '/a\+b/' matches the three
1 characters 'a+b'.
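
The '/a\+b/' example can be tried as follows. This is a small sketch with made-up input: the second line, 'aab', would match the unescaped regexp '/a+b/' but should not match once the plus sign is taken literally.

```shell
out=$(printf 'a+b\naab\n' | awk '/a\+b/')
printf '%s\n' "$out"
```

Only the line containing the literal three characters 'a+b' is printed.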
1
For complete portability, use a backslash only before a character
shown in the previous list or before a regexp operator.
1
1 Backslash Before Regular Characters
1
1 If you place a backslash in a string constant before something that
1 is not one of the characters previously listed, POSIX 'awk' purposely
1 leaves what happens as undefined. There are two choices:
1
1 Strip the backslash out
This is what BWK 'awk' and 'gawk' both do. For example, '"a\qc"'
is the same as '"aqc"'. (Because this is such an easy bug both to
introduce and to miss, 'gawk' warns you about it.) Consider 'FS =
"[ \t]+\|[ \t]+"', intended to use vertical bars surrounded by
whitespace as the field separator. There should be two backslashes
in the string: 'FS = "[ \t]+\\|[ \t]+"'.
1
1 Leave the backslash alone
1 Some other 'awk' implementations do this. In such implementations,
1 typing '"a\qc"' is the same as typing '"a\\qc"'.
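
The corrected field separator from the first choice can be exercised like this. It is only a sketch: the input line and the name 'out' are invented for the example, and the doubled backslash is what makes the result independent of which choice an implementation takes.

```shell
out=$(printf 'a | b | c\n' | awk '
BEGIN { FS = "[ \t]+\\|[ \t]+" }    # the string holds \| so the regexp sees a literal bar
{ print $2 }')
printf '%s\n' "$out"
```

The second field, 'b', is printed with the surrounding whitespace and vertical bars removed.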
1
1 To summarize:
1
1 * The escape sequences in the preceding list are always processed
1 first, for both string constants and regexp constants. This
1 happens very early, as soon as 'awk' reads your program.
1
* 'gawk' processes both regexp constants and dynamic regexps (⇒
Computed Regexps), for the special operators listed in ⇒GNU
Regexp Operators.
1
1 * A backslash before any other character means to treat that
1 character literally.
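
One practical consequence of this ordering is that a dynamic regexp written as a string constant goes through string escape processing before it is used as a regexp, so it needs one more backslash than the equivalent regexp constant. A minimal sketch, with illustrative input and messages:

```shell
out=$(printf 'a+b\n' | awk '
$0 ~ "a\\+b" { print "dynamic regexp matched" }   # string escapes run first, leaving a\+b
/a\+b/       { print "regexp constant matched" }  # the constant needs only one backslash
')
printf '%s\n' "$out"
```

Both rules should fire for the input 'a+b', because both forms end up as the same regexp.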
1
1 Escape Sequences for Metacharacters
1
1 Suppose you use an octal or hexadecimal escape to represent a regexp
1 metacharacter. (See ⇒Regexp Operators.) Does 'awk' treat the
1 character as a literal character or as a regexp operator?
1
Historically, such characters were taken literally. (d.c.) However,
the POSIX standard indicates that they should be treated as real
metacharacters, which is what 'gawk' does. In compatibility mode
(⇒Options), 'gawk' treats the characters represented by octal and
hexadecimal escape sequences literally when used in regexp constants.
Thus, in that mode, '/a\52b/' is equivalent to '/a\*b/' (octal 52 is
the ASCII code for '*').
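
Because the answer depends on the 'awk' implementation and the mode it runs in, it can be useful to probe the installed 'awk' rather than assume either behavior. The following is a rough sketch. The labels 'operator' and 'literal' and the variable name 'mode' are invented for this example, and no particular result is claimed.

```shell
mode=$(awk 'BEGIN {
    # Octal 52 is "*". If it acts as an operator, /a\52b/ behaves like /a*b/ and matches "aab".
    print (("aab" ~ /a\52b/) ? "operator" : "literal")
}')
printf '%s\n' "$mode"
```

The test string 'aab' contains no asterisk, so it matches only when the escaped character is treated as the repetition operator.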
1