cpp: Initial processing
1
1 1.2 Initial processing
1 ======================
1
1 The preprocessor performs a series of textual transformations on its
1 input. These happen before all other processing. Conceptually, they
1 happen in a rigid order, and the entire file is run through each
1 transformation before the next one begins. CPP actually does them all
1 at once, for performance reasons. These transformations correspond
1 roughly to the first three "phases of translation" described in the C
1 standard.
1
1 1. The input file is read into memory and broken into lines.
1
1 Different systems use different conventions to indicate the end of
1 a line. GCC accepts the ASCII control sequences 'LF', 'CR LF' and
1 'CR' as end-of-line markers. These are the canonical sequences
1 used by Unix, DOS and VMS, and the classic Mac OS (before OS X)
1 respectively. You may therefore safely copy source code written on
1 any of those systems to a different one and use it without
1 conversion. (GCC may lose track of the current line number if a
1 file doesn't consistently use one convention, as sometimes happens
1 when it is edited on computers with different conventions that
1 share a network file system.)
1
1 If the last line of any input file lacks an end-of-line marker, the
1 end of the file is considered to implicitly supply one. The C
1 standard says that this condition provokes undefined behavior, so
1 GCC will emit a warning message.
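
As a rough illustration of the line-splitting rule in step 1, the following sketch counts the lines in a buffer while accepting all three end-of-line conventions. It is not GCC's implementation; the function name 'count_lines' is an assumption made for this example.

```c
#include <stddef.h>

/* Count the lines in buf[0..len), treating LF, CR LF and a lone CR as
   end-of-line markers.  A final line with no marker still counts,
   mirroring the implicitly supplied end-of-line described above.
   count_lines is a hypothetical helper, not part of GCC. */
size_t count_lines(const char *buf, size_t len)
{
    size_t lines = 0;
    size_t i = 0;

    while (i < len) {
        /* Scan to the next CR or LF, or to the end of the buffer. */
        while (i < len && buf[i] != '\n' && buf[i] != '\r')
            i++;
        lines++;
        if (i == len)
            break;              /* last line had no end-of-line marker */
        /* CR LF is a single marker, not two. */
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            i++;
        i++;
    }
    return lines;
}
```

A buffer that mixes conventions, such as "a", CR LF, "b", CR, "c", is counted as three lines, the last of which has no end-of-line marker of its own.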
1
1 2. If trigraphs are enabled, they are replaced by their corresponding
1 single characters. By default GCC ignores trigraphs, but if you
1 request a strictly conforming mode with the '-std' option, or you
1 specify the '-trigraphs' option, then it converts them.
1
1 These are nine three-character sequences, all starting with '??',
1 that are defined by ISO C to stand for single characters. They
1 permit obsolete systems that lack some of C's punctuation to use C.
1 For example, '??/' stands for '\', so '??/n' is converted to '\n';
1 written between single quotes, it is a character constant for a
1 newline.
1
1 Trigraphs are not popular and many compilers implement them
1 incorrectly. Portable code should not rely on trigraphs being
1 either converted or ignored. With '-Wtrigraphs' GCC will warn you
1 when converting a trigraph would change the meaning of your
1 program. See Wtrigraphs.
1
1 In a string constant, you can prevent a sequence of question marks
1 from being confused with a trigraph by inserting a backslash
1 between the question marks, or by separating the string literal at
1 the trigraph and making use of string literal concatenation.
1 "(??\?)" is the string '(???)', not '(?]'. Traditional C compilers
1 do not recognize these idioms.
1
1 The nine trigraphs and their replacements are:
1
1 Trigraph:       ??(  ??)  ??<  ??>  ??=  ??/  ??'  ??!  ??-
1 Replacement:      [    ]    {    }    #    \    ^    |    ~
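
The trigraph table above can be expressed as a small lookup. The sketch below only illustrates the mapping and is not how CPP is implemented; 'trigraph_replacement' and 'replace_trigraphs' are hypothetical names chosen for this example. The code deliberately avoids spelling any trigraph literally, so it reads the same whether or not the compiler converts trigraphs.

```c
#include <stddef.h>

/* Return the replacement for the three-character sequence formed by
   two question marks followed by c, or 0 if that sequence is not one
   of the nine trigraphs.  Hypothetical helper for illustration. */
char trigraph_replacement(char c)
{
    switch (c) {
    case '(':  return '[';
    case ')':  return ']';
    case '<':  return '{';
    case '>':  return '}';
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '!':  return '|';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy src to dst, replacing each trigraph with its single character.
   dst must have room for as many bytes as src, including the
   terminating null; the output is never longer than the input.
   Returns the length of the result. */
size_t replace_trigraphs(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        char r = 0;
        /* src[2] is read only when src[0] and src[1] are both '?',
           so the terminating null is never passed. */
        if (src[0] == '?' && src[1] == '?')
            r = trigraph_replacement(src[2]);
        if (r != 0) {
            dst[n++] = r;
            src += 3;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Given the five characters '(', '?', '?', '?', ')' as input, 'replace_trigraphs' produces '(?]': the first question mark is copied unchanged, and the last three characters form the trigraph for ']'. When writing such an input as a C string literal, escaping each question mark as '\?' keeps the compiler from converting it first.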
1
1 3. Continued lines are merged into one long line.
1
1 A continued line is a line which ends with a backslash, '\'. The
1 backslash is removed and the following line is joined with the
1 current one. No space is inserted, so you may split a line
1 anywhere, even in the middle of a word. (It is generally more
1 readable to split lines only at white space.)
1
1 The trailing backslash on a continued line is commonly referred to
1 as a "backslash-newline".
1
1 If there is white space between a backslash and the end of a line,
1 that is still a continued line. However, as this is usually the
1 result of an editing mistake, and many compilers will not accept it
1 as a continued line, GCC will warn you about it.
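
The following fragment is a small sketch of line continuation in practice. The names 'SQUARE_SUM', 'total' and 'continued_lines_demo' are invented for this example. The first use is the common one, a long macro body split at white space; the second shows that a split in the middle of a word is accepted, because nothing is inserted at the join.

```c
/* A macro body continued across three physical lines.  After the
   lines are merged, the preprocessor sees a single logical line
   equivalent to:  #define SQUARE_SUM(a, b) ((a) * (a) + (b) * (b))  */
#define SQUARE_SUM(a, b) \
    ((a) * (a) + \
     (b) * (b))

/* Legal but hard to read: the identifier is split in mid-word.
   No space is inserted at the join, so this declares "total". */
static int to\
tal = 40;

int continued_lines_demo(void)
{
    return SQUARE_SUM(1, 1) + total;   /* 2 + 40 */
}
```

The continuation line holding the second half of the identifier must begin in the first column; any leading white space there would become part of the merged line and split the identifier in two.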
1
1 4. All comments are replaced with single spaces.
1
1 There are two kinds of comments. "Block comments" begin with '/*'
1 and continue until the next '*/'. Block comments do not nest:
1
1 /* this is /* one comment */ text outside comment
1
1 "Line comments" begin with '//' and continue to the end of the
1 current line. Line comments do not nest either, but it does not
1 matter, because they would end in the same place anyway.
1
1 // this is // one comment
1 text outside comment
1
1 It is safe to put line comments inside block comments, or vice versa.
1
1 /* block comment
1 // contains line comment
1 yet more comment
1 */ outside comment
1
1 // line comment /* contains block comment */
1
1 But beware of commenting out one end of a block comment with a line
1 comment.
1
1 // l.c. /* block comment begins
1 oops! this isn't a comment anymore */
1
1 Comments are not recognized within string literals. "/* blah */" is
1 the string constant '/* blah */', not an empty string.
1
1 Line comments are not in the 1989 edition of the C standard, but they
1 are recognized by GCC as an extension. In C++ and in the 1999 edition
1 of the C standard, they are an official part of the language.
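
Two of the points above, that a comment is replaced with a single space and that comments are not recognized within string literals, can be checked with a short fragment. This is only a sketch; the names 'answer', 'not_a_comment' and 'not_a_comment_length' are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* The comment between "int" and "answer" becomes a single space, so
   the declaration reads "int answer = 42;" after preprocessing.  If
   the comment simply vanished, the two tokens would run together. */
int/* comment acts as white space */answer = 42;

/* Inside a string literal the same characters are ordinary text. */
const char *not_a_comment = "/* blah */";

/* Length of the string above: all ten characters are kept. */
size_t not_a_comment_length(void)
{
    return strlen(not_a_comment);
}
```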
1
1 Since these transformations happen before all other processing, you
1 can split a line mechanically with backslash-newline anywhere. You can
1 comment out the end of a line. You can continue a line comment onto the
1 next line with backslash-newline. You can even split '/*', '*/', and
1 '//' onto multiple lines with backslash-newline. For example:
1
1 /\
1 *
1 */ # /*
1 */ defi\
1 ne FO\
1 O 10\
1 20
1
1 is equivalent to '#define FOO 1020'. All these tricks are extremely
1 confusing and should not be used in code intended to be readable.
1
1 There is no way to prevent a backslash at the end of a line from
1 being interpreted as a backslash-newline. This cannot affect any
1 correct program, however.
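
One place where a trailing backslash deserves care is at the end of a line comment, since the comment then continues onto the next line. The sketch below shows the effect; 'trailing_backslash_demo' is a hypothetical name. GCC's '-Wcomment' option, which is enabled by '-Wall', warns about a backslash-newline in a line comment.

```c
int trailing_backslash_demo(void)
{
    int value = 1;
    // The next line is part of this comment because of the backslash \
    value = 2;
    return value;   /* still 1: the assignment above was commented out */
}
```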
1