cppinternals: Line Numbering

1 
1 Line numbering
1 **************
1 
1 Just which line number anyway?
1 ==============================
1 
1 There are three reasonable requirements a cpplib client might have for
1 the line number of a token passed to it:
1 
1    * The source line it was lexed on.
1    * The line it is output on.  This can be different to the line it was
1      lexed on if, for example, there are intervening escaped newlines or
1      C-style comments.  For example:
1 
1           foo /* A long
1           comment */ bar \
1           baz
1           =>
1           foo bar baz
1 
1    * If the token results from a macro expansion, the line of the macro
1      name, or possibly the line of the closing parenthesis in the case
1      of function-like macro expansion.
1 
1    The 'cpp_token' structure contains 'line' and 'col' members.  The
1 lexer fills these in with the line and column of the first character of
1 the token.  Consequently, but maybe unexpectedly, a token from the
1 replacement list of a macro expansion carries the location of the token
1 within the '#define' directive, because cpplib expands a macro by
1 returning pointers to the tokens in its replacement list.  The current
1 implementation of cpplib assigns tokens created from built-in macros and
1 the '#' and '##' operators the location of the most recently lexed
1 token.  This is because they are allocated from the lexer's token
1 runs, and because of the way the diagnostic routines infer the
1 appropriate location to report.
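
   The consequence of expanding a macro by returning pointers can be
illustrated with a minimal sketch.  The names 'sketch_token',
'sketch_macro' and 'sketch_expand' are hypothetical and are not part of
cpplib; the real 'cpp_token' structure has more members than are shown
here.

```c
#include <stddef.h>

/* Hypothetical, simplified token: only the members discussed above.  */
struct sketch_token {
    const char *spelling;
    unsigned int line;   /* line of the token's first character */
    unsigned int col;    /* column of the token's first character */
};

/* Hypothetical macro record: the replacement list is kept exactly as it
   was lexed from the '#define' directive.  */
struct sketch_macro {
    const struct sketch_token *expansion;
    size_t count;
};

/* "Expand" by handing back a pointer into the stored replacement list.
   Nothing is copied, so the token still carries the line and column it
   had inside the '#define', not those of the macro invocation.  */
static const struct sketch_token *
sketch_expand(const struct sketch_macro *macro, size_t index)
{
    return index < macro->count ? &macro->expansion[index] : NULL;
}
```

   Under these assumptions, a client that reads 'line' from an expanded
token sees the line of the definition, wherever the macro was invoked.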
1 
1    The diagnostic routines in cpplib display the location of the most
1 recently _lexed_ token, unless they are passed a specific line and
1 column to report.  For diagnostics regarding tokens that arise from
1 macro expansions, it might also be helpful for the user to see the
1 original location in the macro definition that the token came from.
1 Since that is exactly the information each token carries, such an
1 enhancement could be made relatively easily in future.
1 
1    The stand-alone preprocessor faces a similar problem when determining
1 the correct line to output the token on: the position attached to a
1 token is fairly useless if the token came from a macro expansion.  All
1 tokens on a logical line should be output on its first physical line, so
1 the token's reported location is also wrong if it is part of a physical
1 line other than the first.
1 
1    To solve these issues, cpplib provides a callback that is invoked
1 whenever it lexes a preprocessing token that starts a new logical line
1 other than a directive.  It passes this token (which may be a 'CPP_EOF'
1 token indicating the end of the translation unit) to the callback
1 routine, which can then use the line and column of this token to produce
1 correct output.
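
   As a rough sketch of how a client might use such a callback, the
following keeps track of the line the output cursor is on and pads with
newlines until it reaches the line of the token that starts the new
logical line.  All names here ('sketch_printer', 'sketch_put',
'sketch_line_change') are hypothetical; this is not the callback
interface of cpplib, and it does not model line markers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output state for a stand-alone preprocessor.  */
struct sketch_printer {
    unsigned int out_line;   /* line the output cursor is on, from 1 */
    char buf[256];           /* output collected so far */
    size_t len;              /* number of characters in buf */
};

/* Append a string to the output if it fits.  */
static void
sketch_put(struct sketch_printer *p, const char *s)
{
    size_t n = strlen(s);
    if (p->len + n < sizeof p->buf) {
        memcpy(p->buf + p->len, s, n + 1);
        p->len += n;
    }
}

/* Hypothetical callback, called with the line of the first token of
   each new logical line.  It finishes the current output line, then
   emits blank lines until the output is on the token's line.  */
static void
sketch_line_change(struct sketch_printer *p, unsigned int src_line)
{
    if (p->len > 0 && p->buf[p->len - 1] != '\n') {
        sketch_put(p, "\n");
        p->out_line++;
    }
    while (p->out_line < src_line) {
        sketch_put(p, "\n");
        p->out_line++;
    }
}
```

   With the earlier example, 'foo bar baz' is written on output line 1
even though it was lexed from lines 1 to 3; when the callback reports a
logical line starting on line 4, two blank lines are emitted so that the
next token again appears on its own source line.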
1 
1 Representation of line numbers
1 ==============================
1 
1 As mentioned above, cpplib stores with each token the line number that
1 it was lexed on.  In fact, this number is not the number of the line in
1 the source file, but instead bears more resemblance to the number of the
1 line in the translation unit.
1 
1    The preprocessor maintains a monotonically increasing line count,
1 which is incremented at every newline character (and also at the end of any
1 buffer that does not end in a newline).  Since a line number of zero is
1 useful to indicate certain special states and conditions, this variable
1 starts counting from one.
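
   The counting rule can be sketched as follows.  The function
'sketch_advance' is a hypothetical helper written for illustration, not
code taken from cpplib; it takes the current count and returns the count
after a whole buffer has been read.

```c
#include <stddef.h>

/* Advance a translation-unit line count over one buffer.  The count is
   bumped at every newline, and once more at the end of a buffer that
   does not end in a newline, so the next buffer never shares a line
   number with this one.  The caller starts the count at one; zero is
   left free to mark special states.  */
static unsigned int
sketch_advance(unsigned int line, const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] == '\n')
            line++;
    if (len > 0 && buf[len - 1] != '\n')
        line++;
    return line;
}
```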
1 
1    This variable therefore uniquely enumerates each line in the
1 translation unit.  With some simple infrastructure, it is
1 straightforward to map from this to the original source file and line number
1 pair, saving space whenever line number information needs to be saved.
1 The code that implements this mapping lies in the files 'line-map.c' and
1 'line-map.h'.
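
   One way such a mapping could work is sketched below, assuming a
sorted table in which each entry records the translation-unit line at
which a stretch of some source file begins.  The structure and function
names are hypothetical and are not claimed to match the declarations in
'line-map.h'.

```c
#include <stddef.h>

/* Hypothetical map entry covering a contiguous run of lines.  */
struct sketch_map {
    unsigned int from_line;  /* first translation-unit line covered */
    const char *file;        /* source file those lines came from */
    unsigned int to_line;    /* source line that from_line maps to */
};

/* Find the entry covering LINE: the last one whose from_line does not
   exceed it.  MAPS must be sorted by from_line.  Returns NULL if LINE
   precedes every entry.  */
static const struct sketch_map *
sketch_lookup(const struct sketch_map *maps, size_t count,
              unsigned int line)
{
    size_t lo = 0, hi = count;   /* the answer lies in [lo, hi) */

    if (count == 0 || line < maps[0].from_line)
        return NULL;
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (maps[mid].from_line <= line)
            lo = mid;
        else
            hi = mid;
    }
    return &maps[lo];
}

/* Convert a translation-unit line to a line within the entry's file.  */
static unsigned int
sketch_source_line(const struct sketch_map *map, unsigned int line)
{
    return map->to_line + (line - map->from_line);
}
```

   In this sketch a token needs to store only a single integer; the file
name and source line are recovered on demand, which is the space saving
described above.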
1 
1    Command-line macros and assertions are implemented by pushing a
1 buffer containing the right-hand side of an equivalent '#define' or
1 '#assert' directive.  Some built-in macros are handled similarly.  Since
1 these are all processed before the first line of the main input file, it
1 will typically have an assigned line closer to twenty than to one.
1