cppinternals: Hash Nodes

1 
1 Hash Nodes
1 **********
1 
1 When cpplib encounters an "identifier", it generates a hash code for it
1 and stores it in the hash table.  By "identifier" we mean tokens with
1 type 'CPP_NAME'; this includes identifiers in the usual C sense, as well
1 as keywords, directive names, macro names and so on.  For example, all
1 of 'pragma', 'int', 'foo' and '__GNUC__' are identifiers and hashed when
1 lexed.
1 
1    Each node in the hash table contain various information about the
1 identifier it represents.  For example, its length and type.  At any one
1 time, each identifier falls into exactly one of three categories:
1 
1    * Macros
1 
1      These have been declared to be macros, either on the command line
1      or with '#define'.  A few, such as '__TIME__' are built-ins entered
1      in the hash table during initialization.  The hash node for a
1      normal macro points to a structure with more information about the
1      macro, such as whether it is function-like, how many arguments it
1      takes, and its expansion.  Built-in macros are flagged as special,
1      and instead contain an enum indicating which of the various
1      built-in macros it is.
1 
1    * Assertions
1 
1      Assertions are in a separate namespace to macros.  To enforce this,
1      cpp actually prepends a '#' character before hashing and entering
1      it in the hash table.  An assertion's node points to a chain of
1      answers to that assertion.
1 
1    * Void
1 
1      Everything else falls into this category--an identifier that is not
1      currently a macro, or a macro that has since been undefined with
1      '#undef'.
1 
1      When preprocessing C++, this category also includes the named
1      operators, such as 'xor'.  In expressions these behave like the
1      operators they represent, but in contexts where the spelling of a
1      token matters they are spelt differently.  This spelling
1      distinction is relevant when they are operands of the stringizing
1      and pasting macro operators '#' and '##'.  Named operator hash
1      nodes are flagged, both to catch the spelling distinction and to
1      prevent them from being defined as macros.
1 
1    The same identifiers share the same hash node.  Since each identifier
1 token, after lexing, contains a pointer to its hash node, this is used
1 to provide rapid lookup of various information.  For example, when
1 parsing a '#define' statement, CPP flags each argument's identifier hash
1 node with the index of that argument.  This makes duplicated argument
1 checking an O(1) operation for each argument.  Similarly, for each
1 identifier in the macro's expansion, lookup to see if it is an argument,
1 and which argument it is, is also an O(1) operation.  Further, each
1 directive name, such as 'endif', has an associated directive enum stored
1 in its hash node, so that directive lookup is also O(1).
1