cppinternals: Hash Nodes
1
1 Hash Nodes
1 **********
1
1 When cpplib encounters an "identifier", it generates a hash code for it
1 and stores it in the hash table. By "identifier" we mean tokens with
1 type 'CPP_NAME'; this includes identifiers in the usual C sense, as well
1 as keywords, directive names, macro names and so on. For example, all
1 of 'pragma', 'int', 'foo' and '__GNUC__' are identifiers and hashed when
1 lexed.
1
1 Each node in the hash table contain various information about the
1 identifier it represents. For example, its length and type. At any one
1 time, each identifier falls into exactly one of three categories:
1
1 * Macros
1
1 These have been declared to be macros, either on the command line
1 or with '#define'. A few, such as '__TIME__' are built-ins entered
1 in the hash table during initialization. The hash node for a
1 normal macro points to a structure with more information about the
1 macro, such as whether it is function-like, how many arguments it
1 takes, and its expansion. Built-in macros are flagged as special,
1 and instead contain an enum indicating which of the various
1 built-in macros it is.
1
1 * Assertions
1
1 Assertions are in a separate namespace to macros. To enforce this,
1 cpp actually prepends a '#' character before hashing and entering
1 it in the hash table. An assertion's node points to a chain of
1 answers to that assertion.
1
1 * Void
1
1 Everything else falls into this category--an identifier that is not
1 currently a macro, or a macro that has since been undefined with
1 '#undef'.
1
1 When preprocessing C++, this category also includes the named
1 operators, such as 'xor'. In expressions these behave like the
1 operators they represent, but in contexts where the spelling of a
1 token matters they are spelt differently. This spelling
1 distinction is relevant when they are operands of the stringizing
1 and pasting macro operators '#' and '##'. Named operator hash
1 nodes are flagged, both to catch the spelling distinction and to
1 prevent them from being defined as macros.
1
1 The same identifiers share the same hash node. Since each identifier
1 token, after lexing, contains a pointer to its hash node, this is used
1 to provide rapid lookup of various information. For example, when
1 parsing a '#define' statement, CPP flags each argument's identifier hash
1 node with the index of that argument. This makes duplicated argument
1 checking an O(1) operation for each argument. Similarly, for each
1 identifier in the macro's expansion, lookup to see if it is an argument,
1 and which argument it is, is also an O(1) operation. Further, each
1 directive name, such as 'endif', has an associated directive enum stored
1 in its hash node, so that directive lookup is also O(1).
1