cppinternals: Files

File Handling
*************

The file handling code of cpplib resides in the file 'files.c'.  It
takes care of the details of file searching, opening, reading and
caching, for both the main source file and all the headers it
recursively includes.

   The basic strategy is to minimize the number of system calls.  On
many systems, the basic 'open ()' and 'fstat ()' system calls can be
quite expensive.  For every '#include'd file, we need to try each
directory in the search path in turn until we find a match.  Some
projects, such as glibc, pass twenty or thirty include paths on the
command line, so this can rapidly become time-consuming.

   For a header file we have not encountered before, we have little
choice but to do this.  However, the same headers are often included
repeatedly, and in these cases we try to avoid repeating the
filesystem queries while searching for the correct file.

   For each file we try to open, we store the constructed path in a
splay tree.  This path first undergoes simplification by the function
'_cpp_simplify_pathname'.  For example, '/usr/include/bits/../foo.h' is
simplified to '/usr/include/foo.h' before we enter it in the splay tree
and try to 'open ()' the file.  CPP will then find subsequent uses of
'foo.h', even as '/usr/include/foo.h', in the splay tree and save system
calls.

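   The simplify-then-look-up idea can be sketched as follows.  This is
an illustration only: 'simplify_path' and 'seen_before' are hypothetical
names, not the real '_cpp_simplify_pathname', and a small linear table
stands in for the splay tree.  The sketch handles absolute paths only
and assumes that no directory component is a symbolic link, since
collapsing 'dir/..' is not safe otherwise.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a path simplifier: collapses "//", "/./"
   and "dir/.." in an absolute path, in place.  */
static void
simplify_path (char *path)
{
  assert (path != NULL && path[0] == '/');

  char *out = path;              /* End of the simplified prefix.  */
  const char *in = path + 1;     /* Next unread input character.  */

  while (*in != '\0')
    {
      const char *slash = strchr (in, '/');
      size_t len = slash ? (size_t) (slash - in) : strlen (in);

      if (len == 2 && in[0] == '.' && in[1] == '.')
        {
          /* Drop the most recently kept component, if any.  */
          while (out > path && *--out != '/')
            continue;
        }
      else if (len > 0 && !(len == 1 && in[0] == '.'))
        {
          /* Keep an ordinary component, preceded by one '/'.  */
          *out++ = '/';
          memmove (out, in, len);
          out += len;
        }
      /* Empty components ("//") and "." are simply skipped.  */

      in += len;
      if (*in == '/')
        in++;
    }

  if (out == path)
    *out++ = '/';                /* Everything cancelled: only the root.  */
  *out = '\0';
}

/* A linear table standing in for the splay tree of visited paths.  */
#define MAX_SEEN 64
#define MAX_PATH_LEN 256

static char seen[MAX_SEEN][MAX_PATH_LEN];
static int n_seen;

/* Return 1 if the simplified form of RAW was recorded earlier;
   otherwise record it and return 0.  */
static int
seen_before (const char *raw)
{
  char buf[MAX_PATH_LEN];

  assert (strlen (raw) < sizeof buf);
  strcpy (buf, raw);
  simplify_path (buf);

  for (int i = 0; i < n_seen; i++)
    if (strcmp (seen[i], buf) == 0)
      return 1;

  assert (n_seen < MAX_SEEN);
  strcpy (seen[n_seen++], buf);
  return 0;
}
```

   Because both spellings reduce to the same key, the second lookup hits
the table without touching the filesystem.
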
   Further, it is likely the file contents have also been cached, saving
a 'read ()' system call.  We don't bother caching the contents of header
files that are re-inclusion protected, and whose re-inclusion macro is
defined when we leave the header file for the first time, since such a
header would normally be skipped on any later inclusion.  If the host
supports it, we try to map suitably large files into memory, rather than
reading them in directly.

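   The caching rule above amounts to a single predicate.  The sketch
below is illustrative; 'struct file_info' and its fields are
hypothetical names, not cpplib's own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file record; the field names are illustrative.  */
struct file_info
{
  const char *guard_macro;   /* Re-inclusion macro, or NULL if none.  */
  bool guard_defined;        /* Is it defined as we leave the file?  */
};

/* Return true if the contents of F are worth keeping in memory.  A
   guarded header whose macro is defined on exit is expected to be
   skipped on later inclusions, so its text is not needed again.  */
static bool
worth_caching_contents (const struct file_info *f)
{
  assert (f != NULL);
  return !(f->guard_macro != NULL && f->guard_defined);
}
```
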
   The include paths are internally stored in a null-terminated
singly linked list, starting with the '"header.h"' directory search
chain, which then links into the '<header.h>' directory chain.

   Files included with the '<foo.h>' syntax start the lookup directly in
the second half of this chain.  However, files included with the
'"foo.h"' syntax start at the beginning of the chain, but with one extra
directory prepended.  This is the directory of the current file, that
is, the one containing the '#include' directive.  Prepending this
directory on a per-file basis is handled by the function 'search_from'.

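   A minimal sketch of the chain layout and of choosing where a lookup
starts is given below.  The names 'struct search_dir' and
'start_of_search' are hypothetical, and the sketch does not claim to
mirror what 'search_from' actually does; it only illustrates the list
shape described above.

```c
#include <assert.h>
#include <stddef.h>

/* One directory in the include search path (illustrative layout).  */
struct search_dir
{
  const char *name;
  struct search_dir *next;   /* NULL terminates the chain.  */
};

/* Return the first directory to try.  QUOTE is the head of the
   '"header.h"' chain, which links into BRACKET, the head of the
   '<header.h>' chain.  For the quoted form, SCRATCH is filled in with
   CURRENT_DIR and prepended to the whole chain.  */
static struct search_dir *
start_of_search (int angle_brackets, const char *current_dir,
                 struct search_dir *quote, struct search_dir *bracket,
                 struct search_dir *scratch)
{
  if (angle_brackets)
    return bracket;

  assert (scratch != NULL && current_dir != NULL);
  scratch->name = current_dir;
  scratch->next = quote;
  return scratch;
}

/* Count the directories that a lookup starting at DIR would try.  */
static int
chain_length (const struct search_dir *dir)
{
  int n = 0;

  for (; dir != NULL; dir = dir->next)
    n++;
  return n;
}
```

   With one quote directory and two system directories, an angle-bracket
include tries two directories, while a quoted include tries four: the
current file's directory, the quote directory, and the two system
directories.
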
   Note that a header included with a directory component, such as
'#include "mydir/foo.h"' and opened as '/usr/local/include/mydir/foo.h',
will have the complete path minus the basename 'foo.h' as the current
directory.

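   Deriving that current directory is a matter of cutting the path at
its last '/'.  The helper below is a hypothetical sketch; in particular,
returning '.' for a path with no directory component is a choice made
for this illustration, not a statement about cpplib's behavior.

```c
#include <assert.h>
#include <string.h>

/* Copy the directory part of PATH, that is, everything before the last
   '/', into DIR, which has room for SIZE bytes.  */
static void
dir_of_file (const char *path, char *dir, size_t size)
{
  assert (path != NULL && dir != NULL && size > 0);

  const char *slash = strrchr (path, '/');
  const char *src = path;
  size_t len;

  if (slash == NULL)
    {
      src = ".";                 /* No directory component at all.  */
      len = 1;
    }
  else if (slash == path)
    len = 1;                     /* File directly under the root.  */
  else
    len = (size_t) (slash - path);

  assert (len < size);
  memcpy (dir, src, len);
  dir[len] = '\0';
}
```
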
   Enough information is stored in the splay tree that CPP can
immediately tell whether it can skip the header file because of the
multiple include optimization, whether the file didn't exist or couldn't
be opened for some reason, or whether the header was flagged not to be
reused, as it is with the obsolete '#import' directive.

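   The three cases can be pictured as a decision made from a cached
entry alone, with no system calls.  Everything in the sketch below,
including 'struct cache_entry', its fields and the 'decide' function, is
hypothetical and only illustrates the kind of information involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lookup_action
{
  OPEN_AND_READ,       /* Nothing lets us avoid processing the file.  */
  SKIP_FILE,           /* The file need not be processed again.  */
  KNOWN_UNOPENABLE     /* An earlier attempt to open it failed.  */
};

/* Hypothetical cached state for one simplified path.  */
struct cache_entry
{
  bool open_failed;          /* Did not exist or could not be opened.  */
  bool once_only;            /* Flagged not to be reused ('#import').  */
  bool included_before;      /* Already processed at least once.  */
  const char *guard_macro;   /* Re-inclusion macro, or NULL if none.  */
};

/* Decide what to do with E.  GUARD_DEFINED says whether E's
   re-inclusion macro, if any, is currently defined.  */
static enum lookup_action
decide (const struct cache_entry *e, bool guard_defined)
{
  assert (e != NULL);

  if (e->open_failed)
    return KNOWN_UNOPENABLE;
  if (e->once_only && e->included_before)
    return SKIP_FILE;
  if (e->guard_macro != NULL && guard_defined)
    return SKIP_FILE;            /* Multiple include optimization.  */
  return OPEN_AND_READ;
}
```
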
   For the benefit of MS-DOS filesystems with an 8.3 filename
limitation, CPP offers the ability to treat various include file names
as aliases for the real header files with shorter names.  The map from
one to the other is found in a special file called 'header.gcc', stored
in the command-line (or system) include directories to which the mapping
applies.  This may be higher up the directory tree than the full path to
the file minus the base name.

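   Once such a map has been loaded, applying it is a simple table
lookup.  The sketch below assumes the map is already in memory as pairs
of names; 'struct name_map', 'remap_name' and the file names used in any
example are hypothetical, and parsing of 'header.gcc' itself is not
shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One alias taken from a name map (illustrative layout).  */
struct name_map
{
  const char *alias;       /* Name as written in the '#include'.  */
  const char *real_name;   /* Shorter name of the file on disk.  */
};

/* Return the on-disk name for NAME according to the N entries in MAP,
   or NAME itself when no entry applies.  */
static const char *
remap_name (const struct name_map *map, size_t n, const char *name)
{
  assert (name != NULL);

  for (size_t i = 0; i < n; i++)
    if (strcmp (map[i].alias, name) == 0)
      return map[i].real_name;
  return name;
}
```
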