gprof: Internals

1 
1 9.3 'gprof''s Internal Operation
1 ================================
1 
1 Like most programs, 'gprof' begins by processing its options.  During
1 this stage, it may building its symspec list ('sym_ids.c:sym_id_add'),
1 if options are specified which use symspecs.  'gprof' maintains a single
1 linked list of symspecs, which will eventually get turned into 12 symbol
1 tables, organized into six include/exclude pairs--one pair each for the
1 flat profile (INCL_FLAT/EXCL_FLAT), the call graph arcs
1 (INCL_ARCS/EXCL_ARCS), printing in the call graph
1 (INCL_GRAPH/EXCL_GRAPH), timing propagation in the call graph
1 (INCL_TIME/EXCL_TIME), the annotated source listing
1 (INCL_ANNO/EXCL_ANNO), and the execution count listing
1 (INCL_EXEC/EXCL_EXEC).
1 
1    After option processing, 'gprof' finishes building the symspec list
1 by adding all the symspecs in 'default_excluded_list' to the exclude
1 lists EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is
1 specified, EXCL_FLAT as well.  These default excludes are not added to
1 EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC.
1 
1    Next, the BFD library is called to open the object file, verify that
1 it is an object file, and read its symbol table ('core.c:core_init'),
1 using 'bfd_canonicalize_symtab' after mallocing an appropriately sized
1 array of symbols.  At this point, function mappings are read (if the
1 '--file-ordering' option has been specified), and the core text space is
1 read into memory (if the '-c' option was given).
1 
1    'gprof''s own symbol table, an array of Sym structures, is now built.
1 This is done in one of two ways, by one of two routines, depending on
1 whether line-by-line profiling ('-l' option) has been enabled.  For
1 normal profiling, the BFD canonical symbol table is scanned.  For
1 line-by-line profiling, every text space address is examined, and a new
1 symbol table entry gets created every time the line number changes.  In
1 either case, two passes are made through the symbol table--one to count
1 the size of the symbol table required, and the other to actually read
1 the symbols.  In between the two passes, a single array of type 'Sym' is
1 created of the appropriate length.  Finally, 'symtab.c:symtab_finalize'
1 is called to sort the symbol table and remove duplicate entries (entries
1 with the same memory address).
1 
1    The symbol table must be a contiguous array for two reasons.  First,
1 the 'qsort' library function (which sorts an array) will be used to sort
1 the symbol table.  Also, the symbol lookup routine
1 ('symtab.c:sym_lookup'), which finds symbols based on memory address,
1 uses a binary search algorithm which requires the symbol table to be a
1 sorted array.  Function symbols are indicated with an 'is_func' flag.
1 Line number symbols have no special flags set.  Additionally, a symbol
1 can have an 'is_static' flag to indicate that it is a local symbol.
1 
1    With the symbol table read, the symspecs can now be translated into
1 Syms ('sym_ids.c:sym_id_parse').  Remember that a single symspec can
1 match multiple symbols.  An array of symbol tables ('syms') is created,
1 each entry of which is a symbol table of Syms to be included or excluded
1 from a particular listing.  The master symbol table and the symspecs are
1 examined by nested loops, and every symbol that matches a symspec is
1 inserted into the appropriate syms table.  This is done twice, once to
1 count the size of each required symbol table, and again to build the
1 tables, which have been malloced between passes.  From now on, to
1 determine whether a symbol is on an include or exclude symspec list,
1 'gprof' simply uses its standard symbol lookup routine on the
1 appropriate table in the 'syms' array.
1 
1    Now the profile data file(s) themselves are read
1 ('gmon_io.c:gmon_out_read'), first by checking for a new-style
1 'gmon.out' header, then assuming this is an old-style BSD 'gmon.out' if
1 the magic number test failed.
1 
1    New-style histogram records are read by 'hist.c:hist_read_rec'.  For
1 the first histogram record, allocate a memory array to hold all the
1 bins, and read them in.  When multiple profile data files (or files with
1 multiple histogram records) are read, the memory ranges of each pair of
1 histogram records must be either equal, or non-overlapping.  For each
1 pair of histogram records, the resolution (memory region size divided by
1 the number of bins) must be the same.  The time unit must be the same
1 for all histogram records.  If the above containts are met, all
1 histograms for the same memory range are merged.
1 
1    As each call graph record is read ('call_graph.c:cg_read_rec'), the
1 parent and child addresses are matched to symbol table entries, and a
1 call graph arc is created by 'cg_arcs.c:arc_add', unless the arc fails a
1 symspec check against INCL_ARCS/EXCL_ARCS. As each arc is added, a
1 linked list is maintained of the parent's child arcs, and of the child's
1 parent arcs.  Both the child's call count and the arc's call count are
1 incremented by the record's call count.
1 
1    Basic-block records are read ('basic_blocks.c:bb_read_rec'), but only
1 if line-by-line profiling has been selected.  Each basic-block address
1 is matched to a corresponding line symbol in the symbol table, and an
1 entry made in the symbol's bb_addr and bb_calls arrays.  Again, if
1 multiple basic-block records are present for the same address, the call
1 counts are cumulative.
1 
1    A gmon.sum file is dumped, if requested ('gmon_io.c:gmon_out_write').
1 
1    If histograms were present in the data files, assign them to symbols
1 ('hist.c:hist_assign_samples') by iterating over all the sample bins and
1 assigning them to symbols.  Since the symbol table is sorted in order of
1 ascending memory addresses, we can simple follow along in the symbol
1 table as we make our pass over the sample bins.  This step includes a
1 symspec check against INCL_FLAT/EXCL_FLAT. Depending on the histogram
1 scale factor, a sample bin may span multiple symbols, in which case a
1 fraction of the sample count is allocated to each symbol, proportional
1 to the degree of overlap.  This effect is rare for normal profiling, but
1 overlaps are more common during line-by-line profiling, and can cause
1 each of two adjacent lines to be credited with half a hit, for example.
1 
1    If call graph data is present, 'cg_arcs.c:cg_assemble' is called.
1 First, if '-c' was specified, a machine-dependent routine ('find_call')
1 scans through each symbol's machine code, looking for subroutine call
1 instructions, and adding them to the call graph with a zero call count.
1 A topological sort is performed by depth-first numbering all the symbols
1 ('cg_dfn.c:cg_dfn'), so that children are always numbered less than
1 their parents, then making a array of pointers into the symbol table and
1 sorting it into numerical order, which is reverse topological order
1 (children appear before parents).  Cycles are also detected at this
1 point, all members of which are assigned the same topological number.
1 Two passes are now made through this sorted array of symbol pointers.
1 The first pass, from end to beginning (parents to children), computes
1 the fraction of child time to propagate to each parent and a print flag.
1 The print flag reflects symspec handling of INCL_GRAPH/EXCL_GRAPH, with
1 a parent's include or exclude (print or no print) property being
1 propagated to its children, unless they themselves explicitly appear in
1 INCL_GRAPH or EXCL_GRAPH. A second pass, from beginning to end (children
1 to parents) actually propagates the timings along the call graph,
1 subject to a check against INCL_TIME/EXCL_TIME. With the print flag,
1 fractions, and timings now stored in the symbol structures, the
1 topological sort array is now discarded, and a new array of pointers is
1 assembled, this time sorted by propagated time.
1 
1    Finally, print the various outputs the user requested, which is now
1 fairly straightforward.  The call graph ('cg_print.c:cg_print') and flat
1 profile ('hist.c:hist_print') are regurgitations of values already
1 computed.  The annotated source listing
1 ('basic_blocks.c:print_annotated_source') uses basic-block information,
1 if present, to label each line of code with call counts, otherwise only
1 the function call counts are presented.
1 
1    The function ordering code is marginally well documented in the
1 source code itself ('cg_print.c').  Basically, the functions with the
1 most use and the most parents are placed first, followed by other
1 functions with the most use, followed by lower use functions, followed
1 by unused functions at the end.
1