gccint: Profile information

1 
1 15.3 Profile information
1 ========================
1 
1 In many cases a compiler must make a choice whether to trade speed in
1 one part of code for speed in another, or to trade code size for code
1 speed.  In such cases it is useful to know information about how often
1 some given block will be executed.  That is the purpose for maintaining
1 profile within the flow graph.  GCC can handle profile information
1 obtained through "profile feedback", but it can also estimate branch
1 probabilities based on statics and heuristics.
1 
1  The feedback based profile is produced by compiling the program with
1 instrumentation, executing it on a train run and reading the numbers of
1 executions of basic blocks and edges back to the compiler while
1 re-compiling the program to produce the final executable.  This method
1 provides very accurate information about where a program spends most of
1 its time on the train run.  Whether it matches the average run of course
1 depends on the choice of train data set, but several studies have shown
1 that the behavior of a program usually changes just marginally over
1 different data sets.
1 
1  When profile feedback is not available, the compiler may be asked to
1 attempt to predict the behavior of each branch in the program using a
1 set of heuristics (see 'predict.def' for details) and compute estimated
1 frequencies of each basic block by propagating the probabilities over
1 the graph.
1 
1  Each 'basic_block' contains two integer fields to represent profile
1 information: 'frequency' and 'count'.  The 'frequency' is an estimation
1 how often is basic block executed within a function.  It is represented
1 as an integer scaled in the range from 0 to 'BB_FREQ_BASE'.  The most
1 frequently executed basic block in function is initially set to
1 'BB_FREQ_BASE' and the rest of frequencies are scaled accordingly.
1 During optimization, the frequency of the most frequent basic block can
1 both decrease (for instance by loop unrolling) or grow (for instance by
1 cross-jumping optimization), so scaling sometimes has to be performed
1 multiple times.
1 
1  The 'count' contains hard-counted numbers of execution measured during
1 training runs and is nonzero only when profile feedback is available.
1 This value is represented as the host's widest integer (typically a 64
1 bit integer) of the special type 'gcov_type'.
1 
1  Most optimization passes can use only the frequency information of a
1 basic block, but a few passes may want to know hard execution counts.
1 The frequencies should always match the counts after scaling, however
1 during updating of the profile information numerical error may
1 accumulate into quite large errors.
1 
1  Each edge also contains a branch probability field: an integer in the
1 range from 0 to 'REG_BR_PROB_BASE'.  It represents probability of
1 passing control from the end of the 'src' basic block to the 'dest'
1 basic block, i.e. the probability that control will flow along this
1 edge.  The 'EDGE_FREQUENCY' macro is available to compute how frequently
1 a given edge is taken.  There is a 'count' field for each edge as well,
1 representing same information as for a basic block.
1 
1  The basic block frequencies are not represented in the instruction
1 stream, but in the RTL representation the edge frequencies are
1 represented for conditional jumps (via the 'REG_BR_PROB' macro) since
1 they are used when instructions are output to the assembly file and the
1 flow graph is no longer maintained.
1 
1  The probability that control flow arrives via a given edge to its
1 destination basic block is called "reverse probability" and is not
1 directly represented, but it may be easily computed from frequencies of
1 basic blocks.
1 
1  Updating profile information is a delicate task that can unfortunately
1 not be easily integrated with the CFG manipulation API.  Many of the
1 functions and hooks to modify the CFG, such as
1 'redirect_edge_and_branch', do not have enough information to easily
1 update the profile, so updating it is in the majority of cases left up
1 to the caller.  It is difficult to uncover bugs in the profile updating
1 code, because they manifest themselves only by producing worse code, and
1 checking profile consistency is not possible because of numeric error
1 accumulation.  Hence special attention needs to be given to this issue
1 in each pass that modifies the CFG.
1 
1  It is important to point out that 'REG_BR_PROB_BASE' and 'BB_FREQ_BASE'
1 are both set low enough to be possible to compute second power of any
1 frequency or probability in the flow graph, it is not possible to even
1 square the 'count' field, as modern CPUs are fast enough to execute
1 $2^32$ operations quickly.
1