gccint: Maintaining the CFG

1 
1 15.4 Maintaining the CFG
1 ========================
1 
1 An important task of each compiler pass is to keep both the control flow
1 graph and all profile information up-to-date.  Reconstruction of the
1 control flow graph after each pass is not an option, since it may be
1 very expensive and lost profile information cannot be reconstructed at
1 all.
1 
1  GCC has two major intermediate representations, and both use the
1 'basic_block' and 'edge' data types to represent control flow.  Both
1 representations share as much of the CFG maintenance code as possible.
1 For each representation, a set of "hooks" is defined so that each
1 representation can provide its own implementation of CFG manipulation
1 routines when necessary.  These hooks are defined in 'cfghooks.h'.
1 There are hooks for almost all common CFG manipulations, including block
1 splitting and merging, edge redirection and creating and deleting basic
1 blocks.  These hooks should provide everything you need to maintain and
1 manipulate the CFG in both the RTL and 'GIMPLE' representation.
1 
1  At the moment, the basic block boundaries are maintained transparently
1 when modifying instructions, so there rarely is a need to move them
1 manually (such as in case someone wants to output instruction outside
1 basic block explicitly).
1 
1  In the RTL representation, each instruction has a 'BLOCK_FOR_INSN'
1 value that represents pointer to the basic block that contains the
1 instruction.  In the 'GIMPLE' representation, the function 'gimple_bb'
1 returns a pointer to the basic block containing the queried statement.
1 
1  When changes need to be applied to a function in its 'GIMPLE'
1 representation, "GIMPLE statement iterators" should be used.  These
1 iterators provide an integrated abstraction of the flow graph and the
1 instruction stream.  Block statement iterators are constructed using the
1 'gimple_stmt_iterator' data structure and several modifiers are
1 available, including the following:
1 
1 'gsi_start'
1      This function initializes a 'gimple_stmt_iterator' that points to
1      the first non-empty statement in a basic block.
1 
1 'gsi_last'
1      This function initializes a 'gimple_stmt_iterator' that points to
1      the last statement in a basic block.
1 
1 'gsi_end_p'
1      This predicate is 'true' if a 'gimple_stmt_iterator' represents the
1      end of a basic block.
1 
1 'gsi_next'
1      This function takes a 'gimple_stmt_iterator' and makes it point to
1      its successor.
1 
1 'gsi_prev'
1      This function takes a 'gimple_stmt_iterator' and makes it point to
1      its predecessor.
1 
1 'gsi_insert_after'
1      This function inserts a statement after the 'gimple_stmt_iterator'
1      passed in.  The final parameter determines whether the statement
1      iterator is updated to point to the newly inserted statement, or
1      left pointing to the original statement.
1 
1 'gsi_insert_before'
1      This function inserts a statement before the 'gimple_stmt_iterator'
1      passed in.  The final parameter determines whether the statement
1      iterator is updated to point to the newly inserted statement, or
1      left pointing to the original statement.
1 
1 'gsi_remove'
1      This function removes the 'gimple_stmt_iterator' passed in and
1      rechains the remaining statements in a basic block, if any.
1 
1  In the RTL representation, the macros 'BB_HEAD' and 'BB_END' may be
1 used to get the head and end 'rtx' of a basic block.  No abstract
1 iterators are defined for traversing the insn chain, but you can just
1 use 'NEXT_INSN' and 'PREV_INSN' instead.  ⇒Insns.
1 
1  Usually a code manipulating pass simplifies the instruction stream and
1 the flow of control, possibly eliminating some edges.  This may for
1 example happen when a conditional jump is replaced with an unconditional
1 jump.  Updating of edges is not transparent and each optimization pass
1 is required to do so manually.  However only few cases occur in
1 practice.  The pass may call 'purge_dead_edges' on a given basic block
1 to remove superfluous edges, if any.
1 
1  Another common scenario is redirection of branch instructions, but this
1 is best modeled as redirection of edges in the control flow graph and
1 thus use of 'redirect_edge_and_branch' is preferred over more low level
1 functions, such as 'redirect_jump' that operate on RTL chain only.  The
1 CFG hooks defined in 'cfghooks.h' should provide the complete API
1 required for manipulating and maintaining the CFG.
1 
1  It is also possible that a pass has to insert control flow instruction
1 into the middle of a basic block, thus creating an entry point in the
1 middle of the basic block, which is impossible by definition: The block
1 must be split to make sure it only has one entry point, i.e. the head of
1 the basic block.  The CFG hook 'split_block' may be used when an
1 instruction in the middle of a basic block has to become the target of a
1 jump or branch instruction.
1 
1  For a global optimizer, a common operation is to split edges in the
1 flow graph and insert instructions on them.  In the RTL representation,
1 this can be easily done using the 'insert_insn_on_edge' function that
1 emits an instruction "on the edge", caching it for a later
1 'commit_edge_insertions' call that will take care of moving the inserted
1 instructions off the edge into the instruction stream contained in a
1 basic block.  This includes the creation of new basic blocks where
1 needed.  In the 'GIMPLE' representation, the equivalent functions are
1 'gsi_insert_on_edge' which inserts a block statement iterator on an
1 edge, and 'gsi_commit_edge_inserts' which flushes the instruction to
1 actual instruction stream.
1 
1  While debugging the optimization pass, the 'verify_flow_info' function
1 may be useful to find bugs in the control flow graph updating code.
1