gccint: Basic Blocks

1 
1 15.1 Basic Blocks
1 =================
1 
1 A basic block is a straight-line sequence of code with only one entry
1 point and only one exit.  In GCC, basic blocks are represented using the
1 'basic_block' data type.
1 
1  Special basic blocks represent possible entry and exit points of a
1 function.  These blocks are called 'ENTRY_BLOCK_PTR' and
1 'EXIT_BLOCK_PTR'.  These blocks do not contain any code.
1 
1  The 'BASIC_BLOCK' array contains all basic blocks in an unspecified
1 order.  Each 'basic_block' structure has a field that holds a unique
1 integer identifier 'index' that is the index of the block in the
1 'BASIC_BLOCK' array.  The total number of basic blocks in the function
1 is 'n_basic_blocks'.  Both the basic block indices and the total number
1 of basic blocks may vary during the compilation process, as passes
1 reorder, create, duplicate, and destroy basic blocks.  The index for any
1 block should never be greater than 'last_basic_block'.  The indices 0
1 and 1 are special codes reserved for 'ENTRY_BLOCK' and 'EXIT_BLOCK', the
1 indices of 'ENTRY_BLOCK_PTR' and 'EXIT_BLOCK_PTR'.
1 
1  Two pointer members of the 'basic_block' structure are the pointers
1 'next_bb' and 'prev_bb'.  These are used to keep doubly linked chain of
1 basic blocks in the same order as the underlying instruction stream.
1 The chain of basic blocks is updated transparently by the provided API
1 for manipulating the CFG.  The macro 'FOR_EACH_BB' can be used to visit
1 all the basic blocks in lexicographical order, except 'ENTRY_BLOCK' and
1 'EXIT_BLOCK'.  The macro 'FOR_ALL_BB' also visits all basic blocks in
1 lexicographical order, including 'ENTRY_BLOCK' and 'EXIT_BLOCK'.
1 
1  The functions 'post_order_compute' and 'inverted_post_order_compute'
1 can be used to compute topological orders of the CFG. The orders are
1 stored as vectors of basic block indices.  The 'BASIC_BLOCK' array can
1 be used to iterate each basic block by index.  Dominator traversals are
1 also possible using 'walk_dominator_tree'.  Given two basic blocks A and
1 B, block A dominates block B if A is _always_ executed before B.
1 
1  Each 'basic_block' also contains pointers to the first instruction (the
1 "head") and the last instruction (the "tail") or "end" of the
1 instruction stream contained in a basic block.  In fact, since the
1 'basic_block' data type is used to represent blocks in both major
1 intermediate representations of GCC ('GIMPLE' and RTL), there are
1 pointers to the head and end of a basic block for both representations,
1 stored in intermediate representation specific data in the 'il' field of
1 'struct basic_block_def'.
1 
1  For RTL, these pointers are 'BB_HEAD' and 'BB_END'.
1 
1  In the RTL representation of a function, the instruction stream
1 contains not only the "real" instructions, but also "notes" or "insn
1 notes" (to distinguish them from "reg notes").  Any function that moves
1 or duplicates the basic blocks needs to take care of updating of these
1 notes.  Many of these notes expect that the instruction stream consists
1 of linear regions, so updating can sometimes be tedious.  All types of
1 insn notes are defined in 'insn-notes.def'.
1 
1  In the RTL function representation, the instructions contained in a
1 basic block always follow a 'NOTE_INSN_BASIC_BLOCK', but zero or more
1 'CODE_LABEL' nodes can precede the block note.  A basic block ends with
1 a control flow instruction or with the last instruction before the next
1 'CODE_LABEL' or 'NOTE_INSN_BASIC_BLOCK'.  By definition, a 'CODE_LABEL'
1 cannot appear in the middle of the instruction stream of a basic block.
1 
1  In addition to notes, the jump table vectors are also represented as
1 "pseudo-instructions" inside the insn stream.  These vectors never
1 appear in the basic block and should always be placed just after the
1 table jump instructions referencing them.  After removing the table-jump
1 it is often difficult to eliminate the code computing the address and
1 referencing the vector, so cleaning up these vectors is postponed until
1 after liveness analysis.  Thus the jump table vectors may appear in the
1 insn stream unreferenced and without any purpose.  Before any edge is
1 made "fall-thru", the existence of such construct in the way needs to be
1 checked by calling 'can_fallthru' function.
1 
1  For the 'GIMPLE' representation, the PHI nodes and statements contained
1 in a basic block are in a 'gimple_seq' pointed to by the basic block
1 intermediate language specific pointers.  Abstract containers and
1 iterators are used to access the PHI nodes and statements in a basic
1 blocks.  These iterators are called "GIMPLE statement iterators" (GSIs).
1 Grep for '^gsi' in the various 'gimple-*' and 'tree-*' files.  There is
1 a 'gimple_stmt_iterator' type for iterating over all kinds of statement,
1 and a 'gphi_iterator' subclass for iterating over PHI nodes.  The
1 following snippet will pretty-print all PHI nodes the statements of the
1 current function in the GIMPLE representation.
1 
1      basic_block bb;
1 
1      FOR_EACH_BB (bb)
1        {
1         gphi_iterator pi;
1         gimple_stmt_iterator si;
1 
1         for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
1           {
1             gphi *phi = pi.phi ();
1             print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
1           }
1         for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
1           {
1             gimple stmt = gsi_stmt (si);
1             print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
1           }
1        }
1