gccint: RTL passes
1
1 9.5 RTL passes
1 ==============
1
1 The following briefly describes the RTL generation and optimization
1 passes that are run after the Tree optimization passes.
1
1 * RTL generation
1
1 The source files for RTL generation include 'stmt.c', 'calls.c',
1 'expr.c', 'explow.c', 'expmed.c', 'function.c', 'optabs.c' and
1 'emit-rtl.c'. Also, the file 'insn-emit.c', generated from the
1 machine description by the program 'genemit', is used in this pass.
1 The header file 'expr.h' is used for communication within this
1 pass.
1
1 The header files 'insn-flags.h' and 'insn-codes.h', generated from
1 the machine description by the programs 'genflags' and 'gencodes',
1 tell this pass which standard names are available for use and which
1 patterns correspond to them.
1
1 * Generation of exception landing pads
1
1 This pass generates the glue that handles communication between the
1 exception handling library routines and the exception handlers
1 within the function. Entry points in the function that are invoked
1 by the exception handling library are called "landing pads". The
1 code for this pass is located in 'except.c'.
1
1 * Control flow graph cleanup
1
1 This pass removes unreachable code, simplifies jumps to next, jumps
1 to jump, jumps across jumps, etc. The pass is run multiple times.
1 For historical reasons, it is occasionally referred to as the "jump
1 optimization pass". The bulk of the code for this pass is in
1 'cfgcleanup.c', and there are support routines in 'cfgrtl.c' and
1 'jump.c'.
1
1 * Forward propagation of single-def values
1
1 This pass attempts to remove redundant computation by substituting
1 variables that come from a single definition, and seeing if the
1 result can be simplified. It performs copy propagation and
1 addressing mode selection. The pass is run twice, with values
1 being propagated into loops only on the second run. The code is
1 located in 'fwprop.c'.
1
1 * Common subexpression elimination
1
1 This pass removes redundant computation within basic blocks, and
1 optimizes addressing modes based on cost. The pass is run twice.
1 The code for this pass is located in 'cse.c'.
1
1 * Global common subexpression elimination
1
1 This pass performs two different types of GCSE depending on whether
1 you are optimizing for size or not (LCM based GCSE tends to
1 increase code size for a gain in speed, while Morel-Renvoise based
1 GCSE does not). When optimizing for size, GCSE is done using
1 Morel-Renvoise Partial Redundancy Elimination, with the exception
1 that it does not try to move invariants out of loops--that is left
1 to the loop optimization pass. If MR PRE GCSE is done, code
1 hoisting (aka unification) is also done, as well as load motion.
1 If you are optimizing for speed, LCM (lazy code motion) based GCSE
1 is done. LCM is based on the work of Knoop, Ruthing, and Steffen.
1 LCM based GCSE also does loop invariant code motion. We also
1 perform load and store motion when optimizing for speed.
1 Regardless of which type of GCSE is used, the GCSE pass also
1 performs global constant and copy propagation. The source file for
1 this pass is 'gcse.c', and the LCM routines are in 'lcm.c'.
1
1 * Loop optimization
1
1 This pass performs several loop related optimizations. The source
1 files 'cfgloopanal.c' and 'cfgloopmanip.c' contain generic loop
1 analysis and manipulation code. Initialization and finalization of
1 loop structures is handled by 'loop-init.c'. A loop invariant
1 motion pass is implemented in 'loop-invariant.c'. Basic block
1 level optimizations--unrolling, and peeling loops-- are implemented
1 in 'loop-unroll.c'. Replacing of the exit condition of loops by
1 special machine-dependent instructions is handled by
1 'loop-doloop.c'.
1
1 * Jump bypassing
1
1 This pass is an aggressive form of GCSE that transforms the control
1 flow graph of a function by propagating constants into conditional
1 branch instructions. The source file for this pass is 'gcse.c'.
1
1 * If conversion
1
1 This pass attempts to replace conditional branches and surrounding
1 assignments with arithmetic, boolean value producing comparison
1 instructions, and conditional move instructions. In the very last
1 invocation after reload/LRA, it will generate predicated
1 instructions when supported by the target. The code is located in
1 'ifcvt.c'.
1
1 * Web construction
1
1 This pass splits independent uses of each pseudo-register. This
1 can improve effect of the other transformation, such as CSE or
1 register allocation. The code for this pass is located in 'web.c'.
1
1 * Instruction combination
1
1 This pass attempts to combine groups of two or three instructions
1 that are related by data flow into single instructions. It
1 combines the RTL expressions for the instructions by substitution,
1 simplifies the result using algebra, and then attempts to match the
1 result against the machine description. The code is located in
1 'combine.c'.
1
1 * Mode switching optimization
1
1 This pass looks for instructions that require the processor to be
1 in a specific "mode" and minimizes the number of mode changes
1 required to satisfy all users. What these modes are, and what they
1 apply to are completely target-specific. The code for this pass is
1 located in 'mode-switching.c'.
1
1 * Modulo scheduling
1
1 This pass looks at innermost loops and reorders their instructions
1 by overlapping different iterations. Modulo scheduling is
1 performed immediately before instruction scheduling. The code for
1 this pass is located in 'modulo-sched.c'.
1
1 * Instruction scheduling
1
1 This pass looks for instructions whose output will not be available
1 by the time that it is used in subsequent instructions. Memory
1 loads and floating point instructions often have this behavior on
1 RISC machines. It re-orders instructions within a basic block to
1 try to separate the definition and use of items that otherwise
1 would cause pipeline stalls. This pass is performed twice, before
1 and after register allocation. The code for this pass is located
1 in 'haifa-sched.c', 'sched-deps.c', 'sched-ebb.c', 'sched-rgn.c'
1 and 'sched-vis.c'.
1
1 * Register allocation
1
1 These passes make sure that all occurrences of pseudo registers are
1 eliminated, either by allocating them to a hard register, replacing
1 them by an equivalent expression (e.g. a constant) or by placing
1 them on the stack. This is done in several subpasses:
1
1 * The integrated register allocator (IRA). It is called
1 integrated because coalescing, register live range splitting,
1 and hard register preferencing are done on-the-fly during
1 coloring. It also has better integration with the reload/LRA
1 pass. Pseudo-registers spilled by the allocator or the
1 reload/LRA have still a chance to get hard-registers if the
1 reload/LRA evicts some pseudo-registers from hard-registers.
1 The allocator helps to choose better pseudos for spilling
1 based on their live ranges and to coalesce stack slots
1 allocated for the spilled pseudo-registers. IRA is a regional
1 register allocator which is transformed into Chaitin-Briggs
1 allocator if there is one region. By default, IRA chooses
1 regions using register pressure but the user can force it to
1 use one region or regions corresponding to all loops.
1
1 Source files of the allocator are 'ira.c', 'ira-build.c',
1 'ira-costs.c', 'ira-conflicts.c', 'ira-color.c', 'ira-emit.c',
1 'ira-lives', plus header files 'ira.h' and 'ira-int.h' used
1 for the communication between the allocator and the rest of
1 the compiler and between the IRA files.
1
1 * Reloading. This pass renumbers pseudo registers with the
1 hardware registers numbers they were allocated. Pseudo
1 registers that did not get hard registers are replaced with
1 stack slots. Then it finds instructions that are invalid
1 because a value has failed to end up in a register, or has
1 ended up in a register of the wrong kind. It fixes up these
1 instructions by reloading the problematical values temporarily
1 into registers. Additional instructions are generated to do
1 the copying.
1
1 The reload pass also optionally eliminates the frame pointer
1 and inserts instructions to save and restore call-clobbered
1 registers around calls.
1
1 Source files are 'reload.c' and 'reload1.c', plus the header
1 'reload.h' used for communication between them.
1
1 * This pass is a modern replacement of the reload pass. Source
1 files are 'lra.c', 'lra-assign.c', 'lra-coalesce.c',
1 'lra-constraints.c', 'lra-eliminations.c', 'lra-lives.c',
1 'lra-remat.c', 'lra-spills.c', the header 'lra-int.h' used for
1 communication between them, and the header 'lra.h' used for
1 communication between LRA and the rest of compiler.
1
1 Unlike the reload pass, intermediate LRA decisions are
1 reflected in RTL as much as possible. This reduces the number
1 of target-dependent macros and hooks, leaving instruction
1 constraints as the primary source of control.
1
1 LRA is run on targets for which TARGET_LRA_P returns true.
1
1 * Basic block reordering
1
1 This pass implements profile guided code positioning. If profile
1 information is not available, various types of static analysis are
1 performed to make the predictions normally coming from the profile
1 feedback (IE execution frequency, branch probability, etc). It is
1 implemented in the file 'bb-reorder.c', and the various prediction
1 routines are in 'predict.c'.
1
1 * Variable tracking
1
1 This pass computes where the variables are stored at each position
1 in code and generates notes describing the variable locations to
1 RTL code. The location lists are then generated according to these
1 notes to debug information if the debugging information format
1 supports location lists. The code is located in 'var-tracking.c'.
1
1 * Delayed branch scheduling
1
1 This optional pass attempts to find instructions that can go into
1 the delay slots of other instructions, usually jumps and calls.
1 The code for this pass is located in 'reorg.c'.
1
1 * Branch shortening
1
1 On many RISC machines, branch instructions have a limited range.
1 Thus, longer sequences of instructions must be used for long
1 branches. In this pass, the compiler figures out what how far each
1 instruction will be from each other instruction, and therefore
1 whether the usual instructions, or the longer sequences, must be
1 used for each branch. The code for this pass is located in
1 'final.c'.
1
1 * Register-to-stack conversion
1
1 Conversion from usage of some hard registers to usage of a register
1 stack may be done at this point. Currently, this is supported only
1 for the floating-point registers of the Intel 80387 coprocessor.
1 The code for this pass is located in 'reg-stack.c'.
1
1 * Final
1
1 This pass outputs the assembler code for the function. The source
1 files are 'final.c' plus 'insn-output.c'; the latter is generated
1 automatically from the machine description by the tool 'genoutput'.
1 The header file 'conditions.h' is used for communication between
1 these files.
1
1 * Debugging information output
1
1 This is run after final because it must output the stack slot
1 offsets for pseudo registers that did not get hard registers.
1 Source files are 'dbxout.c' for DBX symbol table format,
1 'dwarfout.c' for DWARF symbol table format, files 'dwarf2out.c' and
1 'dwarf2asm.c' for DWARF2 symbol table format, and 'vmsdbgout.c' for
1 VMS debug symbol table format.
1