gccint: RTL passes

1 
1 9.5 RTL passes
1 ==============
1 
1 The following briefly describes the RTL generation and optimization
1 passes that are run after the Tree optimization passes.
1 
1    * RTL generation
1 
1      The source files for RTL generation include 'stmt.c', 'calls.c',
1      'expr.c', 'explow.c', 'expmed.c', 'function.c', 'optabs.c' and
1      'emit-rtl.c'.  Also, the file 'insn-emit.c', generated from the
1      machine description by the program 'genemit', is used in this pass.
1      The header file 'expr.h' is used for communication within this
1      pass.
1 
1      The header files 'insn-flags.h' and 'insn-codes.h', generated from
1      the machine description by the programs 'genflags' and 'gencodes',
1      tell this pass which standard names are available for use and which
1      patterns correspond to them.
1 
1    * Generation of exception landing pads
1 
1      This pass generates the glue that handles communication between the
1      exception handling library routines and the exception handlers
1      within the function.  Entry points in the function that are invoked
1      by the exception handling library are called "landing pads".  The
1      code for this pass is located in 'except.c'.
1 
1    * Control flow graph cleanup
1 
1      This pass removes unreachable code, simplifies jumps to next, jumps
1      to jump, jumps across jumps, etc.  The pass is run multiple times.
1      For historical reasons, it is occasionally referred to as the "jump
1      optimization pass".  The bulk of the code for this pass is in
1      'cfgcleanup.c', and there are support routines in 'cfgrtl.c' and
1      'jump.c'.
1 
1    * Forward propagation of single-def values
1 
1      This pass attempts to remove redundant computation by substituting
1      variables that come from a single definition, and seeing if the
1      result can be simplified.  It performs copy propagation and
1      addressing mode selection.  The pass is run twice, with values
1      being propagated into loops only on the second run.  The code is
1      located in 'fwprop.c'.
1 
1    * Common subexpression elimination
1 
1      This pass removes redundant computation within basic blocks, and
1      optimizes addressing modes based on cost.  The pass is run twice.
1      The code for this pass is located in 'cse.c'.
1 
1    * Global common subexpression elimination
1 
1      This pass performs two different types of GCSE depending on whether
1      you are optimizing for size or not (LCM based GCSE tends to
1      increase code size for a gain in speed, while Morel-Renvoise based
1      GCSE does not).  When optimizing for size, GCSE is done using
1      Morel-Renvoise Partial Redundancy Elimination, with the exception
1      that it does not try to move invariants out of loops--that is left
1      to the loop optimization pass.  If MR PRE GCSE is done, code
1      hoisting (aka unification) is also done, as well as load motion.
1      If you are optimizing for speed, LCM (lazy code motion) based GCSE
1      is done.  LCM is based on the work of Knoop, Ruthing, and Steffen.
1      LCM based GCSE also does loop invariant code motion.  We also
1      perform load and store motion when optimizing for speed.
1      Regardless of which type of GCSE is used, the GCSE pass also
1      performs global constant and copy propagation.  The source file for
1      this pass is 'gcse.c', and the LCM routines are in 'lcm.c'.
1 
1    * Loop optimization
1 
1      This pass performs several loop related optimizations.  The source
1      files 'cfgloopanal.c' and 'cfgloopmanip.c' contain generic loop
1      analysis and manipulation code.  Initialization and finalization of
1      loop structures is handled by 'loop-init.c'.  A loop invariant
1      motion pass is implemented in 'loop-invariant.c'.  Basic block
1      level optimizations--unrolling, and peeling loops-- are implemented
1      in 'loop-unroll.c'.  Replacing of the exit condition of loops by
1      special machine-dependent instructions is handled by
1      'loop-doloop.c'.
1 
1    * Jump bypassing
1 
1      This pass is an aggressive form of GCSE that transforms the control
1      flow graph of a function by propagating constants into conditional
1      branch instructions.  The source file for this pass is 'gcse.c'.
1 
1    * If conversion
1 
1      This pass attempts to replace conditional branches and surrounding
1      assignments with arithmetic, boolean value producing comparison
1      instructions, and conditional move instructions.  In the very last
1      invocation after reload/LRA, it will generate predicated
1      instructions when supported by the target.  The code is located in
1      'ifcvt.c'.
1 
1    * Web construction
1 
1      This pass splits independent uses of each pseudo-register.  This
1      can improve effect of the other transformation, such as CSE or
1      register allocation.  The code for this pass is located in 'web.c'.
1 
1    * Instruction combination
1 
1      This pass attempts to combine groups of two or three instructions
1      that are related by data flow into single instructions.  It
1      combines the RTL expressions for the instructions by substitution,
1      simplifies the result using algebra, and then attempts to match the
1      result against the machine description.  The code is located in
1      'combine.c'.
1 
1    * Mode switching optimization
1 
1      This pass looks for instructions that require the processor to be
1      in a specific "mode" and minimizes the number of mode changes
1      required to satisfy all users.  What these modes are, and what they
1      apply to are completely target-specific.  The code for this pass is
1      located in 'mode-switching.c'.
1 
1    * Modulo scheduling
1 
1      This pass looks at innermost loops and reorders their instructions
1      by overlapping different iterations.  Modulo scheduling is
1      performed immediately before instruction scheduling.  The code for
1      this pass is located in 'modulo-sched.c'.
1 
1    * Instruction scheduling
1 
1      This pass looks for instructions whose output will not be available
1      by the time that it is used in subsequent instructions.  Memory
1      loads and floating point instructions often have this behavior on
1      RISC machines.  It re-orders instructions within a basic block to
1      try to separate the definition and use of items that otherwise
1      would cause pipeline stalls.  This pass is performed twice, before
1      and after register allocation.  The code for this pass is located
1      in 'haifa-sched.c', 'sched-deps.c', 'sched-ebb.c', 'sched-rgn.c'
1      and 'sched-vis.c'.
1 
1    * Register allocation
1 
1      These passes make sure that all occurrences of pseudo registers are
1      eliminated, either by allocating them to a hard register, replacing
1      them by an equivalent expression (e.g. a constant) or by placing
1      them on the stack.  This is done in several subpasses:
1 
1         * The integrated register allocator (IRA).  It is called
1           integrated because coalescing, register live range splitting,
1           and hard register preferencing are done on-the-fly during
1           coloring.  It also has better integration with the reload/LRA
1           pass.  Pseudo-registers spilled by the allocator or the
1           reload/LRA have still a chance to get hard-registers if the
1           reload/LRA evicts some pseudo-registers from hard-registers.
1           The allocator helps to choose better pseudos for spilling
1           based on their live ranges and to coalesce stack slots
1           allocated for the spilled pseudo-registers.  IRA is a regional
1           register allocator which is transformed into Chaitin-Briggs
1           allocator if there is one region.  By default, IRA chooses
1           regions using register pressure but the user can force it to
1           use one region or regions corresponding to all loops.
1 
1           Source files of the allocator are 'ira.c', 'ira-build.c',
1           'ira-costs.c', 'ira-conflicts.c', 'ira-color.c', 'ira-emit.c',
1           'ira-lives', plus header files 'ira.h' and 'ira-int.h' used
1           for the communication between the allocator and the rest of
1           the compiler and between the IRA files.
1 
1         * Reloading.  This pass renumbers pseudo registers with the
1           hardware registers numbers they were allocated.  Pseudo
1           registers that did not get hard registers are replaced with
1           stack slots.  Then it finds instructions that are invalid
1           because a value has failed to end up in a register, or has
1           ended up in a register of the wrong kind.  It fixes up these
1           instructions by reloading the problematical values temporarily
1           into registers.  Additional instructions are generated to do
1           the copying.
1 
1           The reload pass also optionally eliminates the frame pointer
1           and inserts instructions to save and restore call-clobbered
1           registers around calls.
1 
1           Source files are 'reload.c' and 'reload1.c', plus the header
1           'reload.h' used for communication between them.
1 
1         * This pass is a modern replacement of the reload pass.  Source
1           files are 'lra.c', 'lra-assign.c', 'lra-coalesce.c',
1           'lra-constraints.c', 'lra-eliminations.c', 'lra-lives.c',
1           'lra-remat.c', 'lra-spills.c', the header 'lra-int.h' used for
1           communication between them, and the header 'lra.h' used for
1           communication between LRA and the rest of compiler.
1 
1           Unlike the reload pass, intermediate LRA decisions are
1           reflected in RTL as much as possible.  This reduces the number
1           of target-dependent macros and hooks, leaving instruction
1           constraints as the primary source of control.
1 
1           LRA is run on targets for which TARGET_LRA_P returns true.
1 
1    * Basic block reordering
1 
1      This pass implements profile guided code positioning.  If profile
1      information is not available, various types of static analysis are
1      performed to make the predictions normally coming from the profile
1      feedback (IE execution frequency, branch probability, etc).  It is
1      implemented in the file 'bb-reorder.c', and the various prediction
1      routines are in 'predict.c'.
1 
1    * Variable tracking
1 
1      This pass computes where the variables are stored at each position
1      in code and generates notes describing the variable locations to
1      RTL code.  The location lists are then generated according to these
1      notes to debug information if the debugging information format
1      supports location lists.  The code is located in 'var-tracking.c'.
1 
1    * Delayed branch scheduling
1 
1      This optional pass attempts to find instructions that can go into
1      the delay slots of other instructions, usually jumps and calls.
1      The code for this pass is located in 'reorg.c'.
1 
1    * Branch shortening
1 
1      On many RISC machines, branch instructions have a limited range.
1      Thus, longer sequences of instructions must be used for long
1      branches.  In this pass, the compiler figures out what how far each
1      instruction will be from each other instruction, and therefore
1      whether the usual instructions, or the longer sequences, must be
1      used for each branch.  The code for this pass is located in
1      'final.c'.
1 
1    * Register-to-stack conversion
1 
1      Conversion from usage of some hard registers to usage of a register
1      stack may be done at this point.  Currently, this is supported only
1      for the floating-point registers of the Intel 80387 coprocessor.
1      The code for this pass is located in 'reg-stack.c'.
1 
1    * Final
1 
1      This pass outputs the assembler code for the function.  The source
1      files are 'final.c' plus 'insn-output.c'; the latter is generated
1      automatically from the machine description by the tool 'genoutput'.
1      The header file 'conditions.h' is used for communication between
1      these files.
1 
1    * Debugging information output
1 
1      This is run after final because it must output the stack slot
1      offsets for pseudo registers that did not get hard registers.
1      Source files are 'dbxout.c' for DBX symbol table format,
1      'dwarfout.c' for DWARF symbol table format, files 'dwarf2out.c' and
1      'dwarf2asm.c' for DWARF2 symbol table format, and 'vmsdbgout.c' for
1      VMS debug symbol table format.
1