gccint: Looping Patterns

1 
1 17.13 Defining Looping Instruction Patterns
1 ===========================================
1 
1 Some machines have special jump instructions that can be utilized to
1 make loops more efficient.  A common example is the 68000 'dbra'
1 instruction which performs a decrement of a register and a branch if the
1 result was greater than zero.  Other machines, in particular digital
1 signal processors (DSPs), have special block repeat instructions to
1 provide low-overhead loop support.  For example, the TI TMS320C3x/C4x
1 DSPs have a block repeat instruction that loads special registers to
1 mark the top and end of a loop and to count the number of loop
1 iterations.  This avoids the need for fetching and executing a
1 'dbra'-like instruction and avoids pipeline stalls associated with the
1 jump.
1 
1  GCC has three special named patterns to support low overhead looping.
1 They are 'decrement_and_branch_until_zero', 'doloop_begin', and
1 'doloop_end'.  The first pattern, 'decrement_and_branch_until_zero', is
1 not emitted during RTL generation but may be emitted during the
1 instruction combination phase.  This requires the assistance of the loop
1 optimizer, using information collected during strength reduction, to
1 reverse a loop to count down to zero.  Some targets also require the
1 loop optimizer to add a 'REG_NONNEG' note to indicate that the iteration
1 count is always positive.  This is needed if the target performs a
1 signed loop termination test.  For example, the 68000 uses a pattern
1 similar to the following for its 'dbra' instruction:
1 
1      (define_insn "decrement_and_branch_until_zero"
1        [(set (pc)
1              (if_then_else
1                (ge (plus:SI (match_operand:SI 0 "general_operand" "+d*am")
1                             (const_int -1))
1                    (const_int 0))
1                (label_ref (match_operand 1 "" ""))
1                (pc)))
1         (set (match_dup 0)
1              (plus:SI (match_dup 0)
1                       (const_int -1)))]
1        "find_reg_note (insn, REG_NONNEG, 0)"
1        "...")
1 
1  Note that since the insn is both a jump insn and has an output, it must
1 deal with its own reloads, hence the 'm' constraints.  Also note that
1 since this insn is generated by the instruction combination phase
1 combining two sequential insns together into an implicit parallel insn,
1 the iteration counter needs to be biased by the same amount as the
1 decrement operation, in this case -1.  Note that the following similar
1 pattern will not be matched by the combiner.
1 
1      (define_insn "decrement_and_branch_until_zero"
1        [(set (pc)
1              (if_then_else
1                (ge (match_operand:SI 0 "general_operand" "+d*am")
1                    (const_int 1))
1                (label_ref (match_operand 1 "" ""))
1                (pc)))
1         (set (match_dup 0)
1              (plus:SI (match_dup 0)
1                       (const_int -1)))]
1        "find_reg_note (insn, REG_NONNEG, 0)"
1        "...")
1 
1  The other two special looping patterns, 'doloop_begin' and
1 'doloop_end', are emitted by the loop optimizer for certain well-behaved
1 loops with a finite number of loop iterations using information
1 collected during strength reduction.
1 
1  The 'doloop_end' pattern describes the actual looping instruction (or
1 the implicit looping operation) and the 'doloop_begin' pattern is an
1 optional companion pattern that can be used for initialization needed
1 for some low-overhead looping instructions.
1 
1  Note that some machines require the actual looping instruction to be
1 emitted at the top of the loop (e.g., the TMS320C3x/C4x DSPs).  Emitting
1 the true RTL for a looping instruction at the top of the loop can cause
1 problems with flow analysis.  So instead, a dummy 'doloop' insn is
1 emitted at the end of the loop.  The machine dependent reorg pass checks
1 for the presence of this 'doloop' insn and then searches back to the top
1 of the loop, where it inserts the true looping insn (provided there are
1 no instructions in the loop which would cause problems).  Any additional
1 labels can be emitted at this point.  In addition, if the desired
1 special iteration counter register was not allocated, this machine
1 dependent reorg pass could emit a traditional compare and jump
1 instruction pair.
1 
1  The essential difference between the 'decrement_and_branch_until_zero'
1 and the 'doloop_end' patterns is that the loop optimizer allocates an
1 additional pseudo register for the latter as an iteration counter.  This
1 pseudo register cannot be used within the loop (i.e., general induction
1 variables cannot be derived from it), however, in many cases the loop
1 induction variable may become redundant and removed by the flow pass.
1