gccint: Looping Patterns
1
1 17.13 Defining Looping Instruction Patterns
1 ===========================================
1
1 Some machines have special jump instructions that can be utilized to
1 make loops more efficient. A common example is the 68000 'dbra'
1 instruction which performs a decrement of a register and a branch if the
1 result was greater than zero. Other machines, in particular digital
1 signal processors (DSPs), have special block repeat instructions to
1 provide low-overhead loop support. For example, the TI TMS320C3x/C4x
1 DSPs have a block repeat instruction that loads special registers to
1 mark the top and end of a loop and to count the number of loop
1 iterations. This avoids the need for fetching and executing a
1 'dbra'-like instruction and avoids pipeline stalls associated with the
1 jump.
1
1 GCC has three special named patterns to support low overhead looping.
1 They are 'decrement_and_branch_until_zero', 'doloop_begin', and
1 'doloop_end'. The first pattern, 'decrement_and_branch_until_zero', is
1 not emitted during RTL generation but may be emitted during the
1 instruction combination phase. This requires the assistance of the loop
1 optimizer, using information collected during strength reduction, to
1 reverse a loop to count down to zero. Some targets also require the
1 loop optimizer to add a 'REG_NONNEG' note to indicate that the iteration
1 count is always positive. This is needed if the target performs a
1 signed loop termination test. For example, the 68000 uses a pattern
1 similar to the following for its 'dbra' instruction:
1
1 (define_insn "decrement_and_branch_until_zero"
1 [(set (pc)
1 (if_then_else
1 (ge (plus:SI (match_operand:SI 0 "general_operand" "+d*am")
1 (const_int -1))
1 (const_int 0))
1 (label_ref (match_operand 1 "" ""))
1 (pc)))
1 (set (match_dup 0)
1 (plus:SI (match_dup 0)
1 (const_int -1)))]
1 "find_reg_note (insn, REG_NONNEG, 0)"
1 "...")
1
1 Note that since the insn is both a jump insn and has an output, it must
1 deal with its own reloads, hence the 'm' constraints. Also note that
1 since this insn is generated by the instruction combination phase
1 combining two sequential insns together into an implicit parallel insn,
1 the iteration counter needs to be biased by the same amount as the
1 decrement operation, in this case -1. Note that the following similar
1 pattern will not be matched by the combiner.
1
1 (define_insn "decrement_and_branch_until_zero"
1 [(set (pc)
1 (if_then_else
1 (ge (match_operand:SI 0 "general_operand" "+d*am")
1 (const_int 1))
1 (label_ref (match_operand 1 "" ""))
1 (pc)))
1 (set (match_dup 0)
1 (plus:SI (match_dup 0)
1 (const_int -1)))]
1 "find_reg_note (insn, REG_NONNEG, 0)"
1 "...")
1
1 The other two special looping patterns, 'doloop_begin' and
1 'doloop_end', are emitted by the loop optimizer for certain well-behaved
1 loops with a finite number of loop iterations using information
1 collected during strength reduction.
1
1 The 'doloop_end' pattern describes the actual looping instruction (or
1 the implicit looping operation) and the 'doloop_begin' pattern is an
1 optional companion pattern that can be used for initialization needed
1 for some low-overhead looping instructions.
1
1 Note that some machines require the actual looping instruction to be
1 emitted at the top of the loop (e.g., the TMS320C3x/C4x DSPs). Emitting
1 the true RTL for a looping instruction at the top of the loop can cause
1 problems with flow analysis. So instead, a dummy 'doloop' insn is
1 emitted at the end of the loop. The machine dependent reorg pass checks
1 for the presence of this 'doloop' insn and then searches back to the top
1 of the loop, where it inserts the true looping insn (provided there are
1 no instructions in the loop which would cause problems). Any additional
1 labels can be emitted at this point. In addition, if the desired
1 special iteration counter register was not allocated, this machine
1 dependent reorg pass could emit a traditional compare and jump
1 instruction pair.
1
1 The essential difference between the 'decrement_and_branch_until_zero'
1 and the 'doloop_end' patterns is that the loop optimizer allocates an
1 additional pseudo register for the latter as an iteration counter. This
1 pseudo register cannot be used within the loop (i.e., general induction
1 variables cannot be derived from it), however, in many cases the loop
1 induction variable may become redundant and removed by the flow pass.
1