gccint: Processor pipeline description
1
1 17.19.9 Specifying processor pipeline description
1 -------------------------------------------------
1
1 To achieve better performance, most modern processors (super-pipelined,
1 superscalar RISC, and VLIW processors) have many "functional units" on
1 which several instructions can be executed simultaneously. An
1 instruction starts execution if its issue conditions are satisfied. If
1 not, the instruction is stalled until its conditions are satisfied.
1 Such "interlock (pipeline) delay" causes interruption of the fetching of
1 successor instructions (or demands nop instructions, e.g. for some MIPS
1 processors).
1
1 There are two major kinds of interlock delays in modern processors.
1 The first one is a data dependence delay determining "instruction
1 latency time". The instruction execution is not started until all
1 source data have been evaluated by prior instructions (there are more
1 complex cases when the instruction execution starts even when the data
1 are not available but will be ready in given time after the instruction
1 execution start). Taking the data dependence delays into account is
1 simple. The data dependence (true, output, and anti-dependence) delay
1 between two instructions is given by a constant. In most cases this
1 approach is adequate. The second kind of interlock delays is a
1 reservation delay. The reservation delay means that two instructions
1 under execution will be in need of shared processors resources, i.e.
1 buses, internal registers, and/or functional units, which are reserved
1 for some time. Taking this kind of delay into account is complex
1 especially for modern RISC processors.
1
1 The task of exploiting more processor parallelism is solved by an
1 instruction scheduler. For a better solution to this problem, the
1 instruction scheduler has to have an adequate description of the
1 processor parallelism (or "pipeline description"). GCC machine
1 descriptions describe processor parallelism and functional unit
1 reservations for groups of instructions with the aid of "regular
1 expressions".
1
1 The GCC instruction scheduler uses a "pipeline hazard recognizer" to
1 figure out the possibility of the instruction issue by the processor on
1 a given simulated processor cycle. The pipeline hazard recognizer is
1 automatically generated from the processor pipeline description. The
1 pipeline hazard recognizer generated from the machine description is
1 based on a deterministic finite state automaton (DFA): the instruction
1 issue is possible if there is a transition from one automaton state to
1 another one. This algorithm is very fast, and furthermore, its speed is
1 not dependent on processor complexity(1).
1
1 The rest of this section describes the directives that constitute an
1 automaton-based processor pipeline description. The order of these
1 constructions within the machine description file is not important.
1
1 The following optional construction describes names of automata
1 generated and used for the pipeline hazards recognition. Sometimes the
1 generated finite state automaton used by the pipeline hazard recognizer
1 is large. If we use more than one automaton and bind functional units
1 to the automata, the total size of the automata is usually less than the
1 size of the single automaton. If there is no one such construction,
1 only one finite state automaton is generated.
1
1 (define_automaton AUTOMATA-NAMES)
1
1 AUTOMATA-NAMES is a string giving names of the automata. The names are
1 separated by commas. All the automata should have unique names. The
1 automaton name is used in the constructions 'define_cpu_unit' and
1 'define_query_cpu_unit'.
1
1 Each processor functional unit used in the description of instruction
1 reservations should be described by the following construction.
1
1 (define_cpu_unit UNIT-NAMES [AUTOMATON-NAME])
1
1 UNIT-NAMES is a string giving the names of the functional units
1 separated by commas. Don't use name 'nothing', it is reserved for other
1 goals.
1
1 AUTOMATON-NAME is a string giving the name of the automaton with which
1 the unit is bound. The automaton should be described in construction
1 'define_automaton'. You should give "automaton-name", if there is a
1 defined automaton.
1
1 The assignment of units to automata are constrained by the uses of the
1 units in insn reservations. The most important constraint is: if a unit
1 reservation is present on a particular cycle of an alternative for an
1 insn reservation, then some unit from the same automaton must be present
1 on the same cycle for the other alternatives of the insn reservation.
1 The rest of the constraints are mentioned in the description of the
1 subsequent constructions.
1
1 The following construction describes CPU functional units analogously
1 to 'define_cpu_unit'. The reservation of such units can be queried for
1 an automaton state. The instruction scheduler never queries reservation
1 of functional units for given automaton state. So as a rule, you don't
1 need this construction. This construction could be used for future code
1 generation goals (e.g. to generate VLIW insn templates).
1
1 (define_query_cpu_unit UNIT-NAMES [AUTOMATON-NAME])
1
1 UNIT-NAMES is a string giving names of the functional units separated
1 by commas.
1
1 AUTOMATON-NAME is a string giving the name of the automaton with which
1 the unit is bound.
1
1 The following construction is the major one to describe pipeline
1 characteristics of an instruction.
1
1 (define_insn_reservation INSN-NAME DEFAULT_LATENCY
1 CONDITION REGEXP)
1
1 DEFAULT_LATENCY is a number giving latency time of the instruction.
1 There is an important difference between the old description and the
1 automaton based pipeline description. The latency time is used for all
1 dependencies when we use the old description. In the automaton based
1 pipeline description, the given latency time is only used for true
1 dependencies. The cost of anti-dependencies is always zero and the cost
1 of output dependencies is the difference between latency times of the
1 producing and consuming insns (if the difference is negative, the cost
1 is considered to be zero). You can always change the default costs for
1 any description by using the target hook 'TARGET_SCHED_ADJUST_COST'
1 (⇒Scheduling).
1
1 INSN-NAME is a string giving the internal name of the insn. The
1 internal names are used in constructions 'define_bypass' and in the
1 automaton description file generated for debugging. The internal name
1 has nothing in common with the names in 'define_insn'. It is a good
1 practice to use insn classes described in the processor manual.
1
1 CONDITION defines what RTL insns are described by this construction.
1 You should remember that you will be in trouble if CONDITION for two or
1 more different 'define_insn_reservation' constructions is TRUE for an
1 insn. In this case what reservation will be used for the insn is not
1 defined. Such cases are not checked during generation of the pipeline
1 hazards recognizer because in general recognizing that two conditions
1 may have the same value is quite difficult (especially if the conditions
1 contain 'symbol_ref'). It is also not checked during the pipeline
1 hazard recognizer work because it would slow down the recognizer
1 considerably.
1
1 REGEXP is a string describing the reservation of the cpu's functional
1 units by the instruction. The reservations are described by a regular
1 expression according to the following syntax:
1
1 regexp = regexp "," oneof
1 | oneof
1
1 oneof = oneof "|" allof
1 | allof
1
1 allof = allof "+" repeat
1 | repeat
1
1 repeat = element "*" number
1 | element
1
1 element = cpu_function_unit_name
1 | reservation_name
1 | result_name
1 | "nothing"
1 | "(" regexp ")"
1
1 * ',' is used for describing the start of the next cycle in the
1 reservation.
1
1 * '|' is used for describing a reservation described by the first
1 regular expression *or* a reservation described by the second
1 regular expression *or* etc.
1
1 * '+' is used for describing a reservation described by the first
1 regular expression *and* a reservation described by the second
1 regular expression *and* etc.
1
1 * '*' is used for convenience and simply means a sequence in which
1 the regular expression are repeated NUMBER times with cycle
1 advancing (see ',').
1
1 * 'cpu_function_unit_name' denotes reservation of the named
1 functional unit.
1
1 * 'reservation_name' -- see description of construction
1 'define_reservation'.
1
1 * 'nothing' denotes no unit reservations.
1
1 Sometimes unit reservations for different insns contain common parts.
1 In such case, you can simplify the pipeline description by describing
1 the common part by the following construction
1
1 (define_reservation RESERVATION-NAME REGEXP)
1
1 RESERVATION-NAME is a string giving name of REGEXP. Functional unit
1 names and reservation names are in the same name space. So the
1 reservation names should be different from the functional unit names and
1 can not be the reserved name 'nothing'.
1
1 The following construction is used to describe exceptions in the
1 latency time for given instruction pair. This is so called bypasses.
1
1 (define_bypass NUMBER OUT_INSN_NAMES IN_INSN_NAMES
1 [GUARD])
1
1 NUMBER defines when the result generated by the instructions given in
1 string OUT_INSN_NAMES will be ready for the instructions given in string
1 IN_INSN_NAMES. Each of these strings is a comma-separated list of
1 filename-style globs and they refer to the names of
1 'define_insn_reservation's. For example:
1 (define_bypass 1 "cpu1_load_*, cpu1_store_*" "cpu1_load_*")
1 defines a bypass between instructions that start with 'cpu1_load_' or
1 'cpu1_store_' and those that start with 'cpu1_load_'.
1
1 GUARD is an optional string giving the name of a C function which
1 defines an additional guard for the bypass. The function will get the
1 two insns as parameters. If the function returns zero the bypass will
1 be ignored for this case. The additional guard is necessary to
1 recognize complicated bypasses, e.g. when the consumer is only an
1 address of insn 'store' (not a stored value).
1
1 If there are more one bypass with the same output and input insns, the
1 chosen bypass is the first bypass with a guard in description whose
1 guard function returns nonzero. If there is no such bypass, then bypass
1 without the guard function is chosen.
1
1 The following five constructions are usually used to describe VLIW
1 processors, or more precisely, to describe a placement of small
1 instructions into VLIW instruction slots. They can be used for RISC
1 processors, too.
1
1 (exclusion_set UNIT-NAMES UNIT-NAMES)
1 (presence_set UNIT-NAMES PATTERNS)
1 (final_presence_set UNIT-NAMES PATTERNS)
1 (absence_set UNIT-NAMES PATTERNS)
1 (final_absence_set UNIT-NAMES PATTERNS)
1
1 UNIT-NAMES is a string giving names of functional units separated by
1 commas.
1
1 PATTERNS is a string giving patterns of functional units separated by
1 comma. Currently pattern is one unit or units separated by
1 white-spaces.
1
1 The first construction ('exclusion_set') means that each functional
1 unit in the first string can not be reserved simultaneously with a unit
1 whose name is in the second string and vice versa. For example, the
1 construction is useful for describing processors (e.g. some SPARC
1 processors) with a fully pipelined floating point functional unit which
1 can execute simultaneously only single floating point insns or only
1 double floating point insns.
1
1 The second construction ('presence_set') means that each functional
1 unit in the first string can not be reserved unless at least one of
1 pattern of units whose names are in the second string is reserved. This
1 is an asymmetric relation. For example, it is useful for description
1 that VLIW 'slot1' is reserved after 'slot0' reservation. We could
1 describe it by the following construction
1
1 (presence_set "slot1" "slot0")
1
1 Or 'slot1' is reserved only after 'slot0' and unit 'b0' reservation.
1 In this case we could write
1
1 (presence_set "slot1" "slot0 b0")
1
1 The third construction ('final_presence_set') is analogous to
1 'presence_set'. The difference between them is when checking is done.
1 When an instruction is issued in given automaton state reflecting all
1 current and planned unit reservations, the automaton state is changed.
1 The first state is a source state, the second one is a result state.
1 Checking for 'presence_set' is done on the source state reservation,
1 checking for 'final_presence_set' is done on the result reservation.
1 This construction is useful to describe a reservation which is actually
1 two subsequent reservations. For example, if we use
1
1 (presence_set "slot1" "slot0")
1
1 the following insn will be never issued (because 'slot1' requires
1 'slot0' which is absent in the source state).
1
1 (define_reservation "insn_and_nop" "slot0 + slot1")
1
1 but it can be issued if we use analogous 'final_presence_set'.
1
1 The forth construction ('absence_set') means that each functional unit
1 in the first string can be reserved only if each pattern of units whose
1 names are in the second string is not reserved. This is an asymmetric
1 relation (actually 'exclusion_set' is analogous to this one but it is
1 symmetric). For example it might be useful in a VLIW description to say
1 that 'slot0' cannot be reserved after either 'slot1' or 'slot2' have
1 been reserved. This can be described as:
1
1 (absence_set "slot0" "slot1, slot2")
1
1 Or 'slot2' can not be reserved if 'slot0' and unit 'b0' are reserved or
1 'slot1' and unit 'b1' are reserved. In this case we could write
1
1 (absence_set "slot2" "slot0 b0, slot1 b1")
1
1 All functional units mentioned in a set should belong to the same
1 automaton.
1
1 The last construction ('final_absence_set') is analogous to
1 'absence_set' but checking is done on the result (state) reservation.
1 See comments for 'final_presence_set'.
1
1 You can control the generator of the pipeline hazard recognizer with
1 the following construction.
1
1 (automata_option OPTIONS)
1
1 OPTIONS is a string giving options which affect the generated code.
1 Currently there are the following options:
1
1 * "no-minimization" makes no minimization of the automaton. This is
1 only worth to do when we are debugging the description and need to
1 look more accurately at reservations of states.
1
1 * "time" means printing time statistics about the generation of
1 automata.
1
1 * "stats" means printing statistics about the generated automata such
1 as the number of DFA states, NDFA states and arcs.
1
1 * "v" means a generation of the file describing the result automata.
1 The file has suffix '.dfa' and can be used for the description
1 verification and debugging.
1
1 * "w" means a generation of warning instead of error for non-critical
1 errors.
1
1 * "no-comb-vect" prevents the automaton generator from generating two
1 data structures and comparing them for space efficiency. Using a
1 comb vector to represent transitions may be better, but it can be
1 very expensive to construct. This option is useful if the build
1 process spends an unacceptably long time in genautomata.
1
1 * "ndfa" makes nondeterministic finite state automata. This affects
1 the treatment of operator '|' in the regular expressions. The
1 usual treatment of the operator is to try the first alternative
1 and, if the reservation is not possible, the second alternative.
1 The nondeterministic treatment means trying all alternatives, some
1 of them may be rejected by reservations in the subsequent insns.
1
1 * "collapse-ndfa" modifies the behavior of the generator when
1 producing an automaton. An additional state transition to collapse
1 a nondeterministic NDFA state to a deterministic DFA state is
1 generated. It can be triggered by passing 'const0_rtx' to
1 state_transition. In such an automaton, cycle advance transitions
1 are available only for these collapsed states. This option is
1 useful for ports that want to use the 'ndfa' option, but also want
1 to use 'define_query_cpu_unit' to assign units to insns issued in a
1 cycle.
1
1 * "progress" means output of a progress bar showing how many states
1 were generated so far for automaton being processed. This is
1 useful during debugging a DFA description. If you see too many
1 generated states, you could interrupt the generator of the pipeline
1 hazard recognizer and try to figure out a reason for generation of
1 the huge automaton.
1
1 As an example, consider a superscalar RISC machine which can issue
1 three insns (two integer insns and one floating point insn) on the cycle
1 but can finish only two insns. To describe this, we define the
1 following functional units.
1
1 (define_cpu_unit "i0_pipeline, i1_pipeline, f_pipeline")
1 (define_cpu_unit "port0, port1")
1
1 All simple integer insns can be executed in any integer pipeline and
1 their result is ready in two cycles. The simple integer insns are
1 issued into the first pipeline unless it is reserved, otherwise they are
1 issued into the second pipeline. Integer division and multiplication
1 insns can be executed only in the second integer pipeline and their
1 results are ready correspondingly in 9 and 4 cycles. The integer
1 division is not pipelined, i.e. the subsequent integer division insn can
1 not be issued until the current division insn finished. Floating point
1 insns are fully pipelined and their results are ready in 3 cycles.
1 Where the result of a floating point insn is used by an integer insn, an
1 additional delay of one cycle is incurred. To describe all of this we
1 could specify
1
1 (define_cpu_unit "div")
1
1 (define_insn_reservation "simple" 2 (eq_attr "type" "int")
1 "(i0_pipeline | i1_pipeline), (port0 | port1)")
1
1 (define_insn_reservation "mult" 4 (eq_attr "type" "mult")
1 "i1_pipeline, nothing*2, (port0 | port1)")
1
1 (define_insn_reservation "div" 9 (eq_attr "type" "div")
1 "i1_pipeline, div*7, div + (port0 | port1)")
1
1 (define_insn_reservation "float" 3 (eq_attr "type" "float")
1 "f_pipeline, nothing, (port0 | port1))
1
1 (define_bypass 4 "float" "simple,mult,div")
1
1 To simplify the description we could describe the following reservation
1
1 (define_reservation "finish" "port0|port1")
1
1 and use it in all 'define_insn_reservation' as in the following
1 construction
1
1 (define_insn_reservation "simple" 2 (eq_attr "type" "int")
1 "(i0_pipeline | i1_pipeline), finish")
1
1 ---------- Footnotes ----------
1
1 (1) However, the size of the automaton depends on processor
1 complexity. To limit this effect, machine descriptions can split
1 orthogonal parts of the machine description among several automata: but
1 then, since each of these must be stepped independently, this does cause
1 a small decrease in the algorithm's performance.
1