gccint: Processor pipeline description

1 
1 17.19.9 Specifying processor pipeline description
1 -------------------------------------------------
1 
1 To achieve better performance, most modern processors (super-pipelined,
1 superscalar RISC, and VLIW processors) have many "functional units" on
1 which several instructions can be executed simultaneously.  An
1 instruction starts execution if its issue conditions are satisfied.  If
1 not, the instruction is stalled until its conditions are satisfied.
1 Such "interlock (pipeline) delay" causes interruption of the fetching of
1 successor instructions (or demands nop instructions, e.g. for some MIPS
1 processors).
1 
1  There are two major kinds of interlock delays in modern processors.
1 The first one is a data dependence delay determining "instruction
1 latency time".  The instruction execution is not started until all
1 source data have been evaluated by prior instructions (there are more
1 complex cases when the instruction execution starts even when the data
1 are not available but will be ready in given time after the instruction
1 execution start).  Taking the data dependence delays into account is
1 simple.  The data dependence (true, output, and anti-dependence) delay
1 between two instructions is given by a constant.  In most cases this
1 approach is adequate.  The second kind of interlock delays is a
1 reservation delay.  The reservation delay means that two instructions
1 under execution will be in need of shared processors resources, i.e.
1 buses, internal registers, and/or functional units, which are reserved
1 for some time.  Taking this kind of delay into account is complex
1 especially for modern RISC processors.
1 
1  The task of exploiting more processor parallelism is solved by an
1 instruction scheduler.  For a better solution to this problem, the
1 instruction scheduler has to have an adequate description of the
1 processor parallelism (or "pipeline description").  GCC machine
1 descriptions describe processor parallelism and functional unit
1 reservations for groups of instructions with the aid of "regular
1 expressions".
1 
1  The GCC instruction scheduler uses a "pipeline hazard recognizer" to
1 figure out the possibility of the instruction issue by the processor on
1 a given simulated processor cycle.  The pipeline hazard recognizer is
1 automatically generated from the processor pipeline description.  The
1 pipeline hazard recognizer generated from the machine description is
1 based on a deterministic finite state automaton (DFA): the instruction
1 issue is possible if there is a transition from one automaton state to
1 another one.  This algorithm is very fast, and furthermore, its speed is
1 not dependent on processor complexity(1).
1 
1  The rest of this section describes the directives that constitute an
1 automaton-based processor pipeline description.  The order of these
1 constructions within the machine description file is not important.
1 
1  The following optional construction describes names of automata
1 generated and used for the pipeline hazards recognition.  Sometimes the
1 generated finite state automaton used by the pipeline hazard recognizer
1 is large.  If we use more than one automaton and bind functional units
1 to the automata, the total size of the automata is usually less than the
1 size of the single automaton.  If there is no one such construction,
1 only one finite state automaton is generated.
1 
1      (define_automaton AUTOMATA-NAMES)
1 
1  AUTOMATA-NAMES is a string giving names of the automata.  The names are
1 separated by commas.  All the automata should have unique names.  The
1 automaton name is used in the constructions 'define_cpu_unit' and
1 'define_query_cpu_unit'.
1 
1  Each processor functional unit used in the description of instruction
1 reservations should be described by the following construction.
1 
1      (define_cpu_unit UNIT-NAMES [AUTOMATON-NAME])
1 
1  UNIT-NAMES is a string giving the names of the functional units
1 separated by commas.  Don't use name 'nothing', it is reserved for other
1 goals.
1 
1  AUTOMATON-NAME is a string giving the name of the automaton with which
1 the unit is bound.  The automaton should be described in construction
1 'define_automaton'.  You should give "automaton-name", if there is a
1 defined automaton.
1 
1  The assignment of units to automata are constrained by the uses of the
1 units in insn reservations.  The most important constraint is: if a unit
1 reservation is present on a particular cycle of an alternative for an
1 insn reservation, then some unit from the same automaton must be present
1 on the same cycle for the other alternatives of the insn reservation.
1 The rest of the constraints are mentioned in the description of the
1 subsequent constructions.
1 
1  The following construction describes CPU functional units analogously
1 to 'define_cpu_unit'.  The reservation of such units can be queried for
1 an automaton state.  The instruction scheduler never queries reservation
1 of functional units for given automaton state.  So as a rule, you don't
1 need this construction.  This construction could be used for future code
1 generation goals (e.g. to generate VLIW insn templates).
1 
1      (define_query_cpu_unit UNIT-NAMES [AUTOMATON-NAME])
1 
1  UNIT-NAMES is a string giving names of the functional units separated
1 by commas.
1 
1  AUTOMATON-NAME is a string giving the name of the automaton with which
1 the unit is bound.
1 
1  The following construction is the major one to describe pipeline
1 characteristics of an instruction.
1 
1      (define_insn_reservation INSN-NAME DEFAULT_LATENCY
1                               CONDITION REGEXP)
1 
1  DEFAULT_LATENCY is a number giving latency time of the instruction.
1 There is an important difference between the old description and the
1 automaton based pipeline description.  The latency time is used for all
1 dependencies when we use the old description.  In the automaton based
1 pipeline description, the given latency time is only used for true
1 dependencies.  The cost of anti-dependencies is always zero and the cost
1 of output dependencies is the difference between latency times of the
1 producing and consuming insns (if the difference is negative, the cost
1 is considered to be zero).  You can always change the default costs for
1 any description by using the target hook 'TARGET_SCHED_ADJUST_COST'
1 (⇒Scheduling).
1 
1  INSN-NAME is a string giving the internal name of the insn.  The
1 internal names are used in constructions 'define_bypass' and in the
1 automaton description file generated for debugging.  The internal name
1 has nothing in common with the names in 'define_insn'.  It is a good
1 practice to use insn classes described in the processor manual.
1 
1  CONDITION defines what RTL insns are described by this construction.
1 You should remember that you will be in trouble if CONDITION for two or
1 more different 'define_insn_reservation' constructions is TRUE for an
1 insn.  In this case what reservation will be used for the insn is not
1 defined.  Such cases are not checked during generation of the pipeline
1 hazards recognizer because in general recognizing that two conditions
1 may have the same value is quite difficult (especially if the conditions
1 contain 'symbol_ref').  It is also not checked during the pipeline
1 hazard recognizer work because it would slow down the recognizer
1 considerably.
1 
1  REGEXP is a string describing the reservation of the cpu's functional
1 units by the instruction.  The reservations are described by a regular
1 expression according to the following syntax:
1 
1             regexp = regexp "," oneof
1                    | oneof
1 
1             oneof = oneof "|" allof
1                   | allof
1 
1             allof = allof "+" repeat
1                   | repeat
1 
1             repeat = element "*" number
1                    | element
1 
1             element = cpu_function_unit_name
1                     | reservation_name
1                     | result_name
1                     | "nothing"
1                     | "(" regexp ")"
1 
1    * ',' is used for describing the start of the next cycle in the
1      reservation.
1 
1    * '|' is used for describing a reservation described by the first
1      regular expression *or* a reservation described by the second
1      regular expression *or* etc.
1 
1    * '+' is used for describing a reservation described by the first
1      regular expression *and* a reservation described by the second
1      regular expression *and* etc.
1 
1    * '*' is used for convenience and simply means a sequence in which
1      the regular expression are repeated NUMBER times with cycle
1      advancing (see ',').
1 
1    * 'cpu_function_unit_name' denotes reservation of the named
1      functional unit.
1 
1    * 'reservation_name' -- see description of construction
1      'define_reservation'.
1 
1    * 'nothing' denotes no unit reservations.
1 
1  Sometimes unit reservations for different insns contain common parts.
1 In such case, you can simplify the pipeline description by describing
1 the common part by the following construction
1 
1      (define_reservation RESERVATION-NAME REGEXP)
1 
1  RESERVATION-NAME is a string giving name of REGEXP.  Functional unit
1 names and reservation names are in the same name space.  So the
1 reservation names should be different from the functional unit names and
1 can not be the reserved name 'nothing'.
1 
1  The following construction is used to describe exceptions in the
1 latency time for given instruction pair.  This is so called bypasses.
1 
1      (define_bypass NUMBER OUT_INSN_NAMES IN_INSN_NAMES
1                     [GUARD])
1 
1  NUMBER defines when the result generated by the instructions given in
1 string OUT_INSN_NAMES will be ready for the instructions given in string
1 IN_INSN_NAMES.  Each of these strings is a comma-separated list of
1 filename-style globs and they refer to the names of
1 'define_insn_reservation's.  For example:
1      (define_bypass 1 "cpu1_load_*, cpu1_store_*" "cpu1_load_*")
1  defines a bypass between instructions that start with 'cpu1_load_' or
1 'cpu1_store_' and those that start with 'cpu1_load_'.
1 
1  GUARD is an optional string giving the name of a C function which
1 defines an additional guard for the bypass.  The function will get the
1 two insns as parameters.  If the function returns zero the bypass will
1 be ignored for this case.  The additional guard is necessary to
1 recognize complicated bypasses, e.g. when the consumer is only an
1 address of insn 'store' (not a stored value).
1 
1  If there are more one bypass with the same output and input insns, the
1 chosen bypass is the first bypass with a guard in description whose
1 guard function returns nonzero.  If there is no such bypass, then bypass
1 without the guard function is chosen.
1 
1  The following five constructions are usually used to describe VLIW
1 processors, or more precisely, to describe a placement of small
1 instructions into VLIW instruction slots.  They can be used for RISC
1 processors, too.
1 
1      (exclusion_set UNIT-NAMES UNIT-NAMES)
1      (presence_set UNIT-NAMES PATTERNS)
1      (final_presence_set UNIT-NAMES PATTERNS)
1      (absence_set UNIT-NAMES PATTERNS)
1      (final_absence_set UNIT-NAMES PATTERNS)
1 
1  UNIT-NAMES is a string giving names of functional units separated by
1 commas.
1 
1  PATTERNS is a string giving patterns of functional units separated by
1 comma.  Currently pattern is one unit or units separated by
1 white-spaces.
1 
1  The first construction ('exclusion_set') means that each functional
1 unit in the first string can not be reserved simultaneously with a unit
1 whose name is in the second string and vice versa.  For example, the
1 construction is useful for describing processors (e.g. some SPARC
1 processors) with a fully pipelined floating point functional unit which
1 can execute simultaneously only single floating point insns or only
1 double floating point insns.
1 
1  The second construction ('presence_set') means that each functional
1 unit in the first string can not be reserved unless at least one of
1 pattern of units whose names are in the second string is reserved.  This
1 is an asymmetric relation.  For example, it is useful for description
1 that VLIW 'slot1' is reserved after 'slot0' reservation.  We could
1 describe it by the following construction
1 
1      (presence_set "slot1" "slot0")
1 
1  Or 'slot1' is reserved only after 'slot0' and unit 'b0' reservation.
1 In this case we could write
1 
1      (presence_set "slot1" "slot0 b0")
1 
1  The third construction ('final_presence_set') is analogous to
1 'presence_set'.  The difference between them is when checking is done.
1 When an instruction is issued in given automaton state reflecting all
1 current and planned unit reservations, the automaton state is changed.
1 The first state is a source state, the second one is a result state.
1 Checking for 'presence_set' is done on the source state reservation,
1 checking for 'final_presence_set' is done on the result reservation.
1 This construction is useful to describe a reservation which is actually
1 two subsequent reservations.  For example, if we use
1 
1      (presence_set "slot1" "slot0")
1 
1  the following insn will be never issued (because 'slot1' requires
1 'slot0' which is absent in the source state).
1 
1      (define_reservation "insn_and_nop" "slot0 + slot1")
1 
1  but it can be issued if we use analogous 'final_presence_set'.
1 
1  The forth construction ('absence_set') means that each functional unit
1 in the first string can be reserved only if each pattern of units whose
1 names are in the second string is not reserved.  This is an asymmetric
1 relation (actually 'exclusion_set' is analogous to this one but it is
1 symmetric).  For example it might be useful in a VLIW description to say
1 that 'slot0' cannot be reserved after either 'slot1' or 'slot2' have
1 been reserved.  This can be described as:
1 
1      (absence_set "slot0" "slot1, slot2")
1 
1  Or 'slot2' can not be reserved if 'slot0' and unit 'b0' are reserved or
1 'slot1' and unit 'b1' are reserved.  In this case we could write
1 
1      (absence_set "slot2" "slot0 b0, slot1 b1")
1 
1  All functional units mentioned in a set should belong to the same
1 automaton.
1 
1  The last construction ('final_absence_set') is analogous to
1 'absence_set' but checking is done on the result (state) reservation.
1 See comments for 'final_presence_set'.
1 
1  You can control the generator of the pipeline hazard recognizer with
1 the following construction.
1 
1      (automata_option OPTIONS)
1 
1  OPTIONS is a string giving options which affect the generated code.
1 Currently there are the following options:
1 
1    * "no-minimization" makes no minimization of the automaton.  This is
1      only worth to do when we are debugging the description and need to
1      look more accurately at reservations of states.
1 
1    * "time" means printing time statistics about the generation of
1      automata.
1 
1    * "stats" means printing statistics about the generated automata such
1      as the number of DFA states, NDFA states and arcs.
1 
1    * "v" means a generation of the file describing the result automata.
1      The file has suffix '.dfa' and can be used for the description
1      verification and debugging.
1 
1    * "w" means a generation of warning instead of error for non-critical
1      errors.
1 
1    * "no-comb-vect" prevents the automaton generator from generating two
1      data structures and comparing them for space efficiency.  Using a
1      comb vector to represent transitions may be better, but it can be
1      very expensive to construct.  This option is useful if the build
1      process spends an unacceptably long time in genautomata.
1 
1    * "ndfa" makes nondeterministic finite state automata.  This affects
1      the treatment of operator '|' in the regular expressions.  The
1      usual treatment of the operator is to try the first alternative
1      and, if the reservation is not possible, the second alternative.
1      The nondeterministic treatment means trying all alternatives, some
1      of them may be rejected by reservations in the subsequent insns.
1 
1    * "collapse-ndfa" modifies the behavior of the generator when
1      producing an automaton.  An additional state transition to collapse
1      a nondeterministic NDFA state to a deterministic DFA state is
1      generated.  It can be triggered by passing 'const0_rtx' to
1      state_transition.  In such an automaton, cycle advance transitions
1      are available only for these collapsed states.  This option is
1      useful for ports that want to use the 'ndfa' option, but also want
1      to use 'define_query_cpu_unit' to assign units to insns issued in a
1      cycle.
1 
1    * "progress" means output of a progress bar showing how many states
1      were generated so far for automaton being processed.  This is
1      useful during debugging a DFA description.  If you see too many
1      generated states, you could interrupt the generator of the pipeline
1      hazard recognizer and try to figure out a reason for generation of
1      the huge automaton.
1 
1  As an example, consider a superscalar RISC machine which can issue
1 three insns (two integer insns and one floating point insn) on the cycle
1 but can finish only two insns.  To describe this, we define the
1 following functional units.
1 
1      (define_cpu_unit "i0_pipeline, i1_pipeline, f_pipeline")
1      (define_cpu_unit "port0, port1")
1 
1  All simple integer insns can be executed in any integer pipeline and
1 their result is ready in two cycles.  The simple integer insns are
1 issued into the first pipeline unless it is reserved, otherwise they are
1 issued into the second pipeline.  Integer division and multiplication
1 insns can be executed only in the second integer pipeline and their
1 results are ready correspondingly in 9 and 4 cycles.  The integer
1 division is not pipelined, i.e. the subsequent integer division insn can
1 not be issued until the current division insn finished.  Floating point
1 insns are fully pipelined and their results are ready in 3 cycles.
1 Where the result of a floating point insn is used by an integer insn, an
1 additional delay of one cycle is incurred.  To describe all of this we
1 could specify
1 
1      (define_cpu_unit "div")
1 
1      (define_insn_reservation "simple" 2 (eq_attr "type" "int")
1                               "(i0_pipeline | i1_pipeline), (port0 | port1)")
1 
1      (define_insn_reservation "mult" 4 (eq_attr "type" "mult")
1                               "i1_pipeline, nothing*2, (port0 | port1)")
1 
1      (define_insn_reservation "div" 9 (eq_attr "type" "div")
1                               "i1_pipeline, div*7, div + (port0 | port1)")
1 
1      (define_insn_reservation "float" 3 (eq_attr "type" "float")
1                               "f_pipeline, nothing, (port0 | port1))
1 
1      (define_bypass 4 "float" "simple,mult,div")
1 
1  To simplify the description we could describe the following reservation
1 
1      (define_reservation "finish" "port0|port1")
1 
1  and use it in all 'define_insn_reservation' as in the following
1 construction
1 
1      (define_insn_reservation "simple" 2 (eq_attr "type" "int")
1                               "(i0_pipeline | i1_pipeline), finish")
1 
1    ---------- Footnotes ----------
1 
1    (1) However, the size of the automaton depends on processor
1 complexity.  To limit this effect, machine descriptions can split
1 orthogonal parts of the machine description among several automata: but
1 then, since each of these must be stepped independently, this does cause
1 a small decrease in the algorithm's performance.
1