gccint: Costs
1
1 18.16 Describing Relative Costs of Operations
1 =============================================
1
1 These macros let you describe the relative speed of various operations
1 on the target machine.
1
1 -- Macro: REGISTER_MOVE_COST (MODE, FROM, TO)
1 A C expression for the cost of moving data of mode MODE from a
1 register in class FROM to one in class TO. The classes are
1 expressed using the enumeration values such as 'GENERAL_REGS'. A
1 value of 2 is the default; other values are interpreted relative to
1 that.
1
1 It is not required that the cost always equal 2 when FROM is the
1 same as TO; on some machines it is expensive to move between
1 registers if they are not general registers.
1
1 If reload sees an insn consisting of a single 'set' between two
1 hard registers, and if 'REGISTER_MOVE_COST' applied to their
1 classes returns a value of 2, reload does not check to ensure that
1 the constraints of the insn are met. Setting a cost of other than
1 2 will allow reload to verify that the constraints are met. You
1 should do this if the 'movM' pattern's constraints do not allow
1 such copying.
1
1 This macro is obsolete; new ports should use the target hook
1 'TARGET_REGISTER_MOVE_COST' instead.
1
1 -- Target Hook: int TARGET_REGISTER_MOVE_COST (machine_mode MODE,
1 reg_class_t FROM, reg_class_t TO)
1 This target hook should return the cost of moving data of mode MODE
1 from a register in class FROM to one in class TO. The classes are
1 expressed using the enumeration values such as 'GENERAL_REGS'. A
1 value of 2 is the default; other values are interpreted relative to
1 that.
1
1 It is not required that the cost always equal 2 when FROM is the
1 same as TO; on some machines it is expensive to move between
1 registers if they are not general registers.
1
1 If reload sees an insn consisting of a single 'set' between two
1 hard registers, and if 'TARGET_REGISTER_MOVE_COST' applied to their
1 classes returns a value of 2, reload does not check to ensure that
1 the constraints of the insn are met. Setting a cost of other than
1 2 will allow reload to verify that the constraints are met. You
1 should do this if the 'movM' pattern's constraints do not allow
1 such copying.
1
1 The default version of this function returns 2.
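As a minimal sketch of how such a hook might be structured, the following stand-alone C fragment models it with simplified stand-in types. The enumeration 'toy_reg_class', the function 'toy_register_move_cost' and all cost values are assumptions made for illustration only; a real port would use 'machine_mode' and 'reg_class_t' from GCC's own headers and install its function as 'TARGET_REGISTER_MOVE_COST'.

```c
/* Hypothetical register classes for an imaginary target; a real port
   defines its own 'enum reg_class' in its target header.  */
enum toy_reg_class { TOY_NO_REGS, TOY_GENERAL_REGS, TOY_FLOAT_REGS };

/* Sketch of a register-move cost function.  The mode argument is
   reduced to a size in bytes to keep the example self-contained.  */
static int
toy_register_move_cost (int mode_size, enum toy_reg_class from,
                        enum toy_reg_class to)
{
  (void) mode_size;  /* Unused in this simplified model.  */

  /* Moves between general registers keep the default cost of 2.  */
  if (from == TOY_GENERAL_REGS && to == TOY_GENERAL_REGS)
    return 2;

  /* Assumed: float-to-float moves are somewhat more expensive, which
     illustrates that the cost need not be 2 when FROM equals TO.  */
  if (from == TOY_FLOAT_REGS && to == TOY_FLOAT_REGS)
    return 4;

  /* Assumed: every other combination crosses register files and is
     the most expensive case.  */
  return 8;
}
```

Because the float-to-float cost in this sketch differs from 2, reload would verify the constraints of such a move, as described above.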
1
1 -- Macro: MEMORY_MOVE_COST (MODE, CLASS, IN)
1 A C expression for the cost of moving data of mode MODE between a
1 register of class CLASS and memory; IN is zero if the value is to
1 be written to memory, nonzero if it is to be read in. This cost is
1 relative to those in 'REGISTER_MOVE_COST'. If moving between
1 registers and memory is more expensive than between two registers,
1 you should define this macro to express the relative cost.
1
1 If you do not define this macro, GCC uses a default cost of 4 plus
1 the cost of copying via a secondary reload register, if one is
1 needed. If your machine requires a secondary reload register to
1 copy between memory and a register of CLASS but the reload
1 mechanism is more complex than copying via an intermediate, define
1 this macro to reflect the actual cost of the move.
1
1 GCC defines the function 'memory_move_secondary_cost' if secondary
1 reloads are needed. It computes the costs due to copying via a
1 secondary register. If your machine copies from memory using a
1 secondary register in the conventional way but the default base
1 value of 4 is not correct for your machine, define this macro to
1 add some other value to the result of that function. The arguments
1 to that function are the same as to this macro.
1
1 This macro is obsolete; new ports should use the target hook
1 'TARGET_MEMORY_MOVE_COST' instead.
1
1 -- Target Hook: int TARGET_MEMORY_MOVE_COST (machine_mode MODE,
1 reg_class_t RCLASS, bool IN)
1 This target hook should return the cost of moving data of mode MODE
1 between a register of class RCLASS and memory; IN is 'false' if the
1 value is to be written to memory, 'true' if it is to be read in.
1 This cost is relative to those in 'TARGET_REGISTER_MOVE_COST'. If
1 moving between registers and memory is more expensive than between
1 two registers, you should add this target hook to express the
1 relative cost.
1
1 If you do not add this target hook, GCC uses a default cost of 4
1 plus the cost of copying via a secondary reload register, if one is
1 needed. If your machine requires a secondary reload register to
1 copy between memory and a register of RCLASS but the reload
1 mechanism is more complex than copying via an intermediate, use
1 this target hook to reflect the actual cost of the move.
1
1 GCC defines the function 'memory_move_secondary_cost' if secondary
1 reloads are needed. It computes the costs due to copying via a
1 secondary register. If your machine copies from memory using a
1 secondary register in the conventional way but the default base
1 value of 4 is not correct for your machine, use this target hook to
1 add some other value to the result of that function. The arguments
1 to that function are the same as to this target hook.
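The "base value plus secondary cost" arrangement described above can be sketched as follows. This is a stand-alone model, not GCC code: 'toy_secondary_cost' is a hypothetical stand-in for 'memory_move_secondary_cost', and the classes and cost values are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical register classes for an imaginary target.  */
enum toy_mem_class { TOY_MEM_GENERAL_REGS, TOY_MEM_FLOAT_REGS };

/* Stand-in for GCC's memory_move_secondary_cost.  In this model only
   TOY_MEM_FLOAT_REGS is assumed to need an intermediate register,
   which adds a cost of 2 in either direction.  */
static int
toy_secondary_cost (enum toy_mem_class rclass, bool in)
{
  (void) in;
  return rclass == TOY_MEM_FLOAT_REGS ? 2 : 0;
}

/* Sketch of a memory-move cost function: a target-specific base
   value replaces the default of 4, and the secondary-reload cost is
   added on top.  */
static int
toy_memory_move_cost (enum toy_mem_class rclass, bool in)
{
  /* Assumed base costs: loads cost 6, stores cost 5.  */
  int base = in ? 6 : 5;
  return base + toy_secondary_cost (rclass, in);
}
```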
1
1 -- Macro: BRANCH_COST (SPEED_P, PREDICTABLE_P)
1 A C expression for the cost of a branch instruction. A value of 1
1 is the default; other values are interpreted relative to that.
1 Parameter SPEED_P is true when the branch in question should be
1 optimized for speed. When it is false, 'BRANCH_COST' should return
1 a value optimal for code size rather than performance.
1 PREDICTABLE_P is true for well-predicted branches. On many
1 architectures the 'BRANCH_COST' can be reduced then.
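A definition that takes both parameters into account might look like the sketch below. The macro name 'TOY_BRANCH_COST' and the values 1, 2 and 4 are assumptions chosen for illustration; they do not come from any particular port.

```c
/* Hypothetical branch cost for an imaginary target:
   - when optimizing for size, use the default value of 1;
   - when optimizing for speed, a well-predicted branch costs 2;
   - an unpredictable branch costs 4.  */
#define TOY_BRANCH_COST(speed_p, predictable_p) \
  (!(speed_p) ? 1 : (predictable_p) ? 2 : 4)
```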
1
1 Here are additional macros which do not specify precise relative costs,
1 but only that certain actions are more expensive than GCC would
1 ordinarily expect.
1
1 -- Macro: SLOW_BYTE_ACCESS
1 Define this macro as a C expression which is nonzero if accessing
1 less than a word of memory (i.e. a 'char' or a 'short') is no
1 faster than accessing a word of memory, i.e., if such accesses
1 require more than one instruction or if there is no difference in
1 cost between byte and (aligned) word loads.
1
1 When this macro is not defined, the compiler will access a field by
1 finding the smallest containing object; when it is defined, a
1 fullword load will be used if alignment permits. Unless byte
1 accesses are faster than word accesses, using word accesses is
1 preferable since it may eliminate subsequent memory access if
1 subsequent accesses occur to other fields in the same word of the
1 structure, but to different bytes.
1
1 -- Target Hook: bool TARGET_SLOW_UNALIGNED_ACCESS (machine_mode MODE,
1 unsigned int ALIGN)
1 This hook returns true if memory accesses described by the MODE and
1 ALIGN parameters have a cost many times greater than aligned
1 accesses, for example if they are emulated in a trap handler. This
1 hook is invoked only for unaligned accesses, i.e. when 'ALIGN
1 < GET_MODE_ALIGNMENT (MODE)'.
1
1 When this hook returns true, the compiler will act as if
1 'STRICT_ALIGNMENT' were true when generating code for block moves.
1 This can cause significantly more instructions to be produced.
1 Therefore, do not make this hook return true if unaligned accesses
1 only add a cycle or two to the time for a memory access.
1
1 The hook must return true whenever 'STRICT_ALIGNMENT' is true. The
1 default implementation returns 'STRICT_ALIGNMENT'.
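The rules above can be modelled with a small stand-alone sketch. The names 'TOY_STRICT_ALIGNMENT' and 'toy_slow_unaligned_access' are hypothetical, the mode is reduced to its natural alignment in bits, and the 64-bit threshold is an invented property of an imaginary target.

```c
#include <stdbool.h>

/* Stand-in for STRICT_ALIGNMENT on an imaginary target that allows
   most unaligned accesses.  */
#define TOY_STRICT_ALIGNMENT 0

/* MODE_ALIGN models GET_MODE_ALIGNMENT (MODE) in bits; ALIGN is the
   known alignment of the access in bits.  */
static bool
toy_slow_unaligned_access (unsigned int mode_align, unsigned int align)
{
  /* The hook must return true whenever strict alignment is in force.  */
  if (TOY_STRICT_ALIGNMENT)
    return true;

  /* Assumed: only modes whose natural alignment exceeds 64 bits are
     emulated in a trap handler when misaligned; narrower accesses
     cost a cycle or two extra and are therefore not reported as slow.  */
  return mode_align > 64 && align < mode_align;
}
```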
1
1 -- Macro: MOVE_RATIO (SPEED)
1 The threshold of number of scalar memory-to-memory move insns,
1 _below_ which a sequence of insns should be generated instead of a
1 string move insn or a library call. Increasing the value will
1 always make code faster, but eventually incurs high cost in
1 increased code size.
1
1 Note that on machines where the corresponding move insn is a
1 'define_expand' that emits a sequence of insns, this macro counts
1 the number of such sequences.
1
1 The parameter SPEED is true if the code is currently being
1 optimized for speed rather than size.
1
1 If you don't define this, a reasonable default is used.
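To make the threshold concrete, the sketch below counts how many scalar moves a copy would need and compares that count with a ratio. It is a simplified model under stated assumptions: 'TOY_MOVE_RATIO', 'toy_count_moves' and 'toy_move_inline_p' are hypothetical names, the ratio values are invented, the piece size must be a power of two, and alignment is assumed never to restrict the piece size.

```c
#include <stdbool.h>

/* Hypothetical ratio: generous when optimizing for speed, small when
   optimizing for size.  */
#define TOY_MOVE_RATIO(speed) ((speed) ? 15 : 3)

/* Count the moves needed to copy SIZE bytes using pieces of at most
   MAX_PIECE bytes (a power of two), largest pieces first.  */
static unsigned int
toy_count_moves (unsigned int size, unsigned int max_piece)
{
  unsigned int n = 0;
  for (unsigned int piece = max_piece; piece > 0; piece /= 2)
    {
      n += size / piece;
      size %= piece;
    }
  return n;
}

/* A sequence of moves is used only when the count is strictly below
   the ratio; otherwise a string move insn or a library call would be
   chosen instead.  */
static bool
toy_move_inline_p (unsigned int size, unsigned int max_piece, bool speed)
{
  return toy_count_moves (size, max_piece) < TOY_MOVE_RATIO (speed);
}
```

In this model a 7-byte copy with 4-byte pieces needs three moves (4 + 2 + 1), so it is expanded inline when optimizing for speed but not when optimizing for size.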
1
1 -- Target Hook: bool TARGET_USE_BY_PIECES_INFRASTRUCTURE_P (unsigned
1 HOST_WIDE_INT SIZE, unsigned int ALIGNMENT, enum
1 by_pieces_operation OP, bool SPEED_P)
1 GCC will attempt several strategies when asked to copy between two
1 areas of memory, or to set, clear or store to memory, for example
1 when copying a 'struct'. The 'by_pieces' infrastructure implements
1 such memory operations as a sequence of load, store or move insns.
1 Alternate strategies are to expand the 'movmem' or 'setmem' optabs,
1 to emit a library call, or to emit unit-by-unit, loop-based
1 operations.
1
1 This target hook should return true if, for a memory operation with
1 a given SIZE and ALIGNMENT, using the 'by_pieces' infrastructure is
1 expected to result in better code generation. Both SIZE and
1 ALIGNMENT are measured in terms of storage units.
1
1 The parameter OP is one of: 'CLEAR_BY_PIECES', 'MOVE_BY_PIECES',
1 'SET_BY_PIECES', 'STORE_BY_PIECES' or 'COMPARE_BY_PIECES'. These
1 describe the type of memory operation under consideration.
1
1 The parameter SPEED_P is true if the code is currently being
1 optimized for speed rather than size.
1
1 Returning true for higher values of SIZE can improve code
1 generation for speed if the target does not provide an
1 implementation of the 'movmem' or 'setmem' standard names, if the
1 'movmem' or 'setmem' implementation would be more expensive than a
1 sequence of insns, or if the overhead of a library call would
1 dominate that of the body of the memory operation.
1
1 Returning true for higher values of SIZE may also cause an
1 increase in code size, for example where the number of insns
1 emitted to perform a move would be greater than that of a library
1 call.
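One possible shape for such a hook is sketched below. Everything in it is an assumption for illustration: the enumeration 'toy_by_pieces_op' stands in for 'enum by_pieces_operation', and the size limits are invented policy for an imaginary target rather than GCC's default behavior.

```c
#include <stdbool.h>

/* Stand-in for 'enum by_pieces_operation'.  */
enum toy_by_pieces_op
{
  TOY_CLEAR_BY_PIECES,
  TOY_MOVE_BY_PIECES,
  TOY_SET_BY_PIECES,
  TOY_STORE_BY_PIECES,
  TOY_COMPARE_BY_PIECES
};

/* SIZE and ALIGNMENT are measured in storage units.  */
static bool
toy_use_by_pieces_p (unsigned long size, unsigned int alignment,
                     enum toy_by_pieces_op op, bool speed_p)
{
  /* Assumed: when optimizing for size, only very small operations are
     cheaper than a library call.  */
  if (!speed_p)
    return size <= 8;

  /* Assumed: comparisons need a load from each operand plus a branch
     per piece, so the limit is lower.  */
  if (op == TOY_COMPARE_BY_PIECES)
    return size <= 16;

  /* Assumed: word-aligned data can use wide pieces, so larger
     operations still expand to few insns.  */
  return size <= (alignment >= 4 ? 64ul : 32ul);
}
```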
1
1 -- Target Hook: int TARGET_COMPARE_BY_PIECES_BRANCH_RATIO (machine_mode
1 MODE)
1 When expanding a block comparison in MODE, GCC can try to reduce
1 the number of branches at the expense of more memory operations.
1 This hook allows the target to override the default choice. It
1 should return the factor by which branches should be reduced over
1 the plain expansion with one comparison per MODE-sized piece. A
1 port can also prevent a particular mode from being used for block
1 comparisons by returning a negative number from this hook.
1
1 -- Macro: MOVE_MAX_PIECES
1 A C expression used by 'move_by_pieces' to determine the largest
1 unit that a single load or store may use when copying memory.
1 Defaults to 'MOVE_MAX'.
1
1 -- Macro: STORE_MAX_PIECES
1 A C expression used by 'store_by_pieces' to determine the largest
1 unit that a single store may use when writing to memory. Defaults
1 to 'MOVE_MAX_PIECES', or two times the size of 'HOST_WIDE_INT',
1 whichever is smaller.
1
1 -- Macro: COMPARE_MAX_PIECES
1 A C expression used by 'compare_by_pieces' to determine the largest
1 unit that a single memory access may use when comparing memory.
1 Defaults to 'MOVE_MAX_PIECES'.
1
1 -- Macro: CLEAR_RATIO (SPEED)
1 The threshold of number of scalar move insns, _below_ which a
1 sequence of insns should be generated to clear memory instead of a
1 string clear insn or a library call. Increasing the value will
1 always make code faster, but eventually incurs high cost in
1 increased code size.
1
1 The parameter SPEED is true if the code is currently being
1 optimized for speed rather than size.
1
1 If you don't define this, a reasonable default is used.
1
1 -- Macro: SET_RATIO (SPEED)
1 The threshold of number of scalar move insns, _below_ which a
1 sequence of insns should be generated to set memory to a constant
1 value, instead of a block set insn or a library call. Increasing
1 the value will always make code faster, but eventually incurs high
1 cost in increased code size.
1
1 The parameter SPEED is true if the code is currently being
1 optimized for speed rather than size.
1
1 If you don't define this, it defaults to the value of 'MOVE_RATIO'.
1
1 -- Macro: USE_LOAD_POST_INCREMENT (MODE)
1 A C expression used to determine whether a load postincrement is a
1 good thing to use for a given mode. Defaults to the value of
1 'HAVE_POST_INCREMENT'.
1
1 -- Macro: USE_LOAD_POST_DECREMENT (MODE)
1 A C expression used to determine whether a load postdecrement is a
1 good thing to use for a given mode. Defaults to the value of
1 'HAVE_POST_DECREMENT'.
1
1 -- Macro: USE_LOAD_PRE_INCREMENT (MODE)
1 A C expression used to determine whether a load preincrement is a
1 good thing to use for a given mode. Defaults to the value of
1 'HAVE_PRE_INCREMENT'.
1
1 -- Macro: USE_LOAD_PRE_DECREMENT (MODE)
1 A C expression used to determine whether a load predecrement is a
1 good thing to use for a given mode. Defaults to the value of
1 'HAVE_PRE_DECREMENT'.
1
1 -- Macro: USE_STORE_POST_INCREMENT (MODE)
1 A C expression used to determine whether a store postincrement is a
1 good thing to use for a given mode. Defaults to the value of
1 'HAVE_POST_INCREMENT'.
1
1 -- Macro: USE_STORE_POST_DECREMENT (MODE)
1 A C expression used to determine whether a store postdecrement is a
1 good thing to use for a given mode. Defaults to the value of
1 'HAVE_POST_DECREMENT'.
1
1 -- Macro: USE_STORE_PRE_INCREMENT (MODE)
1 A C expression used to determine whether a store preincrement is a
1 good thing to use for a given mode. Defaults to the value of
1 'HAVE_PRE_INCREMENT'.
1
1 -- Macro: USE_STORE_PRE_DECREMENT (MODE)
1 A C expression used to determine whether a store predecrement is a
1 good thing to use for a given mode. Defaults to the value of
1 'HAVE_PRE_DECREMENT'.
1
1 -- Macro: NO_FUNCTION_CSE
1 Define this macro to be true if it is as good or better to call a
1 constant function address than to call an address kept in a
1 register.
1
1 -- Macro: LOGICAL_OP_NON_SHORT_CIRCUIT
1 Define this macro if a non-short-circuit operation produced by
1 'fold_range_test ()' is optimal. This macro defaults to true if
1 'BRANCH_COST' is greater than or equal to the value 2.
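The difference between the two forms can be illustrated at the source level with a range test. This is only a sketch of the idea: the function names are hypothetical, and the code the compiler actually emits depends on the target and the optimization level.

```c
#include <stdbool.h>

/* Short-circuit form: the second comparison is evaluated only when
   the first one succeeds, which typically implies two conditional
   branches.  */
static bool
in_range_short_circuit (int x, int lo, int hi)
{
  return x >= lo && x <= hi;
}

/* Non-short-circuit form: both comparisons are always evaluated and
   their results combined with a bitwise AND, which can save a
   conditional branch when branches are expensive.  */
static bool
in_range_non_short_circuit (int x, int lo, int hi)
{
  return (x >= lo) & (x <= hi);
}
```

Both functions return the same result for every input; they differ only in how much work is done unconditionally versus how many branches are needed.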
1
1 -- Target Hook: bool TARGET_OPTAB_SUPPORTED_P (int OP, machine_mode
1 MODE1, machine_mode MODE2, optimization_type OPT_TYPE)
1 Return true if the optimizers should use optab OP with modes MODE1
1 and MODE2 for optimization type OPT_TYPE. The optab is known to
1 have an associated '.md' instruction whose C condition is true.
1 MODE2 is only meaningful for conversion optabs; for direct optabs
1 it is a copy of MODE1.
1
1 For example, when called with OP equal to 'rint_optab' and MODE1
1 equal to 'DFmode', the hook should say whether the optimizers
1 should use optab 'rintdf2'.
1
1 The default hook returns true for all inputs.
1
1 -- Target Hook: bool TARGET_RTX_COSTS (rtx X, machine_mode MODE, int
1 OUTER_CODE, int OPNO, int *TOTAL, bool SPEED)
1 This target hook describes the relative costs of RTL expressions.
1
1 The cost may depend on the precise form of the expression, which is
1 available for examination in X, and the fact that X appears as
1 operand OPNO of an expression with rtx code OUTER_CODE. That is,
1 the hook can assume that there is some rtx Y such that 'GET_CODE
1 (Y) == OUTER_CODE' and such that either (a) 'XEXP (Y, OPNO) == X'
1 or (b) 'XVEC (Y, OPNO)' contains X.
1
1 MODE is X's machine mode, or for cases like 'const_int' that do not
1 have a mode, the mode in which X is used.
1
1 In implementing this hook, you can use the construct 'COSTS_N_INSNS
1 (N)' to specify a cost equal to N fast instructions.
1
1 On entry to the hook, '*TOTAL' contains a default estimate for the
1 cost of the expression. The hook should modify this value as
1 necessary. Traditionally, the default costs are 'COSTS_N_INSNS
1 (5)' for multiplications, 'COSTS_N_INSNS (7)' for division and
1 modulus operations, and 'COSTS_N_INSNS (1)' for all other
1 operations.
1
1 When optimizing for code size, i.e. when SPEED is false, this
1 target hook should be used to estimate the relative size cost of an
1 expression, again relative to 'COSTS_N_INSNS'.
1
1 The hook returns true when all subexpressions of X have been
1 processed, and false when 'rtx_cost' should recurse.
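The protocol described above (adjust '*TOTAL', then return true or false to control recursion) can be sketched with stand-in types. The enumeration 'toy_code', the function 'toy_rtx_costs', the cost values and the immediate range are all assumptions for an imaginary target; 'TOY_COSTS_N_INSNS' is a stand-in for GCC's 'COSTS_N_INSNS', and its scale factor of 4 is likewise an assumption of this sketch.

```c
#include <stdbool.h>

/* Stand-in for COSTS_N_INSNS: the cost of N fast instructions.  */
#define TOY_COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical subset of rtx codes.  */
enum toy_code { TOY_PLUS, TOY_MULT, TOY_DIV, TOY_CONST_INT };

/* CODE models GET_CODE (X); VALUE is the constant's value when CODE
   is TOY_CONST_INT and is ignored otherwise.  */
static bool
toy_rtx_costs (enum toy_code code, long value, int *total, bool speed)
{
  switch (code)
    {
    case TOY_CONST_INT:
      /* Assumed: small immediates are free, others need one insn.  */
      *total = (value >= -128 && value <= 127) ? 0 : TOY_COSTS_N_INSNS (1);
      return true;   /* A constant has no subexpressions to cost.  */

    case TOY_MULT:
      /* Assumed: slow to execute, but a single insn in size.  */
      *total = speed ? TOY_COSTS_N_INSNS (3) : TOY_COSTS_N_INSNS (1);
      return false;  /* Let the caller recurse into the operands.  */

    case TOY_DIV:
      *total = speed ? TOY_COSTS_N_INSNS (20) : TOY_COSTS_N_INSNS (1);
      return false;

    default:
      /* Keep the default estimate already stored in *TOTAL.  */
      return false;
    }
}
```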
1
1 -- Target Hook: int TARGET_ADDRESS_COST (rtx ADDRESS, machine_mode
1 MODE, addr_space_t AS, bool SPEED)
1 This hook computes the cost of an addressing mode that contains
1 ADDRESS. If not defined, the cost is computed from the ADDRESS
1 expression and the 'TARGET_RTX_COSTS' hook.
1
1 For most CISC machines, the default cost is a good approximation of
1 the true cost of the addressing mode. However, on RISC machines,
1 all instructions normally have the same length and execution time.
1 Hence all addresses will have equal costs.
1
1 In cases where more than one form of an address is known, the form
1 with the lowest cost will be used. If multiple forms have the
1 same, lowest, cost, the one that is the most complex will be used.
1
1 For example, suppose an address that is equal to the sum of a
1 register and a constant is used twice in the same basic block.
1 When this hook is not defined, the address will be computed in a
1 register and memory references will be indirect through that
1 register. On machines where the cost of the addressing mode
1 containing the sum is no higher than that of a simple indirect
1 reference, this will produce an additional instruction and possibly
1 require an additional register. Proper specification of this hook
1 eliminates this overhead for such machines.
1
1 This hook is never called with an invalid address.
1
1 On machines where an address involving more than one register is as
1 cheap as an address computation involving only one register,
1 defining 'TARGET_ADDRESS_COST' to reflect this can cause two
1 registers to be live over a region of code where only one would
1 have been if 'TARGET_ADDRESS_COST' were not defined in that manner.
1 This effect should be considered in the definition of this hook.
1 Equivalent costs should probably only be given to addresses with
1 different numbers of registers on machines with lots of registers.
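The trade-off between address forms can be sketched as follows. The enumeration 'toy_addr_form' and the function 'toy_address_cost' are hypothetical; a real hook inspects the ADDRESS rtx itself, and the cost values here are invented for an imaginary target with few registers.

```c
/* Hypothetical classification of address forms.  */
enum toy_addr_form
{
  TOY_ADDR_REG,             /* (reg) */
  TOY_ADDR_REG_PLUS_CONST,  /* (plus (reg) (const_int)) */
  TOY_ADDR_REG_PLUS_REG     /* (plus (reg) (reg)) */
};

static int
toy_address_cost (enum toy_addr_form form)
{
  switch (form)
    {
    case TOY_ADDR_REG:
    case TOY_ADDR_REG_PLUS_CONST:
      /* Assumed: register-plus-constant is as cheap as a simple
         indirect reference, so the sum need not be computed into a
         separate register first.  */
      return 0;

    case TOY_ADDR_REG_PLUS_REG:
      /* Assumed: this target has few registers, so two-register
         addresses are given a slightly higher cost to avoid keeping
         two registers live.  */
      return 1;
    }
  return 0;
}
```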
1
1 -- Target Hook: int TARGET_INSN_COST (rtx_insn *INSN, bool SPEED)
1 This target hook describes the relative costs of RTL instructions.
1
1 In implementing this hook, you can use the construct 'COSTS_N_INSNS
1 (N)' to specify a cost equal to N fast instructions.
1
1 When optimizing for code size, i.e. when SPEED is false, this
1 target hook should be used to estimate the relative size cost of an
1 instruction, again relative to 'COSTS_N_INSNS'.
1
1 -- Target Hook: unsigned int TARGET_MAX_NOCE_IFCVT_SEQ_COST (edge E)
1 This hook returns a value in the same units as 'TARGET_RTX_COSTS',
1 giving the maximum acceptable cost for a sequence generated by the
1 RTL if-conversion pass when conditional execution is not available.
1 The RTL if-conversion pass attempts to convert conditional
1 operations that would require a branch to a series of unconditional
1 operations and 'movMODEcc' insns. This hook returns the maximum
1 cost of the unconditional instructions and the 'movMODEcc' insns.
1 RTL if-conversion is cancelled if the cost of the converted
1 sequence is greater than the value returned by this hook.
1
1 E is the edge from the basic block containing the conditional
1 branch to the basic block which would be executed if the condition
1 were true.
1
1 The default implementation of this hook uses the
1 'max-rtl-if-conversion-[un]predictable' parameters if they are set,
1 and uses a multiple of 'BRANCH_COST' otherwise.
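The fallback logic just described can be modelled in a few lines. This is a sketch under assumptions, not GCC's implementation: the names are hypothetical, the predictability of the branch is passed in directly instead of being derived from the edge E, and the multiplier of three instructions and the branch costs are illustrative values.

```c
#include <stdbool.h>

/* Stand-ins for COSTS_N_INSNS and BRANCH_COST on an imaginary
   target; the values are assumptions.  */
#define TOY_COSTS_N_INSNS(N) ((N) * 4)
#define TOY_BRANCH_COST(speed_p, predictable_p) ((predictable_p) ? 1 : 3)

/* PARAM_SET_P says whether the user supplied the relevant parameter;
   PARAM is its value in that case.  */
static unsigned int
toy_max_noce_ifcvt_seq_cost (bool predictable_p, unsigned int param,
                             bool param_set_p)
{
  /* An explicitly set parameter takes precedence.  */
  if (param_set_p)
    return param;

  /* Otherwise scale the branch cost: the more expensive the branch,
     the longer the branch-free sequence that is still acceptable.  */
  return TOY_BRANCH_COST (true, predictable_p) * TOY_COSTS_N_INSNS (3);
}
```

With these values, a well-predicted branch tolerates a replacement sequence costing up to 12 units, while an unpredictable one tolerates up to 36.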
1
1 -- Target Hook: bool TARGET_NOCE_CONVERSION_PROFITABLE_P (rtx_insn
1 *SEQ, struct noce_if_info *IF_INFO)
1 This hook returns true if the instruction sequence SEQ is a good
1 candidate as a replacement for the if-convertible sequence
1 described in IF_INFO.
1
1 -- Target Hook: bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
1 This predicate controls the use of the eager delay slot filler to
1 disallow speculatively executed instructions being placed in delay
1 slots. Targets such as certain MIPS architectures possess branches
1 both with and without delay slots. As the eager delay slot
1 filler can decrease performance, disabling it is beneficial when
1 ordinary branches are available. Use of delay slot branches filled
1 using the basic filler is often still desirable as the delay slot
1 can hide a pipeline bubble.
1
1 -- Target Hook: HOST_WIDE_INT TARGET_ESTIMATED_POLY_VALUE (poly_int64
1 VAL)
1 Return an estimate of the runtime value of VAL, for use in things
1 like cost calculations or profiling frequencies. The default
1 implementation returns the lowest possible value of VAL.
1