1
1 3.10 Options That Control Optimization
1 ======================================
1
1 These options control various sorts of optimizations.
1
1 Without any optimization option, the compiler's goal is to reduce the
1 cost of compilation and to make debugging produce the expected results.
1 Statements are independent: if you stop the program with a breakpoint
1 between statements, you can then assign a new value to any variable or
1 change the program counter to any other statement in the function and
1 get exactly the results you expect from the source code.
1
1 Turning on optimization flags makes the compiler attempt to improve the
1 performance and/or code size at the expense of compilation time and
1 possibly the ability to debug the program.
1
1 The compiler performs optimization based on the knowledge it has of the
1 program. Compiling multiple files at once into a single output file
1 allows the compiler to use information gained from all of the files when
1 compiling each of them.
1
1 Not all optimizations are controlled directly by a flag. Only
1 optimizations that have a flag are listed in this section.
1
1 Most optimizations are only enabled if an '-O' level is set on the
1 command line. Otherwise they are disabled, even if individual
1 optimization flags are specified.
1
1 Depending on the target and how GCC was configured, a slightly
1 different set of optimizations may be enabled at each '-O' level than
1 those listed here. You can invoke GCC with '-Q --help=optimizers' to
1 find out the exact set of optimizations that are enabled at each level.
1 See Overall Options, for examples.
1
1 '-O'
1 '-O1'
1 Optimize. Optimizing compilation takes somewhat more time, and a
1 lot more memory for a large function.
1
1 With '-O', the compiler tries to reduce code size and execution
1 time, without performing any optimizations that take a great deal
1 of compilation time.
1
1 '-O' turns on the following optimization flags:
1 -fauto-inc-dec
1 -fbranch-count-reg
1 -fcombine-stack-adjustments
1 -fcompare-elim
1 -fcprop-registers
1 -fdce
1 -fdefer-pop
1 -fdelayed-branch
1 -fdse
1 -fforward-propagate
1 -fguess-branch-probability
1 -fif-conversion2
1 -fif-conversion
1 -finline-functions-called-once
1 -fipa-pure-const
1 -fipa-profile
1 -fipa-reference
1 -fmerge-constants
1 -fmove-loop-invariants
1 -fomit-frame-pointer
1 -freorder-blocks
1 -fshrink-wrap
1 -fshrink-wrap-separate
1 -fsplit-wide-types
1 -fssa-backprop
1 -fssa-phiopt
1 -ftree-bit-ccp
1 -ftree-ccp
1 -ftree-ch
1 -ftree-coalesce-vars
1 -ftree-copy-prop
1 -ftree-dce
1 -ftree-dominator-opts
1 -ftree-dse
1 -ftree-forwprop
1 -ftree-fre
1 -ftree-phiprop
1 -ftree-sink
1 -ftree-slsr
1 -ftree-sra
1 -ftree-pta
1 -ftree-ter
1 -funit-at-a-time
1
1 '-O2'
1 Optimize even more. GCC performs nearly all supported
1 optimizations that do not involve a space-speed tradeoff. As
1 compared to '-O', this option increases both compilation time and
1 the performance of the generated code.
1
1 '-O2' turns on all optimization flags specified by '-O'. It also
1 turns on the following optimization flags:
1 -fthread-jumps
1 -falign-functions -falign-jumps
1 -falign-loops -falign-labels
1 -fcaller-saves
1 -fcrossjumping
1 -fcse-follow-jumps -fcse-skip-blocks
1 -fdelete-null-pointer-checks
1 -fdevirtualize -fdevirtualize-speculatively
1 -fexpensive-optimizations
1 -fgcse -fgcse-lm
1 -fhoist-adjacent-loads
1 -finline-small-functions
1 -findirect-inlining
1 -fipa-cp
1 -fipa-bit-cp
1 -fipa-vrp
1 -fipa-sra
1 -fipa-icf
1 -fisolate-erroneous-paths-dereference
1 -flra-remat
1 -foptimize-sibling-calls
1 -foptimize-strlen
1 -fpartial-inlining
1 -fpeephole2
1 -freorder-blocks-algorithm=stc
1 -freorder-blocks-and-partition -freorder-functions
1 -frerun-cse-after-loop
1 -fsched-interblock -fsched-spec
1 -fschedule-insns -fschedule-insns2
1 -fstore-merging
1 -fstrict-aliasing
1 -ftree-builtin-call-dce
1 -ftree-switch-conversion -ftree-tail-merge
1 -fcode-hoisting
1 -ftree-pre
1 -ftree-vrp
1 -fipa-ra
1
1 Please note the warning under '-fgcse' about invoking '-O2' on
1 programs that use computed gotos.
1
1 '-O3'
1 Optimize yet more. '-O3' turns on all optimizations specified by
1 '-O2' and also turns on the following optimization flags:
1 -finline-functions
1 -funswitch-loops
1 -fpredictive-commoning
1 -fgcse-after-reload
1 -ftree-loop-vectorize
1 -ftree-loop-distribution
1 -ftree-loop-distribute-patterns
1 -floop-interchange
1 -floop-unroll-and-jam
1 -fsplit-paths
1 -ftree-slp-vectorize
1 -fvect-cost-model
1 -ftree-partial-pre
1 -fpeel-loops
1 -fipa-cp-clone
1
1 '-O0'
1 Reduce compilation time and make debugging produce the expected
1 results. This is the default.
1
1 '-Os'
1 Optimize for size. '-Os' enables all '-O2' optimizations that do
1 not typically increase code size.
1
1 '-Os' disables the following optimization flags:
1 -falign-functions -falign-jumps -falign-loops
1 -falign-labels -fprefetch-loop-arrays
1
1 It also enables '-finline-functions', causes the compiler to tune
1 for code size rather than execution speed, and performs further
1 optimizations designed to reduce code size.
1
1 '-Ofast'
1 Disregard strict standards compliance. '-Ofast' enables all '-O3'
1 optimizations. It also enables optimizations that are not valid
1 for all standard-compliant programs. It turns on '-ffast-math' and
1 the Fortran-specific '-fstack-arrays', unless
1 '-fmax-stack-var-size' is specified, and '-fno-protect-parens'.
1
1 '-Og'
1 Optimize debugging experience. '-Og' enables optimizations that do
1 not interfere with debugging. It should be the optimization level
1 of choice for the standard edit-compile-debug cycle, offering a
1 reasonable level of optimization while maintaining fast compilation
1 and a good debugging experience.
1
1 If you use multiple '-O' options, with or without level numbers, the
1 last such option is the one that is effective.
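A translation unit can check which mode it was compiled in by testing GCC's predefined macros. This sketch relies only on '__OPTIMIZE__' (defined whenever optimization is enabled) and '__OPTIMIZE_SIZE__' (defined when optimizing for size). The function name 'opt_mode' is illustrative, not part of GCC.

```c
/* Report the optimization mode this file was compiled with, using GCC's
   predefined macros. __OPTIMIZE_SIZE__ is tested first because
   __OPTIMIZE__ is also defined when optimizing for size. */
const char *opt_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimizing for size";
#elif defined(__OPTIMIZE__)
    return "optimizing";
#else
    return "not optimizing";
#endif
}
```

Compiled with 'gcc -O3 -Os', the function should return "optimizing for size", because the last '-O' option takes effect; with 'gcc -Os -O3' it should return "optimizing".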
1
1 Options of the form '-fFLAG' specify machine-independent flags. Most
1 flags have both positive and negative forms; the negative form of
1 '-ffoo' is '-fno-foo'. In the table below, only one of the forms is
1 listed--the one you typically use. You can figure out the other form by
1 either removing 'no-' or adding it.
1
1 The following options control specific optimizations. They are either
1 activated by '-O' options or are related to ones that are. You can use
1 the following flags in the rare cases when "fine-tuning" of
1 optimizations to be performed is desired.
1
1 '-fno-defer-pop'
1 Always pop the arguments to each function call as soon as that
1 function returns. For machines that must pop arguments after a
1 function call, the compiler normally lets arguments accumulate on
1 the stack for several function calls and pops them all at once.
1
1 Disabled at levels '-O', '-O2', '-O3', '-Os'.
1
1 '-fforward-propagate'
1 Perform a forward propagation pass on RTL. The pass tries to
1 combine two instructions and checks if the result can be
1 simplified. If loop unrolling is active, two passes are performed
1 and the second is scheduled after loop unrolling.
1
1 This option is enabled by default at optimization levels '-O',
1 '-O2', '-O3', '-Os'.
1
1 '-ffp-contract=STYLE'
1 '-ffp-contract=off' disables floating-point expression contraction.
1 '-ffp-contract=fast' enables floating-point expression contraction
1 such as forming of fused multiply-add operations if the target has
1 native support for them. '-ffp-contract=on' enables floating-point
1 expression contraction if allowed by the language standard. This
1 is currently not implemented and treated equal to
1 '-ffp-contract=off'.
1
1 The default is '-ffp-contract=fast'.
1
1 '-fomit-frame-pointer'
1 Omit the frame pointer in functions that don't need one. This
1 avoids the instructions to save, set up and restore the frame
1 pointer; on many targets it also makes an extra register available.
1
1 On some targets this flag has no effect because the standard
1 calling sequence always uses a frame pointer, so it cannot be
1 omitted.
1
1 Note that '-fno-omit-frame-pointer' doesn't guarantee the frame
1 pointer is used in all functions. Several targets always omit the
1 frame pointer in leaf functions.
1
1 Enabled by default at '-O' and higher.
1
1 '-foptimize-sibling-calls'
1 Optimize sibling and tail recursive calls.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
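A tail-recursive function shows the kind of call this option targets: the recursive call is the last action of the function, so GCC can compile it as a jump that reuses the current stack frame. The function 'sum_to' is a hypothetical example.

```c
/* Sum of 1..n, accumulated in 'acc'. The recursive call is in tail
   position, so with -foptimize-sibling-calls GCC can turn the recursion
   into a loop; without it, every level gets its own stack frame. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* tail call */
}
```

For example, 'sum_to(100, 0)' returns 5050 whether or not the call is optimized; only the stack usage differs.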
1
1 '-foptimize-strlen'
1 Optimize various standard C string functions (e.g. 'strlen',
1 'strchr' or 'strcpy') and their '_FORTIFY_SOURCE' counterparts into
1 faster alternatives.
1
1 Enabled at levels '-O2', '-O3'.
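Code that builds a string from pieces and then measures it is a typical candidate. In this sketch (the name 'join' is hypothetical), '-foptimize-strlen' may let GCC reuse the lengths it already knows, for instance copying 'b' to a known offset and folding the final 'strlen', rather than rescanning 'dst'. Whether a given transformation happens depends on the GCC version and target.

```c
#include <string.h>

/* Concatenate a and b into dst and return the length of the result.
   dst must have room for strlen(a) + strlen(b) + 1 bytes. */
size_t join(char *dst, const char *a, const char *b)
{
    strcpy(dst, a);      /* length of dst is now strlen(a) */
    strcat(dst, b);      /* may become a copy at dst + strlen(a) */
    return strlen(dst);  /* may be computed from the known lengths */
}
```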
1
1 '-fno-inline'
1 Do not expand any functions inline apart from those marked with the
1 'always_inline' attribute. This is the default when not
1 optimizing.
1
1 Single functions can be exempted from inlining by marking them with
1 the 'noinline' attribute.
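The two attributes can be combined with the inlining flags. In this sketch (function names are illustrative), 'twice' is inlined even under '-fno-inline', while 'add_one' remains an out-of-line call at every optimization level.

```c
/* Inlined even under -fno-inline. GCC expects always_inline functions
   to be declared inline as well. */
static inline __attribute__((always_inline)) int twice(int x)
{
    return 2 * x;
}

/* Exempted from inlining at every optimization level. */
__attribute__((noinline)) int add_one(int x)
{
    return x + 1;
}

int compute(int x)
{
    return add_one(twice(x));
}
```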
1
1 '-finline-small-functions'
1 Integrate functions into their callers when their body is smaller
1 than expected function call code (so overall size of program gets
1 smaller). The compiler heuristically decides which functions are
1 simple enough to be worth integrating in this way. This inlining
1 applies to all functions, even those not declared inline.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-findirect-inlining'
1 Also inline indirect calls that are discovered to be known at
1 compile time thanks to previous inlining. This option has an
1 effect only when inlining itself is turned on by the
1 '-finline-functions' or '-finline-small-functions' options.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-finline-functions'
1 Consider all functions for inlining, even if they are not declared
1 inline. The compiler heuristically decides which functions are
1 worth integrating in this way.
1
1 If all calls to a given function are integrated, and the function
1 is declared 'static', then the function is normally not output as
1 assembler code in its own right.
1
1 Enabled at levels '-O3', '-Os'. Also enabled by '-fprofile-use'
1 and '-fauto-profile'.
1
1 '-finline-functions-called-once'
1 Consider all 'static' functions called once for inlining into their
1 caller even if they are not marked 'inline'. If a call to a given
1 function is integrated, then the function is not output as
1 assembler code in its own right.
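For example, a 'static' helper with a single call site, such as the hypothetical 'mix' below, is a candidate for this option even though it is not marked 'inline'. If the call is integrated, no separate copy of 'mix' should appear in the object file. The hash step is 32-bit FNV-1a, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One FNV-1a step. 'mix' is static and has exactly one call site,
   so -finline-functions-called-once may inline it into 'checksum'. */
static uint32_t mix(uint32_t h, unsigned char c)
{
    return (h ^ c) * 16777619u;
}

uint32_t checksum(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;          /* FNV-1a offset basis */
    for (size_t i = 0; i < n; i++)
        h = mix(h, p[i]);              /* the only call to mix */
    return h;
}
```

With these definitions, 'checksum((const unsigned char *)"a", 1)' returns 0xe40c292c.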
1
1 Enabled at levels '-O1', '-O2', '-O3' and '-Os'.
1
1 '-fearly-inlining'
1 Inline functions marked by 'always_inline' and functions whose body
1 seems smaller than the function call overhead early before doing
1 '-fprofile-generate' instrumentation and real inlining pass. Doing
1 so makes profiling significantly cheaper and usually inlining
1 faster on programs having large chains of nested wrapper functions.
1
1 Enabled by default.
1
1 '-fipa-sra'
1 Perform interprocedural scalar replacement of aggregates, removal
1 of unused parameters and replacement of parameters passed by
1 reference by parameters passed by value.
1
1 Enabled at levels '-O2', '-O3' and '-Os'.
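The sketch below shows a shape IPA-SRA can exploit: a 'static' function that receives a structure by reference but reads only one field, and that has an unused parameter. GCC may rewrite it to take that field by value and drop 'scale'. All names are hypothetical, and 'noinline' is used only so that the call is not simply inlined away.

```c
#include <stddef.h>

struct point { int x, y, z; };

/* Only p->x is read and 'scale' is unused, so IPA-SRA may pass
   p->x by value and remove the unused parameter. */
static __attribute__((noinline)) int get_x(const struct point *p, int scale)
{
    (void)scale;
    return p->x;
}

int sum_x(const struct point *pts, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += get_x(&pts[i], 3);
    return s;
}
```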
1
1 '-finline-limit=N'
1 By default, GCC limits the size of functions that can be inlined.
1 This flag allows coarse control of this limit. N is the size of
1 functions that can be inlined in number of pseudo instructions.
1
1 Inlining is actually controlled by a number of parameters, which
1 may be specified individually by using '--param NAME=VALUE'. The
1 '-finline-limit=N' option sets some of these parameters as follows:
1
1 'max-inline-insns-single'
1 is set to N/2.
1 'max-inline-insns-auto'
1 is set to N/2.
1
1 See below for a documentation of the individual parameters
1 controlling inlining and for the defaults of these parameters.
1
1 _Note:_ there may be no value to '-finline-limit' that results in
1 default behavior.
1
1 _Note:_ pseudo instruction represents, in this particular context,
1 an abstract measurement of a function's size. In no way does it
1 represent a count of assembly instructions and as such its exact
1 meaning might change from one release to another.
1
1 '-fno-keep-inline-dllexport'
1 This is a more fine-grained version of '-fkeep-inline-functions',
1 which applies only to functions that are declared using the
1 'dllexport' attribute or declspec. See Function Attributes.
1
1 '-fkeep-inline-functions'
1 In C, emit 'static' functions that are declared 'inline' into the
1 object file, even if the function has been inlined into all of its
1 callers. This switch does not affect functions using the 'extern
1 inline' extension in GNU C90. In C++, emit any and all inline
1 functions into the object file.
1
1 '-fkeep-static-functions'
1 Emit 'static' functions into the object file, even if the function
1 is never used.
1
1 '-fkeep-static-consts'
1 Emit variables declared 'static const' when optimization isn't
1 turned on, even if the variables aren't referenced.
1
1 GCC enables this option by default. If you want to force the
1 compiler to check if a variable is referenced, regardless of
1 whether or not optimization is turned on, use the
1 '-fno-keep-static-consts' option.
1
1 '-fmerge-constants'
1 Attempt to merge identical constants (string constants and
1 floating-point constants) across compilation units.
1
1 This option is the default for optimized compilation if the
1 assembler and linker support it. Use '-fno-merge-constants' to
1 inhibit this behavior.
1
1 Enabled at levels '-O', '-O2', '-O3', '-Os'.
1
1 '-fmerge-all-constants'
1 Attempt to merge identical constants and identical variables.
1
1 This option implies '-fmerge-constants'. In addition to
1 '-fmerge-constants', this also considers, for example,
1 constant-initialized arrays or initialized constant variables with
1 integral or floating-point types. Languages like C or C++ require each
1 variable, including multiple instances of the same variable in
1 recursive calls, to have distinct locations, so using this option
1 results in non-conforming behavior.
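The recursion case can be observed directly. In this sketch (the function name is hypothetical), each active call has its own 'table', and ISO C requires the two instances to have different addresses, so the function returns 1 under normal compilation. With '-fmerge-all-constants' GCC may share a single read-only copy, in which case it can return 0.

```c
#include <stddef.h>

/* Returns 1 if the 'table' of the innermost call and the 'table' of
   its caller have distinct addresses, as ISO C requires. */
int tables_distinct(int depth, const int *outer)
{
    const int table[3] = { 1, 2, 3 };
    if (depth == 0)
        return outer != table;
    return tables_distinct(depth - 1, table);
}
```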
1
1 '-fmodulo-sched'
1 Perform swing modulo scheduling immediately before the first
1 scheduling pass. This pass looks at innermost loops and reorders
1 their instructions by overlapping different iterations.
1
1 '-fmodulo-sched-allow-regmoves'
1 Perform more aggressive SMS-based modulo scheduling with register
1 moves allowed. Setting this flag deletes certain anti-dependence
1 edges, which triggers the generation of register moves based on
1 the live-range analysis. This option is effective only with
1 '-fmodulo-sched' enabled.
1
1 '-fno-branch-count-reg'
1 Avoid running a pass scanning for opportunities to use "decrement
1 and branch" instructions on a count register instead of generating
1 sequences of instructions that decrement a register, compare it
1 against zero, and then branch based upon the result. This option
1 is only meaningful on architectures that support such instructions,
1 which include x86, PowerPC, IA-64 and S/390. Note that the
1 '-fno-branch-count-reg' option doesn't remove the decrement and
1 branch instructions from the generated instruction stream
1 introduced by other optimization passes.
1
1 The default is '-fbranch-count-reg' at '-O1' and higher.
1
1 '-fno-function-cse'
1 Do not put function addresses in registers; make each instruction
1 that calls a constant function contain the function's address
1 explicitly.
1
1 This option results in less efficient code, but some strange hacks
1 that alter the assembler output may be confused by the
1 optimizations performed when this option is not used.
1
1 The default is '-ffunction-cse'.
1
1 '-fno-zero-initialized-in-bss'
1 If the target supports a BSS section, GCC by default puts variables
1 that are initialized to zero into BSS. This can save space in the
1 resulting code.
1
1 This option turns off this behavior because some programs
1 explicitly rely on variables going to the data section--e.g., so
1 that the resulting executable can find the beginning of that
1 section and/or make assumptions based on that.
1
1 The default is '-fzero-initialized-in-bss'.
1
1 '-fthread-jumps'
1 Perform optimizations that check to see if a jump branches to a
1 location where another comparison subsumed by the first is found.
1 If so, the first branch is redirected to either the destination of
1 the second branch or a point immediately following it, depending on
1 whether the condition is known to be true or false.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
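A small example of the pattern described above: on the path where 'x > 10' holds, the later test 'x > 5' is already known to be true, so the first branch can be redirected past the second comparison. The function 'classify' is hypothetical.

```c
/* The second comparison is subsumed by the first on the x > 10 path,
   which is the situation jump threading looks for. */
int classify(int x)
{
    int r = 0;
    if (x > 10)
        r += 1;
    if (x > 5)      /* always true when x > 10 was true */
        r += 2;
    return r;
}
```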
1
1 '-fsplit-wide-types'
1 When using a type that occupies multiple registers, such as 'long
1 long' on a 32-bit system, split the registers apart and allocate
1 them independently. This normally generates better code for those
1 types, but may make debugging more difficult.
1
1 Enabled at levels '-O', '-O2', '-O3', '-Os'.
1
1 '-fcse-follow-jumps'
1 In common subexpression elimination (CSE), scan through jump
1 instructions when the target of the jump is not reached by any
1 other path. For example, when CSE encounters an 'if' statement
1 with an 'else' clause, CSE follows the jump when the condition
1 tested is false.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-fcse-skip-blocks'
1 This is similar to '-fcse-follow-jumps', but causes CSE to follow
1 jumps that conditionally skip over blocks. When CSE encounters a
1 simple 'if' statement with no else clause, '-fcse-skip-blocks'
1 causes CSE to follow the jump around the body of the 'if'.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
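The following function shows the shape '-fcse-skip-blocks' handles: 'a * b' is computed before an 'if' with no 'else' and again after it, so CSE can follow the jump around the body and reuse the first product. The function is illustrative only; earlier tree-level passes often remove such redundancies before RTL CSE runs.

```c
/* 'a * b' appears before and after a simple if without an else. */
int cse_demo(int a, int b, int c)
{
    int x = a * b;
    if (c)
        x += 1;
    return x + a * b;   /* same value as the first product */
}
```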
1
1 '-frerun-cse-after-loop'
1 Re-run common subexpression elimination after loop optimizations
1 are performed.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-fgcse'
1 Perform a global common subexpression elimination pass. This pass
1 also performs global constant and copy propagation.
1
1 _Note:_ When compiling a program using computed gotos, a GCC
1 extension, you may get better run-time performance if you disable
1 the global common subexpression elimination pass by adding
1 '-fno-gcse' to the command line.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
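Computed gotos typically appear in threaded interpreters. The sketch below is a hypothetical three-opcode interpreter using GCC's '&&LABEL' and 'goto *PTR' extension; for code of this kind it may be worth measuring performance with and without '-fno-gcse'.

```c
/* Hypothetical opcodes; their values index the dispatch table. */
enum { OP_INC, OP_DBL, OP_HALT };

/* Execute 'code' until OP_HALT and return the accumulator. */
int run(const unsigned char *code)
{
    /* Table of label addresses (GCC extension), indexed by opcode. */
    static void *const dispatch[] = { &&do_inc, &&do_dbl, &&do_halt };
    int acc = 0;

    goto *dispatch[*code++];
do_inc:
    acc += 1;
    goto *dispatch[*code++];
do_dbl:
    acc *= 2;
    goto *dispatch[*code++];
do_halt:
    return acc;
}
```

For instance, the program 'OP_INC, OP_INC, OP_DBL, OP_INC, OP_HALT' yields ((0 + 1 + 1) * 2) + 1 = 5.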
1
1 '-fgcse-lm'
1 When '-fgcse-lm' is enabled, global common subexpression
1 elimination attempts to move loads that are only killed by stores
1 into themselves. This allows a loop containing a load/store
1 sequence to be changed to a load outside the loop, and a copy/store
1 within the loop.
1
1 Enabled by default when '-fgcse' is enabled.
1
1 '-fgcse-sm'
1 When '-fgcse-sm' is enabled, a store motion pass is run after
1 global common subexpression elimination. This pass attempts to
1 move stores out of loops. When used in conjunction with
1 '-fgcse-lm', loops containing a load/store sequence can be changed
1 to a load before the loop and a store after the loop.
1
1 Not enabled at any optimization level.
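A loop that updates a global on every iteration illustrates both options. As written, each iteration loads 'total' and stores it back; load motion can hoist the load before the loop, and store motion can sink the store after it, leaving only a register add inside. The names are hypothetical, and whether the motion happens also depends on aliasing: here 'v' points to 'int' while 'total' is a 'long', so with strict aliasing the compiler may assume they do not overlap.

```c
long total;   /* global accumulator, loaded and stored in every iteration */

void accumulate(const int *v, int n)
{
    for (int i = 0; i < n; i++)
        total += v[i];   /* candidate for load and store motion */
}
```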
1
1 '-fgcse-las'
1 When '-fgcse-las' is enabled, the global common subexpression
1 elimination pass eliminates redundant loads that come after stores
1 to the same memory location (both partial and full redundancies).
1
1 Not enabled at any optimization level.
1
1 '-fgcse-after-reload'
1 When '-fgcse-after-reload' is enabled, a redundant load elimination
1 pass is performed after reload. The purpose of this pass is to
1 clean up redundant spilling.
1
1 Enabled at level '-O3'.
1
1 '-faggressive-loop-optimizations'
1 This option tells the loop optimizer to use language constraints to
1 derive bounds for the number of iterations of a loop. This assumes
1 that loop code does not invoke undefined behavior by for example
1 causing signed integer overflows or out-of-bound array accesses.
1 The bounds for the number of iterations of a loop are used to guide
1 loop unrolling and peeling and loop exit test optimizations. This
1 option is enabled by default.
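As an illustration, in the loop below reading 'data[i]' for 'i >= 8' would be undefined behavior, so GCC may conclude that the loop runs at most 8 times and use that bound for unrolling, peeling and exit tests. The array and function are hypothetical; callers must keep 'n' no larger than 8.

```c
static int data[8];

/* Sum of the first n elements of data; n must not exceed 8. */
int sum_prefix(int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += data[i];   /* i >= 8 would be out of bounds */
    return s;
}
```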
1
1 '-funconstrained-commons'
1 This option tells the compiler that variables declared in common
1 blocks (e.g. Fortran) may later be overridden with longer trailing
1 arrays. This prevents certain optimizations that depend on knowing
1 the array bounds.
1
1 '-fcrossjumping'
1 Perform cross-jumping transformation. This transformation unifies
1 equivalent code and saves code size. The resulting code may or may
1 not perform better than without cross-jumping.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-fauto-inc-dec'
1 Combine increments or decrements of addresses with memory accesses.
1 This pass is always skipped on architectures that do not have
1 instructions to support this. Enabled by default at '-O' and
1 higher on architectures that support this.
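A pointer-walking copy loop is the usual candidate. Each iteration reads through 'src', writes through 'dst' and advances both pointers; on a target with post-increment addressing, such as 32-bit ARM ('ldr r3, [r1], #4'), the pass can fold each increment into the memory access. The function name is hypothetical.

```c
#include <stddef.h>

/* Copy n ints; each access is followed by a pointer increment. */
void copy_words(int *dst, const int *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}
```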
1
1 '-fdce'
1 Perform dead code elimination (DCE) on RTL. Enabled by default at
1 '-O' and higher.
1
1 '-fdse'
1 Perform dead store elimination (DSE) on RTL. Enabled by default at
1 '-O' and higher.
1
1 '-fif-conversion'
1 Attempt to transform conditional jumps into branch-less
1 equivalents. This includes use of conditional moves, min, max, set
1 flags and abs instructions, and some tricks that use standard
1 arithmetic. The use of conditional execution on chips where it is
1 available is controlled by '-fif-conversion2'.
1
1 Enabled at levels '-O', '-O2', '-O3', '-Os'.
1
1 '-fif-conversion2'
1 Use conditional execution (where available) to transform
1 conditional jumps into branch-less equivalents.
1
1 Enabled at levels '-O', '-O2', '-O3', '-Os'.
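A clamp with two data-dependent branches is a typical candidate. On targets with conditional moves, such as x86-64, '-fif-conversion' can turn it into branch-free code; on targets with predicated execution, '-fif-conversion2' can use conditionally executed instructions instead. The function is hypothetical, and the instructions actually chosen depend on the target.

```c
/* Two branches that if-conversion may replace with conditional moves. */
int clamp(int x, int lo, int hi)
{
    if (x < lo)
        x = lo;
    if (x > hi)
        x = hi;
    return x;
}
```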
1
1 '-fdeclone-ctor-dtor'
1 The C++ ABI requires multiple entry points for constructors and
1 destructors: one for a base subobject, one for a complete object,
1 and one for a virtual destructor that calls operator delete
1 afterwards. For a hierarchy with virtual bases, the base and
1 complete variants are clones, which means two copies of the
1 function. With this option, the base and complete variants are
1 changed to be thunks that call a common implementation.
1
1 Enabled by '-Os'.
1
1 '-fdelete-null-pointer-checks'
1 Assume that programs cannot safely dereference null pointers, and
1 that no code or data element resides at address zero. This option
1 enables simple constant folding optimizations at all optimization
1 levels. In addition, other optimization passes in GCC use this
1 flag to control global dataflow analyses that eliminate useless
1 checks for null pointers; these assume that a memory access to
1 address zero always results in a trap, so that if a pointer is
1 checked after it has already been dereferenced, it cannot be null.
1
1 Note however that in some environments this assumption is not true.
1 Use '-fno-delete-null-pointer-checks' to disable this optimization
1 for programs that depend on that behavior.
1
1 This option is enabled by default on most targets. On Nios II ELF,
1 it defaults to off. On AVR, CR16, and MSP430, this option is
1 completely disabled.
1
1 Passes that use the dataflow information are enabled independently
1 at different optimization levels.
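The classic pattern this option affects is a pointer that is dereferenced before it is tested. In the hypothetical function below, GCC may delete the 'p == NULL' test because the earlier load would already have trapped. On a target where address zero is readable, that changes the program's behavior unless it is compiled with '-fno-delete-null-pointer-checks'.

```c
#include <stddef.h>

int read_then_check(const int *p)
{
    int v = *p;          /* dereference happens first */
    if (p == NULL)       /* may be removed as redundant */
        return -1;
    return v;
}
```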
1
1 '-fdevirtualize'
1 Attempt to convert calls to virtual functions to direct calls.
1 This is done both within a procedure and interprocedurally as part
1 of indirect inlining ('-findirect-inlining') and interprocedural
1 constant propagation ('-fipa-cp'). Enabled at levels '-O2', '-O3',
1 '-Os'.
1
1 '-fdevirtualize-speculatively'
1 Attempt to convert calls to virtual functions to speculative direct
1 calls. Based on the analysis of the type inheritance graph,
1 determine for a given call the set of likely targets. If the set
1 is small, preferably of size 1, change the call into a conditional
1 deciding between direct and indirect calls. The speculative calls
1 enable more optimizations, such as inlining. When they seem
1 useless after further optimization, they are converted back into
1 their original form.
1
1 '-fdevirtualize-at-ltrans'
1 Stream extra information needed for aggressive devirtualization
1 when running the link-time optimizer in local transformation mode.
1 This option enables more devirtualization but significantly
1 increases the size of streamed data. For this reason it is
1 disabled by default.
1
1 '-fexpensive-optimizations'
1 Perform a number of minor optimizations that are relatively
1 expensive.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-free'
1 Attempt to remove redundant extension instructions. This is
1 especially helpful for the x86-64 architecture, which implicitly
1 zero-extends in 64-bit registers after writing to their lower
1 32-bit half.
1
1 Enabled for Alpha, AArch64 and x86 at levels '-O2', '-O3', '-Os'.
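A 32-bit operation whose result is widened to 64 bits shows where a redundant extension can arise. The addition wraps modulo 2^32 and the result is then zero-extended; on x86-64 the 32-bit add already clears the upper half of the register, so a separate zero-extension instruction would be redundant and '-free' tries to remove it. The function name is hypothetical.

```c
#include <stdint.h>

/* 32-bit wrap-around add, then zero-extension to 64 bits. */
uint64_t widen_inc(uint32_t x)
{
    return (uint64_t)(x + 1u);
}
```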
1
1 '-fno-lifetime-dse'
1 In C++ the value of an object is only affected by changes within
1 its lifetime: when the constructor begins, the object has an
1 indeterminate value, and any changes during the lifetime of the
1 object are dead when the object is destroyed. Normally dead store
1 elimination will take advantage of this; if your code relies on the
1 value of the object storage persisting beyond the lifetime of the
1 object, you can use this flag to disable this optimization. To
1 preserve stores before the constructor starts (e.g. because your
1 operator new clears the object storage) but still treat the object
1 as dead after the destructor, you can use '-flifetime-dse=1'. The
1 default behavior can be explicitly selected with
1 '-flifetime-dse=2'. '-flifetime-dse=0' is equivalent to
1 '-fno-lifetime-dse'.
1
1 '-flive-range-shrinkage'
1 Attempt to decrease register pressure through register live range
1 shrinkage. This is helpful for fast processors with small or
1 moderate size register sets.
1
1 '-fira-algorithm=ALGORITHM'
1 Use the specified coloring algorithm for the integrated register
1 allocator. The ALGORITHM argument can be 'priority', which
1 specifies Chow's priority coloring, or 'CB', which specifies
1 Chaitin-Briggs coloring. Chaitin-Briggs coloring is not
1 implemented for all architectures, but for those targets that do
1 support it, it is the default because it generates better code.
1
1 '-fira-region=REGION'
1 Use specified regions for the integrated register allocator. The
1 REGION argument should be one of the following:
1
1 'all'
1 Use all loops as register allocation regions. This can give
1 the best results for machines with a small and/or irregular
1 register set.
1
1 'mixed'
1 Use all loops except for loops with small register pressure as
1 the regions. This value usually gives the best results in
1 most cases and for most architectures, and is enabled by
1 default when compiling with optimization for speed ('-O',
1 '-O2', ...).
1
1 'one'
1 Use all functions as a single region. This typically results
1 in the smallest code size, and is enabled by default for '-Os'
1 or '-O0'.
1
1 '-fira-hoist-pressure'
1 Use IRA to evaluate register pressure in the code hoisting pass for
1 decisions to hoist expressions. This option usually results in
1 smaller code, but it can slow the compiler down.
1
1 This option is enabled at level '-Os' for all targets.
1
1 '-fira-loop-pressure'
1 Use IRA to evaluate register pressure in loops for decisions to
1 move loop invariants. This option usually results in generation of
1 faster and smaller code on machines with large register files (>=
1 32 registers), but it can slow the compiler down.
1
1 This option is enabled at level '-O3' for some targets.
1
1 '-fno-ira-share-save-slots'
1 Disable sharing of stack slots used for saving call-used hard
1 registers living through a call. Each hard register gets a
1 separate stack slot, and as a result function stack frames are
1 larger.
1
1 '-fno-ira-share-spill-slots'
1 Disable sharing of stack slots allocated for pseudo-registers.
1 Each pseudo-register that does not get a hard register gets a
1 separate stack slot, and as a result function stack frames are
1 larger.
1
1 '-flra-remat'
1 Enable CFG-sensitive rematerialization in LRA. Instead of loading
1 values of spilled pseudos, LRA tries to rematerialize (recalculate)
1 values if it is profitable.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-fdelayed-branch'
1 If supported for the target machine, attempt to reorder
1 instructions to exploit instruction slots available after delayed
1 branch instructions.
1
1 Enabled at levels '-O', '-O2', '-O3', '-Os'.
1
1 '-fschedule-insns'
1 If supported for the target machine, attempt to reorder
1 instructions to eliminate execution stalls due to required data
1 being unavailable. This helps machines that have slow floating
1 point or memory load instructions by allowing other instructions to
1 be issued until the result of the load or floating-point
1 instruction is required.
1
1 Enabled at levels '-O2', '-O3'.
1
1 '-fschedule-insns2'
1 Similar to '-fschedule-insns', but requests an additional pass of
1 instruction scheduling after register allocation has been done.
1 This is especially useful on machines with a relatively small
1 number of registers and where memory load instructions take more
1 than one cycle.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-fno-sched-interblock'
1 Don't schedule instructions across basic blocks. Interblock
1 scheduling is normally enabled by default when scheduling before
1 register allocation, i.e. with '-fschedule-insns' or at '-O2' or
1 higher.
1
1 '-fno-sched-spec'
1 Don't allow speculative motion of non-load instructions.
1 Speculative motion is normally enabled by default when scheduling
1 before register allocation, i.e. with '-fschedule-insns' or at
1 '-O2' or higher.
1
1 '-fsched-pressure'
1 Enable register pressure sensitive insn scheduling before register
1 allocation. This only makes sense when scheduling before register
1 allocation is enabled, i.e. with '-fschedule-insns' or at '-O2' or
1 higher. Usage of this option can improve the generated code and
1 decrease its size by preventing register pressure increase above
1 the number of available hard registers and subsequent spills in
1 register allocation.
1
1 '-fsched-spec-load'
1 Allow speculative motion of some load instructions. This only
1 makes sense when scheduling before register allocation, i.e. with
1 '-fschedule-insns' or at '-O2' or higher.
1
1 '-fsched-spec-load-dangerous'
1 Allow speculative motion of more load instructions. This only
1 makes sense when scheduling before register allocation, i.e. with
1 '-fschedule-insns' or at '-O2' or higher.
1
1 '-fsched-stalled-insns'
1 '-fsched-stalled-insns=N'
1 Define how many insns (if any) can be moved prematurely from the
1 queue of stalled insns into the ready list during the second
1 scheduling pass. '-fno-sched-stalled-insns' means that no insns
1 are moved prematurely, '-fsched-stalled-insns=0' means there is no
1 limit on how many queued insns can be moved prematurely.
1 '-fsched-stalled-insns' without a value is equivalent to
1 '-fsched-stalled-insns=1'.
1
1 '-fsched-stalled-insns-dep'
1 '-fsched-stalled-insns-dep=N'
1 Define how many insn groups (cycles) are examined for a dependency
1 on a stalled insn that is a candidate for premature removal from
1 the queue of stalled insns. This has an effect only during the
1 second scheduling pass, and only if '-fsched-stalled-insns' is
1 used. '-fno-sched-stalled-insns-dep' is equivalent to
1 '-fsched-stalled-insns-dep=0'. '-fsched-stalled-insns-dep' without
1 a value is equivalent to '-fsched-stalled-insns-dep=1'.
1
1 '-fsched2-use-superblocks'
1 When scheduling after register allocation, use superblock
1 scheduling. This allows motion across basic block boundaries,
1 resulting in faster schedules. This option is experimental, as not
1 all machine descriptions used by GCC model the CPU closely enough
1 to avoid unreliable results from the algorithm.
1
1 This only makes sense when scheduling after register allocation,
1 i.e. with '-fschedule-insns2' or at '-O2' or higher.
1
1 '-fsched-group-heuristic'
1 Enable the group heuristic in the scheduler. This heuristic favors
1 the instruction that belongs to a schedule group. This is enabled
1 by default when scheduling is enabled, i.e. with '-fschedule-insns'
1 or '-fschedule-insns2' or at '-O2' or higher.
1
1 '-fsched-critical-path-heuristic'
1 Enable the critical-path heuristic in the scheduler. This
1 heuristic favors instructions on the critical path. This is
1 enabled by default when scheduling is enabled, i.e. with
1 '-fschedule-insns' or '-fschedule-insns2' or at '-O2' or higher.
1
1 '-fsched-spec-insn-heuristic'
1 Enable the speculative instruction heuristic in the scheduler.
1 This heuristic favors speculative instructions with greater
1 dependency weakness. This is enabled by default when scheduling is
1 enabled, i.e. with '-fschedule-insns' or '-fschedule-insns2' or at
1 '-O2' or higher.
1
1 '-fsched-rank-heuristic'
1 Enable the rank heuristic in the scheduler. This heuristic favors
1 the instruction belonging to a basic block with greater size or
1 frequency. This is enabled by default when scheduling is enabled,
1 i.e. with '-fschedule-insns' or '-fschedule-insns2' or at '-O2' or
1 higher.
1
1 '-fsched-last-insn-heuristic'
1 Enable the last-instruction heuristic in the scheduler. This
1 heuristic favors the instruction that is less dependent on the last
1 instruction scheduled. This is enabled by default when scheduling
1 is enabled, i.e. with '-fschedule-insns' or '-fschedule-insns2' or
1 at '-O2' or higher.
1
1 '-fsched-dep-count-heuristic'
1 Enable the dependent-count heuristic in the scheduler. This
1 heuristic favors the instruction that has more instructions
1 depending on it. This is enabled by default when scheduling is
1 enabled, i.e. with '-fschedule-insns' or '-fschedule-insns2' or at
1 '-O2' or higher.
1
1 '-freschedule-modulo-scheduled-loops'
1 Modulo scheduling is performed before traditional scheduling. If a
1 loop is modulo scheduled, later scheduling passes may change its
1 schedule. Use this option to control that behavior.
1
1 '-fselective-scheduling'
1 Schedule instructions using the selective scheduling algorithm.
1 Selective scheduling runs instead of the first scheduler pass.
1
1 '-fselective-scheduling2'
1 Schedule instructions using the selective scheduling algorithm.
1 Selective scheduling runs instead of the second scheduler pass.
1
1 '-fsel-sched-pipelining'
1 Enable software pipelining of innermost loops during selective
1 scheduling. This option has no effect unless one of
1 '-fselective-scheduling' or '-fselective-scheduling2' is turned on.
1
1 '-fsel-sched-pipelining-outer-loops'
1 When pipelining loops during selective scheduling, also pipeline
1 outer loops. This option has no effect unless
1 '-fsel-sched-pipelining' is turned on.
1
1 '-fsemantic-interposition'
1 Some object formats, like ELF, allow interposing of symbols by the
1 dynamic linker. This means that for symbols exported from the DSO,
1 the compiler cannot perform interprocedural propagation, inlining
1 and other optimizations in anticipation that the function or
1 variable in question may change. While this feature is useful, for
1 example, to rewrite memory allocation functions by a debugging
1 implementation, it is expensive in terms of code quality. With
1 '-fno-semantic-interposition' the compiler assumes that if
1 interposition happens for functions, the overwriting function will
1 have precisely the same semantics (and side effects). Similarly, if
1 interposition happens for variables, the constructor of the
1 variable will be the same. The flag has no effect for functions
1 explicitly declared inline (where it is never allowed for
1 interposition to change semantics) and for symbols explicitly
1 declared weak.
1
1 '-fshrink-wrap'
1 Emit function prologues only before parts of the function that need
1 it, rather than at the top of the function. This flag is enabled
1 by default at '-O' and higher.
1
1 '-fshrink-wrap-separate'
1 Shrink-wrap separate parts of the prologue and epilogue separately,
1 so that those parts are only executed when needed. This option is
1 on by default, but has no effect unless '-fshrink-wrap' is also
1 turned on and the target supports this.
1
1 '-fcaller-saves'
1 Enable allocation of values to registers that are clobbered by
1 function calls, by emitting extra instructions to save and restore
1 the registers around such calls. Such allocation is done only when
1 it seems to result in better code.
1
1 This option is always enabled by default on certain machines,
1 usually those which have no call-preserved registers to use
1 instead.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-fcombine-stack-adjustments'
1 Tracks stack adjustments (pushes and pops) and stack memory
1 references and then tries to find ways to combine them.
1
1 Enabled by default at '-O1' and higher.
1
1 '-fipa-ra'
1 Use caller save registers for allocation if those registers are not
1 used by any called function. In that case it is not necessary to
1 save and restore them around calls. This is only possible if
1 called functions are part of the same compilation unit as the
1 current function and they are compiled before it.
1
1 Enabled at levels '-O2', '-O3', '-Os', however the option is
1 disabled if generated code will be instrumented for profiling
1 ('-p', or '-pg') or if callee's register usage cannot be known
1 exactly (this happens on targets that do not expose prologues and
1 epilogues in RTL).
1
1 '-fconserve-stack'
1 Attempt to minimize stack usage. The compiler attempts to use less
1 stack space, even if that makes the program slower. This option
1 implies setting the 'large-stack-frame' parameter to 100 and the
1 'large-stack-frame-growth' parameter to 400.
1
1 '-ftree-reassoc'
1 Perform reassociation on trees. This flag is enabled by default at
1 '-O' and higher.
1
1 '-fcode-hoisting'
1 Perform code hoisting. Code hoisting tries to move the evaluation
1 of expressions executed on all paths to the function exit as early
1 as possible. This is especially useful as a code size
1 optimization, but it often helps for code speed as well. This flag
1 is enabled by default at '-O2' and higher.
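The following hand-written C sketch shows the effect of code hoisting at source level. GCC performs the transformation on its internal representation, not on source, and the function names are hypothetical. The product 'a * b' is evaluated on both arms of the branch, so it can be computed once before the branch.

```c
/* Before: 'a * b' is evaluated on both arms of the branch. */
int hoist_before(int c, int a, int b)
{
    int r;
    if (c)
        r = a * b + 1;
    else
        r = a * b - 1;
    return r;
}

/* After: the common product is computed once, before the branch. */
int hoist_after(int c, int a, int b)
{
    int t = a * b;          /* hoisted: needed on every path */
    return c ? t + 1 : t - 1;
}
```

Both versions compute the same result; the second simply contains one multiplication instead of two.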
1
1 '-ftree-pre'
1 Perform partial redundancy elimination (PRE) on trees. This flag
1 is enabled by default at '-O2' and '-O3'.
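A source-level sketch of partial redundancy elimination, with hypothetical function names. 'a + b' is already available after the 'if' only when 'c' is true, so the second evaluation is redundant on one path only. PRE inserts the computation on the other path, after which the later evaluation is fully redundant and can reuse the available value.

```c
/* Before: 'a + b' is computed on the 'c' path and again after the join. */
int pre_before(int c, int a, int b)
{
    int x = 0;
    if (c)
        x = a + b;
    int y = a + b;          /* partially redundant */
    return x + y;
}

/* After: the computation is inserted on the path that lacked it. */
int pre_after(int c, int a, int b)
{
    int t;
    int x = 0;
    if (c) {
        t = a + b;
        x = t;
    } else {
        t = a + b;          /* inserted by PRE */
    }
    int y = t;              /* now fully redundant: reuse 't' */
    return x + y;
}
```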
1
1 '-ftree-partial-pre'
1 Make partial redundancy elimination (PRE) more aggressive. This
1 flag is enabled by default at '-O3'.
1
1 '-ftree-forwprop'
1 Perform forward propagation on trees. This flag is enabled by
1 default at '-O' and higher.
1
1 '-ftree-fre'
1 Perform full redundancy elimination (FRE) on trees. The difference
1 between FRE and PRE is that FRE only considers expressions that are
1 computed on all paths leading to the redundant computation. This
1 analysis is faster than PRE, though it exposes fewer redundancies.
1 This flag is enabled by default at '-O' and higher.
1
1 '-ftree-phiprop'
1 Perform hoisting of loads from conditional pointers on trees. This
1 pass is enabled by default at '-O' and higher.
1
1 '-fhoist-adjacent-loads'
1 Speculatively hoist loads from both branches of an if-then-else if
1 the loads are from adjacent locations in the same structure and the
1 target architecture has a conditional move instruction. This flag
1 is enabled by default at '-O2' and higher.
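A hypothetical source-level sketch of '-fhoist-adjacent-loads'. The structure type and function names are illustrative; whether GCC actually emits a conditional move for the second version depends on the target.

```c
struct pair { int first; int second; };   /* hypothetical structure */

/* Before: only one of two adjacent fields is loaded, behind a branch. */
int pick_before(const struct pair *p, int c)
{
    if (c)
        return p->first;
    else
        return p->second;
}

/* After: both adjacent fields are loaded unconditionally, and a select
   (a candidate for a conditional move) replaces the branch. */
int pick_after(const struct pair *p, int c)
{
    int a = p->first;
    int b = p->second;
    return c ? a : b;
}
```

Loading both fields is only acceptable because they lie in the same object, so the extra load cannot touch memory the original code would not have been allowed to access.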
1
1 '-ftree-copy-prop'
1 Perform copy propagation on trees. This pass eliminates
1 unnecessary copy operations. This flag is enabled by default at
1 '-O' and higher.
1
1 '-fipa-pure-const'
1 Discover which functions are pure or constant. Enabled by default
1 at '-O' and higher.
1
1 '-fipa-reference'
1 Discover which static variables do not escape the compilation unit.
1 Enabled by default at '-O' and higher.
1
1 '-fipa-pta'
1 Perform interprocedural pointer analysis and interprocedural
1 modification and reference analysis. This option can cause
1 excessive memory and compile-time usage on large compilation units.
1 It is not enabled by default at any optimization level.
1
1 '-fipa-profile'
1 Perform interprocedural profile propagation. The functions called
1 only from cold functions are marked as cold. Also functions
1 executed once (such as 'cold', 'noreturn', static constructors or
1 destructors) are identified. Cold functions and loopless parts of
1 functions executed once are then optimized for size. Enabled by
1 default at '-O' and higher.
1
1 '-fipa-cp'
1 Perform interprocedural constant propagation. This optimization
1 analyzes the program to determine when values passed to functions
1 are constants and then optimizes accordingly. This optimization
1 can substantially increase performance if the application has
1 constants passed to functions. This flag is enabled by default at
1 '-O2', '-Os' and '-O3'.
1
1 '-fipa-cp-clone'
1 Perform function cloning to make interprocedural constant
1 propagation stronger. When enabled, interprocedural constant
1 propagation performs function cloning when an externally visible
1 function can be called with constant arguments. Because this
1 optimization can create multiple copies of functions, it may
1 significantly increase code size (see '--param
1 ipcp-unit-growth=VALUE'). This flag is enabled by default at
1 '-O3'.
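A hypothetical sketch of what IPA-CP cloning amounts to at source level. The clone name 'scale_constprop_4' is illustrative (clones created by GCC typically carry a '.constprop' suffix in their symbol names). The original 'scale' is kept because it is externally visible and may have other callers.

```c
/* Hypothetical externally visible function. */
int scale(int x, int factor)
{
    return x * factor;
}

/* Clone specialized for factor == 4: the constant argument has been
   propagated into the body and the parameter removed. */
static int scale_constprop_4(int x)
{
    return x * 4;
}

int caller(int x)
{
    /* originally: return scale(x, 4); */
    return scale_constprop_4(x);
}
```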
1
1 '-fipa-bit-cp'
1 When enabled, perform interprocedural bitwise constant propagation.
1 This flag is enabled by default at '-O2'. It requires that
1 '-fipa-cp' is enabled.
1
1 '-fipa-vrp'
1 When enabled, perform interprocedural propagation of value ranges.
1 This flag is enabled by default at '-O2'. It requires that
1 '-fipa-cp' is enabled.
1
1 '-fipa-icf'
1 Perform Identical Code Folding for functions and read-only
1 variables. The optimization reduces code size and may disturb
1 unwind stacks by replacing a function by an equivalent one with a
1 different name. The optimization works more effectively with
1 link-time optimization enabled.
1
1 Although the behavior is similar to the Gold linker's ICF
1 optimization, GCC ICF works on different levels and thus the
1 optimizations are not the same: there are equivalences that are
1 found only by GCC and equivalences found only by Gold.
1
1 This flag is enabled by default at '-O2' and '-Os'.
1
1 '-flive-patching=LEVEL'
1 Control GCC's optimizations to produce output suitable for
1 live-patching.
1
1 If the compiler's optimization uses a function's body or
1 information extracted from its body to optimize/change another
1 function, the latter is called an impacted function of the former.
1 If a function is patched, its impacted functions should be patched
1 too.
1
1 The impacted functions are determined by the compiler's
1 interprocedural optimizations. For example, a caller is impacted
1 when inlining a function into its caller, cloning a function and
1 changing its caller to call this new clone, or extracting a
1 function's pureness/constness information to optimize its direct or
1 indirect callers, etc.
1
1 Usually, the more IPA optimizations are enabled, the larger the
1 number of impacted functions for each function. In order to
1 control the number of impacted functions and more easily compute
1 the list of impacted functions, IPA optimizations can be partially
1 enabled at
1 two different levels.
1
1 The LEVEL argument should be one of the following:
1
1 'inline-clone'
1
1 Only enable inlining and cloning optimizations, which include
1 inlining, cloning, interprocedural scalar replacement of
1 aggregates and partial inlining. As a result, when patching a
1 function, all its callers and its clones' callers are
1 impacted and therefore need to be patched as well.
1
1 '-flive-patching=inline-clone' disables the following
1 optimization flags:
1 -fwhole-program -fipa-pta -fipa-reference -fipa-ra
1 -fipa-icf -fipa-icf-functions -fipa-icf-variables
1 -fipa-bit-cp -fipa-vrp -fipa-pure-const -fipa-reference-addressable
1 -fipa-stack-alignment
1
1 'inline-only-static'
1
1 Only enable inlining of static functions. As a result, when
1 patching a static function, all its callers are impacted and
1 so need to be patched as well.
1
1 In addition to all the flags that
1 '-flive-patching=inline-clone' disables,
1 '-flive-patching=inline-only-static' disables the following
1 additional optimization flags:
1 -fipa-cp-clone -fipa-sra -fpartial-inlining -fipa-cp
1
1 When '-flive-patching' is specified without any value, the default
1 value is 'inline-clone'.
1
1 This flag is disabled by default.
1
1 Note that '-flive-patching' is not supported with link-time
1 optimization ('-flto').
1
1 '-fisolate-erroneous-paths-dereference'
1 Detect paths that trigger erroneous or undefined behavior due to
1 dereferencing a null pointer. Isolate those paths from the main
1 control flow and turn the statement with erroneous or undefined
1 behavior into a trap. This flag is enabled by default at '-O2' and
1 higher and depends on '-fdelete-null-pointer-checks' also being
1 enabled.
1
1 '-fisolate-erroneous-paths-attribute'
1 Detect paths that trigger erroneous or undefined behavior due to a
1 null value being used in a way forbidden by a 'returns_nonnull' or
1 'nonnull' attribute. Isolate those paths from the main control
1 flow and turn the statement with erroneous or undefined behavior
1 into a trap. This is not currently enabled, but may be enabled by
1 '-O2' in the future.
1
1 '-ftree-sink'
1 Perform forward store motion on trees. This flag is enabled by
1 default at '-O' and higher.
1
1 '-ftree-bit-ccp'
1 Perform sparse conditional bit constant propagation on trees and
1 propagate pointer alignment information. This pass only operates
1 on local scalar variables and is enabled by default at '-O' and
1 higher. It requires that '-ftree-ccp' is enabled.
1
1 '-ftree-ccp'
1 Perform sparse conditional constant propagation (CCP) on trees.
1 This pass only operates on local scalar variables and is enabled by
1 default at '-O' and higher.
1
1 '-fssa-backprop'
1 Propagate information about uses of a value up the definition chain
1 in order to simplify the definitions. For example, this pass
1 strips sign operations if the sign of a value never matters. The
1 flag is enabled by default at '-O' and higher.
1
1 '-fssa-phiopt'
1 Perform pattern matching on SSA PHI nodes to optimize conditional
1 code. This pass is enabled by default at '-O' and higher.
1
1 '-ftree-switch-conversion'
1 Perform conversion of simple initializations in a switch to
1 initializations from a scalar array. This flag is enabled by
1 default at '-O2' and higher.
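A source-level sketch of switch conversion with hypothetical names. Each case only assigns a constant to the same variable, so the switch can become a range check followed by a load from a constant array. The values (the first four month lengths of a non-leap year) serve only as sample data.

```c
/* Before: every case just initializes 'd'. */
int days_before(int month)
{
    int d;
    switch (month) {
    case 1: d = 31; break;
    case 2: d = 28; break;
    case 3: d = 31; break;
    case 4: d = 30; break;
    default: d = 0; break;
    }
    return d;
}

/* After: one unsigned range check plus a table lookup. */
int days_after(int month)
{
    static const int table[4] = { 31, 28, 31, 30 };
    if ((unsigned) (month - 1) < 4u)   /* covers 1..4 in one comparison */
        return table[month - 1];
    return 0;                          /* the former default case */
}
```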
1
1 '-ftree-tail-merge'
1 Look for identical code sequences. When found, replace one with a
1 jump to the other. This optimization is known as tail merging or
1 cross jumping. This flag is enabled by default at '-O2' and
1 higher. The compilation time in this pass can be limited using the
1 'max-tail-merge-comparisons' and 'max-tail-merge-iterations'
1 parameters.
1
1 '-ftree-dce'
1 Perform dead code elimination (DCE) on trees. This flag is enabled
1 by default at '-O' and higher.
1
1 '-ftree-builtin-call-dce'
1 Perform conditional dead code elimination (DCE) for calls to
1 built-in functions that may set 'errno' but are otherwise free of
1 side effects. This flag is enabled by default at '-O2' and higher
1 if '-Os' is not also specified.
1
1 '-ftree-dominator-opts'
1 Perform a variety of simple scalar cleanups (constant/copy
1 propagation, redundancy elimination, range propagation and
1 expression simplification) based on a dominator tree traversal.
1 This also performs jump threading (to reduce jumps to jumps). This
1 flag is enabled by default at '-O' and higher.
1
1 '-ftree-dse'
1 Perform dead store elimination (DSE) on trees. A dead store is a
1 store into a memory location that is later overwritten by another
1 store without any intervening loads. In this case the earlier
1 store can be deleted. This flag is enabled by default at '-O' and
1 higher.
1
1 '-ftree-ch'
1 Perform loop header copying on trees. This is beneficial since it
1 increases effectiveness of code motion optimizations. It also
1 saves one jump. This flag is enabled by default at '-O' and
1 higher. It is not enabled for '-Os', since it usually increases
1 code size.
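A hypothetical source-level sketch of loop header copying. The loop test is duplicated in front of the loop, turning a 'while' loop into a guarded 'do'-'while' loop. Inside the guard the body is known to execute at least once, which is what helps later code motion.

```c
/* Before: the condition is tested at the top of every iteration. */
int sum_before(const int *a, int n)
{
    int s = 0;
    int i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* After: a copy of the header guards a do-while loop whose body
   runs with the condition already known to be true. */
int sum_after(const int *a, int n)
{
    int s = 0;
    int i = 0;
    if (i < n) {            /* copied loop header */
        do {
            s += a[i];
            i++;
        } while (i < n);
    }
    return s;
}
```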
1
1 '-ftree-loop-optimize'
1 Perform loop optimizations on trees. This flag is enabled by
1 default at '-O' and higher.
1
1 '-ftree-loop-linear'
1 '-floop-strip-mine'
1 '-floop-block'
1 Perform loop nest optimizations. Same as '-floop-nest-optimize'.
1 To use this code transformation, GCC has to be configured with
1 '--with-isl' to enable the Graphite loop transformation
1 infrastructure.
1
1 '-fgraphite-identity'
1 Enable the identity transformation for graphite. For every SCoP we
1 generate the polyhedral representation and transform it back to
1 gimple. Using '-fgraphite-identity' we can check the costs or
1 benefits of the GIMPLE -> GRAPHITE -> GIMPLE transformation. Some
1 minimal optimizations are also performed by the code generator isl,
1 like index splitting and dead code elimination in loops.
1
1 '-floop-nest-optimize'
1 Enable the isl based loop nest optimizer. This is a generic loop
1 nest optimizer based on the Pluto optimization algorithms. It
1 calculates a loop structure optimized for data-locality and
1 parallelism. This option is experimental.
1
1 '-floop-parallelize-all'
1 Use the Graphite data dependence analysis to identify loops that
1 can be parallelized. Parallelize all the loops that can be
1 analyzed to not contain loop carried dependences without checking
1 that it is profitable to parallelize the loops.
1
1 '-ftree-coalesce-vars'
1 While transforming the program out of the SSA representation,
1 attempt to reduce copying by coalescing versions of different
1 user-defined variables, instead of just compiler temporaries. This
1 may severely limit the ability to debug an optimized program
1 compiled with '-fno-var-tracking-assignments'. In the negated
1 form, this flag prevents SSA coalescing of user variables. This
1 option is enabled by default if optimization is enabled, and it
1 does very little otherwise.
1
1 '-ftree-loop-if-convert'
1 Attempt to transform conditional jumps in the innermost loops to
1 branch-less equivalents. The intent is to remove control-flow from
1 the innermost loops in order to improve the ability of the
1 vectorization pass to handle these loops. This is enabled by
1 default if vectorization is enabled.
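A hypothetical source-level sketch of if-conversion. A counting loop is used because its branch can be replaced by adding the 0 or 1 value of the comparison, without introducing any new memory stores.

```c
/* Before: a conditional jump inside the innermost loop. */
int count_before(const int *a, int n, int t)
{
    int c = 0;
    for (int i = 0; i < n; i++)
        if (a[i] > t)
            c++;
    return c;
}

/* After: straight-line code in every iteration, which the
   vectorizer can handle more easily. */
int count_after(const int *a, int n, int t)
{
    int c = 0;
    for (int i = 0; i < n; i++)
        c += (a[i] > t);    /* the comparison yields 0 or 1 */
    return c;
}
```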
1
1 '-ftree-loop-distribution'
1 Perform loop distribution. This flag can improve cache performance
1 on big loop bodies and allow further loop optimizations, like
1 parallelization or vectorization, to take place. For example, the
1 loop
1 DO I = 1, N
1   A(I) = B(I) + C
1   D(I) = E(I) * F
1 ENDDO
1 is transformed to
1 DO I = 1, N
1   A(I) = B(I) + C
1 ENDDO
1 DO I = 1, N
1   D(I) = E(I) * F
1 ENDDO
1
1 '-ftree-loop-distribute-patterns'
1 Perform loop distribution of patterns that can be code generated
1 with calls to a library. This flag is enabled by default at '-O3'.
1
1 This pass distributes the initialization loops and generates a call
1 to memset zero. For example, the loop
1 DO I = 1, N
1   A(I) = 0
1   B(I) = A(I) + I
1 ENDDO
1 is transformed to
1 DO I = 1, N
1   A(I) = 0
1 ENDDO
1 DO I = 1, N
1   B(I) = A(I) + I
1 ENDDO
1 and the initialization loop is transformed into a call to memset
1 zero.
1
1 '-floop-interchange'
1 Perform loop interchange outside of graphite. This flag can
1 improve cache performance on loop nests and allow further loop
1 optimizations, like vectorization, to take place. For example, the
1 loop
1 for (int i = 0; i < N; i++)
1   for (int j = 0; j < N; j++)
1     for (int k = 0; k < N; k++)
1       c[i][j] = c[i][j] + a[i][k]*b[k][j];
1 is transformed to
1 for (int i = 0; i < N; i++)
1   for (int k = 0; k < N; k++)
1     for (int j = 0; j < N; j++)
1       c[i][j] = c[i][j] + a[i][k]*b[k][j];
1 This flag is enabled by default at '-O3'.
1
1 '-floop-unroll-and-jam'
1 Apply unroll and jam transformations on feasible loops. In a loop
1 nest this unrolls the outer loop by some factor and fuses the
1 resulting multiple inner loops. This flag is enabled by default at
1 '-O3'.
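A hypothetical sketch of unroll-and-jam on a small matrix-vector product. The fixed size 'N' is an assumption chosen so that the outer loop can be unrolled by 2 without a remainder loop; the function names are illustrative.

```c
#define N 4   /* hypothetical even size, so unrolling by 2 needs no remainder */

/* Before: y[i] = sum over j of m[i][j] * x[j]. */
void mv_before(int y[N], int m[N][N], int x[N])
{
    for (int i = 0; i < N; i++) {
        y[i] = 0;
        for (int j = 0; j < N; j++)
            y[i] += m[i][j] * x[j];
    }
}

/* After unroll-and-jam by 2: the outer loop is unrolled and the two
   resulting inner loops are fused, so each load of x[j] serves two rows. */
void mv_after(int y[N], int m[N][N], int x[N])
{
    for (int i = 0; i < N; i += 2) {
        y[i] = 0;
        y[i + 1] = 0;
        for (int j = 0; j < N; j++) {
            int xj = x[j];
            y[i] += m[i][j] * xj;
            y[i + 1] += m[i + 1][j] * xj;
        }
    }
}
```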
1
1 '-ftree-loop-im'
1 Perform loop invariant motion on trees. This pass moves only
1 invariants that are hard to handle at RTL level (function calls,
1 operations that expand to nontrivial sequences of insns). With
1 '-funswitch-loops' it also moves operands of conditions that are
1 invariant out of the loop, so that we can use just trivial
1 invariantness analysis in loop unswitching. The pass also includes
1 store motion.
1
1 '-ftree-loop-ivcanon'
1 Create a canonical counter for the number of iterations in loops for
1 which determining the number of iterations requires complicated
1 analysis. Later optimizations then may determine the number
1 easily. Useful especially in connection with unrolling.
1
1 '-fivopts'
1 Perform induction variable optimizations (strength reduction,
1 induction variable merging and induction variable elimination) on
1 trees.
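A hypothetical source-level sketch of the strength-reduction part of induction variable optimization. The multiplication 'i * stride' in the address computation is replaced by a second index that is incremented by 'stride' in each iteration.

```c
#include <stddef.h>

/* Before: a multiply in every iteration to form the index. */
long sum_stride_before(const int *a, size_t n, size_t stride)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i * stride];
    return s;
}

/* After: 'idx' always equals i * stride, maintained by addition. */
long sum_stride_after(const int *a, size_t n, size_t stride)
{
    long s = 0;
    size_t idx = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[idx];
        idx += stride;      /* add replaces the multiply */
    }
    return s;
}
```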
1
1 '-ftree-parallelize-loops=n'
1 Parallelize loops, i.e., split their iteration space to run in n
1 threads. This is only possible for loops whose iterations are
1 independent and can be arbitrarily reordered. The optimization is
1 only profitable on multiprocessor machines, for loops that are
1 CPU-intensive, rather than constrained e.g. by memory bandwidth.
1 This option implies '-pthread', and thus is only supported on
1 targets that have support for '-pthread'.
1
1 '-ftree-pta'
1 Perform function-local points-to analysis on trees. This flag is
1 enabled by default at '-O' and higher.
1
1 '-ftree-sra'
1 Perform scalar replacement of aggregates. This pass replaces
1 structure references with scalars to prevent committing structures
1 to memory too early. This flag is enabled by default at '-O' and
1 higher.
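A hypothetical source-level sketch of scalar replacement of aggregates. The local structure is only accessed field by field and never as a whole, so each field can become an independent scalar that may live in a register.

```c
struct point { int x; int y; };   /* hypothetical aggregate */

/* Before: a local structure accessed only through its fields. */
int manhattan_before(int ax, int ay, int bx, int by)
{
    struct point d;
    d.x = ax - bx;
    d.y = ay - by;
    if (d.x < 0) d.x = -d.x;
    if (d.y < 0) d.y = -d.y;
    return d.x + d.y;
}

/* After SRA: each field is a separate scalar variable. */
int manhattan_after(int ax, int ay, int bx, int by)
{
    int d_x = ax - bx;
    int d_y = ay - by;
    if (d_x < 0) d_x = -d_x;
    if (d_y < 0) d_y = -d_y;
    return d_x + d_y;
}
```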
1
1 '-fstore-merging'
1 Perform merging of narrow stores to consecutive memory addresses.
1 This pass merges contiguous stores of immediate values narrower
1 than a word into fewer wider stores to reduce the number of
1 instructions. This is enabled by default at '-O2' and higher as
1 well as '-Os'.
1
1 '-ftree-ter'
1 Perform temporary expression replacement during the SSA->normal
1 phase. Single use/single def temporaries are replaced at their use
1 location with their defining expression. This results in
1 non-GIMPLE code, but gives the expanders much more complex trees to
1 work on resulting in better RTL generation. This is enabled by
1 default at '-O' and higher.
1
1 '-ftree-slsr'
1 Perform straight-line strength reduction on trees. This recognizes
1 related expressions involving multiplications and replaces them by
1 less expensive calculations when possible. This is enabled by
1 default at '-O' and higher.
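A hypothetical source-level sketch of straight-line strength reduction. Of three related products by the same 's', only the first remains a multiplication; the others are derived from it by adding 's'.

```c
/* Before: three multiplications. */
void slsr_before(int x, int s, int out[3])
{
    out[0] = x * s;
    out[1] = (x + 1) * s;
    out[2] = (x + 2) * s;
}

/* After: one multiplication, then additions. */
void slsr_after(int x, int s, int out[3])
{
    int t = x * s;
    out[0] = t;
    out[1] = t + s;         /* (x + 1) * s == x * s + s */
    out[2] = t + s + s;     /* (x + 2) * s == x * s + 2 * s */
}
```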
1
1 '-ftree-vectorize'
1 Perform vectorization on trees. This flag enables
1 '-ftree-loop-vectorize' and '-ftree-slp-vectorize' if not
1 explicitly specified.
1
1 '-ftree-loop-vectorize'
1 Perform loop vectorization on trees. This flag is enabled by
1 default at '-O3' and when '-ftree-vectorize' is enabled.
1
1 '-ftree-slp-vectorize'
1 Perform basic block vectorization on trees. This flag is enabled
1 by default at '-O3' and when '-ftree-vectorize' is enabled.
1
1 '-fvect-cost-model=MODEL'
1 Alter the cost model used for vectorization. The MODEL argument
1 should be one of 'unlimited', 'dynamic' or 'cheap'. With the
1 'unlimited' model the vectorized code-path is assumed to be
1 profitable while with the 'dynamic' model a runtime check guards
1 the vectorized code-path to enable it only for iteration counts
1 that will likely execute faster than when executing the original
1 scalar loop. The 'cheap' model disables vectorization of loops
1 where doing so would be cost prohibitive, for example due to
1 required runtime checks for data dependence or alignment, but
1 otherwise is equal to the 'dynamic' model. The default cost model
1 depends on other optimization flags and is either 'dynamic' or
1 'cheap'.
1
1 '-fsimd-cost-model=MODEL'
1 Alter the cost model used for vectorization of loops marked with
1 the OpenMP simd directive. The MODEL argument should be one of
1 'unlimited', 'dynamic', 'cheap'. All values of MODEL have the same
1 meaning as described in '-fvect-cost-model' and by default a cost
1 model defined with '-fvect-cost-model' is used.
1
1 '-ftree-vrp'
1 Perform Value Range Propagation on trees. This is similar to the
1 constant propagation pass, but instead of values, ranges of values
1 are propagated. This allows the optimizers to remove unnecessary
1 range checks like array bound checks and null pointer checks. This
1 is enabled by default at '-O2' and higher. Null pointer check
1 elimination is only done if '-fdelete-null-pointer-checks' is
1 enabled.
1
1 '-fsplit-paths'
1 Split paths leading to loop backedges. This can improve dead code
1 elimination and common subexpression elimination. This is enabled
1 by default at '-O2' and above.
1
1 '-fsplit-ivs-in-unroller'
1 Enables expression of values of induction variables in later
1 iterations of the unrolled loop using the value in the first
1 iteration. This breaks long dependency chains, thus improving
1 efficiency of the scheduling passes.
1
1 A combination of '-fweb' and CSE is often sufficient to obtain the
1 same effect. However, that is not reliable in cases where the loop
1 body is more complicated than a single basic block. It also does
1 not work at all on some architectures due to restrictions in the
1 CSE pass.
1
1 This optimization is enabled by default.
1
1 '-fvariable-expansion-in-unroller'
1 With this option, the compiler creates multiple copies of some
1 local variables when unrolling a loop, which can result in superior
1 code.
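A hypothetical source-level sketch of variable expansion when unrolling by 2. The single accumulator is split into two copies that are summed after the loop, so consecutive additions no longer depend on each other. Integers are used because reordering floating-point additions can change the result.

```c
/* Plain loop: one accumulator, one long dependency chain. */
long sum_plain(const int *a, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 2 with the accumulator expanded into s0 and s1. */
long sum_expanded(const int *a, int n)
{
    long s0 = 0, s1 = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];         /* independent of the add into s1 */
        s1 += a[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        s0 += a[i];
    return s0 + s1;
}
```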
1
1 '-fpartial-inlining'
1 Inline parts of functions. This option has any effect only when
1 inlining itself is turned on by the '-finline-functions' or
1 '-finline-small-functions' options.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-fpredictive-commoning'
1 Perform predictive commoning optimization, i.e., reusing
1 computations (especially memory loads and stores) performed in
1 previous iterations of loops.
1
1 This option is enabled at level '-O3'.
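A hypothetical source-level sketch of predictive commoning. The value loaded as 'a[i + 1]' is kept in a register and reused as 'a[i]' in the next iteration. The 'restrict' qualifiers state the no-aliasing assumption that makes the rewrite valid, and 'a' must hold 'n + 1' elements.

```c
/* Before: each iteration loads both a[i] and a[i + 1]. */
void pairsum_before(int *restrict b, const int *restrict a, int n)
{
    for (int i = 0; i < n; i++)
        b[i] = a[i] + a[i + 1];
}

/* After: one load per iteration; the previous load is reused. */
void pairsum_after(int *restrict b, const int *restrict a, int n)
{
    if (n <= 0)
        return;
    int cur = a[0];               /* a[i] for the first iteration */
    for (int i = 0; i < n; i++) {
        int next = a[i + 1];
        b[i] = cur + next;
        cur = next;               /* carried into the next iteration */
    }
}
```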
1
1 '-fprefetch-loop-arrays'
1 If supported by the target machine, generate instructions to
1 prefetch memory to improve the performance of loops that access
1 large arrays.
1
1 This option may generate better or worse code; results are highly
1 dependent on the structure of loops within the source code.
1
1 Disabled at level '-Os'.
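For comparison, a loop can prefetch manually with GCC's '__builtin_prefetch' built-in, which is roughly the kind of instruction '-fprefetch-loop-arrays' inserts automatically. The prefetch distance of 16 elements and the function name are hypothetical; a useful distance depends on the target and the loop.

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 16   /* hypothetical tuning value, in elements */

long sum_prefetch(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        /* Request data needed a few iterations ahead; the bounds check
           avoids forming a pointer past the end of the array. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);  /* read, high locality */
        s += a[i];
    }
    return s;
}
```

The prefetch is only a hint and does not change the result, so whether it helps has to be measured.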
1
1 '-fno-printf-return-value'
1 Do not substitute constants for the known return value of formatted
1 output functions such as 'sprintf', 'snprintf', 'vsprintf', and
1 'vsnprintf' (but not 'printf' or 'fprintf'). This transformation
1 allows GCC to optimize or even eliminate branches based on the
1 known return value of these functions called with arguments that
1 are either constant, or whose values are known to be in a range
1 that makes determining the exact return value possible. For
1 example, when '-fprintf-return-value' is in effect, both the branch
1 and the body of the 'if' statement (but not the call to 'snprintf')
1 can be optimized away when 'i' is a 32-bit or smaller integer
1 because the return value is guaranteed to be at most 8.
1
1 char buf[9];
1 if (snprintf (buf, sizeof buf, "%08x", i) >= sizeof buf)
1   ...
1 ...
1
1 The '-fprintf-return-value' option relies on other optimizations
1 and yields best results with '-O2' and above. It works in tandem
1 with the '-Wformat-overflow' and '-Wformat-truncation' options.
1 The '-fprintf-return-value' option is enabled by default.
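A small runnable sketch of the reasoning in the 'snprintf' example, assuming a 32-bit 'unsigned int' (the function name is hypothetical). '%08x' then always produces exactly 8 hex digits, so the return value is 8 and the truncation test can never be true.

```c
#include <stdio.h>

/* Formats 'i' as 8 hex digits into 'buf', which must hold 9 chars.
   Returns the length written, or -1 if the output was truncated. */
int format_hex(char *buf, unsigned int i)
{
    int n = snprintf(buf, 9, "%08x", i);
    if (n >= 9)             /* never true with a 32-bit unsigned int */
        return -1;
    return n;
}
```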
1
1 '-fno-peephole'
1 '-fno-peephole2'
1 Disable any machine-specific peephole optimizations. The
1 difference between '-fno-peephole' and '-fno-peephole2' is in how
1 they are implemented in the compiler; some targets use one, some
1 use the other, a few use both.
1
1 '-fpeephole' is enabled by default. '-fpeephole2' is enabled at
1 levels '-O2', '-O3', '-Os'.
1
1 '-fno-guess-branch-probability'
1 Do not guess branch probabilities using heuristics.
1
1 GCC uses heuristics to guess branch probabilities if they are not
1 provided by profiling feedback ('-fprofile-arcs'). These
1 heuristics are based on the control flow graph. If some branch
1 probabilities are specified by '__builtin_expect', then the
1 heuristics are used to guess branch probabilities for the rest of
1 the control flow graph, taking the '__builtin_expect' info into
1 account. The interactions between the heuristics and
1 '__builtin_expect' can be complex, and in some cases, it may be
1 useful to disable the heuristics so that the effects of
1 '__builtin_expect' are easier to understand.
1
1 The default is '-fguess-branch-probability' at levels '-O', '-O2',
1 '-O3', '-Os'.
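A sketch of how '__builtin_expect' is commonly used; the 'unlikely' macro and the 'checked_sum' function are hypothetical names. Only the null-pointer test carries an explicit hint, so the probability of the loop branch is still guessed by the heuristics unless '-fno-guess-branch-probability' is given.

```c
#include <stddef.h>

/* Hypothetical convenience macro: the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sums an array, treating a null pointer as a rare error case. */
long checked_sum(const int *a, size_t n)
{
    if (unlikely(a == NULL))
        return -1;          /* expected to be the cold path */
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```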
1
1 '-freorder-blocks'
1 Reorder basic blocks in the compiled function in order to reduce
1 number of taken branches and improve code locality.
1
1 Enabled at levels '-O', '-O2', '-O3', '-Os'.
1
1 '-freorder-blocks-algorithm=ALGORITHM'
1 Use the specified algorithm for basic block reordering. The
1 ALGORITHM argument can be 'simple', which does not increase code
1 size (except sometimes due to secondary effects like alignment), or
1 'stc', the "software trace cache" algorithm, which tries to put all
1 often executed code together, minimizing the number of branches
1 executed by making extra copies of code.
1
1 The default is 'simple' at levels '-O', '-Os', and 'stc' at levels
1 '-O2', '-O3'.
1
1 '-freorder-blocks-and-partition'
1 In addition to reordering basic blocks in the compiled function, in
1 order to reduce number of taken branches, partitions hot and cold
1 basic blocks into separate sections of the assembly and '.o' files,
1 to improve paging and cache locality performance.
1
1 This optimization is automatically turned off in the presence of
1 exception handling or unwind tables (on targets using
1 setjmp/longjmp or target specific scheme), for linkonce sections,
1 for functions with a user-defined section attribute and on any
1 architecture that does not support named sections. When
1 '-fsplit-stack' is used this option is not enabled by default (to
1 avoid linker errors), but may be enabled explicitly (if using a
1 working linker).
1
1 Enabled for x86 at levels '-O2', '-O3', '-Os'.
1
1 '-freorder-functions'
1 Reorder functions in the object file in order to improve code
1 locality. This is implemented by using special subsections
1 '.text.hot' for most frequently executed functions and
1 '.text.unlikely' for unlikely executed functions. Reordering is
1 done by the linker, so the object file format must support named
1 sections and the linker must place them in a reasonable way.
1
1 Also profile feedback must be available to make this option
1 effective. See '-fprofile-arcs' for details.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-fstrict-aliasing'
1 Allow the compiler to assume the strictest aliasing rules
1 applicable to the language being compiled. For C (and C++), this
1 activates optimizations based on the type of expressions. In
1 particular, an object of one type is assumed never to reside at the
1 same address as an object of a different type, unless the types are
1 almost the same. For example, an 'unsigned int' can alias an
1 'int', but not a 'void*' or a 'double'. A character type may alias
1 any other type.
1
1 Pay special attention to code like this:
1 union a_union {
1   int i;
1   double d;
1 };
1
1 int f() {
1   union a_union t;
1   t.d = 3.0;
1   return t.i;
1 }
1 The practice of reading from a different union member than the one
1 most recently written to (called "type-punning") is common. Even
1 with '-fstrict-aliasing', type-punning is allowed, provided the
1 memory is accessed through the union type. So, the code above
1 works as expected. ⇒Structures unions enumerations and
1 bit-fields implementation. However, this code might not:
1 int f() {
1   union a_union t;
1   int* ip;
1   t.d = 3.0;
1   ip = &t.i;
1   return *ip;
1 }
1
1 Similarly, access by taking the address, casting the resulting
1 pointer and dereferencing the result has undefined behavior, even
1 if the cast uses a union type, e.g.:
1 int f() {
1   double d = 3.0;
1   return ((union a_union *) &d)->i;
1 }
1
1 The '-fstrict-aliasing' option is enabled at levels '-O2', '-O3',
1 '-Os'.
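When the bit pattern of one type is needed as another type outside a union, copying it with 'memcpy' avoids the aliasing problem shown above. This sketch assumes an IEEE 754 single-precision 'float' and uses a hypothetical function name. At '-O2' GCC usually turns such a small fixed-size 'memcpy' into a plain register move, so the copy costs nothing.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of 'f'. Unlike *(uint32_t *) &f, this never
   accesses the float through an incompatible lvalue type. */
uint32_t float_bits(float f)
{
    uint32_t u;
    _Static_assert(sizeof u == sizeof f, "assumes a 32-bit float");
    memcpy(&u, &f, sizeof u);   /* copies the object representation */
    return u;
}
```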
1
1 '-falign-functions'
1 '-falign-functions=N'
1 Align the start of functions to the next power-of-two greater than
1 or equal to N, skipping up to N-1 bytes. For instance,
1 '-falign-functions=32' aligns functions to the next 32-byte
1 boundary, but
1 '-falign-functions=24' aligns to the next 32-byte boundary only if
1 this can be done by skipping 23 bytes or less.
1
1 '-fno-align-functions' and '-falign-functions=1' are equivalent and
1 mean that functions are not aligned.
1
1 Some assemblers only support this flag when N is a power of two; in
1 that case, it is rounded up.
1
1 If N is not specified or is zero, use a machine-dependent default.
1 The maximum allowed N option value is 65536.
1
1 Enabled at levels '-O2', '-O3'.
1
1 '-flimit-function-alignment'
1 If this option is enabled, the compiler tries to avoid
1 unnecessarily overaligning functions. It attempts to instruct the
1 assembler to align by the amount specified by '-falign-functions',
1 but not to skip more bytes than the size of the function.
1
1 '-falign-labels'
1 '-falign-labels=N'
1 Align all branch targets to a power-of-two boundary, skipping up to
1 N bytes like '-falign-functions'. This option can easily make code
1 slower, because it must insert dummy operations for when the branch
1 target is reached in the usual flow of the code.
1
1 '-fno-align-labels' and '-falign-labels=1' are equivalent and mean
1 that labels are not aligned.
1
1 If '-falign-loops' or '-falign-jumps' are applicable and are
1 greater than this value, then their values are used instead.
1
1 If N is not specified or is zero, use a machine-dependent default
1 which is very likely to be '1', meaning no alignment. The maximum
1 allowed N option value is 65536.
1
1 Enabled at levels '-O2', '-O3'.
1
1 '-falign-loops'
1 '-falign-loops=N'
1 Align loops to a power-of-two boundary, skipping up to N bytes like
1 '-falign-functions'. If the loops are executed many times, this
1 makes up for any execution of the dummy operations.
1
1 '-fno-align-loops' and '-falign-loops=1' are equivalent and mean
1 that loops are not aligned. The maximum allowed N option value is
1 65536.
1
1 If N is not specified or is zero, use a machine-dependent default.
1
1 Enabled at levels '-O2', '-O3'.
1
1 '-falign-jumps'
1 '-falign-jumps=N'
1 Align branch targets to a power-of-two boundary, for branch targets
1 where the targets can only be reached by jumping, skipping up to N
1 bytes like '-falign-functions'. In this case, no dummy operations
1 need be executed.
1
1 '-fno-align-jumps' and '-falign-jumps=1' are equivalent and mean
1 that jumps are not aligned.
1
1 If N is not specified or is zero, use a machine-dependent default.
1 The maximum allowed N option value is 65536.
1
1 Enabled at levels '-O2', '-O3'.
1
1 '-funit-at-a-time'
1 This option is left for compatibility reasons. '-funit-at-a-time'
1 has no effect, while '-fno-unit-at-a-time' implies
1 '-fno-toplevel-reorder' and '-fno-section-anchors'.
1
1 Enabled by default.
1
1 '-fno-toplevel-reorder'
1 Do not reorder top-level functions, variables, and 'asm'
1 statements. Output them in the same order that they appear in the
1 input file. When this option is used, unreferenced static
1 variables are not removed. This option is intended to support
1 existing code that relies on a particular ordering. For new code,
1 it is better to use attributes when possible.
1
1 Enabled at level '-O0'. When disabled explicitly, it also implies
1 '-fno-section-anchors', which is otherwise enabled at '-O0' on some
1 targets.
1
1 '-fweb'
1 Construct webs as commonly used for register allocation purposes
1 and assign each web an individual pseudo register. This allows the
1 register allocation pass to operate on pseudos directly, but also
1 strengthens several other optimization passes, such as CSE, loop
1 optimizer and trivial dead code remover. It can, however, make
1 debugging impossible, since variables no longer stay in a "home
1 register".
1
1 Enabled by default with '-funroll-loops'.
1
1 '-fwhole-program'
1 Assume that the current compilation unit represents the whole
1 program being compiled. All public functions and variables, with
1 the exception of 'main' and those marked with the attribute
1 'externally_visible', become static and in effect are optimized
1 more aggressively by interprocedural optimizers.
1
1 This option should not be used in combination with '-flto'.
1 Instead, relying on a linker plugin should provide safer and more
1 precise information.
1
1 '-flto[=N]'
1 This option runs the standard link-time optimizer. When invoked
1 with source code, it generates GIMPLE (one of GCC's internal
1 representations) and writes it to special ELF sections in the
1 object file. When the object files are linked together, all the
1 function bodies are read from these ELF sections and instantiated
1 as if they had been part of the same translation unit.
1
1 To use the link-time optimizer, '-flto' and optimization options
1 should be specified at compile time and during the final link. It
1 is recommended that you compile all the files participating in the
1 same link with the same options and also specify those options at
1 link time. For example:
1
1 gcc -c -O2 -flto foo.c
1 gcc -c -O2 -flto bar.c
1 gcc -o myprog -flto -O2 foo.o bar.o
1
1 The first two invocations to GCC save a bytecode representation of
1 GIMPLE into special ELF sections inside 'foo.o' and 'bar.o'. The
1 final invocation reads the GIMPLE bytecode from 'foo.o' and
1 'bar.o', merges the two files into a single internal image, and
1 compiles the result as usual. Since both 'foo.o' and 'bar.o' are
1 merged into a single image, this causes all the interprocedural
1 analyses and optimizations in GCC to work across the two files as
1 if they were a single one. This means, for example, that the
1 inliner is able to inline functions in 'bar.o' into functions in
1 'foo.o' and vice-versa.
1
1 Another (simpler) way to enable link-time optimization is:
1
1 gcc -o myprog -flto -O2 foo.c bar.c
1
1 The above generates bytecode for 'foo.c' and 'bar.c', merges them
1 together into a single GIMPLE representation and optimizes them as
1 usual to produce 'myprog'.
1
1 The only important thing to keep in mind is that to enable
1 link-time optimizations you need to use the GCC driver to perform
1 the link step. GCC then automatically performs link-time
1 optimization if any of the objects involved were compiled with the
1 '-flto' command-line option. You generally should specify the
1 optimization options to be used for link-time optimization, though
1 GCC tries to be clever at guessing an optimization level to use
1 from the options used at compile time if you fail to specify one at
1 link time. You can always override the automatic decision to do
1 link-time optimization by passing '-fno-lto' to the link command.
1
1 To make whole program optimization effective, it is necessary to
1 make certain whole program assumptions. The compiler needs to know
1 what functions and variables can be accessed by libraries and
1 runtime outside of the link-time optimized unit. When supported by
1 the linker, the linker plugin (see '-fuse-linker-plugin') passes
1 information to the compiler about used and externally visible
1 symbols. When the linker plugin is not available,
1 '-fwhole-program' should be used to allow the compiler to make
1 these assumptions, which leads to more aggressive optimization
1 decisions.
1
1 When '-fuse-linker-plugin' is not enabled and a file is compiled
1 with '-flto', the generated object file is larger than a regular
1 object file because it contains both GIMPLE bytecode and the usual
1 final code (see '-ffat-lto-objects'). This means that object files
1 with LTO information can be linked as normal object files; if
1 '-fno-lto' is passed to the linker, no interprocedural
1 optimizations are applied. Note that when '-fno-fat-lto-objects'
1 is enabled the compile stage is faster but you cannot perform a
1 regular, non-LTO link on them.
1
1 Additionally, the optimization flags used to compile individual
1 files are not necessarily related to those used at link time. For
1 instance,
1
1 gcc -c -O0 -ffat-lto-objects -flto foo.c
1 gcc -c -O0 -ffat-lto-objects -flto bar.c
1 gcc -o myprog -O3 foo.o bar.o
1
1 This produces individual object files with unoptimized assembler
1 code, but the resulting binary 'myprog' is optimized at '-O3'. If,
1 instead, the final binary is generated with '-fno-lto', then
1 'myprog' is not optimized.
1
1 When producing the final binary, GCC only applies link-time
1 optimizations to those files that contain bytecode. Therefore, you
1 can mix and match object files and libraries with GIMPLE bytecodes
1 and final object code. GCC automatically selects which files to
1 optimize in LTO mode and which files to link without further
1 processing.
1
1 There are some code generation flags preserved by GCC when
1 generating bytecodes, as they need to be used during the final link
1 stage. Generally, options specified at link time override those
1 specified at compile time.
1
1 If you do not specify an optimization level option '-O' at link
1 time, then GCC uses the highest optimization level used when
1 compiling the object files.
1
1 Currently, the following options and their settings are taken from
1 the first object file that explicitly specifies them: '-fPIC',
1 '-fpic', '-fpie', '-fcommon', '-fexceptions',
1 '-fnon-call-exceptions', '-fgnu-tm' and all the '-m' target flags.
1
1 Certain ABI-changing flags are required to match in all compilation
1 units, and trying to override this at link time with a conflicting
1 value is ignored. This includes options such as
1 '-freg-struct-return' and '-fpcc-struct-return'.
1
1 Other options such as '-ffp-contract', '-fno-strict-overflow',
1 '-fwrapv', '-fno-trapv' or '-fno-strict-aliasing' are passed
1 through to the link stage and merged conservatively for conflicting
1 translation units. Specifically, '-fno-strict-overflow', '-fwrapv'
1 and '-fno-trapv' take precedence; and for example
1 '-ffp-contract=off' takes precedence over '-ffp-contract=fast'.
1 You can override them at link time.
1
1 If LTO encounters objects with C linkage declared with incompatible
1 types in separate translation units to be linked together
1 (undefined behavior according to ISO C99 6.2.7), a non-fatal
1 diagnostic may be issued. The behavior is still undefined at run
1 time. Similar diagnostics may be raised for other languages.
1
1 Another feature of LTO is that it is possible to apply
1 interprocedural optimizations on files written in different
1 languages:
1
1 gcc -c -flto foo.c
1 g++ -c -flto bar.cc
1 gfortran -c -flto baz.f90
1 g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran
1
1 Notice that the final link is done with 'g++' to get the C++
1 runtime libraries and '-lgfortran' is added to get the Fortran
1 runtime libraries. In general, when mixing languages in LTO mode,
1 you should use the same link command options as when mixing
1 languages in a regular (non-LTO) compilation.
1
1 If object files containing GIMPLE bytecode are stored in a library
1 archive, say 'libfoo.a', it is possible to extract and use them in
1 an LTO link if you are using a linker with plugin support. To
1 create static libraries suitable for LTO, use 'gcc-ar' and
1 'gcc-ranlib' instead of 'ar' and 'ranlib'; to show the symbols of
1 object files with GIMPLE bytecode, use 'gcc-nm'. Those commands
1 require that 'ar', 'ranlib' and 'nm' have been compiled with plugin
1 support. At link time, use the flag '-fuse-linker-plugin' to
1 ensure that the library participates in the LTO optimization
1 process:
1
1 gcc -o myprog -O2 -flto -fuse-linker-plugin a.o b.o -lfoo
1
1 With the linker plugin enabled, the linker extracts the needed
1 GIMPLE files from 'libfoo.a' and passes them on to the running GCC
1 to make them part of the aggregated GIMPLE image to be optimized.
1
1 If you are not using a linker with plugin support and/or do not
1 enable the linker plugin, then the objects inside 'libfoo.a' are
1 extracted and linked as usual, but they do not participate in the
1 LTO optimization process. In order to make a static library
1 suitable for both LTO optimization and usual linkage, compile its
1 object files with '-flto' '-ffat-lto-objects'.
1
1 Link-time optimizations do not require the presence of the whole
1 program to operate. If the program does not require any symbols to
1 be exported, it is possible to combine '-flto' and
1 '-fwhole-program' to allow the interprocedural optimizers to use
1 more aggressive assumptions which may lead to improved optimization
1 opportunities. Use of '-fwhole-program' is not needed when the linker
1 plugin is active (see '-fuse-linker-plugin').
1
1 The current implementation of LTO makes no attempt to generate
1 bytecode that is portable between different types of hosts. The
1 bytecode files are versioned and there is a strict version check,
1 so bytecode files generated in one version of GCC do not work with
1 an older or newer version of GCC.
1
1 Link-time optimization does not work well with generation of
1 debugging information on systems other than those using a
1 combination of ELF and DWARF.
1
1 If you specify the optional N, the optimization and code generation
1 done at link time is executed in parallel using N parallel jobs by
1 utilizing an installed 'make' program. The environment variable
1 'MAKE' may be used to override the program used. The default value
1 for N is 1.
1
1 You can also specify '-flto=jobserver' to use GNU make's job server
1 mode to determine the number of parallel jobs. This is useful when
1 the Makefile calling GCC is already executing in parallel. You
1 must prepend a '+' to the command recipe in the parent Makefile for
1 this to work. This option likely only works if 'MAKE' is GNU make.
1
1 '-flto-partition=ALG'
1 Specify the partitioning algorithm used by the link-time optimizer.
1 The value is '1to1' to specify a partitioning mirroring the
1 original source files, 'balanced' to specify partitioning into
1 equally sized chunks (whenever possible), 'max' to create a new
1 partition for every symbol where possible, 'one' to use exactly one
1 partition, or 'none' to disable partitioning and streaming
1 completely and execute the link-time optimization step directly
1 from the WPA phase. The default value is 'balanced'. While '1to1'
1 can be used as a workaround for various code ordering issues, the
1 'max' partitioning is intended for internal testing only.
1
1 '-flto-odr-type-merging'
1 Enable streaming of the mangled names of C++ types and their
1 unification at link time. This increases the size of LTO object
1 files, but enables diagnostics about One Definition Rule violations.
1
1 '-flto-compression-level=N'
1 This option specifies the level of compression used for
1 intermediate language written to LTO object files, and is only
1 meaningful in conjunction with LTO mode ('-flto'). Valid values
1 are 0 (no compression) to 9 (maximum compression). Values outside
1 this range are clamped to either 0 or 9. If the option is not
1 given, a default balanced compression setting is used.
1
1 '-fuse-linker-plugin'
1 Enables the use of a linker plugin during link-time optimization.
1 This option relies on plugin support in the linker, which is
1 available in gold or in GNU ld 2.21 or newer.
1
1 This option enables the extraction of object files with GIMPLE
1 bytecode out of library archives. This improves the quality of
1 optimization by exposing more code to the link-time optimizer.
1 The linker plugin also tells the compiler which symbols can be
1 accessed externally (by non-LTO objects or during dynamic linking).
1 The resulting code quality improvements on binaries (and shared
1 libraries that use hidden visibility) are similar to those of
1 '-fwhole-program'. See '-flto' for a description of the effect of
1 this flag and how to use it.
1
1 This option is enabled by default when LTO support in GCC is
1 enabled and GCC was configured for use with a linker supporting
1 plugins (GNU ld 2.21 or newer or gold).
1
1 '-ffat-lto-objects'
1 Fat LTO objects are object files that contain both the intermediate
1 language and the object code. This makes them usable for both LTO
1 linking and normal linking. This option is effective only when
1 compiling with '-flto' and is ignored at link time.
1
1 '-fno-fat-lto-objects' improves compilation time over plain LTO,
1 but requires the complete toolchain to be aware of LTO. It requires
1 a linker with linker plugin support for basic functionality.
1 Additionally, 'nm', 'ar' and 'ranlib' need to support linker
1 plugins to allow a full-featured build environment (capable of
1 building static libraries, etc.). GCC provides the 'gcc-ar',
1 'gcc-nm', and 'gcc-ranlib' wrappers to pass the right options to
1 these tools. With non-fat LTO, makefiles need to be modified to use
1 them.
1
1 Note that modern binutils provide a plugin auto-load mechanism.
1 Installing the linker plugin into '$libdir/bfd-plugins' has the
1 same effect as using the command wrappers ('gcc-ar', 'gcc-nm'
1 and 'gcc-ranlib').
1
1 The default is '-fno-fat-lto-objects' on targets with linker plugin
1 support.
1
1 '-fcompare-elim'
1 After register allocation and post-register allocation instruction
1 splitting, identify arithmetic instructions that compute processor
1 flags similar to a comparison operation based on that arithmetic.
1 If possible, eliminate the explicit comparison operation.
1
1 This pass only applies to certain targets that cannot explicitly
1 represent the comparison operation before register allocation is
1 complete.
1
1 Enabled at levels '-O', '-O2', '-O3', '-Os'.
1
1 '-fcprop-registers'
1 After register allocation and post-register allocation instruction
1 splitting, perform a copy-propagation pass to try to reduce
1 scheduling dependencies and occasionally eliminate the copy.
1
1 Enabled at levels '-O', '-O2', '-O3', '-Os'.
1
1 '-fprofile-correction'
1 Profiles collected using an instrumented binary for multi-threaded
1 programs may be inconsistent due to missed counter updates. When
1 this option is specified, GCC uses heuristics to correct or smooth
1 out such inconsistencies. By default, GCC emits an error message
1 when an inconsistent profile is detected.
1
1 '-fprofile-use'
1 '-fprofile-use=PATH'
1 Enable profile feedback-directed optimizations, and the following
1 optimizations which are generally profitable only with profile
1 feedback available: '-fbranch-probabilities', '-fvpt',
1 '-funroll-loops', '-fpeel-loops', '-ftracer', '-ftree-vectorize',
1 and '-ftree-loop-distribute-patterns'.
1
1 Before you can use this option, you must first generate profiling
1 information. ⇒Instrumentation Options, for information
1 about the '-fprofile-generate' option.
1
1 By default, GCC emits an error message if the feedback profiles do
1 not match the source code. This error can be turned into a warning
1 by using '-Wcoverage-mismatch'. Note this may result in poorly
1 optimized code.
1
1 If PATH is specified, GCC looks at the PATH to find the profile
1 feedback data files. See '-fprofile-dir'.
1
1 '-fauto-profile'
1 '-fauto-profile=PATH'
1 Enable sampling-based feedback-directed optimizations, and the
1 following optimizations which are generally profitable only with
1 profile feedback available: '-fbranch-probabilities', '-fvpt',
1 '-funroll-loops', '-fpeel-loops', '-ftracer', '-ftree-vectorize',
1 '-finline-functions', '-fipa-cp', '-fipa-cp-clone',
1 '-fpredictive-commoning', '-funswitch-loops',
1 '-fgcse-after-reload', and '-ftree-loop-distribute-patterns'.
1
1 PATH is the name of a file containing AutoFDO profile information.
1 If omitted, it defaults to 'fbdata.afdo' in the current directory.
1
1 Producing an AutoFDO profile data file requires running your
1 program with the 'perf' utility on a supported GNU/Linux target
1 system. For more information, see <https://perf.wiki.kernel.org/>.
1
1 E.g.
1 perf record -e br_inst_retired:near_taken -b -o perf.data \
1 -- your_program
1
1 Then use the 'create_gcov' tool to convert the raw profile data to
1 a format that can be used by GCC. You must also supply the
1 unstripped binary for your program to this tool. See
1 <https://github.com/google/autofdo>.
1
1 E.g.
1 create_gcov --binary=your_program.unstripped --profile=perf.data \
1 --gcov=profile.afdo
1
1 The following options control compiler behavior regarding
1 floating-point arithmetic. These options trade off between speed and
1 correctness. All must be specifically enabled.
1
1 '-ffloat-store'
1 Do not store floating-point variables in registers, and inhibit
1 other options that might change whether a floating-point value is
1 taken from a register or memory.
1
1 This option prevents undesirable excess precision on machines such
1 as the 68000 where the floating registers (of the 68881) keep more
1 precision than a 'double' is supposed to have. Similarly for the
1 x86 architecture. For most programs, the excess precision does
1 only good, but a few programs rely on the precise definition of
1 IEEE floating point. Use '-ffloat-store' for such programs, after
1 modifying them to store all pertinent intermediate computations
1 into variables.
1
1 '-fexcess-precision=STYLE'
1 This option allows further control over excess precision on
1 machines where floating-point operations occur in a format with
1 more precision or range than the IEEE standard and interchange
1 floating-point types. By default, '-fexcess-precision=fast' is in
1 effect; this means that operations may be carried out in a wider
1 precision than the types specified in the source if that would
1 result in faster code, and it is unpredictable when rounding to the
1 types specified in the source code takes place. When compiling C,
1 if '-fexcess-precision=standard' is specified then excess precision
1 follows the rules specified in ISO C99; in particular, both casts
1 and assignments cause values to be rounded to their semantic types
1 (whereas '-ffloat-store' only affects assignments). This option is
1 enabled by default for C if a strict conformance option such as
1 '-std=c99' is used. '-ffast-math' enables
1 '-fexcess-precision=fast' by default regardless of whether a strict
1 conformance option is used.
1
1 '-fexcess-precision=standard' is not implemented for languages
1 other than C. On the x86, it has no effect if '-mfpmath=sse' or
1 '-mfpmath=sse+387' is specified; in the former case, IEEE semantics
1 apply without excess precision, and in the latter, rounding is
1 unpredictable.
1
1 '-ffast-math'
1 Sets the options '-fno-math-errno', '-funsafe-math-optimizations',
1 '-ffinite-math-only', '-fno-rounding-math', '-fno-signaling-nans',
1 '-fcx-limited-range' and '-fexcess-precision=fast'.
1
1 This option causes the preprocessor macro '__FAST_MATH__' to be
1 defined.
1
1 This option is not turned on by any '-O' option besides '-Ofast'
1 since it can result in incorrect output for programs that depend on
1 an exact implementation of IEEE or ISO rules/specifications for
1 math functions. It may, however, yield faster code for programs
1 that do not require the guarantees of these specifications.
1
1 '-fno-math-errno'
1 Do not set 'errno' after calling math functions that are executed
1 with a single instruction, e.g., 'sqrt'. A program that relies on
1 IEEE exceptions for math error handling may want to use this flag
1 for speed while maintaining IEEE arithmetic compatibility.
1
1 This option is not turned on by any '-O' option since it can result
1 in incorrect output for programs that depend on an exact
1 implementation of IEEE or ISO rules/specifications for math
1 functions. It may, however, yield faster code for programs that do
1 not require the guarantees of these specifications.
1
1 The default is '-fmath-errno'.
1
1 On Darwin systems, the math library never sets 'errno'. There is
1 therefore no reason for the compiler to consider the possibility
1 that it might, and '-fno-math-errno' is the default.
1
1 '-funsafe-math-optimizations'
1
1 Allow optimizations for floating-point arithmetic that (a) assume
1 that arguments and results are valid and (b) may violate IEEE or
1 ANSI standards. When used at link time, it may include libraries
1 or startup files that change the default FPU control word or other
1 similar optimizations.
1
1 This option is not turned on by any '-O' option since it can result
1 in incorrect output for programs that depend on an exact
1 implementation of IEEE or ISO rules/specifications for math
1 functions. It may, however, yield faster code for programs that do
1 not require the guarantees of these specifications. Enables
1 '-fno-signed-zeros', '-fno-trapping-math', '-fassociative-math' and
1 '-freciprocal-math'.
1
1 The default is '-fno-unsafe-math-optimizations'.
1
1 '-fassociative-math'
1
1 Allow re-association of operands in series of floating-point
1 operations. This violates the ISO C and C++ language standards by
1 possibly changing the computation result. NOTE: re-ordering may
1 change the sign of zero as well as ignore NaNs and inhibit or create
1 underflow or overflow (and thus cannot be used on code that relies
1 on rounding behavior like '(x + 2**52) - 2**52'). It may also
1 reorder floating-point comparisons and thus may not be used when
1 ordered comparisons are required. This option requires that both
1 '-fno-signed-zeros' and '-fno-trapping-math' be in effect.
1 Moreover, it doesn't make much sense with '-frounding-math'. For
1 Fortran the option is automatically enabled when both
1 '-fno-signed-zeros' and '-fno-trapping-math' are in effect.
1
1 The default is '-fno-associative-math'.
1
1 '-freciprocal-math'
1
1 Allow the reciprocal of a value to be used instead of dividing by
1 the value if this enables optimizations. For example 'x / y' can
1 be replaced with 'x * (1/y)', which is useful if '(1/y)' is subject
1 to common subexpression elimination. Note that this loses
1 precision and increases the number of flops operating on the value.
1
1 The default is '-fno-reciprocal-math'.
1
1 '-ffinite-math-only'
1 Allow optimizations for floating-point arithmetic that assume that
1 arguments and results are not NaNs or +-Infs.
1
1 This option is not turned on by any '-O' option since it can result
1 in incorrect output for programs that depend on an exact
1 implementation of IEEE or ISO rules/specifications for math
1 functions. It may, however, yield faster code for programs that do
1 not require the guarantees of these specifications.
1
1 The default is '-fno-finite-math-only'.
1
1 '-fno-signed-zeros'
1 Allow optimizations for floating-point arithmetic that ignore the
1 signedness of zero. IEEE arithmetic specifies the behavior of
1 distinct +0.0 and -0.0 values, which then prohibits simplification
1 of expressions such as x+0.0 or 0.0*x (even with
1 '-ffinite-math-only'). This option implies that the sign of a zero
1 result isn't significant.
1
1 The default is '-fsigned-zeros'.
1
1 '-fno-trapping-math'
1 Compile code assuming that floating-point operations cannot
1 generate user-visible traps. These traps include division by zero,
1 overflow, underflow, inexact result and invalid operation. This
1 option requires that '-fno-signaling-nans' be in effect. Setting
1 this option may allow faster code if one relies on "non-stop" IEEE
1 arithmetic, for example.
1
1 This option should never be turned on by any '-O' option since it
1 can result in incorrect output for programs that depend on an exact
1 implementation of IEEE or ISO rules/specifications for math
1 functions.
1
1 The default is '-ftrapping-math'.
1
1 '-frounding-math'
1 Disable transformations and optimizations that assume default
1 floating-point rounding behavior. This is round-to-zero for all
1 floating point to integer conversions, and round-to-nearest for all
1 other arithmetic truncations. This option should be specified for
1 programs that change the FP rounding mode dynamically, or that may
1 be executed with a non-default rounding mode. This option disables
1 constant folding of floating-point expressions at compile time
1 (which may be affected by rounding mode) and arithmetic
1 transformations that are unsafe in the presence of sign-dependent
1 rounding modes.
1
1 The default is '-fno-rounding-math'.
1
1 This option is experimental and does not currently guarantee to
1 disable all GCC optimizations that are affected by rounding mode.
1 Future versions of GCC may provide finer control of this setting
1 using C99's 'FENV_ACCESS' pragma. This command-line option will be
1 used to specify the default state for 'FENV_ACCESS'.
1
1 '-fsignaling-nans'
1 Compile code assuming that IEEE signaling NaNs may generate
1 user-visible traps during floating-point operations. Setting this
1 option disables optimizations that may change the number of
1 exceptions visible with signaling NaNs. This option implies
1 '-ftrapping-math'.
1
1 This option causes the preprocessor macro '__SUPPORT_SNAN__' to be
1 defined.
1
1 The default is '-fno-signaling-nans'.
1
1 This option is experimental and does not currently guarantee to
1 disable all GCC optimizations that affect signaling NaN behavior.
1
1 '-fno-fp-int-builtin-inexact'
1 Do not allow the built-in functions 'ceil', 'floor', 'round' and
1 'trunc', and their 'float' and 'long double' variants, to generate
1 code that raises the "inexact" floating-point exception for
1 noninteger arguments. ISO C99 and C11 allow these functions to
1 raise the "inexact" exception, but ISO/IEC TS 18661-1:2014, the C
1 bindings to IEEE 754-2008, does not allow these functions to do so.
1
1 The default is '-ffp-int-builtin-inexact', allowing the exception
1 to be raised. This option does nothing unless '-ftrapping-math' is
1 in effect.
1
1 Even if '-fno-fp-int-builtin-inexact' is used, if the functions
1 generate a call to a library function then the "inexact" exception
1 may be raised if the library implementation does not follow TS
1 18661.
1
1 '-fsingle-precision-constant'
1 Treat floating-point constants as single precision instead of
1 implicitly converting them to double-precision constants.
1
1 '-fcx-limited-range'
1 When enabled, this option states that a range reduction step is not
1 needed when performing complex division. Also, there is no
1 checking whether the result of a complex multiplication or division
1 is 'NaN + I*NaN', with an attempt to rescue the situation in that
1 case. The default is '-fno-cx-limited-range', but is enabled by
1 '-ffast-math'.
1
1 This option controls the default setting of the ISO C99
1 'CX_LIMITED_RANGE' pragma. Nevertheless, the option applies to all
1 languages.
1
1 '-fcx-fortran-rules'
1 Complex multiplication and division follow Fortran rules. Range
1 reduction is done as part of complex division, but there is no
1 checking whether the result of a complex multiplication or division
1 is 'NaN + I*NaN', with an attempt to rescue the situation in that
1 case.
1
1 The default is '-fno-cx-fortran-rules'.
1
1 The following options control optimizations that may improve
1 performance, but are not enabled by any '-O' options. This section
1 includes experimental options that may produce broken code.
1
1 '-fbranch-probabilities'
1 After running a program compiled with '-fprofile-arcs' (⇒
1 Instrumentation Options), you can compile it a second time using
1 '-fbranch-probabilities', to improve optimizations based on the
1 number of times each branch was taken. When a program compiled
1 with '-fprofile-arcs' exits, it saves arc execution counts to a
1 file called 'SOURCENAME.gcda' for each source file. The
1 information in this data file is very dependent on the structure of
1 the generated code, so you must use the same source code and the
1 same optimization options for both compilations.
1
1 With '-fbranch-probabilities', GCC puts a 'REG_BR_PROB' note on
1 each 'JUMP_INSN' and 'CALL_INSN'. These can be used to improve
1 optimization. Currently, they are only used in one place: in
1 'reorg.c', instead of guessing which path a branch is most likely
1 to take, the 'REG_BR_PROB' values are used to exactly determine
1 which path is taken more often.
1
1 '-fprofile-values'
1 If combined with '-fprofile-arcs', it adds code so that some data
1 about values of expressions in the program is gathered.
1
1 With '-fbranch-probabilities', it reads back the data gathered from
1 profiling values of expressions for usage in optimizations.
1
1 Enabled with '-fprofile-generate' and '-fprofile-use'.
1
1 '-fprofile-reorder-functions'
1 Reorder functions based on profile instrumentation: the time of
1 each function's first execution is recorded, and functions are
1 placed in ascending order of that time.
1
1 Enabled with '-fprofile-use'.
1
1 '-fvpt'
1 If combined with '-fprofile-arcs', this option instructs the
1 compiler to add code to gather information about values of
1 expressions.
1
1 With '-fbranch-probabilities', it reads back the data gathered and
1 actually performs the optimizations based on them. Currently the
1 optimizations include specialization of division operations using
1 the knowledge about the value of the denominator.
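The division specialization can be pictured with a hand-written analogue in C. This sketch assumes a hypothetical profile in which the denominator is almost always 8. The name 'divide_specialized' is illustrative, and the code GCC actually emits may differ.

```c
/* Hand-written analogue of value-profile-based division specialization.
 * Assumption: profiling showed that 'd' is almost always 8. */
unsigned divide_specialized(unsigned n, unsigned d)
{
    if (d == 8)        /* hot value from the (hypothetical) profile */
        return n / 8;  /* constant divisor: can compile to a shift */
    return n / d;      /* general fallback: a real division */
}
```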
1
1 '-frename-registers'
1 Attempt to avoid false dependencies in scheduled code by making use
1 of registers left over after register allocation. This
1 optimization most benefits processors with lots of registers.
1 Depending on the debug information format adopted by the target,
1 however, it can make debugging impossible, since variables no
1 longer stay in a "home register".
1
1 Enabled by default with '-funroll-loops'.
1
1 '-fschedule-fusion'
1 Performs a target-dependent pass over the instruction stream to
1 schedule instructions of the same type together, because the target
1 machine can execute them more efficiently if they are adjacent to
1 each other in the instruction flow.
1
1 Enabled at levels '-O2', '-O3', '-Os'.
1
1 '-ftracer'
1 Perform tail duplication to enlarge superblock size. This
1 transformation simplifies the control flow of the function, allowing
1 other optimizations to do a better job.
1
1 Enabled with '-fprofile-use'.
1
1 '-funroll-loops'
1 Unroll loops whose number of iterations can be determined at
1 compile time or upon entry to the loop. '-funroll-loops' implies
1 '-frerun-cse-after-loop', '-fweb' and '-frename-registers'. It
1 also turns on complete loop peeling (i.e. complete removal of loops
1 with a small constant number of iterations). This option makes
1 code larger, and may or may not make it run faster.
1
1 Enabled with '-fprofile-use'.
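The following sketch is for illustration only. It shows the general shape of a loop unrolled by a factor of four when the trip count 'n' is known on entry to the loop. Both functions are hypothetical. GCC chooses the unroll factor itself (see '--param max-unroll-times') and may place the leftover iterations before the main loop rather than after it.

```c
#include <stddef.h>

/* Rolled form: one loop test per element. */
long sum_rolled(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 4 by hand: one loop test per four elements, followed by a
 * remainder loop for the last n % 4 elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++)
        s += a[i];
    return s;
}
```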
1
1 '-funroll-all-loops'
1 Unroll all loops, even if their number of iterations is uncertain
1 when the loop is entered. This usually makes programs run more
1 slowly. '-funroll-all-loops' implies the same options as
1 '-funroll-loops'.
1
1 '-fpeel-loops'
1 Peels loops for which there is enough information that they do not
1 roll much (from profile feedback or static analysis). It also
1 turns on complete loop peeling (i.e. complete removal of loops with
1 a small constant number of iterations).
1
1 Enabled with '-O3' and/or '-fprofile-use'.
1
1 '-fmove-loop-invariants'
1 Enables the loop invariant motion pass in the RTL loop optimizer.
1 Enabled at level '-O1'.
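The following source-level picture of the transformation assumes that nothing in the loop can modify 'x' or 'y'. Under that assumption, the product 'x * y' does not depend on the loop index and can be computed once before the loop. The pass described above operates on RTL, so this sketch, with hypothetical function names, only illustrates the idea.

```c
#include <stddef.h>

/* Before: x * y is recomputed on every iteration although it never changes. */
void scale_naive(int *out, const int *in, size_t n, int x, int y)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * (x * y);
}

/* After invariant motion: the product is computed once, outside the loop. */
void scale_hoisted(int *out, const int *in, size_t n, int x, int y)
{
    int k = x * y;  /* loop-invariant value moved before the loop */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```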
1
1 '-fsplit-loops'
1 Split a loop into two if it contains a condition that's always true
1 for one side of the iteration space and false for the other.
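The following hand-written sketch of loop splitting uses hypothetical functions. The condition 'i < k' is true for the first 'k' iterations and false afterwards, so the loop can be rewritten as two loops that contain no test. The split point is clamped so that the rewrite stays correct when 'k' exceeds 'n'.

```c
#include <stddef.h>

/* Before: the test i < k holds for a prefix of the iteration space only. */
void fill_branchy(int *a, size_t n, size_t k)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (i < k) ? 1 : 2;
}

/* After splitting: two loops, neither containing the condition. */
void fill_split(int *a, size_t n, size_t k)
{
    size_t m = k < n ? k : n;  /* split point, clamped to the trip count */
    size_t i = 0;
    for (; i < m; i++)
        a[i] = 1;
    for (; i < n; i++)
        a[i] = 2;
}
```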
1
1 '-funswitch-loops'
1 Move branches with loop invariant conditions out of the loop, with
1 duplicates of the loop on both branches (modified according to the
1 result of the condition).
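The following sketch, with hypothetical function names, performs the rewrite by hand. Because 'negate' does not change inside the loop, the test can be moved outside it, and each branch gets its own copy of the loop specialized for that outcome. The cost is code size, because the loop body is duplicated.

```c
#include <stddef.h>

/* Before: the loop-invariant flag is tested on every iteration. */
void copy_maybe_negate(int *dst, const int *src, size_t n, int negate)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = negate ? -src[i] : src[i];
}

/* After unswitching: one test, and one specialized loop per outcome. */
void copy_maybe_negate_unswitched(int *dst, const int *src, size_t n, int negate)
{
    if (negate) {
        for (size_t i = 0; i < n; i++)
            dst[i] = -src[i];
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}
```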
1
1 '-ffunction-sections'
1 '-fdata-sections'
1 Place each function or data item into its own section in the output
1 file if the target supports arbitrary sections. The name of the
1 function or the name of the data item determines the section's name
1 in the output file.
1
1 Use these options on systems where the linker can perform
1 optimizations to improve locality of reference in the instruction
1 space. Most systems using the ELF object format have linkers with
1 such optimizations. On AIX, the linker rearranges sections
1 (CSECTs) based on the call graph. The performance impact varies.
1
1 Together with linker garbage collection (the linker '--gc-sections'
1 option) these options may lead to smaller statically-linked
1 executables (after stripping).
1
1 On ELF/DWARF systems these options do not degrade the quality of
1 the debug information. There could be issues with other object
1 files/debug info formats.
1
1 Only use these options when there are significant benefits from
1 doing so. When you specify these options, the assembler and linker
1 create larger object and executable files and are also slower.
1 These options affect code generation. They prevent optimizations
1 by the compiler and assembler using relative locations inside a
1 translation unit since the locations are unknown until link time.
1 An example of such an optimization is relaxing calls to short call
1 instructions.
1
1 '-fbranch-target-load-optimize'
1 Perform branch target register load optimization before prologue /
1 epilogue threading. The use of target registers can typically be
1 exposed only during reload, thus hoisting loads out of loops and
1 doing inter-block scheduling needs a separate optimization pass.
1
1 '-fbranch-target-load-optimize2'
1 Perform branch target register load optimization after prologue /
1 epilogue threading.
1
1 '-fbtr-bb-exclusive'
1 When performing branch target register load optimization, don't
1 reuse branch target registers within any basic block.
1
1 '-fstdarg-opt'
1 Optimize the prologue of variadic argument functions with respect
1 to usage of those arguments.
1
1 '-fsection-anchors'
1 Try to reduce the number of symbolic address calculations by using
1 shared "anchor" symbols to address nearby objects. This
1 transformation can help to reduce the number of GOT entries and GOT
1 accesses on some targets.
1
1 For example, the implementation of the following function 'foo':
1
1 static int a, b, c;
1 int foo (void) { return a + b + c; }
1
1 usually calculates the addresses of all three variables, but if you
1 compile it with '-fsection-anchors', it accesses the variables from
1 a common anchor point instead. The effect is similar to the
1 following pseudocode (which isn't valid C):
1
1 int foo (void)
1 {
1 register int *xr = &x;
1 return xr[&a - &x] + xr[&b - &x] + xr[&c - &x];
1 }
1
1 Not all targets support this option.
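The pseudocode above cannot be written as valid C, because C does not allow subtracting the addresses of unrelated objects. A rough analogue that is valid C, offered only as an illustration, groups the variables into one static structure. Each access then becomes a constant offset from a single base address. The names 'anchor', 'set_abc' and 'foo_anchored' are hypothetical.

```c
/* Hypothetical regrouping: 'anchor' plays the role of the shared anchor
 * symbol, and a, b and c sit at fixed offsets from its address. */
static struct {
    int a, b, c;
} anchor;

/* Hypothetical helper so the example can be exercised. */
void set_abc(int a, int b, int c)
{
    anchor.a = a;
    anchor.b = b;
    anchor.c = c;
}

int foo_anchored(void)
{
    /* Only the address of 'anchor' has to be materialized; the three
     * loads then use constant offsets from it. */
    return anchor.a + anchor.b + anchor.c;
}
```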
1
1 '--param NAME=VALUE'
1 In some places, GCC uses various constants to control the amount of
1 optimization that is done. For example, GCC does not inline
1 functions that contain more than a certain number of instructions.
1 You can control some of these constants on the command line using
1 the '--param' option.
1
1 The names of specific parameters, and the meaning of the values,
1 are tied to the internals of the compiler, and are subject to
1 change without notice in future releases.
1
1 In each case, the VALUE is an integer. The allowable choices for
1 NAME are:
1
1 'predictable-branch-outcome'
1 When a branch is predicted to be taken with a probability lower
1 than this threshold (in percent), it is considered well
1 predictable. The default is 10.
1
1 'max-rtl-if-conversion-insns'
1 RTL if-conversion tries to remove conditional branches around
1 a block and replace them with conditionally executed
1 instructions. This parameter gives the maximum number of
1 instructions in a block which should be considered for
1 if-conversion. The default is 10, though the compiler will
1 also use other heuristics to decide whether if-conversion is
1 likely to be profitable.
1
1 'max-rtl-if-conversion-predictable-cost'
1 'max-rtl-if-conversion-unpredictable-cost'
1 RTL if-conversion will try to remove conditional branches
1 around a block and replace them with conditionally executed
1 instructions. These parameters give the maximum permissible
1 cost for the sequence that would be generated by if-conversion
1 depending on whether the branch is statically determined to be
1 predictable or not. The units for these parameters are the same
1 as those for the GCC internal seq_cost metric. The compiler
1 will try to provide a reasonable default for these parameters
1 using the BRANCH_COST target macro.
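The following hand-written illustration of if-conversion uses hypothetical function names. 'max_branchy' has a conditional branch around a one-statement block. 'max_select' computes the same result with a branch-free select. On many targets, GCC can turn the first form into a conditional move when the block fits within the limits above. Whether it does so depends on the target's costs.

```c
/* Branchy form: a conditional jump around a one-statement block. */
int max_branchy(int x, int y)
{
    int r = y;
    if (x > y)
        r = x;
    return r;
}

/* If-converted form written by hand: no branch, just a select. */
int max_select(int x, int y)
{
    int take_x = -(x > y);  /* all ones if x > y, otherwise 0 */
    return (x & take_x) | (y & ~take_x);
}
```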
1
1 'max-crossjump-edges'
1 The maximum number of incoming edges to consider for
1 cross-jumping. The algorithm used by '-fcrossjumping' is
1 O(N^2) in the number of edges incoming to each block.
1 Increasing values mean more aggressive optimization, making
1 the compilation time increase with probably small improvement
1 in executable size.
1
1 'min-crossjump-insns'
1 The minimum number of instructions that must be matched at the
1 end of two blocks before cross-jumping is performed on them.
1 This value is ignored in the case where all instructions in
1 the block being cross-jumped from are matched. The default
1 value is 5.
1
1 'max-grow-copy-bb-insns'
1 The maximum code size expansion factor when copying basic
1 blocks instead of jumping. The expansion is relative to a
1 jump instruction. The default value is 8.
1
1 'max-goto-duplication-insns'
1 The maximum number of instructions to duplicate to a block
1 that jumps to a computed goto. To avoid O(N^2) behavior in a
1 number of passes, GCC factors computed gotos early in the
1 compilation process, and unfactors them as late as possible.
1 Only computed jumps at the end of a basic block with no more
1 than max-goto-duplication-insns instructions are unfactored. The
1 default value is 8.
1
1 'max-delay-slot-insn-search'
1 The maximum number of instructions to consider when looking
1 for an instruction to fill a delay slot. If more than this
1 arbitrary number of instructions are searched, the time
1 savings from filling the delay slot are minimal, so stop
1 searching. Increasing values mean more aggressive
1 optimization, making the compilation time increase with
1 probably small improvement in execution time.
1
1 'max-delay-slot-live-search'
1 When trying to fill delay slots, the maximum number of
1 instructions to consider when searching for a block with valid
1 live register information. Increasing this arbitrarily chosen
1 value means more aggressive optimization, increasing the
1 compilation time. This parameter should be removed when the
1 delay slot code is rewritten to maintain the control-flow
1 graph.
1
1 'max-gcse-memory'
1 The approximate maximum amount of memory that can be allocated
1 in order to perform the global common subexpression
1 elimination optimization. If more memory than specified is
1 required, the optimization is not done.
1
1 'max-gcse-insertion-ratio'
1 If the ratio of expression insertions to deletions is larger
1 than this value for any expression, then RTL PRE does not insert or
1 remove the expression and thus leaves partially redundant
1 computations in the instruction stream. The default value is
1 20.
1
1 'max-pending-list-length'
1 The maximum number of pending dependencies that scheduling allows
1 before flushing the current state and starting over. Large
1 functions with few branches or calls can create excessively
1 large lists which needlessly consume memory and resources.
1
1 'max-modulo-backtrack-attempts'
1 The maximum number of backtrack attempts the scheduler should
1 make when modulo scheduling a loop. Larger values can
1 exponentially increase compilation time.
1
1 'max-inline-insns-single'
1 Several parameters control the tree inliner used in GCC. This
1 number sets the maximum number of instructions (counted in
1 GCC's internal representation) in a single function that the
1 tree inliner considers for inlining. This only affects
1 functions declared inline and methods implemented in a class
1 declaration (C++). The default value is 400.
1
1 'max-inline-insns-auto'
1 When you use '-finline-functions' (included in '-O3'), a lot
1 of functions that would otherwise not be considered for
1 inlining by the compiler are investigated. To those
1 functions, a different (more restrictive) limit compared to
1 functions declared inline can be applied. The default value
1 is 30.
1
1 'inline-min-speedup'
1 When estimated performance improvement of caller + callee
1 runtime exceeds this threshold (in percent), the function can
1 be inlined regardless of the limit on '--param
1 max-inline-insns-single' and '--param max-inline-insns-auto'.
1 The default value is 15.
1
1 'large-function-insns'
1 The limit specifying really large functions. For functions
1 larger than this limit after inlining, inlining is constrained
1 by '--param large-function-growth'. This parameter is useful
1 primarily to avoid extreme compilation time caused by
1 non-linear algorithms used by the back end. The default value
1 is 2700.
1
1 'large-function-growth'
1 Specifies the maximal growth of large functions caused by inlining,
1 in percent. The default value is 100, which limits large
1 function growth to 2.0 times the original size.
1
1 'large-unit-insns'
1 The limit specifying a large translation unit. Growth caused by
1 inlining of units larger than this limit is limited by
1 '--param inline-unit-growth'. For small units this might be
1 too tight. For example, consider a unit consisting of
1 function A that is inline and B that just calls A three times.
1 If B is small relative to A, the growth of the unit is 300%, and
1 yet such inlining is very sane. For very large units
1 consisting of small inlineable functions, however, the overall
1 unit growth limit is needed to avoid exponential explosion of
1 code size. Thus for smaller units, the size is increased to
1 '--param large-unit-insns' before applying '--param
1 inline-unit-growth'. The default is 10000.
1
1 'inline-unit-growth'
1 Specifies maximal overall growth of the compilation unit
1 caused by inlining. The default value is 20 which limits unit
1 growth to 1.2 times the original size. Cold functions (either
1 marked cold via an attribute or by profile feedback) are not
1 accounted into the unit size.
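The interaction of 'large-unit-insns' and 'inline-unit-growth' can be expressed as simple arithmetic. The sketch below approximates the description above and is not taken from GCC's source. 'unit_growth_limit' is a hypothetical name, and sizes are in GCC's internal instruction estimates. With the defaults (10000 and 20), a 500-instruction unit may grow to 12000 instructions rather than 600, while a 50000-instruction unit is limited to 60000.

```c
/* Approximate unit size limit after inlining: small units are first
 * raised to large_unit_insns, then the percentage growth is applied. */
long unit_growth_limit(long unit_size, long large_unit_insns, long growth_percent)
{
    long base = unit_size < large_unit_insns ? large_unit_insns : unit_size;
    return base + base * growth_percent / 100;
}
```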
1
1 'ipcp-unit-growth'
1 Specifies maximal overall growth of the compilation unit
1 caused by interprocedural constant propagation. The default
1 value is 10 which limits unit growth to 1.1 times the original
1 size.
1
1 'large-stack-frame'
1 The limit specifying large stack frames. During inlining, the
1 algorithm tries not to grow past this limit too much. The
1 default value is 256 bytes.
1
1 'large-stack-frame-growth'
1 Specifies maximal growth of large stack frames caused by
1 inlining, in percent. The default value is 1000, which limits
1 large stack frame growth to 11 times the original size.
1
1 'max-inline-insns-recursive'
1 'max-inline-insns-recursive-auto'
1 Specifies the maximum number of instructions an out-of-line
1 copy of a self-recursive inline function can grow into by
1 performing recursive inlining.
1
1 '--param max-inline-insns-recursive' applies to functions
1 declared inline. For functions not declared inline, recursive
1 inlining happens only when '-finline-functions' (included in
1 '-O3') is enabled; '--param max-inline-insns-recursive-auto'
1 applies instead. The default value is 450.
1
1 'max-inline-recursive-depth'
1 'max-inline-recursive-depth-auto'
1 Specifies the maximum recursion depth used for recursive
1 inlining.
1
1 '--param max-inline-recursive-depth' applies to functions
1 declared inline. For functions not declared inline, recursive
1 inlining happens only when '-finline-functions' (included in
1 '-O3') is enabled; '--param max-inline-recursive-depth-auto'
1 applies instead. The default value is 8.
1
1 'min-inline-recursive-probability'
1 Recursive inlining is profitable only for functions having deep
1 recursion on average, and it can hurt functions having little
1 recursion depth by increasing the prologue size or the complexity
1 of the function body as seen by other optimizers.
1
1 When profile feedback is available (see '-fprofile-generate')
1 the actual recursion depth can be guessed from the probability
1 that function recurses via a given call expression. This
1 parameter limits inlining only to call expressions whose
1 probability exceeds the given threshold (in percent). The
1 default value is 10.
1
1 'early-inlining-insns'
1 Specify growth that the early inliner can make. In effect it
1 increases the amount of inlining for code having a large
1 abstraction penalty. The default value is 14.
1
1 'max-early-inliner-iterations'
1 Limit of iterations of the early inliner. This basically
1 bounds the number of nested indirect calls the early inliner
1 can resolve. Deeper chains are still handled by late
1 inlining.
1
1 'comdat-sharing-probability'
1 Probability (in percent) that C++ inline functions with comdat
1 visibility are shared across multiple compilation units. The
1 default value is 20.
1
1 'profile-func-internal-id'
1 A parameter to control whether to use function internal id in
1 profile database lookup. If the value is 0, the compiler uses
1 an id that is based on function assembler name and filename,
1 which makes old profile data more tolerant to source changes
1 such as function reordering etc. The default value is 0.
1
1 'min-vect-loop-bound'
1 The minimum number of iterations under which loops are not
1 vectorized when '-ftree-vectorize' is used. The number of
1 iterations after vectorization needs to be greater than the
1 value specified by this option to allow vectorization. The
1 default value is 0.
1
1 'gcse-cost-distance-ratio'
1 Scaling factor in calculation of maximum distance an
1 expression can be moved by GCSE optimizations. This is
1 currently supported only in the code hoisting pass. The
1 bigger the ratio, the more aggressive code hoisting is with
1 simple expressions, i.e., the expressions that have cost less
1 than 'gcse-unrestricted-cost'. Specifying 0 disables hoisting
1 of simple expressions. The default value is 10.
1
1 'gcse-unrestricted-cost'
1 Cost, roughly measured as the cost of a single typical machine
1 instruction, at which GCSE optimizations do not constrain the
1 distance an expression can travel. This is currently
1 supported only in the code hoisting pass. The lower the
1 cost, the more aggressive code hoisting is. Specifying 0
1 allows all expressions to travel unrestricted distances. The
1 default value is 3.
1
1 'max-hoist-depth'
1 The depth of search in the dominator tree for expressions to
1 hoist. This is used to avoid quadratic behavior in hoisting
1 algorithm. A value of 0 does not limit the search, but
1 may slow down compilation of huge functions. The default
1 value is 30.
1
1 'max-tail-merge-comparisons'
1 The maximum number of similar basic blocks to compare a basic block with. This
1 is used to avoid quadratic behavior in tree tail merging. The
1 default value is 10.
1
1 'max-tail-merge-iterations'
1 The maximum number of iterations of the pass over the
1 function. This is used to limit compilation time in tree tail
1 merging. The default value is 2.
1
1 'store-merging-allow-unaligned'
1 Allow the store merging pass to introduce unaligned stores if
1 it is legal to do so. The default value is 1.
1
1 'max-stores-to-merge'
1 The maximum number of stores to attempt to merge into wider
1 stores in the store merging pass. The minimum value is 2 and
1 the default is 64.
1
1 'max-unrolled-insns'
1 The maximum number of instructions that a loop may have to be
1 unrolled. If a loop is unrolled, this parameter also
1 determines how many times the loop code is unrolled.
1
1 'max-average-unrolled-insns'
1 The maximum number of instructions biased by probabilities of
1 their execution that a loop may have to be unrolled. If a
1 loop is unrolled, this parameter also determines how many
1 times the loop code is unrolled.
1
1 'max-unroll-times'
1 The maximum number of unrollings of a single loop.
1
1 'max-peeled-insns'
1 The maximum number of instructions that a loop may have to be
1 peeled. If a loop is peeled, this parameter also determines
1 how many times the loop code is peeled.
1
1 'max-peel-times'
1 The maximum number of peelings of a single loop.
1
1 'max-peel-branches'
1 The maximum number of branches on the hot path through the
1 peeled sequence.
1
1 'max-completely-peeled-insns'
1 The maximum number of insns of a completely peeled loop.
1
1 'max-completely-peel-times'
1 The maximum number of iterations of a loop to be suitable for
1 complete peeling.
1
1 'max-completely-peel-loop-nest-depth'
1 The maximum depth of a loop nest suitable for complete
1 peeling.
1
1 'max-unswitch-insns'
1 The maximum number of insns of an unswitched loop.
1
1 'max-unswitch-level'
1 The maximum number of branches unswitched in a single loop.
1
1 'max-loop-headers-insns'
1 The maximum number of insns in loop header duplicated by the
1 copy loop headers pass.
1
1 'lim-expensive'
1 The minimum cost of an expensive expression in the loop
1 invariant motion.
1
1 'iv-consider-all-candidates-bound'
1 Bound on the number of candidates for induction variables, below
1 which all candidates are considered for each use in induction
1 variable optimizations. If there are more candidates than
1 this, only the most relevant ones are considered to avoid
1 quadratic time complexity.
1
1 'iv-max-considered-uses'
1 The induction variable optimizations give up on loops that
1 contain more induction variable uses than this.
1
1 'iv-always-prune-cand-set-bound'
1 If the number of candidates in the set is smaller than this
1 value, always try to remove unnecessary ivs from the set when
1 adding a new one.
1
1 'avg-loop-niter'
1 Average number of iterations of a loop.
1
1 'dse-max-object-size'
1 Maximum size (in bytes) of objects tracked bytewise by dead
1 store elimination. Larger values may result in larger
1 compilation times.
1
1 'scev-max-expr-size'
1 Bound on the size of expressions used in the scalar evolutions
1 analyzer. Large expressions slow the analyzer.
1
1 'scev-max-expr-complexity'
1 Bound on the complexity of the expressions in the scalar
1 evolutions analyzer. Complex expressions slow the analyzer.
1
1 'max-tree-if-conversion-phi-args'
1 Maximum number of arguments in a PHI supported by tree
1 if-conversion, unless the loop is marked with a simd pragma.
1
1 'vect-max-version-for-alignment-checks'
1 The maximum number of run-time checks that can be performed
1 when doing loop versioning for alignment in the vectorizer.
1
1 'vect-max-version-for-alias-checks'
1 The maximum number of run-time checks that can be performed
1 when doing loop versioning for alias in the vectorizer.
1
1 'vect-max-peeling-for-alignment'
1 The maximum number of loop peels to enhance access alignment
1 for the vectorizer. A value of -1 means no limit.
1
1 'max-iterations-to-track'
1 The maximum number of iterations of a loop the brute-force
1 algorithm for analysis of the number of iterations of the loop
1 tries to evaluate.
1
1 'hot-bb-count-ws-permille'
1 A basic block profile count is considered hot if it
1 contributes to the given permillage (i.e. 0...1000) of the
1 entire profiled execution.
1
1 'hot-bb-frequency-fraction'
1 Select the fraction of the entry block's execution frequency that a
1 basic block in a function needs to reach in order to be
1 considered hot.
1
1 'max-predicted-iterations'
1 The maximum number of loop iterations we predict statically.
1 This is useful in cases where a function contains a single
1 loop with known bound and another loop with unknown bound.
1 The known number of iterations is predicted correctly, while
1 the unknown number of iterations average to roughly 10. This
1 means that the loop without bounds appears artificially cold
1 relative to the other one.
1
1 'builtin-expect-probability'
1 Control the probability of the expression having the specified
1 value. This parameter takes a percentage (i.e. 0 ... 100)
1 as input. The default probability of 90 is obtained
1 empirically.
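In source code, the expectation comes from '__builtin_expect', and this parameter sets how far GCC trusts it. The minimal sketch below assumes that negative inputs are rare. The 'unlikely' macro and 'count_nonnegative' are hypothetical names, and the effect on block layout is left to the compiler.

```c
#include <stddef.h>

/* __builtin_expect (EXP, C) tells GCC that EXP is expected to equal C;
 * 'builtin-expect-probability' is how likely GCC assumes that is. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: negative values are treated as the rare path. */
size_t count_nonnegative(const int *v, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(v[i] < 0))
            continue;  /* predicted not taken */
        count++;
    }
    return count;
}
```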
1
1 'align-threshold'
1
1 Select the fraction of the maximal execution frequency of a basic
1 block in a function that a basic block must reach to be aligned.
1
1 'align-loop-iterations'
1
1 A loop expected to iterate at least the selected number of
1 iterations is aligned.
1
1 'tracer-dynamic-coverage'
1 'tracer-dynamic-coverage-feedback'
1
1 This value is used to limit superblock formation once the
1 given percentage of executed instructions is covered. This
1 limits unnecessary code size expansion.
1
1 The 'tracer-dynamic-coverage-feedback' parameter is used only
1 when profile feedback is available. The real profiles (as
1 opposed to statically estimated ones) are much less balanced,
1 allowing the threshold to be a larger value.
1
1 'tracer-max-code-growth'
1 Stop tail duplication once code growth has reached the given
1 percentage. This is a rather artificial limit, as most of the
1 duplicates are eliminated later in cross jumping, so it may be
1 set to much higher values than is the desired code growth.
1
1 'tracer-min-branch-ratio'
1
1 Stop reverse growth when the reverse probability of the best edge
1 is less than this threshold (in percent).
1
1 'tracer-min-branch-probability'
1 'tracer-min-branch-probability-feedback'
1
1 Stop forward growth if the best edge has probability lower
1 than this threshold.
1
1 Similarly to 'tracer-dynamic-coverage', two parameters are
1 provided. 'tracer-min-branch-probability-feedback' is used
1 for compilation with profile feedback and
1 'tracer-min-branch-probability' for compilation without. The
1 value for compilation with profile feedback needs to be more
1 conservative (higher) in order to make tracer effective.
1
1 'stack-clash-protection-guard-size'
1 Specify the size of the operating system provided stack guard
1 as 2 raised to NUM bytes. The default value is 12 (4096
1 bytes). Acceptable values are between 12 and 30. Higher
1 values may reduce the number of explicit probes, but a value
1 larger than the operating system provided guard will leave
1 code vulnerable to stack clash style attacks.
1
1 'stack-clash-protection-probe-interval'
1 Stack clash protection involves probing stack space as it is
1 allocated. This param controls the maximum distance between
1 probes into the stack as 2 raised to NUM bytes. Acceptable
1 values are between 10 and 16, and the default is 12. Higher
1 values may reduce the number of explicit probes, but a value
1 larger than the operating system provided guard will leave
1 code vulnerable to stack clash style attacks.
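Both stack-clash parameters are given as exponents. The hypothetical helpers below convert a value to bytes and check it against the ranges stated above. For example, 12 corresponds to 4096 bytes and 16 to 65536 bytes.

```c
#include <stdbool.h>

/* Size in bytes encoded by a stack-clash param value NUM (2 raised to NUM).
 * Callers should pass a value that passed one of the range checks below. */
unsigned long log2_param_bytes(unsigned num)
{
    return 1UL << num;
}

/* Acceptable ranges as stated in the parameter descriptions. */
bool valid_guard_size(unsigned num)     { return num >= 12 && num <= 30; }
bool valid_probe_interval(unsigned num) { return num >= 10 && num <= 16; }
```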
1
1 'max-cse-path-length'
1
1 The maximum number of basic blocks on a path that CSE considers.
1 The default is 10.
1
1 'max-cse-insns'
1 The maximum number of instructions CSE processes before
1 flushing. The default is 1000.
1
1 'ggc-min-expand'
1
1 GCC uses a garbage collector to manage its own memory
1 allocation. This parameter specifies the minimum percentage
1 by which the garbage collector's heap should be allowed to
1 expand between collections. Tuning this may improve
1 compilation speed; it has no effect on code generation.
1
1 The default is 30% + 70% * (RAM/1GB) with an upper bound of
1 100% when RAM >= 1GB. If 'getrlimit' is available, the notion
1 of "RAM" is the smallest of actual RAM and 'RLIMIT_DATA' or
1 'RLIMIT_AS'. If GCC is not able to calculate RAM on a
1 particular platform, the lower bound of 30% is used. Setting
1 this parameter and 'ggc-min-heapsize' to zero causes a full
1 collection to occur at every opportunity. This is extremely
1 slow, but can be useful for debugging.
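The default formula can be written out as a small function. This sketch assumes that 1GB means 1024 megabytes and that 'ram_mb' has already been reduced by the rlimits as described. 'default_ggc_min_expand' is a hypothetical name. For example, 512MB of RAM gives 30 + 70 * 0.5 = 65%, and any amount from 1GB upward gives 100%.

```c
/* Default 'ggc-min-expand' in percent: 30% + 70% * (RAM/1GB), capped at
 * 100%.  A RAM value of 0 (unknown) yields the lower bound of 30%. */
double default_ggc_min_expand(double ram_mb)
{
    double pct = 30.0 + 70.0 * (ram_mb / 1024.0);
    return pct > 100.0 ? 100.0 : pct;
}
```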
1
1 'ggc-min-heapsize'
1
1 Minimum size of the garbage collector's heap before it begins
1 bothering to collect garbage. The first collection occurs
1 after the heap expands by 'ggc-min-expand'% beyond
1 'ggc-min-heapsize'. Again, tuning this may improve
1 compilation speed, and has no effect on code generation.
1
1 The default is the smaller of RAM/8, RLIMIT_RSS, or a limit
1 that tries to ensure that RLIMIT_DATA or RLIMIT_AS are not
1 exceeded, but with a lower bound of 4096 (four megabytes) and
1 an upper bound of 131072 (128 megabytes). If GCC is not able
1 to calculate RAM on a particular platform, the lower bound is
1 used. Setting this parameter very large effectively disables
1 garbage collection. Setting this parameter and
1 'ggc-min-expand' to zero causes a full collection to occur at
1 every opportunity.
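The following simplified sketch computes the default in kilobytes, the unit in which 4096 means four megabytes. It takes the smaller of RAM/8 and RLIMIT_RSS and applies the stated bounds. It omits the extra limit that guards RLIMIT_DATA and RLIMIT_AS. The function and macro names are hypothetical.

```c
#include <limits.h>

/* Hypothetical marker for "no RLIMIT_RSS limit". */
#define NO_RSS_LIMIT ULONG_MAX

/* Simplified default 'ggc-min-heapsize' in kilobytes: min(RAM/8, RSS limit),
 * clamped to [4096, 131072]. */
unsigned long default_ggc_min_heapsize_kb(unsigned long ram_kb,
                                          unsigned long rss_limit_kb)
{
    unsigned long kb = ram_kb / 8;
    if (rss_limit_kb < kb)
        kb = rss_limit_kb;
    if (kb < 4096UL)
        kb = 4096UL;    /* lower bound: four megabytes */
    if (kb > 131072UL)
        kb = 131072UL;  /* upper bound: 128 megabytes */
    return kb;
}
```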
1
1 'max-reload-search-insns'
1 The maximum number of instructions that reload should search backward
1 when looking for an equivalent register. Increasing values mean more
1 aggressive optimization, making the compilation time increase
1 with probably slightly better performance. The default value
1 is 100.
1
1 'max-cselib-memory-locations'
1 The maximum number of memory locations cselib should take into
1 account. Increasing values mean more aggressive optimization,
1 making the compilation time increase with probably slightly
1 better performance. The default value is 500.
1
1 'max-sched-ready-insns'
1 The maximum number of instructions ready to be issued that the
1 scheduler should consider at any given time during the first
1 scheduling pass. Increasing values mean more thorough
1 searches, making the compilation time increase with probably
1 little benefit. The default value is 100.
1
1 'max-sched-region-blocks'
1 The maximum number of blocks in a region to be considered for
1 interblock scheduling. The default value is 10.
1
1 'max-pipeline-region-blocks'
1 The maximum number of blocks in a region to be considered for
1 pipelining in the selective scheduler. The default value is
1 15.
1
1 'max-sched-region-insns'
1 The maximum number of insns in a region to be considered for
1 interblock scheduling. The default value is 100.
1
1 'max-pipeline-region-insns'
1 The maximum number of insns in a region to be considered for
1 pipelining in the selective scheduler. The default value is
1 200.
1
1 'min-spec-prob'
1 The minimum probability (in percent) of reaching a source
1 block for interblock speculative scheduling. The default
1 value is 40.
1
1 'max-sched-extend-regions-iters'
1 The maximum number of iterations through the CFG to extend
1 regions. A value of 0 (the default) disables region
1 extensions.
1
1 'max-sched-insn-conflict-delay'
1 The maximum conflict delay for an insn to be considered for
1 speculative motion. The default value is 3.
1
1 'sched-spec-prob-cutoff'
1 The minimal probability of speculation success (in percent)
1 required for speculative insns to be scheduled. The default value is
1 40.
1
1 'sched-state-edge-prob-cutoff'
1 The minimum probability an edge must have for the scheduler to
1 save its state across it. The default value is 10.
1
1 'sched-mem-true-dep-cost'
1 Minimal distance (in CPU cycles) between a store and a load
1 targeting the same memory locations. The default value is 1.
1
1 'selsched-max-lookahead'
1 The maximum size of the lookahead window of selective
1 scheduling. It is a depth of search for available
1 instructions. The default value is 50.
1
1 'selsched-max-sched-times'
1 The maximum number of times that an instruction is scheduled
1 during selective scheduling. This is the limit on the number
1 of iterations through which the instruction may be pipelined.
1 The default value is 2.
1
1 'selsched-insns-to-rename'
1 The maximum number of best instructions in the ready list that
1 are considered for renaming in the selective scheduler. The
1 default value is 2.
1
1 'sms-min-sc'
1 The minimum value of stage count that the swing modulo scheduler
1 generates. The default value is 2.
1
1 'max-last-value-rtl'
1 The maximum size measured as number of RTLs that can be
1 recorded in an expression in combiner for a pseudo register as
1 last known value of that register. The default is 10000.
1
1 'max-combine-insns'
1 The maximum number of instructions the RTL combiner tries to
1 combine. The default value is 2 at '-Og' and 4 otherwise.
1
1 'integer-share-limit'
1 Small integer constants can use a shared data structure,
1 reducing the compiler's memory usage and increasing its speed.
1 This sets the maximum value of a shared integer constant. The
1 default value is 256.
1
1 'ssp-buffer-size'
1 The minimum size of buffers (i.e. arrays) that receive stack
1 smashing protection when '-fstack-protector' is used.
1
1 'min-size-for-stack-sharing'
1 The minimum size of variables taking part in stack slot
1 sharing when not optimizing. The default value is 32.
1
1 'max-jump-thread-duplication-stmts'
1 Maximum number of statements allowed in a block that needs to
1 be duplicated when threading jumps.
1
1 'max-fields-for-field-sensitive'
1 Maximum number of fields in a structure treated in a field
1 sensitive manner during pointer analysis. The default is zero
1 for '-O0' and '-O1', and 100 for '-Os', '-O2', and '-O3'.
1
1 'prefetch-latency'
1 An estimate of the average number of instructions that are
1 executed before a prefetch finishes. The distance prefetched
1 ahead is proportional to this constant. Increasing this number
1 may also lead to fewer streams being prefetched (see
1 'simultaneous-prefetches').
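
The sketch below shows the idea this parameter tunes: a prefetch is issued for data a fixed distance ahead of the element currently being processed, so that the data can arrive before it is needed. It is hand-written C, not output of GCC's loop prefetching pass ('-fprefetch-loop-arrays'); the distance 'PREFETCH_AHEAD' is a hypothetical constant, whereas the pass derives its own distance from 'prefetch-latency'. '__builtin_prefetch' is GCC's prefetch builtin.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in elements, chosen only for
   illustration; GCC computes its own distance from prefetch-latency. */
#define PREFETCH_AHEAD 16

long sum_with_prefetch(const long *v, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Request the element PREFETCH_AHEAD iterations ahead:
         second argument 0 = read, third argument 3 = high locality. */
      if (i + PREFETCH_AHEAD < n)
        __builtin_prefetch(&v[i + PREFETCH_AHEAD], 0, 3);
      s += v[i];
    }
  return s;
}
```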
1
1 'simultaneous-prefetches'
1 Maximum number of prefetches that can run at the same time.
1
1 'l1-cache-line-size'
1 The size of a cache line in the L1 cache, in bytes.
1
1 'l1-cache-size'
1 The size of the L1 cache, in kilobytes.
1
1 'l2-cache-size'
1 The size of the L2 cache, in kilobytes.
1
1 'loop-interchange-max-num-stmts'
1 The maximum number of statements in a loop to be interchanged.
1
1 'loop-interchange-stride-ratio'
1 The minimum ratio between the strides of two loops for
1 interchange to be considered profitable.
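
A minimal sketch of what loop interchange does, assuming a small row-major matrix with hypothetical dimensions 'ROWS' and 'COLS': both functions compute column sums, but the second swaps the loop nest so that the inner loop accesses memory with unit stride instead of a stride of 'COLS' elements. It illustrates the transformation that '-floop-interchange' performs, not the profitability test these parameters control.

```c
#include <stddef.h>

#define ROWS 4 /* hypothetical matrix dimensions for the sketch */
#define COLS 3

/* Before interchange: the inner loop walks down a column, so
   consecutive accesses are COLS elements apart.  */
void col_sums_strided(int a[ROWS][COLS], int out[COLS])
{
  for (size_t j = 0; j < COLS; j++)
    {
      out[j] = 0;
      for (size_t i = 0; i < ROWS; i++)
        out[j] += a[i][j];
    }
}

/* After interchange: the inner loop walks along a row, so
   consecutive accesses are adjacent in memory.  */
void col_sums_interchanged(int a[ROWS][COLS], int out[COLS])
{
  for (size_t j = 0; j < COLS; j++)
    out[j] = 0;
  for (size_t i = 0; i < ROWS; i++)
    for (size_t j = 0; j < COLS; j++)
      out[j] += a[i][j];
}
```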
1
1 'min-insn-to-prefetch-ratio'
1 The minimum ratio between the number of instructions and the
1 number of prefetches to enable prefetching in a loop.
1
1 'prefetch-min-insn-to-mem-ratio'
1 The minimum ratio between the number of instructions and the
1 number of memory references to enable prefetching in a loop.
1
1 'use-canonical-types'
1 Whether the compiler should use the "canonical" type system.
1 By default, this should always be 1, which uses a more
1 efficient internal mechanism for comparing types in C++ and
1 Objective-C++. However, if bugs in the canonical type system
1 are causing compilation failures, set this value to 0 to
1 disable canonical types.
1
1 'switch-conversion-max-branch-ratio'
1 Switch initialization conversion refuses to create arrays that
1 are bigger than 'switch-conversion-max-branch-ratio' times the
1 number of branches in the switch.
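
As a sketch of switch initialization conversion, the two hypothetical functions below are equivalent: the first uses a 'switch' whose cases only select a value, the second uses a constant array plus a range check. Here the array has one element per case label; a sparse switch with, for example, cases 0 and 1000 would need a 1001-element array for two branches, which is the kind of conversion this parameter can rule out.

```c
/* Original form: each case only selects a value. */
int color_code_switch(int c)
{
  switch (c)
    {
    case 0: return 10;
    case 1: return 42;
    case 2: return 7;
    case 3: return 99;
    default: return -1;
    }
}

/* Converted form: a lookup table; the range check plays the role
   of the switch's default label. */
int color_code_table(int c)
{
  static const int table[4] = { 10, 42, 7, 99 };
  if ((unsigned) c < 4u)
    return table[c];
  return -1;
}
```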
1
1 'max-partial-antic-length'
1 Maximum length of the partial antic set computed during the
1 tree partial redundancy elimination optimization
1 ('-ftree-pre') when optimizing at '-O3' and above. For some
1 sorts of source code the enhanced partial redundancy
1 elimination optimization can run away, consuming all of the
1 memory available on the host machine. This parameter sets a
1 limit on the length of the sets that are computed, which
1 prevents the runaway behavior. Setting a value of 0 for this
1 parameter allows an unlimited set length.
1
1 'sccvn-max-scc-size'
1 Maximum size of a strongly connected component (SCC) during
1 SCCVN processing. If this limit is hit, SCCVN processing for
1 the whole function is not done and optimizations depending on
1 it are disabled. The default maximum SCC size is 10000.
1
1 'sccvn-max-alias-queries-per-access'
1 Maximum number of alias-oracle queries we perform when looking
1 for redundancies for loads and stores. If this limit is hit
1 the search is aborted and the load or store is not considered
1 redundant. The number of queries is algorithmically limited
1 to the number of stores on all paths from the load to the
1 function entry. The default maximum number of queries is
1 1000.
1
1 'ira-max-loops-num'
1 IRA uses regional register allocation by default. If a
1 function contains more loops than the number given by this
1 parameter, only that many of the most frequently-executed
1 loops form regions for regional register
1 allocation. The default value of the parameter is 100.
1
1 'ira-max-conflict-table-size'
1 Although IRA uses a sophisticated algorithm to compress the
1 conflict table, the table can still require excessive amounts
1 of memory for huge functions. If the conflict table for a
1 function could be more than the size in MB given by this
1 parameter, the register allocator instead uses a faster,
1 simpler, and lower-quality algorithm that does not require
1 building a pseudo-register conflict table. The default value
1 of the parameter is 2000.
1
1 'ira-loop-reserved-regs'
1 IRA can be used to evaluate more accurate register pressure in
1 loops for decisions to move loop invariants (see '-O3'). The
1 number of available registers reserved for some other purposes
1 is given by this parameter. The default value of the
1 parameter is 2, which is the minimal number of registers
1 needed by typical instructions. This value is the best found
1 from numerous experiments.
1
1 'lra-inheritance-ebb-probability-cutoff'
1 LRA tries to reuse values reloaded in registers in subsequent
1 insns. This optimization is called inheritance. An extended
1 basic block (EBB) is used as the region for this optimization.
1 The parameter defines the minimal fall-through edge
1 probability, in percent, used to add a basic block (BB) to an
1 inheritance EBB in LRA. The default value of the parameter is
1 40. The value was chosen from numerous runs of SPEC2000 on
1 x86-64.
1
1 'loop-invariant-max-bbs-in-loop'
1 Loop invariant motion can be very expensive, both in
1 compilation time and in amount of needed compile-time memory,
1 with very large loops. Loops with more basic blocks than this
1 parameter won't have loop invariant motion optimization
1 performed on them. The default value of the parameter is 1000
1 for '-O1' and 10000 for '-O2' and above.
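
For reference, loop invariant motion hoists computations whose operands do not change inside the loop. The hypothetical functions below show the effect by hand; the compiler performs the equivalent rewrite on its intermediate representation, and only for loops within the basic-block limit set by this parameter.

```c
/* Before: x * y is recomputed on every iteration. */
void add_product(int *v, int n, int x, int y)
{
  for (int i = 0; i < n; i++)
    v[i] += x * y;
}

/* After loop invariant motion: the product is computed once,
   before the loop. */
void add_product_hoisted(int *v, int n, int x, int y)
{
  int xy = x * y;
  for (int i = 0; i < n; i++)
    v[i] += xy;
}
```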
1
1 'loop-max-datarefs-for-datadeps'
1 Building data dependencies is expensive for very large loops.
1 This parameter limits the number of data references in loops
1 that are considered for data dependence analysis. These large
1 loops are not handled by the optimizations using loop data
1 dependencies. The default value is 1000.
1
1 'max-vartrack-size'
1 Sets a maximum number of hash table slots to use during
1 variable tracking dataflow analysis of any function. If this
1 limit is exceeded with variable tracking at assignments
1 enabled, analysis for that function is retried without it,
1 after removing all debug insns from the function. If the
1 limit is exceeded even without debug insns, var tracking
1 analysis is completely disabled for the function. Setting the
1 parameter to zero makes it unlimited.
1
1 'max-vartrack-expr-depth'
1 Sets a maximum number of recursion levels when attempting to
1 map variable names or debug temporaries to value expressions.
1 This trades compilation time for more complete debug
1 information. If this is set too low, value expressions that
1 are available and could be represented in debug information
1 may end up not being used; setting this higher may enable the
1 compiler to find more complex debug expressions, but compile
1 time and memory use may grow. The default is 12.
1
1 'max-debug-marker-count'
1 Sets a threshold on the number of debug markers (e.g. begin
1 stmt markers) to avoid complexity explosion at inlining or
1 expanding to RTL. If a function has more such GIMPLE
1 statements than the set limit, those statements are dropped
1 from the inlined copy of the function and from its RTL
1 expansion. The
1 default is 100000.
1
1 'min-nondebug-insn-uid'
1 Use uids starting at this parameter for nondebug insns. The
1 range below the parameter is reserved exclusively for debug
1 insns created by '-fvar-tracking-assignments', but debug insns
1 may get (non-overlapping) uids above it if the reserved range
1 is exhausted.
1
1 'ipa-sra-ptr-growth-factor'
1 IPA-SRA replaces a pointer to an aggregate with one or more
1 new parameters only when their cumulative size is less than or
1 equal to 'ipa-sra-ptr-growth-factor' times the size of the
1 original pointer parameter.
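
A simplified model of the rule above, with hypothetical names: 'sum_xy_ptr' reads two 'int' fields through a pointer, and 'sum_xy_split' is the kind of signature IPA-SRA may produce instead. 'split_allowed' restates the documented size test in C, measuring sizes in bytes; it illustrates the comparison and is not GCC's internal code.

```c
#include <stdbool.h>
#include <stddef.h>

struct point3 { int x, y, z; };  /* hypothetical aggregate */

/* Original form: only x and y are read through the pointer. */
int sum_xy_ptr(const struct point3 *p) { return p->x + p->y; }

/* Form IPA-SRA may produce: the pointer is replaced by the two
   scalars that are actually used. */
int sum_xy_split(int x, int y) { return x + y; }

/* Hypothetical restatement of the documented test: the new
   parameters' cumulative size must not exceed growth_factor times
   the size of the original pointer parameter. */
bool split_allowed(size_t new_params_size, unsigned growth_factor)
{
  return new_params_size <= growth_factor * sizeof(void *);
}
```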
1
1 'sra-max-scalarization-size-Ospeed'
1 'sra-max-scalarization-size-Osize'
1 The two Scalar Replacement of Aggregates passes (SRA and
1 IPA-SRA) aim to replace scalar parts of aggregates with uses
1 of independent scalar variables. These parameters control the
1 maximum size, in storage units, of aggregate which is
1 considered for replacement when compiling for speed
1 ('sra-max-scalarization-size-Ospeed') or size
1 ('sra-max-scalarization-size-Osize') respectively.
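
An illustration of scalarization, using a hypothetical two-field structure: after SRA, the fields of the local aggregate behave like independent scalar variables that can live in registers. The second function is written by hand to show the result; the pass itself works on the compiler's intermediate representation.

```c
struct pair { int lo, hi; };  /* hypothetical aggregate */

/* Before SRA: the local aggregate is treated as a unit in memory. */
int span_aggregate(int a, int b)
{
  struct pair p;
  p.lo = a < b ? a : b;
  p.hi = a < b ? b : a;
  return p.hi - p.lo;
}

/* After SRA: each used field is an independent scalar. */
int span_scalarized(int a, int b)
{
  int p_lo = a < b ? a : b;
  int p_hi = a < b ? b : a;
  return p_hi - p_lo;
}
```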
1
1 'sra-max-propagations'
1 The maximum number of artificial accesses that Scalar
1 Replacement of Aggregates (SRA) will track, per one local
1 variable, in order to facilitate copy propagation.
1
1 'tm-max-aggregate-size'
1 When making copies of thread-local variables in a transaction,
1 this parameter specifies the size in bytes after which
1 variables are saved with the logging functions as opposed to
1 save/restore code sequence pairs. This option only applies
1 when using '-fgnu-tm'.
1
1 'graphite-max-nb-scop-params'
1 To avoid exponential effects in the Graphite loop transforms,
1 the number of parameters in a Static Control Part (SCoP) is
1 bounded. The default value is 10 parameters; a value of zero
1 can be used to lift the bound. A variable whose value is
1 unknown at compilation time and defined outside a SCoP is a
1 parameter of the SCoP.
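
To make the term concrete: in the hypothetical loop nest below, 'n' and 'm' are unknown at compilation time and defined outside the nest, so they would be the two parameters of a SCoP formed by these loops, well under the documented default bound of 10.

```c
/* n and m are SCoP parameters: defined outside the loop nest and
   not known at compilation time. */
long count_pairs(int n, int m)
{
  long c = 0;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      c++;
  return c;
}
```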
1
1 'loop-block-tile-size'
1 Loop blocking or strip mining transforms, enabled with
1 '-floop-block' or '-floop-strip-mine', strip mine each loop in
1 the loop nest by a given number of iterations. The strip
1 length can be changed using the 'loop-block-tile-size'
1 parameter. The default value is 51 iterations.
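
A hand-written sketch of strip mining, using the documented default strip length of 51 as the hypothetical constant 'TILE': the outer loop steps over strips and the inner loop covers one strip, clamped at the end of the array. Both functions compute the same sum.

```c
#include <stddef.h>

#define TILE 51  /* mirrors the documented default of loop-block-tile-size */

/* Plain loop. */
long sum_plain(const int *v, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Strip-mined: the outer loop steps by TILE; the inner loop covers
   one strip, and the last strip is shortened to stay in bounds. */
long sum_strip_mined(const int *v, size_t n)
{
  long s = 0;
  for (size_t ii = 0; ii < n; ii += TILE)
    {
      size_t end = ii + TILE < n ? ii + TILE : n;
      for (size_t i = ii; i < end; i++)
        s += v[i];
    }
  return s;
}
```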
1
1 'loop-unroll-jam-size'
1 Specify the unroll factor for the '-floop-unroll-and-jam'
1 option. The default value is 4.
1
1 'loop-unroll-jam-depth'
1 Specify the dimension to be unrolled (counting from the
1 innermost loop) for '-floop-unroll-and-jam'. The default
1 value is 2.
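
A hand-written sketch of unroll-and-jam on a hypothetical matrix-vector product: the outer loop is unrolled by 2 and the two copies of the inner loop are fused ("jammed"), so each 'b[j]' is loaded once per pair of rows. The factor 2 is chosen for brevity (the documented default of 'loop-unroll-jam-size' is 4), and the sketch assumes the outer trip count 'N' is even.

```c
#define N 4  /* hypothetical outer trip count; must be even here */
#define M 3

/* Original nest. */
void mat_vec_plain(int a[N][M], const int b[M], int out[N])
{
  for (int i = 0; i < N; i++)
    {
      out[i] = 0;
      for (int j = 0; j < M; j++)
        out[i] += a[i][j] * b[j];
    }
}

/* Outer loop unrolled by 2, inner loops jammed into one. */
void mat_vec_unroll_jam(int a[N][M], const int b[M], int out[N])
{
  for (int i = 0; i < N; i += 2)
    {
      int s0 = 0, s1 = 0;
      for (int j = 0; j < M; j++)
        {
          int bj = b[j];           /* loaded once for both rows */
          s0 += a[i][j] * bj;
          s1 += a[i + 1][j] * bj;
        }
      out[i] = s0;
      out[i + 1] = s1;
    }
}
```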
1
1 'ipa-cp-value-list-size'
1 IPA-CP attempts to track all possible values and types passed
1 to a function's parameter in order to propagate them and
1 perform devirtualization. 'ipa-cp-value-list-size' is the
1 maximum number of values and types it stores per one formal
1 parameter of a function.
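
A sketch of the situation IPA-CP looks for, with hypothetical function names: every call to 'scale' passes the constant 8, so a clone specialized for that value can replace the generic function at those call sites. The specialized clone is written out by hand here; GCC creates such clones internally.

```c
/* Generic function: factor is 8 at every call site below. */
static int scale(int x, int factor) { return x * factor; }

int use_scale(int a, int b) { return scale(a, 8) + scale(b, 8); }

/* Hypothetical hand-written equivalent of a clone specialized
   for factor == 8. */
static int scale_const8(int x) { return x * 8; }

int use_scale_cloned(int a, int b) { return scale_const8(a) + scale_const8(b); }
```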
1
1 'ipa-cp-eval-threshold'
1 IPA-CP calculates its own heuristic profitability score for
1 each cloning opportunity and performs those cloning
1 opportunities whose scores exceed 'ipa-cp-eval-threshold'.
1
1 'ipa-cp-recursion-penalty'
1 Percentage penalty that recursive functions receive when
1 they are evaluated for cloning.
1
1 'ipa-cp-single-call-penalty'
1 Percentage penalty that functions containing a single call to
1 another function receive when they are evaluated for
1 cloning.
1
1 'ipa-max-agg-items'
1 IPA-CP is also capable of propagating a number of scalar values
1 passed in an aggregate. 'ipa-max-agg-items' controls the
1 maximum number of such values per one parameter.
1
1 'ipa-cp-loop-hint-bonus'
1 When IPA-CP determines that a cloning candidate would make the
1 number of iterations of a loop known, it adds a bonus of
1 'ipa-cp-loop-hint-bonus' to the profitability score of the
1 candidate.
1
1 'ipa-cp-array-index-hint-bonus'
1 When IPA-CP determines that a cloning candidate would make the
1 index of an array access known, it adds a bonus of
1 'ipa-cp-array-index-hint-bonus' to the profitability score of
1 the candidate.
1
1 'ipa-max-aa-steps'
1 During its analysis of function bodies, IPA-CP employs alias
1 analysis in order to track values pointed to by function
1 parameters. In order not to spend too much time analyzing huge
1 functions, it gives up and considers all memory clobbered after
1 examining 'ipa-max-aa-steps' statements modifying memory.
1
1 'lto-partitions'
1 Specify the desired number of partitions produced during WHOPR
1 compilation. The number of partitions should exceed the
1 number of CPUs used for compilation. The default value is 32.
1
1 'lto-min-partition'
1 The size of the minimal partition for WHOPR (in estimated
1 instructions). This prevents the expense of splitting very
1 small programs into too many partitions.
1
1 'lto-max-partition'
1 The size of the maximal partition for WHOPR (in estimated
1 instructions), providing an upper bound on the size of an
1 individual partition. Meant to be used only with balanced
1 partitioning.
1
1 'cxx-max-namespaces-for-diagnostic-help'
1 The maximum number of namespaces to consult for suggestions
1 when C++ name lookup fails for an identifier. The default is
1 1000.
1
1 'sink-frequency-threshold'
1 The maximum relative execution frequency (in percent) of the
1 target block relative to a statement's original block to allow
1 sinking of that statement. Larger numbers result in more
1 aggressive statement sinking. The default value is 75. A
1 small positive adjustment is applied for statements with
1 memory operands, as those are even more profitable to sink.
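
An illustration of statement sinking, with hypothetical functions: in the first, 't' is computed on every call although only the 'rare' branch uses it; in the second, the computation has been moved into that branch. Whether GCC sinks such a statement depends on the relative frequency of the two blocks, which is what this parameter bounds.

```c
/* Before sinking: t is computed on every call. */
int before_sink(int a, int b, int rare)
{
  int t = a * b + 7;
  if (rare)
    return t;
  return a;
}

/* After sinking: t is computed only in the block that uses it. */
int after_sink(int a, int b, int rare)
{
  if (rare)
    {
      int t = a * b + 7;
      return t;
    }
  return a;
}
```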
1
1 'max-stores-to-sink'
1 The maximum number of conditional store pairs that can be
1 sunk. Set to 0 if either vectorization ('-ftree-vectorize')
1 or if-conversion ('-ftree-loop-if-convert') is disabled. The
1 default is 2.
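
A conditional store pair in the sense used above, sketched with hypothetical functions: the first stores to '*p' on each arm of a branch; the second performs one unconditional store of a selected value, a form that is easier for if-conversion and vectorization to handle.

```c
/* Before: a store on each arm of the branch (a conditional store pair). */
void store_pair_branches(int *p, int c, int a, int b)
{
  if (c)
    *p = a;
  else
    *p = b;
}

/* After sinking: a single store of the selected value. */
void store_pair_sunk(int *p, int c, int a, int b)
{
  *p = c ? a : b;
}
```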
1
1 'allow-store-data-races'
1 Allow optimizers to introduce new data races on stores. Set
1 to 1 to allow, otherwise to 0. This option is enabled by
1 default at optimization level '-Ofast'.
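
A sketch of the kind of transformation this parameter can allow, assuming a hypothetical global 'shared_flag' that other threads may also write: making the store unconditional by writing back the old value preserves single-threaded behavior, but can overwrite a concurrent write by another thread even when 'cond' is false.

```c
/* Hypothetical global that other threads may also write. */
int shared_flag;

/* Source form: shared_flag is written only when cond is true. */
void maybe_set(int cond, int v)
{
  if (cond)
    shared_flag = v;
}

/* Transformed form: the store always happens.  Equivalent in a
   single thread, but it introduces a data race on shared_flag. */
void maybe_set_unconditional(int cond, int v)
{
  int old = shared_flag;
  shared_flag = cond ? v : old;
}
```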
1
1 'case-values-threshold'
1 The smallest number of different values for which it is best
1 to use a jump-table instead of a tree of conditional branches.
1 If the value is 0, use the default for the machine. The
1 default is 0.
1
1 'tree-reassoc-width'
1 Set the maximum number of instructions executed in parallel in
1 a reassociated tree. If this parameter has a nonzero value, it
1 overrides the target-dependent heuristics used by default.
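
A sketch of reassociation width using hypothetical functions: the serial form is a chain of three dependent additions, while the width-2 form computes two independent partial sums that can execute in parallel. Integer addition is used because, for values that do not overflow, the grouping does not change the result; reassociating floating-point additions generally requires '-fassociative-math'.

```c
/* Serial form: each addition depends on the previous one. */
long sum4_serial(long a, long b, long c, long d)
{
  return ((a + b) + c) + d;
}

/* Width 2: (a + b) and (c + d) are independent, so the
   dependency chain is two additions deep instead of three. */
long sum4_width2(long a, long b, long c, long d)
{
  long t0 = a + b;
  long t1 = c + d;
  return t0 + t1;
}
```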
1
1 'sched-pressure-algorithm'
1 Choose between the two available implementations of
1 '-fsched-pressure'. Algorithm 1 is the original
1 implementation and is the more likely to prevent instructions
1 from being reordered. Algorithm 2 was designed to be a
1 compromise between the relatively conservative approach taken
1 by algorithm 1 and the rather aggressive approach taken by the
1 default scheduler. It relies more heavily on having a regular
1 register file and accurate register pressure classes. See
1 'haifa-sched.c' in the GCC sources for more details.
1
1 The default choice depends on the target.
1
1 'max-slsr-cand-scan'
1 Set the maximum number of existing candidates that are
1 considered when seeking a basis for a new straight-line
1 strength reduction candidate.
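
An illustration of straight-line strength reduction with hypothetical functions: in the first, each expression is computed with its own multiplication by the stride 's'; in the second, each one is derived from the previous expression (its basis) with a single addition. This parameter bounds how many earlier candidates are examined when looking for such a basis.

```c
/* Before: each candidate multiplies the stride independently. */
void addrs_mul(long base, long s, long out[3])
{
  out[0] = base + 1 * s;
  out[1] = base + 2 * s;
  out[2] = base + 3 * s;
}

/* After: each candidate is computed from its basis by adding s. */
void addrs_slsr(long base, long s, long out[3])
{
  out[0] = base + s;
  out[1] = out[0] + s;
  out[2] = out[1] + s;
}
```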
1
1 'asan-globals'
1 Enable buffer overflow detection for global objects. This
1 kind of protection is enabled by default if you are using
1 '-fsanitize=address' option. To disable global objects
1 protection use '--param asan-globals=0'.
1
1 'asan-stack'
1 Enable buffer overflow detection for stack objects. This kind
1 of protection is enabled by default when using
1 '-fsanitize=address'. To disable stack protection use
1 '--param asan-stack=0' option.
1
1 'asan-instrument-reads'
1 Enable buffer overflow detection for memory reads. This kind
1 of protection is enabled by default when using
1 '-fsanitize=address'. To disable memory reads protection use
1 '--param asan-instrument-reads=0'.
1
1 'asan-instrument-writes'
1 Enable buffer overflow detection for memory writes. This kind
1 of protection is enabled by default when using
1 '-fsanitize=address'. To disable memory writes protection use
1 '--param asan-instrument-writes=0' option.
1
1 'asan-memintrin'
1 Enable detection for built-in functions. This kind of
1 protection is enabled by default when using
1 '-fsanitize=address'. To disable built-in functions
1 protection use '--param asan-memintrin=0'.
1
1 'asan-use-after-return'
1 Enable detection of use-after-return. This kind of protection
1 is enabled by default when using the '-fsanitize=address'
1 option. To disable it use '--param asan-use-after-return=0'.
1
1 Note: By default the check is disabled at run time. To enable
1 it, add 'detect_stack_use_after_return=1' to the environment
1 variable 'ASAN_OPTIONS'.
1
1 'asan-instrumentation-with-call-threshold'
1 If the number of memory accesses in the function being
1 instrumented is greater than or equal to this number, use
1 callbacks instead of inline checks. For example, to disable
1 inline code use '--param
1 asan-instrumentation-with-call-threshold=0'.
1
1 'use-after-scope-direct-emission-threshold'
1 If the size of a local variable in bytes is smaller than or equal
1 to this number, directly poison (or unpoison) shadow memory
1 instead of using run-time callbacks. The default value is
1 256.
1
1 'chkp-max-ctor-size'
1 Static constructors generated by Pointer Bounds Checker may
1 become very large and significantly increase compile time at
1 optimization level '-O1' and higher. This parameter is a
1 maximum number of statements in a single generated
1 constructor. The default value is 5000.
1
1 'max-fsm-thread-path-insns'
1 Maximum number of instructions to copy when duplicating blocks
1 on a finite state automaton jump thread path. The default is
1 100.
1
1 'max-fsm-thread-length'
1 Maximum number of basic blocks on a finite state automaton
1 jump thread path. The default is 10.
1
1 'max-fsm-thread-paths'
1 Maximum number of new jump thread paths to create for a finite
1 state automaton. The default is 50.
1
1 'parloops-chunk-size'
1 Chunk size of omp schedule for loops parallelized by parloops.
1 The default is 0.
1
1 'parloops-schedule'
1 Schedule type of omp schedule for loops parallelized by
1 parloops (static, dynamic, guided, auto, runtime). The
1 default is static.
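
A hand-written OpenMP analogue of the default schedule, using a hypothetical 'scale_array' loop: loops parallelized with '-ftree-parallelize-loops' run through libgomp with the schedule selected by 'parloops-schedule' and 'parloops-chunk-size'. The 'schedule(static)' clause below states the static choice explicitly (the chunk argument is omitted); compiled without '-fopenmp', the pragma is ignored and the loop runs serially.

```c
/* Each thread receives a contiguous block of iterations under a
   static schedule when built with -fopenmp. */
void scale_array(double *v, int n, double k)
{
#pragma omp parallel for schedule(static)
  for (int i = 0; i < n; i++)
    v[i] *= k;
}
```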
1
1 'parloops-min-per-thread'
1 The minimum number of iterations per thread of an innermost
1 parallelized loop for which the parallelized variant is
1 preferred over the single-threaded one. The default is 100.
1 Note that for a parallelized loop nest the minimum number of
1 iterations of the outermost loop per thread is two.
1
1 'max-ssa-name-query-depth'
1 Maximum depth of recursion when querying properties of SSA
1 names in things like fold routines. One level of recursion
1 corresponds to following a use-def chain.
1
1 'hsa-gen-debug-stores'
1 Enable emission of special debug stores within HSA kernels
1 which are then read and reported by the libgomp plugin.
1 Generation of these stores is disabled by default, use
1 '--param hsa-gen-debug-stores=1' to enable it.
1
1 'max-speculative-devirt-maydefs'
1 The maximum number of may-defs we analyze when looking for a
1 must-def specifying the dynamic type of an object that invokes
1 a virtual call we may be able to devirtualize speculatively.
1
1 'max-vrp-switch-assertions'
1 The maximum number of assertions to add along the default edge
1 of a switch statement during VRP. The default is 10.
1
1 'unroll-jam-min-percent'
1 The minimum percentage of memory references that must be
1 optimized away for the unroll-and-jam transformation to be
1 considered profitable.
1
1 'unroll-jam-max-unroll'
1 The maximum number of times the outer loop should be unrolled
1 by the unroll-and-jam transformation.
1