gcc: AArch64 Options
1
1 3.18.1 AArch64 Options
1 ----------------------
1
1 These options are defined for AArch64 implementations:
1
1 '-mabi=NAME'
1 Generate code for the specified data model. Permissible values are
1 'ilp32' for the SysV-like data model in which int, long int and
1 pointers are 32 bits, and 'lp64' for the SysV-like data model in
1 which int is 32 bits but long int and pointers are 64 bits.
1
1 The default depends on the specific target configuration. Note
1 that the LP64 and ILP32 ABIs are not link-compatible; you must
1 compile your entire program with the same ABI, and link with a
1 compatible set of libraries.
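
As an illustration of the two data models, the following C sketch classifies the model from type sizes alone. The function names are hypothetical and the check assumes 8-bit bytes; it is not part of GCC.

```c
#include <assert.h>
#include <string.h>

/* Classify the data model from type sizes alone, so the same source
   can be built under either ABI. Assumes 8-bit bytes. */
const char *data_model_name(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}

/* Usage: the reported name must agree with the pointer width. */
void data_model_demo(void)
{
    const char *name = data_model_name();
    if (strcmp(name, "LP64") == 0)
        assert(sizeof(void *) == 8);
    if (strcmp(name, "ILP32") == 0)
        assert(sizeof(void *) == 4);
}
```

Built with '-mabi=ilp32' the function would be expected to report "ILP32", and with '-mabi=lp64' it would report "LP64".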
1
1 '-mbig-endian'
1 Generate big-endian code. This is the default when GCC is
1 configured for an 'aarch64_be-*-*' target.
1
1 '-mgeneral-regs-only'
1 Generate code that uses only the general-purpose registers. This
1 prevents the compiler from using floating-point and Advanced SIMD
1 registers but does not impose any restrictions on the assembler.
1
1 '-mlittle-endian'
1 Generate little-endian code. This is the default when GCC is
1 configured for an 'aarch64-*-*' but not an 'aarch64_be-*-*' target.
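
Source code can observe the byte order selected by '-mbig-endian' or '-mlittle-endian'. The sketch below compares a run-time probe with GCC's generic '__BYTE_ORDER__' macro; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Run-time probe: store a known 32-bit pattern and inspect its first byte. */
int is_big_endian_runtime(void)
{
    uint32_t probe = 0x01020304u;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Compile-time view, using GCC's generic byte-order macros. On AArch64,
   GCC additionally predefines __AARCH64EB__ or __AARCH64EL__. */
int is_big_endian_compiletime(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
    return __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__;
#else
    return is_big_endian_runtime();   /* fallback for other compilers */
#endif
}

/* Usage: both views should agree on any one build. */
void endianness_demo(void)
{
    assert(is_big_endian_runtime() == is_big_endian_compiletime());
}
```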
1
1 '-mcmodel=tiny'
1 Generate code for the tiny code model. The program and its
1 statically defined symbols must be within 1MB of each other.
1 Programs can be statically or dynamically linked.
1
1 '-mcmodel=small'
1 Generate code for the small code model. The program and its
1 statically defined symbols must be within 4GB of each other.
1 Programs can be statically or dynamically linked. This is the
1 default code model.
1
1 '-mcmodel=large'
1 Generate code for the large code model. This makes no assumptions
1 about addresses and sizes of sections. Programs can be statically
1 linked only.
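
The size limits of the three code models can be summarized as a simple range check. This is only a sketch: the notion of a single "span" in bytes and the inclusive treatment of the limits are simplifying assumptions, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum code_model { MODEL_TINY, MODEL_SMALL, MODEL_LARGE };

/* Pick the most restrictive model whose documented limit covers SPAN,
   the largest distance in bytes between the program and any of its
   statically defined symbols. Limits are treated as inclusive here. */
enum code_model smallest_code_model(uint64_t span)
{
    const uint64_t tiny_limit  = UINT64_C(1) << 20;   /* 1MB */
    const uint64_t small_limit = UINT64_C(1) << 32;   /* 4GB */

    if (span <= tiny_limit)
        return MODEL_TINY;
    if (span <= small_limit)
        return MODEL_SMALL;
    return MODEL_LARGE;
}

/* Usage: a 512KB image fits tiny, a 64MB image needs small. */
void code_model_demo(void)
{
    assert(smallest_code_model(UINT64_C(512) * 1024) == MODEL_TINY);
    assert(smallest_code_model(UINT64_C(64) << 20) == MODEL_SMALL);
    assert(smallest_code_model(UINT64_C(1) << 40) == MODEL_LARGE);
}
```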
1
1 '-mstrict-align'
1 Avoid generating memory accesses that may not be aligned on a
1 natural object boundary as described in the architecture
1 specification.
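
Code that must read data from addresses with no alignment guarantee can avoid depending on this option by copying through 'memcpy'. The following is a minimal sketch with hypothetical helper names; how the compiler expands the copy is left to the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee.
   The compiler is free to expand the copy into whatever access
   sequence is valid for the target and options in use. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to an address with no alignment guarantee. */
void store_u32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Usage: buf + 1 is not guaranteed to be 4-byte aligned. */
void unaligned_roundtrip_demo(void)
{
    unsigned char buf[8] = {0};
    store_u32(buf + 1, 0xDEADBEEFu);
    assert(load_u32(buf + 1) == 0xDEADBEEFu);
}
```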
1
1 '-momit-leaf-frame-pointer'
1 '-mno-omit-leaf-frame-pointer'
1 Omit or keep the frame pointer in leaf functions. The former
1 behavior is the default.
1
1 '-mtls-dialect=desc'
1 Use TLS descriptors as the thread-local storage mechanism for
1 dynamic accesses of TLS variables. This is the default.
1
1 '-mtls-dialect=traditional'
1 Use traditional TLS as the thread-local storage mechanism for
1 dynamic accesses of TLS variables.
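
For reference, a TLS variable in C looks like the sketch below. Whether a given access counts as a dynamic access depends on how the code is built (for example as position-independent code for a shared library), which is an assumption here; the names are hypothetical.

```c
#include <assert.h>

/* One instance of this counter exists per thread (C11 _Thread_local). */
static _Thread_local unsigned long calls_in_this_thread;

/* Increment and return the calling thread's own counter. */
unsigned long note_call(void)
{
    return ++calls_in_this_thread;
}

/* Usage: successive calls on one thread count up by one. */
void tls_demo(void)
{
    unsigned long before = note_call();
    assert(note_call() == before + 1);
}
```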
1
1 '-mtls-size=SIZE'
1 Specify the bit size of immediate TLS offsets. Valid values are 12,
1 24, 32 and 48. This option requires binutils 2.26 or newer.
1
1 '-mfix-cortex-a53-835769'
1 '-mno-fix-cortex-a53-835769'
1 Enable or disable the workaround for the ARM Cortex-A53 erratum
1 number 835769. This involves inserting a NOP instruction between
1 memory instructions and 64-bit integer multiply-accumulate
1 instructions.
1
1 '-mfix-cortex-a53-843419'
1 '-mno-fix-cortex-a53-843419'
1 Enable or disable the workaround for the ARM Cortex-A53 erratum
1 number 843419. The workaround is applied at link time, so this
1 option only passes the corresponding flag to the linker.
1
1 '-mlow-precision-recip-sqrt'
1 '-mno-low-precision-recip-sqrt'
1 Enable or disable the reciprocal square root approximation. This
1 option only has an effect if '-ffast-math' or
1 '-funsafe-math-optimizations' is used as well. Enabling this
1 reduces precision of reciprocal square root results to about 16
1 bits for single precision and to 32 bits for double precision.
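
To illustrate how an approximation trades precision for speed, the sketch below computes a reciprocal square root from a crude initial estimate followed by a chosen number of Newton-Raphson refinement steps. It is a generic illustration, not the sequence GCC emits; the bit-level initial guess uses the widely circulated constant 0x5f3759df, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, normal x: a crude bit-level
   initial estimate followed by STEPS Newton-Raphson refinements. */
float approx_rsqrt(float x, int steps)
{
    uint32_t bits;
    float y;

    assert(x > 0.0f && steps >= 0);
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);          /* initial guess */
    memcpy(&y, &bits, sizeof y);

    for (int i = 0; i < steps; i++)
        y = y * (1.5f - 0.5f * x * y * y);     /* one refinement step */
    return y;
}

/* Error measure for y as an estimate of 1/sqrt(x): |x*y*y - 1|. */
float rsqrt_error(float x, float y)
{
    float e = x * y * y - 1.0f;
    return e < 0.0f ? -e : e;
}

/* Usage: more refinement steps give a tighter result. */
void rsqrt_demo(void)
{
    assert(rsqrt_error(4.0f, approx_rsqrt(4.0f, 2))
           <= rsqrt_error(4.0f, approx_rsqrt(4.0f, 0)));
}
```

Fewer refinement steps leave a larger error, which mirrors the reduced precision described above.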
1
1 '-mlow-precision-sqrt'
1 '-mno-low-precision-sqrt'
1 Enable or disable the square root approximation. This option only
1 has an effect if '-ffast-math' or '-funsafe-math-optimizations' is
1 used as well. Enabling this reduces precision of square root
1 results to about 16 bits for single precision and to 32 bits for
1 double precision. If enabled, it implies
1 '-mlow-precision-recip-sqrt'.
1
1 '-mlow-precision-div'
1 '-mno-low-precision-div'
1 Enable or disable the division approximation. This option only has
1 an effect if '-ffast-math' or '-funsafe-math-optimizations' is used
1 as well. Enabling this reduces precision of division results to
1 about 16 bits for single precision and to 32 bits for double
1 precision.
1
1 '-moutline-atomics'
1 '-mno-outline-atomics'
1 Enable or disable calls to out-of-line helpers to implement atomic
1 operations. These helpers will, at runtime, determine if the LSE
1 instructions from ARMv8.1-A can be used; if not, they will use the
1 load/store-exclusive instructions that are present in the base
1 ARMv8.0 ISA.
1
1 This option is only applicable when compiling for the base ARMv8.0
1 instruction set. If using a later revision, e.g.
1 '-march=armv8.1-a' or '-march=armv8-a+lse', the ARMv8.1-Atomics
1 instructions will be used directly. The same applies when using
1 '-mcpu=' when the selected cpu supports the 'lse' feature.
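
The kind of source construct this option affects is an ordinary C11 atomic operation, as in the sketch below. The names are hypothetical, and how the operation is lowered (inline instructions or a call to a helper) depends on the options described above.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_ulong event_count;

/* Atomically add N to the shared counter and return the new total
   as seen by this operation. */
unsigned long record_events(unsigned long n)
{
    return atomic_fetch_add(&event_count, n) + n;
}

/* Usage: the total never goes backwards. */
void atomics_demo(void)
{
    unsigned long before = atomic_load(&event_count);
    assert(record_events(3) >= before + 3);
}
```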
1
1 '-march=NAME'
1 Specify the name of the target architecture and, optionally, one or
1 more feature modifiers. This option has the form
1 '-march=ARCH{+[no]FEATURE}*'.
1
1 The permissible values for ARCH are 'armv8-a', 'armv8.1-a',
1 'armv8.2-a', 'armv8.3-a', 'armv8.4-a' or 'native'.
1
1 The value 'armv8.4-a' implies 'armv8.3-a' and enables compiler
1 support for the ARMv8.4-A architecture extensions.
1
1 The value 'armv8.3-a' implies 'armv8.2-a' and enables compiler
1 support for the ARMv8.3-A architecture extensions.
1
1 The value 'armv8.2-a' implies 'armv8.1-a' and enables compiler
1 support for the ARMv8.2-A architecture extensions.
1
1 The value 'armv8.1-a' implies 'armv8-a' and enables compiler
1 support for the ARMv8.1-A architecture extension. In particular,
1 it enables the '+crc', '+lse', and '+rdma' features.
1
1 The value 'native' is available on native AArch64 GNU/Linux and
1 causes the compiler to pick the architecture of the host system.
1 This option has no effect if the compiler is unable to recognize
1 the architecture of the host system.
1
1 The permissible values for FEATURE are listed in the sub-section on
1 '-march' and '-mcpu' Feature Modifiers (aarch64-feature-modifiers).
1 Where conflicting feature modifiers are specified, the right-most
1 feature is used.
1
1 GCC uses NAME to determine what kind of instructions it can emit
1 when generating assembly code. If '-march' is specified without
1 either of '-mtune' or '-mcpu' also being specified, the code is
1 tuned to perform well across a range of target processors
1 implementing the target architecture.
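
The right-most-wins rule for conflicting feature modifiers can be sketched as a small parser over the 'ARCH{+[no]FEATURE}*' form. This is an illustration only: the function name is hypothetical, and the sketch does not model implications between features or validate the names.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the right-most mention of FEATURE in SPEC enables it,
   0 if it disables it ("+noFEATURE"), and -1 if SPEC never mentions it. */
int feature_state(const char *spec, const char *feature)
{
    int state = -1;
    size_t flen;
    const char *p;

    assert(spec != NULL && feature != NULL);
    flen = strlen(feature);
    p = strchr(spec, '+');                     /* skip the ARCH part */

    while (p != NULL) {
        const char *tok = p + 1;
        const char *end = strchr(tok, '+');
        size_t len = end ? (size_t)(end - tok) : strlen(tok);

        if (len == flen && strncmp(tok, feature, flen) == 0)
            state = 1;                         /* "+FEATURE" */
        else if (len == flen + 2 && strncmp(tok, "no", 2) == 0
                 && strncmp(tok + 2, feature, flen) == 0)
            state = 0;                         /* "+noFEATURE" */

        p = end;                               /* later tokens override */
    }
    return state;
}

/* Usage: the last mention of 'crc' decides the outcome. */
void feature_state_demo(void)
{
    assert(feature_state("armv8-a+crc+nocrc", "crc") == 0);
    assert(feature_state("armv8-a+nocrc+crc", "crc") == 1);
    assert(feature_state("armv8.2-a+sve", "crc") == -1);
}
```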
1
1 '-mtune=NAME'
1 Specify the name of the target processor for which GCC should tune
1 the performance of the code. Permissible values for this option
1 are: 'generic', 'cortex-a35', 'cortex-a53', 'cortex-a55',
1 'cortex-a57', 'cortex-a72', 'cortex-a73', 'cortex-a75',
1 'cortex-a76', 'ares', 'neoverse-n1', 'neoverse-n2', 'neoverse-v1',
1 'zeus', 'neoverse-512tvb', 'exynos-m1', 'falkor', 'qdf24xx',
1 'saphira', 'xgene1', 'vulcan', 'thunderx', 'thunderxt88',
1 'thunderxt88p1', 'thunderxt81', 'thunderxt83', 'thunderx2t99',
1 'cortex-a57.cortex-a53', 'cortex-a72.cortex-a53',
1 'cortex-a73.cortex-a35', 'cortex-a73.cortex-a53',
1 'cortex-a75.cortex-a55', 'native'.
1
1 The values 'cortex-a57.cortex-a53', 'cortex-a72.cortex-a53',
1 'cortex-a73.cortex-a35', 'cortex-a73.cortex-a53',
1 'cortex-a75.cortex-a55' specify that GCC should tune for a
1 big.LITTLE system.
1
1 The value 'neoverse-512tvb' specifies that GCC should tune for
1 Neoverse cores that (a) implement SVE and (b) have a total vector
1 bandwidth of 512 bits per cycle. In other words, the option tells
1 GCC to tune for Neoverse cores that can execute 4 128-bit Advanced
1 SIMD arithmetic instructions a cycle and that can execute an
1 equivalent number of SVE arithmetic instructions per cycle (2 for
1 256-bit SVE, 4 for 128-bit SVE). This is more general than tuning
1 for a specific core like Neoverse V1 but is more specific than the
1 default tuning described below.
1
1 Additionally on native AArch64 GNU/Linux systems the value 'native'
1 tunes performance to the host system. This option has no effect if
1 the compiler is unable to recognize the processor of the host
1 system.
1
1 Where none of '-mtune=', '-mcpu=' or '-march=' are specified, the
1 code is tuned to perform well across a range of target processors.
1
1 This option cannot be suffixed by feature modifiers.
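
The arithmetic behind the 'neoverse-512tvb' description (4 instructions at 128 bits, 2 at 256 bits) is a division of the 512-bit total by the vector width. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>

/* Number of vector instructions per cycle needed to reach a total of
   512 bits per cycle at a given vector width. Returns 0 for widths
   that do not divide 512 evenly. */
unsigned ops_per_cycle_512tvb(unsigned vector_bits)
{
    const unsigned total_bits = 512;

    if (vector_bits == 0 || total_bits % vector_bits != 0)
        return 0;
    return total_bits / vector_bits;
}

/* Usage: the two cases named in the text. */
void tvb_demo(void)
{
    assert(ops_per_cycle_512tvb(128) == 4);   /* 128-bit Advanced SIMD or SVE */
    assert(ops_per_cycle_512tvb(256) == 2);   /* 256-bit SVE */
}
```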
1
1 '-mcpu=NAME'
1 Specify the name of the target processor, optionally suffixed by
1 one or more feature modifiers. This option has the form
1 '-mcpu=CPU{+[no]FEATURE}*', where the permissible values for CPU
1 are the same as those available for '-mtune'. The permissible
1 values for FEATURE are documented in the sub-section on
1 '-march' and '-mcpu' Feature Modifiers (aarch64-feature-modifiers).
1 Where conflicting feature modifiers are specified, the right-most
1 feature is used.
1
1 GCC uses NAME to determine what kind of instructions it can emit
1 when generating assembly code (as if by '-march') and to determine
1 the target processor for which to tune performance (as if by
1 '-mtune'). Where this option is used in conjunction with '-march'
1 or '-mtune', those options take precedence over the appropriate
1 part of this option.
1
1 '-mcpu=neoverse-512tvb' is special in that it does not refer to a
1 specific core, but instead refers to all Neoverse cores that (a)
1 implement SVE and (b) have a total vector bandwidth of 512 bits a
1 cycle. Unless overridden by '-march', '-mcpu=neoverse-512tvb'
1 generates code that can run on a Neoverse V1 core, since Neoverse
1 V1 is the first Neoverse core with these properties. Unless
1 overridden by '-mtune', '-mcpu=neoverse-512tvb' tunes code in the
1 same way as for '-mtune=neoverse-512tvb'.
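
The precedence between '-march', '-mtune' and '-mcpu' described above can be sketched as follows. The structure and function names are hypothetical, and "default" stands in for whatever the target configuration selects when no option applies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct target_sources {
    const char *isa_from;    /* option deciding which instructions may be emitted */
    const char *tune_from;   /* option deciding performance tuning */
};

/* Each argument is the option's value, or NULL if the option was not given.
   '-march' and '-mtune' override the corresponding part of '-mcpu'. */
struct target_sources resolve_sources(const char *march,
                                      const char *mtune,
                                      const char *mcpu)
{
    struct target_sources r;

    r.isa_from  = march ? "-march" : (mcpu ? "-mcpu" : "default");
    r.tune_from = mtune ? "-mtune" : (mcpu ? "-mcpu" : "default");
    return r;
}

/* Usage: '-mcpu' plus '-mtune' splits the two decisions. */
void resolve_demo(void)
{
    struct target_sources r = resolve_sources(NULL, "cortex-a53", "cortex-a72");
    assert(strcmp(r.isa_from, "-mcpu") == 0);
    assert(strcmp(r.tune_from, "-mtune") == 0);
}
```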
1
1 '-moverride=STRING'
1 Override tuning decisions made by the back-end in response to a
1 '-mtune=' switch. The syntax, semantics, and accepted values for
1 STRING in this option are not guaranteed to be consistent across
1 releases.
1
1 This option is only intended to be useful when developing GCC.
1
1 '-mverbose-cost-dump'
1 Enable verbose cost model dumping in the debug dump files. This
1 option is provided for use in debugging the compiler.
1
1 '-mpc-relative-literal-loads'
1 '-mno-pc-relative-literal-loads'
1 Enable or disable PC-relative literal loads. With this option,
1 literal pools are accessed using a single instruction and emitted
1 after each function. This limits the maximum size of functions to
1 1MB. This is enabled by default for '-mcmodel=tiny'.
1
1 '-msign-return-address=SCOPE'
1 Select the function scope on which return address signing will be
1 applied. Permissible values are 'none', which disables return
1 address signing, 'non-leaf', which enables pointer signing for
1 functions which are not leaf functions, and 'all', which enables
1 pointer signing for all functions. The default value is 'none'.
1
1 '-msve-vector-bits=BITS'
1 Specify the number of bits in an SVE vector register. This option
1 only has an effect when SVE is enabled.
1
1 GCC supports two forms of SVE code generation: "vector-length
1 agnostic" output that works with any size of vector register and
1 "vector-length specific" output that allows GCC to make assumptions
1 about the vector length when it is useful for optimization reasons.
1 The possible values of BITS are: 'scalable', '128', '256', '512',
1 '1024' and '2048'. Specifying 'scalable' selects vector-length
1 agnostic output. At present '-msve-vector-bits=128' also generates
1 vector-length agnostic output. All other values generate
1 vector-length specific code. The behavior of these values may
1 change in future releases and no value except 'scalable' should be
1 relied on for producing code that is portable across different
1 hardware SVE vector lengths.
1
1 The default is '-msve-vector-bits=scalable', which produces
1 vector-length agnostic code.
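
Source code can check whether a fixed vector length is being assumed. The sketch below relies on the ACLE macro '__ARM_FEATURE_SVE_BITS'; its availability in a given GCC release is an assumption, and the function name is hypothetical.

```c
#include <assert.h>

/* Vector length in bits that the compiler was told to assume, or 0 when
   the macro is absent (for example vector-length agnostic code, or SVE
   not enabled). */
int sve_fixed_vector_bits(void)
{
#if defined(__ARM_FEATURE_SVE_BITS)
    return __ARM_FEATURE_SVE_BITS;
#else
    return 0;
#endif
}

/* Usage: any reported length is a multiple of 128 up to 2048. */
void sve_bits_demo(void)
{
    int bits = sve_fixed_vector_bits();
    assert(bits >= 0 && bits <= 2048 && bits % 128 == 0);
}
```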
1
1 3.18.1.1 '-march' and '-mcpu' Feature Modifiers
1 ...............................................
1
1 Feature modifiers used with '-march' and '-mcpu' can be any of the
1 following and their inverses 'noFEATURE':
1
1 'crc'
1 Enable CRC extension. This is on by default for
1 '-march=armv8.1-a'.
1 'crypto'
1 Enable Crypto extension. This also enables Advanced SIMD and
1 floating-point instructions.
1 'fp'
1 Enable floating-point instructions. This is on by default for all
1 possible values for options '-march' and '-mcpu'.
1 'simd'
1 Enable Advanced SIMD instructions. This also enables
1 floating-point instructions. This is on by default for all
1 possible values for options '-march' and '-mcpu'.
1 'sve'
1 Enable Scalable Vector Extension instructions. This also enables
1 Advanced SIMD and floating-point instructions.
1 'lse'
1 Enable Large System Extension instructions. This is on by default
1 for '-march=armv8.1-a'.
1 'rdma'
1 Enable Rounding Doubling Multiply Accumulate instructions. This is on
1 by default for '-march=armv8.1-a'.
1 'fp16'
1 Enable FP16 extension. This also enables floating-point
1 instructions.
1 'fp16fml'
1 Enable FP16 fmla extension. This also enables FP16 extensions and
1 floating-point instructions. This option is enabled by default for
1 '-march=armv8.4-a'. Use of this option with architectures prior to
1 Armv8.2-A is not supported.
1
1 'rcpc'
1 Enable the RcPc extension. This does not change code generation
1 from GCC, but is passed on to the assembler, enabling inline asm
1 statements to use instructions from the RcPc extension.
1 'dotprod'
1 Enable the Dot Product extension. This also enables Advanced SIMD
1 instructions.
1 'aes'
1 Enable the Armv8-a aes and pmull crypto extension. This also
1 enables Advanced SIMD instructions.
1 'sha2'
1 Enable the Armv8-a sha2 crypto extension. This also enables
1 Advanced SIMD instructions.
1 'sha3'
1 Enable the sha512 and sha3 crypto extension. This also enables
1 Advanced SIMD instructions. Use of this option with architectures
1 prior to Armv8.2-A is not supported.
1 'sm4'
1 Enable the sm3 and sm4 crypto extension. This also enables
1 Advanced SIMD instructions. Use of this option with architectures
1 prior to Armv8.2-A is not supported.
1
1 Feature 'crypto' implies 'aes', 'sha2', and 'simd', which implies 'fp'.
1 Conversely, 'nofp' implies 'nosimd', which implies 'nocrypto', 'noaes'
1 and 'nosha2'.
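
Code can test which features were enabled through the ACLE feature-test macros. The sketch below maps a few of them to a bit mask; the macro names come from the Arm C Language Extensions and are assumed to be defined by the compiler in use, while the enumerators and function names are hypothetical.

```c
#include <assert.h>

enum {
    FEAT_FP      = 1u << 0,
    FEAT_SIMD    = 1u << 1,
    FEAT_CRC     = 1u << 2,
    FEAT_LSE     = 1u << 3,
    FEAT_SVE     = 1u << 4,
    FEAT_DOTPROD = 1u << 5,
    FEAT_CRYPTO  = 1u << 6
};

/* Collect the features visible through ACLE macros in this build.
   On a non-Arm host none of the macros is defined and the mask is 0. */
unsigned enabled_features(void)
{
    unsigned m = 0;
#ifdef __ARM_FP
    m |= FEAT_FP;
#endif
#ifdef __ARM_NEON
    m |= FEAT_SIMD;
#endif
#ifdef __ARM_FEATURE_CRC32
    m |= FEAT_CRC;
#endif
#ifdef __ARM_FEATURE_ATOMICS
    m |= FEAT_LSE;
#endif
#ifdef __ARM_FEATURE_SVE
    m |= FEAT_SVE;
#endif
#ifdef __ARM_FEATURE_DOTPROD
    m |= FEAT_DOTPROD;
#endif
#ifdef __ARM_FEATURE_CRYPTO
    m |= FEAT_CRYPTO;
#endif
    return m;
}

/* Usage: check the implications stated above ('crypto' implies 'simd',
   which implies 'fp'). */
void features_demo(void)
{
    unsigned m = enabled_features();
    if (m & FEAT_CRYPTO)
        assert(m & FEAT_SIMD);
    if (m & FEAT_SIMD)
        assert(m & FEAT_FP);
}
```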
1