gcc: Nvidia PTX Options

3.18.33 Nvidia PTX Options
--------------------------

These options are defined for Nvidia PTX:

'-m32'
'-m64'
     Generate code for 32-bit or 64-bit ABI.

'-mmainkernel'
     Link in code for a '__main' kernel. This is for stand-alone
     execution rather than offloading execution.

'-moptimize'
     Apply partitioned execution optimizations. This is the default
     when any level of optimization is selected.

'-msoft-stack'
     Generate code that does not use '.local' memory directly for stack
     storage. Instead, a per-warp stack pointer is maintained
     explicitly. This enables variable-length stack allocation (with
     variable-length arrays or 'alloca'), and when global memory is used
     for underlying storage, makes it possible to access automatic
     variables from other threads, or with atomic instructions. This
     code generation variant is used for OpenMP offloading, but the
     option is exposed on its own for the purpose of testing the
     compiler; to generate code suitable for linking into programs using
     OpenMP offloading, use option '-mgomp'.
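
     As an illustration of the variable-length stack allocation
     mentioned above, the following sketch uses a variable-length array
     whose size is known only at run time. The function name is
     hypothetical and the code is ordinary ISO C99; nothing in it is
     specific to Nvidia PTX, and it is not taken from the GCC sources.

```c
#include <stddef.h>

/* Hypothetical example: sums the squares 0*0 + 1*1 + ... + (n-1)*(n-1),
   using a variable-length array as scratch storage.  The array size
   depends on a run-time value, so the stack frame size is not known at
   compile time.  */
long sum_squares_vla(size_t n)
{
    if (n == 0)
        return 0;               /* A VLA must have a nonzero length.  */

    long scratch[n];            /* Variable-length stack allocation.  */

    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)(i * i);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += scratch[i];

    return total;
}
```

     Calling 'sum_squares_vla (4)' adds 0, 1, 4 and 9. Large values of
     the argument would need correspondingly large stack storage.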

'-muniform-simt'
     Switch to a code generation variant that allows all threads in
     each warp to execute, while maintaining memory state and side
     effects as if only one thread in each warp were active outside of
     OpenMP SIMD regions. All atomic operations and calls to the runtime
     ('malloc', 'free', 'vprintf') are executed conditionally (if and
     only if the current lane index equals the master lane index), and
     the register being assigned is copied via a shuffle instruction
     from the master lane. Outside of SIMD regions lane 0 is the master;
     inside, each thread sees itself as the master. The shared memory
     array 'int __nvptx_uni[]' stores all-zeros or all-ones bitmasks for
     each warp, indicating the current mode (0 outside of SIMD regions).
     Each thread can bitwise-and the bitmask at position 'tid.y' with
     its current lane index to compute the master lane index.
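
     The master-lane computation described above can be modelled on the
     host in a few lines of C. This is only a sketch of the arithmetic,
     not the code GCC emits: the function names and the 'WARP_SIZE'
     constant are assumptions made for illustration, and the mask
     argument stands in for the per-warp entry of '__nvptx_uni'.

```c
/* Assumed warp width for this model.  */
enum { WARP_SIZE = 32 };

/* Hypothetical model: mode_mask is 0 outside of SIMD regions and
   all-ones inside them.  And-ing it with the lane index gives lane 0
   outside of SIMD regions and the thread's own lane inside them.  */
unsigned master_lane(unsigned mode_mask, unsigned lane)
{
    return mode_mask & lane;
}

/* Nonzero if this lane would perform an atomic operation or runtime
   call, that is, if it is its own master lane.  */
int executes_side_effect(unsigned mode_mask, unsigned lane)
{
    return lane == master_lane(mode_mask, lane);
}

/* Counts how many lanes of one warp would perform the side effect.  */
int count_executing_lanes(unsigned mode_mask)
{
    int count = 0;
    for (unsigned lane = 0; lane < WARP_SIZE; lane++)
        if (executes_side_effect(mode_mask, lane))
            count++;
    return count;
}
```

     With a mask of 0 only lane 0 performs the side effect; with an
     all-ones mask every lane of the warp does.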

'-mgomp'
     Generate code for use in OpenMP offloading: enables the
     '-msoft-stack' and '-muniform-simt' options, and selects the
     corresponding multilib variant.
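
     For context, the following sketch shows the kind of loop that an
     OpenMP program might offload to a device. The function name is
     hypothetical. The example does not show how '-mgomp' is passed in
     any particular toolchain; it only illustrates an OpenMP 'target'
     region with a SIMD loop, the setting in which the code generation
     variants above are used. Without OpenMP support enabled, the pragma
     is ignored and the loop runs on the host.

```c
#include <stddef.h>

/* Hypothetical example: computes y[i] = a * x[i] + y[i] for n elements.
   The map clauses copy x to the device and copy y both ways.  */
void saxpy_offload(float a, const float *x, float *y, size_t n)
{
    #pragma omp target teams distribute parallel for simd \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```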