gcc: Nvidia PTX Options

3.18.33 Nvidia PTX Options
--------------------------

These options are defined for Nvidia PTX:

'-m32'
'-m64'
     Generate code for 32-bit or 64-bit ABI.

'-mmainkernel'
     Link in code for a __main kernel.  This is for stand-alone
     execution rather than offloading.

'-moptimize'
     Apply partitioned execution optimizations.  This is the default
     when any level of optimization is selected.

'-msoft-stack'
     Generate code that does not use '.local' memory directly for stack
     storage.  Instead, a per-warp stack pointer is maintained
     explicitly.  This enables variable-length stack allocation (with
     variable-length arrays or 'alloca') and, when global memory is used
     for underlying storage, makes it possible to access automatic
     variables from other threads or with atomic instructions.  This
     code generation variant is used for OpenMP offloading, but the
     option is exposed on its own for the purpose of testing the
     compiler.  To generate code suitable for linking into programs
     using OpenMP offloading, use option '-mgomp'.

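     As an illustration of the kind of code that relies on
     variable-length stack allocation, here is a minimal C sketch using
     a variable-length array.  The function name 'sum_of_squares' is
     hypothetical and not part of GCC; the code is ordinary C99 and
     nothing in it is specific to PTX.

```c
#include <stddef.h>

/* Hypothetical helper: sums the squares of 0 .. n-1 using a
   variable-length array as scratch space.  The array size is known
   only at run time, so the stack allocation is variable-length. */
long sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;              /* a VLA must have at least one element */

    long scratch[n];           /* variable-length array on the stack */
    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)(i * i);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += scratch[i];
    return total;
}
```

     Per the description above, a function like this one is the case
     that the explicitly maintained stack pointer is meant to handle.
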
'-muniform-simt'
     Switch to a code generation variant that allows all threads in
     each warp to execute, while maintaining memory state and side
     effects as if only one thread in each warp were active outside of
     OpenMP SIMD regions.  All atomic operations and calls to the
     runtime (malloc, free, vprintf) are executed conditionally (if and
     only if the current lane index equals the master lane index), and
     the register being assigned is copied via a shuffle instruction
     from the master lane.  Outside of SIMD regions, lane 0 is the
     master; inside, each thread sees itself as the master.  The shared
     memory array 'int __nvptx_uni[]' stores all-zeros or all-ones
     bitmasks for each warp, indicating the current mode (0 outside of
     SIMD regions).  Each thread can bitwise-and the bitmask at
     position 'tid.y' with the current lane index to compute the master
     lane index.

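     The master lane computation described above can be modelled on the
     host as a single bitwise AND.  The following C sketch is only a
     model, not the code the compiler emits; the function and parameter
     names are hypothetical, and an unsigned 32-bit type stands in for
     the 'int' elements of '__nvptx_uni'.

```c
#include <stdint.h>

/* Host-side model of the master lane computation (not device code).
   'mode_mask' stands for the warp's entry in __nvptx_uni: all-zeros
   outside SIMD regions, all-ones inside.  'lane' is the current lane
   index.  Both names are hypothetical. */
uint32_t master_lane(uint32_t mode_mask, uint32_t lane)
{
    return mode_mask & lane;
}

/* A guarded operation (atomic or runtime call) would execute only
   when the current lane is the master lane. */
int is_master(uint32_t mode_mask, uint32_t lane)
{
    return lane == master_lane(mode_mask, lane);
}
```

     With an all-zeros mask the result is always lane 0, so only lane 0
     passes the guard; with an all-ones mask the result is the lane's
     own index, so every lane passes.
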
'-mgomp'
     Generate code for use in OpenMP offloading: enables the
     '-msoft-stack' and '-muniform-simt' options, and selects the
     corresponding multilib variant.

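     For context, the following C sketch shows the sort of source code
     that OpenMP offloading applies to: a loop marked for execution on
     a target device.  The function name 'saxpy' is hypothetical.  How
     the option reaches the device-side compiler depends on the
     toolchain setup and is not shown here.

```c
#include <stddef.h>

/* Hypothetical example: y[i] += a * x[i], written so that an
   offloading-enabled compiler may run the loop on the device.
   Without OpenMP support the pragma is ignored and the loop runs
   sequentially on the host. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for simd \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

     The 'simd' part of the construct marks the loop as an OpenMP SIMD
     region, which is where '-muniform-simt' treats each thread as its
     own master.
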