libgomp: CUDA Streams Usage


7 CUDA Streams Usage
********************

This applies to the 'nvptx' plugin only.

   The library provides elements that perform asynchronous movement of
data and asynchronous operation of computing constructs.  This
asynchronous functionality is implemented by making use of CUDA
streams(1).

   The primary means by which the asynchronous functionality is accessed
is through those OpenACC directives that accept the 'async' and 'wait'
clauses.  When the 'async' clause is first used with a directive, it
creates a CUDA stream.  If an 'async-argument' is used with the 'async'
clause, then the stream is associated with the specified
'async-argument'.
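
   As an illustration of the paragraph above, the following is a minimal
sketch rather than code taken from the library.  The function name
'scale_two' and the queue numbers 1 and 2 are arbitrary choices made for
this example.  It assumes compilation with '-fopenacc' and an nvptx
offload target; without them the pragmas are ignored and the loops
simply run on the host.

```c
#include <stddef.h>

/* Scale two independent arrays on separate async queues.  With the
   nvptx plugin, each async-argument (1 and 2 here, chosen arbitrarily)
   is associated with its own CUDA stream on first use.  */
void
scale_two (float *a, float *b, size_t n, float s)
{
  /* First use of async(1): a stream is created and tied to 1.  */
#pragma acc parallel loop copy(a[0:n]) async(1)
  for (size_t i = 0; i < n; i++)
    a[i] *= s;

  /* First use of async(2): a second, independent stream.  */
#pragma acc parallel loop copy(b[0:n]) async(2)
  for (size_t i = 0; i < n; i++)
    b[i] *= s;

  /* Block until all outstanding asynchronous work has finished, so
     the caller can safely read a and b.  */
#pragma acc wait
}
```

   Because the two regions use different 'async-argument' values, the
runtime is free to overlap them.  The closing 'wait' is what makes the
results visible to the caller.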

   Once a CUDA stream has been associated with the 'async-argument' of
an 'async' clause, both the 'wait' clause and the 'wait' directive can
be used.  When either the clause or the directive is used after stream
creation, it creates a rendezvous point at which execution waits until
all operations associated with the 'async-argument' (that is, with its
stream) have completed.
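
   The two forms of 'wait' can be sketched as follows.  This is a hedged
example, not library code: 'staged_update' and the queue numbers are
hypothetical, and the same '-fopenacc' assumption applies (without it
the code runs serially on the host with the same result).

```c
#include <stddef.h>

/* Compute x[i] = (x[i] + 1) * 2 in two dependent stages that are
   placed on different async queues.  */
void
staged_update (float *x, size_t n)
{
#pragma acc data copy(x[0:n])
  {
    /* Stage 1 is enqueued on the queue for async-argument 1.  */
#pragma acc parallel loop async(1)
    for (size_t i = 0; i < n; i++)
      x[i] += 1.0f;

    /* 'wait' clause: stage 2 is enqueued on queue 2, but does not
       start until everything on queue 1 has completed.  */
#pragma acc parallel loop wait(1) async(2)
    for (size_t i = 0; i < n; i++)
      x[i] *= 2.0f;

    /* 'wait' directive: the host stops here until queue 2 has
       drained, so the copy-out at the end of the data region sees
       the final values.  */
#pragma acc wait(2)
  }
}
```

   The 'wait' clause orders work between queues, while the 'wait'
directive synchronizes the host with a queue.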

   Normally, the streams created as a result of using the 'async' clause
are managed without any intervention by the caller.  This implies that
the association between the 'async-argument' and the CUDA stream is
maintained for the lifetime of the program.  However, this association
can be changed through the library function 'acc_set_cuda_stream'.
When 'acc_set_cuda_stream' is called, the CUDA stream that was
originally associated with the 'async' clause is destroyed.  Take care
when changing the association, as subsequent references to the
'async-argument' refer to a different CUDA stream.
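
   A caller-supplied stream might be installed roughly as shown below.
This is a sketch under several assumptions.  'USE_CUDA_STREAM' is a
hypothetical opt-in macro invented for this example, and
'double_on_own_stream' is likewise an illustrative name.  The CUDA path
assumes that the CUDA Driver API header is available and that the
program is built with '-fopenacc -DUSE_CUDA_STREAM' and linked against
the CUDA driver library.  The return value of 'acc_set_cuda_stream' is
deliberately not interpreted here; the association is checked with
'acc_get_cuda_stream' instead.

```c
#include <stddef.h>

/* USE_CUDA_STREAM is a hypothetical macro used only by this sketch.  */
#if defined(_OPENACC) && defined(USE_CUDA_STREAM)
#include <openacc.h>
#include <cuda.h>
#endif

/* Double every element of x on async queue 1.  Returns 0 on success
   and -1 if the caller-supplied stream could not be installed.  */
int
double_on_own_stream (float *x, size_t n)
{
#if defined(_OPENACC) && defined(USE_CUDA_STREAM)
  CUstream stream;

  /* Initialize the device so that a CUDA context exists.  */
  acc_init (acc_device_nvidia);
  if (cuStreamCreate (&stream, CU_STREAM_DEFAULT) != CUDA_SUCCESS)
    return -1;

  /* Re-associate async-argument 1 with our own stream.  A real
     program would do this once, not on every call.  */
  acc_set_cuda_stream (1, stream);
  if (acc_get_cuda_stream (1) != (void *) stream)
    return -1;
#endif

  /* From here on, async(1) refers to the stream installed above
     (or to the default association when the CUDA path is disabled).  */
#pragma acc parallel loop copy(x[0:n]) async(1)
  for (size_t i = 0; i < n; i++)
    x[i] *= 2.0f;

#pragma acc wait(1)
  return 0;
}
```

   Any work that was queued with 'async(1)' before the call should be
waited on first, since later references to that 'async-argument' refer
to the new stream.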

   ---------- Footnotes ----------

   (1) See "Stream Management" in "CUDA Driver API", TRM-06703-001,
Version 5.5, for additional information.