libgomp: OpenACC Library Interoperability

1 
1 8 OpenACC Library Interoperability
1 **********************************
1 
1 8.1 Introduction
1 ================
1 
1 The OpenACC library uses the CUDA Driver API, and may interact with
1 programs that use the CUDA Runtime library directly, or another library
1 based on the Runtime library, e.g., CUBLAS(1).  This chapter describes
1 the use cases and the changes required in order to use both the OpenACC
1 library and the CUBLAS and Runtime libraries within a program.
1 
1 8.2 First invocation: NVIDIA CUBLAS library API
1 ===============================================
1 
1 In this first use case (see below), a function in the CUBLAS library,
1 specifically 'cublasCreate()', is called prior to any of the functions
1 in the OpenACC library.
1 
1    When invoked, the function initializes the library and allocates the
1 hardware resources on the host and the device on behalf of the caller.
1 Once the initialization and allocation have completed, a handle is
1 returned to the caller.  The OpenACC library also requires
1 initialization and allocation of hardware resources.  Since the CUBLAS
1 library has already allocated the hardware resources for the device, all
1 that is left to do is to initialize the OpenACC library and acquire the
1 hardware resources on the host.
1 
1    Prior to calling the OpenACC function that initializes the library
1 and allocates the host hardware resources, you need to acquire the
1 device number that was allocated during the call to 'cublasCreate()'.
1 Calling the runtime library function 'cudaGetDevice()' accomplishes
1 this.  Once acquired, the device number is passed along with the device
1 type as parameters to the OpenACC library function
1 'acc_set_device_num()'.
1 
1    Once the call to 'acc_set_device_num()' has completed, the OpenACC
1 library uses the context that was created during the call to
1 'cublasCreate()'.  In other words, both libraries will be sharing the
1 same context.
1 
1          /* Create the handle */
1          s = cublasCreate(&h);
1          if (s != CUBLAS_STATUS_SUCCESS)
1          {
1              fprintf(stderr, "cublasCreate failed %d\n", s);
1              exit(EXIT_FAILURE);
1          }
1 
1          /* Get the device number */
1          e = cudaGetDevice(&dev);
1          if (e != cudaSuccess)
1          {
1              fprintf(stderr, "cudaGetDevice failed %d\n", e);
1              exit(EXIT_FAILURE);
1          }
1 
1          /* Initialize OpenACC library and use device 'dev' */
1          acc_set_device_num(dev, acc_device_nvidia);
1 
1                               Use Case 1
1 
1 8.3 First invocation: OpenACC library API
1 =========================================
1 
1 In this second use case (see below), a function in the OpenACC library,
1 specifically 'acc_set_device_num()', is called prior to any of the
1 functions in the CUBLAS library.
1 
1    In the use case presented here, the function 'acc_set_device_num()'
1 is used to both initialize the OpenACC library and allocate the hardware
1 resources on the host and the device.  The call parameters specify
1 which device to use and what device type to use, i.e.,
1 'acc_device_nvidia'.  This is only one method of initializing the
1 OpenACC library and allocating the appropriate hardware resources.
1 Other methods are available through the use of environment variables;
1 these are discussed in the next section.
1 
1    Once the call to 'acc_set_device_num()' has completed, other OpenACC
1 functions can be called as seen with multiple calls being made to
1 'acc_copyin()'.  In addition, calls can be made to functions in the
1 CUBLAS library.  In the use case a call to 'cublasCreate()' is made
1 subsequent to the calls to 'acc_copyin()'.  As seen in the previous use
1 case, a call to 'cublasCreate()' initializes the CUBLAS library and
1 allocates the hardware resources on the host and the device.  However,
1 since the device has already been allocated, 'cublasCreate()' will only
1 initialize the CUBLAS library and allocate the appropriate hardware
1 resources on the host.  The context that was created as part of the
1 OpenACC initialization is shared with the CUBLAS library, similarly to
1 the first use case.
1 
1          dev = 0;
1 
1          acc_set_device_num(dev, acc_device_nvidia);
1 
1          /* Copy the first set to the device */
1          d_X = acc_copyin(&h_X[0], N * sizeof (float));
1          if (d_X == NULL)
1          {
1              fprintf(stderr, "copyin error h_X\n");
1              exit(EXIT_FAILURE);
1          }
1 
1          /* Copy the second set to the device */
1          d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
1          if (d_Y == NULL)
1          {
1              fprintf(stderr, "copyin error h_Y1\n");
1              exit(EXIT_FAILURE);
1          }
1 
1          /* Create the handle */
1          s = cublasCreate(&h);
1          if (s != CUBLAS_STATUS_SUCCESS)
1          {
1              fprintf(stderr, "cublasCreate failed %d\n", s);
1              exit(EXIT_FAILURE);
1          }
1 
1          /* Perform saxpy using CUBLAS library function */
1          s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
1          if (s != CUBLAS_STATUS_SUCCESS)
1          {
1              fprintf(stderr, "cublasSaxpy failed %d\n", s);
1              exit(EXIT_FAILURE);
1          }
1 
1          /* Copy the results from the device */
1          acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
1 
1                               Use Case 2
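
The result copied back by 'acc_memcpy_from_device()' can be checked
against a host computation.  The following is a minimal sketch and not
part of the use case itself.  It assumes the usual saxpy definition
(y = alpha * x + y, with unit strides as in the call above); the names
'saxpy_host' and 'first_mismatch' are hypothetical helpers introduced
only for illustration.

```c
/* Host reference for saxpy with unit strides:
   y[i] = alpha * x[i] + y[i] for i in [0, n).  */
void
saxpy_host (int n, float alpha, const float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = alpha * x[i] + y[i];
}

/* Return the index of the first element where A and B differ by more
   than TOL, or -1 if all N elements agree within TOL.  */
int
first_mismatch (int n, const float *a, const float *b, float tol)
{
  for (int i = 0; i < n; i++)
    {
      float d = a[i] - b[i];
      if (d < 0.0f)
        d = -d;
      if (d > tol)
        return i;
    }
  return -1;
}
```

For example, given a second host array initialized with the same values
as 'h_Y1' (called 'h_Y2' here, an assumed name), one could call
'saxpy_host (N, alpha, h_X, h_Y2)' and, after the copy back from the
device, 'first_mismatch (N, h_Y1, h_Y2, tol)'.  A return value of -1
would indicate agreement within the chosen tolerance.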
1 
1 8.4 OpenACC library and environment variables
1 =============================================
1 
1 There are two environment variables associated with the OpenACC library
1 that may be used to control the device type and device number:
1 'ACC_DEVICE_TYPE' and 'ACC_DEVICE_NUM', respectively.  These two
1 environment variables can be used as an alternative to calling
1 'acc_set_device_num()'.  In the second use case, the device type and
1 device number were specified using 'acc_set_device_num()'.  If,
1 however, the aforementioned environment variables were set, then the
1 call to 'acc_set_device_num()' would not be required.
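
A program that should work with or without these variables might make
the explicit call conditional.  The following is a minimal sketch under
stated assumptions: the helper names 'acc_env_selects_device' and
'select_device' are hypothetical, the sketch conservatively requires
both variables to be set and non-empty before skipping the call, and
the OpenACC call is only compiled when the compiler defines '_OPENACC'.

```c
#include <stdlib.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Return 1 if both ACC_DEVICE_TYPE and ACC_DEVICE_NUM are set to
   non-empty values, 0 otherwise.  Treating an empty value as unset
   is an assumption of this sketch.  */
int
acc_env_selects_device (void)
{
  const char *type = getenv ("ACC_DEVICE_TYPE");
  const char *num = getenv ("ACC_DEVICE_NUM");

  return type != NULL && type[0] != '\0'
         && num != NULL && num[0] != '\0';
}

/* Hypothetical helper: select the device explicitly only when the
   environment has not already done so.  Returns 1 if the explicit
   selection path was taken, 0 if it was left to the environment.  */
int
select_device (int fallback_dev)
{
  if (acc_env_selects_device ())
    return 0;

#ifdef _OPENACC
  acc_set_device_num (fallback_dev, acc_device_nvidia);
#else
  (void) fallback_dev;  /* Built without OpenACC support: nothing to do.  */
#endif
  return 1;
}
```

Such a helper would be called once at startup, before any other OpenACC
or CUBLAS function, so that it matches the ordering of the second use
case.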
1 
1    The use of the environment variables is only relevant when an OpenACC
1 function is called prior to a call to 'cublasCreate()'.  If
1 'cublasCreate()' is called prior to a call to an OpenACC function, then
1 you must call 'acc_set_device_num()'(2).
1 
1    ---------- Footnotes ----------
1 
1    (1) See section 2.26, "Interactions with the CUDA Driver API" in
1 "CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
1 Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5, for
1 additional information on library interoperability.
1 
1    (2) More complete information about 'ACC_DEVICE_TYPE' and
1 'ACC_DEVICE_NUM' can be found in sections 4.1 and 4.2 of "The OpenACC
1 Application Programming Interface", Version 2.0
1 (https://www.openacc.org).
1