CUDA SDK Wrapper Library
The CUDA SDK wrapper library provides efficient resource sharing and resource protection on multi-user GPU clusters, such as NCSA's 32-node, 128-GPU system. It implements the following functionality:
- Virtualization of the physical GPU devices. The virtual devices visible to a user map to a consistent set of physical devices, which provides "user fencing" on shared systems and prevents users from accidentally interfering with one another's work.
- Rotation of the virtual-to-physical mapping for each new process that requests a GPU resource. This lets large parallel jobs use common startup parameters while still targeting multiple devices: each time a new process requests gpu0, the mapping is shifted so that the process receives the next allocated physical device.
- Ensuring NUMA affinity for GPUs on systems with multiple memory controllers, by mapping GPU devices to the CPU cores nearest them. This has been shown to improve host-to-device memory bandwidth by as much as 6-8%.
- Memory scrubbing, which wipes a user's GPU memory after use so that it cannot be read by subsequent users.
When installed, the CUDA SDK wrapper library is force-preloaded (typically via LD_PRELOAD) and intercepts device-allocation calls to the CUDA libraries in order to provide the functionality described above.