THRUST
Thrust provides a rich collection of data parallel primitives such as scan, sort, and reduce, which can be combined to implement complex algorithms with concise, readable source code. By describing your computation in terms of these high-level abstractions, you give Thrust the freedom to select the most efficient implementation automatically. As a result, Thrust can be used for rapid prototyping of CUDA applications, where programmer productivity matters most, as well as in production, where robustness and absolute performance are crucial.
More detail on the Thrust library is available here [1]. There is a collection of example codes here [2]. The Thrust Manual is available here [3].
Here is a basic C++ example that creates and fills a vector on the host, resizes it, copies it to the device, modifies it there, and prints the modified values.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

#include <iostream>

int main(void)
{
    // H has storage for 4 integers
    thrust::host_vector<int> H(4);

    // initialize individual elements
    H[0] = 14;
    H[1] = 20;
    H[2] = 38;
    H[3] = 46;

    // H.size() returns the size of vector H
    std::cout << "H has size " << H.size() << std::endl;

    // print contents of H
    for(int i = 0; i < H.size(); i++)
        std::cout << "H[" << i << "] = " << H[i] << std::endl;

    // resize H
    H.resize(2);

    std::cout << "H now has size " << H.size() << std::endl;

    // Copy host_vector H to device_vector D
    thrust::device_vector<int> D = H;

    // elements of D can be modified
    D[0] = 99;
    D[1] = 88;

    // print contents of D
    for(int i = 0; i < D.size(); i++)
        std::cout << "D[" << i << "] = " << D[i] << std::endl;

    // H and D are automatically deleted when the function returns
    return 0;
}
Assuming this source file is called 'vectcopy.cu', it can be compiled on PENZIAS with:
nvcc -o vectcopy.exe vectcopy.cu
Once compiled, the 'vectcopy.exe' executable can be run using the following PBS script:
#!/bin/bash
#PBS -q production_gpu
#PBS -N THRUST_vcopy
#PBS -l select=1:ncpus=1:ngpus=1
#PBS -l place=free
#PBS -V

# Find out which compute node the job is using
echo ""
echo -n "Running job on compute node ... "
hostname

echo ""
echo "PBS node file is located here ... " $PBS_NODEFILE
echo -n "PBS node file contains ... "
cat $PBS_NODEFILE
echo ""

# Change to working directory
cd $PBS_O_WORKDIR

# Running executable on a single, gpu-enabled
# compute node using 1 CPU and 1 GPU.
echo "CUDA job is starting ... "
echo ""
./vectcopy.exe
echo ""
echo "CUDA job is done!"