THRUST

From HPCC Wiki
Revision as of 19:02, 20 October 2022 by James

Thrust provides a rich collection of data parallel primitives such as scan, sort, and reduce, which can be combined together to implement complex algorithms with concise, readable source code. By describing your computation in terms of these high-level abstractions you provide Thrust with the freedom to select the most efficient implementation automatically. As a result, Thrust can be utilized in rapid prototyping of CUDA applications, where programmer productivity matters most, as well as in production, where robustness and absolute performance are crucial.

More detail on the Thrust library is available here [1]. A collection of example codes is available here [2]. The Thrust Manual is available here [3].

Here is a basic C++ example that creates and fills a vector on the host, resizes it, copies it to the device, modifies it there, and prints the modified values.

#include <thrust/host_vector.h>
#include <thrust/device_vector.h>

#include <iostream>

int main(void)
{
    // H has storage for 4 integers
    thrust::host_vector<int> H(4);

    // initialize individual elements
    H[0] = 14;
    H[1] = 20;
    H[2] = 38;
    H[3] = 46;
    
    // H.size() returns the size of vector H
    std::cout << "H has size " << H.size() << std::endl;

    // print contents of H
    for(size_t i = 0; i < H.size(); i++)
        std::cout << "H[" << i << "] = " << H[i] << std::endl;

    // resize H
    H.resize(2);
    
    std::cout << "H now has size " << H.size() << std::endl;

    // Copy host_vector H to device_vector D
    thrust::device_vector<int> D = H;
    
    // elements of D can be modified
    D[0] = 99;
    D[1] = 88;
    
    // print contents of D
    for(size_t i = 0; i < D.size(); i++)
        std::cout << "D[" << i << "] = " << D[i] << std::endl;

    // H and D are automatically deleted when the function returns
    return 0;
}

Assuming this source file is called 'vectcopy.cu', it can be compiled on PENZIAS with:

nvcc -o vectcopy.exe vectcopy.cu

Once compiled, the 'vectcopy.exe' executable can be run using the following PBS script:

#!/bin/bash
#PBS -q production_gpu
#PBS -N THRUST_vcopy
#PBS -l select=1:ncpus=1:ngpus=1 
#PBS -l place=free
#PBS -V

# Find out which compute node the job is using
echo ""
echo -n "Running job on compute node ... " 
hostname

echo ""
echo "PBS node file is located here ... "  $PBS_NODEFILE
echo -n "PBS node file contains ... "
cat  $PBS_NODEFILE
echo ""

# Change to working directory
cd $PBS_O_WORKDIR

# Running executable on a single, gpu-enabled
# compute node using 1 CPU and 1 GPU.
echo "CUDA job is starting ... "
echo ""

./vectcopy.exe

echo ""
echo "CUDA job is done!"