CUFFLINKS
At the CUNY HPC Center CUFFLINKS is installed on ANDY. CUFFLINKS is a parallel, threaded (pthreads) code that takes its input from a simple text file provided on the command line. Below is an example SLURM script that will run the messenger RNA test case provided at the website here [1].
To include all required environmental variables and the path to the CUFFLINKS executable, run the module load command (the modules utility is discussed in detail above):
module load cufflinks
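To confirm that the environment has been set up, a quick check such as the following should work on most module-based systems (the exact output will vary with the installed version):

module list
which cufflinks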
Running 'cufflinks' from the interactive prompt without any options will print a brief description of the command-line arguments and options. Here is a SLURM batch script that runs this test case in serial mode:
#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name CLINKS2_Serial
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=2880

# Find out the name of the master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname

# You must explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# Invoke the executable in command-line mode to run
echo ">>>> Begin CLINKS Serial Run ..."
cufflinks ./mRNA_test.sam > mRNA_test.out 2>&1
echo ">>>> End CLINKS Serial Run ..."
This script can be dropped into a file (say cufflinks.job) and submitted with the command:
sbatch cufflinks.job
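Once submitted, the job's progress can be followed with the standard SLURM commands, for example (replace <jobid> with the job ID that sbatch reports; sacct is available only if job accounting is enabled on the cluster):

squeue -u $USER
sacct -j <jobid> --format=JobID,JobName,State,Elapsed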
Running the mRNA test case should take less than 1 minute and will produce SLURM output and error files beginning with the job name 'CLINKS2_Serial'. The primary CUFFLINKS application results will be written into the user-specified file named at the end of the CUFFLINKS command line after the greater-than sign; here it is named 'mRNA_test.out'. The expression '2>&1' combines the program's Unix standard output with its Unix standard error. Users should always explicitly specify the name of the application's output file in this way to ensure that it is written directly into the user's working directory, which has much more disk space than the SLURM spool directory on /var.
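As a quick illustration of the redirection syntax, the two hypothetical commands below (with 'some_command' standing in for any program) show the difference between capturing only standard output and capturing both streams:

some_command > out.log          # standard output to out.log; errors still go to the SLURM error file
some_command > out.log 2>&1     # standard output and standard error both captured in out.log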
Details on the meaning of the SLURM script are covered below in the SLURM section. The most important lines are the '#SBATCH --nodes=1', '#SBATCH --ntasks=1', and '#SBATCH --mem=2880' directives. Together these instruct SLURM to allocate a single node with 1 processor (core) and 2,880 MBs of memory for the job; SLURM will place the job on whichever compute node has these resources free. The master compute node that SLURM finally selects to run your job will be printed in the SLURM output file by the 'hostname' command.
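If you would like the job itself to report what SLURM actually allocated, a few extra echo lines such as the sketch below can be added to the script after the 'hostname' command; these environment variables are standard in current SLURM releases, though their availability can vary with the site's configuration:

echo ">>>> Allocated node list : $SLURM_JOB_NODELIST"
echo ">>>> Tasks requested     : $SLURM_NTASKS"
echo ">>>> Memory per node (MB): $SLURM_MEM_PER_NODE"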
To run CUFFLINKS in parallel-threads mode several changes to the script are required. Here is a modified script that shows how to run CUFFLINKS using two threads. ANDY has as many as 8 physical compute cores per compute node, and therefore as many as 8 threads might be chosen, but the larger the number of cores-threads requested, the longer the job may wait to start while SLURM looks for a compute node with the requested resources free.
#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name CLINKS_threads
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=5760

# Find out the name of the master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname

# You must explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# Invoke the executable in command-line mode to run
echo ">>>> Begin CLINKS Threaded Run ..."
cufflinks -p 2 ./clinks_ptest.sam > clinks_ptest.out 2>&1
echo ">>>> End CLINKS Threaded Run ..."
Notice the difference in the resource request, where the job now asks for 2 cores ('--cpus-per-task=2') and twice as much memory as before. Also, notice that the CUFFLINKS command line now includes the '-p 2' option to run the code with 2 threads working in parallel. Perfectly or 'embarrassingly' parallel workloads can run close to 2, 4, or more times as fast as the same workload in serial mode depending on the number of threads requested, but workloads cannot be counted on to be perfectly parallel.
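To scale the same job up further, only the resource request and the '-p' option need to change. The sketch below shows the lines that would differ for a hypothetical 4-thread run; the memory figure is simply scaled in proportion to the 2-thread example and may need adjusting for your own data:

#SBATCH --cpus-per-task=4
#SBATCH --mem=11520
...
cufflinks -p 4 ./clinks_ptest.sam > clinks_ptest.out 2>&1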
The speed-ups that you observe will typically be less than perfect and diminish as you ask for more cores-threads. Larger jobs will typically scale more efficiently as you add cores-threads, but users should take note of the performance gains that they see as cores-threads are added and select a core-thread count that provides efficient scaling and avoids diminishing returns.
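One simple way to gauge this for your own data is to time a short, representative run at several thread counts and compare the elapsed times. The sketch below is only illustrative (the file names are placeholders, and GNU time is assumed for '/usr/bin/time -f'); it would be placed inside a batch script that requests the largest thread count being tested:

for n in 1 2 4 8; do
    /usr/bin/time -f "threads=$n elapsed=%E" -o timing_${n}.txt \
        cufflinks -p $n ./my_sample.sam > run_${n}.out 2>&1
done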