BEAST
Currently, BEAST is installed on ANDY and PENZIAS at the CUNY HPC Center. BEAST is a serial program, but it can also be run in parallel with the help of a companion library (BEAGLE) on systems with Graphics Processing Units (GPUs). PENZIAS supports GPU processing using NVIDIA Kepler GPUs, and the BEAGLE 1.0 and 2.0 GPU libraries have been installed there for this purpose. (NOTE: GPU processing on ANDY has been eliminated as the FERMI GPUs there have reached end-of-life.) BEAST can therefore be run either serially or in GPU-accelerated mode on PENZIAS. Benchmarks of BEAST show that GPU acceleration provides a significant performance improvement over basic CPU serial operation.
BEAST's user interface program, 'BEAUti', can be run locally on an office workstation or from the head node of ANDY. The latter option assumes that the user has logged in directly to PENZIAS or ANDY via the secure shell with X-Windows tunneling enabled (e.g. ssh -X my.name@andy.csi.cuny.edu). This second approach is only convenient for those on the College of Staten Island campus who can directly address PENZIAS or ANDY. Moreover, if HPC staff find that using the BEAUti interface is consuming too much CPU time, users will be asked to move their pre-processing work to their desktop, which is the preferred location for pre-processing in general. Details on using ssh to login are provided elsewhere in this document. Among other things, BEAUti is used to convert raw '.nex' files into BEAST XML-based input files. Using ANDY's or PENZIAS's head node for anything compute intensive is forbidden, but these file conversions should be fairly low intensity.
Once a usable BEAST input file has been created (or provided), a SLURM batch script must be written to run the job, either in serial mode or in GPU parallel mode. Below, we show how to run both a serial and a GPU-accelerated job with a test input case (testRNA.xml) available in the BEAST examples directory. The input file may be copied into the user's working directory from BEAST's installation tree for submission with SLURM, as follows:
cp /share/apps/beast/2.1.2/examples/testRNA.xml .
To include all required environment variables and the path to the BEAST executable, run the module load command (the modules utility is discussed in detail above):
module load beast/2.1.2
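To confirm that the module has set up your environment as expected, a quick check along the following lines should show the 'beast' executable on your PATH (a minimal sketch; the exact module listing will vary by system):

module list                # loaded modules; should include beast/2.1.2
which beast                # should print the path to the BEAST executable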
Next, a SLURM batch script must be created to run your job. The first script below shows a serial run that uses the testRNA.xml XML input file.
#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name BEAST_Serial
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=2880

# Find out name of master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname

# You must explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# Point to the execution directory to run
echo ">>>> Begin BEAST Serial Run ..."
beast -m 2880 -seed 666 ./testRNA.xml > beast_ser.out 2>&1
echo ">>>> End BEAST Serial Run ..."
This script can be dropped into a file (say 'beast_serial.job') on ANDY or PENZIAS and run with:
sbatch beast_serial.job
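Once submitted, the job can be monitored with the usual SLURM commands. A minimal sketch, assuming the job name 'BEAST_Serial' from the script above:

squeue -u $USER               # list your pending and running jobs
squeue --name=BEAST_Serial    # check just this job by name
scancel <jobid>               # cancel the job if necessary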
This case should take less than fifteen minutes to run and will produce SLURM output and error files beginning with the job name 'BEAST_Serial', as well as files specific to BEAST. The primary BEAST application output will be written into the user-specified file at the end of the BEAST command line after the greater-than sign. Here it is named 'beast_ser.out'. The expression '2>&1' combines Unix standard output from the program with Unix standard error. Users should always explicitly specify the name of the application's output file in this way to ensure that it is written directly into the user's working directory, which has much more disk space than the SLURM spool directory on /var.
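While the job is running, its progress can be followed from the working directory. A minimal sketch, assuming the output file name 'beast_ser.out' used above (the names of BEAST's own log and tree files depend on the settings in the XML input):

tail -f beast_ser.out    # follow BEAST's screen output as the chain runs
ls -lt                   # most recently written files (e.g. log and tree files) appear first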
Details on the meaning of the SLURM script are covered above in the SLURM section of the Wiki. The most important lines are the '#SBATCH --nodes=1', '#SBATCH --ntasks=1', and '#SBATCH --mem=2880' directives. Together they instruct SLURM to select 1 node with 1 processor (core) and 2,880 MBs of memory for the job. SLURM will place the job wherever the least loaded compute node resources are found. The compute node that SLURM finally selects to run your job will be printed in the SLURM output file by the 'hostname' command.
The HPC Center staff has made two changes to the memory constraints in operation for ALL the BEAST distributed programs (see list below). First, the default minimum memory size has been raised from 64 MBs to 256 MBs. Second, a maximum memory control option has been added to all the programs. It is not required, and if NOT used, programs will use the historical default for Linux jobs, which is 1024 MBs (i.e. 1 GB). If this option is used, it must be the FIRST option included on the execution line in your script and should take the form:
-m XXXXX
where the value 'XXXXX' is the new user-selected maximum memory setting in MBytes. So, the option used in the script above:
-m 2880
would bump up the memory maximum for the 'beast' program to 2,880 MBytes. Notice that this matches the amount requested in the SLURM '--mem' directive above. You should not ask for more memory than you have requested through SLURM.
You may wish to request more memory than the per cpu (core) default on a system. This can be accomplished by asking for more cores from SLURM than you are going to use, while using ALL of the memory SLURM allocates to those cores. For instance, the SLURM directives:
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=11520
request 4 cpus (cores) and 11,520 MBs of memory (4 times the 2,880 MB per-core amount on ANDY). You could make this request of SLURM and then leave the extra cores unused while asking for all of the memory allocated, with a 'beast' execution line of:
beast -m 11520 -seed 666 ./testRNA.xml
This provides 4 times the single-core quantity of memory for your 'beast' run by allocating, but not using, the 4 cores requested in the SLURM directives. The GPU version of 'beast' is serial in the sense that it uses only one CPU core to control the GPU. This memory ceiling management option and technique can be used with any of the programs distributed with BEAST. Another such distributed program would be 'treeannotator', for instance, as sketched below.
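As an illustration of applying the same memory technique to another of the distributed programs, here is a minimal sketch of a SLURM script running 'treeannotator' with a raised memory ceiling. The file names 'testRNA.trees' and 'testRNA_annotated.tree' are placeholders used only for this example; substitute the tree file produced by your own BEAST run:

#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name TREEANN
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=11520

# You must explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# The '-m' memory ceiling option must come first, just as with 'beast';
# the input (.trees) and output file names here are placeholders
treeannotator -m 11520 ./testRNA.trees ./testRNA_annotated.tree > treeann.out 2>&1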
Remember that ANDY has 2,880 MBs of available memory per core (2.880 GBs). PENZIAS has 3,686 MBs of available memory per core.
Note that there is a larger number of command line options available to BEAST. This example uses the defaults, other than setting the seed with '-seed 666'. All of BEAST's options can be listed as follows:
shaker.krit@andy:~> /share/apps/beast/default/bin/beast -help

Using Java MAXMEM default: 2048 MBs

Usage: beast [-verbose] [-warnings] [-strict] [-window] [-options] [-working] [-seed] [-prefix <PREFIX>] [-overwrite] [-errors <i>] [-threads <i>] [-java] [-threshold <r>] [-beagle] [-beagle_info] [-beagle_order <order>] [-beagle_instances <i>] [-beagle_CPU] [-beagle_GPU] [-beagle_SSE] [-beagle_cuda] [-beagle_opencl] [-beagle_single] [-beagle_double] [-beagle_scaling <default|dynamic|delayed|always|none>] [-beagle_rescale <i>] [-mpi] [-mc3_chains <i>] [-mc3_delta <r>] [-mc3_temperatures] [-mc3_swap <i>] [-version] [-help] [<input-file-name>]

  -verbose            Give verbose XML parsing messages
  -warnings           Show warning messages about BEAST XML file
  -strict             Fail on non-conforming BEAST XML file
  -window             Provide a console window
  -options            Display an options dialog
  -working            Change working directory to input file's directory
  -seed               Specify a random number generator seed
  -prefix             Specify a prefix for all output log filenames
  -overwrite          Allow overwriting of log files
  -errors             Specify maximum number of numerical errors before stopping
  -threads            The number of computational threads to use (default auto)
  -java               Use Java only, no native implementations
  -threshold          Full evaluation test threshold (default 1E-6)
  -beagle             Use beagle library if available
  -beagle_info        BEAGLE: show information on available resources
  -beagle_order       BEAGLE: set order of resource use
  -beagle_instances   BEAGLE: divide site patterns amongst instances
  -beagle_CPU         BEAGLE: use CPU instance
  -beagle_GPU         BEAGLE: use GPU instance if available
  -beagle_SSE         BEAGLE: use SSE extensions if available
  -beagle_cuda        BEAGLE: use CUDA parallization if available
  -beagle_opencl      BEAGLE: use OpenCL parallization if available
  -beagle_single      BEAGLE: use single precision if available
  -beagle_double      BEAGLE: use double precision if available
  -beagle_scaling     BEAGLE: specify scaling scheme to use
  -beagle_rescale     BEAGLE: frequency of rescaling (dynamic scaling only)
  -mpi                Use MPI rank to label output
  -mc3_chains         number of chains
  -mc3_delta          temperature increment parameter
  -mc3_temperatures   a comma-separated list of the hot chain temperatures
  -mc3_swap           frequency at which chains temperatures will be swapped
  -version            Print the version and credits and stop
  -help               Print this information and stop

  Example: beast test.xml
  Example: beast -window test.xml
  Example: beast -help
The CUNY HPC Center also provides a GPU-accelerated version of BEAST. This version can be run ONLY on PENZIAS, which supports GPUs. The same 'beast' module, which also loads the BEAGLE GPU library, must be loaded as shown previously. NOTE: ANDY no longer supports GPU computation as its Fermi GPUs have reached end-of-life failure rates.
A SLURM batch script for running the GPU-accelerated version of BEAST follows:
#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name BEAST_gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --mem=2880
#SBATCH --accel=kepler

# Find out name of master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname

# You must explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# Point to the execution directory to run
echo ">>>> Begin BEAST GPU Run ..."
beast_gpu -m 2880 -beagle -beagle_GPU -beagle_single -seed 666 ./testRNA.xml > beast_gpu.out 2>&1
echo ">>>> End BEAST GPU Run ..."
This script has several unique features. The SLURM directives include requests for GPU-related resources: both 1 processor ('--ntasks=1') and 1 GPU ('--gres=gpu:1') are requested. You need both. The type of GPU accelerator is specified as an NVIDIA Kepler GPU ('--accel=kepler'), which has 832 double-precision processors (and 2,496 single-precision processors) running at 0.705 GHz to apply to this workload. These GPU processing cores, while less powerful individually than a CPU core, in concert are what deliver the performance of the highly parallel MCMC algorithm.
In addition, GPU-specific command-line options are required to invoke the GPU version of BEAST. Here we have requested that the BEAGLE GPU library be used and that the computation be run in single precision (32 bits as opposed to 64 bits) on the GPU, which is as much as 3X faster than double precision on NVIDIA Kepler if you can get by with single precision.
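Before committing to a long GPU run, it can be useful to confirm which BEAGLE resources BEAST can actually see on the node. Below is a minimal sketch using the '-beagle_info' option listed in the help output above; the exact resource listing will depend on the compute node the job lands on:

#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name BEAGLE_info
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1

# You must explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# Print the BEAGLE resources (CPU and GPU instances) visible to BEAST and exit
beast_gpu -beagle_info > beagle_info.out 2>&1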
All the programs that are part of the BEAST 2.1.2 distribution are available, even though we have only discussed 'beast' itself in detail here. The other programs, all of which can be run with similar scripts, include:
addonmanager beast beauti densitree loganalyser logcombiner treeannotator