NWChem


NWChem has been developed by the Molecular Sciences Software group at the Department of Energy's EMSL. The software is available on PENZIAS and ANDY. On both servers the NWChem module must be loaded prior to running a job, using the command:

module load nwchem

The above command loads the default version of NWChem.
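
If more than one version is installed, the available modules can be listed and a specific version loaded explicitly. The exact module names below are illustrative and depend on the installation (the batch script later on this page refers to version 6.5):

module avail nwchem
module load nwchem/6.5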

Performance Considerations

NWChem is GPU-enabled, so the package can be used either in CPU-only mode or in a "hybrid" CPU-GPU mode. ANDY has no GPUs, so only the CPU mode is possible there. On PENZIAS, NWChem can be used in either CPU or CPU-GPU mode. Furthermore, because ANDY has a slower interconnect, it is recommended to use 1 core per chunk when running NWChem jobs on ANDY. The interconnect on PENZIAS allows full utilization of the nodes, i.e. the maximum number of cores per chunk is 12. Users should consult the wiki page http://wiki.csi.cuny.edu/cunyhpc/index.php/Running_jobs for a discussion of how to run a job on a cluster. Given the differences in interconnect and the lack of GPUs on ANDY, it is strongly recommended that users run NWChem on PENZIAS in CPU-GPU mode. A minimal sketch of the corresponding SLURM requests is shown below.
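
As a sketch, the "1 core per chunk" recommendation for ANDY and the fully packed 12-core chunks on PENZIAS translate roughly into the following SLURM requests, assuming one chunk corresponds to one node; the node counts here are illustrative only:

# ANDY: 16 chunks of 1 core each
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=1

# PENZIAS: 2 fully packed 12-core nodes
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12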

NWChem input files

Whether a run starts at all, and how well it performs, depends very much on proper settings of the start-up directives in the NWChem input file. For instance, one particular directive - memory - allows the user to specify the amount of memory PER PROCESSOR CORE that NWChem can use for the job. If this directive is omitted from the input file, NWChem will use the default setting, which is currently only 400 MB. NWChem distinguishes three regions of memory: stack, heap, and global. On PENZIAS (and on all distributed-memory systems) all three regions compete for the same pool, i.e. the total memory size is stack + heap + global. The default partition is 25% heap, 25% stack, and 50% global; thus 4096 MB will be partitioned as 1024 MB for stack, 1024 MB for heap, and 2048 MB for global. In the following example the first two lines are equivalent: each allocates the total per-core memory available on a PENZIAS node and uses the default partitioning. The third line allocates the same total but changes the partition by assigning 75% of the total memory to the global region.

memory 3686 mb
memory total 3686 mb
memory total 3686 global 2764 mb
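
The partition can also be spelled out region by region. The following line reproduces the default 25/25/50 split for the 4096 MB case discussed above (a sketch; the values are simply the arithmetic from that example):

memory heap 1024 mb stack 1024 mb global 2048 mb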

NWChem recognizes the following memory units:

real 
double
integer
byte
kb (kilobyte)
mb (megabyte)
mw (megawords - 8 bytes)
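
Since a word is 8 bytes here, a size given in megawords is eight times the same number given in megabytes. For example, the following two directives request the same amount of memory (shown for illustration):

memory total 4096 mb
memory total 512 mw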

For details on the content and structure of each section of the NWChem input deck, users should consult the NWChem Users Manual at http://www.emsl.pnl.gov/capabilities/computing/nwchem/docs/usermanual.pdf. A sample NWChem input file which performs an SCF calculation on water with the 6-31g* basis set is shown here:

echo
start water2
title "an example simple water calculation"

# The memory options are system specific; see below. The batch script
# further down this page appends the memory line automatically, so the
# line below is left as a commented placeholder.
# memory total   .... mb global .... mb

geometry units au
 O   0.000   0.000   0.000
 H   0.000   1.430  -1.107
 H   0.000  -1.430  -1.107
end

basis
  O library 6-31g*
  H library 6-31g*
end

task scf gradient

If the run is to be done in parallel on PENZIAS, the total per-core memory (and, within it, the portion used by NWChem's Global Arrays computing model) must not exceed the standard maximum of 3686 MB per core available on a node. Thus the memory line for parallel runs on PENZIAS should look like:

memory total 3686 mb global 2764 mb

On ANDY it would be 'memory total 2880 mb global 2160 mb'. Note that in both cases the global region is 75% of the total. Single-core runs on PENZIAS require different settings.

Using NWChem in CPU only mode

Below is a SLURM batch submit script that runs the above example on 16 processor cores on PENZIAS. Note that on PENZIAS the main queue is called production. In this script the maximal values for memory (total and global) are appended to the above example automatically. These values may not be optimal for a particular molecular system, so users should compare the performance obtained with several memory setups in order to find the optimal values for their system.

 
#!/bin/csh
#SBATCH --partition production
#SBATCH --job-name watertest
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=1

echo "This job's process 0 host is: " `hostname`; echo ""

# Must explicitly change to your working directory under SLURM

cd $SLURM_SUBMIT_DIR

# Load the NWChem module and set up the NWCHEM environment, permanent,
# and scratch directories.
# Do not delete/modify the following 5 lines.

module load nwchem
setenv NWCHEM_ROOT /share/apps/nwchem/6.5
setenv PERMANENT_DIR $SLURM_SUBMIT_DIR
setenv MY_SCRDIR `whoami;date '+%m.%d.%y_%H:%M:%S'`
setenv MY_SCRDIR `echo $MY_SCRDIR | sed -e 's; ;_;'`

#Edit following 2 lines to match your input and output file.

set INPUT_FILE="h2o.nw"
set OUTPUT_FILE="h2o.out"

# Edit following 2 lines in order to adjust the optimal memory for your system
set MEMORY_GLBL = 2764
set MEMORY_TTL = 3686

# Do not edit the following 7 lines
set RUN_FILE = "runner.nw"
cp ${INPUT_FILE} ${RUN_FILE}
setenv SCRATCH_DIR /state/partition1/nw6.3_scr/${MY_SCRDIR}_$$
mkdir -p $SCRATCH_DIR
printf '%s\t%s\n' 'scratch_dir'   $SCRATCH_DIR >> ${RUN_FILE}
printf '%s\t%s\n' 'permanent_dir' $PERMANENT_DIR >> ${RUN_FILE}
printf '%s%s%s%s%s%s\n' 'memory   ' 'total   ' $MEMORY_TTL '  global   ' $MEMORY_GLBL '  mb' >> ${RUN_FILE}

# Starts the NWCHEM job. Adjust ONLY the number of CPUs requested.
# Do not remove/modify $RUN_FILE or $OUTPUT_FILE

mpirun -np 16 nwchem ${RUN_FILE} > ${OUTPUT_FILE}

# Clean up scratch and temporary files. Do not remove the following 2 lines.

/bin/rm -r $SCRATCH_DIR
/bin/rm -r $RUN_FILE

echo 'Job is done!'
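
Assuming the script above is saved as, e.g., watertest.slurm (the file name is arbitrary), it is submitted with:

sbatch watertest.slurm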

Please consult the sections on the SLURM batch scheduling system for information on how to modify this sample deck for different processor counts and for the meaning of each of the SLURM script lines.

On PENZIAS, in order to utilize all available memory, and depending on the concrete system under study, it is sometimes better to use the following SLURM construction (2 tasks on each of 8 nodes, i.e. 16 cores in total):

#SBATCH --nodes=8
#SBATCH --ntasks-per-node=2
#SBATCH --mem-per-cpu=3686

In order to run on 128 cores on PENZIAS (8 tasks on each of 16 nodes), the lines should look like:

#SBATCH --nodes=16
#SBATCH --ntasks-per-node=8
#SBATCH --mem-per-cpu=3686

Please do not forget to adjust the memory requirements in the above SLURM script according to your particular molecular system; in particular, adjust the values of the MEMORY_TTL and MEMORY_GLBL variables. Remember that performance also depends on how memory is allocated, so try a few allocation schemes before picking the best one for your job. For some small molecular systems it is possible to gain performance by reducing memory and keeping the job on a single node, as sketched below.
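
A hypothetical single-node setup for a small molecule might look like the following; the core count and memory values are illustrative only, not tuned recommendations:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8

set MEMORY_TTL = 2048
set MEMORY_GLBL = 1536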

The optimal number of cores also depends on the molecular system under study; there is no "one size fits all" setting. Users can find the optimal number of processor cores by starting with a small number, e.g. 4 or 8, and doubling the number of cores for each consecutive run. Repeat the process until two consecutive runs show no significant improvement in performance; further increases in core count should then be avoided. A sketch of such a sweep is shown below.
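
As a sketch of this procedure, and assuming the mpirun line in the batch script is changed to 'mpirun -np $SLURM_NTASKS nwchem ...' instead of a hardcoded count, the same job could be submitted at doubling core counts from the command line (csh syntax to match the script above; job and file names are hypothetical):

# Submit the same job at 4, 8, 16, and 32 cores, then compare wall times
# in the output files to find where the scaling flattens out.
foreach n (4 8 16 32)
    sbatch --nodes=$n --job-name=water_np$n watertest.slurm
end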

Final Remarks

To get their NWChem jobs to run, each user needs a ".nwchemrc" file in their $HOME directory: either a copy of, or a symbolic link to, the site-specific "default.nwchemrc" file located in:

/share/apps/nwchem/default/data/

The symbolic link can be created with the command:

ln -s /share/apps/nwchem/default/data/default.nwchemrc $HOME/.nwchemrc
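
The link can then be verified with, e.g.:

ls -l $HOME/.nwchemrc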

Users may also check the Q/A section of this document for common mistakes and their solutions.