OCTOPUS


Selecting the right queue based on system activity helps ensure that your job starts as soon as possible. Complete information about the Octopus package can be found at its homepage, http://www.tddft.org/programs/octopus. The on-line user manual is available at http://www.tddft.org/programs/octopus/wiki/index.php/Manual.

The MPI-parallel version of Octopus has been installed on PENZIAS and ANDY (an older release is also installed on ANDY) along with all of its associated libraries (METIS, NetCDF, SPARSKIT, ETSF_IO, etc.). It was built with an Intel-compiled version of OpenMPI 1.6.4 and has passed all of its internal test cases.
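As a quick check before submitting a batch job, the module can be loaded and the MPI binary located from an interactive login shell. This is only a sketch; the module name 'octopus' and the binary name 'octopus_mpi' are taken from the batch script further down this page.

# Load the Octopus module and confirm the MPI binary is on the PATH
module load octopus
which octopus_mpi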

A sample Octopus input file (which must be named 'inp') is provided here:

# Sample data file:
#
# This is a simple data file. It will complete a gas phase ground-state
# calculation for a neon atom. Please consult the Octopus manual for a
# brief explanation of each section and the variables.
#
FromScratch = yes

CalculationMode = gs

ParallelizationStrategy = par_domains

Dimensions = 1
Spacing = 0.2
Radius = 50.0
ExtraStates = 1

TheoryLevel = independent_particles

%Species
  "Neon1D" | 1 | spec_user_defined | 10 | "-10/sqrt(0.25 + x^2)"
%

%Coordinates
  "Neon1D" | 0
%

ConvRelDens = 1e-7

Octopus offers its users two distinct and combinable strategies to parallelize its runs. The first, and the default, is to parallelize by domain decomposition of the mesh (METIS is used). In the input deck above, this method is chosen explicitly (ParallelizationStrategy = par_domains). The second is to compute over the entire domain on each processor, but to assign each processor a distinct subset of the states (ParallelizationStrategy = par_states). Users wishing to control the details of Octopus runs in parallel are advised to consult the advanced options section of the manual at http://www.tddft.org/programs/octopus/wiki/index.php/Manual:Advanced_ways_of_running_Octopus.
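For example, switching the sample deck above from domain to states parallelization requires changing only the strategy line in 'inp' (a minimal sketch; all other variables stay as in the sample):

# Distribute the work over states rather than mesh domains
ParallelizationStrategy = par_states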

A sample SLURM batch job submission script that will run on PENZIAS with the above input file is shown here:

#!/bin/csh
#SBATCH --partition production
#SBATCH --job-name neon_gstate
# The next statements select 8 chunks of 1 core and
# 3840mb of memory each (the pro-rated limit per
# core on PENZIAS), and allow SLURM to freely place
# those resource chunks on the least loaded nodes.
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=3840

# Check to see if the Octopus module is loaded.
which octopus_mpi >& /dev/null
if ($status) then
  echo ""
  echo "Please run: 'module load octopus'"
  echo "before submitting this script. Exiting ... "
  echo ""
  exit 1
else
  echo ""
endif

# Find out name of master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname
echo ""

# Must explicitly change to your working directory under SLURM
cd $SLURM_SUBMIT_DIR

# Set up OCTOPUS environment, working, and temporary directory

setenv OCTOPUS_ROOT /share/apps/octopus/default

setenv OCT_WorkDir \'$SLURM_SUBMIT_DIR\'

setenv MY_SCRDIR `whoami;date '+%m.%d.%y_%H:%M:%S'`
setenv MY_SCRDIR `echo $MY_SCRDIR | sed -e 's; ;_;'`

setenv SCRATCH_DIR  /state/partition1/oct4.1_scr/${MY_SCRDIR}_$$
mkdir -p $SCRATCH_DIR
setenv OCT_TmpDir \'/state/partition1/oct4.1_scr/${MY_SCRDIR}_$$\'

echo "The scratch directory for this run is: $OCT_TmpDir"

# Start OCTOPUS job

echo ""
echo ">>>> Begin OCTOPUS MPI Parallel Run ..."
mpirun -np 8 octopus_mpi > neon_gstate.out
echo ">>>> End   OCTOPUS MPI Parallel Run ..."
echo ""

# Clean up scratch files by default

/bin/rm -r $SCRATCH_DIR

echo 'Your Octopus job is done!'


This script requests 8 single-core tasks (the SLURM analogue of 8 resource 'chunks' of 1 processor each). The memory requested with '--mem-per-cpu' is sized to PENZIAS's pro-rated maximum memory per core. Please consult the sections on the SLURM batch scheduling system below for information on how to modify this sample deck for different processor counts. The rest of the script describes its actions in comments. Before this script will run, the user must load the Octopus module with:

module load octopus

which by default loads Octopus version 4.1.1. This script would need to be modified as follows to run on ANDY:

< # 3840mb of memory each (the pro-rated limit per
---
> # 2880mb of memory each (the pro-rated limit per

< #SBATCH --mem-per-cpu=3840
---
> #SBATCH --mem-per-cpu=2880

< setenv SCRATCH_DIR  /state/partition1/oct4.1_scr/${MY_SCRDIR}_$$
---
> setenv SCRATCH_DIR  /scratch/<user_id>/octopus/oct4.1_scr/${MY_SCRDIR}_$$

< setenv OCT_TmpDir \'/state/partition1/oct4.1_scr/${MY_SCRDIR}_$$\'
---
> setenv OCT_TmpDir \'/scratch/<user_id>/octopus/oct4.1_scr/${MY_SCRDIR}_$$\'

Users should become aware of the scaling properties of their work by taking note of the run times at various processor counts. When doubling the processor count improves SCF cycle time by only a modest percentage, further increases in processor count should be avoided. The ANDY system has two distinct interconnects: a DDR InfiniBand network that delivers 20 Gbits per second and a QDR InfiniBand network that delivers 40 Gbits per second. Either will serve Octopus users well, but the QDR network should provide somewhat better scaling. PENZIAS has a still faster FDR InfiniBand network and should provide the best scaling. The HPC Center is interested in the scaling you observe on its systems, and reports are welcome.
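As a sketch of such a scaling test (assuming the same script layout as above), doubling the run from 8 to 16 cores requires changing only the task request and the matching mpirun rank count:

# Request 16 single-core tasks instead of 8 ...
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=3840

# ... and launch 16 MPI ranks to match
mpirun -np 16 octopus_mpi > neon_gstate.out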

In the example above, the 'production' queue has been requested, which works on both ANDY (DDR InfiniBand) and PENZIAS (FDR InfiniBand), but by appending '_qdr' one can select the QDR interconnect on ANDY.
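For example, to direct the job to ANDY's QDR nodes, the partition line in the script would read:

# Select the QDR InfiniBand interconnect on ANDY
#SBATCH --partition production_qdr

The modified script (here given the hypothetical name 'neon_gstate.csh') is then submitted from a login node with:

sbatch neon_gstate.csh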