ADCIRC

The CUNY HPC Center has installed ADCIRC version 50.79 on SALK (the Cray) and ANDY (the SGI) for general academic use. ADCIRC can be run in serial or MPI-parallel mode on either system and has demonstrated good scaling up to 512 cores on SALK and 64 cores on ANDY. A step-by-step walk-through of running an ADCIRC test case in both serial and parallel mode follows.

Serial Execution

Create a directory where all the files needed to run the serial ADCIRC job will be kept.

salk$ mkdir test_sadcirc
salk$ cd test_sadcirc

Copy the Shinnecock Inlet example from the ADCIRC installation tree and unzip it.

salk$ cp /share/apps/adcirc/default/testcase/serial_shinnecock_inlet.zip ./
salk$ unzip ./serial_shinnecock_inlet.zip 
Archive:  ./serial_shinnecock_inlet.zip
  inflating: serial_shinnecock_inlet/fort.14  
  inflating: serial_shinnecock_inlet/fort.15  
  inflating: serial_shinnecock_inlet/fort.16  
  inflating: serial_shinnecock_inlet/fort.63  
  inflating: serial_shinnecock_inlet/fort.64  

Change into the unpacked subdirectory.

salk$ cd serial_shinnecock_inlet/

There you should find the following files:

salk$ ls
fort.14  fort.15  fort.16  fort.63  fort.64

Next, create a SLURM script (named 'sadcirc.job' here) with the following lines in it to submit the serial ADCIRC job to the Cray (SALK) SLURM queues. Note that on SALK a serial job still ties up (and wastes most of) a full 16-core compute node, because fractional compute nodes cannot be allocated on SALK.

#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name SADCIRC.test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=2048
#SBATCH --output=sadcirc.out

# Find out name of master execution host (compute node)
echo ""
echo -n ">>>> SBATCH Master compute node is: "
hostname

# You must explicitly change to the working directory in your SLURM script
cd $SLURM_SUBMIT_DIR

echo ">>>> Begin ADCRIC Serial Run ..."
aprun -n 1 /share/apps/adcirc/default/bin/adcirc
echo ">>>> End   ADCRIC Serial Run ..."

And finally, to submit the serial job to the SLURM queue, enter:

salk$ sbatch sadcirc.job
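
Job progress can be checked with the standard SLURM tools, for example:

salk$ squeue -u $USER

When the job finishes, the output of the run appears in the file sadcirc.out (named in the script above) in the directory from which the job was submitted.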

Parallel Execution

Running ADCIRC in parallel requires additional mesh-partitioning and domain-decomposition steps based on the number of processors planned for the job. As before, create a directory where all the files needed for the job will be kept:

salk$ mkdir test_padcirc
salk$ cd test_padcirc

Again, copy the Shinnecock Inlet example from the ADCIRC installation tree and unzip it. The starting point for the serial and parallel tests is the same, but for the parallel case the serial data set used above is partitioned and decomposed for the parallel run.

salk$ cp /share/apps/adcirc/default/testcase/serial_shinnecock_inlet.zip ./
salk$ unzip ./serial_shinnecock_inlet.zip 
Archive:  ./serial_shinnecock_inlet.zip
  inflating: serial_shinnecock_inlet/fort.14  
  inflating: serial_shinnecock_inlet/fort.15  
  inflating: serial_shinnecock_inlet/fort.16  
  inflating: serial_shinnecock_inlet/fort.63  
  inflating: serial_shinnecock_inlet/fort.64  

Rename the directory you just unpacked and change into it:

salk$ mv  serial_shinnecock_inlet  parallel_shinnecock_inlet
salk$ cd parallel_shinnecock_inlet/

Now run the ADCIRC preparation program 'adcprep' to partition the serial domain and decompose the problem:

salk$ /share/apps/adcirc/default/bin/adcprep 

When prompted, enter 8 for the number of processors to be used in this parallel example:


  *****************************************
  ADCPREP Fortran90 Version 2.3  10/18/2006
  Serial version of ADCIRC Pre-processor   
  *****************************************
  
 Input number of processors for parallel ADCIRC run:
8

Next, enter 1 to partition the domain for 8 processors using METIS:


 #-------------------------------------------------------
   Preparing input files for subdomains.
   Select number or action:
     1. partmesh
      - partition mesh using metis ( perform this first)
 
     2. prepall
      - Full pre-process using default names (i.e., fort.14)

      ...

 #-------------------------------------------------------

 calling: prepinput

 use_default =  F
 partition =  T
 prep_all  =  F
 prep_15   =  F
 prep_13   =  F
 hot_local  =  F
 hot_global  =  F

Next, provide the name of the unpartitioned grid file unzipped from the serial test case, fort.14:

Enter the name of the ADCIRC UNIT 14 (Grid) file:
fort.14

This will write some additional output to your terminal and complete the mesh-partitioning step, producing the files metis_graph.txt and partmesh.txt seen in the listing below.

You must then run 'adcprep' a second time to decompose the problem. When prompted, enter 8 for the number of processors as before, but this time select option 2 to perform the full pre-processing (decomposition) step.
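An illustrative sketch of this second session is shown below; it assumes the same prompts as the first run and is abbreviated here:

salk$ /share/apps/adcirc/default/bin/adcprep

 Input number of processors for parallel ADCIRC run:
8

   Select number or action:
     1. partmesh
      - partition mesh using metis ( perform this first)

     2. prepall
      - Full pre-process using default names (i.e., fort.14)

      ...
2

When this preparation step completes, you will find the following files and directories in your working directory: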

salk$ ls
fort.14  fort.15  fort.16  fort.63  fort.64  fort.80  metis_graph.txt  partmesh.txt
PE0000   PE0001   PE0002   PE0003   PE0004   PE0005   PE0006   PE0007

The 8 subdirectories created in the second 'adcprep' run contain the partitioned and decomposed problem that each MPI process (8 in this case) will work on.

Copy the parallel ADCIRC binary to the working directory.

salk$ cp /share/apps/adcirc/default/bin/padcirc ./

At this point you'll have all the files needed to run the parallel job. The files and directories created and required for this 8-core parallel run are shown here:

salk$ ls
fort.14  fort.15  fort.16  fort.63  fort.64  fort.80  metis_graph.txt  padcirc  partmesh.txt
PE0000/  PE0001/  PE0002/  PE0003/  PE0004/  PE0005/  PE0006/  PE0007/

Create a SLURM script (named 'padcirc.job' here) with the following lines in it to submit the parallel ADCIRC job to the Cray (SALK) SLURM queues:

#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name PADCIRC.test
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --mem=2048
#SBATCH --output=padcirc.out


# Find out name of master execution host (compute node)
echo ""
echo -n ">>>> SLURM Master compute node is: "
hostname

# Change to working directory
cd $SLURM_SUBMIT_DIR

echo ">>>> Begin PADCRIC MPI Parallel Run ..."
aprun -n 8 /share/apps/adcirc/default/bin/padcirc
echo ">>>> End   PADCRIC MPI Parallel Run ..."

And finally, to submit the parallel job to the SLURM queue, enter:

salk$ sbatch padcirc.job

The CUNY HPC Center has also built a parallel coupled version of ADCIRC and SWAN that includes surface-wave effects in the simulation. This executable is called 'padcswan' and can be run with largely the same preparation steps and the same SLURM script shown above for 'padcirc'. Details on the minor differences and the additional input files required are available at the SWAN websites given in the introduction.
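
As a rough guide, a minimal submit-script sketch for an 8-way 'padcswan' run is shown below. It assumes the same decomposition prepared above and that the padcswan binary sits alongside padcirc in /share/apps/adcirc/default/bin; the job, script, and output file names are illustrative, and the required SWAN input files must already be in the working directory.

#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name PADCSWAN.test
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --mem=2048
#SBATCH --output=padcswan.out

# Illustrative sketch: adjust the binary path and names for your installation
# Change to working directory
cd $SLURM_SUBMIT_DIR

echo ">>>> Begin PADCSWAN MPI Parallel Run ..."
aprun -n 8 /share/apps/adcirc/default/bin/padcswan
echo ">>>> End   PADCSWAN MPI Parallel Run ..."

Submission is the same as before: run 'sbatch' on the script from the working directory that contains the decomposed PE* subdirectories and the SWAN input files.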