STRUCTURE
The following SLURM batch script shows how to run a single, basic Structure serial job:
#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name STRUCT_simple
#SBATCH --nodes=1
#SBATCH --ntasks=1

# Find out name of master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname

# You must explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# Set the root directory for the 'structure' binary
STROOT=/share/apps/structure/default/bin

# Point to the execution directory to run
echo ">>>> Begin STRUCTURE Serial Run ..."
echo ""

${STROOT}/structure -K 1 -m mainparams -i ./sim.str -o ./sim_k1_run1.out

echo ""
echo ">>>> End STRUCTURE Serial Run ..."
This script can be dropped into a file (say 'struct_serial.job') and submitted for execution using the following SLURM command:
sbatch struct_serial.job
This test input file should take less than 5 minutes to run and will produce SLURM output and error files beginning with the job name 'STRUCT_simple'. Additional Structure-specific output files will also be created, including an output file called 'sim_k1_run1.out_f'. Details on the meaning of the SLURM script are covered below in the SLURM section. The most important lines are '#SBATCH --nodes=1' and '#SBATCH --ntasks=1'. The first instructs SLURM to allocate a single compute node for the job; the second requests a single task (that is, one processor core) on that node. The master compute node that SLURM finally selects to run your job will be printed in the SLURM output file by the 'hostname' command.
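As an illustrative sketch (not part of the original example), commands such as the following could be issued from the submission directory to check on the job and examine its results. The job name comes from the script above; the exact names of the SLURM output and error files depend on the site's SLURM defaults.

# Check whether the STRUCT_simple job is still queued or running
squeue -u $USER --name=STRUCT_simple

# Once the job finishes, list the files created in the submission directory,
# including the SLURM output/error files and the Structure results file
ls -l

# Inspect the Structure results file produced by the run
less sim_k1_run1.out_f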
The Structure program requires its own input and data files, properly configured, to run successfully. For the example above these include the input file ('sim.str' above), the 'mainparams' file ('mainparams' in the example above), and the 'extraparams' file (the default name, 'extraparams', is used in the example above). The user is responsible for configuring these files correctly for each run, but the data files for this example and others can be found in the directory:
/share/apps/structure/default/examples
on ANDY.
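As a hedged sketch of the setup step (the working directory name here is purely illustrative, and the exact file names present in the examples directory may differ), the example data and parameter files could be copied into a scratch working directory before the job is submitted:

# Create a working directory and copy the Structure example files into it
mkdir -p ~/structure_test
cd ~/structure_test
cp /share/apps/structure/default/examples/sim.str .
cp /share/apps/structure/default/examples/mainparams .
cp /share/apps/structure/default/examples/extraparams .

# The batch script ('struct_serial.job') should also be placed here before
# it is submitted with 'sbatch'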
Often, Structure users are interested in making multiple runs over a large simulation regime-space. This requires appropriately configured input and parameter files for each individual run. Data file configuration can be done manually or with the help of the Python-based tool StrAuto, which the HPC Center has installed to support this kind of multi-run workflow. StrAuto is documented at its download site here [1], and all the files, including the primary Python-based tool, 'strauto-0.3.1.py', are available in:
/share/apps/strauto/default
In this process, the StrAuto script, 'strauto-0.3.1.py' (found in '/share/apps/strauto/default/bin'), is run in the presence of a user-created, regime-space configuration file called 'input.py'. This produces a Unix script file called 'runstructure' that can then be used to run the user-defined spectrum of cases, one after another. NOTE: the 'strauto-0.3.1.py' script requires Python 2.7.2 to run correctly. This is NOT the default version of Python installed on ANDY, and therefore users of StrAuto must invoke the 'strauto-0.3.1.py' script using a specially installed version of Python, as follows:
/share/apps/epd/7.3-2/bin/python ./strauto-0.3.1.py
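As a minimal sketch of this preprocessing step (assuming the user's data file and a prepared 'input.py' are already in the current directory; nothing here beyond the paths and file names quoted above is taken from the StrAuto documentation):

# Copy the StrAuto script into the working directory
cp /share/apps/strauto/default/bin/strauto-0.3.1.py .

# 'input.py', created by the user, defines the simulation regime-space
ls input.py

# Run StrAuto with the specially installed Python 2.7.2
/share/apps/epd/7.3-2/bin/python ./strauto-0.3.1.py

# StrAuto writes the 'runstructure' script, whose contents can be folded
# into a SLURM batch script like the one shown below
ls runstructure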
The above command assumes that 'strauto-0.3.1.py' has been copied into the user's directory and that the required 'input.py' file is also present there. The contents of the 'runstructure' file produced can then be integrated into a SLURM batch script similar to the simple, single-run script shown above, but designed to run each case in the simulation regime-space in succession. Here is an example of just such a runstructure-adapted SLURM script:
#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name STRUCT_cmplx
#SBATCH --nodes=1
#SBATCH --ntasks=1

#-----------------------------------------------------------------------------------
# This SLURM batch script is based on the 'runstructure' script generated by
# Vikram Chhatre's setup and pre-processing program 'strauto-0.3.1.py' written
# in Python at Texas A&M University to be used with the 'structure' application.
#
# Each 'runstructure' script is custom-generated by the 'strauto-0.3.1.py' python
# script based on a custom input file. It completes a series of runs over a regime
# defined by the 'structure' user for that custom input file only. This means it
# will only work for that input data file.
# Email: crypticlineage (at) tamu.edu
#-----------------------------------------------------------------------------------

# Find out name of master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname

# You must explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# Set up a directory structure for the multiple 'structure' runs
mkdir results_f log harvester
mkdir k1
mkdir k2
mkdir k3
mkdir k4
mkdir k5
cd log
mkdir k1
mkdir k2
mkdir k3
mkdir k4
mkdir k5
cd ..

# Set the root directory for the 'structure' binary
STROOT=/share/apps/structure/default/bin

# Point to the execution directory to run
echo ">>>> Begin Multiple STRUCTURE Serial Runs ..."
echo ""

${STROOT}/structure -K 1 -m mainparams -o k1/sim_k1_run1 2>&1 | tee log/k1/sim_k1_run1.log
${STROOT}/structure -K 1 -m mainparams -o k1/sim_k1_run2 2>&1 | tee log/k1/sim_k1_run2.log
${STROOT}/structure -K 1 -m mainparams -o k1/sim_k1_run3 2>&1 | tee log/k1/sim_k1_run3.log
${STROOT}/structure -K 2 -m mainparams -o k2/sim_k2_run1 2>&1 | tee log/k2/sim_k2_run1.log
${STROOT}/structure -K 2 -m mainparams -o k2/sim_k2_run2 2>&1 | tee log/k2/sim_k2_run2.log
${STROOT}/structure -K 2 -m mainparams -o k2/sim_k2_run3 2>&1 | tee log/k2/sim_k2_run3.log
${STROOT}/structure -K 3 -m mainparams -o k3/sim_k3_run1 2>&1 | tee log/k3/sim_k3_run1.log
${STROOT}/structure -K 3 -m mainparams -o k3/sim_k3_run2 2>&1 | tee log/k3/sim_k3_run2.log
${STROOT}/structure -K 3 -m mainparams -o k3/sim_k3_run3 2>&1 | tee log/k3/sim_k3_run3.log
${STROOT}/structure -K 4 -m mainparams -o k4/sim_k4_run1 2>&1 | tee log/k4/sim_k4_run1.log
${STROOT}/structure -K 4 -m mainparams -o k4/sim_k4_run2 2>&1 | tee log/k4/sim_k4_run2.log
${STROOT}/structure -K 4 -m mainparams -o k4/sim_k4_run3 2>&1 | tee log/k4/sim_k4_run3.log
${STROOT}/structure -K 5 -m mainparams -o k5/sim_k5_run1 2>&1 | tee log/k5/sim_k5_run1.log
${STROOT}/structure -K 5 -m mainparams -o k5/sim_k5_run2 2>&1 | tee log/k5/sim_k5_run2.log
${STROOT}/structure -K 5 -m mainparams -o k5/sim_k5_run3 2>&1 | tee log/k5/sim_k5_run3.log

# Consolidate all results in a single 'zip' file
mv k1 k2 k3 k4 k5 results_f/
cd results_f/
cp k*/*_f . && zip sim_Harvester-Upload.zip *_f && rm *_f
mv sim_Harvester-Upload.zip ../harvester/
cd ..

echo ""
echo ">>>> Zip Archive: sim_Harvester-Upload.zip is Ready ... "
echo ">>>> End Multiple STRUCTURE Serial Runs ..."
This script can be dropped into a file (say 'struct_cmplx.job') and submitted for execution using the following SLURM command:
sbatch struct_cmplx.job
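Because each of the fifteen Structure runs pipes its screen output through 'tee' into a log file, progress can be followed while the job is executing. A rough sketch (the log file names are taken from the script above):

# Confirm the STRUCT_cmplx job is queued or running
squeue -u $USER --name=STRUCT_cmplx

# Follow the log of a run in progress, e.g. the first K=1 run
tail -f log/k1/sim_k1_run1.log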
The 'struct_cmplx.job' script runs one Structure case after another, each with a slightly different set of input parameters. All the associated files and directories from a successful StrAuto-supported run of Structure using this script can be found on ANDY in:
/share/apps/strauto/default/examples
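After a run of 'struct_cmplx.job' completes in the user's own working directory, the consolidated zip archive produced by the script can be inspected before it is uploaded for downstream analysis. A brief sketch, using the file and directory names created by the script above:

# List the '_f' result files bundled into the zip archive by the batch script
unzip -l harvester/sim_Harvester-Upload.zip

# The individual run outputs remain available under results_f/k1 ... results_f/k5
ls results_f/k1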