STRUCTURAMA

From HPCC Wiki
Revision as of 19:01, 20 October 2022 by James

Importantly, Structurama can estimate the number of populations under the Dirichlet process prior. Markov chain Monte Carlo (MCMC) is used to approximate the posterior probability that individuals are assigned to specific populations. Structurama also allows individuals to be admixed, and it implements a number of methods for summarizing the results of a Bayesian MCMC analysis of population structure. Perhaps most interestingly, the program finds the mean partition, a partitioning of individuals among populations that minimizes the squared distance to the sampled partitions. More detailed information about Structurama can be found on the web site [1] and in the manual [2].

The February 2014 version of Structurama is installed on ANDY and PENZIAS. Structurama is a serial program with only an interactive command-line interface; therefore, a SLURM batch run must supply, inside the batch script, the exact and complete list of commands that an interactive session would have required. In addition to referencing the executable 'st2', the script needs a Structurama data file to be present in the SLURM working directory. The following SLURM batch script shows how this is done using the Unix 'here-document' construction (i.e., <<):

#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name STRAMA_serial
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=2880


# Find out name of master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname

# SLURM starts the job in the submission directory by default;
# the explicit cd is kept as a safeguard
cd "$SLURM_SUBMIT_DIR"

# Run Structurama, feeding it the interactive commands through a here-document
echo ">>>> Begin STRUCTURE RAMA Serial Run ..."
echo ""

st2 << EOF
execute test.inp
yes
quit
EOF

echo ""
echo ">>>> End   STRUCTURE RAMA Serial Run ..."

After the Structurama module is loaded, this script can be dropped into a file (say 'strama_serial.job') and submitted for execution, as follows:


module avail structurama
----------------------------------- /share/apps/modules/default/modulefiles_UserApplications ------------------------------------
structurama/10.30.11        structurama/2.2.14(default)

module load structurama

sbatch strama_serial.job
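
The submission step can also be wrapped in a small guard, since the job needs its data file in the working directory. The following is only a sketch: 'submit_structurama' is a hypothetical helper name, not something provided by the module, and it assumes the standard SLURM 'sbatch' command with its '--parsable' option, which prints just the job ID.

```shell
#!/bin/bash
# Hypothetical helper (not part of Structurama or the module):
# submit a job script only if the input file it needs is present.
# Usage: submit_structurama JOB_SCRIPT INPUT_FILE
submit_structurama() {
    local job_script=$1 input_file=$2
    if [ ! -f "$input_file" ]; then
        echo "missing input file: $input_file" >&2
        return 1
    fi
    # --parsable makes sbatch print only the job ID,
    # optionally followed by ';cluster-name'
    local job_id
    job_id=$(sbatch --parsable "$job_script") || return 1
    # Strip the optional ';cluster-name' suffix
    echo "${job_id%%;*}"
}
```

For the example above this would be called as 'submit_structurama strama_serial.job test.inp'; the printed job ID can be kept for later status checks or cancellation.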

A basic test input file should take less than a minute to run. Because the script sets no '--output' or '--error' option, SLURM by default writes the job's standard output and standard error to a single file named 'slurm-<jobid>.out' in the submission directory. Additional Structurama-specific output files can also be requested. This job will write a Structurama output file called 'strout.p'.
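
Once the job has finished, a quick check that the expected files exist and are non-empty can save a confusing debugging session. This is a minimal sketch; 'check_outputs' is a hypothetical helper, and the file names passed to it are whatever your job actually produced.

```shell
#!/bin/bash
# Hypothetical helper: report which expected files are missing or empty.
# Prints one line per problem file and returns non-zero if any were found.
check_outputs() {
    local missing=0 f
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would pass the SLURM log file for the job together with 'strout.p'.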

Details on the meaning of the SLURM script are covered below in the SLURM section. The most important lines are '#SBATCH --nodes=1', '#SBATCH --ntasks=1', and '#SBATCH --mem=2880'. Together they instruct SLURM to allocate one node, one task (a single processor core), and 2,880 MB of memory for the job. SLURM then places the job on whichever node in the 'production' partition has those resources free. The compute node that it finally selects to run your job will be printed in the SLURM output file by the 'hostname' command.
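
To confirm what SLURM actually recorded for a job, the accounting command 'sacct' can be queried by job ID. This sketch assumes job accounting is enabled on the cluster, which this page does not state; 'job_resources' is a hypothetical wrapper name.

```shell
#!/bin/bash
# Hypothetical wrapper: show the node count, CPU count, requested memory,
# node list, and state that SLURM recorded for one job.
# Assumes SLURM job accounting (sacct) is available on the system.
job_resources() {
    local job_id=$1
    sacct -j "$job_id" --format=JobID,JobName,NNodes,NCPUS,ReqMem,NodeList,State
}
```

Called as 'job_resources JID', where JID is the numerical SLURM job identification number.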

The lines following the reference to the Structurama executable 'st2' show what is required to deliver input to an interactive program in a batch script. The input-equivalent sequence of commands should be placed, one per line, between the first and last 'EOF', which demarcate the entire pseudo-interactive session. NOTE: If you forget to include the final command 'quit', your SLURM job will never complete, as it will be waiting for its final termination instructions and will never receive them. Such a job should be deleted with the SLURM command 'scancel JID', where JID is the numerical SLURM job identification number. If you would like a printout of all the Structurama options, include the line 'help' in your command stream.
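
The here-document mechanism can be tried out without Structurama at all. In the sketch below, 'fake_interactive' is a hypothetical stand-in for an interactive program: it reads commands from standard input, echoes each one, and stops cleanly only when it sees 'quit'. It does not reproduce how Structurama itself behaves when its input runs out; it only illustrates how the lines between the two 'EOF' markers reach the program.

```shell
#!/bin/bash
# Hypothetical stand-in for an interactive program (not Structurama):
# reads one command per line from standard input and echoes it,
# returning success only if a 'quit' command is seen.
fake_interactive() {
    local line
    while IFS= read -r line; do
        echo "got: $line"
        if [ "$line" = "quit" ]; then
            return 0
        fi
    done
    echo "input ended without quit" >&2
    return 1
}

# Same here-document pattern as in the batch script: every line between
# '<< EOF' and the closing 'EOF' is delivered to the program's standard input.
fake_interactive << EOF
execute test.inp
yes
quit
EOF
```

Removing the 'quit' line from the here-document makes the stand-in report an error, which is a harmless way to see why the final command matters.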