Main Page


Introduction to the City University of New York High Performance Computing Center

The City University of New York (CUNY) High Performance Computing Center (HPCC) is located on the campus of the College of Staten Island, 2800 Victory Boulevard, Staten Island, New York 10314. HPCC goals are to:

  • Support the scientific computing needs of university faculty, students, staff, and their public and private sector partners;
  • Create opportunities for the CUNY research community to develop new partnerships with the government and private sectors; and
  • Leverage the HPCC's capabilities to acquire additional research resources for its faculty and graduate students in existing and major new programs.

Please send comments on or corrections to the wiki to HPChelp@mail.csi.cuny.edu 

Installed systems

The installed HPC systems include (or will include) the following:

Athena. This system (Dell PowerEdge 1850) consists of one head node and 96 compute nodes. Each compute node has two sockets for Intel 2.86 GHz Woodcrest dual-core processors, i.e., four cores per node, for a total of 384 cores available for user computations. Each core has 2 Gbytes of memory (four cores with a total of 8 Gbytes per node). The interconnect network is Gbit Ethernet. This system is operational.

Zeus. Zeus is dedicated to supporting users running Gaussian03. This system (Dell PowerEdge 1950) consists of one head node and 16 compute nodes. Eight of the compute nodes have two sockets for Intel 3.0 GHz quad-core Harpertown processors, i.e., eight cores per node or a total of 64 cores. The other eight compute nodes each have a single-socket Intel 2.86 GHz Woodcrest dual-core processor, i.e., two cores per node or a total of 16 cores. Each of the 80 cores has 2 Gbytes of memory. Each node has a 300 GByte disk drive for user temporary files. The interconnect network is Gbit Ethernet. This system is operational.

Bob. This system is named in honor of Dr. Robert E. Kahn, an alumnus of the City College of New York who, along with Vinton G. Cerf, invented the TCP/IP protocol, the technology used to transmit information on the modern Internet (http://www.economicexpert.com/a/Robert:E:Kahn.htm). "Bob" is a Dell PowerEdge system with 240 cores on quad-core processors. Each core has 2 Gbytes of memory (eight cores with a total of 16 Gbytes of memory per node). The interconnect network is Infiniband (10 Gbit/second). This system is scheduled to be operational in July 2009.

The first three systems listed above (Athena, Zeus, and Bob) share a 25 Tbyte file system.

Andy. This system is named in honor of Dr. Andrew S. Grove, an alumnus of the City College of New York and one of the founders of Intel (http://educationupdate.com/archives/2005/Dec/html/col-ccnypres.htm). "Andy" is an SGI ICE system (http://www.sgi.com/products/servers/altix/ice/) with 360 cores on Intel 2.93 GHz quad-core Intel Core i7 (Nehalem) processors with a 1600 MHz front side bus. Each core has 3 Gbytes of memory (eight cores with a total of 24 Gbytes of memory per node). The interconnect network is dual rail Infiniband (20 Gbit/second). Andy is scheduled to be operational in October 2009. Andy will have a 36 Tbyte file system.

CFP2006 Performance numbers for various CUNY HPC Systems

The CFP2006 Speed benchmark numbers provided below are taken from the SPEC website and represent floating point performance on a single core, as measured by the SPEC floating point 2006 benchmark. The CFP2006 Throughput benchmark numbers represent workload performance in terms of the throughput of a suite of floating point applications on a node (as opposed to a single core). The benchmark suite includes a number of Fortran and C programs normally used in scientific research, such as GAMESS, FFT, etc. Information on the SPEC benchmark suite can be found at www.spec.org.

Image:specbenchmark2.jpg

Storage and back-up policy.

By default, users are provided a 50 gigabyte home directory. Additional home directory space may be requested by contacting hpchelp@mail.csi.cuny.edu. User home directories are backed up regularly: an incremental back-up is performed daily and a full back-up is performed weekly.

Scratch space is also available for temporary files. Files in user scratch space are not backed up and can be deleted at any time.

Software

The operating system running on the Dell PowerEdge systems is CentOS 5.0. The queuing system is Sun Grid Engine 6.0. The following compilers and library software are installed on the systems:

  • GNU C, C++ and Fortran compilers;
  • Portland Group, Inc. optimizing C, C++, and Fortran compilers;
  • The Intel Cluster Toolkit, including the Intel C, C++ and Fortran compilers and the Math Kernel Library;
  • IMSL


The following third party applications are installed:

  • ADF (Amsterdam Density Functional Theory)
  • BEST
  • Dalton
  • Gauss (Economic Modeling)
  • Gaussian03
  • Mathematica
  • MATLAB
  • MrBayes
  • NAMD
  • Network Simulator2 (NS2)
  • NWCHEM
  • RAxML
  • Visualization/NAG
  • WRF (Weather Research and Forecasting Code)
  • WRF-Chem

Hours of Operation

The second and fourth Tuesday mornings of each month, from 8:00 AM to 12:00 PM, are normally reserved for scheduled maintenance. Please plan accordingly.

Emergency maintenance may be scheduled as needed.

User Support

Users requiring assistance in use of the systems should send an email to:


hpchelp@mail.csi.cuny.edu

System Back-ups

Incremental backups are performed daily and are retained for three weeks. Full backups are performed weekly and are retained for three months. The backups are logged and stored in a locked box in a different building. Once a quarter, a full backup is read back from tape and verified (to ensure that backups are readable and restorable). The following user and system files are backed up:

/
/usr
/var
mySQL
SGE

User scratch and temp files are not backed up.

IMPORTANT NOTICE TO USERS

Athena, Bob, and Neptune are all operational effective 5 October 2009.

The Buildings and Grounds Department of the College of Staten Island has completed major infrastructure upgrades to the CUNY HPC facility. These upgrades include additional and improved air conditioning, a raised computer room floor, and additional electrical power. The facility upgrades will make it easier to integrate new HPC systems into the existing environment. A picture of the new facility is shown below.

Image:facility.jpg

USING BOB - APPLICATION SCRIPTS

The SGE queues on BOB are the same as those on Athena. Submit scripts for BOB differ somewhat from those for Athena. The major differences are that on BOB the script must explicitly define the shell to be used and must explicitly request Infiniband resources. Sample submit scripts for an executable file, "demoproblem", for use on BOB and on Athena are shown below, after the compiler examples.

Compilers

The following parallel compilers are available on Bob:

GNU C

Compilation

/usr/mpi/gcc/openmpi-1.2.8/bin/mpicc -o exe ./hello.c

Submit script

#!/bin/bash
#$ -q PD16.q
#$ -N test
#$ -pe mpi 8
#$ -cwd

/usr/mpi/gcc/openmpi-1.2.8/bin/mpirun  ./exe

Output

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Hello world from process 0 of 8
Hello world from process 4 of 8
Hello world from process 3 of 8
Hello world from process 5 of 8
Hello world from process 1 of 8
Hello world from process 6 of 8
Hello world from process 2 of 8
Hello world from process 7 of 8 


GNU FORTRAN

Compilation

/usr/mpi/gcc/openmpi-1.2.8/bin/mpif90  -o exe ./hello.f

Submit script

#!/bin/bash
#$ -q PD16.q
#$ -N test
#$ -pe mpi 8
#$ -cwd

/usr/mpi/gcc/openmpi-1.2.8/bin/mpirun  ./exe 

Output

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
node           0 : Hello world
node           3 : Hello world
node           4 : Hello world
node           1 : Hello world
node           7 : Hello world
node           2 : Hello world
node           5 : Hello world
node           6 : Hello world


Intel C

Compilation:

/share/apps/openmpi/bin/mpicc -o exe ./hello.c 

Submit script:

#!/bin/bash
#$ -S /bin/bash
#$ -q PD16.q
#$ -N test
#$ -pe mpi 8
#$ -cwd
#$ -V

/share/apps/openmpi/bin/mpirun ./exe 

Output:

Hello world from process 2 of 8
Hello world from process 3 of 8
Hello world from process 4 of 8
Hello world from process 1 of 8
Hello world from process 6 of 8
Hello world from process 0 of 8
Hello world from process 5 of 8
Hello world from process 7 of 8

Intel FORTRAN

Compilation

/share/apps/openmpi/bin/mpif90 -o exe ./hello.f90 

Submit script:

#!/bin/bash
#$ -S /bin/bash
#$ -q PD16.q
#$ -N test
#$ -pe mpi 8
#$ -cwd
#$ -V

/share/apps/openmpi/bin/mpirun ./exe

Output:

node           2 : Hello world
node           0 : Hello world
node           4 : Hello world
node           5 : Hello world
node           1 : Hello world
node           6 : Hello world
node           3 : Hello world
node           7 : Hello world


BOB submit script

#!/bin/bash
#$ -S /bin/bash
#$ -q MX64.q
#$ -N demoproblem
#$ -pe mpi 64
#$ -cwd
#$ -V 

/share/apps/openmpi/bin/mpirun --mca btl openib,sm,self -np $NSLOTS demoproblem


Athena submit script

#!/bin/bash
#$ -q MX64.q
#$ -N demoproblem
#$ -pe mpi 64
#$ -cwd

/share/apps/openmpi/bin/mpirun -np $NSLOTS --hostfile $TMPDIR/machines demoproblem


DIFFERENCES

 
On BOB, you must add the options:  #$ -S /bin/bash  and  #$ -V
On BOB, you must add               --mca btl openib,sm,self  to the mpirun (/share/apps/....) line

Applications

ADF

Not available on BOB at the present time

BEST

Submit script for BEST on BOB



#!/bin/bash
#$ -S /bin/bash
#$ -q PP16.q
#$ -N best
#$ -pe mpi 16
#$ -cwd

/usr/mpi/gcc/openmpi-1.2.8/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS /share/apps/best-mpi/mbbest bglobin.nex

DALTON

Not available on BOB at the present time

GAUSS

Sample script to run GAUSS on BOB

#!/bin/bash
#$ -N test_gauss
#$ -q SC.q
#$ -pe mpi 1
#$ -cwd

/share/apps/gauss/tgauss < ./pxyz.e > log

Here the file pxyz.e was taken from the GAUSS examples in /share/apps/gauss/examples/. After a successful run, the file "graphic.tkf" should be created in the working directory.

pxyz.e:

library pgraph;
graphset;


let v = 100 100 640 480 0 0 1 6 15 0 0 2 2;
wxyz = WinOpenPQG( v, "XYZ Plot", "XYZ" );
call WinSetActive( wxyz );


begwind;
makewind(9,6.855,0,0,1);
makewind(9/2.9,6.855/2.9,0,0,0);
makewind(9/2.9,6.855/2.9,0,3.8,0);
_psurf = 0;
title("\202XYZ Curve - \201Toroidal Spiral");
fonts("simplex complex");
xlabel("X");
ylabel("Y");
zlabel("Z");

setwind(1);
t = seqa(0,.0157,401);
a = .2; b=.8; c=20;
x = 3*((a*sin(c*t)+b) .* cos(t));
y = 3*((a*sin(c*t)+b) .* sin(t));
z = a*cos(c*t);
margin(.5,0,0,0);
ztics(-.3,.3,.3,0);
_pcolor = 10;
view(-3,-2,4);
volume(1,1,.7);
_plwidth = 5;
xyz(x,y,z);

nextwind;
margin(0,0,0,0);
title("");
x = x .* (sin(z)/10);
_paxes = 0;
_pframe = 0;
_pbox = 13;
_pcolor = 11;
_plwidth = 0;
view(15,2,10);
xyz(x,y,z);

nextwind;
_pcolor = 9;
a = .4; b=.4; c=15;
x = 3*((a*sin(c*t)+b) .* cos(t));
y = 3*((a*sin(c*t)+b) .* sin(t));
z = a*cos(c*t);
volume(1,1,.4);
xyz(x,y,z);


endwind;


call WinSetActive( 1 );

MATHEMATICA

Not available on BOB at the present time

MATLAB

MATLAB jobs on BOB should be initiated from a Linux or Windows client on the CSI campus or, for users off campus, from the CUNY gateway machine NEPTUNE. When configured correctly, MATLAB generates the required batch submit scripts and places them in the user's working directory on BOB's head node prior to submission. Currently, the submit script used by SGE is named 'sgeWrapper.sh' for MATLAB Distributed jobs and 'sgeParallelWrapper.sh' for MATLAB Parallel jobs. Additional detail on client setup and on these different job types is provided below.

MRBAYES


#!/bin/bash
#$ -S /bin/bash
#$ -q M64.q
#$ -N mb
#$ -cwd
#$ -pe mpi 32

/usr/mpi/gcc/openmpi-1.2.8/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS /share/apps/mrbayes-3.1.2/mb primates.nex

NAMD

Submit script for NAMD on BOB


#!/bin/bash
#$ -q PP16.q
#$ -N namd
#$ -pe mpi 16
#$ -cwd

echo group main ++shell ssh > $TMPDIR/machines
awk '{ for (i=0;i<$2;++i) {print "host",$1} }' $PE_HOSTFILE >> $TMPDIR/machines

/share/apps/NAMD/NAMD_CVS_Source/Linux-x86_64-g++/charmrun ++nodelist $TMPDIR/machines +p$NSLOTS /share/apps/NAMD/NAMD_CVS_Source/Linux-x86_64-g++/namd2 ./alanin
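The awk line in the NAMD script expands the SGE host list into the node list that charmrun reads. The sketch below runs the same two commands on a made-up host file so that the result can be inspected outside a job. The host names, slot counts, and file locations are invented for illustration, and the layout of $PE_HOSTFILE (host name in the first column, slot count in the second) is an assumption inferred from the awk command itself.

```shell
#!/bin/bash
# Stand-ins for files that SGE normally provides inside a job.
workdir=$(mktemp -d)
PE_HOSTFILE="$workdir/pe_hostfile"   # normally set by SGE
nodelist="$workdir/machines"         # stands in for $TMPDIR/machines

# Hypothetical allocation: 2 slots on one node, 1 slot on another.
cat > "$PE_HOSTFILE" <<'EOF'
compute-0-1 2 PP16.q@compute-0-1 UNDEFINED
compute-0-2 1 PP16.q@compute-0-2 UNDEFINED
EOF

# Same two commands as in the NAMD submit script:
# write the header, then one "host" line per allocated slot.
echo group main ++shell ssh > "$nodelist"
awk '{ for (i=0;i<$2;++i) {print "host",$1} }' "$PE_HOSTFILE" >> "$nodelist"

cat "$nodelist"
```

Each host appears once per slot, so the number of "host" lines should match the value that SGE places in $NSLOTS.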

NS2

Not available on BOB at the present time

NWChem

To use NWChem on BOB, each user needs to create a symbolic link named ".nwchemrc" in the user's $HOME directory that points to the "default.nwchemrc" file. An example of creating the symbolic link is as follows:

prompt% ln -s /share/apps/nwchem_mpi/data/default.nwchemrc $HOME/.nwchemrc

To submit a NWChem job use the following script:

#!/bin/bash
#$ -S /bin/bash
#$ -N job_name
#$ -q PD16.q
# Number of cores to run the job on
#$ -pe mpi 4
#$ -cwd

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/intel.bak/itac/7.1/bin/rtlib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/intel/itac/7.2.0.011/bin/rtlib
/share/apps/openmpi/bin/mpirun --mca btl openib,sm,self /share/apps/nwchem_mpi/bin/nwchem ./test.nw

Submit it to the SGE queue with:

qsub ./your_submit_script

Here test.nw is an NWChem input file. For example:

echo
start h2o

memory global 40 mb stack 23 mb heap 5 mb

geometry units au
 O 0       0        0
 H 0       1.430   -1.107
 H 0      -1.430   -1.107
end


basis
  O library 6-31g*
  H library 6-31g*
end

task scf gradient


RAXML

Submit script for RAxML on BOB



#!/bin/bash
#$ -S /bin/bash
#$ -N test2
#$ -q PP16.q
#$ -pe mpi 16
#$ -cwd

/usr/mpi/gcc/openmpi-1.2.8/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS /share/apps/raxml_mpi/raxmlHPC-MPI -m GTRCAT -n TEST2 -s alg.phy -N 10

Training Courses

The CUNY HPCC provides training courses and organizes seminars on various HPC topics. The training courses are provided at no cost and may be held at any CUNY campus site, the CUNY HPCC at the College of Staten Island, or the Graduate Center. For more information on attending a course or having a course scheduled, please send an email to hpchelp@mail.csi.cuny.edu

The curriculum for a typical 2 1/2 day course in parallel programming using the Message Passing Interface Library (MPI) is provided below. The course is typically given as a workshop with hands-on exercises. It is expected that attendees know UNIX (or one of its variants) and either C or FORTRAN.


DAY 1 (Half day; 1:00 PM to 5:00 PM)

    Overview of computer architectures
        Distribution of class materials
        Serial computers
        Vector processors
        Symmetric Multi-processors
        Parallel computers
            Single Instruction Multiple Data
            Multiple Instruction Multiple Data
        Heterogeneous computing with general purpose graphical processing units

    The City University of New York High Performance Computing Initiative
        Why HPC?
        Installed systems
        Future plans

    Getting familiar with the systems
        Account set-up
        Logging on
        Running a sample job

DAY 2  (Full day; 9:00 AM to 5:00 PM)

    Introduction to MPI
        MPI point-to-point communications
        Collectives
        Blocking sends and receives
        Non-blocking sends and receives
        Testing for completion
    Hands on exercises

DAY 3  (Full day; 9:00 AM to 5:00 PM)

    MPI collectives
        Gather/scatter
        All-to-all
        Performance notes
    OpenMP
        What is OpenMP
        Compiler Directives
        Conditional Compilation
        Environmental Variables
        OpenMP Performance
    Parallel Programming Futures
    Hands on exercises

User Accounts

Applying for an HPCC Account

Only CUNY faculty, staff, and currently enrolled students (who MUST have a faculty sponsor) are allowed to use the CUNY HPCC systems. Applications for accounts are accepted at any time, but accounts expire on 30 September and must be renewed before then.

A CUNY HPCC account is required to log into the HPCC systems. Faculty, staff, or students at CUNY may apply for an HPCC account by following this link: (http://www.csi.cuny.edu/cunyhpc/application.html).

Please be sure to complete all parts of the application, including information on publications, funded projects, and resources required. With regard to the latter, please indicate the number of processor hours that are required for the academic year. For example, if you expect to submit 30 jobs per week, each using 16 processors and each running for 2 hours, then your requirement is 49,920 processor hours (30 jobs * 52 weeks * 16 processors * 2 hours).
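The estimate is a simple product and can be checked at the shell prompt. The sketch below only illustrates the arithmetic; the variable names are invented here and are not fields on the application form.

```shell
#!/bin/bash
# Estimate the processor hours needed for one academic year.
# All variable names are illustrative.
jobs_per_week=30
weeks_per_year=52
processors_per_job=16
hours_per_job=2

processor_hours=$(( jobs_per_week * weeks_per_year * processors_per_job * hours_per_job ))
echo "Requested processor hours: $processor_hours"
```

Changing any of the four values gives the corresponding estimate for a different workload.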

By applying for and obtaining an account, the user agrees to comply with the CUNY Acceptable Use Policy, the HPCC User Account and Password Policy, and to include a Citation regarding use of the CUNY HPC resources.

Citations

Users of the CUNY HPC systems must include the following citation on any publication or presentation that includes results or is based on work using CUNY HPC resources:

"This research was supported in part by a grant of computer time from the City University of New York's High Performance Computing Research Center."  

Renewal applications should include a list of publications or presentations that resulted from the use of the CUNY HPC resources, as future grants of time will be based, in part, on past research accomplishments.

Users are requested to send a copy of the publication or presentation to the Center either electronically (hpchelp@mail.csi.cuny.edu) or by mail to CUNY HPC, Building 1M-206, College of Staten Island, 2800 Victory Boulevard, Staten Island, NY 10314.

Acceptable Use Policy

Use of the computing resources at the HPCC is governed by the CUNY Acceptable Use Policy (AUP). The AUP is documented at

http://portal.cuny.edu/cms/id/cuny/documents/level_3_page/001171.htm and http://www.csi.cuny.edu/privacy/index.html

User Account and Password Policy

A user account is issued to an individual user. Accounts are not to be shared.

Users are responsible for protecting their passwords. Passwords are not to be shared.

When an account is opened, the user will receive a one-time password sent by mail to the user's university mailing address. Upon receiving the one-time password, the user should log onto the HPCC systems and change the password. If the password is not changed within 30 days of issuance, it will expire.

The new password must conform to the CUNY password policy, which requires that it be at least eight (8) characters long, include at least one capitalized letter, one numerical character, and one of the following special characters:

 ! @ # $ % & * = + ) ( 
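A candidate password can be checked against these rules before running passwd. The function below is a rough sketch of such a check; check_password is a hypothetical helper written for this page and is not installed on the HPCC systems, and the system itself may enforce additional rules.

```shell
#!/bin/bash
# check_password CANDIDATE
# Returns 0 if CANDIDATE meets the four rules listed above, 1 otherwise.
# Hypothetical helper for illustration only.
check_password() {
    local pw="$1"
    # At least eight characters long.
    [ "${#pw}" -ge 8 ] || return 1
    # At least one capital letter.
    printf '%s\n' "$pw" | grep -q '[[:upper:]]' || return 1
    # At least one numerical character.
    printf '%s\n' "$pw" | grep -q '[[:digit:]]' || return 1
    # At least one of the listed special characters.
    printf '%s\n' "$pw" | grep -q '[@#$%&*=+)(!]' || return 1
    return 0
}

check_password 'Example#2009' && echo "Example#2009: acceptable"
check_password 'password'     || echo "password: rejected"
```

Avoid typing a real password as a command-line argument on a shared system, since it may be stored in the shell history; the strings above are examples only.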

Passwords are good for 92 days. You will receive a notice two weeks before the end of the 92-day period requesting that you change your password. If you do not change your password, your account will be locked and the password will need to be reset.

How to change password

The command to change a password is "passwd". An example of its use follows:

[user.name@athena ~]$ passwd
Changing password for user user.name.
Changing password for user.name
(current) UNIX password: old_password
New UNIX password: new_password
Retype new UNIX password: new_password
passwd: all authentication tokens updated successfully.
[user.name@athena ~]$ 

Logging in to HPCC

Notice: Users may not access CUNY computer resources without authorization or use them for purposes beyond the scope of authorization. This includes attempting to circumvent CUNY computer resource system protection facilities by hacking, cracking or similar activities, accessing or using another person's computer account, and allowing another person to access or use the user's account. CUNY computer resources may not be used to gain unauthorized access to another computer system within or outside of CUNY. Users are responsible for all actions performed from their computer account that they permitted or failed to prevent by taking ordinary security precautions.

For security reasons, CUNY only allows users to connect using SSH. Secure Shell (abbreviated SSH) is a protocol for logging into a remote machine and executing commands on it over an encrypted channel. It provides secure communication between two non-trusted hosts over an insecure network, which older protocols such as Telnet do not.

The HPC systems located at the CUNY HPCC accept IP addresses only from the CSI campus. Users not located on the CSI campus must first log into an authentication server. The authentication server for the HPCC is neptune.csi.cuny.edu. To log into the HPC systems, the user must then ssh from neptune.csi.cuny.edu to the desired HPC system.

Logging in from a Windows machine

If you are using a Windows machine locally, you need to have an SSH client installed on it. While other SSH clients exist, CUNY strongly recommends the use of WinSCP or PuTTY. Once the SSH client is installed, run it and connect to the HPCC. Documentation for these applications can be found on their websites. Another option is to install Cygwin.

Logging in from GNU/Linux

On Unix/Linux machines, the user should use ssh to log in to the HPCC systems. On most Linux and Unix systems, the ssh command is located in /usr/bin. Please refer to the ssh manpage for details.

The command

$ ssh user.name@neptune.csi.cuny.edu

will log you onto the authentication server. Once you are logged in there, you can connect to one of the HPCC systems (athena in this example):

[username@neptune ~]$ ssh athena


username@athena's password: YouR_password**HeRE
Last login: Mon Oct 20 13:04:23 2008 from neptune.csi.cuny.edu
Rocks 5.0 (V)
Profile built 19:20 30-Sep-2008
Kickstarted 16:04 30-Sep-2008
[username@athena ~]$

X11 Forwarding or Tunneling

X11 forwarding is required when you are logged in to a remote system but an application's GUI must be displayed locally. This could be done with Mathematica, for instance, if using the command-line interface is not acceptable. To allow X11 forwarding or tunneling back through your 'ssh' connection, add the '-X' flag to your 'ssh' command. For users off the CSI campus, the following forwards X11 traffic back from the HPCC gateway system, NEPTUNE, to your desktop:

 
$ ssh -X username@neptune.csi.cuny.edu 

If you then need to log in to ATHENA to run Mathematica, you will have to forward X11 traffic again through the second connection with:

 
$ ssh -X username@athena.csi.cuny.edu 

Note that double-forwarding will be significantly slower and may make working with a GUI from outside of CSI campus inconvenient.

Transfer files between HPCC systems and your PC

Users may sometimes edit files on a Windows PC before uploading them to the HPCC. For Windows users, the easiest way to transfer files is WinSCP. Please note that the SCP protocol should be selected rather than any other protocol.

The first time it connects, WinSCP may warn that the server host key was not found in the cache. The user can accept the server host key. Files can then be uploaded to and downloaded from the HPCC using drag and drop.

Text files (job files, scripts) prepared in Windows may contain non-visible characters (end-of-line characters, for example) that are not understood by the HPCC systems. To avoid errors related to this, use the dos2unix command:

dos2unix text_file_prepared_in_windows.txt
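To see whether a file actually has Windows line endings, the lines ending in a carriage-return character can be counted before and after conversion. The sketch below builds a small test file with Windows line endings and strips the carriage returns with tr. Here tr is only a stand-in so that the example also runs on machines without dos2unix, and the file names are made up.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a two-line script with Windows (CR+LF) line endings.
printf '#!/bin/bash\r\ndate\r\n' > job_windows.sh

# Count the lines that end in a carriage return.
cr=$(printf '\r')
crlf_before=$(grep -c "$cr\$" job_windows.sh || true)

# Remove every carriage return (stand-in for dos2unix).
tr -d '\r' < job_windows.sh > job_unix.sh
crlf_after=$(grep -c "$cr\$" job_unix.sh || true)

echo "CRLF lines before: $crlf_before, after: $crlf_after"
```

A nonzero count before conversion is a sign that the file was saved with Windows line endings and should be converted before it is submitted.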

GNU/Linux users or MacOS users may use scp to copy files from localhost to HPCC systems. Please refer to scp manpage for details.

Basic Unix/Linux Commands

UNIX Tutorial

If you are unfamiliar with UNIX or Linux, an excellent online UNIX tutorial can be found in the "User's Guide to UNIX" from the Department of Electronic Engineering, University of Surrey, United Kingdom [1]. Although that link is to a UNIX tutorial, the commands, at the user level, are essentially identical to those of Linux.

vi Usage

While other text editors exist, vi may be the most powerful text editor under UNIX/Linux. Strictly speaking, vi has three modes (editing, command, and last-line mode), but to keep things clear for the beginner they are presented here as two: input mode and control mode.

When vi starts, it is in control mode. To add or change text, switch to input mode. Pressing the ESC (Escape) key at any time returns to control mode.

The next several sections begin with the basics of vi and cover only the control mode. The Input/Editing section then discusses how to input and edit text in vi. For more information on vi, refer to the vi man page or other resources.

Starting vi

To create a new file or edit an existing file, type "vi" followed by the filename at the shell prompt:

$ vi filename

In the vi control mode, type the following to save the current file:

ESC :w

Or, to save the current file and exit from vi:

ESC :wq

Moving the cursor

The arrow keys work in vi, but not all terminals support them. The movement keys can also be used to move the cursor around. The "h" and "l" keys move the cursor left and right; the "j" and "k" keys move it down and up.

Here is an illustration of the cursor keys on the keyboard:

         k (up)
h (left)        l (right)
         j (down)

left  down  up  right
 h     j    k     l

Delete, Undo

The x command deletes the current character.
The dw command deletes the current word.
The dd command deletes the current line.
The undo command u restores the text deleted or changed by mistake. The undo command can only restore the most recently deleted or changed text.

Input/Editing

In a vi session, the user must switch to input mode before entering text. Press ESC and then i to enter input mode.

tar and gzip/bzip2

tar

On both UNIX and Linux, tar may be the most commonly used archive tool. The synopsis of tar is:

tar [option] [file...]

The most important options in tar are -c, -x, -v, -f and -z. The -c option creates an archive and -x extracts one. -v makes tar print information about each file during the archive/extract process. -f gives the name of the archive file. -z tells tar to compress or uncompress the archive with gzip. For example, to archive and compress the text file “water” into water.tar.gz:

tar czvf water.tar.gz water

To extract the text file “water” from the archive file:

tar xzvf water.tar.gz
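The round trip can be tried safely in a scratch directory. The sketch below creates a small file, archives it with gzip compression, lists the archive, and extracts it into a separate directory for comparison. The -t (list) and -C (change directory before extracting) options are not described above; they are standard tar options used here for convenience, and the file contents are a placeholder.

```shell
#!/bin/bash
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "placeholder contents" > water    # sample text file

tar czf water.tar.gz water             # -c create, -z gzip, -f archive name
tar tzf water.tar.gz                   # -t lists the archive contents

mkdir extracted
tar xzf water.tar.gz -C extracted      # -x extract into ./extracted

cmp water extracted/water && echo "round trip OK"
```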

gzip

There are several compression tools under Unix/Linux, such as compress. gzip (GNU zip) is a compression utility designed as a replacement for compress; its main advantage over compress is better compression. It has been adopted by the GNU project and is now relatively popular.

The synopsis for gzip is:

gzip [option] [file …]

The options include -d (decompress), -l (list information about a compressed file) and -v (verbose). There is no compress option for gzip; compression is the default action.
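A short session shows the default behavior: gzip replaces the named file with a compressed copy that has a .gz suffix, and gzip -d reverses this. The sketch below is illustrative only, and the file names are made up.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'line %s\n' 1 2 3 4 5 > results.txt
cp results.txt original.txt     # keep a copy for comparison

gzip results.txt                # results.txt is replaced by results.txt.gz
gzip -l results.txt.gz          # show compressed and uncompressed sizes
gzip -d results.txt.gz          # results.txt.gz is replaced by results.txt

cmp original.txt results.txt && echo "contents unchanged"
```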

bzip2

Like gzip, bzip2 is a data compression utility, based on a newer algorithm. bzip2 is a freely available, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques, while being around twice as fast at compression and six times faster at decompression. bzip2 is available at:

http://www.bzip.org/

The synopsis for bzip2 is:

bzip2 [option] [file …]

The generally used options are -z (compress), -d (decompress), and -v (verbose).

SGE, Job Submission, and Queues

General paradigm

A typical cluster system consists of two different types of nodes: one head node and an arbitrary number of compute nodes. The head node runs the cluster. It schedules resources, assigns jobs to compute nodes, and manages jobs. The head node uses a scheduler or queuing system to assign jobs to the compute nodes. The scheduler on systems at the HPCC is Sun Grid Engine (SGE). This idea is illustrated in the picture below.


Image:Clust1.png


SGE is an open source batch-queuing system, supported by Sun Microsystems. SGE is typically used on a computer cluster and is responsible for accepting, scheduling, dispatching, and managing the remote execution of large numbers of standalone, parallel or interactive user jobs. It also manages and schedules the allocation of distributed resources such as processors, memory, disk space, and software licenses.

To submit a job for execution, the user must create a special script (called a "submit script" in the examples that follow) that tells the HPC system and the SGE queuing system what resources are required to execute the job. The submit script may include information on the amount of CPU time, memory size, number of processors required, job priority, and the name of the queue into which the job is to be submitted. The submit script is submitted to the system with the qsub or gsub command. Jobs that are not submitted through qsub or gsub run on the head node and will therefore be killed.

The submit script may contain tens of different options. The most important ones are:

  • #$ -N job_name specifies the name that will be assigned to the job.
  • #$ -q queue specifies the queue that SGE will put the job in. You may find a detailed description of the available queues here.
  • #$ -pe mpi 16 specifies the parallel environment (mpi) and the number of processors (16) that SGE will use for the job.
  • #$ -cwd tells SGE to use the current directory for outputs.

More information on SGE is available here.
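The four options are usually enough for a first job. The sketch below prints a minimal submit script built from them. make_submit_script is a hypothetical helper written for this page, not an SGE command, and the job name, queue name, slot count, and program in the usage line are placeholders to be replaced with real values.

```shell
#!/bin/bash
# make_submit_script NAME QUEUE SLOTS COMMAND...
# Prints an SGE submit script to standard output.
# Hypothetical helper for illustration only.
make_submit_script() {
    local name="$1" queue="$2" slots="$3"
    shift 3
    cat <<EOF
#!/bin/bash
#\$ -N $name
#\$ -q $queue
#\$ -pe mpi $slots
#\$ -cwd

$*
EOF
}

# Example: a job named "demo" in queue PD16.q using 8 slots.
make_submit_script demo PD16.q 8 ./exe
```

Redirecting the output to a file (for example, demo.job) gives a script that can be inspected and then passed to qsub.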

Scalar Jobs

A scalar job (as opposed to a multiprocessor job) is one that uses only one processor. For example, the execution of the simple UNIX command date is a scalar job: it simply returns the current date and time. The question is how to register this job in the queue and execute it on one of the compute nodes. To do that:

1. Create a new directory (named scalar, for example) and cd to it by executing the command mkdir scalar; cd scalar.

2. Use your favorite editor (HPCC recommends vim) to create a submit script (named send, for example) and put the following lines in it:

#!/bin/bash
#$ -q SC.q
#$ -cwd

date

Note:

  • The first line declares the shell we are using.
  • Everything after # is a comment.
  • Everything after #$ is an SGE option.
  • The -cwd option tells SGE to use the current directory for outputs, as described in the previous section.

3. Submit the job to the queue by executing the command qsub send. SGE will respond indicating that your job was successfully submitted:

Your job 4761 ("send") has been submitted

Here, "4761" represents a unique numerical ID that is assigned by SGE to your job.

4. You can check the status of the job by executing the command qstat (see man qstat for details).

5. Once the job is finished, you can see the output in the file send.o4761:

# cat send.o4761 
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Wed Mar 11 17:15:59 EDT 2009

The first two lines contain service messages. The last line is the output of the date command executed on one of the compute nodes.
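In the example, the output file name is made up of the job name, the letter "o", and the job ID. The sketch below derives that name from the message printed at submission time. The message text is copied from step 3; in a real session it would be captured from qsub, and other SGE versions may word it differently.

```shell
#!/bin/bash
# In a real session the message would be captured with: msg=$(qsub send)
msg='Your job 4761 ("send") has been submitted'

# Pull out the job ID (the number) and the job name (the text in quotes).
job_id=$(echo "$msg" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p')
job_name=$(echo "$msg" | sed -n 's/.*("\(.*\)").*/\1/p')

# Build the name of the file that will hold the job's standard output.
output_file="${job_name}.o${job_id}"
echo "$output_file"
```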

Symmetric Multiprocessor (SMP) Jobs

In computing, symmetric multiprocessing or SMP involves a multiprocessor computer-architecture where two or more identical processors can connect to a single shared main memory. SMP represents one of the earliest styles of multiprocessor machine architectures, typically used for building smaller computers with up to 8 processors. Early examples of SMP systems include the CRAY X-MP-464 (4 processors with 64 megawords of memory) and the CRAY C916-128 (16 processors with 128 megawords of memory) and the CRAY T932-256 (32 processors with 256 megawords of memory). Most common multiprocessor systems today use an SMP architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors. For example, on Athena, each node has four cores that share 8 gigabytes of memory.

The advantage of the SMP programming model is that it is easier to write SMP applications than MPI Library applications. The disadvantage is that SMP applications are limited to the number of cores on a node, and even within a node scaling can fall off: on systems like the CRAY T932 with 32 processors, performance dropped off quickly above 16 processors.

There are a limited number of applications on the CUNY HPC systems that operate in SMP mode. One of them is Gaussian03. Gaussian03 is currently installed on Zeus and can use between one and eight processors inclusive.

SMP systems allow any processor or core on a node to work on any task no matter where the data for that task are located in the memory of that particular node; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently. The only difference between writing a submit script for a scalar job and writing a submit script for an SMP job is that we now need to specify the SMP parallel environment that we want to run the job under. An example of an SMP job submit script (smp.bash) is shown below for a 4-processor Gaussian 03 job.

#!/bin/bash
#
# Script to run 4-processor SMP Gaussian 03 job
#
#$ -q g03.q
#$ -cwd
#$ -V 
#$ -N methanol_g03
#$ -pe mpi 4
#$ -R y

g03 methanol 
  • SGE submission options

Select the queue

#$ -q g03.q

Change to current working directory

#$ -cwd

Export environment variables into script

#$ -V 

A name for the job

#$ -N methanol_g03

Select the parallel environment

#$ -pe mpi 4

Switch on resource reservation

#$ -R y

Run the Gaussian 03 job

g03 methanol 

There are a few changes from a scalar job. Two obvious changes are that we now ask for a parallel job queue (g03.q) and that we switch on resource reservation, which allows SGE to make more sophisticated decisions regarding the scheduling of parallel jobs. The only other change is the request for the SMP parallel environment with the line

#$ -pe mpi 4

Parallel environments tell SGE which parallel protocol to use to run a multi-processor job. The number that follows the parallel environment name is the number of processors requested, which is 4 here. Once again, the job can be submitted to the queuing system using the 'qsub' command:

qsub smp.bash


For more information about Gaussian03 jobs, read the section Gaussian03.

Parallel Distributed Memory Jobs (MPI)

Distributed memory job scripts are generally very similar to SMP job scripts. The major difference is in how the code is executed rather than in the job script specification options. As an example, consider the following simple MPI program written in C.


1. Create a new directory (named "parallel" for example) and cd to it by executing the command mkdir parallel; cd parallel.

2. Use your favorite editor (HPCC recommends vim) to create a file with C-code (hello.c for example) and put the following lines in it:

/* C Example */
#include <stdio.h>
#include <mpi.h>


int main (int argc, char *argv[])
{
  int rank, size; 

  MPI_Init (&argc, &argv);	/* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);	/* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size);	/* get number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}

3. Compile the code (the PGI mpicc compiler is used in this example):

/share/apps/openmpi-pgi/bin/mpicc -o exe ./hello.c

4. Using a text editor, create a submit script (called send for example) and put the following in it:

#!/bin/bash
#$ -N testmpi
#$ -q PP16.q
#$ -pe mpi 16
#$ -cwd

/share/apps/openmpi-pgi/bin/mpirun -np $NSLOTS ./exe

Note:

  • #$ -N testmpi specifies the name of the job. This name is used for the output files generated by SGE.
  • #$ -q PP16.q selects a queue. Here we use the queue PP16.q.
  • #$ -pe mpi 16 specifies the parallel environment (mpi) and the number of processes (16) that will be used.


5. Submit the job using qsub send. You will get the message:

Your job 4762 ("testmpi") has been submitted

Here 4762 is a numerical ID that every SGE job gets. Job status may be checked with qstat.

6. Once the job is finished, you will be able to see its output. All of the output is stored in the file testmpi.o4762:

# cat testmpi.o4762 
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Hello world from process 0 of 16
Hello world from process 1 of 16
Hello world from process 4 of 16
Hello world from process 12 of 16
Hello world from process 8 of 16
Hello world from process 2 of 16
Hello world from process 5 of 16
Hello world from process 13 of 16
Hello world from process 9 of 16
Hello world from process 3 of 16
Hello world from process 6 of 16
Hello world from process 14 of 16
Hello world from process 10 of 16
Hello world from process 7 of 16
Hello world from process 15 of 16
Hello world from process 11 of 16

The first two lines contain service messages. The rest of the output consists of the messages generated by each of the 16 processes requested in the submit file.
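The greeting lines appear in whatever order the processes happened to write them, so they are not sorted by rank. A minimal sketch of putting them in rank order is given here; sort_by_rank is a hypothetical helper, and the stand-in file simply copies a few lines of the output shown above:

```shell
#!/bin/bash
# Sketch: list the MPI greeting lines of an output file in rank order.
# sort_by_rank is a hypothetical helper written for this illustration.

sort_by_rank() {
  # Keep only the greeting lines (dropping the service messages), then
  # sort numerically on the fifth whitespace-separated field, the rank.
  grep '^Hello world from process' "$1" | sort -k5,5n
}

# Stand-in for testmpi.o4762 so the sketch runs without a cluster.
out=$(mktemp)
printf '%s\n' \
  'Warning: no access to tty (Bad file descriptor).' \
  'Thus no job control in this shell.' \
  'Hello world from process 0 of 16' \
  'Hello world from process 12 of 16' \
  'Hello world from process 4 of 16' \
  'Hello world from process 1 of 16' > "$out"

sort_by_rank "$out"
rm -f "$out"
```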

Queues

Queues are used to efficiently match jobs to resources. The queue schema will be reviewed periodically and adjusted as necessary. User comments regarding the queuing schema will be considered and should be sent to hpchelp@mail.csi.cuny.edu. Seven different types of queues are established.

"SC" designates a single core queue. Jobs requiring just a single core and no more than 1.6 GB of memory should be submitted to this queue. A single core job can not run for longer than 4 hours.

"G03" designates the Gaussian03 queue. Gaussian03 jobs can run only in this queue.

"PD16" designates a 16 core resource that can be used for development or running of small, short parallel codes. The time limit for jobs using this resource is 2 hours. If the job runs longer than 2 hours, it will be terminated.

"PP16" designates a 16 core resource that can be used for production running of small parallel codes. There is no time limit for these jobs, but the user is cautioned to provide restart files in case of system failure and for times when processing must be interrupted for systems maintenance.

"MX64" designates a 64 core resource that can be used for production running of medium-sized parallel codes. The time limit for jobs using this resource is 2 hours. If the job runs longer than 2 CPU hours, it will be terminated.

"M64" designates a 64 core resource that can be used for production running of medium-sized parallel codes. There is no time limit for these jobs, but the user is cautioned to provide restart files in case of system failure and for times when processing must be interrupted for systems maintenance.

"L128" designates a 128 core resource that can be used for production running of large parallel codes. There is an 8 hour time limit for these jobs. The user is cautioned to provide restart files in case of system failure and for times when processing must be interrupted for systems maintenance.

To place the job in the appropriate queue, put the following line in your submit script:

#$ -q queue_designator [SC, G03, PD16, PP16, MX64, M64, L128]

A job submitted to the wrong queue will either not be run or will be terminated.

Parallel development jobs ("PD16" and "MX64") run at a higher priority than parallel production jobs ("PP16", "M64", and "L128").

While the "all.q" queue still exists, jobs submitted to that queue will run at the lowest priority and are likely not to run at all.

The following table summarizes the queue schema and the resources available in the various queues.

Designator  Description           Number of Cores  Time Limit (Hours)  Priority
SC.q        Single Core           1                4                   Normal
PD16.q      Parallel Development  16               2                   High
PP16.q      Parallel Production   16               8                   Medium
MX64.q      Parallel Development  64               2                   High
M64.q       Parallel Production   64               8                   High
L128.q      Parallel Production   128              8                   Medium
G03.q       Gaussian              1 to 8           168                 Medium



PBS Pro 10.1, Job Submission, and Queues

(***** Under Construction *****)

As the number and management needs of its systems have grown, CUNY's HPC Center has decided to move to a more fully featured and commercially supported job queueing and scheduling system (i.e. workload manager). The HPC Center has selected PBS Pro as a replacement for SGE and will begin a gradual transition from SGE to PBS Pro on all of its resources. PBS Pro offers numerous features that will improve the full and effective usage of CUNY's HPC resources. These include several distinct approaches to resource allocation and scheduling (priority-formula-based, fair-share, etc.), interfaces to control application license use, multiple methods of partitioning systems and scheduling jobs between them, and a full-featured usage analysis package, among other things. Initially, ATHENA (an older, less heavily used system) and ANDY (a new SGI cluster system) will be targeted for conversion to PBS Pro. The other systems will then be moved over gradually to PBS Pro. A complete conversion is targeted for before the end of the year.

Users will need to make minor adjustments to their standard approach for batch job submission, but many of the commands for submitting and tracking batch work in PBS Pro have the same names and even the same options. Regardless, the adjustments required are described carefully here and presented more-or-less in the same order as the SGE presentation above. CUNY HPC staff will be available to answer questions to smooth the transition. With PBS Pro, we believe that CUNY HPC resources will be more responsive to the mixed workload that CUNY users present.

PBS Pro Design and the Cluster Paradigm

PBS Pro places 3 distinct service daemons onto the classical head-node, compute-node cluster architecture. These are the queuing service daemon (known as the Server in PBS Pro), the job scheduling daemon (known as the Scheduler), and the compute engine daemon (known as the MOM). The Server and Scheduler typically run on the cluster's head node. They receive jobs submitted by users there and distribute them to the compute nodes in a resource-intelligent fashion. The Server and Scheduler do not run on the compute nodes. A MOM runs on each of the compute nodes, where it accepts, monitors, and manages the work delivered to it by the Scheduler. While possible, a MOM typically is not run on the cluster head node because the head node is not usually tasked for production work. A diagram of this basic arrangement is presented here.

PBS Pro Job Submission Modes

The PBS Pro workload management system is designed to serve as a gateway to the resources of the cluster. All jobs (both interactive and batch) submitted through PBS Pro are tracked and placed in a way that efficiently utilizes the resources while keeping potentially competing jobs out of each other's way. To make optimal decisions about job placement, PBS Pro assumes that no 'back-door' production work is submitted to the cluster's compute nodes without its knowledge. When operating as designed, this results in better overall throughput for the job mix and better individual job performance. As such, on CUNY's HPC systems, all application runs (whether interactive or batch, development or production) should be submitted through PBS Pro. This leaves only code compilation and basic serial testing for the head node.

Running Batch Jobs with PBS Pro

Submitting a typical batch job for execution under PBS Pro is similar to the procedure under SGE. A user must create a job submission script that includes the set of commands that are to be run on each processor allocated to the job (as noted above this is not required for an interactive job). The script may also include a description of the resources that are required to execute the job. Alternatively, these job requirements may be provided as options on the 'qsub' command line. These command-line options (or submit script #PBS comment-lines) typically include information on the number of processors required, the estimated memory and CPU time required, the name of the job, and the queue into which the job is to be submitted, among other things. The submit script is offered to the Server through the PBS Pro 'qsub' command, which is very similar to the command used to accomplish the same thing in SGE. Jobs destined for the compute nodes that are not submitted through 'qsub' will be killed.

The submit script can contain numerous options. These are described in detail in the PBS Pro 10.0 User Guide, or on-line with 'man qsub'. All options within the submit script to be interpreted by 'qsub' should be placed at the beginning of the file and must be preceded by the special comment string '#PBS'. Options offered on the 'qsub' command-line override script-defined options. The most important PBS Pro options are similar, but not identical to those described for SGE above:

The option to specify the name that will be given to the job:

#PBS -N job_name

The option to specify the queue that the job will be placed in:

#PBS -q queue

A detailed description of the available queues is provided here.

The flag to specify the number and kind of resources required by the job:

#PBS -l select=[resource chunk]

More detail on this very important option is provided in examples below.

The flag to determine how the job's processes are to be distributed across the cluster:

#PBS -l place=[process placement]

The flag to pass the head node environment variables to each compute node process:

#PBS -V 

More information on PBS Pro is available in the PBS Pro 10.0 User Guide here.

Submitting Serial or Scalar Jobs

Serial or scalar jobs (as opposed to multiprocessor jobs) use only one processor. For example, executing the simple UNIX command date requires only one processor and simply returns the current date and time. While date and most other UNIX commands would not typically be run by themselves in a batch job, one or more longer running serial HPC applications are often run this way to avoid tying up a local workstation or as part of a parametric study of some large HPC problem space. Preparing and submitting a serial or scalar job for batch execution requires many of the same steps that are required to submit a more complicated parallel HPC job, and therefore serves as a good basic introduction to batch job submission in PBS Pro.

The following steps are typically required for scalar job submission using PBS Pro:

1. Create a new directory (named scalar, for example) in your home directory and cd to it by executing the commands mkdir scalar; cd scalar

2. Use a text editor (CUNY HPCC suggests vi or vim) to create a submit script file (named scalarjob.sh for example) and insert the following lines in it:

#!/bin/bash
# My scalar test job.
#PBS -q production
#PBS -N scalar.job
#PBS -l select=1:ncpus=1
#PBS -l place=pack
#PBS -V

cd $HOME/scalar

date

Note:

  • The first line declares the shell that will be used to interpret the lines in the script.
  • Everything after just a # is a regular comment.
  • Everything after a #PBS is an option to be interpreted by PBS Pro.

The -l select=1:ncpus=1 option needs some explanation. In PBS Pro, the -l select option specifies the number and kind of resource quantities to be associated with the job. PBS Pro refers to these resource quantities as chunks. The number of chunks defined by the "-l select" option (1 in this case) typically determines the number of processes associated with the job. No resource "chunk" should be greater than a single compute node's resources. The kind of the chunk determines the particular resources that PBS Pro will allocate to the job's processes.

In the simple example above, the -l select option requests 1 chunk that explicitly includes just 1 processor. This is all that this scalar job needs. The date command is not a parallel application and cannot take advantage of multiple processors anyway. Other resources like memory, processor time, applications licenses, or disk space can also be explicitly requested in a chunk. When resources are not requested explicitly, the job is given the default amount for the unrequested resource. Resource defaults are inherited from those defined for the execution queue that the job is finally placed in. More involved examples of the -l select option are given below.

The -l place=pack option is not strictly required for this serial job, but is included for illustration. It requests that all of the resource chunks requested be allocated from a single compute node. In this case, because we are asking for only 1 processor and have not specified other resource requirements, it will be easy to fulfill this placement request. If, however, the -l select option had asked for more resources in total than were available on any single node, the job would never run because of insufficient resources: there would be no way to pack the -l select resource request onto a single node. In the next example, showing a submit script for a symmetric multiprocessing (SMP) job, the issue of proper resource chunk distribution comes up again.

3. Submit the job to PBS Pro by entering the command qsub scalarjob.sh. If your submit file is correctly constructed, PBS Pro will respond by reporting the job's request ID (59 in this case) followed by the host name of the system that submitted the job:

$qsub scalarjob.sh
59.athena.csi.cuny.edu

The job request ID "59" is a unique numerical ID assigned by the PBS Pro Server to your job.

4. You can check the status of your submitted job with the command qstat, which is similar to its SGE counterpart. To get a full listing for your job you can type qstat -f ID. For more detail on the PBS Pro version of 'qstat' please consult the man page, man qstat.

(Note: When you begin the conversion to PBS Pro you should make sure that the PBS Pro commands precede those for SGE in your command search PATH and that the same is true for your MANPATH. PBS Pro resides in the directory /share/apps/pbs/default where both the commands and man pages can be found.)

5. Once the job is finished, you can see the job's output in the file scalar.job.o59, which is the job name followed by the job ID number. If there are problems with your job, errors will be written to scalar.job.e59. If these files cannot be written for some reason, your account will receive two email messages with their contents included.

Looking at the output of our submitted job:

# cat scalar.job.o59 
Wed Mar 11 17:15:59 EDT 2009

The output from the date command executed on one of the compute nodes is written there.
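The naming rule for the output and error files can be sketched as follows. pbs_job_file is a hypothetical helper written for this illustration; it assumes the request identifier has the form shown in step 3 (a number, a dot, then a host name):

```shell
#!/bin/bash
# Sketch: build PBS Pro output/error file names from the job name and
# the request identifier printed by qsub.
# pbs_job_file is a hypothetical helper, not a PBS Pro command.

pbs_job_file() {
  # $1: job name given with -N
  # $2: identifier printed by qsub, e.g. 59.athena.csi.cuny.edu
  # $3: "o" for the output file, "e" for the error file
  local name="$1" request="$2" kind="$3"
  # ${request%%.*} drops everything from the first dot onward,
  # leaving only the numeric job ID.
  printf '%s.%s%s\n' "$name" "$kind" "${request%%.*}"
}

pbs_job_file scalar.job 59.athena.csi.cuny.edu o
pbs_job_file scalar.job 59.athena.csi.cuny.edu e
```

For the example job this prints scalar.job.o59 and scalar.job.e59.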

Submitting Symmetric Multiprocessing (SMP) Jobs

Symmetric multiprocessing or SMP requires a multiprocessor computer with a unified or common memory architecture. SMP programs use two or more identical processors within a common memory and program space. SMP systems were among the earliest types of multiprocessor HPC architectures, but the design limits the number of processors that work in parallel compared to today's HPC clusters or distributed computing architectures. Yet, CUNY's HPC cluster system compute nodes are themselves small SMP systems with 4 or 8 processors or cores that can cooperate on a program within a single memory space. For example, on Athena, each compute node has 4 cores that share 8 Gbytes of common memory. The compute nodes of the newly installed SGI system, Andy, each provide 8 Nehalem cores that share 24 Gbytes of common memory. The current trend in microprocessor development away from faster clocks and toward higher on-chip core counts means that the compute nodes of next-generation clusters are likely to be SMP systems with even higher core counts.

While the limited core count of SMP systems limits their peak performance, their closely coupled memory architecture makes programming them in parallel simpler. OpenMP (not to be confused with OpenMPI) is a compiler-directive based parallel programming model that is commonly used on SMP systems, and it is supported by CUNY's HPCC compilers. It is relatively easy to learn compared to the Message Passing Interface (MPI) parallel programming model, which was designed for entire distributed memory systems like CUNY's HPC clusters. Some HPC applications, both commercial and researcher developed, have not been re-written to run outside of the unified memory space of an SMP system or a single cluster compute node.

At CUNY HPCC, Gaussian03 (an application used in computational chemistry) is run in SMP mode only. Gaussian03 is currently installed on Zeus where it can use up to eight processors in SMP-mode on a single Zeus compute node. The serial PBS Pro submit script presented above can be easily modified to run in SMP parallel mode on a single CUNY HPC cluster compute node. The primary differences relate to defining the number of processors and their placement within a single compute node. Looking at an example file, smp.sh:

#!/bin/bash
#
# Script to run a 4-processor SMP Gaussian 03 job
#
#PBS -q production
#PBS -N methanol_g03
#PBS -l select=1:ncpus=4:mem=7680mb
#PBS -l place=pack
#PBS -V 

cd $HOME/methanol

g03 methanol

Most of the options in this SMP submit script are the same as those in the scalar script presented above.

Select the queue into which to place the job:

#PBS -q production

Specify a name for the job:

#PBS -N methanol_g03

Define the number and kind of resource chunks needed by the job:

#PBS -l select=1:ncpus=4:mem=7680mb

Describe how the processes in the job should be distributed across the compute nodes:

#PBS -l place=pack

Export local environment variables to the compute nodes:

#PBS -V 

Change to the working directory (PBS Pro 'qsub' does not have a 'cwd' option):

cd $HOME/methanol

Run the Gaussian 03 methanol job on 4 processors (cores):

g03 methanol 
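Because there is no 'cwd' option, the scripts here change directory with a hard-coded path. A common alternative, sketched below, relies on the PBS_O_WORKDIR environment variable, which PBS sets to the directory from which 'qsub' was run. The job_workdir helper and its fallback to the current directory are assumptions made so that the sketch also runs outside of PBS Pro:

```shell
#!/bin/bash
# Sketch: choose the directory a job should run in without hard-coding it.
# job_workdir is a hypothetical helper written for this illustration.

job_workdir() {
  # Inside a PBS job, PBS_O_WORKDIR holds the directory where qsub was run.
  # Outside of PBS (for example when testing on the head node) the variable
  # is unset, so fall back to the current directory.
  printf '%s\n' "${PBS_O_WORKDIR:-$PWD}"
}

cd "$(job_workdir)" || exit 1
echo "Running in: $PWD"
```

In a submit script, the cd line would take the place of a line such as cd $HOME/methanol.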

In the '#PBS' options header section, the primary change is with the '-l select' option. Here, a single chunk is still being requested, but it is now defined to contain 4 processors and 7680 Mbytes of memory. The '-l place=pack' option is the same as it was in the scalar script, but here it ensures that the processors are confined (packed) to a single compute node. The effect of these changes is to inform the PBS Pro Server that a chunk of 4 processors is needed that in total needs no more than 7680 Mbytes of physical memory. A careful reader will note that in fact the '-l place=pack' option is unnecessary here because, as stated earlier, a single resource chunk must always fit within a single physical compute node. Because we have asked for only 1 chunk, the Scheduler must attempt to find the resources requested here on a single node. If it cannot do this, the job will not be run due to insufficient resources--not across the cluster, but within a single compute node.
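The rule that a chunk must fit within one node can be checked by hand before submitting. The following is only a sketch: chunk_fits_node is a hypothetical helper, and the node size used (4 cores, 8 Gbytes taken as 8192 MB) is the Athena configuration described above, not a value read from PBS Pro:

```shell
#!/bin/bash
# Sketch: test whether one resource chunk fits on a single compute node.
# chunk_fits_node is a hypothetical helper; PBS Pro performs its own checks.

chunk_fits_node() {
  # $1: ncpus requested in the chunk   $2: memory requested in MB
  # $3: cores on one compute node      $4: memory on one node in MB
  [ "$1" -le "$3" ] && [ "$2" -le "$4" ]
}

# An Athena compute node: 4 cores sharing 8 Gbytes (8192 MB).
if chunk_fits_node 4 7680 4 8192; then
  echo "select=1:ncpus=4:mem=7680mb fits on one node"
fi

# A single 16-processor chunk is larger than any Athena node.
if ! chunk_fits_node 16 7680 4 8192; then
  echo "select=1:ncpus=16:mem=7680mb can never be placed"
fi
```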

The compute-node limits apply to the memory requested as well. Here, the memory requested (7680mb) is actually 7680 * 1,048,576 = 8,053,063,680 bytes. This is a good high-water maximum for the 4 processors on a compute node with 8 * 2^30 bytes of memory. The resource extent of a chunk should never be greater than the resource extent of the largest physical compute node in the cluster that the job is to be run on. On a Linux system, the files /proc/cpuinfo and /proc/meminfo are good sources for a compute node's processor and memory configuration. PBS Pro resource chunk defaults have been configured with these values in mind.
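The arithmetic above can be reproduced with shell integer arithmetic. This is a small sketch; mb_to_bytes is a hypothetical helper that applies the 2^20 multiplier used in the text:

```shell
#!/bin/bash
# Sketch: convert a memory request in "mb" to bytes with the 2^20 multiplier.
# mb_to_bytes is a hypothetical helper written for this illustration.

mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

requested=$(mb_to_bytes 7680)             # 8053063680 bytes
node_total=$(( 8 * 1024 * 1024 * 1024 ))  # 8 * 2^30 = 8589934592 bytes

echo "requested: $requested bytes"
echo "node:      $node_total bytes"
# Memory left on the node for the operating system and other services.
echo "headroom:  $(( node_total - requested )) bytes"
```

The headroom works out to 536870912 bytes, that is, 512 * 2^20 bytes.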

Finally, Gaussian, like many other SMP programs, is able to determine the number of processors available to it from environment variables set by PBS Pro with the help of the resources requested in the '-l select' option. No additional processor specification is needed on the 'g03' command-line as it would be with an MPI job. For OpenMP programs, PBS Pro sets the OMP_NUM_THREADS environment variable to 4 from the 'ncpus=4' setting of the '-l select' option. Additional examples of the interplay between the '-l select' option and OpenMP and SMP applications are provided below.

Unlike SGE, no parallel environment needs to be specified with PBS Pro. This is abstracted from the information provided by '-l select'. As with the scalar job, this SMP job is submitted to the PBS Pro Server and the queuing system using the 'qsub' command:

qsub smp.sh

For more information about Gaussian03 jobs, read the section Gaussian03.

Submitting Parallel Distributed Memory Jobs Using MPI

Just a few modifications to the above SMP script's '#PBS' options and another to the script's command line are required to submit an MPI distributed parallel job to PBS Pro. Distributed parallel programs are by definition designed to make productive use of an arbitrary collection of cluster processors--whether they are within a single compute node or on opposite ends of a cluster's interconnecting switch. Such applications are referred to as distributed because (unlike SMP applications) each cooperating process has its own completely distinct memory space. Communication is completed through two-way, message passing among the private memory spaces managed by each process. The Message Passing Interface (MPI) library is the current standard programming model for this kind of 'cluster' or distributed parallel programming.

Follow the instructions here to submit a basic MPI program written in C to the PBS Pro batch scheduling system:

1. Create a new sub-directory (named "dparallel" for example) in your home directory and then cd to it by executing the following commands:

mkdir dparallel; cd dparallel

2. Use a text editor (CUNY HPCC suggests vi or vim) to create a file for the MPI C-code (hello_mpi.c in this example) and put the following lines in it:

/* MPI C Example */
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
  int rank, size; 

  /* Initialize MPI Run Time Environment */
  MPI_Init (&argc, &argv);

  /* Find processor count and my processor number */
  MPI_Comm_size (MPI_COMM_WORLD, &size);
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);

  printf( "Hello world from process %d of %d\n", rank, size );

  /* Clean up MPI Run Time Environment */
  MPI_Finalize();

  return 0;
}

3. Compile this basic MPI code (Intel's MPI distribution, 'impi', is used in this example):

/share/apps/intel/impi/3.1/bin64/mpicc -o ./hello_mpi.exe ./hello_mpi.c

4. Use a text editor (CUNY HPCC suggests vi or vim) to create a PBS Pro submit script file (named mpi_job.sh for example) and insert the following lines in it:

#!/bin/bash
# Simple MPI PBS Pro batch job
#PBS -N testmpi
#PBS -q production
#PBS -l select=16:ncpus=1:mem=1920mb
#PBS -l place=scatter
#PBS -V

cd $HOME/dparallel

# Intel's MPI is used in this example.

/share/apps/intel/impi/3.1/bin64/mpirun -r ssh -machinefile $PBS_NODEFILE -np 16 ./hello_mpi.exe

5. Submit the job to the PBS Pro Server using qsub mpi_job.sh. You will get the message:

65.athena.csi.cuny.edu

Here 65 is the PBS Pro job request ID and 'athena.csi.cuny.edu' is the name of the system from which the job was submitted. (Note: This is not always the system on which the job is run. PBS Pro can be configured to allow users to queue jobs up on any system in a resource grid.) As with the examples above, this MPI job's status can be checked with the command qstat, or for a full listing of job 65 qstat -f 65.

6. When the job completes its output will be written to the file testmpi.o65:

$
$cat testmpi.o65
Hello world from process 0 of 16
Hello world from process 1 of 16
Hello world from process 4 of 16
Hello world from process 12 of 16
Hello world from process 8 of 16
Hello world from process 2 of 16
Hello world from process 5 of 16
Hello world from process 13 of 16
Hello world from process 9 of 16
Hello world from process 3 of 16
Hello world from process 6 of 16
Hello world from process 14 of 16
Hello world from process 10 of 16
Hello world from process 7 of 16
Hello world from process 15 of 16
Hello world from process 11 of 16

Any errors for this job would have been written to testmpi.e65

The output file contains the messages generated by each of the 16 MPI processes requested in the PBS Pro submit file.

What are the differences in the MPI script-file options? The differences are minor. First, in the '-l select' line the number of chunks has been set to the number of processors that will be used, 16 in this case. The kind of each chunk has also changed from the SMP job. The MPI script is asking for 1 processor and about 2 GBytes of memory in each. We leave it up to the reader to compute the exact amount being requested using the 2^20 multiplier. PBS Pro lists the compute nodes it has allocated in a file whose path is given by the $PBS_NODEFILE variable. Here it is provided as the argument to the -machinefile option. Some versions of 'mpirun' do not need to be explicitly given the location of the allocated nodes file, but can determine its location from the environment.
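Rather than repeating the number 16 on the 'mpirun' line, the process count can be taken from the node file itself. The sketch below assumes that the node file holds one line per chunk, as in the examples on this page; count_slots is a hypothetical helper, and the stand-in node file exists only so the sketch runs outside of PBS Pro:

```shell
#!/bin/bash
# Sketch: derive the mpirun process count from the PBS node file.
# count_slots is a hypothetical helper written for this illustration.

count_slots() {
  # Count the lines of the node file (assumed: one line per chunk).
  awk 'END { print NR }' "$1"
}

nodefile="${PBS_NODEFILE:-}"
made_standin=no
if [ -z "$nodefile" ]; then
  # Not running under PBS Pro: build a stand-in node file with 4 entries.
  nodefile=$(mktemp)
  made_standin=yes
  printf 'compute-0-2\ncompute-0-2\ncompute-0-3\ncompute-0-3\n' > "$nodefile"
fi

np=$(count_slots "$nodefile")
# Print the command instead of running it, since no MPI is assumed here.
echo "mpirun -machinefile $nodefile -np $np ./hello_mpi.exe"

if [ "$made_standin" = yes ]; then rm -f "$nodefile"; fi
```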

The only other real difference is with the '-l place' option. The placement is now set to 'scatter', which forces the PBS Pro Scheduler to put each of the 16 chunks (one per MPI process) on a different compute node, or execution host, as PBS Pro sometimes refers to the compute nodes. In this case, in which 16 processors and chunks are being requested, using the 'pack' option would not work. There are no compute nodes on ATHENA with that many processors. At this point, readers may be asking themselves the question, "How do I get performance efficient packing of my MPI processes?"

Clearly, having more MPI processes on the same compute node should reduce the time to send messages among them. To get closer to an efficient packing scheme, the 'free' option should be used with '-l place'. This gives the Scheduler control of chunk placement, and by default it has been set up to select nodes with the lowest level of processor activity first. These will be nodes with no other jobs currently running. On ATHENA, nodes with 4 free processors will be selected first if 'free' is used. This means that if there are 4 unoccupied compute nodes, the 16-process MPI job will be packed onto 4 compute nodes. Depending on the activity on the machine, other distributions under the 'free' option are possible.
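To see which distribution a job actually received, the node file can be summarized per node. The following is a sketch only: procs_per_node is a hypothetical helper, and the stand-in node file imitates a 16-process job packed onto 4 nodes with made-up host names in the compute-0-N style used on this page:

```shell
#!/bin/bash
# Sketch: count how many entries each compute node has in a PBS node file.
# procs_per_node is a hypothetical helper written for this illustration.

procs_per_node() {
  # Group identical host names, count them, and print "host count".
  sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Stand-in node file: 16 entries spread over 4 hypothetical nodes.
nodefile=$(mktemp)
for node in compute-0-1 compute-0-2 compute-0-3 compute-0-4; do
  for slot in 1 2 3 4; do
    echo "$node"
  done
done > "$nodefile"

procs_per_node "$nodefile"
rm -f "$nodefile"
```

Inside a running job, the same helper could be given "$PBS_NODEFILE" instead of the stand-in file.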

Running 'Interactive' Batch Jobs with PBS Pro

PBS Pro provides a special kind of batch job called interactive-batch. An interactive-batch job is treated just like a regular batch job (in that it is queued up, and has to wait for PBS Pro to provide it resources before it can run). However, once the resources are provided, the user's terminal input and output are connected to the job in a manner similar to an interactive session. The user is logged into one of the processors requested by the job, and that processor and the rest of the resources requested (processors and otherwise) are reserved for the job's duration.

Interactive-batch jobs can take a script file like regular batch jobs, but only the '#PBS' options in the header will be read. All script-file commands are ignored. It is assumed that the user will supply their commands interactively after the session has started. As always, the '#PBS' options can also be provided on the 'qsub' command-line. All PBS Pro interactive-batch jobs must include the -I on the 'qsub' command line. The following example starts a 4 processor interactive-batch session packed onto a single compute node (compute-0-2 here) from which a 4 processor MPI job is run.

$qsub -I -q interactive -N intjob -V -l select=4:ncpus=1:mem=1920mb -l place=pack
  qsub: waiting for job 73.athena.csi.cuny.edu to start
  qsub: job 73.athena.csi.cuny.edu ready
$
$
$cd dparallel
$cat $PBS_NODEFILE
compute-0-2
compute-0-2
compute-0-2
compute-0-2
$hostname
compute-0-2
$
$/share/apps/intel/impi/3.1/bin64/mpirun -machinefile $PBS_NODEFILE -np 4 ./hello_mpi.exe
Hello world from process 0 of 4
Hello world from process 1 of 4
Hello world from process 2 of 4
Hello world from process 3 of 4
$
$CNTRL^D

$hostname
athena.csi.cuny.edu
$
$

The 'qsub' options (provided on the command-line in this case) define all that is needed for the job. Here the the '-I' option must be provided on the command-line, and the CUNY HPCC's 'interactive' queue must be selected. When the resources are found by the PBS Pro Scheduler, a shell prompt returns, a $PBS_NODEFILE is created, and the user is logged into one of the compute nodes allocated. The user must change to the directory from which he wishes to submit the job. This is all laid out in the session above. Finally, a 4 processor job is started with the mpirun command from the shell prompt. It runs and returns its output to the terminal. More such jobs could be run if desired, although there is a time limit imposed on the session. The interactive-batch session is terminated by typing a CNTRL^D, which logs the user out of the compute node and returns them to the head node where the job was submitted.

Through the 'interactive' queue, CUNY HPCC has reserved compute node resources for interactive-batch jobs only. The 'interactive' queue, along with the 'development' queue described below, has been created to ensure that some system resources are always available for code development. More about these and CUNY HPCC's other PBS Pro queues is provided in the next section.

More on PBS Pro 'chunks' and the '-l select' Option

With the examples of options to the 'qsub' command and how to submit jobs to the PBS Pro Server presented above, a more complete description and additional examples will be easier to understand. The general form for specifying the resource 'chunks' to be allocated on the compute nodes with the '-l select' option is as follows:

-l select=[N:]chunk_type1 + [N:]chunk_type2 + ...

Here, the value N gives the number of each type of chunk requested, and each chunk type is defined by a collection of node resource attribute assignments using '=', with each attribute separated from the next by a colon, as in:

ncpus=2:mem=1920mb:arch=linux: ...

More than one type of chunk can be defined within a single '-l select' option. Distinct types are added with the '+' sign as shown above. Using multiple chunk types should be a relatively infrequent occurrence at CUNY HPCC. There are many kinds of node resource attributes. Many are built into PBS Pro and some are site-defined. They can have a variety of types (long, float, boolean, size, etc.), and they may or may not be consumed when they are requested (ncpus, for instance, is consumed). More detail on the node-specific attributes used to define chunks can be found with 'man pbs_node_attributes' (once you have set your MANPATH correctly to bring up the PBS man pages first).
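To make the chunk syntax concrete, here is a minimal Python sketch that splits a select string into its chunk types and totals the processors requested. It is an illustration only, not how PBS Pro itself parses requests; the function names parse_select and total_ncpus are hypothetical, and the sketch assumes that a chunk with no explicit ncpus value counts as one processor.

```python
def parse_select(spec):
    """Split a select string into a list of (count, attributes) pairs."""
    chunks = []
    for part in spec.split("+"):
        fields = [f for f in part.strip().split(":") if f]
        # A leading bare integer is the optional chunk count N (default 1).
        if fields and fields[0].isdigit():
            count = int(fields[0])
            fields = fields[1:]
        else:
            count = 1
        attrs = {}
        for field in fields:
            name, _, value = field.partition("=")
            attrs[name] = value
        chunks.append((count, attrs))
    return chunks


def total_ncpus(spec):
    """Sum ncpus over all chunks (assumption: missing ncpus means 1)."""
    return sum(count * int(attrs.get("ncpus", 1))
               for count, attrs in parse_select(spec))


if __name__ == "__main__":
    spec = "4:ncpus=1:mem=1920mb"
    print(parse_select(spec))
    print("total ncpus:", total_ncpus(spec))
```

Applied to the interactive example earlier in this section (select=4:ncpus=1:mem=1920mb), the sketch reports one chunk type, requested four times, for four processors in total.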

Once the number and type of chunks are defined, the PBS Pro scheduler maps the resource request onto the available system resources. If the system can physically fulfill the request, and no other jobs are already using the resources requested, the job will run. Jobs with resource requests that are physically impossible to fulfill will never run. Those that cannot be fulfilled because other jobs are using the resources will be queued. To determine exactly what resources you have been given, whether your job is running, and the reason if it is not, use the 'qstat -f JID' command. (Note: When no resource is requested by the user, the default values for the queue that the user's job ends up in are applied to the job. These may not be exactly what is required by the job.)

Below are a number of additional examples of '-l select' resource requests with an explanation of what is being requested. These are more complicated cases intended to give users an idea of what is possible. They will not necessarily find an obvious use in the context of CUNY HPCC's systems. Users are directed to the PBS Pro 10.0 User Guide [here] for additional examples and a full description of PBS Pro from the user's perspective.

Example 1:

-l select=2:ncpus=1:mem=10gb:arch=linux+3:ncpus=2:mem=8gb:arch=solaris

This job requests two chunk types, 2 of one and 3 of the other. The first chunk type is to be placed on Linux compute nodes, and the second type on Solaris compute nodes. The first chunk type requires nodes with at least 1 processor and 10 GBytes of memory. The second chunk type requires nodes with at least 2 processors and 8 GBytes of memory.

Example 2:

-l select=4:ncpus=4:bigmem=true

This job requests 4 chunks of the same type, each with 4 processors and each with the boolean attribute 'bigmem'. Nodes that have 4 processors available and that have been earmarked by the site as large-memory nodes would be selected. A total of 16 processors would be required.

Example 3:

-l select=4:ncpus=2:lscratch=250gb
-l place=pack

This job also asks for 4 resource chunks of the same type. Each chunk requires 2 available processors and 250 GBytes of node-local scratch space. The job requests a total of 8 processors and then asks for the resources to be packed on a single node. Unless the system that this job is submitted to has nodes with 8 cores and a total of 1 TByte of local storage (4 x 250 GBytes), this job will not run. The 'lscratch' resource attribute is site-local and is determined by a run-time script.
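The arithmetic behind Example 3 can be sketched in a few lines of Python. This is only an illustration of why a packed request may be impossible to satisfy; the function names and the node sizes used in the usage section are hypothetical, and sizes are handled only in the 'gb' form used in the example.

```python
def size_in_gb(text):
    """Convert a size written like '250gb' to a number of gigabytes."""
    text = text.lower()
    if not text.endswith("gb"):
        raise ValueError("this sketch only handles sizes given in gb: " + text)
    return float(text[:-2])


def packed_demand(n_chunks, ncpus, lscratch):
    """Resources one node must supply when every chunk is packed onto it."""
    return {
        "ncpus": n_chunks * ncpus,
        "lscratch_gb": n_chunks * size_in_gb(lscratch),
    }


def fits_on_node(demand, node_ncpus, node_lscratch_gb):
    """True if a single node can hold the whole packed request."""
    return (demand["ncpus"] <= node_ncpus
            and demand["lscratch_gb"] <= node_lscratch_gb)


if __name__ == "__main__":
    demand = packed_demand(4, 2, "250gb")
    print(demand)
    # Hypothetical node sizes, chosen only for illustration.
    print("8 cores, 300 GB scratch:", fits_on_node(demand, 8, 300))
    print("8 cores, 1000 GB scratch:", fits_on_node(demand, 8, 1000))
```

With 'place=pack', the per-chunk amounts are multiplied by the chunk count before being compared against a single node, which is why the request in Example 3 needs 8 cores and 1000 GBytes of scratch space on one node.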

Example 4:

-l select=3:ncpus=2:mpiprocs=2

This job requests 3 identical resource chunks, each with 2 processors, for 6 in total. The 'mpiprocs' resource attribute affects how the $PBS_NODEFILE is constructed. It ensures that the $PBS_NODEFILE generated includes two entries for each of the three nodes allocated so that there will be two MPI processes per node. The $PBS_NODEFILE would contain:

node10
node10
node11
node11
node12
node12

Without the 'mpiprocs' attribute there would be only three entries in the file.
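The effect of 'mpiprocs' on the node file can be mimicked with a short Python sketch. This is an illustration of the behavior described above, not PBS Pro code; the function name build_nodefile is hypothetical, and the sketch assumes one host per chunk and a default of one entry per chunk when 'mpiprocs' is not given.

```python
def build_nodefile(chunk_hosts, mpiprocs=1):
    """Return node-file lines: each chunk's host repeated mpiprocs times."""
    lines = []
    for host in chunk_hosts:
        lines.extend([host] * mpiprocs)
    return lines


if __name__ == "__main__":
    hosts = ["node10", "node11", "node12"]  # one host per chunk, as in Example 4
    print("\n".join(build_nodefile(hosts, mpiprocs=2)))
```

Calling build_nodefile(hosts) without the mpiprocs argument yields one line per chunk, matching the three-entry file described above.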

Example 5:

-l select=1:ncpus=1:mem=7680mb
-l place=pack:excl

This job requests just 1 chunk for a serial job that requires a large amount of memory. The '-l place=pack:excl' option ensures that when the chunk is allocated no other job will be able to allocate resource chunks to that node. This may idle some processors on that node, but it makes the node's memory entirely available to this job. The 'pack:excl' value stands for pack exclusively (i.e., do not allow sharing of the node's resources).

Example 6:

-l select=4:ncpus=2:ompthreads=4
-l place=pack:excl

This job is configured to run a hybrid MPI and OpenMP code. The number of processors explicitly requested in total is 8. The $PBS_NODEFILE would include 4 nodes, one for each chunk (one for each MPI process). Each node would need to have at least 2 processors on it, but could have more. If they had 4 processors, then OpenMP would start 4 threads, one on each physical processor. If they had only 2, then OpenMP would still run 4 threads on each node, but they would compete with each other. This would suit a hyper-threaded processor like the Intel Nehalem. The '-l place=pack:excl' option ensures that in the case where the nodes allocated to the job have 4 processors, no other jobs will be placed there to compete with the 4 OpenMP threads.
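The counts in Example 6 can be tabulated with a small Python sketch. It only restates the arithmetic of the explanation above (one MPI process per chunk, 'ompthreads' threads per process); the function name hybrid_layout and the dictionary keys are hypothetical.

```python
def hybrid_layout(n_chunks, ncpus, ompthreads):
    """Summarize a hybrid MPI/OpenMP request with one MPI process per chunk."""
    return {
        "mpi_processes": n_chunks,
        "cpus_requested": n_chunks * ncpus,
        "threads_total": n_chunks * ompthreads,
        # Values above 1 mean several threads share each requested processor.
        "threads_per_requested_cpu": ompthreads / ncpus,
    }


if __name__ == "__main__":
    for key, value in hybrid_layout(4, 2, 4).items():
        print(key, "=", value)
```

A ratio above 1 in the last entry corresponds to the case described above in which threads compete for the requested processors unless the node has additional cores available.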

PBS Pro Queue Structure

As with the SGE batch queueing system, CUNY HPCC has designed its PBS Pro queue structure to efficiently map batch jobs to HPCC system resources. PBS Pro has two distinct types of queues, execution queues and routing queues. Jobs submitted to routing queues are directed to their associated execution queues based on the resources requested by the job. Resources are assigned either explicitly by the user with the '-l select' option or implicitly through the server and queue defaults. Routing queues are usually tied to general job-related features such as production jobs, debug jobs, etc. Jobs in each general class are then sorted according to their resource requirements and placed in the appropriate execution queue. This is the case at CUNY HPCC. The routing queues that users should be aware of include:

Routing Queue 1:

interactive          ::  A development and debug queue for small scale and short batch-interactive jobs.

Routing Queue 2:

development      ::  A development and debug queue for small scale and short batch jobs.

Routing Queue 3:

production         ::  A production queue for work of any scale and length.


Other routing queues have been defined for reservations, dedicated time, idle cycles, and rush jobs. These are currently disabled, but will be activated as CUNY HPCC develops its 24 x 7 scheduling policy on each of its HPC systems.

Choosing the right routing queue from the 3 above is important because resources have been reserved for each class of work and limits have been set on the resources available in each queue. For instance, jobs submitted to the interactive routing queue are limited to 4 or fewer processors and can run for no more than 16 processor minutes in total. Routing queue 3 above, the production queue, will accept a job of virtually any size and duration, and move it to one of the execution queues defined below.

At CUNY HPCC, one of the 3 currently active routing queues above must be the queue name used with the '-q' option to the PBS Pro 'qsub' command, as in '-q production'. As defined at CUNY HPCC, jobs can ONLY be submitted to routing queues. From there they are mapped to one of the following execution queues based on the resources requested:

Execution Queue 1:

qint4         ::  A queue limited to interactive work of not more than 4 processors and 16 total processor minutes

Execution Queue 2:

qdev8        ::  A queue limited to batch development work of not more than 8 processors and 32 total processor minutes

Execution Queue 3:

qserial       ::  A queue limited to batch production work of not more than 1 processor and 12 total processor hours

Execution Queue 4:

qshort16    ::  A queue limited to batch production work of between 2 and 16 processors, and fewer than 16 total processor hours

Execution Queue 5:

qlong16     ::  A queue limited to batch production work of between 2 and 16 processors, and more than 16 total processor hours

Execution Queue 6:

qshort64    ::  A queue limited to batch production work of between 17 and 64 processors, and fewer than 64 total processor hours

Execution Queue 7:

qlong64    ::   A queue limited to batch production work of between 17 and 64 processors, and more than 64 total processor hours

Execution Queue 8:

qmax        ::  A queue limited to batch production work of between 65 and 256 processors, no time limits

As you can see from the resource limits, these queues are designed to contiguously pack the resource request space. Jobs submitted to the routing queues will be sorted according to the resources requested with the '-l select' option and placed in the appropriate execution queues. The job's memory requirements are also considered. The memory of a given system's compute node is divided proportionately among the cores available on the node. This value is the default presumed requested for each resource chunk unless otherwise specified. Each execution queue limits the amount of memory available to this fraction of the node times the processor (core) count of the job, up to the processor (core) limit for the queue.
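The mapping from routing queue to execution queue can be illustrated with the Python sketch below. It is a reading of the limits listed above, not the actual PBS Pro server configuration; the function name execution_queue is hypothetical. Two boundary cases are not specified in the list and are assumptions here: a job at exactly the short/long threshold is placed in the long queue, and a request that fits no queue returns None.

```python
def execution_queue(routing_queue, ncpus, cpu_minutes):
    """Return the execution queue for a request, or None if nothing fits.

    cpu_minutes is the total processor time requested, in processor minutes.
    """
    if routing_queue == "interactive":
        return "qint4" if 1 <= ncpus <= 4 and cpu_minutes <= 16 else None
    if routing_queue == "development":
        return "qdev8" if 1 <= ncpus <= 8 and cpu_minutes <= 32 else None
    if routing_queue != "production":
        return None
    if ncpus == 1:
        return "qserial" if cpu_minutes <= 12 * 60 else None
    if 2 <= ncpus <= 16:
        # Assumption: exactly 16 processor hours goes to the long queue.
        return "qshort16" if cpu_minutes < 16 * 60 else "qlong16"
    if 17 <= ncpus <= 64:
        # Assumption: exactly 64 processor hours goes to the long queue.
        return "qshort64" if cpu_minutes < 64 * 60 else "qlong64"
    if 65 <= ncpus <= 256:
        return "qmax"
    return None


if __name__ == "__main__":
    print(execution_queue("production", 8, 4 * 60))
    print(execution_queue("production", 128, 1000 * 60))
```

The processor ranges do not overlap, so a production job matches at most one execution queue, which is the sense in which the queues pack the request space contiguously.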

Each execution queue has a priority set according to the prevailing usage pattern on each system. Currently, this priority scheme slightly favors jobs that are between 8 and 16 processors in size. Still, a job's priority depends on more than the priority of the execution queue that it ends up in. As it accumulates time in the queue, its priority rises, and this new priority is used at the next scheduling cycle. Furthermore, the current CUNY HPCC PBS Pro configuration has backfilling enabled, so that some smaller jobs with lower priority may be started if there is not enough space to run larger queued jobs. This 'priority-formula' based approach to job scheduling may be supplanted by a 'fair-share' approach in the future.

As our familiarity with PBS Pro grows and as the needs of our user community evolve it is likely that the queue structure will be refined and augmented. There may be a need to create additional queues for specific applications for instance. User comments regarding the queue structure are welcome and should be sent to hpchelp@mail.csi.cuny.edu.

Parallel Programming Using Open MPI

MPI is a language-independent communications protocol used to program parallel computers. Both point-to-point and collective communication are supported. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation." MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing today.

MPI is not sanctioned by any major standards body; nevertheless, it has become a de facto standard for communication among processes that model a parallel program running on a distributed memory system. Actual distributed memory supercomputers such as computer clusters often run these programs. The principal MPI-1 model has no shared memory concept, and MPI-2 has only a limited distributed shared memory concept. Nonetheless, MPI programs are regularly run on shared memory computers. Designing programs around the MPI model (as opposed to explicit shared memory models) has advantages on NUMA architectures since MPI encourages memory locality.

Open MPI is a project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI) with the stated aim of building the best Message Passing Interface (MPI) library available. Open MPI represents the merger between three well-known MPI implementations:

  • FT-MPI from the University of Tennessee
  • LA-MPI from Los Alamos National Laboratory
  • LAM/MPI from Indiana University

with contributions from the PACX-MPI team at the University of Stuttgart. These four institutions comprise the founding members of the Open MPI development team.

These MPI implementations were selected because the Open MPI developers thought that they excelled in one or more areas. The stated driving motivation behind Open MPI is to bring the best ideas and technologies from the individual projects and create one world-class open source MPI implementation that excels in all areas. The Open MPI project names several top-level goals:

  • Create a free, open source, peer-reviewed, production-quality, complete MPI-2 implementation.
  • Provide extremely high, competitive performance (low latency or high bandwidth).
  • Directly involve the high-performance computing community with external development and feedback (vendors, 3rd party researchers, users, etc.).
  • Provide a stable platform for 3rd party research and commercial development.
  • Help prevent the "forking problem" common to other MPI projects.
  • Support a wide variety of high-performance computing platforms and environments.

LAM/MPI is no longer supported by the MPI consortia and is no longer available on the CUNY HPCC systems as it has been replaced with Open MPI.

Open MPI may be used to run jobs compiled with the GNU, Intel, and PGI compilers. Two simple MPI programs, one written in C and the other in Fortran, are shown as examples.


Example 1. C Example (hello.c)
#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
 int rank, size;

 MPI_Init (&argc, &argv);    /* starts MPI */
 /* get current process id */
 MPI_Comm_rank (MPI_COMM_WORLD, &rank);
 /* get number of processes */
 MPI_Comm_size (MPI_COMM_WORLD, &size);
 printf( "Hello world from process %d of %d\n", rank, size );
 MPI_Finalize();
 return 0;
}


Example 2. Fortran example (hello.f90)
program hello
implicit none
include 'mpif.h'
integer rank, size, ierror

! start MPI
call MPI_INIT(ierror)
! get number of processes
call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
! get current process id
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
print*, 'node', rank, ': Hello world'
call MPI_FINALIZE(ierror)
end program hello


Instructions for compilation and submit script are provided below:

Intel C

Compilation:

/share/apps/openmpi/bin/mpicc -o exe ./hello.c 

Submit script:

#!/bin/bash
#$ -q all.q
#$ -N test
#$ -pe mpi 8
#$ -cwd

/share/apps/openmpi/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS exe 

Output:

Hello world from process 2 of 8
Hello world from process 3 of 8
Hello world from process 4 of 8
Hello world from process 1 of 8
Hello world from process 6 of 8
Hello world from process 0 of 8
Hello world from process 5 of 8
Hello world from process 7 of 8

Intel FORTRAN

Compilation

/share/apps/openmpi/bin/mpif90 -o exe ./hello.f90 

Submit script:

#!/bin/bash
#$ -q all.q
#$ -N test
#$ -pe mpi 8
#$ -cwd

/share/apps/openmpi/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS exe

Output:

node           2 : Hello world
node           0 : Hello world
node           4 : Hello world
node           5 : Hello world
node           1 : Hello world
node           6 : Hello world
node           3 : Hello world
node           7 : Hello world

PGI C

Compilation:

/share/apps/openmpi-pgi/bin/mpicc -o exe ./hello.c 

Submit script

#!/bin/bash
#$ -q all.q
#$ -N test
#$ -pe mpi 8
#$ -cwd

/share/apps/openmpi-pgi/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS exe

Output:

Hello world from process 0 of 8
Hello world from process 4 of 8
Hello world from process 3 of 8
Hello world from process 5 of 8
Hello world from process 1 of 8
Hello world from process 6 of 8
Hello world from process 2 of 8
Hello world from process 7 of 8 

PGI FORTRAN

Compilation

/share/apps/openmpi-pgi/bin/mpif90 -o exe ./hello.f90 

Submit script

#!/bin/bash
#$ -q all.q
#$ -N test
#$ -pe mpi 8
#$ -cwd

/share/apps/openmpi-pgi/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS exe

Output

node           1: Hello world
node           2: Hello world
node           4: Hello world
node           0: Hello world
node           6: Hello world
node           3: Hello world
node           5: Hello world
node           7: Hello world

GNU C

Compilation

/opt/openmpi/bin/mpicc -o exe ./hello.c

Submit script

#!/bin/bash
#$ -q all.q
#$ -N test
#$ -pe mpi 8
#$ -cwd

/opt/openmpi/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS exe

Output

Hello world from process 0 of 8
Hello world from process 4 of 8
Hello world from process 3 of 8
Hello world from process 5 of 8
Hello world from process 1 of 8
Hello world from process 6 of 8
Hello world from process 2 of 8
Hello world from process 7 of 8 

GNU FORTRAN

Compilation

/opt/openmpi/bin/mpif90 -o exe ./hello.f90

Submit script

#!/bin/bash
#$ -q all.q
#$ -N test
#$ -pe mpi 8
#$ -cwd

/opt/openmpi/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS exe 

Output

node           0 : Hello world
node           3 : Hello world
node           4 : Hello world
node           1 : Hello world
node           7 : Hello world
node           2 : Hello world
node           5 : Hello world
node           6 : Hello world

Application Programs

Amsterdam Density Functional Theory (ADF)

ADF is a quantum chemistry software package based on Density Functional Theory (DFT). It consists of the molecular DFT code ADF, the periodic DFT code BAND, and the post-ADF COSMO-RS program for liquid thermodynamics, as well as graphical user interfaces (GUI) for these engines and source code for ADF and BAND. The GUI runs on the user's desktop computer under a separate user license and is not installed on the HPC systems. Additional information on ADF can be found at http://www.scm.com/. Use of ADF is subject to the provisions of the SCM End User License Agreement (http://www.scm.com/Sales/LicAgreement.html).

ADF is licensed to run on a maximum of 32 cores. ADF is supported in batch mode only.

Usage

To use ADF on the CUNY HPC system, prepare two scripts: a "job script" that contains a set of ADF instructions and a "send script" that contains instructions for the scheduler (SGE).

  • Use a text editor and create a "job script" with the following content (this is just an example).
#! /bin/sh

$ADFBIN/adf << eor
TITLE HF

ATOMS
 1. H  .0000  .0000  .0000
 2. F  .0000  .0000  0.917
End

Basis
End

End input
eor

$ADFBIN/adf2aim TAPE21
echo 'Contents of rdt21.res:'
cat rdt21.res | grep -v RunTime
echo 'Contents of WFN:'
cat WFN | grep -v RunTime

mv TAPE21 HF_restricted.t21
rm logfile rdt21.res WFN

$ADFBIN/adf << eor
TITLE HF

ATOMS
 1. H  .0000  .0000  .0000
 2. F  .0000  .0000  0.917
End

UNRESTRICTED
CHARGE 0 0

Basis
End

End input
eor


$ADFBIN/adf2aim TAPE21 <<eor
y
eor

mv TAPE21 HF_unrestricted.t21

echo 'Contents of rdt21.res:'
cat rdt21.res| grep -v RunTime
echo 'Contents of WFN-alpha:'
cat WFN-alpha| grep -v RunTime
echo 'Contents of WFN-beta:'
cat WFN-beta| grep -v RunTime

Save your script in a file adfjobscript.run. Make the script executable using chmod +x adfjobscript.run.

  • Use your favorite text editor and create a "send script" with the following content. This particular example uses 4 cores ('-pe mpi 4' and 'export NSCM=4'), but up to 32 are allowed:
#!/bin/bash
#$ -S /bin/sh
#$ -N JOB_NAME
#$ -q PD16.q
#$ -pe mpi 4
#$ -cwd

#Comment:  asking for 4 cpus
export NSCM=4
export MPI_IC_ORDER="TCP"

./adfjobscript.run

In this example, the text is saved in a file named send. Submit the job by executing the command qsub send.

  • The outputs will be stored in file JOB_NAME.o****.

BEST

BEST is a free phylogenetics program written by Liang Liu to estimate the joint posterior distribution of gene trees and species tree using multilocus molecular data that accounts for deep coalescence but not for other issues such as horizontal transfer or gene duplication. The program works within the popular Bayesian phylogenetics package MrBayes (Ronquist and Huelsenbeck, Bioinformatics, 2003). BEST parameters are defined using the prset command in MrBayes.

For more information, click here.

Running BEST with one core

1. Prepare a job script (bglobin.nex). The job script must be in NEXUS file format. Please read the MrBayes section for more information.

#NEXUS

begin data;
   dimensions ntax=17 nchar=432;
   format datatype=dna missing=?;
   matrix
   human       ctgactcctgaggagaagtctgccgttactgccctgtggggcaaggtgaacgtggatgaagttggtggtgaggccctgggcaggctgctggtggtctacccttggacccagaggttctttgagtcctttggggatctgtccactcctgatgctgttatgggcaaccctaaggtgaaggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcacctggacaacctcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggatcctgagaacttcaggctcctgggcaacgtgctggtctgtgtgctggcccatcactttggcaaagaattcaccccaccagtgcaggctgcctatcagaaagtggtggctggtgtggctaatgccctggcccacaagtatcac
   tarsier     ctgactgctgaagagaaggccgccgtcactgccctgtggggcaaggtagacgtggaagatgttggtggtgaggccctgggcaggctgctggtcgtctacccatggacccagaggttctttgactcctttggggacctgtccactcctgccgctgttatgagcaatgctaaggtcaaggcccatggcaaaaaggtgctgaacgcctttagtgacggcatggctcatctggacaacctcaagggcacctttgctaagctgagtgagctgcactgtgacaaattgcacgtggatcctgagaatttcaggctcttgggcaatgtgctggtgtgtgtgctggcccaccactttggcaaagaattcaccccgcaggttcaggctgcctatcagaaggtggtggctggtgtggctactgccttggctcacaagtaccac
   bushbaby    ctgactcctgatgagaagaatgccgtttgtgccctgtggggcaaggtgaatgtggaagaagttggtggtgaggccctgggcaggctgctggttgtctacccatggacccagaggttctttgactcctttggggacctgtcctctccttctgctgttatgggcaaccctaaagtgaaggcccacggcaagaaggtgctgagtgcctttagcgagggcctgaatcacctggacaacctcaagggcacctttgctaagctgagtgagctgcattgtgacaagctgcacgtggaccctgagaacttcaggctcctgggcaacgtgctggtggttgtcctggctcaccactttggcaaggatttcaccccacaggtgcaggctgcctatcagaaggtggtggctggtgtggctactgccctggctcacaaataccac
   hare        ctgtccggtgaggagaagtctgcggtcactgccctgtggggcaaggtgaatgtggaagaagttggtggtgagaccctgggcaggctgctggttgtctacccatggacccagaggttcttcgagtcctttggggacctgtccactgcttctgctgttatgggcaaccctaaggtgaaggctcatggcaagaaggtgctggctgccttcagtgagggtctgagtcacctggacaacctcaaaggcaccttcgctaagctgagtgaactgcattgtgacaagctgcacgtggatcctgagaacttcaggctcctgggcaacgtgctggttattgtgctgtctcatcactttggcaaagaattcactcctcaggtgcaggctgcctatcagaaggtggtggctggtgtggccaatgccctggctcacaaataccac
   rabbit      ctgtccagtgaggagaagtctgcggtcactgccctgtggggcaaggtgaatgtggaagaagttggtggtgaggccctgggcaggctgctggttgtctacccatggacccagaggttcttcgagtcctttggggacctgtcctctgcaaatgctgttatgaacaatcctaaggtgaaggctcatggcaagaaggtgctggctgccttcagtgagggtctgagtcacctggacaacctcaaaggcacctttgctaagctgagtgaactgcactgtgacaagctgcacgtggatcctgagaacttcaggctcctgggcaacgtgctggttattgtgctgtctcatcattttggcaaagaattcactcctcaggtgcaggctgcctatcagaaggtggtggctggtgtggccaatgccctggctcacaaataccac
   cow         ctgactgctgaggagaaggctgccgtcaccgccttttggggcaaggtgaaagtggatgaagttggtggtgaggccctgggcaggctgctggttgtctacccctggactcagaggttctttgagtcctttggggacttgtccactgctgatgctgttatgaacaaccctaaggtgaaggcccatggcaagaaggtgctagattcctttagtaatggcatgaagcatctcgatgacctcaagggcacctttgctgcgctgagtgagctgcactgtgataagctgcatgtggatcctgagaacttcaagctcctgggcaacgtgctagtggttgtgctggctcgcaattttggcaaggaattcaccccggtgctgcaggctgactttcagaaggtggtggctggtgtggccaatgccctggcccacagatatcat
   sheep       ctgactgctgaggagaaggctgccgtcaccggcttctggggcaaggtgaaagtggatgaagttggtgctgaggccctgggcaggctgctggttgtctacccctggactcagaggttctttgagcactttggggacttgtccaatgctgatgctgttatgaacaaccctaaggtgaaggcccatggcaagaaggtgctagactcctttagtaacggcatgaagcatctcgatgacctcaagggcacctttgctcagctgagtgagctgcactgtgataagctgcacgtggatcctgagaacttcaggctcctgggcaacgtgctggtggttgtgctggctcgccaccatggcaatgaattcaccccggtgctgcaggctgactttcagaaggtggtggctggtgttgccaatgccctggcccacaaatatcac
   pig         ctgtctgctgaggagaaggaggccgtcctcggcctgtggggcaaagtgaatgtggacgaagttggtggtgaggccctgggcaggctgctggttgtctacccctggactcagaggttcttcgagtcctttggggacctgtccaatgccgatgccgtcatgggcaatcccaaggtgaaggcccacggcaagaaggtgctccagtccttcagtgacggcctgaaacatctcgacaacctcaagggcacctttgctaagctgagcgagctgcactgtgaccagctgcacgtggatcctgagaacttcaggctcctgggcaacgtgatagtggttgttctggctcgccgccttggccatgacttcaacccgaatgtgcaggctgcttttcagaaggtggtggctggtgttgctaatgccctggcccacaagtaccac
   elephseal   ttgacggcggaggagaagtctgccgtcacctccctgtggggcaaagtgaaggtggatgaagttggtggtgaagccctgggcaggctgctggttgtctacccctggactcagaggttctttgactcctttggggacctgtcctctcctaatgctattatgagcaaccccaaggtcaaggcccatggcaagaaggtgctgaattcctttagtgatggcctgaagaatctggacaacctcaagggcacctttgctaagctcagtgagctgcactgtgaccagctgcatgtggatcccgagaacttcaagctcctgggcaatgtgctggtgtgtgtgctggcccgccactttggcaaggaattcaccccacagatgcagggtgcctttcagaaggtggtagctggtgtggccaatgccctcgcccacaaatatcac
   rat         ctaactgatgctgagaaggctgctgttaatgccctgtggggaaaggtgaaccctgatgatgttggtggcgaggccctgggcaggctgctggttgtctacccttggacccagaggtactttgatagctttggggacctgtcctctgcctctgctatcatgggtaaccctaaggtgaaggcccatggcaagaaggtgataaacgccttcaatgatggcctgaaacacttggacaacctcaagggcacctttgctcatctgagtgaactccactgtgacaagctgcatgtggatcctgagaacttcaggctcctgggcaatatgattgtgattgtgttgggccaccacctgggcaaggaattcaccccctgtgcacaggctgccttccagaaggtggtggctggagtggccagtgccctggctcacaagtaccac
   mouse       ctgactgatgctgagaagtctgctgtctcttgcctgtgggcaaaggtgaaccccgatgaagttggtggtgaggccctgggcaggctgctggttgtctacccttggacccagcggtactttgatagctttggagacctatcctctgcctctgctatcatgggtaatcccaaggtgaaggcccatggcaaaaaggtgataactgcctttaacgagggcctgaaaaacctggacaacctcaagggcacctttgccagcctcagtgagctccactgtgacaagctgcatgtggatcctgagaacttcaggctcctaggcaatgcgatcgtgattgtgctgggccaccacctgggcaaggatttcacccctgctgcacaggctgccttccagaaggtggtggctggagtggccactgccctggctcacaagtaccac
   hamster     ctgactgatgctgagaaggcccttgtcactggcctgtggggaaaggtgaacgccgatgcagttggcgctgaggccctgggcaggttgctggttgtctacccttggacccagaggttctttgaacactttggagacctgtctctgccagttgctgtcatgaataacccccaggtgaaggcccatggcaagaaggtgatccactccttcgctgatggcctgaaacacctggacaacctgaagggcgccttttccagcctgagtgagctccactgtgacaagctgcacgtggatcctgagaacttcaagctcctgggcaatatgatcatcattgtgctgatccacgacctgggcaaggacttcactcccagtgcacagtctgcctttcataaggtggtggctggtgtggccaatgccctggctcacaagtaccac
   marsupial   ttgacttctgaggagaagaactgcatcactaccatctggtctaaggtgcaggttgaccagactggtggtgaggcccttggcaggatgctcgttgtctacccctggaccaccaggttttttgggagctttggtgatctgtcctctcctggcgctgtcatgtcaaattctaaggttcaagcccatggtgctaaggtgttgacctccttcggtgaagcagtcaagcatttggacaacctgaagggtacttatgccaagttgagtgagctccactgtgacaagctgcatgtggaccctgagaacttcaagatgctggggaatatcattgtgatctgcctggctgagcactttggcaaggattttactcctgaatgtcaggttgcttggcagaagctcgtggctggagttgcccatgccctggcccacaagtaccac
   duck        tggacagccgaggagaagcagctcatcaccggcctctggggcaaggtcaatgtggccgactgtggagctgaggccctggccaggctgctgatcgtctacccctggacccagaggttcttcgcctccttcgggaacctgtccagccccactgccatccttggcaaccccatggtccgtgcccatggcaagaaagtgctcacctccttcggagatgctgtgaagaacctggacaacatcaagaacaccttcgcccagctgtccgagctgcactgcgacaagctgcacgtggaccctgagaacttcaggctcctgggtgacatcctcatcatcgtcctggccgcccacttcaccaaggatttcactcctgactgccaggccgcctggcagaagctggtccgcgtggtggcccacgctctggcccgcaagtaccac
   chicken     tggactgctgaggagaagcagctcatcaccggcctctggggcaaggtcaatgtggccgaatgtggggccgaagccctggccaggctgctgatcgtctacccctggacccagaggttctttgcgtcctttgggaacctctccagccccactgccatccttggcaaccccatggtccgcgcccacggcaagaaagtgctcacctcctttggggatgctgtgaagaacctggacaacatcaagaacaccttctcccaactgtccgaactgcattgtgacaagctgcatgtggaccccgagaacttcaggctcctgggtgacatcctcatcattgtcctggccgcccacttcagcaaggacttcactcctgaatgccaggctgcctggcagaagctggtccgcgtggtggcccatgccctggctcgcaagtaccac
   xenlaev     tggacagctgaagagaaggccgccatcacttctgtatggcagaaggtcaatgtagaacatgatggccatgatgccctgggcaggctgctgattgtgtacccctggacccagagatacttcagtaactttggaaacctctccaattcagctgctgttgctggaaatgccaaggttcaagcccatggcaagaaggttctttcagctgttggcaatgccattagccatattgacagtgtgaagtcctctctccaacaactcagtaagatccatgccactgaactgtttgtggaccctgagaactttaagcgttttggtggagttctggtcattgtcttgggtgccaaactgggaactgccttcactcctaaagttcaggctgcttgggagaaattcattgcagttttggttgatggtcttagccagggctataac
   xentrop     tggacagctgaagaaaaagcaaccattgcttctgtgtgggggaaagtcgacattgaacaggatggccatgatgcattatccaggctgctggttgtttatccctggactcagaggtacttcagcagttttggaaacctctccaatgtctccgctgtctctggaaatgtcaaggttaaagcccatggaaataaagtcctgtcagctgttggcagtgcaatccagcatctggatgatgtgaagagccaccttaaaggtcttagcaagagccatgctgaggatcttcatgtggatcccgaaaacttcaagcgccttgcggatgttctggtgatcgttctggctgccaaacttggatctgccttcactccccaagtccaagctgtctgggagaagctcaatgcaactctggtggctgctcttagccatggctacttc
   ;
end;

begin mrbayes;
   charset non_coding = 1-90 358-432;
   charset coding     = 91-357;
   partition region = 2:non_coding,coding;
   set partition = region;
   lset applyto=(2) nucmodel=codon;
   prset ratepr=variable;
   mcmc ngen=5000 nchains=1 samplefreq=10;
end;


2. Create a submit script (for example, submit) as follows.

#!/bin/bash
#$ -q PP16.q
#$ -N test
#$ -cwd

/share/apps/best-mpi/mbbest bglobin.nex

Note: Add a newline to the end of the file as a line terminator, otherwise the program will NOT run correctly.


3. Submit the job with the following command:

qsub submit


4. Output can be found in file test.o****.

Running BEST with two or more cores

1. Prepare a job script (avian_ovomucoids.nex) with the following content.

#NEXUS 
begin data;
	dimensions ntax=89 nchar=88;
	format datatype=protein missing=? gap=- matchchar=.;

	matrix
	[                                        10        20        30        40        50        60        70        80       ]
	[                                        .         .         .         .         .         .         .         .        ]
	Struthio_camelus                VKYPNTNEEGKEVVLPKILSPIGSDGVYSNELANIEYTNVSK??????FAT--VDDYKPVPLDYMLDSKTSNKNNVVESSGTLRHFGK
	Rhea_americana                  .............L..E..N.V.T................?.D?????...--...H...S.E..........D.....N...S....
	Pterocnemia_pennata             .............L..E..N.V.A..................DHD?EV...--...H...S.E..........D.....N...S....
	Casuarius_casuarius             ........D....L.....N.........DD......A....DHDKEV...--..E....SPE.......N..DS....N...G....
	Dromaius_novaehollandiae        ........D....L.....N..........D......A..??D?????...--.......S.E.......N..D.....N...G....
	Nothoprocta_cinerascens         .....A.D.....P...TP...A.NA.FGS....V....I..DHDK?????T-..G...AT.E.F..NQ.A..A....KNV....L..
	Eudromia_elegans                .R.....D.....P...TP..V.AN....S....V....I?.?????????S-I.G...AT.EFF..NQ....A.A..KNV..N.I.E
	Pygoscelis_adeliae_f            .TF..........LVT.......T..................DHDKEVI..--.......S.E..............D.N...S....
	Pygoscelis_adeliae_y            .T...........LVT.......T..................DHDKEVI..--.......S.E..............D.N...S....
	Spheniscus_humboldti            .T.S.........LIT.......T..................D?DKEVI..--I......S.E..............D.N.I.S....
	Phalacrocorax_sulcirostris      .S.SK.......ALVT.......T..............KI..DHDKEVI..--.......S.E.............AD.N...S....
	Anhinga_novaehollandeae         .L.S.........LVT.......T................T.DHDKEVI.S--.......S.E..............D.N...S....
	Nycticorax_nycticorax           .T.S.A....R..LVT.......A..........M....I..DHDGEVIV.--.......SPEN.V.......D..AD.N...S....
	Chauna_chavaria                 .R...........L.T.T.....T..................DRDKEAV..--......AT.E....NQ....S...D.N...S....
	Anseranas_semipalmata           .R...S.......L.T.D...................A....DHDKEAV..--..E...AT.E....NQ........D.N...S....
	Dendrocygna_arcuata             .RF..........L.T.E...V.................I..D?DKEAV..--......AT.E....N..G......D.N...S....
	Dendrocygna_autumnalis          .RF..........L.T.D.....................I..DHDKEAV..--......AT.E....N..G......D.N...S....
	Dendrocygna_eytoni_d            .RF..........L.T.DVI.V............L....I..DHDKEAV..--....R.DT.E....N..G......D.N...S....
	Dendrocygna_eytoni_e            .RF..........L.T.DVI.V............L....I..DHDKEAV..--..E.R.DT.E....N..G......D.N...S....
	Dendrocygna_viduata             .RFS.........L.T.E...V.................I..D?D?EAV..--......AT.E....N..G.R....D.N...S....
	Coscoroba_coscoroba             ..F..........L.T.D.I...T...............I..DHDKEAV..--..G...ATME....N..G......D.N...S....
	Cygnus_atratus                  .RF..........L.T.D.....T...............I..DHDKEAV..--......ATME....N..G......D.N...S....
	Goose                           .RF..........L.T.D.....T...............I..DHDKEAV..--......ATVE....N..D......D.N...S....
	Anser_indicus                   .RF..........L.T.D.A...T...............I..DHDKEAV..--......ATVE....N..D......D.N...S....
	Branta_canadensis               .RF.......R..L.T.D.....T...............I..DHD???V..--......ATVE....N..G......D.N...S....
	Cereopsis_novaehollandiae       ..F..........L...DVI.T.T...............I..D?D??AV..--......ARME....N..G......D.N...S....
	Chloephaga_picta                .RF..........L.T.E.....T...............I..D??KEAV..--..G...ATME....N..G......D.N...S...E
	Duck                            .RF..........L.T.E...V.T...............I..DHDKEAV..--..G...ATME....N..G......D.N...S...E
	Anas_platyrhynchos              .RF........D.L.T.E...V.T...............I..DHDKEAV..--..G...ATME....N..G......D.N...S...E
	Megapodius_freycinet            .R...........LVTQDV?...T....?....G...??I????????LV.--......ST.EDK..NQ....S...D.N...S....
	Leipoa_ocellata                 IRH..........LVTEDS....T...............I..E?DK??VV.--..G.THAT.ELK..NQ....G..AQ.N...S....
	Ortalis_vetula                  ...........D.LA.EDPNL.......T-.......???????????..PN-...H..ALQEQKI.N..D..S...D.N...S....
	Penelope_jacquacu               ...........D.LA.EDP.........T-.........I..ERDKEA..PN-...H..ALQEQK..N..D..S...D.N...S....
	Penelope_superciliaris          ...........D.LVAEDP....................I..E?DKEA..PN-...H..ALQEQK..N..D..S...D.N...S....
	Bonasa_umbellus                 .RF........V.LV.EDPR...T.A.....M.......I..EHD???L.AS-..E...ATME.R..N..G........N.N.S...T
	Tympanuchus_cupido              .RF........D.LVTED.H...T...............I..EHD???L.AS-..E...ATME.R..N..G........N...S....
	Oreortyx_pictus                 .RF........D.LAT.E.H...T........S......I..EHDTEA..AS-..E...AT.E.R.....A........N...S....
	Callipepla_squamata_n           .RF........D.LAT.E.H...T........Y......I..EHD??A..AS-..E...DT.E.R..N..A........N...S....
	Callipepla_squamata_s           .RF........D.LAT.E.H...T........Y......I..EHD??A..AS-..E...DT.E.R..N..AS.......N...S....
	Lophortyx_californicus          .RF........D.LVT.E.Q...T........Y......I..EHD?EA..AS-..E...AT.E.R..N..A........N...S....
	Colinus_virginianus             .RF........D.LATEE.H...T....MS.MF......T..EHDTEA..AS-..E...AMSE.R..N..V........N...S....
	Cyrtonyx_montezumae_l           .RF........D.LVTEEV....T........S..A.?.I.?E?D???..AS-..E...AT.E.VI.N..G........N...S....
	Cyrtonyx_montezumae_s           .RF........D.LVTEEV....T........S..A.?.I.?E?D???..AS-..E...ATSE.VI.N..G........N...S....
	Alectoris_chukar                ARF..A.....D..VTED.R...T....T-.........I..EHDGETL.A--..E...AT.E.R.....G........N...S....
	Alectoris_rufa                  ARF..A.....D..VTED.H...T....T-.........I..EHD???L.A--..E...AT.E.R.....G........N...S....
	Francolinus_afer                .RF..A....RD..VSEN.R...TH........SM....I..EHDREAP.AS-..E...ATME.RV.NI.G......K.N...S....
	Francolinus_erckelii            .RF..A.....D.AVSEN.R...T...N-.....M....I??EHD?EAP.AS-..E...ATME.RV.NI.G......K.N...S.K..
	Francolinus_coqui_v             .RF..A....RD..VSEN.R...T.........SMN...I..E?D?EA???S-..E...GTME.RV.NI.G......K.N...S....
	Francolinus_coqui_a             .RF..A....RD.AVSEN.R...T.........SMN...I..E?D?EA???S-..E...GTME.RV.NI.G......K.N...S....
	Francolinus_francolinus_a       ARF........V.LDS.D.I...T..LHDS..S...H.KIK.EHDRE????S-..G...ETAEET..N..R........N........
	Francolinus_francolinus_v       .RF........V.LDS.D.I...T..LHDS..S...H.KIK.EHDRE????S-..G...ETAEET..N..R........N........
	Francolinus_pondicerianus       ARFS.A.....D.LVIDDPR.M.T....DS..F.M....I..EHD???LPAS-..E...DTTEER..N..G........N...S....
	Perdix_perdix                   .RF........D.LVTED.Q...T...............I..EHT???L.AS-..E...ATME.R..N..G..D.....N...N....
	Coturnix_delegorguei            .RF........DE.V.DE.RF..T....NH.MF.K....I..EQDGET???S-..E...A.K..RV.N..G........N...NR...
	Coturnix_coturnix_japonica_1    .RF........DE.V.DE.RL..T....NH.MF.K....I..EQDGETL.AS-..E...A.K..RV.N...........N...N....
	Coturnix_coturnix_japonica_2    .RF........DE.V.DE.RL..T....NH.MF.K....I..EQDGETL.AS-..E...A.K..RV.N..G........N...N....
	Arborophilia_torqueola          .RF..S.....V..VKEDPR...T.........H..T??I?.?????????S-....M.ATME.RV.N..G........N...S....
	Bambusicola_thoracica           ARF..A.....V.LDTQE.R...T.......MS......I.IK?DKE?L.AS-..E...ETAEERI.N..G........N....N...
	Tragopan_satyra                 .RF........D.LVTED.H...T...............I..GHDREAL.AS-..E...ATME.R..N..G........N...S....
	Tragopan_temmincki              .RF........D.LVTED.R...T...............I..GHD???L.AS-..E...ATME.R..N..G........N...S....
	Lophophorus_impejanus           .RF..A.....D.LVTED.R...T...............I..EHDREAL.AS-..E...ATME.R..N..G........N...S....
	Crossoptilon_auritum            .RF........D.LVAED.R...T...............I..ERDGEAL.AS-..E...ATME.R..N..G........N...S....
	Lophura_edwardsi                .RF........D.LVAED.R...T.......M.......I..ERDGEAL.AS-..E...ATME.R..N..G........N...S....
	Lophura_ignita                  .RF........D.LVGEDIR...T.......M.......N..ERDGEAL.AS-..E...ATME.R..N.SD........N...S....
	Gallus_gallus                   ARF..ADK...D.LVN.D.R...T....T.D..S..F..I..EHDKETL.AS-..E...DTAEDR..N..G........N...S....
	Grey_jungle_fowl                ARF..ADK...D.LVN.D.R...T....T.D..S..F..I..EHDKETL.AS-..E...DTVEDR..N..G........N...S....
	Phasianus_colchicus             .RF..........LVAED.R.V.T.....S.........I..EHEGEAL.AS-..E...ATME.R..N..G........N...NR..Q
	Syrmaticus_ellioti              .RF..K.....D.LVAED.H...T...............I..ER?G??L.AS-..E...ATME.R..N..G........N...S....
	Syrmaticus_reevesii             .RF..K.......LVAED.H...T.....S.........I..ERNGEAL.AS-..E...ATME.R..N..G........N...SR..E
	Chrysolophus_amherstiae         .RFL.....S.D-LVAED.H...T...............I..EHDG?AL.AS-..E...ATME.R..N..G........N...N....
	Polyplectron_bicalcaratum       .RF....K...D.LA.EEVR...T.....D.S..RD...I..EHDR?????S-..E.Q.TTTEHRVNNE.G......K.N..VS....
	Argusianus_argus_argus          .RF........D.LVSEDRH...T.....H..T......I..EHD?EAL.A--..EH..AT.EDR..N.I...D..L..N...S....
	Pavo_cristatus                  .RF..A.....D.LVSED.H...T.....H.........I..EHDREAL.AS-..E...AT.EHR..N..G........N...S....
	Afropavo_congensis              .RF........D.SAS.D.R...T.....H.........I..EHDGEAL.AS-..E...ATMEQR..N..G........N...S....
	Numida_meleagris                .RF..A.....D.LVTED.R...T......D........I.?????EAL.A--..E...ATME.R..N..D........N...S....
	Acryllium_vulturinum            .RF..A.....D.LVIED.R...T......D........I..EHD???L.A--..E...ATME.R..N..D........N...S....
	Meleagris_gallopavo             .RF........D.LVTED.R...T...H.-.........I..EHDREAL.AS-..E...AT.E.R..N..G........N...S....
	Grus_carunculatus               .T...........LVT.......T..................DHDKEAT..--......AT.E..F...........D.N...S....
	Anthropoides_virgo              .T...........LVT.......T..................DHDKEVT..--......AT.E..F...........D.N...S....
	Grus_vipio                      IT...........LVT.......T..................DHDKEAT..--......AT.E..F...........D.N...S....
	Fulica_atra                     .T...........LVT.....V.TN......S..........DYDKEVT..--..G.Q.AS.E.VF.N.....D..AD.N...S....
	Vanellus_spinosus               .T...........LVT.......T..........L.......DYDKEVI..--......AS................D.N...S..E.
	Larus_rudibundus                .T...........LAT.A...V....................DYDKEDI..--......AS................D.N...S..E.
	Turnix_sylvatica                .RF........DT.AD.D.P.........-.M.......I..EHD??T???S-..E...GMMERL..N..ND.......N...N...E
	Gallirallus_australis           .T.........V.LVT.NI..V.TN...T..S.I...S....DYD???T..??..G.QSA.Q..VF.N........AD.N...S....
	Geococcyx_californianus         .A...A......ALVTTARLH..T....G.....L.H..I..DYNKEVI.S--.N.....S.L....N..G.....AD.N...S....
	Dacelo_novaeguineae             .......D.....LVTE......T.R................EHDKEAI..-Q..EH..AT...RI.......D..MD.N...S....
	Carpococcyx_renauldi            .R...S......GLATT.R....T....G.....L....I..DYD???I..--.......T.ED...NI.H..Y..AH.N..FS....
	Podargus_strigoides             .T.......S...LVDEV.....T..........L.-..I..DRDK??I..--....Q..MG...............D.N...N....
	;
end;

begin mrbayes;
	prset aamodelpr=mixed;
        mcmc ngen=5000 samplefreq=100;
end;


2. Create a submit script (for example, submit) as follows. This example runs on 8 cores.

#!/bin/bash
#$ -q PP16.q
#$ -N test
#$ -pe mpi 8
#$ -cwd

/opt/openmpi/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS /share/apps/best-mpi/mbbest avian_ovomucoids.nex

Note: Add a newline to the end of the file as a line terminator, otherwise the program will NOT run correctly.
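
The check described in the note can be automated. The following is a minimal sketch, assuming the file in question is the submit script from step 2; the function name ensure_trailing_newline is hypothetical and not part of any installed tool.

```shell
# Append a trailing newline to a file only if its last byte is not one.
# ensure_trailing_newline is a hypothetical helper name.
ensure_trailing_newline() {
  local file=$1
  # $(...) strips a trailing newline, so the result is empty when the
  # file already ends with a newline (or is empty).
  if [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
}

# Usage: ensure_trailing_newline submit
```

Running the function twice on the same file is harmless, because the newline is appended only when it is missing.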


3. Submit the job by entering the command "qsub submit".

qsub submit


4. The output is written to the file test.o****, where **** is the job ID.
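
When a job has been run several times, several output files accumulate in the working directory. The sketch below picks the most recently modified one for a given job name. The function name latest_output is hypothetical, and the sketch assumes the output files follow the name.oNNNN pattern shown above.

```shell
# Print the most recently modified output file for a job name,
# e.g. test.o1234 for "#$ -N test". Returns non-zero if none exists.
# latest_output is a hypothetical helper name.
latest_output() {
  local name=$1 newest="" f
  for f in "$name".o*; do
    [ -e "$f" ] || continue          # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage: cat "$(latest_output test)"
```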

Dalton

To be added

GAUSS Mathematical and Statistical System

GAUSS is an easy-to-use data analysis, mathematical, and statistical environment based on the fast and efficient GAUSS Matrix Programming Language. It is used to solve real-world problems and data analysis problems of exceptionally large scale. The following GAUSS modules are available:

  Parallel GAUSS Engine Version 1.1 for up to 8 cores
  Maximum Likelihood MT Version 1.0 for up to 8 cores
  GAUSS Engine Personnel Edition Version 8.0 for up to 8 cores
  GAUSS Version 8.0 server locked base for up to 8 cores

Additional information on Gauss can be found at [2]

A sample GAUSS job is provided below:


1. Create a new directory by entering "mkdir gauss" and change into it:

$ mkdir gauss
$ cd gauss
$

2. Put your GAUSS input file in this directory, or create a new one. Here we create a new input file "gauss.gau" with the following contents:

print 3^2;

x=3; print x^2;

3. Create a submit script "submit", and put the following in it:

#!/bin/bash
#$ -N gauss
#$ -q PP16.q
#$ -cwd

/share/apps/gauss/tgauss -b ./gauss.gau

4. Submit the script by executing "qsub submit".

$ qsub submit
your job 103 ("gauss") has been submitted
$

5. Show the output by executing "cat gauss.o103".

$ cat gauss.o103
       9.0000000 
       9.0000000 
(gauss) 
$


More GAUSS examples can be found in the directory "/share/apps/gauss/examples".

Gaussian03

Gaussian03 is the latest in the Gaussian series of electronic structure programs. Gaussian03 is used by chemists, chemical engineers, biochemists, physicists and others for research in established and emerging areas of chemical interest. Starting from the basic laws of quantum mechanics, Gaussian predicts the energies, molecular structures, and vibrational frequencies of molecular systems, along with numerous molecular properties derived from these basic computation types. It can be used to study molecules and reactions under a wide range of conditions, including both stable species and compounds which are difficult or impossible to observe experimentally such as short-lived intermediates and transition structures.

Gaussian03 is, as mentioned above, a series of programs and the performance of Gaussian03 on any system is highly dependent on the particular problem being solved. Generally speaking, Gaussian03 can be run as a single core job, an SMP job, or a parallel Linda job. Parallel Linda is not supported at this site. Users of Gaussian03 running in an SMP mode generally refer to it being run in a "parallel environment", but it is really an SMP environment, that is, the job will not run on more cores than are available on a node (and the job does not run using either the MPI libraries or Linda).

At the HPCC, Gaussian03 is installed only on Zeus. Zeus has 16 compute nodes for Gaussian03 applications. Eight of those nodes have eight cores; the other eight have two cores. Gaussian03 jobs that request one or two cores will be sent to the two-core Woodcrest nodes by SGE; those that request 3 to 8 cores will be dispatched to the Clovertown nodes.
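
The dispatch rule above can be written down as a small lookup. This is only an illustrative sketch of the rule as stated in the text, not the SGE configuration itself; the function name zeus_node_type is hypothetical.

```shell
# Map a requested core count to the Zeus node type that SGE is
# described as using. zeus_node_type is a hypothetical helper name.
zeus_node_type() {
  local cores=$1
  case $cores in
    1|2)         echo woodcrest ;;
    3|4|5|6|7|8) echo clovertown ;;
    *)           echo "cores must be 1..8, got: $cores" >&2; return 1 ;;
  esac
}

# Usage: zeus_node_type 4
```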

The Woodcrest nodes have 300 GB local disks and the Clovertown nodes have 1 TB local disks that can be used for temporary Gaussian03 files.

If the user expects that the job will require more temporary disk space than is available on the node, the user should specify that the temporary files be written to the network file system.

In all cases, the user is responsible for deleting the temporary files at the completion of the run.

The following compares the run time of a simple Gaussian03 test case on the Intel Woodcrest X5154 chip running at 2.83 GHz and on the Intel Clovertown X5355 chip running at 2.66 GHz. Note that the Clovertown cores execute faster than the Woodcrest cores even though they have a slower clock speed. Exact timings are not provided because they will vary from case to case and the Gaussian license does not allow the publication of benchmark data.

The following five test cases were run:

1. G03 test case using a single core on the Woodcrest nodes without local temp file space, i.e., using network attached storage.

2. G03 test case using a single core on the Clovertown nodes without local temp file space, i.e., using network attached storage.

3. G03 test case using a single core on the Clovertown nodes and local temp space. The Clovertown core was about 33% faster than the Woodcrest core.

4. G03 test case using the parallel environment with 4 cores on the Clovertown and local temp space. For this case, using the parallel environment provided near linear speed-up.

5. G03 test case using the parallel environment with 8 cores on the Clovertown and local temp space. For this case, using the parallel environment provided near linear speed-up.

With Gaussian03, every problem will show different performance. Users should optimize performance based on their experience. It is suggested that users DO NOT routinely opt to use the maximum number of cores unless they know that it will substantially speed execution. Using more cores than are needed wastes resources and limits the number of Gaussian03 jobs that can be run.


Running Gaussian03 jobs on Zeus

There are two ways to submit and run Gaussian03 jobs.

  • The first method is easy and should work in most cases, as long as the amount of local disk space required for temporary files is less than 100 GB per core. To submit the job, do the following:

1) Prepare a Gaussian03 input file. The following example requests parallel execution on 4 cores (%nproc=4):

%Chk=water
%mem=4GB
%nproc=4
#RHF/6-31G(d)
 
water energy 

0  1 
O 
H  1  1.0 
H  1  1.0  2  120.0 

2) Run the command easysub

To submit the job for execution, enter the following command:

easysub -n 4 -f name_that_you_like -p /home/your.username/your_g03code

In the above example, the job runs in the parallel environment on 4 cores, and the Gaussian03 input file is /home/your.username/your_g03code. The -h option shows the details of the easysub script, for example:

# easysub -h


Easy submit of Gaussian03 jobs. 
USAGE:
--------------------------------------
/bin/easysub [-n number_of_cores_to_be_used] [-f name_of_a_job] 
   [-p /full/path/to/g03/code] [-h|--help|help]

Defaults are:
default amount of cores: 1
default name of a job:   username.current-date

-h or --help or help shows this usage summary.
--------------------------------------

All the Gaussian03 output files are saved in the file 
'name_of_a_job.log'  within your
/home/your.username/g03jobs/name_of_a_job 
directory.

Mail comments, suggestions, bugs etc to
HPChelp@mail.csi.cuny.edu

Note that the -p flag is required, while -n and -f are optional (if they are omitted, the default values are used).

This method allows users about 100 GB per core for temp/scratch files. That is, if you are running a 4 core job, you can use up to 400 GB of the local disk for the temp/scratch files. If you need more than 100 GB of temporary space per core, you will need to write those files to the network attached storage. This can be done using the second method.
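
The per-core allowance is simple arithmetic, sketched here as a pair of shell functions. The names GB_PER_CORE, local_scratch_gb, and fits_local_scratch are hypothetical; only the 100 GB per core figure comes from the text above, and it is approximate.

```shell
# Local scratch allowance under the easysub method: about 100 GB per core.
# All names in this sketch are hypothetical.
GB_PER_CORE=100

# Print the local scratch allowance in GB for $1 cores.
local_scratch_gb() {
  echo $(( $1 * GB_PER_CORE ))
}

# Succeed (exit 0) if a job on $1 cores needing $2 GB of scratch fits
# within the local allowance; fail if it should use network storage.
fits_local_scratch() {
  [ "$2" -le "$(local_scratch_gb "$1")" ]
}

# Usage: fits_local_scratch 4 350 && echo "easysub is fine"
```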


  • The second method requires the user to prepare all necessary files manually. Use the following procedure to run a job in this manner:

1) Prepare a job script that contains the Gaussian03 input. Here is an example "job script" (also called an "input script"):

#!/bin/csh 
setenv LD_LIBRARY_PATH /share/apps/pgi/linux86-64/7.2-5/libso:$LD_LIBRARY_PATH
setenv g03root /share/apps/gaussianE1
setenv GAUSS_SCRDIR /state/partition1/tmp/some_name 
source $g03root/g03/bsd/g03.login

$g03root/g03/g03 <<END > FILE_WITH_OUTPUTS.log 
%Chk=chk_myjob 
%mem=4GB       
%nproc=(1,2,4 or 8)       
#P RHF/6-31G(d) Test
   
My test program: Water -- single point energy
   
0 1
O   -0.464   0.177   0.0
H   -0.464   1.137   0.0
H    0.441  -0.143   0.0

END

This job script has to be executable (chmod +x ./file.job).

The three setenv lines of this C-shell script initialize environment variables needed by Gaussian03. For example, setenv GAUSS_SCRDIR specifies the directory where temporary files are placed (/state/partition1/tmp/some_name for local disk, $HOME for network disk). Scratch files must be deleted after job completion.


The job script consists of shell commands and Gaussian03 input content. The Gaussian input data (type of calculation, basis set, etc.) goes between the line $g03root/g03/g03 <<END > FILE_WITH_OUTPUTS.log and the last line, END.

Please refer to Gaussian03 documentation on the syntax for creating the input content.


2) Create a submit script as follows:

#!/bin/bash
#$ -N some_name
#$ -q all.q   
#$ -pe mpi n  
#$ -cwd

./file.job 

# Remove the scratch files. Keep the line that matches your choice of GAUSS_SCRDIR:
rm -r /state/partition1/tmp/some_name
# rm -r $HOME/GAU*

In the submit script, "-pe mpi n" selects the parallel environment and sets the number of cores the job can run on, where n is from 1 to 8 on Zeus. The final rm command removes the scratch files.
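
Because n must be replaced by a real core count before submission, it can help to generate the submit script from a template. The following is a minimal sketch under stated assumptions: the function name make_submit is hypothetical, the job script is assumed to be ./file.job, and the local scratch directory is assumed to be named after the job, as in the some_name example above.

```shell
# Print an SGE submit script for ./file.job to standard output.
# $1 = job name, $2 = number of cores (1 to 8 on Zeus).
# make_submit is a hypothetical helper name.
make_submit() {
  local name=$1 cores=$2
  case $cores in
    [1-8]) ;;
    *) echo "cores must be 1..8, got: $cores" >&2; return 1 ;;
  esac
  cat <<EOF
#!/bin/bash
#\$ -N $name
#\$ -q all.q
#\$ -pe mpi $cores
#\$ -cwd

./file.job

rm -r /state/partition1/tmp/$name
EOF
}

# Usage: make_submit some_name 4 > send.script
```

The function refuses core counts outside 1 to 8, so a typo is caught before the script reaches the queue.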


3) Submit the job in the usual manner. This example assumes the submit script was saved as send.script; make sure that send.script and file.job are in the same directory:

qsub send.script

4) Upon completion, all the outputs will be stored in the file FILE_WITH_OUTPUTS.log
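
With this method the core count appears in two places: %nproc in the job script and "-pe mpi n" in the submit script. Keeping the two equal is a reasonable precaution, although the text above does not state it as a requirement. The sketch below reads %nproc from a job file so the two can be compared; the function names nproc_of and nproc_matches are hypothetical.

```shell
# Print the value of the first %nproc= line in a Gaussian03 job file.
# Both function names are hypothetical.
nproc_of() {
  sed -n 's/^%[Nn][Pp][Rr][Oo][Cc]=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed if the job file's %nproc equals the core count given as $2.
nproc_matches() {
  [ "$(nproc_of "$1")" = "$2" ]
}

# Usage: nproc_matches file.job 4 || echo "core counts differ"
```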

CITATION

The current required citation for Gaussian 03 is the following. Note that this is an updated list with respect to that printed out by earlier revisions of the program, but it applies to every revision of Gaussian 03.

Gaussian 03, Revision C.02, M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A.D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople, Gaussian, Inc., Wallingford CT, 2004.

Mathematica

Mathematica is a fully integrated technical computing system that combines fast, high-precision numerical and symbolic computation with data visualization and programming capabilities. Mathematica version 7.0 is currently installed on the CUNY HPC Center's ATHENA cluster (athena.csi.cuny.edu). The basics of running Mathematica on ATHENA are presented here. Additional information on how to use Mathematica can be found at http://www.wolfram.com/learningcenter/

A Note on Fonts on Unix and Linux Systems

If you have Mathematica installed on your local system, you should already have the correct fonts available for local use. However, when displaying the Mathematica GUI on your local system (via X11 forwarding) while running remotely, some additional preparation may be required to make the fonts that Mathematica requires available to X11 locally. The procedure for doing this is presented here.

The Mathematica GUI supports BDF, TrueType, and Type1 fonts. These fonts are automatically installed for local use by the MathInstaller. Your workstation or personal computer will have access to these fonts if you have installed Mathematica for local use. However, if the Mathematica process is installed and running only on a remote system at the CUNY HPC Center (say ATHENA), then X11 and the Mathematica GUI being displayed on your local machine (through X11 port forwarding) must know where to find the Mathematica fonts locally. Typically, the Mathematica fonts must be added to your local workstation's X11 font path using the 'xset' command, as follows.

First, you must create a client-local directory into which to copy the fonts, for example, on a Linux system: cd $HOME; mkdir Fonts. Next, you must copy the Mathematica font directories into this local directory from their remote location on ATHENA. They are currently stored in the directory:

/share/apps/mathematica7/SystemFiles/Fonts/

To create local copies in the 'Fonts' directory you created, execute the following commands from your local desktop (this assumes that secure copy (scp) is available on your desktop system):

$
$ mkdir Fonts
$
$ cd Fonts
$ scp -r your.account@athena.csi.cuny.edu:/share/apps/mathematica7/SystemFiles/Fonts/*   .
$
$ ls -l
drwxr-xr-x 2 your.account users   4096 Nov  3 16:07 AFM
drwxr-xr-x 2 your.account users 45056 Nov  3 16:08 BDF
drwxr-xr-x 2 your.account users   4096 Nov  3 16:07 SVG
drwxr-xr-x 2 your.account users   4096 Nov  3 16:07 TTF
drwxr-xr-x 2 your.account users   4096 Nov  3 16:07 Type1
$

After you have copied the remote font directories into your local directory, run the following X11 'xset' commands locally:

xset fp+ ${HOME}/Fonts/Type1; xset fp rehash
xset fp+ ${HOME}/Fonts/BDF;    xset fp rehash

For optimal on-screen performance, the Type1 font path should appear before the BDF font path. Hence, ${HOME}/Fonts/Type1 should appear before ${HOME}/Fonts/BDF in the path. You can check font path order by executing the command:

xset q
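
The order check can also be scripted. The sketch below takes a comma-separated font path string and succeeds only if a Fonts/Type1 entry appears before a Fonts/BDF entry. The function name type1_before_bdf is hypothetical, and the sketch assumes the font path is passed in as a single comma-separated string, such as the one 'xset q' reports under its font path heading.

```shell
# Succeed if a .../Fonts/Type1 entry precedes a .../Fonts/BDF entry in
# a comma-separated X11 font path. type1_before_bdf is a hypothetical
# helper name.
type1_before_bdf() {
  local path=$1 type1_pos="" bdf_pos="" i=0 entry
  local IFS=,
  for entry in $path; do
    i=$((i + 1))
    case $entry in
      */Fonts/Type1*) if [ -z "$type1_pos" ]; then type1_pos=$i; fi ;;
      */Fonts/BDF*)   if [ -z "$bdf_pos" ]; then bdf_pos=$i; fi ;;
    esac
  done
  # Both entries must be present, and Type1 must come first.
  [ -n "$type1_pos" ] && [ -n "$bdf_pos" ] && [ "$type1_pos" -lt "$bdf_pos" ]
}

# Usage: type1_before_bdf "$HOME/Fonts/Type1,$HOME/Fonts/BDF" && echo "order OK"
```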

Additional information on handling Mathematica fonts can be found at http://reference.wolfram.com/mathematica/tutorial/FontsOnUnixAndLinux.html

Modes of Operation in Mathematica

Mathematica can be run locally on an office workstation, directly on a server or cluster from its head node, or across the network between an office-local client and a remote server (a cluster, for instance). It can be run serially or in parallel; its licenses can be provided locally or via a network-resident license server; and it can be run in command-line or GUI mode. The details of installing and running Mathematica on a local office workstation are left to the user. Those modes of operation important to the use of CUNY's HPC resources are discussed here.

Selecting Between GUI and Command-Line Mode

Command-line mode or GUI mode is selected by the Mathematica command that is used. To use the Mathematica GUI, enter the following command at the user prompt:

$mathematica

To use Mathematica Command Line Interface (CLI), enter:

$math

More detail on these and other Mathematica commands is available through the man command, as in:

$man mathematica
$man math
$man mcc

The lines above provide documentation on the GUI, CLI, and Mathematica C-compiler, respectively.

Submitting Batch Jobs Directly from the CUNY ATHENA Cluster

Currently, there is no simple and secure method for the automatic network submission of Mathematica jobs from a remote (user local or desktop) CUNY installation of Mathematica to ATHENA. This is something that is being pursued. In the meantime, Mathematica work can be submitted from ATHENA's head node, either by using a user's locally tested Mathematica command sequences to form a standard batch job or through Mathematica's built-in batch submission feature. Either of these approaches can be used while logged into ATHENA's head node. The standard batch submission process is simple to set up and imposes the smallest burden on ATHENA's head node. Mathematica's built-in batch submission feature can be used from the Mathematica CLI or through its GUI. The latter requires setting up X11 forwarding (potentially through more than one host, which is explained below) and imposes a greater burden on ATHENA's head node.

Batch Jobs Run with 'qsub' Using a Mathematica Command (Text) File

In the following example, a batch job is created around a locally tested Mathematica command sequence that is then submitted to ATHENA's batch queueing system using the qsub command. The simple Mathematica command sequence shown here computes a matrix of integrals and prints out every element of that matrix. Any valid sequence of Mathematica commands provided in a notebook file, whether tested on an office Mathematica installation or on the cluster itself, could be used in this example.

When working remotely from an office or a classroom, a user would validate the command sequence on their local workstation (via a smaller local test run), modify it incrementally to make use of the additional resources available on ATHENA, and then copy, paste, and save the Mathematica command sequence in a notebook file on ATHENA. This last step would be done through a cluster text editor like 'vi' or 'emacs' from a terminal window. From a Windows desktop, the free, secure Windows-to-Linux terminal package PuTTY could be used. From a Linux desktop, connecting with secure shell 'ssh' would be the right approach.

Below, a locally tested notebook file has been saved on ATHENA for our integral calculation and is called test_run.nb. Its contents are listed here:


$
$ cat test_run.nb

Print ["Beginning Integral Calculations"]; p=5;
Timing[matr = Table[Integrate[x^(j+i),{x,0,1}], {i,1,p-1}, {j,1,p-1}]//N];
For[i=1, i<p, i++, For[j=1, j<p, j++, Print[matr[[i]][[j]]]]];
Print ["Finished!"];
Quit[];

$
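
For reference, each element computed by this command sequence has a simple closed form, which makes the job's output easy to check by hand. The symbol M below is introduced here only for illustration; it stands for the variable matr in the listing.

```latex
\[
  M_{ij} \;=\; \int_0^1 x^{\,i+j}\,dx \;=\; \frac{1}{i+j+1},
  \qquad i, j = 1, \dots, p-1 .
\]
```

With p = 5, the first element is M_11 = 1/3 ≈ 0.333333 and the last is M_44 = 1/9 ≈ 0.111111.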

This is a serial Mathematica job that executes on just one core on one of ATHENA's compute nodes. The simple batch script offered to 'qsub' to run this job (we will call it run.math) is listed here:

$
$cat run.math

#!/bin/bash
#$ -V
#$ -N math_test
#$ -q PP16.q
#$ -pe mpi 1
#$ -cwd 

math -run "<<test_run.nb"

$

This script runs on a single processor under the 'mpi' parallel environment. The '-V' option ensures that the current Unix environment is provided as context for the batch job. The '-N math_test' option names the job 'math_test'. The job is run in ATHENA's 'PP16.q' queue in the current working directory. Notice the form of the Mathematica command: the CLI version of the command is used. The '-cwd' option instructs the batch job to look for the command file in the directory on the head node from which the job was submitted.

Save this script in a file for future use (here, "run.math"). With few modifications it can be used to run most Mathematica batch jobs on ATHENA.

To run this job script use the command:

 qsub run.math 

Like any other batch job submitted using 'qsub', you can check the status of your job by running the command 'qstat'. Upon completion, the output generated by the job will be written to the file 'math_test.oXXXX', where the XXXX is the job request ID (number) of the job.

Here is the output from the sample job:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.

(This error message is due to interactive prompting and can be ignored.)

Mathematica 6.0 for Linux x86 (64-bit)
Copyright 1988-2008 Wolfram Research, Inc.

Beginning Integral Calculations

0.333333
0.25
0.2
0.166667
0.25
0.2
0.166667
0.142857
0.2
0.166667
0.142857
0.125
0.166667
0.142857
0.125
0.111111

Finished!

Batch Jobs Run Directly from the Mathematica CLI or GUI

To submit batch work to the ATHENA cluster compute nodes using the Mathematica CLI, users must take the following steps:

1) 'ssh' to the ATHENA head node, athena.csi.cuny.edu:

$ssh your.name@athena.csi.cuny.edu

If you are not on CUNY's CSI campus, you will have to access ATHENA through the CSI gateway system, neptune.csi.cuny.edu. (Note: X11 forwarding is not needed to use the Mathematica CLI.)

2) Start Mathematica using the CLI command:

$math

3) Enter the required notebook commands to the Mathematica CLI prompt:

$
$math

Mathematica 7.0 for Linux x86 (64-bit)
Copyright 1988-2009 Wolfram Research, Inc.

In[1]:= LaunchKernels[SGE["localhost","KernelProgram"->
            "/share/apps/wolfram/mathematica/bin/math","NativeSpecifications"->
             "-q PD16.q -pe mpi 4","ToQueue"->True], 2] (* set up and launch 2 slaves*)

In[2]:= ParallelSum[2^n/n!, {n, 0, 1000}];   (* actual calculation starts here *)

In[3]:= CloseKernels[]; (* close slaves *)
In[4]:=
in[4]:=^d (* end session *)

In[1]:= Needs["ClusterIntegration`"];
In[2]:= ClusterSetEngine[SGE -> {"MathKernelCommand" -> "/share/apps/wolfram/mathematica/bin/math", "NativeSpecification" -> "-q PD16.q -pe mpi 4", "ToQueue" -> True}];
In[3]:= ClusterLaunchSlaves[2]; (* launching 2 slaves*)
In[4]:= ParallelSum[2^n/n!, {n, 0, 1000}]; (* actual calculation starts here *)

LaunchKernels::launch: Launching 4 kernels...

In[5]:= ClusterCloseSlaves[]; (* close slaves *)
In[6]:=
in[6]:=^d (* end session *)

$
$

The simple notebook series of commands shown above runs a parallel Mathematica job on 4 cluster compute nodes. This job uses SGE's 'mpi' parallel environment and SGE's PD16.q queue.

To submit work to the ATHENA cluster compute nodes using the Mathematica GUI, the user's local system must be a Linux- or Unix-based system and they must take the following steps:

1) Install Mathematica's fonts on your local Linux machine (see "A Note on Fonts on Unix and Linux Systems" above).

2) 'ssh' to ATHENA (athena.csi.cuny.edu) with X11 forwarding enabled.

$ssh -X your.name@athena.csi.cuny.edu

Note: If you are not on CUNY's CSI campus, you will have to access ATHENA through the CSI gateway system, neptune.csi.cuny.edu. This will lower performance because you will be forwarding X11 packets twice.

3) Start Mathematica using the GUI command:

$mathematica

The same notebook test command sequence presented above can be used here with the GUI.

Submitting Batch Jobs from Remote Locations to CUNY's ATHENA Cluster

A method for doing this is being developed and tested.

For more information on Mathematica:

  • Online documentation is available through the Help menu within the Mathematica notebook front end.
  • The Mathematica Book, 5th Edition (Wolfram Media, Inc., 2003) by Stephen Wolfram.
  • The Mathematica Book is available online.
  • Additional Mathematica documentation is available online.
  • Information on the Parallel Computing Toolkit is available online.
  • Getting Started with Mathematica (Wolfram Research, Inc., 2004).
  • The Wolfram web site http://www.wolfram.com

MATLAB

The MATLAB high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include:

Math and computation

Algorithm development

Data acquisition

Modeling, simulation, and prototyping

Data analysis, exploration, and visualization

Scientific and engineering graphics

Application development, including graphical user interface building

MATLAB is an interactive system, with both a command-line interface and a Graphical User Interface (GUI), whose basic data element is an array that does not require dimensioning. It allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or Fortran. Properly licensed and configured, the MATLAB compute engine can be run serially or in parallel, on the local desktop or remotely. Remote operation on HPC cluster systems is supported, including on CUNY's HPC cluster system BOB (bob.csi.cuny.edu).

MATLAB concurrent jobs (whether local or remote) fall into two distinct categories, Distributed and Parallel. In MATLAB, a Distributed concurrent job is a workload divided among two or more fully independent MATLAB processes or 'workers'. The MATLAB client submits several independent jobs, one for each worker, to the workload manager (SGE in this case). Each worker works on a piece of the same problem, but runs fully independently of the others. The jobs are queued by BOB's current workload manager (SGE) as separately scheduled serial workloads. Each has its own job ID and runs on its own compute node.

On the other hand, MATLAB Parallel jobs do not produce independent processes run by separate 'workers', but produce a single, coupled, parallel workload run under MATLAB's 'labs' abstraction. They are run in SGE's 'matlab' parallel queue, under the '-pe matlab' parallel environment, and have only one job ID. They are closely coupled and rely on MPI-like inter-process communication.

While anyone at a CUNY campus can submit jobs, there is an important distinction between users within the CSI campus and those outside it. Those inside CSI have direct access to the head node of BOB, where the jobs are started, and therefore their office machines can serve as the local, submitting client. Those outside of CSI do not have direct access to BOB's head node, and therefore must submit their work from a CSI-campus-local client. The HPCC gateway machine, NEPTUNE, has been set up for this. NEPTUNE must serve as the submitting client for CUNY users who are off the CSI campus.

Note: Regardless of the type of remote job (Distributed or Parallel), one must have set up two-way, passwordless 'ssh' between the submitting client (the CSI office workstation or NEPTUNE) and BOB's head node. This is currently possible only from within CUNY's CSI campus, and this is the reason non-CSI users must use NEPTUNE to submit MATLAB jobs to BOB.

Licensing requirements for client-to-cluster job submission

MATLAB combines its basic tools for manipulating matrices with a large suite of application-specific libraries or 'toolboxes', including its Parallel Computing Toolbox, which is required to submit jobs to a cluster. To run parallel MATLAB jobs successfully on CUNY's HPC cluster systems, a user must have (or be able to acquire over their campus network) licenses for all the MATLAB components that their job will use. At a minimum, users must have client-local licenses for MATLAB itself and for the Parallel Computing Toolbox. Currently, the CUNY HPC group at CSI has 5 combined MATLAB and Parallel Computing Toolbox node-locked client licenses to distribute on a case-by-case and temporary basis. With these two licenses and the license that CUNY HPC provides on BOB for the Distributed Computing Server (DCS), basic MATLAB Distributed or Parallel jobs can be run (subject to the 'ssh' requirement above). If the job makes use of other application-specific toolboxes (e.g. Aerospace Toolbox, Bioinformatics Toolbox, Econometrics Toolbox, etc.), it will need to acquire those licenses locally or from the CSI on-campus MATLAB license server. These extra toolbox licenses are not provided by CUNY HPC.

Currently, a properly configured CSI campus client that also requires an application-specific toolbox to complete its work will have two license.lic files installed in ${MATLAB_ROOT}/licenses (the value for the MATLAB_ROOT directory can be determined on the machine of interest by typing 'matlabroot' at the MATLAB command-line prompt). The first will be the node-local license (say, mylocal.lic) for MATLAB and the Parallel Computing Toolbox, and the second will be the network-served license (network.lic) pointing to the campus MATLAB toolbox license server. These are read in alphabetical order upon MATLAB startup to obtain proper licensing. Other licensing schemes are conceivable.
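
To see the order in which the license files in a given directory would be considered, a short shell sketch such as the one below may help. It assumes that 'alphabetical order' means plain byte-wise ordering of the file names, which is our assumption and not something confirmed here. The helper name list_license_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the *.lic files in a licenses directory in
# plain alphabetical (byte) order, one per line.
list_license_files() {
    lic_dir="$1"
    for f in "$lic_dir"/*.lic; do
        # Skip the unexpanded pattern when no .lic files exist
        [ -e "$f" ] && basename "$f"
    done | LC_ALL=C sort
}
```

For example, 'list_license_files "${MATLAB_ROOT}/licenses"' on a client holding the two files named above would list mylocal.lic before network.lic.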

Note: Non-CSI users submitting jobs from the NEPTUNE client (gateway) must currently rely on the licenses supported on that system. These include a limited number of node-locked MATLAB and Parallel Computing Toolbox licenses and those application toolbox licenses available from the CSI campus license server. Not all application toolboxes have been licensed by CSI.

The node-local license for MATLAB and the Parallel Computing Toolbox would look something like this:

# BEGIN--------------BEGIN--------------BEGIN
# DO NOT EDIT THIS FILE.  Any changes will be overwritten.
# MATLAB license passcode file.
# LicenseNo: 99999
INCREMENT MATLAB MLM 22 01-jan-0000 uncounted 99C9EC4D3695 \
        VENDOR_STRING=vi=30:at=187:pd=1:lo=GM:lu=200:ei=944275: \
        HOSTID=MATLAB_HOSTID=0015179549BA:000000 PLATFORMS="i86_re \
        amd64_re" ISSUED=30-Sep-2009 SN=000000 TS_OK
INCREMENT Distrib_Computing_Toolbox MLM 22 01-jan-0000 uncounted \
        E77E2F473055 \
        VENDOR_STRING=vi=30:at=187:pd=1:lo=GM:lu=200:ei=944275: \
        HOSTID="0015179549ba 0015179549bb 002219504c4f 002219504c51" \
        PLATFORMS="i86_re amd64_re" ISSUED=30-Sep-2009 SN=000000 TS_OK
# END-----------------END-----------------END

The network license for any required Applications Toolboxes would look something like this:

SERVER 163.238.11.65  000f1f8d5c66 27000
USE_SERVER

(The license files above are for illustration only, and are not functional license files.)

Within the CSI campus, a node-local license file for MATLAB and the Parallel Computing Toolbox can be obtained from CUNY's HPC group, and a network license from Thomas Lauria or Gabriel Cynowicz (Thomas.Lauria@csi.cuny.edu; Gabriel.Cynowicz@csi.cuny.edu), who are responsible for the CSI campus MATLAB license server. In addition, installations of MATLAB on a CSI campus client must have included the current on-campus File Installation Key. This can also be obtained from the CSI campus MATLAB license support staff listed above. This does not apply to non-CSI users because the MATLAB installation on the NEPTUNE client is complete.

In the future, if arrangements are made for non-CSI CUNY sites to have direct 'ssh' access to CUNY's HPC clusters at CSI, those non-CSI sites will need to provide local licensing for MATLAB itself, the Parallel Computing Toolbox, and any Applications Toolboxes they require. For all CUNY users (within and outside of CSI), the CUNY clusters at CSI provide the proper DCS licensing automatically for jobs started on the cluster as long as they arrive with the proper licenses for the Toolboxes they use.

Setting up the client and cluster environment for remote execution

A number of steps must be taken to successfully transfer, submit, and recover MATLAB jobs from the HPC cluster BOB. An important first step is to ensure that the version of MATLAB running locally is identical to the version running on the CUNY cluster, BOB. The CUNY HPC Center is currently running MATLAB Version 7.9.0.529 (R2009b). To check the release at any time, log in to BOB, run MATLAB's command-line interface (CLI), and enter MATLAB's 'version' command at the >> prompt. If the versions are not identical, the local MATLAB will detect the mismatch, assume there are potential feature incompatibilities, and not submit your job. The error message produced when this occurs is not very informative. Note: This does not apply to non-CSI users submitting their work from NEPTUNE, where the versions already match.

Next, two-way, passwordless secure login and file transfer in both directions must be working correctly. For Linux-to-Linux transfers this involves following the procedures outlined in the 'ssh-keygen' man page and/or referring to the numerous descriptions on the web. This includes putting the public keys generated with 'ssh-keygen' on each machine into the other machine's authorized_keys file. For Windows-to-Linux transfers this is usually accomplished with the help of the Windows remote login utility 'PuTTY'. Please refer to the numerous HOW TOs on the web to complete this. (Note: CSI clients that are behind a firewall or reside on a local subnet behind a router may require special configuration, including port-forwarding of return 'ssh' traffic on port 22 from BOB through the local router to the local client).

In addition, on the cluster, passwordless 'ssh' must be allowed for the user from the head node to all of the compute nodes where the MATLAB job might run. This is the default for user accounts on BOB, but it should be checked by the user. Because the home directory on the head node is shared with all the compute nodes, accomplishing this is a simple matter of including the head node's public key in the 'authorized_keys' file in the user's '.ssh' directory. Again refer to the ssh-keygen man page or many on-line sources for more detail here.
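
The key-exchange steps described above can be sketched in shell. The sketch assumes an RSA key pair has already been created with 'ssh-keygen -t rsa' (giving a public key such as ~/.ssh/id_rsa.pub) and, for the client-to-BOB direction, that the public key file has been copied to the other machine, for example with 'scp'. The helper name install_pubkey is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file unless
# the same line is already present, then restrict the file's permissions.
install_pubkey() {
    pub_file="$1"      # e.g. ~/.ssh/id_rsa.pub produced by ssh-keygen
    auth_file="$2"     # e.g. ~/.ssh/authorized_keys
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    key=$(cat "$pub_file")
    # -F: fixed string, -x: whole-line match, -q: quiet
    if ! grep -qxF -e "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    # sshd commonly ignores key files that other users can write to
    chmod 600 "$auth_file"
}
```

On BOB's head node, where the home directory is shared with the compute nodes, running 'install_pubkey ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys' once should be enough for head-node-to-compute-node logins. For the client-to-BOB direction, the same function would be run on each machine with the other machine's public key file. Running it twice does not duplicate the entry.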

Once passwordless 'ssh' is operational, the CUNY HPC group recommends studying the relevant sections of MATLAB's Parallel Computing Toolbox User Guide [3]. The sections on 'Programming Distributed Jobs' and 'Programming Parallel Jobs' are particularly useful. The sub-sections titled 'Using the Generic Scheduler Interface' are specific to the topic of submitting remote jobs to the so-called 'Generic Interface', which is the term that MATLAB uses for workload managers such as SGE that are not explicitly supported. Note: Reading through these sections is strongly recommended before submitting the test jobs provided below.

In addition, an important source of information can be found in the README files in the following directory under MATLAB's root directory or installation tree on your campus-client system or on the head node of BOB:
${MATLAB_ROOT}/toolbox/distcomp/examples/integration/sge

There are similar directories for other common workload managers at the same level. Because a submission from a campus client to BOB does not involve a shared file system, users should pay particularly close attention to the contents of the 'nonshared' subdirectory in the above SGE directory. It contains guidance for both Linux and Windows clients on non-shared file systems. Further information can be found at the MATLAB website here [4] and here [5].

CUNY's HPC group has successfully run both Distributed and Parallel jobs on BOB from a remote client with a non-shared file system. The MATLAB scripts below have been used successfully by CUNY's HPC group to submit both Distributed and Parallel work to BOB from a Linux client. The MATLAB script for Distributed job submission is:

% Define arguments to SubmitFcn
clusterHost = 'bob.csi.cuny.edu';
% This is the path of the working directory on 'bob.csi.cuny.edu'
remoteDataLocation = '/<some user defined path>/matlab_remote';
% Create scheduler object
sched = findResource('scheduler', 'type', 'generic');
% Define a client local working directory in DataLocation on the submitting machine
set(sched, 'DataLocation', '/<some other user defined path>/matlab_local');
set(sched, 'ClusterMatlabRoot', '/share/apps/mtlb');
set(sched, 'HasSharedFilesystem', false);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'GetJobStateFcn', @sgeGetJobState);
set(sched, 'DestroyJobFcn', @sgeDestroyJob);
% The SubmitFcn must be a cell array that includes the two additional inputs
set(sched, 'SubmitFcn', {@sgeNonSharedSimpleSubmitFcn, clusterHost, remoteDataLocation});

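% Create the job and four independent tasks; each task calls rand and returns one output
j = createJob(sched);
t = createTask(j,@rand,1);
t = createTask(j,@rand,1);
t = createTask(j,@rand,1);
t = createTask(j,@rand,1);

% Submit the job to the cluster
submit(j);

% Wait for completion, then retrieve the results on the client
waitForState(j, 'finished');
results = getAllOutputArguments(j);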

References to files in the remote working directory are preceded by the '@' sign, and those files are presumed to have been made available there. Files placed in other locations may be referenced with the full remote file system path or through the MATLAB addpath command using the path below. The last two commands wait for the job to reach a 'finished' state on the client and retrieve the results for display on the client. The runtime functions needed above (and below) can be obtained for customization from their distribution location in:

${MATLAB_ROOT}/toolbox/distcomp/examples/integration/sge/nonshared

The MATLAB script for Parallel job submission is listed here (the function colsum.m must be provided in MATLAB's local working directory):

% Define arguments to ParallelSubmitFcn
clusterHost = 'bob.csi.cuny.edu';
% This is the path of the working directory on 'bob.csi.cuny.edu'
remoteDataLocation = '/<some user defined path>/matlab_remote';
sched = findResource('scheduler', 'type', 'generic');
% Define a client local working directory in DataLocation on the submitting machine
set(sched, 'DataLocation', '/<some other user defined path>/matlab_local');
set(sched, 'ClusterMatlabRoot', '/share/apps/mtlb');
set(sched, 'HasSharedFilesystem', false);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'GetJobStateFcn', @sgeGetJobState);
set(sched, 'DestroyJobFcn', @sgeDestroyJob);
% If you want to run parallel jobs, you must specify a ParallelSubmitFcn
set(sched, 'ParallelSubmitFcn', {@sgeNonSharedParallelSubmitFcn, clusterHost, remoteDataLocation});

pjob = createParallelJob(sched);
% Create a dependency on the parallel function colsum.m to get it transferred to cluster
set(pjob, 'FileDependencies', {'colsum.m'});
% Define the number of processes to use for this job
set(pjob, 'MaximumNumberOfWorkers', 4);
set(pjob, 'MinimumNumberOfWorkers', 4);
t = createTask(pjob,@colsum,1,{});

submit(pjob);

waitForState(pjob, 'finished');
results = getAllOutputArguments(pjob);

Parallel function 'colsum':

function total_sum = colsum
if labindex == 1
    % Send magic square to other labs
    A = labBroadcast(1,magic(numlabs))
else
    % Receive broadcast on other labs
    A = labBroadcast(1)
end

% Calculate sum of column identified by labindex for this lab
column_sum = sum(A(:,labindex))

% Calculate total sum by combining column sum from all labs
total_sum = gplus(column_sum)
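
To make concrete what 'colsum' returns when run on 4 labs, the following serial sketch reproduces the arithmetic in shell and awk. It is not MATLAB and does no message passing. It hard-codes the 4-by-4 matrix that MATLAB's magic(4) produces, lets each 'lab' k sum column k, and then adds the per-lab sums the way gplus does. The function name colsum_serial is hypothetical.

```shell
#!/bin/sh
# Serial sketch of what colsum computes with 4 labs. The matrix below is
# MATLAB's magic(4); "lab" k sums column k, and the final addition of the
# per-lab sums plays the role of gplus.
colsum_serial() {
    printf '%s\n' \
        '16  2  3 13' \
        ' 5 11 10  8' \
        ' 9  7  6 12' \
        ' 4 14 15  1' |
    awk '{ n = NF; for (k = 1; k <= NF; k++) col[k] += $k }
         END {
             for (k = 1; k <= n; k++) {
                 printf "lab %d column_sum = %d\n", k, col[k]
                 total += col[k]
             }
             printf "total_sum = %d\n", total
         }'
}
colsum_serial
```

Each column of a 4-by-4 magic square sums to 34, so the sketch reports a column_sum of 34 for every lab and a total_sum of 136.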

It is important to point out that any user-authored code residing on the local client will need to be copied over to the cluster remote directory and be made available in the MATLAB path. The setting of the 'FileDependencies' property above in the Parallel job script illustrates how to accomplish this automatically as part of the job submission process. In this example, the job is dependent on the user supplied function 'colsum.m' that is local to the client. The line:

set(pjob, 'FileDependencies', {'colsum.m'});

accomplishes the file transfer automatically. Because 'colsum.m' is written in MATLAB script, it can be transferred as text. However, user-defined functions that need to be compiled (typically ending in the suffix '.mex') must be compiled in the environment in which they will be used. This may mean that users will need to compile their code on the destination machine (the head node of the cluster, BOB, in our case) and provide the compiled result in the remote working directory defined in their submit script (or gather the compiled files on the local system before submission and use the file dependency command as shown above for 'colsum.m'). Further information on file dependencies can be found in the MATLAB User's Guide [6].

Once tested and working, all of the scripting described above can be captured in the MATLAB GUI as a MATLAB 'configuration', which is then available from a drop-down job submission menu.

A successful MATLAB job submission to BOB from a CUNY campus-client (currently only CSI clients can meet the 'ssh' connectivity requirement) can be tracked through the following steps:

1. Creation of the client-local job directory 'Job' in the current
    working directory on the client.

2. Transfer of this 'Job' directory via 'ssh' from the client to the cluster head
    node (server) for execution.  At this point, running 'get(job)' from the
    client will show a state of 'queued' or 'pending'.

3. The assignment of compute nodes to the job by SGE and the queuing of
    the job for execution (the job should now be visible as queued in 'qstat').

4. The startup of the MATLAB instance(s) on the cluster compute nodes.  The
    job should now be listed as running when 'qstat' is run on BOB.

5. Job completion on BOB, indicated by a 'finished' state listed in the 'Job.state.mat'
    file.  The job will no longer appear in the default 'qstat' listing.

6. Transfer of the job files back to the client-local directory, marking the 'Job.state.mat'
    file on the client as 'finished' as well.  At this point, running 'get(job)' from
    the client will show a job state of 'finished'.

7. Availability of the job results in MATLAB via the 'results = getAllOutputArguments(job)'
    command upon successful completion.
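
The 'qstat' checks in steps 3 through 5 can be scripted on BOB. The sketch below assumes the default SGE 'qstat' listing, in which the job ID is the first column and the state code is the fifth. The helper names sge_job_state and describe_state are hypothetical, and the mapping covers only a few common state codes.

```shell
#!/bin/sh
# Hypothetical helper: read `qstat` output on standard input and print the
# state code(s) for one job ID. Assumes the default SGE listing, in which the
# job ID is the first column and the state is the fifth.
sge_job_state() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Translate common SGE state codes into the terms used in the steps above.
describe_state() {
    case "$1" in
        qw)  echo "queued" ;;
        r)   echo "running" ;;
        t)   echo "transferring" ;;
        Eqw) echo "error" ;;
        "")  echo "not listed (finished or never submitted)" ;;
        *)   echo "other ($1)" ;;
    esac
}
```

A typical use would be 'describe_state "$(qstat | sge_job_state JOBID)"', with JOBID replaced by the numeric job ID. A finished job is normally no longer listed by plain 'qstat', which is why an empty result is reported as 'not listed'.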

There will be slight differences in the protocol between Distributed and Parallel jobs. Distributed jobs show N separate queued jobs with N job IDs, one for each MATLAB 'worker' task running on its own compute node. Parallel jobs show a single queued job, with one job ID, for the N MATLAB 'labs' running in concert on N compute nodes.

This description is based on CUNY HPC's current use of the SGE workload manager. Updates will be required if the CUNY HPC group decides to support another workload manager such as PBS Pro.

MrBayes

"MrBayes" is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees.

The program takes as input a character matrix in a NEXUS file format. The output is several files with the parameters that were sampled by the MCMC algorithm. MrBayes can summarize the information in these files for the user.

For more information, click here.

Running MrBayes with one core

1. Prepare a job script (primates.nex) that contains a DATA block and a MRBAYES block. The job script must be in NEXUS file format. The MRBAYES block simply contains MrBayes commands, each ending with a semicolon. The example below contains 12 mitochondrial DNA sequences of primates. With ngen=10000 and samplefreq=10, the chain is sampled every 10 generations, so the run yields at least 10000/10 = 1,000 samples from the posterior probability distribution. If you need more detail on the NEXUS file format, please check the on-line manual.


#NEXUS
begin data;
dimensions ntax=12 nchar=898;
format datatype=dna interleave=no gap=-;
matrix
Tarsius_syrichta	AAGTTTCATTGGAGCCACCACTCTTATAATTGCCCATGGCCTCACCTCCTCCCTATTATTTTGCCTAGCAAATACAAACTACGAACGAGTCCACAGTCGAACAATAGCACTAGCCCGTGGCCTTCAAACCCTATTACCTCTTGCAGCAACATGATGACTCCTCGCCAGCTTAACCAACCTGGCCCTTCCCCCAACAATTAATTTAATCGGTGAACTGTCCGTAATAATAGCAGCATTTTCATGGTCACACCTAACTATTATCTTAGTAGGCCTTAACACCCTTATCACCGCCCTATATTCCCTATATATACTAATCATAACTCAACGAGGAAAATACACATATCATATCAACAATATCATGCCCCCTTTCACCCGAGAAAATACATTAATAATCATACACCTATTTCCCTTAATCCTACTATCTACCAACCCCAAAGTAATTATAGGAACCATGTACTGTAAATATAGTTTAAACAAAACATTAGATTGTGAGTCTAATAATAGAAGCCCAAAGATTTCTTATTTACCAAGAAAGTA-TGCAAGAACTGCTAACTCATGCCTCCATATATAACAATGTGGCTTTCTT-ACTTTTAAAGGATAGAAGTAATCCATCGGTCTTAGGAACCGAAAA-ATTGGTGCAACTCCAAATAAAAGTAATAAATTTATTTTCATCCTCCATTTTACTATCACTTACACTCTTAATTACCCCATTTATTATTACAACAACTAAAAAATATGAAACACATGCATACCCTTACTACGTAAAAAACTCTATCGCCTGCGCATTTATAACAAGCCTAGTCCCAATGCTCATATTTCTATACACAAATCAAGAAATAATCATTTCCAACTGACATTGAATAACGATTCATACTATCAAATTATGCCTAAGCTT
Lemur_catta		AAGCTTCATAGGAGCAACCATTCTAATAATCGCACATGGCCTTACATCATCCATATTATTCTGTCTAGCCAACTCTAACTACGAACGAATCCATAGCCGTACAATACTACTAGCACGAGGGATCCAAACCATTCTCCCTCTTATAGCCACCTGATGACTACTCGCCAGCCTAACTAACCTAGCCCTACCCACCTCTATCAATTTAATTGGCGAACTATTCGTCACTATAGCATCCTTCTCATGATCAAACATTACAATTATCTTAATAGGCTTAAATATGCTCATCACCGCTCTCTATTCCCTCTATATATTAACTACTACACAACGAGGAAAACTCACATATCATTCGCACAACCTAAACCCATCCTTTACACGAGAAAACACCCTTATATCCATACACATACTCCCCCTTCTCCTATTTACCTTAAACCCCAAAATTATTCTAGGACCCACGTACTGTAAATATAGTTTAAA-AAAACACTAGATTGTGAATCCAGAAATAGAAGCTCAAAC-CTTCTTATTTACCGAGAAAGTAATGTATGAACTGCTAACTCTGCACTCCGTATATAAAAATACGGCTATCTCAACTTTTAAAGGATAGAAGTAATCCATTGGCCTTAGGAGCCAAAAA-ATTGGTGCAACTCCAAATAAAAGTAATAAATCTATTATCCTCTTTCACCCTTGTCACACTGATTATCCTAACTTTACCTATCATTATAAACGTTACAAACATATACAAAAACTACCCCTATGCACCATACGTAAAATCTTCTATTGCATGTGCCTTCATCACTAGCCTCATCCCAACTATATTATTTATCTCCTCAGGACAAGAAACAATCATTTCCAACTGACATTGAATAACAATCCAAACCCTAAAACTATCTATTAGCTT
Homo_sapiens		AAGCTTCACCGGCGCAGTCATTCTCATAATCGCCCACGGGCTTACATCCTCATTACTATTCTGCCTAGCAAACTCAAACTACGAACGCACTCACAGTCGCATCATAATCCTCTCTCAAGGACTTCAAACTCTACTCCCACTAATAGCTTTTTGATGACTTCTAGCAAGCCTCGCTAACCTCGCCTTACCCCCCACTATTAACCTACTGGGAGAACTCTCTGTGCTAGTAACCACGTTCTCCTGATCAAATATCACTCTCCTACTTACAGGACTCAACATACTAGTCACAGCCCTATACTCCCTCTACATATTTACCACAACACAATGGGGCTCACTCACCCACCACATTAACAACATAAAACCCTCATTCACACGAGAAAACACCCTCATGTTCATACACCTATCCCCCATTCTCCTCCTATCCCTCAACCCCGACATCATTACCGGGTTTTCCTCTTGTAAATATAGTTTAACCAAAACATCAGATTGTGAATCTGACAACAGAGGCTTA-CGACCCCTTATTTACCGAGAAAGCT-CACAAGAACTGCTAACTCATGCCCCCATGTCTAACAACATGGCTTTCTCAACTTTTAAAGGATAACAGCTATCCATTGGTCTTAGGCCCCAAAAATTTTGGTGCAACTCCAAATAAAAGTAATAACCATGCACACTACTATAACCACCCTAACCCTGACTTCCCTAATTCCCCCCATCCTTACCACCCTCGTTAACCCTAACAAAAAAAACTCATACCCCCATTATGTAAAATCCATTGTCGCATCCACCTTTATTATCAGTCTCTTCCCCACAACAATATTCATGTGCCTAGACCAAGAAGTTATTATCTCGAACTGACACTGAGCCACAACCCAAACAACCCAGCTCTCCCTAAGCTT
Pan	  		AAGCTTCACCGGCGCAATTATCCTCATAATCGCCCACGGACTTACATCCTCATTATTATTCTGCCTAGCAAACTCAAATTATGAACGCACCCACAGTCGCATCATAATTCTCTCCCAAGGACTTCAAACTCTACTCCCACTAATAGCCTTTTGATGACTCCTAGCAAGCCTCGCTAACCTCGCCCTACCCCCTACCATTAATCTCCTAGGGGAACTCTCCGTGCTAGTAACCTCATTCTCCTGATCAAATACCACTCTCCTACTCACAGGATTCAACATACTAATCACAGCCCTGTACTCCCTCTACATGTTTACCACAACACAATGAGGCTCACTCACCCACCACATTAATAACATAAAGCCCTCATTCACACGAGAAAATACTCTCATATTTTTACACCTATCCCCCATCCTCCTTCTATCCCTCAATCCTGATATCATCACTGGATTCACCTCCTGTAAATATAGTTTAACCAAAACATCAGATTGTGAATCTGACAACAGAGGCTCA-CGACCCCTTATTTACCGAGAAAGCT-TATAAGAACTGCTAATTCATATCCCCATGCCTGACAACATGGCTTTCTCAACTTTTAAAGGATAACAGCCATCCGTTGGTCTTAGGCCCCAAAAATTTTGGTGCAACTCCAAATAAAAGTAATAACCATGTATACTACCATAACCACCTTAACCCTAACTCCCTTAATTCTCCCCATCCTCACCACCCTCATTAACCCTAACAAAAAAAACTCATATCCCCATTATGTGAAATCCATTATCGCGTCCACCTTTATCATTAGCCTTTTCCCCACAACAATATTCATATGCCTAGACCAAGAAGCTATTATCTCAAACTGGCACTGAGCAACAACCCAAACAACCCAGCTCTCCCTAAGCTT
Gorilla   		AAGCTTCACCGGCGCAGTTGTTCTTATAATTGCCCACGGACTTACATCATCATTATTATTCTGCCTAGCAAACTCAAACTACGAACGAACCCACAGCCGCATCATAATTCTCTCTCAAGGACTCCAAACCCTACTCCCACTAATAGCCCTTTGATGACTTCTGGCAAGCCTCGCCAACCTCGCCTTACCCCCCACCATTAACCTACTAGGAGAGCTCTCCGTACTAGTAACCACATTCTCCTGATCAAACACCACCCTTTTACTTACAGGATCTAACATACTAATTACAGCCCTGTACTCCCTTTATATATTTACCACAACACAATGAGGCCCACTCACACACCACATCACCAACATAAAACCCTCATTTACACGAGAAAACATCCTCATATTCATGCACCTATCCCCCATCCTCCTCCTATCCCTCAACCCCGATATTATCACCGGGTTCACCTCCTGTAAATATAGTTTAACCAAAACATCAGATTGTGAATCTGATAACAGAGGCTCA-CAACCCCTTATTTACCGAGAAAGCT-CGTAAGAGCTGCTAACTCATACCCCCGTGCTTGACAACATGGCTTTCTCAACTTTTAAAGGATAACAGCTATCCATTGGTCTTAGGACCCAAAAATTTTGGTGCAACTCCAAATAAAAGTAATAACTATGTACGCTACCATAACCACCTTAGCCCTAACTTCCTTAATTCCCCCTATCCTTACCACCTTCATCAATCCTAACAAAAAAAGCTCATACCCCCATTACGTAAAATCTATCGTCGCATCCACCTTTATCATCAGCCTCTTCCCCACAACAATATTTCTATGCCTAGACCAAGAAGCTATTATCTCAAGCTGACACTGAGCAACAACCCAAACAATTCAACTCTCCCTAAGCTT
Pongo     		AAGCTTCACCGGCGCAACCACCCTCATGATTGCCCATGGACTCACATCCTCCCTACTGTTCTGCCTAGCAAACTCAAACTACGAACGAACCCACAGCCGCATCATAATCCTCTCTCAAGGCCTTCAAACTCTACTCCCCCTAATAGCCCTCTGATGACTTCTAGCAAGCCTCACTAACCTTGCCCTACCACCCACCATCAACCTTCTAGGAGAACTCTCCGTACTAATAGCCATATTCTCTTGATCTAACATCACCATCCTACTAACAGGACTCAACATACTAATCACAACCCTATACTCTCTCTATATATTCACCACAACACAACGAGGTACACCCACACACCACATCAACAACATAAAACCTTCTTTCACACGCGAAAATACCCTCATGCTCATACACCTATCCCCCATCCTCCTCTTATCCCTCAACCCCAGCATCATCGCTGGGTTCGCCTACTGTAAATATAGTTTAACCAAAACATTAGATTGTGAATCTAATAATAGGGCCCCA-CAACCCCTTATTTACCGAGAAAGCT-CACAAGAACTGCTAACTCTCACT-CCATGTGTGACAACATGGCTTTCTCAGCTTTTAAAGGATAACAGCTATCCCTTGGTCTTAGGATCCAAAAATTTTGGTGCAACTCCAAATAAAAGTAACAGCCATGTTTACCACCATAACTGCCCTCACCTTAACTTCCCTAATCCCCCCCATTACCGCTACCCTCATTAACCCCAACAAAAAAAACCCATACCCCCACTATGTAAAAACGGCCATCGCATCCGCCTTTACTATCAGCCTTATCCCAACAACAATATTTATCTGCCTAGGACAAGAAACCATCGTCACAAACTGATGCTGAACAACCACCCAGACACTACAACTCTCACTAAGCTT
Hylobates 		AAGCTTTACAGGTGCAACCGTCCTCATAATCGCCCACGGACTAACCTCTTCCCTGCTATTCTGCCTTGCAAACTCAAACTACGAACGAACTCACAGCCGCATCATAATCCTATCTCGAGGGCTCCAAGCCTTACTCCCACTGATAGCCTTCTGATGACTCGCAGCAAGCCTCGCTAACCTCGCCCTACCCCCCACTATTAACCTCCTAGGTGAACTCTTCGTACTAATGGCCTCCTTCTCCTGGGCAAACACTACTATTACACTCACCGGGCTCAACGTACTAATCACGGCCCTATACTCCCTTTACATATTTATCATAACACAACGAGGCACACTTACACACCACATTAAAAACATAAAACCCTCACTCACACGAGAAAACATATTAATACTTATGCACCTCTTCCCCCTCCTCCTCCTAACCCTCAACCCTAACATCATTACTGGCTTTACTCCCTGTAAACATAGTTTAATCAAAACATTAGATTGTGAATCTAACAATAGAGGCTCG-AAACCTCTTGCTTACCGAGAAAGCC-CACAAGAACTGCTAACTCACTATCCCATGTATGACAACATGGCTTTCTCAACTTTTAAAGGATAACAGCTATCCATTGGTCTTAGGACCCAAAAATTTTGGTGCAACTCCAAATAAAAGTAATAGCAATGTACACCACCATAGCCATTCTAACGCTAACCTCCCTAATTCCCCCCATTACAGCCACCCTTATTAACCCCAATAAAAAGAACTTATACCCGCACTACGTAAAAATGACCATTGCCTCTACCTTTATAATCAGCCTATTTCCCACAATAATATTCATGTGCACAGACCAAGAAACCATTATTTCAAACTGACACTGAACTGCAACCCAAACGCTAGAACTCTCCCTAAGCTT
Macaca_fuscata		AAGCTTTTCCGGCGCAACCATCCTTATGATCGCTCACGGACTCACCTCTTCCATATATTTCTGCCTAGCCAATTCAAACTATGAACGCACTCACAACCGTACCATACTACTGTCCCGAGGACTTCAAATCCTACTTCCACTAACAGCCTTTTGATGATTAACAGCAAGCCTTACTAACCTTGCCCTACCCCCCACTATCAATCTACTAGGTGAACTCTTTGTAATCGCAACCTCATTCTCCTGATCCCATATCACCATTATGCTAACAGGACTTAACATATTAATTACGGCCCTCTACTCTCTCCACATATTCACTACAACACAACGAGGAACACTCACACATCACATAATCAACATAAAGCCCCCCTTCACACGAGAAAACACATTAATATTCATACACCTCGCTCCAATTATCCTTCTATCCCTCAACCCCAACATCATCCTGGGGTTTACCTCCTGTAGATATAGTTTAACTAAAACACTAGATTGTGAATCTAACCATAGAGACTCA-CCACCTCTTATTTACCGAGAAAACT-CGCAAGGACTGCTAACCCATGTACCCGTACCTAAAATTACGGTTTTCTCAACTTTTAAAGGATAACAGCTATCCATTGACCTTAGGAGTCAAAAACATTGGTGCAACTCCAAATAAAAGTAATAATCATGCACACCCCCATCATTATAACAACCCTTATCTCCCTAACTCTCCCAATTTTTGCCACCCTCATCAACCCTTACAAAAAACGTCCATACCCAGATTACGTAAAAACAACCGTAATATATGCTTTCATCATCAGCCTCCCCTCAACAACTTTATTCATCTTCTCAAACCAAGAAACAACCATTTGGAGCTGACATTGAATAATGACCCAAACACTAGACCTAACGCTAAGCTT
M_mulatta		AAGCTTTTCTGGCGCAACCATCCTCATGATTGCTCACGGACTCACCTCTTCCATATATTTCTGCCTAGCCAATTCAAACTATGAACGCACTCACAACCGTACCATACTACTGTCCCGGGGACTTCAAATCCTACTTCCACTAACAGCTTTCTGATGATTAACAGCAAGCCTTACTAACCTTGCCCTACCCCCCACTATCAACCTACTAGGTGAACTCTTTGTAATCGCGACCTCATTCTCCTGGTCCCATATCACCATTATATTAACAGGATTTAACATACTAATTACGGCCCTCTACTCCCTCCACATATTCACCACAACACAACGAGGAGCACTCACACATCACATAATCAACATAAAACCCCCCTTCACACGAGAAAACATATTAATATTCATACACCTCGCTCCAATCATCCTCCTATCTCTCAACCCCAACATCATCCTGGGGTTTACTTCCTGTAGATATAGTTTAACTAAAACATTAGATTGTGAATCTAACCATAGAGACTTA-CCACCTCTTATTTACCGAGAAAACT-CGCGAGGACTGCTAACCCATGTATCCGTACCTAAAATTACGGTTTTCTCAACTTTTAAAGGATAACAGCTATCCATTGACCTTAGGAGTCAAAAATATTGGTGCAACTCCAAATAAAAGTAATAATCATGCACACCCCTATCATAATAACAACCCTTATCTCCCTAACTCTCCCAATTTTTGCCACCCTCATCAACCCTTACAAAAAACGTCCATACCCAGATTACGTAAAAACAACCGTAATATATGCTTTCATCATCAGCCTCCCCTCAACAACTTTATTCATCTTCTCAAACCAAGAAACAACCATTTGAAGCTGACATTGAATAATAACCCAAACACTAGACCTAACACTAAGCTT
M_fascicularis		AAGCTTCTCCGGCGCAACCACCCTTATAATCGCCCACGGGCTCACCTCTTCCATGTATTTCTGCTTGGCCAATTCAAACTATGAGCGCACTCATAACCGTACCATACTACTATCCCGAGGACTTCAAATTCTACTTCCATTGACAGCCTTCTGATGACTCACAGCAAGCCTTACTAACCTTGCCCTACCCCCCACTATTAATCTACTAGGCGAACTCTTTGTAATCACAACTTCATTTTCCTGATCCCATATCACCATTGTGTTAACGGGCCTTAATATACTAATCACAGCCCTCTACTCTCTCCACATGTTCATTACAGTACAACGAGGAACACTCACACACCACATAATCAATATAAAACCCCCCTTCACACGAGAAAACATATTAATATTCATACACCTCGCTCCAATTATCCTTCTATCTCTCAACCCCAACATCATCCTGGGGTTTACCTCCTGTAAATATAGTTTAACTAAAACATTAGATTGTGAATCTAACTATAGAGGCCTA-CCACTTCTTATTTACCGAGAAAACT-CGCAAGGACTGCTAATCCATGCCTCCGTACTTAAAACTACGGTTTCCTCAACTTTTAAAGGATAACAGCTATCCATTGACCTTAGGAGTCAAAAACATTGGTGCAACTCCAAATAAAAGTAATAATCATGCACACCCCCATCATAATAACAACCCTCATCTCCCTGACCCTTCCAATTTTTGCCACCCTCACCAACCCCTATAAAAAACGTTCATACCCAGACTACGTAAAAACAACCGTAATATATGCTTTTATTACCAGTCTCCCCTCAACAACCCTATTCATCCTCTCAAACCAAGAAACAACCATTTGGAGTTGACATTGAATAACAACCCAAACATTAGACCTAACACTAAGCTT
M_sylvanus		AAGCTTCTCCGGTGCAACTATCCTTATAGTTGCCCATGGACTCACCTCTTCCATATACTTCTGCTTGGCCAACTCAAACTACGAACGCACCCACAGCCGCATCATACTACTATCCCGAGGACTCCAAATCCTACTCCCACTAACAGCCTTCTGATGATTCACAGCAAGCCTTACTAATCTTGCTCTACCCTCCACTATTAATCTACTGGGCGAACTCTTCGTAATCGCAACCTCATTTTCCTGATCCCACATCACCATCATACTAACAGGACTGAACATACTAATTACAGCCCTCTACTCTCTTCACATATTCACCACAACACAACGAGGAGCGCTCACACACCACATAATTAACATAAAACCACCTTTCACACGAGAAAACATATTAATACTCATACACCTCGCTCCAATTATTCTTCTATCTCTTAACCCCAACATCATTCTAGGATTTACTTCCTGTAAATATAGTTTAATTAAAACATTAGACTGTGAATCTAACTATAGAAGCTTA-CCACTTCTTATTTACCGAGAAAACT-TGCAAGGACCGCTAATCCACACCTCCGTACTTAAAACTACGGTTTTCTCAACTTTTAAAGGATAACAGCTATCCATTGGCCTTAGGAGTCAAAAATATTGGTGCAACTCCAAATAAAAGTAATAATCATGTATACCCCCATCATAATAACAACTCTCATCTCCCTAACTCTTCCAATTTTCGCTACCCTTATCAACCCCAACAAAAAACACCTATATCCAAACTACGTAAAAACAGCCGTAATATATGCTTTCATTACCAGCCTCTCTTCAACAACTTTATATATATTCTTAAACCAAGAAACAATCATCTGAAGCTGGCACTGAATAATAACCCAAACACTAAGCCTAACATTAAGCTT
Saimiri_sciureus	AAGCTTCACCGGCGCAATGATCCTAATAATCGCTCACGGGTTTACTTCGTCTATGCTATTCTGCCTAGCAAACTCAAATTACGAACGAATTCACAGCCGAACAATAACATTTACTCGAGGGCTCCAAACACTATTCCCGCTTATAGGCCTCTGATGACTCCTAGCAAATCTCGCTAACCTCGCCCTACCCACAGCTATTAATCTAGTAGGAGAATTACTCACAATCGTATCTTCCTTCTCTTGATCCAACTTTACTATTATATTCACAGGACTTAATATACTAATTACAGCACTCTACTCACTTCATATGTATGCCTCTACACAGCGAGGTCCACTTACATACAGCACCAGCAATATAAAACCAATATTTACACGAGAAAATACGCTAATATTTATACATATAACACCAATCCTCCTCCTTACCTTGAGCCCCAAGGTAATTATAGGACCCTCACCTTGTAATTATAGTTTAGCTAAAACATTAGATTGTGAATCTAATAATAGAAGAATA-TAACTTCTTAATTACCGAGAAAGTG-CGCAAGAACTGCTAATTCATGCTCCCAAGACTAACAACTTGGCTTCCTCAACTTTTAAAGGATAGTAGTTATCCATTGGTCTTAGGAGCCAAAAACATTGGTGCAACTCCAAATAAAAGTAATA---ATACACTTCTCCATCACTCTAATAACACTAATTAGCCTACTAGCGCCAATCCTAGCTACCCTCATTAACCCTAACAAAAGCACACTATACCCGTACTACGTAAAACTAGCCATCATCTACGCCCTCATTACCAGTACCTTATCTATAATATTCTTTATCCTTACAGGCCAAGAATCAATAATTTCAAACTGACACTGAATAACTATCCAAACCATCAAACTATCCCTAAGCTT
;
end;

begin mrbayes; 
    set autoclose=yes nowarn=yes; 
    lset nst=6 rates=gamma; 
    mcmc nruns=1 ngen=10000 samplefreq=10; 
end; 
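
Because the DATA block declares ntax and nchar explicitly, it can be useful to confirm that the matrix matches them before queueing a job. The following is a minimal sketch, not a full NEXUS parser. It assumes a non-interleaved matrix with one taxon per line, a 'matrix' keyword on its own line, and a terminating ';' on its own line, as in the example above. The helper name check_nexus_matrix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare the ntax/nchar values declared in the DATA
# block of a NEXUS file with the rows actually present in its matrix.
# Whitespace inside a sequence and lines starting with '[' (comments) are ignored.
check_nexus_matrix() {
    awk '
        tolower($0) ~ /ntax=/  { s = tolower($0); sub(/.*ntax=/, "", s);  ntax  = s + 0 }
        tolower($0) ~ /nchar=/ { s = tolower($0); sub(/.*nchar=/, "", s); nchar = s + 0 }
        # A line starting with ";" closes the matrix
        in_matrix && /^[ \t]*;/ { in_matrix = 0 }
        # Inside the matrix: field 1 is the taxon name, the rest is sequence
        in_matrix && NF >= 2 && $1 !~ /^\[/ {
            seq = ""
            for (k = 2; k <= NF; k++) seq = seq $k
            rows++
            if (length(seq) != nchar) {
                printf "%s: %d characters, expected %d\n", $1, length(seq), nchar
                bad = 1
            }
        }
        tolower($0) ~ /^[ \t]*matrix[ \t]*$/ { in_matrix = 1 }
        END {
            if (rows != ntax) { printf "%d taxa found, expected %d\n", rows, ntax; bad = 1 }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

Running 'check_nexus_matrix primates.nex' prints nothing and returns a zero exit status when the counts agree. Otherwise it names the taxa whose sequence length differs from nchar and returns a non-zero status.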


2. Create a submit script (for example, submit) as follows.

#!/bin/bash
#$ -q PP16.q
#$ -N test
#$ -cwd

/share/apps/mrbayes-3.1.2/mb primates.nex


3. To submit the job, type "qsub submit" at the shell prompt:

qsub submit


4. Output can be found in the file test.o****, where **** is the job ID assigned by SGE.
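
The name of the output file can be worked out from the confirmation line that 'qsub' prints. The sketch below assumes the usual SGE message of the form 'Your job 12345 ("test") has been submitted', where 12345 is only an illustrative job ID, and the default naming of the standard-output file as the job name followed by '.o' and the job ID. The helper name sge_output_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: derive the default SGE output file name from the
# confirmation line printed by qsub, e.g.
#   Your job 12345 ("test") has been submitted
# SGE names the standard-output file <job name>.o<job ID> by default.
sge_output_file() {
    echo "$1" | sed -n 's/^Your job \([0-9][0-9]*\) ("\(.*\)") has been submitted.*$/\2.o\1/p'
}
```

For example, after 'msg=$(qsub submit)', the call 'sge_output_file "$msg"' prints the expected file name, which can then be inspected once the job has finished. If the message does not have the assumed form, nothing is printed.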

Running MrBayes with two or more cores

1. Prepare a job script (replicase.nex)

#NEXUS

begin data;
	dimensions  ntax=9 nchar=720;
	format datatype=rna interleave=no gap=- missing=?;
	matrix
	[       1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50]
	FR      GGC AAC GGU --- GUG UUU ACU GUU CCG AAG AAC AAU AAA AUA GAU CGG GCU GCU UGC AAA GAG CCU GAU AUG AAU AUG UAC UUA CAG AAA GGG GUC GGC GGU UUC AUC CGU CGC CGC CUU AAG ACU GUG GGU AUA GAC UUG AAC GAU CAA ACG AUC AAU CAA CGC CUG GCU CAA CAA GGU AGC CGU GAU GGG UCU UUG GCG ACG AUA GAC UUA UCG UCU GCU UCU GAC UCC AUC AGC GAC CGC CUA GUG UGG AGU UUU CUC CCA CCU GAG CUA UAU UCA UAU CUC GAC AUG AUU CGA AGC CAC UAC GGU UAC GUA --- AAU GGC AAG AUG AUU CGU UGG GAA CUA UUU UCG ACG AUG GGU AAU GGG UUC ACC UUU GAA CUA GAG UCC AUG AUU UUC UGG GCU AUA GUC AGG GCU ACU CAG AUC CAU UUU CGU --- AAC ACC GGA ACC AUU GGC AUC UAU GGG GAC GAU AUU AUA UGC CCC ACA GAG AUU GCA CCU CGC GUG CUG GAA GCA CUG AGC UUC UAC GGU UUC AAA CCG AAU CUA CGA AAG ACG --- UUC ACG UCC GGC UCU UUU CGC GAG AGC UGC GGC GCG CAC UAU UUC CGU GGU GUC GAU GUU AAA CCA UUU UAU AUC AAG AAA CCA AUC ACU GAC CUA UUC UCC CUA AUG CUU AUA CUU AAC CGU AUA CGC GGA UGG GGG GUA GUC AAC GGA AUAGCAGACCCACGCCUC
	MS2     GGG AAC GGA --- GUG UUU ACA GUU CCG AAG AAU AAU AAA AUA GAU CGG GCU GCC UGU AAG GAG CCU GAU AUG AAU AUG UAC CUC CAG AAA GGG GUC GGU GCU UUC AUC AGA CGC CGG CUC AAA UCC GUU GGU AUA GAC CUG AAU GAU CAA UCG AUC AAC CAG CGU CUG GCU CAG CAG GGC AGCGUAGAUGGUUCGCUUGCGACGAUAGACUUAUCGUCUGCAUCCGAUUCCAUCUCCGAUCGCCUGGUGUGGAGUUUUCUCCCACCUGAGCUAUAUUCAUAUCUCGAUCGUAUCCGCUCACACUACGGA---GUA---GAUGGCGAGACGAUACGAUGGGAACUAUUUUCCACAAUGGGAAAUGGGUUCACAUUUGAGCUAGAGUCCAUGAUAUUCUGGGCAAUAGUCAAAGCGACCCAAAUCCAUUUUGGU---AACGCCGGAACCAUAGGCAUCUACGGGGACGAUAUUAUAUGUCCCAGUGAGAUUGCACCCCGUGUGCUAGAGGCACUUGCCUACUACGGUUUUAAACCGAAUCUUCGUAAAACG---UUCGUGUCCGGGCUCUUUCGCGAGAGCUGCGGCGCGCACUUUUACCGUGGUGUCGAUGUCAAACCGUUUUACAUCAAGAAACCUGUUGACAAUCUCUUCGCCCUGAUGCUGAUAUUAAAUCGGCUACGGGGUUGGGGAGUUGUCGGAGGUAUGUCAGAUCCACGCCUC
	GA      GGC AAC GGU --- UUG UUU UCU GUU CCG AAG AAC AAU AAA AUA GAU CGG GCU GCC UGU AAG GAG CCU GAU AUG AAU AUG UAC CUU CAG AAG GGG GCG GGA UCU UUU AUA AGA AAA CGC CUU CGC UCC GUC GGU AUA GAU CUU AAC GAU CAG ACG CGC AAU CAG GAA UUA GCC CGA CUU GGC AGCAUUGAUGGUUCGCUCGCUACUAUUGAUCUUAGUAGCGCUAGCGAUUCCAUCUCUGACCGUCUUGUCUGGGAUCUACUUCCGCCGCACGUUUAUUCAUACCUCGCUCGUAUCCGAACAUCGUUCACUAUGAUC---GAUGGGCGUUUACAUAAGUGGGGUCUAUUUUCUACCAUGGGUAAUGGCUUCACGUUCGAACUCGAGUCCAUGAUCUUUUGGGCUUUAAGCAAGAGCAUUAUGCUGUCCAUGGGUGUU---ACUGGCUCAUUAGGCAUCUACGGUGAUGAUAUAAUCGUCCCCGUUGAGUGUCGUCCAACUCUCCUUAAGGUACUAUCCGCUGUAAACUUUCUUCCUAAUGAGGAGAAAACA---UUUACAACGGGUUACUUUCGUGAAAGUUGUGGUGCCCACUUCUUCAAAGAUGCCGACAUGAAACCUUUUUACUGCAAGCGGCCAAUGGAAACCCUUCCCGAUGUCAUGUUGCUAUGCAACAGGAUAAGAGGUUGGCAGACCGUUGGUGGAAUGUCAGAUCCGCGACUC
	SP      UCA --- AAU AAA GCA GUC ACU GUU CCA AAG AAC AGU AAA ACU GAU CGC UGU AUU GCU AUC GAG CCC GGC UGG AAU AUG UUU UUC CAG UUA GGC GUC GGU GCA GUG CUA CGC GAU AGG UUG CGU UUA UGG AAG AUU GAU CUU AAU GAC CAA UCG ACC AAU CAA CGC CUC GCG CGU GAU GGG UCUCUGCUAAAUCAUUUAGCUACCAUAGACUUAUCUGCAGCCAGCGAUUCAAUCAGCCUUAAGCUUGUUGAGUUGCUCAUGCCCCCUGAAUGGUAUGACCUUCUAACGGAUCUCCGAUCCGAUGAAGGAAUACUGCCUGACGGGCGAGUUGUGACCUAUGAGAAAAUAUCCUCCAUGGGUAAUGGCUACACUUUCGAACUCGAGUCGCUUAUUUUUGCGGCUAUCGCUCGAAGUGUGUGCGAGUUACUGGAAAUUGACCAAUCUACUGUUAGCGUGUACGGGGAUGAUAUAAUCAUCGAUACCCGUGCCGCAGCUCCAUUAAUGGAUGUCUUUGAGUACGUCGGGUUCACUCCUAACAGAAAGAAAACG---UUCUGCGAUGGACCCUUCCGCGAAUCGUGCGGUAAGCACUGGUUCCAAGGGGUAGAUGUAACGCCCUUUUACAUACGACGACCAAUACGUUGCCUAGCCGAUAUGAUACUUGUAUUAAAUAGUAUCUAUAGGUGGGGCACUGUUGAUGGCAUAUGGGAUCCUAGAGCA
	NL95    UCG --- AAU AAA GCA GUC ACU GUU CCA AAG AAC AGU AAA ACU GAU CGC UGC AUU GCU AUC GAG CCC GGC UGG AAU AUG UUU UUC CAG UUA GGC GUC GGU GCU GUG CUC CGU GAU CGG UUG CGC CUU UGG CAU AUU GAU CUC AAU GAU CAA UCU GUU AAU CAG CGC CUC GCA CGU GAU GCA UCGCAGUUGGACCAUUUGGCCACUGUCGAUUUAUCAGCAGCAAGCGAUUCGAUAAGCUUACGGCUUGUUGAACUGCUAAUGCCGCCUGCUUGGUUUGAUCUCCUGACCGAUCUCCGAUCGGACCAGGGAAUCCUGCCUGACGGGCGUGUCGUUACUUACGAGAAAAUAUCCUCCAUGGGUAAUGGCUACACUUUUGAGCUAGAGUCGUUAAUUUUCGCGGCUCUCGCCAGAAGUGUGUGCGAGUUAUUGGACCUUGACCAGUCAACUGUCAGCGUGUACGGUGAUGAUAUAAUCAUCGAUUCACGUGCCGCUGAUGUCCUUAUGGCGGUUUUCGAGUAUGUUGGGUUUACGCCUAAUCGAAAGAAAACU---UUCAUUAAGGGCCCCUUUAGAGAGUCGUGCGGAAAGCACUGGCACUCCGGGGUUGACGUAACGCCCUUUUACAUACGCCGCCCAAUCCGCUGCCUAGCCGACAUGAUACUUGUAUUGAACAGUAUCUACCGGUGGGGUACGAUUGACGGUGUGUGGGAUCCUAGGGUA
	M11     CCU UUC AAU AAA GCA GUU ACU GUA CCA AAG AAC AGU AAA ACU GAU CGC UGU AUA GCC AUC GAA CCU GGC UGG AAU AUG UUU UUC CAG CUA GGU AUC GGU GGU GUU AUA CGC GAA AAG UUG CGU UUG UGG GGC AUC GAU CUG AAU GAU CAG ACG AUU AAC CAA ACG CGC GCA UAU UUA GGC AGCCGUGAUGAUAAUCUCGCCACGGUGGAUCUCUCAAGAGCUAGCGAUACUAUUUCGCUUGCCCUUGUUGAGCUCCUUAUGCCUCCUGAGUGGUUUAAGGUCCUGUUGGCCUUAAGAUCACCCAAGGGCAUCUUGCCAGAUGGUACCGUCAUUACUUAUGAGAAAAUAUCCUCAAUGGGUAAUGGCUAUACCUUCGAGCUUGAGUCGCUUAUAUUUGCGGCUCUUGCUCGGUCUUUAUGCGAAUUACUGGGCUUACGACCGUCAGAUGUUACGGUCUAUGGCGAUGACAUAAUAUUGCCAUCAGACGCGUGCAGUCCUCUAGUUGAAGUUUUCUCCUAUGUUGGUUUUCGUACCAACAAGAAGAAAACG---UUUUCUAGUGGACCGUUCCGAGAGUCGUGCGGAAAGCACUACUUUUUGGGCGUUGACGUCACACCUUUCUACAUACGUCGCCGUAUAGUGAGUCCCUCCGAUCUCAUACUGGUUUUGAACCAGAUGUAUCGUUGGGCCACAAUUGACGGCGUAUGGGAUCCUAGGGUA
	MX1     CCU UUC AAU AAA GCA GUU ACU GUA CCA AAG AAC AGU AAA ACU GAU CGC UGC AUC GCU AUC GAG CCA GGC UGG AAU AUG UUU UUC CAG UUG GGC AUU GGU GGC GUA AUU CGC GAA AAG UUG CAC UUG UGG AAU AUC GAC CUG AAU GAU CAG ACG AUU AAC CAG GUG CGC GCA UAU UCA GGC AGCUGUAGCAAUGAACUUGCUACAGUGGAUCUCUCGAGCGCGAGUGAUACUAUUUCGCUUGCGCUCGUUGAGCUCCUGCUACCCCCUGCGUGGUUUAAAGUCCUUACGGACCUUAGGUCACGAAGGGGUAUGUUGCCAGACGGUAGAAUCAUUACCUAUGAGAAAAUUUCCUCAAUGGGUAACGGUUUCACCUUCGAGCUCGAGUCGCUUAUAUUUGCAGCUCUUGCUCGGUCUUUAUGCGAGUUACUGAACUUACAACCGUCGAGUGUCACGGUCUAUGGCGAUGAUAUUAUAUUGCCAUCAGACGCGUGCAGCUCGUUGAUUGAAGUUUUCUCCUACGUAGGUUUUAGAACCAACGAGAAGAAGACC---UUUUUCGACGGGCCGUUCCGAGAGUCGUGCGGAAAGCACUACUUUAUGGGCGUUGACGUCACACCUUUCUACAUACGCCACCGUAUAGUGAGUCCCUCUGAUCUCAUACUGGUUUUGAACCAGAUGUAUCGUUGGGCCACGAUUGAUGGCGUAUGGGAUCCUAGGGUA
	QB      CCU UUU AAU AAA GCA GUU ACU GUA CCU AAG AAC AGU AAG ACA GAU CGU UGU AUU GCU AUC GAA CCU GGU UGG AAU AUG UUU UUC CAA CUG GGU AUC GGU GGC AUU CUA CGC GAU CGG UUG CGU UGC UGG GGU AUC GAU CUG AAU GAU CAG ACG AUA AAU CAG CGC CGC GCU CAC GAA GGC UCCGUUACUAAUAACUUAGCAACGGUUGAUCUCUCAGCGGCAAGCGAUUCUAUAUCUCUUGCCCUCUGUGAGCUCUUAUUGCCCCUAGGCUGGUUUGAGGUUCUUAUGGACCUCAGAUCACCUAAGGGGCGAUUGCCUGACGGUAGUGUUGUUACCUACGAGAAGAUUUCUUCUAUGGGUAACGGUUACACAUUCGAGCUCGAGUCGCUUAUUUUUGCUUCUCUCGCUCGUUCCGUUUGUGAGAUACUGGACUUAGACUCGUCUGAGGUCACUGUUUACGGAGACGAUAUUAUUUUACCGUCCUGUGCAGUCCCUGCCCUCCGGGAAGUUUUUAAGUAUGUUGGUUUUACGACCAAUACUAAAAAGACU---UUUUCCGAGGGGCCGUUCAGAGAGUCGUGCGGCAAGCACUACUAUUCUGGCGUAGAUGUUACUCCCUUUUACAUACGUCACCGUAUAGUGAGUCCUGCCGAUUUAAUACUGGUUUUGAAUAACCUAUAUCGGUGGGCCACAAUUGACGGCGUAUGGGAUCCUAGGGCC
	PP7     GAC AGC --- CGG UUC GAU UUU GUC GCU AAG ACC GCG AAG GCG GUU CGC UUC AUC GCU AUG GAG CCA GAA CUU AAC AUG CUG CUG CAG AAA UCU GUA GGA GAC ACG AUA AGG GCU GCU CUG CGG AAA GCG GGU AUC GAU CUC AAU ACC CAG CGA CUA AAU CAA GAU CUU GCG UAC CAC GGA UCCGUUUUUCGGAAUCUCGGUACGAUAGAUCUGUCUAGCGCUUCCGAUACGUUAAGCAUUGAACUCGUGCGGCAGUACCUGCCGAAGCGGUUUCUCCGCUAUGUAUUGGAUCUCCGAACCCCCUACACGAGUGUA---GGUGGUAAGAAGCACAGGCUCGAGAAGGUCGCUUCGAUGGGCAACGGGUUCAUUUUUGAACUCCAGAGCCUCAUCUACGCAGCCUUCGCGCAUGCCAUGACGCUAGUAGUAGGAGGAAGAGAAUGCGACAUAGCCAUUUACGGCGAUGAUAUCAUCGUCAGUGAAUGCGUAGUAGAGCCUCUGAUGCAGUUCCUCGAAUGGCAUGGGUUCUGCCCCAAUCUCGAUAAGAGUUAUUGGGGAGGGGAUCCAUUCCGCGAGUCCUGCGGGAAGCACUACUUCGCUGGUCGCGACGUUACCCCUGUCUACGUGAAGGGGGCCCUGGAUAACCUACCUGCCCUUUUCCGUCUCUUCAACUCGUUGAAGCGAUGGGAGGAGCAAACAGGUAUCCGGAUCCCUGACACG
	;
end;

begin mrbayes;
	[The following line is useful for automatic execution with no
	 warnings issued before files are overwritten and automatic
	 termination of chains after the prespecified number of generations.]
	set autoclose=yes nowarn=yes;
	
	[The following block demonstrates how you can set up a model that
	 allows the third codon position to have gamma-distributed rate
	 variation with a different shape parameter than the first and 
	 second positions.]

	[First define character sets]
	charset first_pos  = 1-720\3;
	charset second_pos = 2-720\3;
	charset third_pos  = 3-720\3;
	
	[Define the partition]
	partition by_codon = 3:first_pos,second_pos,third_pos;
	
	[Select the partition]
	set partition=by_codon;
	
	[Set a GTR + gamma model for all partitions]
	lset nst=6 rates=gamma;

	[This is the line that allows the gamma shape parameter of the third codon position
	 to be unique:]
	unlink shape=(3);
	mcmc ngen=1000 samplefreq=10;

end;

2. Create a submit script (for example, submit) as follows.

#!/bin/bash
#$ -q PP16.q
#$ -N test
#$ -pe mpi 2
#$ -cwd

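/opt/openmpi/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS /share/apps/mrbayes-3.1.2/mb replicase.nex


3. To submit the job, enter the command "qsub submit".

qsub submit


4. Output can be found in file test.o****.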
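
The charset lines in replicase.nex use the NEXUS stride notation: "1-720\3" selects every third site starting at site 1, "2-720\3" every third site starting at site 2, and so on. The sketch below only illustrates that arithmetic with the standard seq command; codon_sites is a hypothetical helper name and is not part of MrBayes.

```shell
#!/bin/bash
# codon_sites START NCHAR: list every third site from START up to NCHAR,
# mirroring the NEXUS range "START-NCHAR\3". Hypothetical helper.
codon_sites() {
    seq "$1" 3 "$2"
}

# First few sites of the first codon position, printed on one line.
codon_sites 1 720 | head -n 3 | tr '\n' ' '; echo

# Number of sites assigned to the third codon position.
codon_sites 3 720 | wc -l
```

With 720 characters, each of the three character sets holds 240 sites, and together they cover every site exactly once.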

NAMD

NAMD, recipient of a 2002 Gordon Bell Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of processors on high-end parallel platforms and tens of processors on commodity clusters using gigabit ethernet. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR. NAMD is distributed free of charge with source code.

For more information, please go here.


Usage

1. A NAMD job consists of several input files. Users can download examples here (/share/apps/NAMD/NAMD_CVS_Source/Linux-x86_64-g++/src). This example uses "alanin" for demonstration.


2. Create a submit script (for example, submit) as follows. This example uses 8 cores to run the job.

#!/bin/bash
#$ -q PP16.q
#$ -N namd
#$ -pe mpi 8
#$ -cwd
#$ -j y
#$ -S /bin/bash

echo group main ++shell ssh > $TMPDIR/machines
awk '{ for (i=0;i<$2;++i) {print "host",$1} }' $PE_HOSTFILE >> $TMPDIR/machines

/share/apps/NAMD/NAMD_CVS_Source/Linux-x86_64-g++/charmrun ++nodelist $TMPDIR/machines +p$NSLOTS /share/apps/NAMD/NAMD_CVS_Source/Linux-x86_64-g++/namd2 ./alanin


3. To submit the job, enter the command "qsub submit".

qsub submit


4. Output can be found in file namd.o****.
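
The two lines in the submit script that write $TMPDIR/machines turn the scheduler's host list into a charmrun nodelist, with one "host" line per granted slot. The sketch below reproduces that step on a made-up host file so the result can be inspected outside a job. The node names and the make_nodelist function are hypothetical; the four-column layout (host, slots, queue, processor range) is an assumption about the $PE_HOSTFILE format.

```shell
#!/bin/bash
# make_nodelist PE_HOSTFILE: print a charmrun nodelist built the same way as
# in the submit script (one "host" line per slot granted on each node).
# Hypothetical helper for illustration only.
make_nodelist() {
    echo "group main ++shell ssh"
    awk '{ for (i = 0; i < $2; ++i) { print "host", $1 } }' "$1"
}

# Made-up host file: two nodes with two slots each (names are placeholders).
sample=$(mktemp)
printf '%s\n' 'node-a 2 PP16.q@node-a UNDEFINED' \
              'node-b 2 PP16.q@node-b UNDEFINED' > "$sample"
make_nodelist "$sample"
rm -f "$sample"
```

For this sample the nodelist has the "group main" header followed by two lines for node-a and two for node-b, which matches a request of four slots.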

Network Simulator-2 (NS2)

NS2 is a discrete event simulator targeted at networking research. NS2 provides substantial support for simulation of TCP, routing, and multicast protocols over wired and wireless (local and satellite) networks. Versions 2.31 and 2.33 are both installed at the HPCC.

For more information, go here.

Usage

1. Prepare a Tcl script for NS2. An example (ex.tcl) is listed below. This example has 2 nodes connected by 1 link and uses a UDP agent with a CBR traffic generator.

set ns [new Simulator]
set tr [open trace.out w]
$ns trace-all $tr

proc finish {} {
        global ns tr
        $ns flush-trace
        close $tr
        exit 0
}

set n0 [$ns node]
set n1 [$ns node]

$ns duplex-link $n0 $n1 1Mb 10ms DropTail

set udp0 [new Agent/UDP]
$ns attach-agent $n0 $udp0
set cbr0 [new Application/Traffic/CBR]
$cbr0 set packetSize_ 500
$cbr0 set interval_ 0.005
$cbr0 attach-agent $udp0
set null0 [new Agent/Null]
$ns attach-agent $n1 $null0
$ns connect $udp0 $null0  

$ns at 0.5 "$cbr0 start"
$ns at 4.5 "$cbr0 stop"
$ns at 5.0 "finish"

$ns run


2. Create a submit script (for example, submit) as follows.

#!/bin/bash
#$ -q SC.q
#$ -N ns2
#$ -cwd

/share/apps/ns2/ns-allinone-2.31/ns-2.31/ns ./ex.tcl


3. To submit the job, enter the command "qsub submit".

qsub submit


4. At HPCC, nam files can be produced but cannot be run: nam requires a graphical environment, and only command-line job submission is currently supported.

If there are any run errors, they can be found in ns2.e****. The output trace file is stored in trace.out.
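
Since nam cannot be used, the trace file can be summarized on the command line instead. The following is a minimal sketch: count_received is a hypothetical helper, and it assumes the standard ns-2 wired trace layout in which the columns are event, time, from node, to node, packet type, packet size, flags, flow id, source, destination, sequence number and packet id. The first command simply works out the offered load of the CBR source in ex.tcl (500-byte packets every 0.005 s, that is, 200 packets per second).

```shell
#!/bin/bash
# Offered load of the CBR source: 500 bytes * 8 bits * 200 packets per second.
echo "$(( 500 * 8 * 200 )) bit/s"

# count_received TRACE: count cbr packets received ("r" events) and their bytes.
# Assumed columns: event time from to type size flags fid src dst seq id.
# Hypothetical helper for illustration only.
count_received() {
    awk '$1 == "r" && $5 == "cbr" { n++; bytes += $6 }
         END { printf "%d packets, %d bytes\n", n, bytes }' "$1"
}

# After the job has finished:
#   count_received trace.out
```

The offered load comes to 800000 bit/s, which is below the 1Mb capacity of the link defined in ex.tcl, so the number of received packets would be expected to stay close to the number sent.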

Trace graph

Trace graph is a free network trace file analyser developed for processing ns-2 network simulator traces. Trace graph can support any trace format that is converted to its own or to the ns-2 trace format.

Supported ns-2 trace file formats:

wired, satellite, wireless, new trace, wired-wireless.

For more information, click here.


Users must use Linux with the X Window System and perform the following steps:

1. SSH to Athena with X11 forwarding.

ssh -X athena

2. Start tracegraph by typing the command "trgraph".

trgraph
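
If trgraph fails to open a window, one thing worth checking is whether X11 forwarding is actually active in the SSH session. The sketch below assumes that "ssh -X" sets the DISPLAY variable on the remote side, which is its usual behaviour; x11_ready is a hypothetical helper name.

```shell
#!/bin/bash
# x11_ready: succeed only if DISPLAY is set and non-empty.
# Hypothetical helper for illustration only.
x11_ready() {
    [ -n "${DISPLAY:-}" ]
}

if x11_ready; then
    echo "X11 forwarding looks active (DISPLAY=$DISPLAY)"
else
    echo "DISPLAY is not set; reconnect with: ssh -X athena"
fi
```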

NWChem

To be added


RAxML

Randomized Axelerated Maximum Likelihood (RAxML) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees. It was originally derived from fastDNAml, which in turn was derived from Joe Felsenstein’s dnaml, which is part of the PHYLIP package. RAxML 7.0.4 is the latest version and is installed at HPCC. It is recommended that RAxML run on four or more cores, but for illustrative purposes both single-core and multi-core runs are shown below.

More information can be found here.

Running RAxML with 1 core

1. Prepare an input file (for example, alg.phy) containing the following data. It is a standard PHYLIP file of aligned DNA or amino acid sequences, shown here in interleaved format:

5 60
Tax1        CCATCTCACGGTCGGTACGATACACCTGCTTTTGGCAG
Tax2        CCATCTCACGGTCAGTAAGATACACCTGCTTTTGGCGG
Tax3        CCATCTCCCGCTCAGTAAGATACCCCTGCTGTTGGCGG
Tax4        TCATCTCATGGTCAATAAGATACTCCTGCTTTTGGCGG
Tax5        CCATCTCACGGTCGGTAAGATACACCTGCTTTTGGCGG

GAAATGGTCAATATTACAAGGT
GAAATGGTCAACATTAAAAGAT
GAAATCGTCAATATTAAAAGGT
GAAATGGTCAATCTTAAAAGGT
GAAATGGTCAATATTAAAAGGT

For more details about PHYLIP, please check the manual.


2. Create a submit script (for example, submit) as follows. Program options -m, -n, and -s are required.

#!/bin/bash
#$ -N test1
#$ -q PP16.q
#$ -cwd

/share/apps/raxml_mpi/raxmlHPC -m GTRCAT -n TEST1 -s alg.phy

Program options:

  • -s : sequenceFileName
  • -n : outputFileName
  • -m : substitutionModel

For more program options, please check the manual.


3. To submit the job, simply enter the command "qsub submit".

qsub submit


4. Output files will be listed as follows:

  1. Parsimony starting tree is written to RAxML_parsimonyTree.TEST1.
  2. Final tree is written to RAxML_result.TEST1.
  3. Execution Log File is written to RAxML_log.TEST1.
  4. Execution information file is written to RAxML_info.TEST1.
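
Before submitting, it can help to confirm that the header line of the PHYLIP file from step 1 (number of taxa, then alignment length) agrees with the sequence data. The following awk-based check is a rough sketch, not a tool shipped with RAxML: check_phylip is a hypothetical name, and it assumes an interleaved file in which the first block carries the taxon names, later blocks carry sequence only, and no sequence line contains internal spaces.

```shell
#!/bin/bash
# check_phylip FILE: verify that an interleaved PHYLIP file matches its header.
# Hypothetical helper; assumes sequences are written without internal spaces.
check_phylip() {
    awk '
        NR == 1 { ntax = $1; nchar = $2; next }   # header: taxa count, length
        NF == 0 { next }                           # blank lines separate blocks
        {
            row = seen % ntax                      # taxon this line belongs to
            seq = (seen < ntax) ? $2 : $1          # first block has name + data
            len[row] += length(seq)
            seen++
        }
        END {
            if (seen == 0 || seen % ntax != 0) {
                print "unexpected number of sequence lines"
                exit 1
            }
            for (r = 0; r < ntax; r++) {
                if (len[r] != nchar) {
                    print "taxon " (r + 1) ": " len[r] " sites, expected " nchar
                    exit 1
                }
            }
            print "ok: " ntax " taxa x " nchar " sites"
        }
    ' "$1"
}

# Example use:
#   check_phylip alg.phy
```

For the alg.phy example above, each taxon has 38 characters in the first block and 22 in the second, which adds up to the 60 sites declared in the header.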

Running RAxML with 2 or more cores

raxmlHPC-MPI is the MPI-parallelized version for all types of clusters to perform parallel bootstraps, rapid parallel bootstraps, or multiple inferences on the original alignment. The MPI version is intended for really large production runs (e.g. 100 or 1,000 bootstraps). You can also perform multiple inferences on larger datasets in parallel to find a best-known ML tree for your dataset. Finally, the novel rapid BS algorithm and the associated ML search have also been parallelized with MPI.


1. Prepare an input file (for example, alg.phy) containing the following data.

5 60
Tax1        CCATCTCACGGTCGGTACGATACACCTGCTTTTGGCAG
Tax2        CCATCTCACGGTCAGTAAGATACACCTGCTTTTGGCGG
Tax3        CCATCTCCCGCTCAGTAAGATACCCCTGCTGTTGGCGG
Tax4        TCATCTCATGGTCAATAAGATACTCCTGCTTTTGGCGG
Tax5        CCATCTCACGGTCGGTAAGATACACCTGCTTTTGGCGG

GAAATGGTCAATATTACAAGGT
GAAATGGTCAACATTAAAAGAT
GAAATCGTCAATATTAAAAGGT
GAAATGGTCAATCTTAAAAGGT
GAAATGGTCAATATTAAAAGGT


2. Create a submit script (for example, submit) as follows. Program options -m, -n, -s, and -N are required. This example uses 4 cores and performs 10 runs.

#!/bin/bash
#$ -N test2
#$ -q PP16.q
#$ -pe mpi 4
#$ -cwd

/opt/openmpi/bin/mpirun --hostfile $TMPDIR/machines -np $NSLOTS /share/apps/raxml_mpi/raxmlHPC-MPI -m GTRCAT -n TEST2 -s alg.phy -N 10

Program options:

  • -s : sequenceFileName
  • -n : outputFileName
  • -m : substitutionModel
  • -# or -N : numberOfRuns

For more program options, please check the manual.


3. To submit the job, simply enter the command "qsub submit".

qsub submit


4. Output files will be listed as follows:

  1. Parsimony starting tree is written to RAxML_parsimonyTree.TEST2.RUN.****.
  2. Final tree is written to RAxML_result.TEST2.RUN.****.
  3. Execution Log File is written to RAxML_log.TEST2.RUN.****.
  4. Execution information file is written to RAxML_info.TEST2.
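
Because each run writes its own result file, counting those files is a quick way to see whether all requested runs have completed. This is only a sketch: count_runs is a hypothetical helper, and it relies on the RAxML_result.<name>.RUN.**** naming listed above without assuming how the runs are numbered.

```shell
#!/bin/bash
# count_runs NAME: count RAxML_result.NAME.RUN.* files in the current directory.
# Hypothetical helper for illustration only.
count_runs() {
    local n=0 f
    for f in RAxML_result."$1".RUN.*; do
        # The unmatched glob is left as a literal string, so test for existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# For the submit script above (-n TEST2 -N 10):
#   [ "$(count_runs TEST2)" -eq 10 ] && echo "all 10 runs wrote a result tree"
```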


References:

  • Alexandros Stamatakis: “RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models”, Bioinformatics 22(21):2688–2690, 2006.
  • Alexandros Stamatakis, Paul Hoover, and Jacques Rougemont: “A Rapid Bootstrap Algorithm for the RAxML Web-Servers”.

WRF

To be added

WRF-Chem

To be added

Help

If you have any questions regarding the CUNY HPCC please email us at: HPCHelp@mail.csi.cuny.edu

Contacts

Other Links
