Applications
This is an index of available applications, grouped by academic field and listed alphabetically within each group.
For information about using modules to run your applications go to Using Modules To Run Your Applications.
Computational Physics and Computational Chemistry
Applications in this section use classical mechanics, quantum mechanics and thermodynamics and are applied in simulation studies of fundamental properties of atoms, molecules, and chemical reactions.
AMBER (Assisted Model Building with Energy Refinement)
Amber is the collective name for a suite of programs for classical bio-molecular simulations. The name "Amber" also denotes the family of potentials (force fields) used with Amber software. Here we discuss only simulation packages, but not the force fields or free tools available via AmberTools package. Details and submission scripts can be found at: http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/amber
AUTODOCK
AutoDock is a suite of automated docking tools.
CP2K
CP2K is a program to perform atomistic and molecular simulations of solid state, liquid, molecular, and biological systems.
It provides a general framework for different methods, such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW) and classical pair and many-body potentials. CP2K provides state-of-the-art methods for efficient and accurate atomistic simulations. More information about our installation can be found here CP2K.
DL_POLY
DL_POLY is a general purpose molecular dynamics simulation package developed at Daresbury Laboratory by W. Smith, T.R. Forester and I.T. Todorov.
Both serial and parallel versions are available. The original package was developed by the Molecular Simulation Group (now part of the Computational Chemistry Group, MSG) at Daresbury Laboratory under the auspices of the Engineering and Physical Sciences Research Council (EPSRC) for the EPSRC's Collaborative Computational Project for the Computer Simulation of Condensed Phases (CCP5). Later developments were also supported by the Natural Environment Research Council through the eMinerals project. The package is the property of the Central Laboratory of the Research Councils, UK. More information about our installation and use can be found here DL_POLY.
GAMESS-US
GAMESS is a program for ab initio molecular quantum chemistry.
Briefly, GAMESS can compute SCF wavefunctions including RHF, ROHF, UHF, GVB, and MCSCF. Correlation corrections to these SCF wavefunctions include Configuration Interaction, second-order perturbation theory, and Coupled-Cluster approaches, as well as the Density Functional Theory approximation. Excited states can be computed by CI, EOM, or TD-DFT procedures. Nuclear gradients are available for automatic geometry optimization, transition state searches, or reaction path following. Computation of the energy Hessian permits prediction of vibrational frequencies, with IR or Raman intensities. Solvent effects may be modeled by the discrete Effective Fragment potentials, or by continuum models such as the Polarizable Continuum Model. Numerous relativistic computations are available, including infinite-order two-component scalar corrections, with various spin-orbit coupling options. The Fragment Molecular Orbital method permits many of these sophisticated treatments to be used on very large systems by dividing the computation into small fragments. Nuclear wavefunctions can also be computed, in VSCF, or with explicit treatment of nuclear orbitals by the NEO code. More information, including code, can be found here GAMESS-US.
Gaussian09
Gaussian09 is third-party, commercially licensed software from Gaussian, Inc. It is a set of programs for calculating electronic structure.
Gaussian09 is available for general use only on ANDY. The Gaussian User Guide can be found here [2]. More information about our installation can be found here GAUSSIAN09.
GPAW
GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE).
It uses real-space uniform grids and multigrid methods, atom-centered basis-functions or plane-waves. GPAW calculations are controlled through scripts written in the programming language Python. GPAW relies on the Atomic Simulation Environment (ASE), which is a Python package that helps to describe atoms. The ASE package also handles molecular dynamics, analysis, visualization, geometry optimization and more. More information about our installation can be found here GPAW.
GROMACS
GROMACS (Groningen Machine for Chemical Simulations)
GROMACS is a full-featured suite of free software, licensed under the GNU General Public License to perform molecular dynamics simulations -- in other words, to simulate the behavior of molecular systems with hundreds to millions of particles using Newton's equations of motion. It is primarily used for research on proteins, lipids, and polymers, but can be applied to a wide variety of chemical and biological research questions.
Details and submission scripts for production runs can be found at: http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/gromacs Please note that preparing a molecular system for simulation with the GROMACS tools cannot be done on the login node. Instead, users must either use their own workstation or use the interactive or development queues.
HONDO PLUS
Hondo Plus is a versatile electronic structure code that combines work from the original Hondo application developed by Harry King in the lab of Michel Dupuis and John Rys, and that of numerous subsequent contributors.
It is currently distributed from the research lab of Dr. Donald Truhlar at the University of Minnesota. Part of the advantage of Hondo Plus is the availability of source implementations of a wide variety of model chemistries developed over its lifetime that researchers can adapt to their particular needs. The license to use the code requires a literature citation, which is documented in the Hondo Plus 5.1 manual found at:
http://comp.chem.umn.edu/hondoplus/HONDOPLUS_Manual_v5.1.2007.2.17.pdf
More information about our installation can be found here HONDO PLUS.
HOOMD
Performs general purpose particle dynamics simulations, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many processor cores on a fast cluster.
Unlike some other applications in the particle and molecular dynamics space, HOOMD developers have worked to implement all of the code's computationally intensive kernels on the GPU, although currently only single node, single-GPU or OpenMP-GPU runs are possible. There is no MPI-GPU or distributed parallel GPU version available at this time.
LAMMPS
LAMMPS is a classical molecular dynamics code that models an ensemble of particles in a liquid, solid, or gaseous state.
It can model atomic, polymeric, biological, metallic, granular, and coarse-grained systems using a variety of force fields and boundary conditions. LAMMPS runs efficiently on single-processor desktop or laptop machines, but is also designed for parallel computers, including clusters with and without GPUs. It will run on any parallel machine that compiles C++ and supports the MPI message-passing library. This includes distributed- or shared-memory parallel machines and Beowulf-style clusters. LAMMPS can model systems with only a few particles up to millions or billions. LAMMPS is a freely-available open-source code, distributed under the terms of the GNU Public License, which means you can use or modify the code however you wish. LAMMPS is designed to be easy to modify or extend with new capabilities, such as new force fields, atom types, boundary conditions, or diagnostics. A complete description of LAMMPS can be found in its on-line manual here [3] or from the full PDF manual here [4]. Information about our installation can be found here LAMMPS.
NAMD
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems [5].
The main server for molecular dynamics calculations is PENZIAS, which supports both GPU and non-GPU versions of NAMD. However, the MPI-only (no GPU support) parallel versions of NAMD are also installed on SALK and ANDY. More information about our installation can be found here NAMD
NWChem
NWChem is an ab initio computational chemistry software package which also includes molecular dynamics (MM, MD) and coupled, quantum mechanical and molecular dynamics functionality (QM-MD).
NWChem has been developed by the Molecular Sciences Software group at the Department of Energy's EMSL. The software is available on PENZIAS and ANDY. More information about our installation can be found here NWChem
Octopus
Octopus is a pseudopotential real-space package aimed at the simulation of the electron-ion dynamics of one-, two-, and three-dimensional finite systems subject to time-dependent electromagnetic fields.
The program is based on time-dependent density-functional theory (TDDFT) in the Kohn-Sham scheme. All quantities are expanded in a regular mesh in real space, and the simulations are performed in real time. The program has been successfully used to calculate linear and non-linear absorption spectra, harmonic spectra, laser induced fragmentation, etc. of a variety of systems. More information about our installation can be found here OCTOPUS
OpenMM
OpenMM is both a library and a stand-alone application which provides tools for modern molecular modeling simulation. As a library it can be hooked into any code, allowing that code to do molecular modeling with minimal extra coding.
Moreover, OpenMM has a strong emphasis on hardware acceleration via GPU, thus providing not just a consistent API, but also much greater performance than is available from most other codes. OpenMM was developed as a part of the Physics-Based Simulation project, led by Prof. Pande.
ORCA
ORCA is an electronic structure program capable of carrying out geometry optimizations and predicting a large number of spectroscopic parameters at different levels of theory.
Besides Hartree-Fock theory, density functional theory (DFT) and semiempirical methods, high-level ab initio quantum chemical methods based on the configuration interaction and coupled cluster methods are included in ORCA to an increasing degree. More information about our installation can be found here ORCA
VMD
VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.
It was developed by The Theoretical and Computational Biophysics Group at the University of Illinois. It is documented on the TCB's homepage.
VMD is installed on Karle. To use it from the command-line interface, log in to Karle as usual and start VMD by typing "vmd" followed by return. Alternatively, use the full path: "/share/apps/vmd/default/bin/vmd"
To use VMD in GUI mode, log in to Karle with the -X option (see this article for details) and start VMD as described above.
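The two ways of starting VMD described above can be wrapped in a small shell helper. This is only a minimal sketch, assuming a bash login shell on Karle; the function name find_vmd is hypothetical, and the host name in the ssh comment is a placeholder because it is not given on this page.

```shell
#!/bin/bash
# Full path to VMD on Karle, as given in the text above.
VMD_FULL_PATH="/share/apps/vmd/default/bin/vmd"

# find_vmd (hypothetical helper): print the command to use for starting VMD.
# Prefers a "vmd" found on PATH and falls back to the full path otherwise.
find_vmd() {
    if command -v vmd >/dev/null 2>&1; then
        command -v vmd
    else
        printf '%s\n' "$VMD_FULL_PATH"
    fi
}

# For GUI mode, log in with X11 forwarding first, for example:
#   ssh -X <user>@<karle_hostname>    (placeholder; use your site's host name)
echo "VMD would be started with: $(find_vmd)"
```

Running the resolved command directly (for example "$(find_vmd)") starts VMD; the script itself only reports which command it would use.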
Computational Biology
ANVIO
Anvio is an analysis and visualization platform for 'omics data. It allows various types of workflows to be established. More information: ANVIO
BAMOVA
Bamova is a package used to do genetic analysis of a wide range of organisms on the basis of next-generation sequence data. The software implements Bayesian Analysis of Molecular Variance and different likelihood models for three different types of molecular data (including two models for high throughput sequence data). For more detail on BAMOVA please visit the BAMOVA web site [6] and manual here [7]. Further information can also be found here BAMOVA.
BAYESCAN
BAYESCAN is a population genomics software package. It identifies candidate outlier loci from genetic data, using differences in allele frequencies between populations, and is applicable to both dominant and codominant data. BayeScan is based on the multinomial-Dirichlet model. One of the scenarios covered consists of an island model in which subpopulation allele frequencies are correlated through a common migrant gene pool from which they differ in varying degrees. The difference in allele frequency between this common gene pool and each subpopulation is measured by a subpopulation-specific FST coefficient. Therefore, this formulation can consider realistic ecological scenarios where the effective size and the immigration rate may differ among subpopulations.
More detailed information on Bayescan can be found at the web site here [8] and in the manual here [9]. More information about our installation can be found here BAYESCAN.
BEST
BEST is an application aimed to estimate gene trees and the species tree from multilocus sequences.
The program uses information from multiple gene trees and performs a Bayesian analysis to estimate the topology of the species tree, divergence times and population sizes.
It provides a new approach for estimating the mutation-rate-based phylogenetic relationships among species. Its method accounts for deep coalescence, but not for other complicating issues such as horizontal transfer or gene duplication. The program works in conjunction with the popular Bayesian phylogenetics package MrBayes (Ronquist and Huelsenbeck, Bioinformatics, 2003). BEST's parameters are defined using the 'prset' command from MrBayes. Details on BEST's capabilities and options are available at the BEST web site here [10]. More information about our installation is available here BEST.
BEAST
BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation.
The package implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. It is a cross-platform Java program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies, but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability. The distribution includes a simple-to-use user-interface program called 'BEAUti' for setting up standard analyses and a suite of programs for analysing the results. For more detail on BEAST (and BEAUti) please visit the BEAST web site [11]. More information about our installation can be found here BEAST.
BOWTIE2
BOWTIE2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.
BOWTIE2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. BOWTIE2 supports gapped, local, and paired-end alignment modes. BOWTIE2 is part of a sequence alignment and analysis tool chain developed at Johns Hopkins, University of California at Berkeley, and Harvard, and distributed through the Center for Bioinformatics and Computational Biology. The other tools in this collection, CUFFLINKS, SAMTOOLS, and TOPHAT, are also installed at the CUNY HPC Center. Additional information can be found at the BOWTIE2 home page here [12]. Information about our installation can be found here BOWTIE2.
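The index-then-align workflow described above can be sketched with the standard bowtie2-build and bowtie2 commands. This is an illustrative sketch only: the file names and the run helper are hypothetical, and the script defaults to a dry run that prints the commands instead of executing them, so it does not require BOWTIE2 to be loaded. On the cluster, such commands belong in a batch job rather than on a login node.

```shell
#!/bin/bash
# run (hypothetical helper): print the command in dry-run mode (the default),
# or execute it when DRY_RUN is set to 0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical input names, for illustration only.
REF="reference.fa"     # reference genome in FASTA format
INDEX="ref_index"      # basename for the FM index files
R1="reads_1.fq"        # first mates of the paired-end reads
R2="reads_2.fq"        # second mates of the paired-end reads

# 1. Build the FM index from the reference sequence.
run bowtie2-build "$REF" "$INDEX"

# 2. Paired-end alignment in the default end-to-end mode, SAM output.
run bowtie2 -x "$INDEX" -1 "$R1" -2 "$R2" -S paired.sam

# 3. The same reads aligned in local mode.
run bowtie2 --local -x "$INDEX" -1 "$R1" -2 "$R2" -S local.sam
```

Setting DRY_RUN=0 in the environment makes the same script execute the commands once the BOWTIE2 module is loaded.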
BPP2
BPP2 uses a Bayesian modeling approach to generate the posterior probabilities of species assignments taking into account uncertainties due to unknown gene trees and the ancestral coalescent process. For tractability, it relies on a user-specified guide tree to avoid integrating over all possible species delimitations.
BROWNIE
BROWNIE is a program for analyzing rates of continuous character evolution and looking for substantial rate differences in different parts of a tree using likelihood ratio tests and Akaike Information Criterion (AIC) statistics. It now also implements many other methods for examining trait evolution and methods for doing species delimitation. More information about our installation can be found here BROWNIE.
CUFFLINKS
CUFFLINKS assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. CUFFLINKS then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols. CUFFLINKS is part of a sequence alignment and analysis tool chain developed at Johns Hopkins, University of California at Berkeley, and Harvard, and distributed through the Center for Bioinformatics and Computational Biology. The other tools in this collection, BOWTIE, SAMTOOLS, and TOPHAT, are also installed at the CUNY HPC Center. Additional information can be found at the CUFFLINKS home page here [14]. More information about our installation can be found here CUFFLINKS.
GARLI
GARLI is a program that performs phylogenetic inference using the maximum-likelihood criterion.
Several sequence types are supported, including nucleotide, amino acid and codon. Version 2.0 adds support for partitioned models and morphology-like data types. It is usable on all operating systems, and is written and maintained by Derrick Zwickl at the University of Texas at Austin. Additional information can be found on the GARLI Wiki here [15]. More information about our installation can be found here GARLI.
MPFR
The MPFR library is a C library for multiple-precision floating-point computations with correct rounding. MPFR has continuously been supported by the INRIA and the current main authors come from the Caramel and AriC project-teams at Loria (Nancy, France) and LIP (Lyon, France) respectively; see more on the credit page.
MPFR is based on the GMP multiple-precision library. The main goal of MPFR is to provide a library for multiple-precision floating-point computation which is both efficient and has well-defined semantics. It copies the good ideas from the ANSI/IEEE-754 standard for double-precision floating-point arithmetic (53-bit significand). The library is installed on PENZIAS.
MRBAYES
MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on certain observations.
The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees. More information about our installation can be found here MRBAYES
msABC
msABC is a program for simulating various neutral evolutionary demographic scenarios based on the software ms (Hudson 2002). msABC extends ms, calculating a multitude of summary statistics.
Therefore, msABC is suitable for performing the sampling step of an Approximate Bayesian Computation (ABC) analysis under various neutral demographic models. The main advantages of msABC are (i) use of various prior distributions, such as uniform, Gaussian, log-normal, and gamma, (ii) implementation of a multitude of summary statistics for one or more populations, (iii) efficient implementation, which allows the analysis of hundreds of loci and chromosomes even on a single computer, and (iv) extended flexibility, such as simulation of loci of variable size and simulation of missing data. More information about our installation can be found here msABC
MSMS
MSMS is a tool to generate sequence samples under both neutral models and single locus selection models. MSMS permits the full range of demographic models provided by its relative MS (Hudson, 2002).
In particular, it allows for multiple demes with arbitrary migration patterns, population growth and decay in each deme, and for population splits and mergers. Selection (including dominance) can depend on the deme and also change with time. More information about our installation can be found here MSMS
POPABC
PopABC is a computer package to estimate historical demographic parameters of closely related species/populations (e.g. population size, migration rate, mutation rate, recombination rate, splitting events) within an isolation-with-migration model.
The software performs coalescent simulation in the framework of approximate Bayesian computation (ABC, Beaumont et al, 2002). PopABC can also be used to perform Bayesian model choice to discriminate between different demographic scenarios. The program can be used either for research or for education and teaching purposes. Further details and a manual can be found at the POPABC website here [16]. More information about our installation can be found here POPABC
PHOENICS
PHOENICS is an integrated Computational Fluid Dynamics (CFD) package for the preparation, simulation, and visualization of processes involving fluid flow, heat or mass transfer, chemical reaction, and/or combustion in engineering equipment, building design, and the environment. More detail is available at the CHAM website, here http://www.cham.co.uk.
Although we expect most users to pre- and post-process their jobs on office-local clients, the CUNY HPC Center has installed the Unix version of the entire PHOENICS package on ANDY. PHOENICS is installed in /share/apps/phoenics/default where all the standard PHOENICS directories are located (d_allpro, d_earth, d_enviro, d_photo, d_priv1, d_satell, etc.). Of particular interest on ANDY is the MPI parallel version of the 'earth' executable, 'parexe', which makes full use of the parallel processing power of the ANDY cluster for larger individual jobs. While the parallel scaling properties of PHOENICS jobs will vary depending on the job size, processor type, and the cluster interconnect, larger workloads will generally scale and run efficiently on 8 to 32 processors, while smaller problems will scale efficiently only up to about 4 processors. More detail on parallel PHOENICS is available at http://www.cham.co.uk/products/parallel.php. Aside from the tightly coupled MPI parallelism of 'parexe', users can run multiple instances of the non-parallel modules on ANDY (including the serial 'earexe' module) when a parametric approach can be used to solve their problems. More information about our installation can be found here PHOENICS
PHRAP-PHRED
PHRAP and PHRED are part of the DNA sequence analysis tool set that also includes the programs CROSSMATCH and SWAT. These tools are described in detail here [17], but a brief description of both, extracted from their manuals, follows.
PHRED and PHRAP (along with CONSED) can be used for both small sequence assemblies and larger shotgun analyses. This makes the tools a perhaps under-utilized set for smaller non-genomic groups. Some variables may need to be adjusted, particularly in CONSED, but researchers that have multiple sequences from a small locus can use the suite, starting from their chromatogram files. More information about our installation can be found here PHRAP-PHRED
PyRAD
Reduced-representation genomic sequence data (e.g., RADseq, GBS, ddRAD) are commonly used to study population-level research questions and consequently most software packages for assembling or analyzing such data are designed for sequences with little variation across samples.
Phylogenetic analyses typically include species with deeper divergence times (more variable loci across samples) and thus a different approach to clustering and identifying orthologs will perform better. pyRAD is intended for use with any type of restriction-site associated DNA. It currently supports RAD, ddRAD, PE-ddRAD, GBS, PE-GBS, EzRAD, PE-EzRAD, 2B-RAD, nextRAD, and can be extended to other types. More information about our installation can be found here PyRAD
RAXML
Randomized Axelerated Maximum Likelihood (RAxML) is a program for sequential and parallel maximum likelihood based inference of large phylogenetic trees. It is a descendant of fastDNAml, which in turn was derived from Joe Felsenstein’s DNAml, which is part of the PHYLIP package.
RAxML is installed at the CUNY HPC Center on ANDY. Multiple versions are available, in both serial and MPI-parallel builds. The MPI-parallel version should be run on four or more cores. The MPI-parallel version is also installed on PENZIAS. More information about our installation can be found here RAXML
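The four-core minimum for the MPI-parallel version can be enforced when a batch script is generated. The following is a minimal sketch, not the Center's official submission script: the helper make_raxml_job and the input names are hypothetical, the partition name is borrowed from the ExaBayes example elsewhere on this page, and raxmlHPC-MPI is the usual upstream name of the MPI build, which may differ from the name of the locally installed executable. Loading the appropriate module is left out because its name is not given here.

```shell
#!/bin/bash
# make_raxml_job (hypothetical helper): print a SLURM batch script for the
# MPI build of RAxML. Refuses to generate a job with fewer than four tasks.
make_raxml_job() {
    local ntasks="$1" alignment="$2" run_name="$3"
    if [ "$ntasks" -lt 4 ]; then
        echo "error: the MPI-parallel version should be run on four or more cores" >&2
        return 1
    fi
    cat <<EOF
#!/bin/bash
#SBATCH --partition=production
#SBATCH --job-name=${run_name}
#SBATCH --nodes=1
#SBATCH --ntasks=${ntasks}

# Change to the directory the job was submitted from.
cd \$SLURM_SUBMIT_DIR

# Assumed executable name; -N 20 requests 20 independent tree searches,
# which the MPI build distributes across the MPI ranks.
mpirun -np ${ntasks} raxmlHPC-MPI -s ${alignment} -n ${run_name} -m GTRGAMMA -p 12345 -N 20
EOF
}

# Example: print a four-task job for a hypothetical alignment file.
make_raxml_job 4 alignment.phy test_run
```

Redirecting the output to a file (for example make_raxml_job 8 alignment.phy run1 > raxml.job) gives a script that can be passed to sbatch after the module and executable names have been checked against the local installation.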
Structurama
Structurama is a program for inferring population structure from genetic data. The program assumes that the sampled loci are in linkage equilibrium and that the allele frequencies for each population are drawn from a Dirichlet probability distribution. Two different models for population structure are implemented.
First, Structurama offers the method of Pritchard et al. (2000), in which the number of populations is considered fixed. Second, the program allows the number of populations to be a random variable following a Dirichlet process prior (Pella and Masuda, 2006; Huelsenbeck and Andolfatto, 2007). More information about our installation can be found here STRUCTURAMA
Structure
The program Structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.
More information about our installation can be found here STRUCTURE
TOPHAT
TOPHAT is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
TOPHAT is part of a sequence alignment and analysis tool chain developed at Johns Hopkins, University of California at Berkeley, and Harvard, and distributed through the Center for Bioinformatics and Computational Biology. More information about our installation can be found here TOPHAT
Trinity
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. More information about our installation can be found here TRINITY
VELVET
Velvet is a set of algorithms for de novo short read assembly using de Bruijn graphs. It was developed at the European Bioinformatics Institute, Cambridge, UK.
More information about our installation can be found here VELVET
Computational Genomics, Proteomics, Microbiomics, Genetics
AUGUSTUS
AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences. It is based on Hidden Markov Models (HMMs) and is described in papers by Stanke and Waack (2003) and Stanke et al. (2006, 2006b, 2008). The local version of the program is installed on Penzias. More information can be found here: AUGUSTUS
CONSED
CONSED is a DNA sequence analysis finishing tool that provides sequence viewing, editing, alignment, and assembly capabilities from a X Windows graphical user interface (GUI).
It makes extensive use of other non-graphical, underlying sequence analysis tools, including PHRED, PHRAP, and CROSSMATCH, that may also be used separately and are described elsewhere in this document. It also includes a viewer called BAMVIEW. The CONSED tool chain is developed and maintained at the University of Washington and is described more completely here [19]. CONSED is provided at the CUNY HPC Center under an academic license that allows use, but not the copying or outbound transfer of any of the executables or files distributed under this academic license. The license is not transferable in any way, and users wishing to run the application at their own site must acquire a license directly from the authors.
The CUNY HPC Center supports CONSED version 23.0 for interactive use on KARLE. CONSED 23.0 and the tool chain described above are also installed on ANDY to allow batch use of the underlying support tools mentioned above and described in detail below. In general, running GUI-based applications on ANDY's login node is discouraged. There should be little need to do this, as KARLE is on the periphery of the CUNY HPC network, making login there direct, and KARLE shares its HOME directory file system with ANDY, making files created on either system immediately available on the other.
Rather than rewrite portions of the CONSED manual here, users are directed to the manual's "Quick Tour" section here [20] and asked to walk through some of the exercises after logging into KARLE. If problems or questions come up, please post them to "hpchelp@csi.cuny.edu". The CONSED 23.0 distribution is installed on KARLE in the following directory:
/share/apps/consed/default
All the files in the distribution can be found there.
ExaML
ExaML stands for Exascale Maximum Likelihood (ExaML) code for phylogenetic inference using MPI.
The code is installed only on Penzias and implements the popular RAxML search algorithm for maximum likelihood based inference of phylogenetic trees.
It uses a radically new MPI parallelization approach that yields improved parallel efficiency, in particular on partitioned multi-gene or whole-genome datasets.
When using ExaML please cite the following paper:
Alexey M. Kozlov, Andre J. Aberer, Alexandros Stamatakis: "ExaML Version 3: A Tool for Phylogenomic Analyses on Supercomputers." Bioinformatics (2015) 31 (15): 2577-2579.
It is up to 4 times faster than RAxML-Light [1].
Like RAxML-Light, ExaML also implements checkpointing, SSE3 and AVX vectorization, and memory saving techniques.
[1] A. Stamatakis, A.J. Aberer, C. Goll, S.A. Smith, S.A. Berger, F. Izquierdo-Carrasco: "RAxML-Light: A Tool for computing TeraByte Phylogenies", Bioinformatics 2012; doi: 10.1093/bioinformatics/bts309.
The run script for a parallel job is analogous to the one for running RAxML on Penzias and Andy.
ExaBayes
ExaBayes is a software package for Bayesian tree inference. It is particularly suitable for large-scale analyses on computer clusters. It is installed on the Penzias server at the HPC Center. The installed package is the MPI-parallel version.
Availability: PENZIAS. Module file: exabayes
Citation:
Fredrik Ronquist, Maxim Teslenko, Paul van der Mark, Daniel L Ayres, Aaron Darling, Sebastian Höhna, Bret Larget, Liang Liu, Marc A Suchard, and John P Huelsenbeck. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology, 61(3):539--42, May 2012.
Alexei J Drummond, Marc A Suchard, Dong Xie, and Andrew Rambaut. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular biology and evolution, 29(8):1969--73, August 2012.
Clemens Lakner, Paul van der Mark, John P Huelsenbeck, Bret Larget, and Fredrik Ronquist. Efficiency of Markov chain Monte Carlo tree proposals in Bayesian phylogenetics. Systematic biology, 57(1):86--103, February 2008.
Use: An example SLURM script to run ExaBayes on PENZIAS is given below.
#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name=<name_of_job>
#SBATCH --nodes=1
#SBATCH --ntasks=2

# You must explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

mpirun -np 2 exabayes <input_file> > output_file
More information about application along with sample workflows are available on ExaBayes web site:
http://sco.h-its.org/exelixis/web/software/exabayes/manual/index.html#sec-11
GENOMEPOP2
GenomePop2 is a newer and specialized version of the older program GenomePop.
GenomePop2 is designed to manage SNPs under more flexible and useful settings that are controlled by the user. If you need models with more than 2 alleles you should use the older GenomePop version of the program.
GenomePop2 allows the forward simulation of sequences of biallelic positions. As in the previous version, a number of evolutionary and demographic settings are allowed. Several populations under any migration model can be implemented. Each population consists of a number N of individuals. Each individual is represented by one (haploids) or two (diploids) chromosomes with constant or variable (hotspots) recombination between binary sites. The fitness model is multiplicative, with each derived allele having a multiplicative effect of (1 - s*h - E) on the global fitness value. By default E=0 and h=0.5 in diploids, but 1 in homozygotes or in haploids. Selective nucleotide sites undergoing directional selection (positive or negative) in different populations can be defined. In addition, bottlenecks and/or population expansion scenarios can be set by the user for a desired number of generations. Several runs can be executed and a sample of user-defined size is obtained for each run and population. For more detail on how to use GenomePop2, please visit the web site here [21]. More information about our installation can be found here GENOMEPOP2.
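The multiplicative fitness model described above can be illustrated with a short sketch. This is not GenomePop2 code: the function names, the per-site interpretation (one factor per selected site that carries at least one derived allele), and the reading of h as 0.5 for diploid heterozygotes are all assumptions made for illustration.

```python
def site_factor(derived_copies, ploidy, s, E=0.0):
    """Fitness factor for one selected site (hypothetical helper).

    Assumes h = 0.5 for a diploid heterozygote and h = 1 for
    homozygotes and haploids, as stated in the text above.
    """
    if derived_copies == 0:
        return 1.0  # no derived allele at this site: no effect
    h = 0.5 if (ploidy == 2 and derived_copies == 1) else 1.0
    return 1.0 - s * h - E


def fitness(derived_counts, ploidy, s, E=0.0):
    """Multiply the per-site factors into one global fitness value."""
    w = 1.0
    for copies in derived_counts:
        w *= site_factor(copies, ploidy, s, E)
    return w


# A diploid with one ancestral site, one heterozygous site and one
# homozygous derived site, with s = 0.1 and the default E = 0.
w_diploid = fitness([0, 1, 2], ploidy=2, s=0.1)
```

Under these assumptions the diploid example multiplies the factors 1, 0.95 and 0.9.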
HUMAnN2
HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). HUMAnN2 is the next generation of HUMAnN (HMP Unified Metabolic Analysis Network). Details and submission scripts can be found at: http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/humann2
IMa2
The IMa2 application performs basic ‘Isolation with Migration’ calculations using Bayesian inference and Markov chain Monte Carlo methods.
The only major conceptual addition to IMa2 that makes it different from the original IMa program is that it can handle data from multiple populations. This requires that the user specify a phylogenetic tree. Importantly, the tree must be rooted, and the sequence in time of internal nodes must be known and specified. More information on IMa2 and IMa can be found in the user manual here [22]. Information about our installation can be found here IMA2.
I-TASSER
I-TASSER is a platform for protein structure and function predictions. 3D models are built based on multiple-threading alignments by LOMETS and iterative template fragment assembly simulations; function insights are derived by matching the 3D models with the BioLiP protein function database. Details and submission scripts can be found at: http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/itasser
LAMARC
LAMARC is a program which estimates population-genetic parameters such as population size, population growth rate, recombination rate, and migration rates.
It approximates a summation over all possible genealogies that could explain the observed sample, which may be sequence, SNP, microsatellite, or electrophoretic data. LAMARC and its sister program MIGRATE are successor programs to the older programs Coalesce, Fluctuate, and Recombine, which are no longer being supported. These programs are memory-intensive, but can run effectively on workstations. They are supported on a variety of operating systems. For more detail on LAMARC please visit the website here [23], read this paper [24], and look at the documentation here [25]. More information about our installation can be found here LAMARC.
QIIME
QIIME (pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is a pipeline application that uses numerous third-party applications.
QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics. More information about our installation can be found here QIIME
USEARCH
USEARCH is a unique sequence analysis tool with thousands of users world-wide.
USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST. More information about our installation can be found here USEARCH
VELVET
Velvet is a set of algorithms for de novo short read assembly using de Bruijn graphs. It was developed at the European Bioinformatics Institute, Cambridge, UK. More information about our installation can be found here VELVET.
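For readers unfamiliar with de Bruijn graphs, the sketch below shows the textbook construction from a set of reads. It is only an illustration of the idea and not Velvet's implementation; the function name and the plain dictionary representation are assumptions, and real assemblers also handle reverse complements, sequencing errors and graph simplification.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


graph = de_bruijn_graph(["ACGT"], k=3)
```

An assembler then looks for paths through such a graph to reconstruct longer contiguous sequences.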
VSEARCH
VSEARCH is an open-source alternative to USEARCH.
VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH uses an optimal global aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH which by default uses a heuristic seed and extend aligner. This usually results in more accurate alignments and overall improved sensitivity (recall) with VSEARCH, especially for alignments with gaps.
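The difference between an optimal global aligner and a heuristic one is easier to see with a small example. The following is a minimal sketch of Needleman-Wunsch dynamic programming with a linear gap penalty; the scoring values and function name are assumptions chosen for illustration, and VSEARCH's own vectorized implementation and scoring scheme are not reproduced here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
```

Because every cell of the table is filled, the returned score is optimal under the chosen scoring scheme, at the cost of time and memory proportional to the product of the sequence lengths.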
Additional details on VSEARCH can be found at: this link
VSEARCH is installed on the Penzias HPC cluster. To start using VSEARCH, load the corresponding module first:
module load vsearch
Math, Engineering, Computer Science
ADCIRC
ADCIRC is a system of programs for solving time-dependent, free-surface, circulation and transport problems in two and three dimensions.
FDPPDIV
FDPPDiv is a program for estimating divergence times on a fixed, rooted tree topology.
FDPPDiv offers two alternative approaches to divergence time estimation. The DPPDiv part refers to the Dirichlet Process Prior (DPP) model for divergence time estimation, and the F prefix (for Fossil) refers to the new Fossil Birth-Death approach. More information about our installation can be found here FDPPDIV.
GAUSS
An easy-to-use data analysis, mathematical and statistical environment based on the powerful, fast and efficient GAUSS Matrix Programming Language.
GAUSS is used to solve real world problems and data analysis problems of exceptionally large scale. GAUSS is currently available on ANDY. At the CUNY HPC Center GAUSS is typically run in serial mode. (Note: GAUSS should not be confused with the computational chemistry application Gaussian.) More information about our installation can be found here GAUSS.
Hapsembler
Hapsembler is a haplotype-specific genome assembly toolkit that is designed for genomes that are rich in SNPs and other types of polymorphism. Hapsembler can be used to assemble reads from a variety of platforms including Illumina and Roche/454.
module load hapsembler
HOPSPACK
HOPSPACK stands for Hybrid Optimization Parallel Search Package. It is designed to help users solve a wide range of derivative-free optimization problems.
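The problem class addressed by HOPSPACK can be written in the general form sketched below. The objective f(x), the matrices AI and AE, and the functions cI(x) and cE(x) are named in the text; the right-hand sides b_I and b_E, the bounds l and u, and the direction of the inequalities are assumptions added to make the statement complete.

```latex
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{subject to} \quad & A_I x \ge b_I \\
& A_E x = b_E \\
& c_I(x) \ge 0 \\
& c_E(x) = 0 \\
& l \le x \le u
\end{aligned}
```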
The first two constraints specify linear inequalities and equalities with coefficient matrices AI and AE. The next two constraints describe nonlinear inequalities and equalities captured in functions cI(x) and cE(x). The final constraints denote lower and upper bounds on the variables. HOPSPACK allows variables to be continuous or integer-valued and has provisions for multi-objective optimization problems. In general, functions f(x), cI(x), and cE(x) can be noisy and nonsmooth, although most algorithms perform best on deterministic functions with continuous derivatives.
Users can design and implement their own solver, either by writing their own code or by building on solvers already in the framework. Because all solvers (called citizens) are members of the same global class, they can share assigned resources. The main features of the package are:
- Only function values are required for the optimization.
- The user must provide a separate program that can evaluate the objective and nonlinear constraint functions at a given point.
- A robust implementation of the Generating Set Search (GSS) solver is supplied, including the capability to handle linear constraints.
- Multiple solvers can run simultaneously and are easily configured to share information.
- Solvers may share a cache of computed function and constraint evaluations to eliminate duplicate work.
- Solvers can initiate and control sub-problems.
Continuation -> HOPSPACK.
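To show what "only function values are required" means in practice, here is a minimal sketch of compass search, the simplest member of the generating set search family, for an unconstrained problem. It is not HOPSPACK's GSS solver: the function name, the step-halving rule and the stopping tolerances are assumptions, and HOPSPACK additionally handles constraints and evaluates trial points in parallel.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_evals=10000):
    """Minimize f by polling +/- step along each coordinate axis."""
    x = list(x0)
    fx = f(x)
    evals = 1
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                f_trial = f(trial)
                evals += 1
                if f_trial < fx:
                    # Accept the first improving point and poll again from it.
                    x, fx = trial, f_trial
                    improved = True
                    break
            if improved:
                break
        if not improved:
            # No poll direction improved: contract the step.
            step *= 0.5
    return x, fx


def quadratic(v):
    """Simple test objective with its minimum at (1, -2)."""
    return (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2


best_x, best_f = compass_search(quadratic, [0.0, 0.0])
```

No gradient is ever computed; the search relies only on comparisons of objective values, which is why this family of methods tolerates noisy or nonsmooth functions.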
LS-DYNA
From its early development in the 1970s, LS-DYNA has evolved into a general purpose material stress, collision, and crash analysis program with many built-in material and structural element models.
In recent years, the code has also been adapted for both OpenMP and MPI parallel execution on a variety of platforms. The most recent version, LS-DYNA 7.1.2, is installed on ANDY at the CUNY HPC Center under an academic license held by the City College of New York. Use of this license for work that is commercial in any way is prohibited.
Details on LS-DYNA's use, input deck construction, and execution options can be found in the LS-DYNA manual here [30]. All files related to the HPC Center installation of version 971 (executables and example inputs) are located in:
/share/apps/lsdyna/default/[bin,examples]
More information about our installation can be found here LSDYNA.
Network Simulator-2 (NS2)
NS2 is a discrete event simulator targeted at networking research. NS2 provides substantial support for simulation of TCP, routing, and multicast protocols over wired and wireless (local and satellite) networks.
OpenFOAM
OpenFOAM is first and foremost a library that users may incorporate into their own codes. OpenFOAM is installed on PENZIAS.
More information about our installation can be found here OpenFOAM
OpenSEES
OpenSEES, the Open System for Earthquake Engineering Simulation, is an object-oriented, open source software framework.
It allows users to create both serial and parallel finite element computer applications for simulating the response of structural and geotechnical systems subjected to earthquakes and other hazards. OpenSEES is primarily written in C++ and uses several Fortran and C numerical libraries for linear equation solving, and material and element routines. The software is installed on PENZIAS.
ParGAP
ParGAP is built on top of the GAP system. The latter is a system for computational discrete algebra, with particular emphasis on Computational Group Theory. GAP provides a programming language, a library of thousands of functions implementing algebraic algorithms written in the GAP language, and large data libraries of algebraic objects.
The ParGAP (Parallel GAP) package itself provides a way of writing parallel programs using the GAP language. Former names of the package were ParGAP/MPI and GAP/MPI; the word MPI refers to Message Passing Interface, a well-known standard for parallelism. ParGAP is based on the MPI standard, and this distribution includes a subset implementation of MPI, to provide a portable layer with a high level interface to BSD sockets. More information about our installation can be found here ParGAP
SAGE
Sage can be used to study elementary and advanced, pure and applied mathematics.
This includes a huge range of mathematics, including basic algebra, calculus, elementary to very advanced number theory, cryptography, numerical computation, commutative algebra, group theory, combinatorics, graph theory, exact linear algebra and much more. More information about our installation can be found here SAGE
WRF
The Weather Research and Forecasting (WRF) model is a specific computer program with dual use for both weather forecasting and weather research.
It was created through a partnership that includes the National Oceanic and Atmospheric Administration (NOAA), the National Center for Atmospheric Research (NCAR), and more than 150 other organizations and universities in the United States and abroad. WRF is the latest numerical model and application to be adopted by NOAA's National Weather Service as well as the U.S. military and private meteorological services. It is also being adopted by government and private meteorological services worldwide. More information about our installation can be found here WRF
Economics, Business, Statistics, Analytics
R
R is a free software environment for statistical computing and graphics.
General Notes
The R language has become a de facto standard among statisticians for the development of statistical software and for data analysis. R is available on the following HPCC servers: Karle, Penzias, Appel and Andy. Karle is the only machine where R can be used without submitting jobs to the PBS manager. On all other systems users must submit their R jobs via the PBS batch scheduler. More information about our installation can be found here R
R-devel
R is a language and environment for statistical computing and graphics. R-devel provides both core R userspace and all R development components.
Stata/MP
Stata is a complete, integrated statistical package that provides tools for data analysis, data management, and graphics. Stata/MP takes advantage of multiprocessor computers. CUNY HPC Center is licensed to use Stata on up to 8 cores.
Currently Stata/MP is available for users on Karle (karle.csi.cuny.edu). More information about our installation can be found here STATA
SAS
SAS (pronounced "sass", originally Statistical Analysis System) is an integrated system of software products provided by SAS Institute Inc.
It enables the programmer to perform:
- data entry, retrieval, management, and mining
- report writing and graphics
- statistical analysis
- business planning, forecasting, and decision support
- operations research and project management
- quality improvement
- applications development
- data warehousing (extract, transform, load)
- platform independent and remote computing
More information about our installation can be found here SAS
General Development Systems
Coming soon.
Tools, Libraries, Compilers
CGAL
The Computational Geometry Algorithms Library (CGAL) offers data structures and algorithms.
Examples of these are triangulations (2D constrained triangulations, and Delaunay triangulations and periodic triangulations in 2D and 3D), Voronoi diagrams (for 2D and 3D points, 2D additively weighted Voronoi diagrams, and segment Voronoi diagrams), polygons (Boolean operations, offsets, straight skeleton), polyhedra (Boolean operations), arrangements of curves and their applications (2D and 3D envelopes, Minkowski sums), mesh generation (2D Delaunay mesh generation and 3D surface and volume mesh generation, skin surfaces), geometry processing (surface mesh simplification, subdivision and parameterization, as well as estimation of local differential properties, and approximation of ridges and umbilics), alpha shapes, convex hull algorithms (in 2D, 3D and dD), search structures (kd trees for nearest neighbor search, and range and segment trees), interpolation (natural neighbor interpolation and placement of streamlines), shape analysis, fitting, and distances (smallest enclosing sphere of points or spheres, smallest enclosing ellipsoid of points, principal component analysis), and kinetic data structures.
The library is installed on PENZIAS.
More information can be found here http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/CGAL.
GMP
GMP is a library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface. The library is installed on PENZIAS.
Gnuplot
Gnuplot is a portable command-line driven graphing utility. It is installed on the following systems:
- Karle under /usr/bin/gnuplot
- Andy under /share/apps/gnuplot/default/bin/gnuplot
Extensive documentation of gnuplot is available at the gnuplot homepage.
JULIA
Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. Julia is installed on Penzias.
MAGMA
MAGMA is a library similar to LAPACK but for hybrid architectures. MAGMA provides implementations for CUDA, Intel Xeon Phi, and OpenCL. On CUNY HPCC systems, MAGMA is installed in its CUDA variant only on Penzias.
MATHEMATICA
“Mathematica” is a fully integrated technical computing system that combines fast, high-precision numerical and symbolic computation with data visualization and programming capabilities.
Mathematica is currently installed on the CUNY HPC Center's ANDY cluster (andy.csi.cuny.edu) and KARLE standalone server (karle.csi.cuny.edu). The basics of running Mathematica on CUNY HPC systems are presented here. Additional information on how to use Mathematica can be found at http://www.wolfram.com/learningcenter/
More information about our installation can be found here MATHEMATICA.
MATLAB
The MATLAB high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
Typical uses include:
- Math and computation
- Algorithm development
- Data acquisition
- Modeling, simulation, and prototyping
- Data analysis, exploration, and visualization
- Scientific and engineering graphics
- Application development, including graphical user interface building
More information about our installation can be found here MATLAB
MET (Model Evaluation Tools)
MET was developed by the National Center for Atmospheric Research (NCAR) Developmental Testbed Center (DTC) through the generous support of the U.S. Air Force Weather Agency (AFWA) and the National Oceanic and Atmospheric Administration (NOAA).
MET provides a variety of verification techniques, including:
- Standard verification scores comparing gridded model data to point-based observations
- Standard verification scores comparing gridded model data to gridded observations
- Spatial verification methods comparing gridded model data to gridded observations using neighborhood, object-based, and intensity-scale decomposition approaches
- Probabilistic verification methods comparing gridded model data to point-based or gridded observations
More information about use and set-up can be found here MET
Migrate
Migrate estimates population parameters, effective population sizes and migration rates of n populations, using genetic data. It uses a coalescent theory approach taking into account the history of mutations and the uncertainty of the genealogy.
Parameters can be estimated by maximum likelihood (ML) or Bayesian inference (BI). Migrate's output is presented in a text file and in a PDF file. The PDF file eventually will contain all possible analyses including histograms of posterior distributions. More information about our installation can be found here MIGRATE
Python
Python is a programming language that lets you work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs. [31]
There are two supported versions installed on the Andy system:
- Python 3.1.3 located under /share/apps/python/3.1.3/bin
- Python 2.7.3 located under /share/apps/epd/7.3-2/bin
More information about our installation can be found here PYTHON
SAMTOOLS
SAMTOOLS provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAM aims to be a compact format that:
- Is flexible enough to store all the alignment information generated by various alignment programs;
- Is simple enough to be easily generated by alignment programs or converted from existing formats;
- Allows most operations on the alignment to work without loading the whole alignment into memory;
- Allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus.
More information about our installation can be found here SAMTOOLS
Thrust Library (CUDA)
Thrust is a C++ template library for CUDA based on the Standard Template Library (STL). Thrust allows you to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C.
Thrust has been integrated into the default CUDA distribution. The default CUDA installation on PENZIAS at the HPC Center therefore includes the Thrust library. More information about our installation can be found here THRUST
Xmgrace
Grace is a WYSIWYG 2D plotting tool for the X Window System and M*tif. Xmgrace is developed at Plasma Laboratory, Weizmann Institute of Science. More information about its capabilities can be found at the web page http://plasma-gate.weizmann.ac.il/Grace/
Grace is installed on Karle. To use it from the command-line interface, log in to Karle as usual and start Grace by typing "xmgrace" followed by return. Alternatively, use the full path: "/share/apps/xmgrace/default/grace/bin/xmgrace". To use Grace in GUI mode, log in to Karle with the -X option (see this article for details) and start Xmgrace as described above.
Alphabetical List
A
ADCIRC
ADCIRC is a system of programs for solving time-dependent, free-surface, circulation and transport problems in two and three dimensions.
AMBER (Assisted Model Building with Energy Refinement)
Amber is the collective name for a suite of programs for classical bio-molecular simulations. The name "Amber" also denotes the family of potentials (force fields) used with Amber software. Here we discuss only simulation packages, but not the force fields or free tools available via AmberTools package. Details and submission scripts can be found at: http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/amber
ANVIO
Anvio is an analysis and visualization platform for genomics data. Anvio allows various types of workflows to be established. More information can be found here: ANVIO
AUGUSTUS
AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences. It is gene-finding software based on Hidden Markov Models (HMMs), described in papers by Stanke and Waack (2003) and Stanke et al. (2006, 2006b, 2008). The local version of the program is installed on Penzias. More information can be found here: AUGUSTUS
AUTODOCK
AutoDock is a suite of automated docking tools.
B
BAMOVA
Bamova is a package used to do genetic analysis of a wide range of organisms on the basis of next-generation sequence data. The software implements Bayesian Analysis of Molecular Variance and different likelihood models for three different types of molecular data (including two models for high throughput sequence data). For more detail on BAMOVA please visit the BAMOVA web site [37] and manual here [38]. Further information can also be found here BAMOVA.
BAYESCAN
BAYESCAN is a population genomics software package. It identifies outlier loci and is applicable to both dominant and codominant data.
In its model, the difference in allele frequency between a common gene pool and each subpopulation is measured by a subpopulation-specific FST coefficient. Therefore, this formulation can consider realistic ecological scenarios where the effective size and the immigration rate may differ among subpopulations. More detailed information on Bayescan can be found at the web site here [39] and in the manual here [40]. More information about our installation can be found here BAYESCAN.
BEAST
BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation.
The package implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. It is a cross-platform Java program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies, but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted proportional to its posterior probability. The distribution includes a simple to use user-interface program called 'BEAUti' for setting up standard analyses and a suite of programs for analysing the results. For more detail on BEAST (and BEAUTi) please visit the BEAST web site [41]. More information about our installation can be found here BEAST.
BEST
BEST is an application for estimating gene trees and the species tree from multilocus sequences.
The program uses information from multiple gene trees and performs a Bayesian analysis to estimate the topology of the species tree, divergence times and population sizes.
It provides a new approach for estimating the mutation-rate-based phylogenetic relationships among species. Its method accounts for deep coalescence, but not for other complicating issues such as horizontal transfer or gene duplication. The program works in conjunction with the popular Bayesian phylogenetics package MrBayes (Ronquist and Huelsenbeck, Bioinformatics, 2003). BEST's parameters are defined using the 'prset' command from MrBayes. Details on BEST's capabilities and options are available at the BEST web site here [42]. More information about our installation is available here BEST.
BOWTIE2
BOWTIE2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.
BOWTIE2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. BOWTIE2 supports gapped, local, and paired-end alignment modes. BOWTIE2 is part of a sequence alignment and analysis tool chain developed at Johns Hopkins, University of California at Berkeley, and Harvard, and distributed through the Center for Bioinformatics and Computational Biology. The other tools in this collection, CUFFLINKS, SAMTOOLS, and TOPHAT, are also installed at the CUNY HPC Center. Additional information can be found at the BOWTIE2 home page here [43]. Information about our installation can be found here BOWTIE2.
BPP2
BPP2 uses a Bayesian modeling approach to generate the posterior probabilities of species assignments taking into account uncertainties due to unknown gene trees and the ancestral coalescent process. For tractability, it relies on a user-specified guide tree to avoid integrating over all possible species delimitations.
BROWNIE
BROWNIE is a program for analyzing rates of continuous character evolution and looking for substantial rate differences in different parts of a tree using likelihood ratio tests and Akaike Information Criterion (AIC) statistics. It now also implements many other methods for examining trait evolution and methods for doing species delimitation. More information about our installation can be found here BROWNIE.
C
CGAL
The Computational Geometry Algorithms Library (CGAL) offers data structures and algorithms.
Examples of these are triangulations (2D constrained triangulations, and Delaunay triangulations and periodic triangulations in 2D and 3D), Voronoi diagrams (for 2D and 3D points, 2D additively weighted Voronoi diagrams, and segment Voronoi diagrams), polygons (Boolean operations, offsets, straight skeleton), polyhedra (Boolean operations), arrangements of curves and their applications (2D and 3D envelopes, Minkowski sums), mesh generation (2D Delaunay mesh generation and 3D surface and volume mesh generation, skin surfaces), geometry processing (surface mesh simplification, subdivision and parameterization, as well as estimation of local differential properties, and approximation of ridges and umbilics), alpha shapes, convex hull algorithms (in 2D, 3D and dD), search structures (kd trees for nearest neighbor search, and range and segment trees), interpolation (natural neighbor interpolation and placement of streamlines), shape analysis, fitting, and distances (smallest enclosing sphere of points or spheres, smallest enclosing ellipsoid of points, principal component analysis), and kinetic data structures.
The library is installed on PENZIAS.
More information can be found here http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/CGAL.
CONSED
CONSED is a DNA sequence analysis finishing tool that provides sequence viewing, editing, alignment, and assembly capabilities from an X Windows graphical user interface (GUI).
It makes extensive use of other non-graphical, underlying sequence analysis tools including PHRED, PHRAP, and CROSSMATCH, which may also be used separately and are described elsewhere in this document. It also includes a viewer called BAMVIEW. The CONSED tool chain is developed and maintained at the University of Washington and is described more completely here [45]. CONSED is provided at the CUNY HPC Center under an academic license that allows use, but not the copying or outbound transfer of any of the executables or files distributed under this academic license. The license is not transferable in any way, and users wishing to run the application at their own site must acquire a license directly from the authors.
The CUNY HPC Center supports CONSED version 23.0 for interactive use on KARLE. CONSED 23.0 and the tool chain described above are also installed on ANDY to allow batch use of the underlying support tools mentioned above and described in detail below. In general, running GUI-based applications on ANDY's login node is discouraged. There should be little need to do this, as KARLE is on the periphery of the CUNY HPC network, making login there direct, and KARLE shares its HOME directory file system with ANDY, making files created on either system immediately available on the other.
Rather than rewrite portions of the CONSED manual here, users are directed to the manual's "Quick Tour" section here [46] and asked to walk through some of the exercises after logging into KARLE. If problems or questions come up, please post them to "hpchelp@csi.cuny.edu". The CONSED 23.0 distribution is installed on KARLE in the following directory:
/share/apps/consed/default
All the files in the distribution can be found there.
CP2K
CP2K is a program to perform atomistic and molecular simulations of solid state, liquid, molecular, and biological systems.
It provides a general framework for different methods such as density functional theory (DFT) using a mixed Gaussian and plane waves approach (GPW) and classical pair and many-body potentials. CP2K provides state-of-the-art methods for efficient and accurate atomistic simulations. More information about our installation can be found here CP2K.
CUFFLINKS
CUFFLINKS assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. CUFFLINKS then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols. CUFFLINKS is part of a sequence alignment and analysis tool chain developed at Johns Hopkins, University of California at Berkeley, and Harvard, and distributed through the Center for Bioinformatics and Computational Biology. The other tools in this collection, BOWTIE, SAMTOOLS, and TOPHAT, are also installed at the CUNY HPC Center. Additional information can be found at the CUFFLINKS home page here [47]. More information about our installation can be found here CUFFLINKS.
D
DL_POLY
DL_POLY is a general purpose molecular dynamics simulation package developed at Daresbury Laboratory by W. Smith, T.R. Forester and I.T. Todorov.
Both serial and parallel versions are available. The original package was developed by the Molecular Simulation Group (now part of the Computational Chemistry Group, MSG) at Daresbury Laboratory under the auspices of the Engineering and Physical Sciences Research Council (EPSRC) for the EPSRC's Collaborative Computational Project for the Computer Simulation of Condensed Phases (CCP5). Later developments were also supported by the Natural Environment Research Council through the eMinerals project. The package is the property of the Central Laboratory of the Research Councils, UK. More information about our installation and use can be found here DL_POLY.
E
ExaML
ExaML (Exascale Maximum Likelihood) is a code for phylogenetic inference using MPI.
The code is installed only on Penzias and implements the popular RAxML search algorithm for maximum likelihood based inference of phylogenetic trees.
It uses a radically new MPI parallelization approach that yields improved parallel efficiency, in particular on partitioned multi-gene or whole-genome datasets.
When using ExaML please cite the following paper:
Alexey M. Kozlov, Andre J. Aberer, Alexandros Stamatakis: "ExaML Version 3: A Tool for Phylogenomic Analyses on Supercomputers." Bioinformatics (2015) 31 (15): 2577-2579.
It is up to 4 times faster than RAxML-Light [1].
Like RAxML-Light, ExaML implements checkpointing, SSE3 and AVX vectorization, and memory-saving techniques.
[1] A. Stamatakis, A.J. Aberer, C. Goll, S.A. Smith, S.A. Berger, F. Izquierdo-Carrasco: "RAxML-Light: A Tool for computing TeraByte Phylogenies", Bioinformatics 2012; doi: 10.1093/bioinformatics/bts309.
The run script for a parallel job is analogous to the one for running RAxML on Penzias and Andy.
ExaBayes
ExaBayes is a software package for Bayesian tree inference. It is particularly suitable for large-scale analyses on computer clusters. It is installed on the PENZIAS server at the CUNY HPC Center. The installed package is the MPI parallel version.
Availability: PENZIAS
Module file: exabayes
Citation:
Fredrik Ronquist, Maxim Teslenko, Paul van der Mark, Daniel L Ayres, Aaron Darling, Sebastian Höhna, Bret Larget, Liang Liu, Marc A Suchard, and John P Huelsenbeck. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology, 61(3):539-42, May 2012.
Alexei J Drummond, Marc A Suchard, Dong Xie, and Andrew Rambaut. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular biology and evolution, 29(8):1969-73, August 2012.
Clemens Lakner, Paul van der Mark, John P Huelsenbeck, Bret Larget, and Fredrik Ronquist. Efficiency of Markov chain Monte Carlo tree proposals in Bayesian phylogenetics. Systematic biology, 57(1):86--103, February 2008.
Use: An example PBS script to run ExaBayes on PENZIAS is given below:
#!/bin/bash
#PBS -q production
#PBS -N <name_of_job>
#PBS -l select=1:ncpus=2
#PBS -l place=free
#PBS -V
# You must explicitly change to the working directory in PBS
cd $PBS_O_WORKDIR
mpirun -np 2 exabayes <input_file> > output_file
More information about the application, along with sample workflows, is available on the ExaBayes web site:
http://sco.h-its.org/exelixis/web/software/exabayes/manual/index.html#sec-11
F
FDPPDIV
FDPPDiv is a program for estimating divergence times on a fixed, rooted tree topology.
FDPPDiv offers two alternative approaches to divergence time estimation. The DPPDiv part refers to the Dirichlet Process Prior (DPP) model for divergence time estimation, and the F prefix (for Fossil) refers to the new Fossil Birth-Death approach. More information about our installation can be found here FDPPDIV.
G
GAMESS-US
GAMESS is a program for ab initio molecular quantum chemistry.
Briefly, GAMESS can compute SCF wavefunctions including RHF, ROHF, UHF, GVB, and MCSCF. Correlation corrections to these SCF wavefunctions include Configuration Interaction, second-order perturbation theory, and Coupled-Cluster approaches, as well as the Density Functional Theory approximation. Excited states can be computed by CI, EOM, or TD-DFT procedures. Nuclear gradients are available for automatic geometry optimization, transition state searches, or reaction path following. Computation of the energy hessian permits prediction of vibrational frequencies, with IR or Raman intensities. Solvent effects may be modeled by the discrete Effective Fragment potentials, or by continuum models such as the Polarizable Continuum Model. Numerous relativistic computations are available, including infinite-order two-component scalar corrections, with various spin-orbit coupling options. The Fragment Molecular Orbital method permits many of these sophisticated treatments to be used on very large systems by dividing the computation into small fragments. Nuclear wavefunctions can also be computed, in VSCF, or with explicit treatment of nuclear orbitals by the NEO code. More information, including code, can be found here GAMESS-US.
GARLI
GARLI is a program that performs phylogenetic inference using the maximum-likelihood criterion.
Several sequence types are supported, including nucleotide, amino acid and codon. Version 2.0 adds support for partitioned models and morphology-like data types. It is usable on all operating systems, and is written and maintained by Derrick Zwickl at the University of Texas at Austin. Additional information can be found on the GARLI Wiki here [48]. More information about our installation can be found here GARLI.
GAUSS
GAUSS is an easy-to-use data analysis, mathematical and statistical environment based on the powerful, fast and efficient GAUSS Matrix Programming Language.
GAUSS is used to solve real world problems and data analysis problems of exceptionally large scale. GAUSS is currently available on ANDY. At the CUNY HPC Center GAUSS is typically run in serial mode. (Note: GAUSS should not be confused with the computational chemistry application Gaussian.) More information about our installation can be found here GAUSS.
Gaussian09
Gaussian09 is third-party, commercially licensed software from Gaussian, Inc. It is a set of programs for calculating electronic structure.
Gaussian09 is available for general use only on ANDY. The Gaussian User Guide can be found here [49]. More information about our installation can be found here GAUSSIAN09.
GMP
GMP is a library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface. The library is installed on PENZIAS.
Gnuplot
Gnuplot is a portable command-line driven graphing utility. It is installed on the following systems:
- Karle under /usr/bin/gnuplot
- Andy under /share/apps/gnuplot/default/bin/gnuplot
Extensive documentation of gnuplot is available at the gnuplot homepage.
GenomePop2
GenomePop2 is a newer and specialized version of the older program GenomePop.
GenomePop2 is designed to manage SNPs under more flexible and useful settings that are controlled by the user. If you need models with more than 2 alleles you should use the older GenomePop version of the program.
GenomePop2 allows the forward simulation of sequences of biallelic positions. As in the previous version, a number of evolutionary and demographic settings are allowed. Several populations under any migration model can be implemented. Each population consists of a number N of individuals. Each individual is represented by one (haploids) or two (diploids) chromosomes with constant or variable (hotspots) recombination between binary sites. The fitness model is multiplicative, with each derived allele having a multiplicative effect of (1 - s*h - E) on the global fitness value. By default E=0 and h=0.5 in diploids, but h=1 in homozygotes or in haploids. Selective nucleotide sites undergoing directional selection (positive or negative) in different populations can be defined. In addition, bottlenecks and/or population expansion scenarios can be set by the user for a desired number of generations. Several runs can be executed, and a sample of user-defined size is obtained for each run and population. For more detail on how to use GenomePop2, please visit the web site here [50]. More information about our installation can be found here GENOMEPOP2.
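The multiplicative fitness model described above can be illustrated with a short Python sketch. It assumes the per-allele factor is read as 1 - s*h - E, and it is not taken from the GenomePop2 source code. The function name `genome_fitness` and the representation of derived alleles as (s, h) pairs are hypothetical, introduced only for illustration.

```python
def genome_fitness(derived_alleles, E=0.0):
    """Multiply the per-allele effects (1 - s*h - E) into one fitness value.

    derived_alleles: iterable of (s, h) pairs, one per derived allele carried.
        s is the selection coefficient and h the dominance term
        (assumed 0.5 for diploid heterozygotes, 1 for homozygotes or haploids).
    E: the additional term from the text, 0 by default.
    """
    fitness = 1.0
    for s, h in derived_alleles:
        fitness *= (1.0 - s * h - E)
    return fitness


if __name__ == "__main__":
    # One heterozygous site (h=0.5) and one homozygous site (h=1), both with s=0.1
    print(genome_fitness([(0.1, 0.5), (0.1, 1.0)]))
```

Under this reading, an individual carrying no derived alleles has fitness 1, and each additional derived allele scales the fitness by its own factor.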
GROMACS
GROMACS (Groningen Machine for Chemical Simulations)
GROMACS is a full-featured suite of free software, licensed under the GNU General Public License, for performing molecular dynamics simulations; in other words, for simulating the behavior of molecular systems with hundreds to millions of particles using Newton's equations of motion. It is primarily used for research on proteins, lipids, and polymers, but can be applied to a wide variety of chemical and biological research questions.
Details and submission scripts for production runs can be found at: http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/gromacs Please note that preparing a molecular system for simulation with the GROMACS tools cannot be done on the login node. Instead, users must use either their own workstation or the interactive or development queues.
GPAW
GPAW is a density-functional theory (DFT) Python code based on the projector-augmented wave (PAW) method and the atomic simulation environment (ASE).
It uses real-space uniform grids and multigrid methods, atom-centered basis-functions or plane-waves. GPAW calculations are controlled through scripts written in the programming language Python. GPAW relies on the Atomic Simulation Environment (ASE), which is a Python package that helps to describe atoms. The ASE package also handles molecular dynamics, analysis, visualization, geometry optimization and more. More information about our installation can be found here GPAW.
H
Hapsembler
Hapsembler is a haplotype-specific genome assembly toolkit that is designed for genomes that are rich in SNPs and other types of polymorphism. Hapsembler can be used to assemble reads from a variety of platforms including Illumina and Roche/454.
module load hapsembler
HOOMD
HOOMD performs general purpose particle dynamics simulations, taking advantage of NVIDIA GPUs to attain a level of performance equivalent to many processor cores on a fast cluster.
Unlike some other applications in the particle and molecular dynamics space, HOOMD developers have worked to implement all of the code's computationally intensive kernels on the GPU, although currently only single node, single-GPU or OpenMP-GPU runs are possible. There is no MPI-GPU or distributed parallel GPU version available at this time.
HOOMD's object-oriented design patterns make it both versatile and expandable. Various types of potentials, integration methods and file formats are currently supported, and more are added with each release. The code is available and open source, so anyone can write a plugin or change the source to add additional functionality. Simulations are configured and run using simple python scripts, allowing complete control over the force field choice, integrator, all parameters, how many time steps are run, etc. The scripting system is designed to be as simple as possible to the non-programmer.
The HOOMD development effort is led by the Glotzer group at the University of Michigan, but many groups from different universities have contributed code that is now part of the HOOMD main package, see the credits page for the full list. The HOOMD website and documentation are available here [51]. More information about our installation can be found here HOOMD.
HOPSPACK
HOPSPACK (Hybrid Optimization Parallel Search Package) is designed to help users solve a wide range of derivative-free optimization problems.
In the general problem form that HOPSPACK addresses, the first two constraints specify linear inequalities and equalities with coefficient matrices AI and AE. The next two constraints describe nonlinear inequalities and equalities captured in functions cI(x) and cE(x). The final constraints denote lower and upper bounds on the variables. HOPSPACK allows variables to be continuous or integer-valued and has provisions for multi-objective optimization problems. In general, the functions f(x), cI(x), and cE(x) can be noisy and nonsmooth, although most algorithms perform best on determinate functions with continuous derivatives.
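The constraints described above correspond to a problem of the following general form. This is a reconstruction from the description in this section, not a quotation of the HOPSPACK manual: the right-hand sides b_I and b_E and the bound vectors l and u are names assumed here, and the direction of the inequalities is likewise an assumption.

```latex
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{subject to} \quad & A_I\, x \ge b_I, \\
& A_E\, x = b_E, \\
& c_I(x) \ge 0, \\
& c_E(x) = 0, \\
& l \le x \le u.
\end{aligned}
```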
Users can design and implement their own solver, either by writing their own code or by building on existing solvers already in the framework. Because all solvers (called citizens) are members of the same global class, they can share assigned resources. The main features of the package are:
- Only function values are required for the optimization.
- The user must provide a separate program that can evaluate the objective and nonlinear constraint functions at a given point.
- A robust implementation of the Generating Set Search (GSS) solver is supplied, including the capability to handle linear constraints.
- Multiple solvers can run simultaneously and are easily configured to share information.
- Solvers may share a cache of computed function and constraint evaluations to eliminate duplicate work.
- Solvers can initiate and control sub-problems.
Continuation -> HOPSACK.
HONDO PLUS
Hondo Plus is a versatile electronic structure code that combines work from the original Hondo application developed by Harry King in the lab of Michel Dupuis and John Rys, and that of numerous subsequent contributors.
It is currently distributed from the research lab of Dr. Donald Truhlar at the University of Minnesota. Part of the advantage of Hondo Plus is the availability of source implementations of a wide variety of model chemistries developed over its lifetime that researchers can adapt to their particular needs. The license to use the code requires a literature citation, which is documented in the Hondo Plus 5.1 manual found at:
http://comp.chem.umn.edu/hondoplus/HONDOPLUS_Manual_v5.1.2007.2.17.pdf
More information about our installation can be found here HONDO PLUS.
HUMAnN2
HUMAnN is a pipeline for efficiently and accurately profiling the presence/absence and abundance of microbial pathways in a community from metagenomic or metatranscriptomic sequencing data (typically millions of short DNA/RNA reads). HUMAnN2 is the next generation of HUMAnN (HMP Unified Metabolic Analysis Network). Details and submission scripts can be found at: http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/humann2
I
IMa2
The IMa2 application performs basic ‘Isolation with Migration’ calculations using Bayesian inference and Markov chain Monte Carlo methods.
The only major conceptual addition to IMa2 that makes it different from the original IMa program is that it can handle data from multiple populations. This requires that the user specify a phylogenetic tree. Importantly, the tree must be rooted, and the sequence in time of internal nodes must be known and specified. More information on IMa2 and IMa can be found in the user manual here [52]. Information about our installation can be found here IMA2.
I-TASSER
I-TASSER is a platform for protein structure and function prediction. 3D models are built based on multiple-threading alignments by LOMETS and iterative template fragment assembly simulations; function insights are derived by matching the 3D models with the BioLiP protein function database. Details and submission scripts can be found at: http://wiki.csi.cuny.edu/cunyhpc/index.php/Applications_Environment/itasser
J
JULIA
Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. Julia is installed on Penzias.
L
LAMARC
LAMARC is a program which estimates population-genetic parameters such as population size, population growth rate, recombination rate, and migration rates.
It approximates a summation over all possible genealogies that could explain the observed sample, which may be sequence, SNP, microsatellite, or electrophoretic data. LAMARC and its sister program MIGRATE are successor programs to the older programs Coalesce, Fluctuate, and Recombine, which are no longer being supported. These programs are memory-intensive, but can run effectively on workstations. They are supported on a variety of operating systems. For more detail on LAMARC please visit the website here [53], read this paper [54], and look at the documentation here [55]. More information about our installation can be found here LAMARC.
LAMMPS
LAMMPS is a classical molecular dynamics code that models an ensemble of particles in a liquid, solid, or gaseous state.
It can model atomic, polymeric, biological, metallic, granular, and coarse-grained systems using a variety of force fields and boundary conditions. LAMMPS runs efficiently on single-processor desktop or laptop machines, but is also designed for parallel computers, including clusters with and without GPUs. It will run on any parallel machine that compiles C++ and supports the MPI message-passing library. This includes distributed- or shared-memory parallel machines and Beowulf-style clusters. LAMMPS can model systems with only a few particles up to millions or billions. LAMMPS is a freely-available open-source code, distributed under the terms of the GNU Public License, which means you can use or modify the code however you wish. LAMMPS is designed to be easy to modify or extend with new capabilities, such as new force fields, atom types, boundary conditions, or diagnostics. A complete description of LAMMPS can be found in its on-line manual here [56] or from the full PDF manual here [57]. Information about our installation can be found here LAMMPS.
LS-DYNA
From its early development in the 1970s, LS-DYNA has evolved into a general purpose material stress, collision, and crash analysis program with many built-in material and structural element models.
In recent years, the code has also been adapted for both OpenMP and MPI parallel execution on a variety of platforms. The most recent version, LS-DYNA 7.1.2, is installed on ANDY at the CUNY HPC Center under an academic license held by the City College of New York. Use of this license for work that is commercial in any way is prohibited.
Details on LS-DYNA's use, input deck construction, and execution options can be found in the LS-DYNA manual here [58]. All files related to the HPC Center installation of version 971 (executables and example inputs) are located in:
/share/apps/lsdyna/default/[bin,examples]
More information about our installation can be found here LSDYNA.
M
MAGMA
MAGMA is a library similar to LAPACK but for hybrid architectures. MAGMA provides implementations for CUDA, Intel Xeon Phi, and OpenCL. On CUNY HPCC systems, MAGMA is installed in its CUDA variant only on Penzias.
MATHEMATICA
“Mathematica” is a fully integrated technical computing system that combines fast, high-precision numerical and symbolic computation with data visualization and programming capabilities.
Mathematica is currently installed on the CUNY HPC Center's ANDY cluster (andy.csi.cuny.edu) and KARLE standalone server (karle.csi.cuny.edu). The basics of running Mathematica on CUNY HPC systems are presented here. Additional information on how to use Mathematica can be found at http://www.wolfram.com/learningcenter/
More information is available in this wiki here MATHEMATICA.
MATLAB
The MATLAB high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
Typical uses include:
- Math and computation
- Algorithm development
- Data acquisition
- Modeling, simulation, and prototyping
- Data analysis, exploration, and visualization
- Scientific and engineering graphics
- Application development, including graphical user interface building
More information about our installation can be found here MATLAB
MET (Model Evaluation Tools)
MET was developed by the National Center for Atmospheric Research (NCAR) Developmental Testbed Center (DTC) through the generous support of the U.S. Air Force Weather Agency (AFWA) and the National Oceanic and Atmospheric Administration (NOAA).
MET provides a variety of verification techniques, including:
- Standard verification scores comparing gridded model data to point-based observations
- Standard verification scores comparing gridded model data to gridded observations
- Spatial verification methods comparing gridded model data to gridded observations using neighborhood, object-based, and intensity-scale decomposition approaches
- Probabilistic verification methods comparing gridded model data to point-based or gridded observations
More information about use and set-up can be found here MET
Migrate
Migrate estimates population parameters, effective population sizes and migration rates of n populations, using genetic data. It uses a coalescent theory approach taking into account the history of mutations and the uncertainty of the genealogy.
Parameter estimates are obtained by maximum likelihood (ML) or Bayesian inference (BI). Migrate's output is presented in a text file and in a PDF file. The PDF file eventually will contain all possible analyses including histograms of posterior distributions. More information about our installation can be found here MIGRATE
MPFR
The MPFR library is a C library for multiple-precision floating-point computations with correct rounding. MPFR has continuously been supported by the INRIA and the current main authors come from the Caramel and AriC project-teams at Loria (Nancy, France) and LIP (Lyon, France) respectively; see more on the credit page.
MPFR is based on the GMP multiple-precision library. The main goal of MPFR is to provide a library for multiple-precision floating-point computation that is both efficient and has well-defined semantics. It copies the good ideas from the ANSI/IEEE-754 standard for double-precision floating-point arithmetic (53-bit significand). The library is installed on PENZIAS.
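To see why a 53-bit significand motivates a multiple-precision library, the following minimal Python sketch shows where ordinary double precision stops representing integers exactly. It uses only the Python standard library and does not call MPFR or GMP; the helper name `double_is_exact` is hypothetical.

```python
import sys


def double_is_exact(n):
    """Return True if the integer n survives a round trip through a double."""
    return int(float(n)) == n


if __name__ == "__main__":
    # Number of significand bits in the platform's double type
    print(sys.float_info.mant_dig)
    # 2**53 is representable, but 2**53 + 1 needs 54 bits and gets rounded
    print(double_is_exact(2**53), double_is_exact(2**53 + 1))
```

A multiple-precision library lets the user choose a larger significand so that values like the second one are no longer rounded.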
MRBAYES
MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on certain observations.
The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees. More information about our installation can be found here MRBAYES
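The posterior described above can be written out with Bayes's theorem. The notation is introduced here for illustration and is not taken from the MrBayes documentation: \tau denotes a tree, X the observed data, and the likelihood P(X \mid \tau) is understood to be already averaged over branch lengths and substitution-model parameters.

```latex
P(\tau \mid X) \;=\; \frac{P(X \mid \tau)\, P(\tau)}{\sum_{\tau'} P(X \mid \tau')\, P(\tau')}
```

The sum in the denominator runs over all possible trees, which is why the posterior cannot be computed analytically for realistic data sets and is approximated by MCMC sampling instead.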
msABC
msABC is a program for simulating various neutral evolutionary demographic scenarios based on the software ms (Hudson 2002). msABC extends ms, calculating a multitude of summary statistics.
Therefore, msABC is suitable for performing the sampling step of an Approximate Bayesian Computation (ABC) analysis under various neutral demographic models. The main advantages of msABC are (i) use of various prior distributions, such as uniform, Gaussian, log-normal, and gamma, (ii) implementation of a multitude of summary statistics for one or more populations, (iii) efficient implementation, which allows the analysis of hundreds of loci and chromosomes even on a single computer, and (iv) extended flexibility, such as simulation of loci of variable size and simulation of missing data. More information about our installation can be found here msABC
MSMS
MSMS is a tool to generate sequence samples under both neutral models and single locus selection models. MSMS permits the full range of demographic models provided by its relative MS (Hudson, 2002).
In particular, it allows for multiple demes with arbitrary migration patterns, population growth and decay in each deme, and for population splits and mergers. Selection (including dominance) can depend on the deme and also change with time. More information about our installation can be found here MSMS
N
NAMD
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems [59].
The main server for molecular dynamics calculations is PENZIAS, which supports both GPU and non-GPU versions of NAMD. MPI-only parallel versions of NAMD (no GPU support) are also installed on SALK and ANDY. More information about our installation can be found here NAMD
Network Simulator-2 (NS2)
NS2 is a discrete event simulator targeted at networking research. NS2 provides substantial support for simulation of TCP, routing, and multicast protocols over wired and wireless (local and satellite) networks.
NWChem
NWChem is an ab initio computational chemistry software package which also includes molecular dynamics (MM, MD) and coupled, quantum mechanical and molecular dynamics functionality (QM-MD).
NWChem has been developed by the Molecular Sciences Software group at the Department of Energy's EMSL. The software is available on PENZIAS and ANDY. More information about our installation can be found here NWChem
O
Octopus
Octopus is a pseudopotential real-space package aimed at the simulation of the electron-ion dynamics of one-, two-, and three-dimensional finite systems subject to time-dependent electromagnetic fields.
The program is based on time-dependent density-functional theory (TDDFT) in the Kohn-Sham scheme. All quantities are expanded in a regular mesh in real space, and the simulations are performed in real time. The program has been successfully used to calculate linear and non-linear absorption spectra, harmonic spectra, laser induced fragmentation, etc. of a variety of systems. More information about our installation can be found here OCTOPUS
OpenMM
OpenMM is both a library and a stand-alone application which provides tools for modern molecular modeling simulation. As a library it can be hooked into any code, allowing that code to do molecular modeling with minimal extra coding.
Moreover, OpenMM has a strong emphasis on hardware acceleration via GPU, thus providing not just a consistent API but also much greater performance than most other available codes. OpenMM was developed as part of the Physics-Based Simulation project, led by Prof. Pande.
OpenFOAM
OpenFOAM is first and foremost a library that users may incorporate into their own codes. OpenFOAM is installed on PENZIAS.
More information about our installation can be found here OpenFOAM
OpenSees
OpenSees, the Open System for Earthquake Engineering Simulation, is an object-oriented, open source software framework.
It allows users to create both serial and parallel finite element computer applications for simulating the response of structural and geotechnical systems subjected to earthquakes and other hazards. OpenSees is primarily written in C++ and uses several Fortran and C numerical libraries for linear equation solving, and material and element routines. The software is installed on PENZIAS.
ORCA
ORCA is an electronic structure program capable of carrying out geometry optimizations and predicting a large number of spectroscopic parameters at different levels of theory.
Besides Hartree-Fock theory, density functional theory (DFT) and semiempirical methods, high-level ab initio quantum chemical methods based on the configuration interaction and coupled cluster methods are included in ORCA to an increasing degree. More information about our installation can be found here ORCA
P
ParGAP
ParGAP is built on top of the GAP system. The latter is a system for computational discrete algebra, with particular emphasis on Computational Group Theory. GAP provides a programming language, a library of thousands of functions implementing algebraic algorithms written in the GAP language, and large data libraries of algebraic objects.
The ParGAP (Parallel GAP) package itself provides a way of writing parallel programs using the GAP language. Former names of the package were ParGAP/MPI and GAP/MPI; the word MPI refers to Message Passing Interface, a well-known standard for parallelism. ParGAP is based on the MPI standard, and this distribution includes a subset implementation of MPI, to provide a portable layer with a high level interface to BSD sockets. More information about our installation can be found here ParGAP
POPABC
PopABC is a computer package to estimate historical demographic parameters of closely related species/populations (e.g. population size, migration rate, mutation rate, recombination rate, splitting events) within an Isolation with Migration model.
The software performs coalescent simulation in the framework of approximate Bayesian computation (ABC, Beaumont et al, 2002). PopABC can also be used to perform Bayesian model choice to discriminate between different demographic scenarios. The program can be used either for research or for education and teaching purposes. Further details and a manual can be found at the POPABC website here [60]. More information about our installation can be found here POPABC
PHOENICS
PHOENICS is an integrated Computational Fluid Dynamics (CFD) package for the preparation, simulation, and visualization of processes involving fluid flow, heat or mass transfer, chemical reaction, and/or combustion in engineering equipment, building design, and the environment. More detail is available at the CHAM website, here http://www.cham.co.uk.
Although we expect most users to pre- and post-process their jobs on office-local clients, the CUNY HPC Center has installed the Unix version of the entire PHOENICS package on ANDY. PHOENICS is installed in /share/apps/phoenics/default where all the standard PHOENICS directories are located (d_allpro, d_earth, d_enviro, d_photo, d_priv1, d_satell, etc.). Of particular interest on ANDY is the MPI parallel version of the 'earth' executable, 'parexe', which makes full use of the parallel processing power of the ANDY cluster for larger individual jobs. While the parallel scaling properties of PHOENICS jobs will vary depending on the job size, processor type, and the cluster interconnect, larger work loads will generally scale and run efficiently on 8 to 32 processors, while smaller problems will scale efficiently only up to about 4 processors. More detail on parallel PHOENICS is available at http://www.cham.co.uk/products/parallel.php. Aside from the tightly coupled MPI parallelism of 'parexe', users can run multiple instances of the non-parallel modules on ANDY (including the serial 'earexe' module) when a parametric approach can be used to solve their problems. More information about our installation can be found here PHOENICS
PHRAP-PHRED
PHRAP and PHRED are part of the DNA sequence analysis tool set that also includes the programs CROSSMATCH and SWAT. These tools are described in detail here [61], but a brief description of both, extracted from their manuals, follows.
PHRED and PHRAP (along with CONSED) can be used for both small sequence assemblies and larger shotgun analyses. This makes the tools a perhaps under-utilized set for smaller non-genomic groups. Some variables may need to be adjusted, particularly in CONSED, but researchers that have multiple sequences from a small locus can use the suite, starting from their chromatogram files. More information about our installation can be found here PHRAP-PHRED
PyRAD
Reduced-representation genomic sequence data (e.g., RADseq, GBS, ddRAD) are commonly used to study population-level research questions and consequently most software packages for assembling or analyzing such data are designed for sequences with little variation across samples.
Phylogenetic analyses typically include species with deeper divergence times (more variable loci across samples) and thus a different approach to clustering and identifying orthologs will perform better. pyRAD is intended for use with any type of restriction-site associated DNA. It currently supports RAD, ddRAD, PE-ddRAD, GBS, PE-GBS, EzRAD, PE-EzRAD, 2B-RAD, nextRAD, and can be extended to other types. More information about our installation can be found here PyRAD
Python
Python is a programming language that lets you work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs. [62]
Two supported versions are installed on the ANDY system:
- Python 3.1.3 located under /share/apps/python/3.1.3/bin
- Python 2.7.3 located under /share/apps/epd/7.3-2/bin
More information about our installation can be found here PYTHON
Installing Python packages
Users may install Python packages and modules in their own space. Many packages in the Python repositories can be installed easily with the pip package manager, which is available in all of the Anaconda and Miniconda builds.
Note that running pip without first loading a Python module installs packages that match the system Python, which is available on the login node only. The Python interpreter available on all nodes (after loading the module) is installed under /share/usr/compilers/python. When installing packages in user space it is therefore important to follow the procedure outlined below. The example shows how to install the package "guppy" in user space:
For Python 2.7.13 in the Anaconda build:
module load python/2.7.13_anaconda
pip install guppy --user
For Python 3.6.0 in the Anaconda build:
module load python/3.6.0_anaconda
pip install guppy --user
For Python 2.7.13 in Miniconda:
module load python/miniconda2
pip install guppy --user
For Python 3.6.0 in Miniconda 3:
module load python/miniconda3
pip install guppy --user
To check if the package is properly installed type:
pip list | grep guppy
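Since the failure mode described above is a mismatch between pip and the interpreter that jobs actually use, it can help to confirm which interpreter is active before installing anything. The following is a minimal sketch rather than an official HPC Center procedure: the helper name show_python_env is hypothetical, and it relies only on the standard "python -m pip --version" and "python -m site --user-site" commands.

```shell
# Hypothetical helper (not part of the HPC Center installation):
# report which interpreter a command name resolves to, which pip is bound
# to it, and where "pip install --user" will place packages.
show_python_env() {
    py="${1:-python}"    # interpreter name to inspect, defaults to "python"

    # Stop early if the interpreter is not on PATH (e.g. no module loaded).
    if ! command -v "$py" >/dev/null 2>&1; then
        echo "interpreter '$py' not found on PATH; load a python module first" >&2
        return 1
    fi

    echo "interpreter: $(command -v "$py")"
    # pip prints the Python installation it belongs to.
    "$py" -m pip --version
    # Per-user site-packages directory used by --user installs.
    echo "user site:   $("$py" -m site --user-site)"
}

# Typical use after loading one of the modules listed above:
#   module load python/3.6.0_anaconda
#   show_python_env python
```

If the reported interpreter path is not the one provided by the loaded module, packages installed with --user may not be visible to jobs running on the compute nodes.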
Q
QIIME
QIIME (pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is a pipeline application that uses numerous third-party applications.
QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics. More information about our installation can be found here QIIME
R
R
R is a free software environment for statistical computing and graphics.
General Notes
The R language has become a de facto standard among statisticians and is widely used for statistical software development and data analysis. R is available on the following HPC Center servers: Karle, Penzias, Appel and Andy. Karle is the only machine where R can be used without submitting jobs to the PBS manager. On all other systems users must submit their R jobs via the PBS batch scheduler. More information about our installation can be found here R
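As an illustration of submitting R work through the batch scheduler, the sketch below writes a small PBS job script to a file. It is not the HPC Center's official template: the module name "r", the script name analysis.R, the job name, and the single-core resource request (written in PBS Professional "select" syntax) are all assumptions to be adapted to the target system.

```shell
# Write a minimal PBS job script for a non-interactive R run.
# Assumptions (not taken from this page): the module is named "r", the R
# script is called analysis.R, and one core on one node is sufficient.
cat > r_job.pbs <<'EOF'
#!/bin/bash
#PBS -N r_example
#PBS -l select=1:ncpus=1
#PBS -j oe

# PBS starts jobs in the home directory; move to the submission directory.
cd "$PBS_O_WORKDIR"

module load r
# Rscript runs the file in batch mode, without an interactive session.
Rscript analysis.R
EOF

# Submit from a login node (left commented out so the sketch only creates the file):
#   qsub r_job.pbs
```

Writing the script through a quoted here-document keeps $PBS_O_WORKDIR unexpanded, so it is resolved by PBS when the job runs rather than by the login shell.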
RAXML
Randomized Axelerated Maximum Likelihood (RAxML) is a program for sequential and parallel maximum likelihood based inference of large phylogenetic trees. It is a descendant of fastDNAml, which in turn was derived from Joe Felsenstein's DNAml, which is part of the PHYLIP package.
RAxML is installed at the CUNY HPC Center on ANDY, where multiple versions are available in both serial and MPI parallel builds. The MPI parallel version should be run on four or more cores. The MPI parallel version of RAxML is also installed on Penzias. More information about our installation can be found here RAXML
S
SAGE
Sage can be used to study elementary and advanced, pure and applied mathematics.
This includes a huge range of mathematics, including basic algebra, calculus, elementary to very advanced number theory, cryptography, numerical computation, commutative algebra, group theory, combinatorics, graph theory, exact linear algebra and much more. More information about our installation can be found here SAGE
SAMTOOLS
SAMTOOLS provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAM aims to be a format that:
- Is flexible enough to store all the alignment information generated by various alignment programs
- Is simple enough to be easily generated by alignment programs or converted from existing formats
- Allows most operations on the alignment to work without loading the whole alignment into memory
- Allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus
More information about our installation can be found here SAMTOOLS
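The sorting, indexing and per-locus retrieval described above can be sketched as a short shell helper. This is an illustration only: the function name sort_index_fetch and the file and region names are hypothetical, and the commands assume samtools 1.x option syntax ("sort -o"), whereas older 0.1.x releases use a different sort invocation.

```shell
# Hypothetical helper: sort a BAM file, index it, and print the reads that
# overlap one region. Assumes samtools 1.x option syntax.
sort_index_fetch() {
    in_bam="$1"                          # unsorted input, e.g. sample.bam
    region="$2"                          # e.g. chr1:10000-20000
    sorted="${in_bam%.bam}.sorted.bam"   # output name derived from the input

    command -v samtools >/dev/null 2>&1 || {
        echo "samtools not found on PATH; load its module first" >&2
        return 1
    }

    # Coordinate-sort so the file can be indexed by genomic position.
    samtools sort -o "$sorted" "$in_bam" || return 1
    # Build the index next to the sorted file.
    samtools index "$sorted" || return 1
    # Retrieve only the reads aligning to the requested locus.
    samtools view "$sorted" "$region"
}

# Example call (file and region are placeholders):
#   sort_index_fetch sample.bam chr1:10000-20000
```

Because the region query uses the index, only the relevant part of the sorted file is read, which is the behaviour the last item in the list above refers to.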
SAS
SAS (pronounced "sass", originally Statistical Analysis System) is an integrated system of software products provided by SAS Institute Inc.
It enables the programmer to perform:
- data entry, retrieval, management, and mining
- report writing and graphics
- statistical analysis
- business planning, forecasting, and decision support
- operations research and project management
- quality improvement
- applications development
- data warehousing (extract, transform, load)
- platform independent and remote computing
More information about our installation can be found here SAS
Stata/MP
Stata is a complete, integrated statistical package that provides tools for data analysis, data management, and graphics. Stata/MP takes advantage of multiprocessor computers. CUNY HPC Center is licensed to use Stata on up to 8 cores.
Currently Stata/MP is available for users on Karle (karle.csi.cuny.edu). More information about our installation can be found here STATA
Structurama
Structurama is a program for inferring population structure from genetic data. The program assumes that the sampled loci are in linkage equilibrium and that the allele frequencies for each population are drawn from a Dirichlet probability distribution. Two different models for population structure are implemented.
First, Structurama offers the method of Pritchard et al. (2000) in which the number of populations is considered fixed. The program also allows the number of populations to be a random variable following a Dirichlet process prior (Pella and Masuda, 2006; Huelsenbeck and Andolfatto, 2007). More information about our installation can be found here STRUCTURAMA
Structure
The program Structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.
More information about our installation can be found here STRUCTURE
T
Thrust Library (CUDA)
Thrust is a C++ template library for CUDA based on the Standard Template Library (STL). Thrust allows you to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C.
As of CUDA, Thrust has been integrated into the default CUDA distribution. The HPC Center is currently running CUDA as the default on PENZIAS, which includes the Thrust library. More information about our installation can be found here THRUST
TOPHAT
TOPHAT is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
TOPHAT is part of a sequence alignment and analysis tool chain developed at Johns Hopkins, University of California at Berkeley, and Harvard, and distributed through the Center for Bioinformatics and Computational Biology. More information about our installation can be found here TOPHAT
Trinity
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. More information about our installation can be found here TRINITY
U
USEARCH
USEARCH is a unique sequence analysis tool with thousands of users world-wide.
USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST. More information about our installation can be found here USEARCH
V
VELVET
Velvet is a set of algorithms for de novo short read assembly using de Bruijn graphs. It was developed at the European Bioinformatics Institute, Cambridge, UK. More information about our installation can be found here VELVET.
VSEARCH
VSEARCH is an open-source alternative to USEARCH.
VSEARCH stands for vectorized search, as the tool takes advantage of parallelism in the form of SIMD vectorization as well as multiple threads to perform accurate alignments at high speed. VSEARCH uses an optimal global aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH which by default uses a heuristic seed and extend aligner. This usually results in more accurate alignments and overall improved sensitivity (recall) with VSEARCH, especially for alignments with gaps.
Additional details on VSEARCH can be found at: this link
VSEARCH is installed on the Penzias HPC cluster. To start using VSEARCH, load the corresponding module first:
module load vsearch
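Once the module is loaded, a global-alignment search might look like the sketch below. The helper name vsearch_global, the file names, the 97% identity threshold and the thread count are illustrative assumptions, not site defaults; the options used (--usearch_global, --db, --id, --blast6out, --threads) are standard VSEARCH options.

```shell
# Hypothetical helper: search query sequences against a reference database
# with VSEARCH's global aligner and write a tab-separated hit table.
vsearch_global() {
    queries="$1"    # FASTA file with query sequences
    db="$2"         # FASTA file with reference sequences
    out="$3"        # output hit table

    command -v vsearch >/dev/null 2>&1 || {
        echo "vsearch not found on PATH; run 'module load vsearch' first" >&2
        return 1
    }

    # --id sets the minimum pairwise identity for a hit to be reported.
    vsearch --usearch_global "$queries" --db "$db" --id 0.97 \
            --blast6out "$out" --threads 4
}

# Example call (file names are placeholders):
#   vsearch_global queries.fasta reference.fasta hits.b6
```

On a cluster, a call like this would normally be placed inside a batch job rather than run on the login node.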
VMD
VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.
It was developed by The Theoretical and Computational Biophysics Group at the University of Illinois. It is documented on the TCB's homepage.
VMD is installed on Karle. To use it from the command-line interface, log in to Karle as usual and start VMD by typing "vmd" followed by return. Alternatively, use the full path: "/share/apps/vmd/default/bin/vmd"
In order to use VMD in GUI mode, log in to Karle with the -X option (see this article for details) and start VMD as described above.
W
WRF
The Weather Research and Forecasting (WRF) model is a specific computer program with dual use for both weather forecasting and weather research.
It was created through a partnership that includes the National Oceanic and Atmospheric Administration (NOAA), the National Center for Atmospheric Research (NCAR), and more than 150 other organizations and universities in the United States and abroad. WRF is the latest numerical model and application to be adopted by NOAA's National Weather Service as well as the U.S. military and private meteorological services. It is also being adopted by government and private meteorological services worldwide. More information about our installation can be found here WRF
X
Xmgrace
Grace is a WYSIWYG 2D plotting tool for the X Window System and M*tif. Xmgrace is developed at Plasma Laboratory, Weizmann Institute of Science. More information about its capabilities can be found at the web page http://plasma-gate.weizmann.ac.il/Grace/
Grace is installed on Karle. To use it from the command-line interface, log in to Karle as usual and start Grace by typing "xmgrace" followed by return. Alternatively, use the full path: "/share/apps/xmgrace/default/grace/bin/xmgrace". In order to use Grace in GUI mode, log in to Karle with the -X option (see this article for details) and start Xmgrace as described above.