Main Page

Introduction to the City University of New York High Performance Computing Center

The City University of New York (CUNY) High Performance Computing Center (HPCC) is located on the campus of the College of Staten Island, 2800 Victory Boulevard, Staten Island, New York 10314. HPCC goals are to:

  • Support the scientific computing needs of CUNY faculty, their collaborators at other universities, their public and private sector partners, and CUNY students and research staff;
  • Create opportunities for the CUNY research community to develop new partnerships with the government and private sectors; and
  • Leverage the HPC Center capabilities to acquire additional research resources for its faculty and graduate students in existing and major new programs.

Please send comments on or corrections to the wiki to HPChelp@mail.csi.cuny.edu

Installed systems

The HPCC currently operates seven significant systems. The following table summarizes the characteristics of these systems; additional information is provided below the table.

Image:systems1.png

Andy. Andy (andy.csi.cuny.edu) is named in honor of Dr. Andrew S. Grove, an alumnus of the City College of New York and one of the founders of the Intel Corporation (http://educationupdate.com/archives/2005/Dec/html/col-ccnypres.htm). Andy is composed of two distinct computational halves serviced by a single head node and several service nodes. The first and older half (Andy1) is an SGI ICE system (http://www.sgi.com/products/servers/altix/ice/) with 45 dual-socket compute nodes, each with 2.93 GHz quad-core Intel Core i7 (Nehalem) processors, providing a total of 360 compute cores. Each compute node has 24 Gbytes of memory, or 3 Gbytes of memory per core. Andy1's interconnect network is a dual-rail DDR InfiniBand (20 Gbit/second) network in which one rail is used to access Andy's Lustre storage system and the other is used for inter-processor communication. The second and newer half (Andy2) is a cluster of 48 SGI x340 1U compute nodes (each configured similarly to those on Andy1, giving it 384 cores) that is also connected to 24 1U quad-GPU Fermi S2050 accelerator nodes. Each socket in each x340 compute node on Andy2 has a companion Fermi GPU with 3 Gbytes of memory, for a total of 96 GPUs system wide. Andy2's interconnect is a single-rail QDR InfiniBand (40 Gbit/second) network serving both its communication network and Lustre storage system. A portion of Andy2 (3 compute nodes, or 24 cores and 6 GPUs) is dedicated to interactive GPU and development work, while the rest (45 dual-socket compute nodes [360 cores] and 90 GPUs) is available for parallel and serial production work in either CPU-only or CPU-GPU mode. Both Andy1 and Andy2 (360 + 384 = 744 cores) are served by the same head node and home directory, which is a Lustre parallel file system with 24 Tbytes of usable storage.

Bob. Bob (bob.csi.cuny.edu) is named in honor of Dr. Robert E. Kahn, an alumnus of the City College of New York who, along with Vinton G. Cerf, invented the TCP/IP protocol, the technology used to transmit information over the modern Internet (http://www.economicexpert.com/a/Robert:E:Kahn.htm). Bob is a Dell PowerEdge system consisting of one head node with two sockets of AMD Shanghai native quad-core processors running at 2.3 GHz and twenty-nine compute nodes of the same type, providing a total of 30 x 8 = 240 cores. Each compute node has 16 Gbytes of memory, or 2 Gbytes of memory per core. Bob has both a standard 1 Gbit Ethernet interconnect and a low-latency SDR InfiniBand (10 Gbit/second) interconnect. Bob is currently largely dedicated to running the Gaussian suite of computational chemistry programs.

Karle. Karle (karle.csi.cuny.edu) is named in honor of Dr. Jerome Karle, an alumnus of the City College of New York who was awarded the Nobel Prize in Chemistry in 1985, jointly with Herbert A. Hauptman, for the direct analysis of crystal structures using X-ray scattering techniques. Karle functions as both a gateway and an interface system for running MATLAB, SAS, Mathematica, and other GUI-oriented applications for CUNY users both within and outside the local area network at the College of Staten Island where the CUNY HPC Center is located. Karle can be used to run such computations (in serial or parallel) locally and directly on Karle, or to submit batch work over the network to the clusters Bob or Andy described above. As a single, four-socket, 4 x 6 = 24 core head-like node, Karle is a highly capable system. Karle's 24 Intel E740-based cores run at 2.4 GHz. Karle has a total of 96 Gbytes of memory, or 4 Gbytes per core. Account allocation on Karle will be limited to those requiring access to the GUI-oriented applications it is intended to run.

Neptune. Neptune (neptune.csi.cuny.edu) functions as a generic gateway or interface system for CUNY users who are not within the local area network at the College of Staten Island where the CUNY HPC Center is located. Neptune can be reached using the secure shell command ssh (ssh [-X] neptune.csi.cuny.edu). Neptune is used only as a secure jumping-off point to access other HPCC systems. HPC workloads should NOT be run on Neptune, which has limited memory and compute power. Work found running on Neptune that consumes significant quantities of CPU time, as shown by the 'top' command, will be killed. This applies in general to the head nodes of all the CUNY systems.

Salk. Salk (salk.csi.cuny.edu) is named in honor of Dr. Jonas Salk, also an alumnus of the City College of New York and creator of the first vaccine for polio (http://en.wikipedia.org/wiki/Jonas_Salk#College). Salk is a two-cabinet Cray XE6m system interconnected with Cray's latest custom, high-speed Gemini interconnect. Salk consists of 176 dual-socket compute nodes, each containing two 8-core AMD Magny-Cours processors running at 2.3 GHz, for a total of 16 cores per node. This gives the system a total of 2816 cores for the production processing of CUNY's HPC applications. Each node has a total of 32 Gbytes of memory, or 2 Gbytes of memory per core. Salk's Gemini interconnect is a high-bandwidth, low-latency, high-message-injection-rate interconnect supported by a custom ASIC and low-level communications protocol developed by Cray. Unlike the other clusters at the CUNY HPC Center, which are connected in a multi-tiered switch topology, the Cray XE6m nodes supported by Gemini are laid out in a 2D torus network. Salk is intended to run jobs of a larger scale than the other CUNY HPC Center systems. Jobs smaller than 16 cores are not allowed on Salk, while jobs of 1024 cores and larger are. In addition, Salk, through its Gemini interconnect and compilers, supports the Partitioned Global Address Space languages CoArray Fortran and Unified Parallel C. These languages make programming large, distributed-memory parallel systems easier and more scalable.

Zeus. Zeus (zeus.csi.cuny.edu) is focused on supporting users running Gaussian and, more recently, the development of CPU-GPU applications. This system (Dell PowerEdge 1950) consists of one head node (2 x 4 cores running at 1.86 GHz) and 18 compute nodes. Eight of the compute nodes (nodes 0 through 7) have two sockets with Intel 2.66 GHz quad-core Harpertown processors, providing a total of eight cores per node. These 8 Harpertown nodes have 2 Gbytes of memory per core for a total of 16 Gbytes per node. Each Harpertown node also has a ~1 TByte disk drive (/state/partition1) for storing Gaussian scratch files. Two compute nodes (nodes 8 and 9) have two sockets with Intel 2.27 GHz Woodcrest dual-core processors and a total of 6 Gbytes of memory. Nodes 8 and 9 are also each attached to an NVIDIA Tesla S1070, 1U, 4-way GPU array via dual PCI-Express 2.0 cables to support integrated CPU-GPU computing. Each GPU (4 per 1U Tesla node) has 240 32-bit floating-point units with a peak performance of 1 teraflop (there are 30 64-bit units). Each GPU also has 4 Gbytes of GPU-local memory. Zeus has another 8 compute nodes (compute-0-10 through compute-0-17) with single-socket Intel 2.86 GHz Woodcrest dual-core processors, which brings the total across all 18 compute nodes to 88 cores. These nodes may also be used for Gaussian work and include a local 250 Gbyte disk drive for storing Gaussian scratch files. The interconnect network is a standard 1 Gbit Ethernet.
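
The core counts and per-core memory figures quoted in the system descriptions above can be cross-checked with a little arithmetic. The following is only a sketch: the node, socket, and core numbers are taken from the text, while the variable and function names (`systems`, `total_cores`, `gb_per_core`) are made up for illustration. For Zeus, only the 18 compute nodes are counted.

```python
# Each system is a list of (nodes, sockets_per_node, cores_per_socket) groups,
# using the figures quoted in the descriptions above.
systems = {
    "Andy1": [(45, 2, 4)],
    "Andy2": [(48, 2, 4)],
    "Bob": [(30, 2, 4)],      # head node plus 29 compute nodes, as counted in the text
    "Karle": [(1, 4, 6)],
    "Salk": [(176, 2, 8)],
    "Zeus": [(8, 2, 4), (2, 2, 2), (8, 1, 2)],  # compute nodes only
}


def total_cores(groups):
    """Sum nodes x sockets x cores over every node group of one system."""
    return sum(nodes * sockets * cores for nodes, sockets, cores in groups)


core_totals = {name: total_cores(groups) for name, groups in systems.items()}

# Memory per core = memory per node / cores per node (Gbytes).
memory_gb_per_node = {"Andy1": 24, "Bob": 16, "Salk": 32}
cores_per_node = {"Andy1": 8, "Bob": 8, "Salk": 16}
gb_per_core = {
    name: memory_gb_per_node[name] / cores_per_node[name]
    for name in memory_gb_per_node
}

for name, total in core_totals.items():
    print(f"{name:6s} {total:5d} cores")
for name, gb in gb_per_core.items():
    print(f"{name:6s} {gb:.1f} Gbytes per core")
```

The totals agree with the figures in the text (for example, 176 x 2 x 8 for Salk and 64 + 8 + 16 for the Zeus compute nodes).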

New system installations planned for Q4 2012

Penzias. Penzias (penzias.csi.cuny.edu) is named in honor of Dr. Arno Penzias, who graduated from the City College of New York in 1954 and won the Nobel Prize in Physics in 1978 for his discovery, along with Robert Wilson, of cosmic microwave background radiation while working at Bell Labs [1]. This system will consist of a head node and 72 compute nodes with a total of 2 x 8 x 72 = 1,152 Sandy Bridge E5-2660 cores running at a base speed of 2.2 GHz, although turbo performance of up to 2.7 GHz is possible. Each compute node will have 2 GBytes per core for a total of 32 GBytes per node. On-board communication will be supported by a PCI-Express Gen3 bus. Communication among the nodes will be based on a two-tiered, QDR InfiniBand, full fat-tree switched interconnect. Penzias will have 30 TBytes of directly attached local storage and will also be attached to a 1 PByte home directory storage hub. We expect that Penzias will also be augmented with NVIDIA's new Kepler GPUs when they become available later in the year. More detail on Penzias will be forthcoming as the installation date approaches.

In addition to the above, the HPC Center will be installing a new, centralized Storage System and Network. The Storage System will provide an order of magnitude more on-line storage capacity for home directories and project space that is directly accessible (although not directly controlled) from any of the HPC Center's installed systems, and will include a large, remote tape archival facility.

The remote tape silo will allow for daily incremental backups, full weekly and monthly backups, and long-term retention of critical research data. An iRODS server will be integrated into the environment and will provide a mechanism for the user community to share data.


Image:sysdata.png


  • All home directories and project files will be resident on the Storage System and Network.
  • Salk, Andy, and Karle will use their local Lustre File Systems for local scratch space.
  • The “Rocks” (CentOS) based clusters (Penzias, Zeus, and Bob) will use local NFS File Systems for local scratch space.
  • Bob and Zeus will also continue to provide directly-attached scratch space for Gaussian users on each compute node.


Acknowledgements

  • National Science Foundation Grant 1126113 funded the Storage Network.
  • A New York State Regional Economic Development Grant funded the upgrade to Salk.
  • The acquisition of Penzias was made possible by funding from the Office of the CUNY Chief Information Officer.

Software Overview

The HPC Center works to maintain a certain amount of uniformity in its software stack, especially at the user and application level. In general, we have standardized on OpenMPI as our MPI implementation, although vendor versions from Cray and SGI are available (on SALK the Cray version of MPI is the default). While we support the Intel, PGI, and GNU compilers, we have made the Intel compiler suite the default on all systems except SALK. Moving down the stack to the operating systems, we are a Linux shop, although the flavor of Linux supported on each system varies as dictated by the vendor. On PENZIAS (available Q4 2012), BOB, and ZEUS, which are Commodity Off-The-Shelf (COTS) clusters from Dell, we support CentOS, which is part of the Rocks 5.3 release. The operating system running on ANDY is SLES 11 updated with the SGI ProPack SP1 support package. The operating system on SALK, Cray Linux Environment 3.1 (CLE 3.1), is based on SLES 11. The queuing system in use on all CUNY HPC Center systems is PBS Pro 11, with a queue design that is as nearly identical as possible across the systems. The user application software stack supported on all systems includes the following compilers and parallel library software. Much more detail on each can be found below.

  • GNU C, C++ and Fortran compilers;
  • Portland Group, Inc. optimizing C, C++, and Fortran compilers with CUDA and GPU support;
  • The Intel Cluster Studio including the Intel C, C++ and Fortran compilers and the Math Kernel Library;
  • OpenMPI 1.5.5 (Cray's custom MPICH on SALK, SGI's proprietary MPT on ANDY, and Intel's MPI are also available).

SALK, the Cray XE6m system, uses its own proprietary MPI library based on the API to its Gemini interconnect. Cray also provides its own C, C++, and Fortran compilers, which support the Partitioned Global Address Space parallel programming models Unified Parallel C (UPC) and CoArray Fortran (CAF), respectively.

The following third party applications are currently installed, although not on every system described above. The CUNY HPC Center staff will be happy to work with any user interested in installing additional applications, subject to meeting that application's license requirements.

  • ADCIRC
  • ADF (Amsterdam Density Functional Theory)
  • AMBER
  • AUTODOCK
  • BAMOVA
  • BAYESCAN
  • BEAST
  • BEST
  • Bioperl
  • Blast
  • BPP2
  • Brownie
  • BUPC (Berkeley UPC)
  • CESM
  • CUDA
  • Dalton
  • DLPOLY
  • FASTA
  • Gauss (Economic Modeling)
  • Gaussian03
  • Gaussian09
  • GNUPlot
  • GenomePOP2
  • Gromacs
  • HondoPlus
  • HOOMD
  • IMA2
  • Lamarc
  • LAMMPS
  • LS-DYNA
  • Mathematica
  • MATLAB
  • MAXENT
  • MET
  • Migrate
  • MrBayes
  • MSMS
  • NAMD
  • Network Simulator2 (NS2)
  • NWCHEM
  • Octopus
  • Phoenics
  • Quantum Espresso
  • R
  • RAxML
  • RAxMLL
  • ROMS
  • SAGE
  • SAS
  • Stata
  • Structurama
  • Structure
  • Thrust
  • VMD
  • Visualization/NAG
  • WRF (Weather Research and Forecasting Code)
  • WRF-Chem

The following graphics, IO, and scientific libraries are also supported.

  • Atlas
  • Boost
  • FFTW (2.1.5, 3.2.2, 3.3.0)
  • GRADS
  • GSL
  • HDF4
  • HDF5
  • IMSL
  • LAPACK
  • MET
  • NCAR
  • NETCDF
  • PNETCDF (Argonne)
  • SPARSEKIT
  • ZLIB

Hours of Operation

The second and fourth Tuesday mornings in the month from 8:00AM to 12PM are normally reserved (but not always used) for scheduled maintenance. Please plan accordingly. Unplanned maintenance to remedy system related problems may be scheduled as needed. Reasonable attempts will be made to inform users running on those systems when these needs arise.

User Support

Users are encouraged to read this Wiki carefully. In particular, the sections on compiling and running parallel programs, and the section on the PBS Pro batch queueing system will give you the essential knowledge needed to use the CUNY HPC Center systems. We have strived to maintain the most uniform user applications environment possible across the Center's systems to ease the transfer of applications and run scripts among them. Still, there are some differences, particularly with the SGI (ANDY) and Cray (SALK) systems.

The CUNY HPC Center staff, along with outside vendors, offer regular courses and workshops to the CUNY community in parallel programming techniques, HPC computing architecture, and the essentials of using our systems. Please follow our mailings on the subject and feel free to inquire about such courses. We regularly schedule training visits and classes at the various CUNY campuses. Please let us know if such a training visit is of interest. In the past, topics have included an overview of parallel programming, GPU programming and architecture, using the evolutionary biology software at the HPC Center, the PBS queueing system at the CUNY HPC Center, and mixed GPU-MPI and OpenMP programming. Staff have also presented guest lectures at formal classes throughout the CUNY campuses.

Users with further questions or requiring immediate assistance in use of the systems should send an email to:


hpchelp@mail.csi.cuny.edu

Mail to this address is received by the entire CUNY HPC Center support staff. This ensures that the person on staff with the most appropriate skill set and job related responsibility will respond to your questions. During the business week you should expect a same-day response. During the weekend you may or may not get same-day response depending on what staff are reading email that weekend. Please send all technical and administrative questions (including replies) to this address.

Please do not send questions to individual CUNY HPC Center staff members directly. These will be returned to the sender with a polite request to send them to 'hpchelp'. This applies to replies to initial questions as well as to the initial questions themselves.

The CUNY HPC Center staff are focused on providing high quality support to its user community, but compared to other HPC Centers of similar size our staff is lean. Please make full use of the tools that we have provided (especially the Wiki), and feel free to offer suggestions for improved service. We hope and expect your experience in using our systems will be predictably good and productive.

Data storage, retention/deletion, and back-ups

Home Directories

Each user account, upon creation, is provided a home directory (currently on each system) with a default 50 GB storage ceiling or disk quota. A user may request an increase in the size of their home directory if there is a special need. The HPC Center will endeavor to satisfy reasonable requests, but storage is not unlimited, and full file systems (especially those holding large files) make backing up the system more difficult. Please regularly remove unwanted files and directories to minimize this burden, and avoid keeping duplicate copies in multiple locations; file transfer among the HPC Center systems is very fast. Furthermore, HPC Center users have occasionally assumed that HPC Center disks could be used to 'park' or archive data that was generated locally at their own site. This practice is strictly forbidden.

By the end of 2012, the HPC Center will have completed upgrading its storage system and network architecture. This will create a central hub home directory storage location for all systems, over 1 PByte in size, with tape backup and high-speed local scratch space on each system. Look for these changes here and in HPC Center mailings.

An incremental backup of user home directories on Andy, Salk, Karle, Bob, and Zeus is performed daily. These backups are retained for three weeks. Full backups are performed weekly and are retained for two months. These backups are stored in a remote location. A full backup is read off tape, bi-monthly, and verified (to ensure backups are readable and restorable).

The following user and system files are backed up:

Retention/Deletion of Home Directories

For active accounts, current Home Directories are retained indefinitely. If a user account is inactive for one year, the HPCC will contact the user and request that the data be removed from the system. If there is no response from the user within three months of the initial notice, or if the user cannot be reached, the Home Directory will be purged.

System temporary/scratch directories

Files on system temporary and scratch directories, as well as home directories on Neptune, are not backed up. There is no provision for retaining data stored in these directories.

Acknowledgements

The CUNY HPCC gratefully acknowledges support from the following sources:

• The acquisition of “Salk”, a Cray XE6m, was made possible by a grant from the National Science Foundation under award CNS-0958379.

• The acquisition of “Andy”, an SGI system with NVIDIA accelerators, was made possible by a grant from the National Science Foundation under award CNS-0855217, as well as funding from the New York City Council made possible through the efforts of Councilman James Oddo.

• CUNY HPCC facility upgrades were funded through the efforts of Staten Island Borough President James P. Molinaro.

• Operating funds for the CUNY HPCC are provided by the College of Staten Island and the City University of New York.

• The acquisition of the 1 PByte Storage System and Network was made possible by National Science Foundation Grant 1126113.

• The acquisition of the SGI UV-2 and upgrade of SALK with a second cabinet was made possible by a New York State Regional Economic Development Grant.

• The acquisition of the Dell cluster Penzias was made possible by funding from the Office of the CUNY Chief Information Officer.

Users of the CUNY HPCC resources should include the following statement in papers, journal articles, and presentations:

“This research was supported, in part, under National Science Foundation Grants CNS-0958379 and CNS-0855217 and the City University of New York High Performance Computing Center.”

The CUNY HPCC requests that users of the system provide it with copies of any such publications. The publications can be forwarded electronically to Hpchelp@csi.cuny.edu


Program Compilation and Job Submission

Serial Program Compilation

The CUNY HPC Center supports four different compiler suites at this time: those from Cray, Intel, The Portland Group, and GNU. Basic serial programs in C, C++, and Fortran can be compiled with any of these offerings, although the Cray compilers are available only on SALK. Man pages (e.g. man cc for Cray, man icc for Intel, man pgcc for PGI, and man gcc for GNU) and manuals exist for each compiler in each suite and provide details on specific compiler flags. Optimized performance on a particular system with a particular compiler often depends on the compiler options chosen. Identical flags are accepted by the MPI-wrapped versions of each compiler (mpicc, mpif90, etc.). [NOTE: SALK does not use mpi-prefixed MPI compile and run tools; it has its own.] Program debuggers and performance profilers are also part of each of these suites.

  • The Intel Compiler Suite
Intel's Cluster Studio (ICS) compilers, debuggers, profilers, and libraries are available on all HPC Center cluster systems, including the Cray system, SALK.
  • The Portland Group Compiler Suite
The Portland Group Inc. (PGI) compilers, debuggers, profilers, and libraries are available on all HPC Center cluster systems including the Cray system, SALK.
  • The Cray Compiler Suite
The HPC Center's Cray XE6m system, SALK, includes the Cray Compiler Environment (CCE) provided by Cray along with the others described here.
  • The GNU Compiler Suite
The GNU compilers, debuggers, profilers, and libraries are available on all HPC Center cluster systems, although unlike the other compilers mentioned, the default and mix of installed versions may not be the same on each system. This is because the HPC Center runs different versions of Linux (SUSE and CentOS) at different release levels.

For details and instructions on Serial Program Compilation and the items listed above, see the corresponding page of this wiki.

OpenMP, OpenMP SMP-Parallel Program Compilation, and PBS Job Submission

All the compute nodes on all the systems at the CUNY HPC Center include at least 2 sockets and multiple cores. Some have 8 cores (ZEUS, BOB, ANDY), and some have 16 (SALK). These multicore, SMP compute nodes offer the CUNY HPC Center user community the option of creating parallel programs using the OpenMP Symmetric Multi-Processing (SMP) parallel programming model. SMP parallel programming with the OpenMP model (and other SMP models) is the original parallel processing model because the earliest parallel HPC systems were built only with shared memories. The Cray X-MP (circa 1982) was among the first systems in this class. Shared-memory, multi-socket and multi-core designs are now typical of even today's desktop and portable PC and Mac systems. On the CUNY HPC Center systems, each compute node is similarly a shared-memory, symmetric multi-processing system that can compute in parallel using the OpenMP shared-memory model.

In the SMP model, multiple processors work simultaneously within a single program's memory space (image). This eliminates the need to copy data from one program (process) image to another (required by MPI) and simplifies the parallel run-time environment significantly. As such, writing parallel programs to the OpenMP standard is generally easier and requires many fewer lines of code. However, the size of the problem that can be addressed using OpenMP is limited by the amount of memory on a single compute node, and similarly the parallel performance improvement to be gained is limited by the number of processors (cores) within that single node.
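
As a loose illustration of the shared-memory idea described above, the sketch below uses Python threads that all read one shared list and write partial sums into one shared result list, without sending or copying data between workers. This is not OpenMP and not code from the HPC Center; the names `shared_memory_sum`, `data`, and `num_workers` are hypothetical. CPython's global interpreter lock generally keeps CPU-bound threads from running truly in parallel, so the example shows the programming model rather than a speedup.

```python
import threading


def shared_memory_sum(data, num_workers=4):
    """Sum `data` with threads that all work in one shared address space."""
    partial = [0] * num_workers  # shared result slots, one per thread
    chunk = (len(data) + num_workers - 1) // num_workers  # ceiling division

    def worker(rank):
        # Each thread indexes straight into the shared list; nothing is
        # copied or sent between workers.
        start = rank * chunk
        stop = min(start + chunk, len(data))
        partial[rank] = sum(data[i] for i in range(start, stop))

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker, like the end of a parallel region
    return sum(partial)


data = list(range(100_000))
print(shared_memory_sum(data, num_workers=8) == sum(data))
```

Because every worker lives inside the same process, the number of useful workers and the size of `data` are bounded by one machine's cores and memory, which mirrors the single-node limit noted above.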

As of Q4 2012 at CUNY's HPC Center, OpenMP applications can run with a maximum of 16 cores (this is on SALK, the Cray XE6m system). Most of the HPC Center's other systems are limited to 8 core OpenMP parallelism. When the HPC Center's SGI UV-2 is installed in November of 2012, OpenMP programs of much larger size will be runnable because the UV-2 is a 512 processor shared, non-uniform-memory-architecture (NUMA) system.

  • Compiling OpenMP Programs Using the Intel Compiler Suite
  • Compiling OpenMP Programs Using the PGI Compiler Suite
  • Compiling OpenMP Programs Using the GNU Compiler Suite
  • Submitting an OpenMP Program to the PBS Batch Queueing System

For details and instructions on OpenMP, OpenMP SMP-Parallel Program Compilation, and PBS Job Submission and the items listed above, see the corresponding page of this wiki.

MPI, MPI Parallel Program Compilation, and PBS Batch Job Submission

The Message Passing Interface (MPI) is a hardware-independent parallel programming and communications library callable from C, C++, or Fortran. Quoting from the MPI standard:

MPI is a message-passing application programmer interface (API), together with protocol and semantic specifications for how its features must behave in any implementation.

MPI has become the de facto standard approach for parallel programming in HPC. MPI is a collection of well-defined library calls composing an Applications Program Interface (API) for transferring data (packaged as messages) between completely independent processes with independent address spaces. These processes might be running within a single physical node, as required above with OpenMP, or distributed across nodes connected by an interconnect such as Gigabit Ethernet or InfiniBand. MPI communication is generally two-sided, with both the sender and receiver of the data actively participating in the communication events. Both point-to-point and collective communication (one-to-many; many-to-one; many-to-many) are supported. MPI's goals are high performance, scalability, and portability. MPI remains the dominant parallel programming model used in high-performance computing today, although it is sometimes criticized as difficult to program with.
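
The two-sided, message-passing pattern described above can be mimicked in miniature with Python's standard `multiprocessing` module. This is only an analogy and does not use MPI: two separate processes with separate address spaces exchange data solely through explicit, matched send and receive calls. The function names `receiver` and `two_sided_exchange` are hypothetical, and the sketch assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp


def receiver(conn):
    # The receiving process must actively call recv(); that matching call
    # is what makes the exchange "two-sided".
    values = conn.recv()
    conn.send(sum(values))  # reply with a message of its own
    conn.close()


def two_sided_exchange(values):
    # Assumption: a POSIX system where the "fork" start method exists.
    ctx = mp.get_context("fork")
    parent_end, child_end = ctx.Pipe()
    proc = ctx.Process(target=receiver, args=(child_end,))
    proc.start()
    parent_end.send(values)     # sender side of the exchange
    result = parent_end.recv()  # matching receive for the reply
    proc.join()
    return result


if __name__ == "__main__":
    print(two_sided_exchange([1, 2, 3, 4]))
```

The child never sees the parent's list directly; it only sees the copy delivered in the message, which is the cost that the shared-memory model avoids and the flexibility that lets message passing span multiple nodes.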

  • An Overview of the CUNY MPI Compilers and Batch Scheduler
  • Sample Compilations and Production Batch Scripts
    • Intel OpenMPI Parallel C
    • Intel OpenMPI Parallel FORTRAN
    • Intel OpenMPI PBS Submit Script
    • Portland Group OpenMPI Parallel C
    • Portland Group OpenMPI Parallel FORTRAN
    • Portland Group OpenMPI PBS Submit Script
    • Cray's Custom Gemini-based MPI Parallel C on SALK
    • Cray MPI PBS Submit Script
    • GNU OpenMPI Parallel C
    • GNU OpenMPI Parallel FORTRAN
    • GNU OpenMPI PBS Submit Script
    • Other System-Local Custom Versions of the MPI Stack
  • Setting Your Preferred MPI and Compiler Defaults
  • Getting the Right Interconnect for High Performance MPI

For details and instructions on MPI, MPI Parallel Program Compilation, and PBS Batch Job Submission and the items listed above, see the corresponding page of this wiki.

GPU Parallel Program Compilation and PBS Job Submission

The CUNY HPC Center supports computing with Graphics Processing Units (GPUs). GPUs can be thought of as highly parallel co-processors (or accelerators) connected to a node's CPUs via a PCI Express bus. The HPC Center provides GPU accelerators on two systems: on ZEUS (largely for development purposes) and on ANDY (for both development and production). ZEUS has an older, rack-mounted NVIDIA Tesla S1070 attached to two dual-socket, dual-core x86-64 compute nodes (nodes compute-0-8 and compute-0-9). This arrangement provides 4 GPUs, one per CPU socket, for CUDA and OpenCL development work on ZEUS. Recently, the HPC Center upgraded ANDY (a production-configured CPU-GPU cluster) with new 1U rack-mounted NVIDIA Fermi S2050 nodes, each of which includes four 448-core Fermi GPUs. Referred to as ANDY2, this system is coupled to (although interconnect-distinct from) ANDY1, which was installed in December of 2009. ANDY2 combines an additional 384 Nehalem cores with 96 NVIDIA Fermi GPUs (4 per Fermi 1U form factor). Each of the 96 Fermis has 448 light-weight cores for parallel floating-point or integer calculation. The details of ANDY's (ANDY1 and ANDY2) architecture are described above in the CUNY HPC Center's system description section.

Each NVIDIA Fermi S2050 chassis (and each Tesla S1070) includes 4 NVIDIA GPUs enhanced for scientific use. Two of these 4 GPUs are connected (one per socket) to each x86-64 compute node via a single 16x PCI-Express 2.0 cable. In combination, a Fermi's 448 cores (clocked at 1.147 GHz at CUNY) are capable of 515 double-precision GFlops and more than 1 TFlops single precision. This gives each four-GPU Fermi S2050 a peak single-precision performance of over 4 TFlops and a peak double precision of over 2 TFlops. In combination, the peak single-precision performance of the 96 Fermi GPUs available on ANDY2 is roughly 100 TFlops. The Tesla GPU's 240 cores (clocked at 1.296 GHz at CUNY) on ZEUS are capable of 933 GFlops in single precision, but just 78 GFlops in double precision.
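
The Fermi peak figures above can be approximated from the core count and clock quoted in the text. The sketch below is a back-of-the-envelope estimate, not a vendor formula taken from this wiki: it assumes 2 single-precision floating-point operations per core per cycle (one fused multiply-add) and a double-precision rate of half the single-precision rate. The function name `peak_gflops` and the constants are illustrative.

```python
def peak_gflops(cores, clock_ghz, flops_per_core_per_cycle):
    """Theoretical peak in GFlops: cores x clock (GHz) x flops per cycle."""
    return cores * clock_ghz * flops_per_core_per_cycle


# Figures quoted in the text for one Fermi GPU on ANDY2.
FERMI_CORES = 448
FERMI_CLOCK_GHZ = 1.147

# Assumptions (not from the text): 2 single-precision flops per core per
# cycle, and double precision at half the single-precision rate.
fermi_sp = peak_gflops(FERMI_CORES, FERMI_CLOCK_GHZ, 2)
fermi_dp = fermi_sp / 2

chassis_sp_tflops = 4 * fermi_sp / 1000  # four GPUs per S2050 chassis
chassis_dp_tflops = 4 * fermi_dp / 1000
andy2_sp_tflops = 96 * fermi_sp / 1000   # 96 GPUs on ANDY2

print(f"per GPU:     {fermi_sp:.0f} GFlops SP, {fermi_dp:.0f} GFlops DP")
print(f"per chassis: {chassis_sp_tflops:.2f} TFlops SP, {chassis_dp_tflops:.2f} TFlops DP")
print(f"ANDY2 total: {andy2_sp_tflops:.1f} TFlops SP")
```

Under these assumptions the per-GPU and per-chassis results are consistent with the quoted figures (just over 1 TFlops single precision per GPU, about 514 GFlops double precision, and over 4 and 2 TFlops per chassis), while the 96-GPU aggregate comes out slightly below 100 TFlops, so the system-wide number is best read as approximate.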

  • GPU Parallel Programming with the Portland Group Compiler Directives
  • Submitting Portland Group, GPU-Parallel Programs Using PBS
  • GPU Parallel Programming with NVIDIA's CUDA C or PGI's CUDA Fortran Programming Models
    • A Sample CUDA GPU Parallel Program Written in NVIDIA's CUDA C
    • A Sample CUDA GPU Parallel Program Written in PGI's CUDA Fortran
  • Submitting CUDA (C or Fortran), GPU-Parallel Programs Using PBS
  • Submitting CUDA (C or Fortran), GPU-Parallel Programs and Functions Using MATLAB

For details and instructions on GPU Parallel Program Compilation and PBS Job Submission and the items listed above, see the corresponding page of this wiki.

CoArray Fortran and Unified Parallel C (PGAS) Program Compilation and PBS Job Submission

As part of its plan to offer CUNY HPC Center users a unique variety of HPC parallel programming alternatives (beyond even those described above), the HPC Center supports a two-cabinet, 2816-core Cray XE6m system called SALK. This system supports two newer and similar, language-integrated and highly scalable approaches to parallel programming: CoArray Fortran (CAF) and Unified Parallel C (UPC). Both are extensions of their parent languages, Fortran and C respectively, and offer a symbolically concise alternative to the de facto standard message-passing model, MPI. CAF and UPC are so-called Partitioned Global Address Space (PGAS) parallel programming models. Unlike MPI, CAF and UPC are not based on a subroutine library call API.

Both MPI and the PGAS approach to parallel programming rely on a Single Program Multiple Data (SPMD) model. In the SPMD parallel programming model, identical collaborating programs (with fully separate memory spaces, or program images) are executed by different processors that may or may not be separated by a network. Each processor-program produces different parts of the result in parallel by working on different data and taking conditionally different paths through the same code. The PGAS approach differs from MPI in that it abstracts away as much as possible, reducing the way that communication is expressed to minimal built-in extensions to the base language, in our case C and Fortran. In large part, CAF and UPC are free of extension-related, explicit library calls. With the underlying communication layer abstracted away, PGAS languages appear to provide a single global memory space spanning their processes.

In addition, communication among processes in a PGAS program is one-sided in the sense that any process can read and/or write into the memory of any other process without informing it of its actions. Such one-sided communication has the advantage of being economical, lowering the latency (first byte delay) that is part of the cost of communication among different parallel processes. Lower latency parallel programs are generally more scalable because they waste less time in communication, especially when the data to be moved are small in size, as in finer-grained communication patterns.

  • An Example CoArray Fortran (CAF) Code
  • Submitting CoArray Fortran Parallel Programs Using PBS
  • An Example Unified Parallel C (UPC) Code
  • Submitting UPC Parallel Programs Using PBS

For details and instructions on CoArray Fortran and Unified Parallel C (PGAS) Program Compilation and PBS Job Submission, including the items listed above, click here.

Available Mathematical Libraries

  • FFTW Scientific Library
  • GNU Scientific Library
  • MKL
  • IMSL
    • Fortran Example
    • C Example

For details on Available Mathematical Libraries, click here.

Training Courses

The CUNY HPCC provides training courses and organizes seminars on various HPC topics. The training courses are provided at no cost and may be held at any CUNY campus site, the CUNY HPCC at the College of Staten Island, or the Graduate Center.

For more information on attending a course, please visit http://www.csi.cuny.edu/cunyhpc/events.html
For information about having a course scheduled, please send an email to hpchelp@mail.csi.cuny.edu

UPCOMING WORKSHOPS:
High Performance Computing on CUNY’s Cray XE6 at the College of Staten Island - 11 March 2011 - 9AM-5PM
High Performance Computing on CUNY’s Cray XE6 at Baruch College - 16-17 Feb 2011 - CLOSED FOR REGISTRATION

To register for a seat, fill out an application at http://www.csi.cuny.edu/cunyhpc/events.html


The curriculum for a typical 2 1/2 day course in parallel programming using the Message Passing Interface Library (MPI) is provided below. The course is typically given as a workshop with hands-on exercises. It is expected that attendees know UNIX (or one of its variants) and either C or FORTRAN.


DAY 1 (Half day; 1:00 PM to 5:00 PM)

    Overview of computer architectures
        Distribution of class materials
        Serial computers
        Vector processors
        Symmetric Multi-processors
        Parallel computers
            Single Instruction Multiple Data
            Multiple Instruction Multiple Data
        Heterogeneous computing with general purpose graphical processing units

    The City University of New York High Performance Computing Initiative
        Why HPC?
        Installed systems
        Future plans

    Getting familiar with the systems
        Account set-up
        Logging on
        Running a sample job

DAY 2  (Full day; 9:00 AM to 5:00 PM)

    Introduction to MPI
        MPI point-to-point communications
        Collectives
        Blocking sends and receives
        Non-blocking sends and receives
        Testing for completion
    Hands on exercises

DAY 3  (Full day; 9:00 AM to 5:00 PM)

    MPI collectives
        Gather/scatter
        All-to-all
        Performance notes
    OpenMP
        What is OpenMP
        Compiler Directives
        Conditional Compilation
        Environmental Variables
        OpenMP Performance
    Parallel Programming Futures
    Hands on exercises

User Accounts

Applying for a HPCC Account

Only CUNY faculty, research staff, their collaborators at other universities and their public and private sector partners, and currently enrolled CUNY students (who MUST have a faculty sponsor) are allowed to use the CUNY HPCC systems. Applications for accounts are accepted at any time, but accounts expire on 30 September and must be renewed before then.

A CUNY HPCC account is required to log into the HPCC systems. Faculty, staff or students at CUNY may apply for an HPCC account by following this link: (http://www.csi.cuny.edu/cunyhpc/Accounts.html).

Please be sure to complete all parts of the application, including information on publications, funded projects, and resources required. With regard to the latter, please indicate the number of processor hours that are required for the academic year. For example, if you expect to submit 30 jobs per week, each using 16 processors and each running for 2 hours, then your requirement is 49,920 processor hours (30 jobs * 52 weeks * 16 processors * 2 hours).
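The arithmetic in this example can be reproduced with shell integer arithmetic. The following is only a minimal sketch; the function name processor_hours is a hypothetical helper introduced for illustration and is not part of any HPCC tool.

```shell
#!/bin/sh
# Hypothetical helper (not an HPCC tool): estimate yearly processor hours.
# Arguments: jobs per week, weeks per year, processors per job, hours per job.
processor_hours() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# The example from the text: 30 jobs/week, 52 weeks, 16 processors, 2 hours each.
processor_hours 30 52 16 2    # prints 49920
```

Substitute your own expected job counts and run times to obtain the figure to enter on the application.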

By applying for and obtaining an account, the user agrees to comply with the CUNY Acceptable Use Policy, the HPCC User Account and Password Policy, and to include a Citation regarding use of the CUNY HPC resources.

Acceptable Use Policy

Use of the computing resources at the HPCC is governed by the CUNY Acceptable Use Policy (AUP). The AUP is documented at

http://portal.cuny.edu/cms/id/cuny/documents/level_3_page/001171.htm and http://www.csi.cuny.edu/privacy/index.html

Citations

Users of the CUNY HPC systems must include the following citation on any publication or presentation that includes results or is based on work using CUNY HPC resources:

"This research was supported, in part, by a grant of computer time from the City University of New York 
 High Performance Computing Center under NSF Grants CNS-0855217, CNS-0958379 and ACI-1126113."

Renewal applications should include a list of publications or presentations that resulted from the use of the CUNY HPC resources, as future grants of time will be based, in part, on past research accomplishments.

Users are requested to send a copy of the publication or presentation to the Center either electronically (hpchelp@mail.csi.cuny.edu) or by mail to CUNY HPC, Building 1M-206, College of Staten Island, 2800 Victory Boulevard, Staten Island, NY 10314.

User Account and Password Policy

A user account is issued to an individual user. Accounts are not to be shared.

By default all users have access to NEPTUNE, BOB, ZEUS, KARLE and ANDY. Access to SALK is granted by request only. The default disk storage for a general account is 50GB on each system.

Users are responsible for protecting their passwords. Passwords are not to be shared.

When an account is opened, the user will receive a one-time password sent by mail to his or her university mailing address. Upon receiving the one-time password, the user should log on to the HPCC systems and change it. If the password is not changed within 30 days of issuance, it will expire.

The new password must conform to the CUNY password policy, which requires that it be at least eight (8) characters long, include at least one capitalized letter, one numerical character, and one of the following special characters:

 ! @ # $ % & * = + ) ( 
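The four rules above can be checked before running "passwd". The following is a rough sketch only: check_password is a hypothetical helper name, it is not an official CUNY or HPCC tool, and the system may enforce further rules not listed here.

```shell
#!/bin/sh
# Hypothetical pre-check (not an official CUNY or HPCC tool).
# Returns 0 only if the candidate satisfies the four rules stated in the text.
check_password() {
    pw=$1
    # Rule 1: at least eight characters long.
    [ "${#pw}" -ge 8 ] || return 1
    # Rule 2: at least one capital letter.
    printf '%s\n' "$pw" | grep -q '[[:upper:]]' || return 1
    # Rule 3: at least one numerical character.
    printf '%s\n' "$pw" | grep -q '[[:digit:]]' || return 1
    # Rule 4: at least one of the listed special characters.
    printf '%s\n' "$pw" | grep -q '[!@#$%&*=+()]' || return 1
    return 0
}

# Example with a made-up candidate; never reuse a published example as a real password.
if check_password 'Example+Pass1'; then
    echo "acceptable"
else
    echo "rejected"
fi
```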

Passwords are valid for 92 days. You will receive a notice two weeks before the end of the 92-day period requesting that you change your password. If you do not change your password, your account will be locked and the password will need to be reset.

How to change password

The command to change a password is "passwd". An example of its use follows:

[user.name@bob ~]$ passwd
Changing password for user user.name.
Changing password for user.name
(current) UNIX password: old_password
New UNIX password: new_password
Retype new UNIX password: new_password
passwd: all authentication tokens updated successfully.
[user.name@bob ~]$ 

Groups

All users belong to a group.
To locate your group(s), use the following command:

groups

To share files within a group:

1. Set the group ownership of the file

chgrp groupname filename

2. Set the group read and write permissions on the file

chmod g+r filename
chmod g+w filename
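The two steps can also be combined in a small script. The following is a sketch under stated assumptions: share_with_group is a hypothetical helper name, and the demonstration uses a scratch file and your primary group (as reported by "id -gn") so that it can be run safely.

```shell
#!/bin/sh
# Sketch: give a group read and write access to one file.
# share_with_group is a hypothetical helper name, not an HPCC command.
share_with_group() {
    group=$1
    file=$2
    # Step 1: set the group ownership of the file.
    chgrp "$group" "$file" || return 1
    # Step 2: grant group read and write permission in a single call.
    chmod g+rw "$file"
}

# Demonstration on a scratch file, using your primary group.
demo=$(mktemp)
share_with_group "$(id -gn)" "$demo"
ls -l "$demo"
rm -f "$demo"
```

To share a directory, group members also need execute (search) permission on it, for example "chmod g+x dirname".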

Logging in to HPCC

  • Logging from windows machine
  • Login from Unix
  • X11 Forwarding or Tunneling
    • UNIX clients
    • WINDOWS clients

Click here for Instructions on how to login to the CUNY HPCC systems.

Data transfers from and to HPCC systems

There are two methods of transferring data between the CUNY HPC systems and the rest of the world (for example, a user’s desktop): SCP/SFTP and Globus Online.

SCP/SFTP

The first method is using SSH-based file transfer protocols, such as the Secure Copy Protocol, or Secure FTP. Since these protocols require SSH to be available, for users outside the CSI campus, this type of transfer is limited to connections to neptune.csi.cuny.edu.

If the data must go to or come from another CUNY HPCC system such as Andy or Salk, a two-step transfer through neptune is required.

GNU/Linux and Mac users may use the scp or the sftp commands to copy files between their desktop and neptune. For Windows users the recommended client is WinSCP with the SCP protocol enabled.
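As an illustration of a two-step transfer, the sketch below prints the two scp commands instead of running them. The file name "results.tar" and the login "user.name" are placeholders, copying into the home directory on each system is an assumption, and print_two_step_copy is a hypothetical helper name.

```shell
#!/bin/sh
# Dry-run sketch: prints the two scp commands rather than running them.
# Assumptions: "user.name" and "results.tar" are placeholders, the file is copied
# into the home directory on each system, and Andy is the final destination.
print_two_step_copy() {
    file=$1
    user=$2
    echo "# Step 1, on your desktop: copy the file to neptune"
    echo "scp $file $user@neptune.csi.cuny.edu:"
    echo "# Step 2, after logging in to neptune: copy it on to Andy"
    echo "scp $file $user@andy.csi.cuny.edu:"
}

print_two_step_copy results.tar user.name
```

To bring data back, the same two commands would be run in the opposite direction, again passing through neptune.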

SCP and SFTP are appropriate for transferring small numbers of small files. A good rule of thumb is fewer than 100 files or less than 1 gigabyte in total.

For larger-scale transfers, we recommend using Globus Online.
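The rule of thumb can be applied to a directory before starting a transfer. The sketch below is only one possible reading of it: it assumes that exceeding either limit is a reason to prefer Globus Online, it takes 1 gigabyte as 1048576 kilobytes, and recommend_transfer_method is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not an HPCC tool): suggest a transfer method for a directory.
# Assumptions: the limits are 100 files and 1 gigabyte (taken here as 1048576 KB),
# and exceeding either one is treated as a reason to prefer Globus Online.
recommend_transfer_method() {
    dir=$1
    # Count regular files (file names containing newlines would be miscounted).
    nfiles=$(find "$dir" -type f | wc -l | tr -d ' ')
    # Total size in kilobytes; du -sk prints "<size><TAB><path>".
    kbytes=$(du -sk "$dir" | cut -f1)
    if [ "$nfiles" -lt 100 ] && [ "$kbytes" -lt 1048576 ]; then
        echo "scp"
    else
        echo "globus"
    fi
}

# When the script is given a directory argument, report the suggestion for it.
if [ "$#" -gt 0 ]; then
    recommend_transfer_method "$1"
fi
```

Saved as a script, it could be run as "sh check_transfer.sh mydata", where both the script name and "mydata" are placeholders.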

Globus Online

Globus Online is the second supported data transfer method. At the moment, only Andy and Salk support it.

Globus Online (GO) is a software-as-a-service approach to large file transfer providing end-users with a browser interface to initiate data transfers between ‘endpoints’ registered with the Globus Alliance. An ‘endpoint’ is one of the two file transfer locations – either the source or the destination – where data is being moved. Once a user creates a GO account, data can be moved using the browser interface between ‘endpoints’ via the "manage data" menu.

Globus Online offers a software client called Globus Connect, which enables users to move files directly and easily between their laptop or desktop computer and the CUNY HPCC. Globus Online manages the transfer, monitors its performance, recovers from faults automatically where possible, and reports the status.

To get started with Globus Online, please read the following guide: https://www.globusonline.org/quickstart

Then, please sign up for a Globus Online account: https://www.globusonline.org/SignUp

To set up your laptop or desktop computer to use Globus Online for transferring, install Globus Connect (the client) as described in the following document: https://www.globusonline.org/globus_connect

The above link gives instructions on how to download and install the client on your local system, and lets you define your system as an endpoint. During the installation, you will be asked to give a name to the endpoint that represents your system, for example, nikos#mydesktop. There is also a CUNY HPC Center overview here [2].

To initiate a file transfer, start the Globus Connect application, sign in with your Globus Online account and select ‘Start Transfer’. The Globus Online browser interface will appear, and your local system should now show up as an endpoint that can be used for transferring files.

CUNY HPCC has an endpoint registered in the Globus Online ecosystem, called "cunyhpc#cea", which gives you access to Salk's and Andy's filesystems. You can now set up a transfer between your own endpoint and cunyhpc#cea. You will be prompted for a username and a password to access cunyhpc#cea. Use your CUNY HPCC login and password to connect. Your Salk files will be under /cunyhpc/salk/home/{your CUNY HPCC username}. Similarly, files from Andy are accessible under /cunyhpc/andy/home/{your CUNY HPCC username}.

Basic Unix/Linux Commands

  • UNIX Tutorial
  • vi Usage
  • Starting vi
  • Moving the cursor
  • Delete, Undo
  • Input/Editing
  • tar and gzip/bzip2
    • tar
    • gzip
    • bzip2


Basic Linux and Unix commands can be found here.

HPC Center Discouraged and Forbidden Practices

In order to provide secure and efficient operation of the HPC Center resources for all users, certain practices are discouraged and/or forbidden on HPC Center systems.

Allowing Someone Else to Login and Use Your Account:

The CUNY IT security policy forbids this. Users are assumed to be responsible for all usage and activity on accounts in their name. Providing access to another party undermines this assumption and puts the account owner at risk for the unsupervised actions of that party. Users violating this policy may lose their accounts and/or be denied access to their files.

Running Long Compute, Memory, and/or IO Intensive Processes on System Login Nodes:

The Login Node resources are not intended to be used for computation beyond that required to compile, link, and organize work for batch job submission to the compute nodes. It is the Compute Nodes where computationally intensive work is expected to be run. There may be occasions when a user anticipates that a required activity on the Login Node will consume more than a typical small fraction of its resources (a large file transfer for instance). HPC Center staff should be informed through 'hpchelp@csi.cuny.edu' in advance of such activity. In general, users running processes that consume large fractions of the Login Node's compute or other resources will have those processes killed. Repeat offenders may have their accounts closed temporarily or even permanently.

Running Monitoring Scripts or Processes in Tight Loops that Fill Login Node Logfiles:

While it is natural for users to wish to keep track of their jobs and the availability of resources on HPC Center systems, doing so by running iterative processes or scripts that can potentially fill up system log files is forbidden. An example of this would be running 'watch' along with 'qstat' in a tight one second loop for an extended period of time. This fills up log file space in system directories. Such processes will be killed.
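A gentler alternative is to check at a long interval and stop after a fixed number of checks. The following is a minimal sketch, not a Center-endorsed tool: poll_every is a hypothetical helper name, and the five-minute interval in the commented example is an assumption, since no acceptable interval is stated here.

```shell
#!/bin/sh
# Sketch of bounded, low-frequency polling (hypothetical helper, not an HPCC tool).
# Usage: poll_every SECONDS TIMES COMMAND [ARGS...]
poll_every() {
    interval=$1
    times=$2
    shift 2
    i=0
    while [ "$i" -lt "$times" ]; do
        "$@"                          # run the monitoring command once
        i=$((i + 1))
        if [ "$i" -lt "$times" ]; then
            sleep "$interval"         # wait before the next check
        fi
    done
}

# Harmless demonstration: run "date" twice, one second apart.
poll_every 1 2 date

# On a PBS system one might instead check the queue every 5 minutes for an hour
# (the interval is an assumption; it is commented out so the sketch runs anywhere):
# poll_every 300 12 qstat -u "$USER"
```

Because the loop ends after a fixed number of checks, a forgotten session cannot keep polling indefinitely.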

Leaving Your Application or Executable's Unix Output and Error Files Undefined in a PBS Script:

While programs that produce small output files are not a problem, one cannot anticipate what might be printed to the Unix standard error file in the event of a problem with your application. When left undefined in a PBS script, Unix standard output and error are written to the smallish system partition where PBS data is stored. When this partition becomes 100% full, PBS stops responding to new job submissions. Please always include Unix redirection at the end of the executable line(s) in your PBS scripts, similar to the following:

.
.
. PBS script 
.
.
mpirun -np 4 -machinefile $PBS_NODEFILE mbbest ./bglobin.nex  > best_mpi.out 2>&1
.
.

Here the string " > best_mpi.out 2>&1 " redirects Unix standard output (file descriptor 1) to the file best_mpi.out and then sends Unix standard error (file descriptor 2) to the same file. The combined output is then written into the working directory for your job rather than the PBS data directory. The working directory will typically have much more disk space.
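The effect of this redirection can be tried outside PBS. In the sketch below, the function noisy is a hypothetical stand-in for an application that writes to both streams, and a scratch file takes the place of best_mpi.out.

```shell
#!/bin/sh
# Stand-in for an application that writes to both standard output and standard error.
noisy() {
    echo "result line"
    echo "warning line" >&2
}

out=$(mktemp)

# Same form as in the PBS script: stdout goes to the file, then stderr joins it.
noisy > "$out" 2>&1

# Both lines are now in the file.
cat "$out"
rm -f "$out"
```

The order matters: writing "2>&1 > file" instead would point standard error at the original standard output before the file redirection takes effect, so error messages would not be captured in the file.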

Modules, Managing Your CUNY HPC Center Environment

Modules is a software package that provides for the fast and convenient management of the components of a user's environment via modulefiles. When executed by the module command, each modulefile fully configures the environment for its associated application or application group. The modules configuration language also allows for the management of application environment conflicts and dependencies. The modules software allows users to load (and unload and reload) an application and/or system environment that is specific to their needs, and avoids the need to set and manage a large, one-size-fits-all, generic environment for everyone at login.

Modules has been the default approach to managing the user applications environment on SALK, the CUNY HPC Center Cray, since its installation in 2011. By the end of 2012, all non-legacy and future systems at the CUNY HPC Center will use modules to manage the user environment instead of generic environmental initialization files stored in /etc/profile.d. The only system that will need to transition from this older approach to the all-modules approach will be ANDY. All new systems, such as Penzias and the new SGI UV2, will come up as modules-based when they are ready for production use. The legacy systems, BOB and ZEUS, currently used almost entirely for Gaussian jobs will NOT be reconfigured with the modules software. Module version 3.2.6 is installed on SALK, and version 3.2.9 will be the default on the other HPC Center systems.

  • Modules, Learning by Example
    • Example 1, Basic Non-Cray System
    • Example 2, Less Basic From SALK (Cray System)

For information on Modules and Managing Your CUNY HPC Center Environment click here.

PBS Pro 11.0, Job Submission, and Queues

  • PBS Pro Design and the Cluster Paradigm
  • PBS Pro Job Submission Modes
  • Running Batch Jobs with PBS Pro
    • Submitting Serial (Scalar) Jobs
    • Submitting OpenMP Symmetric Multiprocessing (SMP) Parallel Jobs
    • Submitting MPI Distributed Memory Parallel Jobs
    • Submitting GPU-Accelerated Data Parallel Jobs
    • Submitting 'Interactive' Batch Jobs
    • More on PBS Pro resource 'chunks' and the '-l select' Option
  • CUNY HPC Center PBS Pro Queue Structure
    • CUNY HPC Center PBS Pro Routing Queues
    • CUNY HPC Center PBS Pro Execution Queues

For details and instructions on PBS Pro 11.0, Job Submission, and Queues, including the items listed above, click here.

User Applications Currently Supported

  • ADCIRC
  • ADF (Amsterdam Density Functional Theory)
  • AMBER (Assisted Model Building with Energy Refinement)
  • AUTODOCK
  • BAMOVA
  • BAYESCAN
  • BEAST
  • BEST
  • BOWTIE2
  • BPP2
  • BROWNIE
  • CONSED
  • CP2K
  • CUFFLINKS
  • DL_POLY
  • GARLI
  • GAUSS
  • Gaussian09 and Gaussian03
  • Gnuplot
  • GENOMEPOP2
  • GROMACS
  • HOOMD
  • IMa2
  • HONDOPLUS
  • LAMARC
  • LAMMPS
  • LS-DYNA
  • MATHEMATICA
  • MATLAB
  • Migrate
  • MRBAYES
  • msABC
  • MSMS
  • NAMD
  • Network Simulator 2 (NS2)
  • NWChem
  • Octopus
  • POPABC
  • PHOENICS
  • PHRAP-PHRED
  • Python
  • R
  • RAXML
  • SAGE
  • SAMTOOLS
  • SAS
  • Stata/MP
  • Structurama
  • Structure
  • TOPHAT
  • Thrust Library (CUDA)
  • VMD
  • WRF
  • MET (Model Evaluation Tools)

To view detailed information on each of these applications, Click Here.

Some Applications In Depth

Mathematica

  • General notes
    • Modes of Operation in Mathematica
    • Selecting Between GUI and Command-Line Mode
    • A Note on Fonts on Unix and Linux Systems
  • Using Mathematica on KARLE
    • Serial Job Example
    • Parallel Job Example
  • Submitting Batch Jobs to the CUNY ANDY Cluster
    • Serial Batch Jobs Run with 'qsub' Using a Mathematica Command (Text) File
    • SMP-Parallel Batch Jobs Run with 'qsub' Using a Mathematica Command (Text) File
    • Submitting Batch Jobs from Remote Locations to CUNY's ANDY Cluster

For details and instructions on Mathematica and the items listed above, click here.

MATLAB

The MATLAB high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include:

Math and computation
Algorithm development
Data acquisition
Modeling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including graphical user interface building


  • Starting MATLAB in GUI or CLI Mode on KARLE
  • Modes of Operation: Local versus Remote (Batch)
  • Modes of Operation: Serial versus Parallel
    • Computing PI Serially on KARLE
    • Computing PI Using Loop-Local Parallelism on KARLE
    • Computing PI Using SPMD Parallelism on KARLE
  • Running Remote Parallel Jobs on BOB or ANDY
  • Licensing requirements for client-to-cluster job submission
  • Setting up the client and cluster environment for remote execution
    • Computing PI Serially Remotely on ANDY or BOB
    • Computing PI In Parallel Remotely on ANDY or BOB
    • Computing Remotely on ANDY Using MATLAB's GPU capability
    • Other Examples of MATLAB CPU Parallel Job Submission
    • An Outline of the Major Steps Involved in Remote Job Submission


For details and instructions on MATLAB and the items listed above, click here.


Contact

If you have any questions about the CUNY HPCC, please contact the CUNY HPCC Helpline at HPCHelp@mail.csi.cuny.edu

Other Links
