Introduction to the City University of New York High Performance Computing Center
The City University of New York (CUNY) High Performance Computing Center (HPCC) is located on the campus of the College of Staten Island, 2800 Victory Boulevard, Staten Island, New York 10314. HPCC goals are to:
- Support the scientific computing needs of CUNY faculty, students, and research staff, along with their collaborators at other universities and their public and private sector partners;
- Create opportunities for the CUNY research community to develop new partnerships with the government and private sectors; and
- Leverage the HPC Center's capabilities to acquire additional research resources for its faculty and graduate students in existing and major new programs.
Please send comments on or corrections to the wiki to firstname.lastname@example.org
The HPCC currently operates seven significant systems. The following table summarizes the characteristics of these systems; additional information is provided below the table.
Andy. Andy (andy.csi.cuny.edu) is named in honor of Dr. Andrew S. Grove, an alumnus of the City College of New York and one of the founders of the Intel Corporation (http://educationupdate.com/archives/2005/Dec/html/col-ccnypres.htm). Andy is composed of two distinct computational halves serviced by a single head node and several service nodes. The first and older half (Andy1) is an SGI ICE system (http://www.sgi.com/products/servers/altix/ice/) with 45 dual-socket compute nodes, each with two 2.93 GHz quad-core Intel Core i7 (Nehalem) processors, providing a total of 360 compute cores. Each compute node has 24 Gbytes of memory, or 3 Gbytes of memory per core. Andy1's interconnect is a dual-rail DDR InfiniBand (20 Gbit/second) network in which one rail is used to access Andy's Lustre storage system and the other is used for inter-processor communication. The second and newer half (Andy2) is a cluster of 48 SGI x340 1U compute nodes (each configured similarly to those on Andy1, giving it 384 cores). Andy2's interconnect is a single-rail QDR InfiniBand (40 Gbit/second) network serving both its communication traffic and its Lustre storage system. Both Andy1 and Andy2 (360 + 384 = 744 cores) are served by the same head node and home directory, which resides on a Lustre parallel file system with 24 Tbytes of usable storage.
Bob. Bob (bob.csi.cuny.edu) is named in honor of Dr. Robert E. Kahn, an alumnus of the City College of New York who, along with Vinton G. Cerf, invented the TCP/IP protocol, the technology used to transmit information over the modern Internet (http://www.economicexpert.com/a/Robert:E:Kahn.htm). Bob is a Dell PowerEdge system consisting of one head node with two sockets of AMD Shanghai native quad-core processors running at 2.3 GHz, and twenty-nine compute nodes of the same type, providing a total of 30 x 8 = 240 cores. Each compute node has 16 Gbytes of memory, or 2 Gbytes of memory per core. Bob has both a standard 1 Gbit Ethernet interconnect and a low-latency SDR InfiniBand (10 Gbit/second) interconnect. Bob is currently largely dedicated to running the Gaussian suite of computational chemistry programs.
Chizen. Chizen (chizen.csi.cuny.edu) functions as a generic gateway or interface system for users not on the csi.cuny.edu local area network. Chizen can be addressed using the secure shell command ssh (ssh [-X] chizen.csi.cuny.edu). Chizen is only used as a secure jumping-off point to access other HPCC systems. HPC workloads should NOT be run on Chizen, which has limited memory and compute power. Work found running on Chizen will be killed. This also applies in general to the head nodes of all the CUNY systems.
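For users outside the local network, a session through the gateway might look like the following (a sketch; the username is illustrative, and the -J jump-host option requires OpenSSH 7.3 or later):

```shell
# Log in to the gateway (add -X only if you need X11 forwarding):
ssh -X user.name@chizen.csi.cuny.edu

# Or, in one step, use Chizen as a jump host to reach an internal system:
ssh -J user.name@chizen.csi.cuny.edu user.name@andy.csi.cuny.edu
```
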
Karle. Karle (karle.csi.cuny.edu) is named in honor of Dr. Jerome Karle, an alumnus of the City College of New York who was awarded the Nobel Prize in Chemistry in 1985, jointly with Herbert A. Hauptman, for the direct analysis of crystal structures using X-ray scattering techniques. Karle functions both as a gateway and as an interface system to run MATLAB, SAS, MATHEMATICA, and other GUI-oriented applications for CUNY users both within and outside the local area network at the College of Staten Island, where the CUNY HPC Center is located. Karle can be used to run such computations (in serial or parallel) locally and directly on Karle, or to submit batch work over the network to the clusters Bob and Andy described above. As a single four-socket node with 4 x 6 = 24 cores, Karle is a highly capable head-like system. Karle's 24 Intel E740-based cores run at 2.4 GHz. Karle has a total of 96 Gbytes of memory, or 4 Gbytes per core. Account allocation on Karle will be limited to those requiring access to the GUI-oriented applications it is intended to run.
Penzias. Penzias is named after Arno Penzias, a CUNY alumnus and Nobel Laureate in Physics. Penzias is a Dell R720 system consisting of dual head nodes and 72 compute nodes. Each compute node has two sockets of Intel E5-2660 2.2 GHz chips, for 16 cores per node and a total of 1152 cores available for user computations. Each core has 4 Gbytes of memory (64 Gbytes per 16-core node). The interconnect network is FDR InfiniBand. The system also has 144 NVIDIA Kepler K20 GPUs.
Salk. Salk (salk.csi.cuny.edu) is named in honor of Dr. Jonas Salk, also an alumnus of the City College of New York and creator of the first vaccine for polio (http://en.wikipedia.org/wiki/Jonas_Salk#College). Salk is a two-cabinet Cray XE6m system built around Cray's latest custom high-speed Gemini interconnect. Salk consists of 176 dual-socket compute nodes, each containing two 8-core AMD Magny-Cours processors running at 2.3 GHz, for a total of 16 cores per node. This gives the system a total of 2816 cores for the production processing of CUNY's HPC applications. Each node has a total of 32 Gbytes of memory, or 2 Gbytes of memory per core. Salk's Gemini interconnect is a high-bandwidth, low-latency, high-message-injection-rate interconnect supported by a custom ASIC and low-level communications protocol developed by Cray. Unlike the other clusters at the CUNY HPC Center, which are connected in a multi-tiered switch topology, the Cray XE6m nodes supported by Gemini are laid out in a 2D torus network. Salk is intended to run jobs of a larger scale than the other CUNY HPC Center systems: jobs smaller than 16 cores are not allowed on SALK, while jobs of 768 cores and larger are. In addition, SALK, through its Gemini interconnect and compilers, supports the Partitioned Global Address Space languages CoArray Fortran and Unified Parallel C. These languages make programming large, distributed-memory parallel systems easier and more scalable.
In addition to the above, the HPC Center is installing a new, centralized Storage System and Network. The Storage System will provide an order of magnitude more on-line storage capacity for home directories and project space, directly accessible (although not directly controlled) from any of the HPC Center's installed systems, and will include a large, remote tape archival facility.
The remote tape silo will allow for daily incremental backups, full weekly and monthly backups, and long-term retention of critical research data. An iRODS server will be integrated into the environment and will provide a mechanism for the user community to share data.
- The acquisition of the Storage Network is allowing us to transform the environment from "server centric" to "data centric".
- At the present time, each system has its own file system for scratch, home directories, and project files.
- The Storage Network will support home directories and project files. This benefits the user in that all files are now in one place accessible from any system. In addition, old servers can be retired and new servers installed without impacting user data.
- Local system disk will be used only for scratch space.
- Offsite storage will be provided for home directories and project files.
- A data transfer node will provide for interconnectivity to instrumentation connected to science DMZs.
- An iRODS server will be provided to support the management of research data.
The HPC Center works to maintain a certain amount of uniformity in its software stack, especially at the user and application level. In general, we have standardized on OpenMPI as our MPI implementation, although vendor versions from Cray and SGI are available (on SALK the Cray version of MPI is the default). While we support the Intel, PGI, and GNU compilers, we have made the Intel compiler suite the default on all systems except SALK. Moving down the stack to the operating systems, we are a Linux shop, although there is some variation in the flavor of Linux supported on each system, dictated by the vendor. On PENZIAS and BOB, which are Commodity Off-The-Shelf (COTS) clusters from Dell, we support CentOS as part of the Rocks release. The operating system running on ANDY is SLES 11 updated with the SGI ProPack SP1 support package. The operating system on SALK, the Cray Linux Environment 3.1 (CLE 3.1), is based on SLES 11. The queuing system in use on all CUNY HPC Center systems is PBS Pro 11, with a queue design that is as identical as possible across the systems. The user application software stack supported on all systems includes the following compilers and parallel library software; much more detail on each can be found below.
- GNU C, C++ and Fortran compilers;
- Portland Group, Inc. optimizing C, C++, and Fortran compilers with CUDA and GPU support;
- The Intel Cluster Studio, including the Intel C, C++ and Fortran compilers and the Math Kernel Library;
- OpenMPI 1.5.5 (Cray's custom MPICH on SALK, SGI's proprietary MPT on ANDY, and Intel's MPI are also available)
SALK, the Cray XE6m system, uses its own proprietary MPI library based on the API to its Gemini interconnect. Cray also provides its own C, C++, and Fortran compilers, which support the Partitioned Global Address Space parallel programming models Unified Parallel C (UPC) and CoArray Fortran (CAF), respectively.
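As a quick illustration of the default stack, an MPI program can typically be built and launched with the OpenMPI compiler wrappers (a sketch; wrapper names and flags can vary by system, the source filenames are illustrative, and SALK uses Cray's own wrappers and launcher instead):

```shell
# Compile with the OpenMPI wrapper scripts, which invoke the default compilers:
mpicc  -O2 -o hello_c hello.c      # C source
mpif90 -O2 -o hello_f hello.f90    # Fortran source

# Launch 8 MPI ranks (in production this line normally goes in a PBS job script):
mpirun -np 8 ./hello_c
```
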
Hours of Operation
The second and fourth Tuesday mornings of the month, from 8:00 AM to 12:00 PM, are normally reserved (but not always used) for scheduled maintenance. Please plan accordingly. Unplanned maintenance to remedy system-related problems may be scheduled as needed. Reasonable attempts will be made to inform users running on those systems when these needs arise.
Users are encouraged to read this Wiki carefully. In particular, the sections on compiling and running parallel programs, and the section on the PBS Pro batch queueing system will give you the essential knowledge needed to use the CUNY HPC Center systems. We have strived to maintain the most uniform user applications environment possible across the Center's systems to ease the transfer of applications and run scripts among them. Still, there are some differences, particularly with the SGI (ANDY) and Cray (SALK) systems.
The CUNY HPC Center staff, along with outside vendors, offer regular courses and workshops to the CUNY community in parallel programming techniques, HPC computing architecture, and the essentials of using our systems. Please follow our mailings on the subject and feel free to inquire about such courses. We regularly schedule training visits and classes at the various CUNY campuses. Please let us know if such a training visit is of interest. In the past, topics have included an overview of parallel programming, GPU programming and architecture, using the evolutionary biology software at the HPC Center, the PBS queueing system at the CUNY HPC Center, mixed GPU-MPI and OpenMP programming, etc. Staff have also presented guest lectures at formal classes throughout the CUNY campuses.
Users with further questions or requiring immediate assistance in use of the systems should send an email to:
Mail to this address is received by the entire CUNY HPC Center support staff. This ensures that the person on staff with the most appropriate skill set and job related responsibility will respond to your questions. During the business week you should expect a same-day response. During the weekend you may or may not get same-day response depending on what staff are reading email that weekend. Please send all technical and administrative questions (including replies) to this address.
Please do not send questions to individual CUNY HPC Center staff members directly. Send questions to: email@example.com
Such messages will be returned to the sender with a polite request to resend them to 'hpchelp'. This applies to replies as well as to initial questions.
The CUNY HPC Center staff are focused on providing high quality support to its user community, but compared to other HPC Centers of similar size our staff is lean. Please make full use of the tools that we have provided (especially the Wiki), and feel free to offer suggestions for improved service. We hope and expect your experience in using our systems will be predictably good and productive.
Data storage, retention/deletion, and back-ups
Each user account, upon creation, is provided a home directory (currently on each system) with a default 50 GB storage ceiling, or disk quota. A user may request an increase in the size of their home directory if there is a special need. The HPC Center will endeavor to satisfy reasonable requests, but storage is not unlimited, and full file systems (especially ones holding large files) make backing up the system more difficult. Please regularly remove unwanted files and directories to minimize this burden, and avoid keeping duplicate copies in multiple locations. File transfer among the HPC Center systems is very fast. Furthermore, occasionally HPC Center users have assumed that HPC Center disks could be used to 'park' or archive data that was generated locally at their own sites. This practice is strictly forbidden.
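Before requesting a quota increase, it is often worth checking what is actually using the space. A quick check with standard UNIX tools might look like this (a generic sketch; the paths and the 1 GB threshold are illustrative):

```shell
# Show the ten largest items at the top level of your home directory,
# biggest last (sizes are human-readable):
du -sh "$HOME"/* 2>/dev/null | sort -h | tail -n 10

# Find individual files larger than 1 GB anywhere under your home directory:
find "$HOME" -type f -size +1G -exec ls -lh {} \; 2>/dev/null
```
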
By the end of March 2014, the HPC Center will have completed upgrading its storage system and network architecture. This will create a central home directory storage location for all systems, over 1 PByte in size, with tape backup, plus high-speed local scratch space on each system. Look for these changes here and in HPC Center mailings.
An incremental backup of user home directories on Andy, Salk, Karle, Bob is performed daily. These backups are retained for three weeks. Full backups are performed weekly and are retained for two months. These backups are stored in a remote location. A full backup is read off tape, bi-monthly, and verified (to ensure backups are readable and restorable).
The following user and system files are backed up:
Retention/Deletion of Home Directories
For active accounts, current Home Directories are retained indefinitely. If a user account is inactive for one year, the HPCC will attempt to contact the user and request that the data be removed from the system. If there is no response from the user within three months of the initial notice, or if the user cannot be reached, the Home Directory will be purged.
System temporary/scratch directories
Files in system temporary and scratch directories, as well as home directories on Neptune, are not backed up. There is no provision for retaining data stored in these directories.
Data storage infrastructure
CUNY HPC Center provides a 3-level data storage infrastructure:
- HOME filesystem
- SCRATCH filesystems
- SR1 (long-term storage resource)
HOME and SR1 are shared filesystems. This means that they are accessible from any of the HPCC's machines. SCRATCH filesystems are local -- files stored on one machine are not visible from another one (for example, files stored in Andy's scratch are not accessible from Penzias and vice versa).
By default, users have access to HOME and SCRATCH. Allocation on SR1 is granted upon request.
- HOME is limited to 50GB of available space. Users needing more storage should request an SR1 allocation. HOME is persistent (files are not deleted by HPCC's staff) and backed up. NOTE: files stored in HOME are not visible on the compute nodes.
- SCRATCH has no quota. SCRATCH is used to run computational jobs that write large intermediate temporary files. SCRATCH is not persistent (files can be deleted by HPCC's staff) and is not backed up.
- SR1 is a long-term storage resource designed for users who need to store datasets larger than 50GB. Space in SR1 is allocated upon user request. SR1 is persistent (files are not deleted by HPCC's staff) and backed up. Access to SR1 is available via the iRODS interface. NOTE: files stored in SR1 are not visible on the compute nodes.
Typical workflows are described below:
1. Copy files from HOME or from SR1 to SCRATCH.
If working with HOME:
cd /scratch/user.name
mkdir myPBS_Job && cd myPBS_Job
cp /home/user.name/myProject/a.out ./
cp /home/user.name/myProject/mydatafile ./
If working with SR1:
cd /scratch/user.name
mkdir myPBS_Job && cd myPBS_Job
iget myProject/a.out
iget myProject/mydatafile
2. Prepare a PBS job script. A typical PBS script is similar to the following:
#!/bin/bash
#PBS -q production
#PBS -N test
#PBS -l select=8:ncpus=1
#PBS -l place=free
#PBS -V
echo "Starting..."
cd $PBS_O_WORKDIR
mpirun -np 4 ./a.out ./mydatafile > myoutputs
echo "Done..."
Your PBS script may differ depending on your needs. See the section Submitting Jobs for reference.
3. Run the job
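Submission and monitoring are done with the standard PBS commands (a sketch; the script filename and the job id shown are illustrative):

```shell
qsub myPBS_Job.sh    # submit; PBS prints a job id such as 12345.service0
qstat -u $USER       # list your queued and running jobs
qstat -f 12345       # show full status for a specific job
qdel 12345           # remove a job from the queue if needed
```
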
4. Once the job is finished, clean up SCRATCH and store outputs in HOME or SR1.
If working with HOME:
mv ./myoutputs /home/user.name/myProject/.
cd ../
rm -rf myPBS_Job
If working with SR1:
iput ./myoutputs myProject/.
cd ../
rm -rf myPBS_Job
5. If output files are stored in SR1 tag them with metadata.
imeta addw -d myoutput zvalue 15 meters
imeta addw -d myoutput colorLabel RED
iRODS is the integrated Rule-Oriented Data-management System, a community-driven, open-source, data-grid software solution. iRODS is designed to abstract data services from data storage hardware and provide users with a hardware-agnostic way to manipulate data.
iRODS is the primary tool used by CUNY HPCC users to seamlessly access the 1PB storage resource (referred to as 'SR1' below) from any of the HPCC's computational systems.
Access to SR1 is provided via so-called i-commands:
A comprehensive list of i-commands with detailed descriptions can be found on the iRODS wiki.
To obtain quick help on any of the commands while logged into any of the HPCC's machines, type 'i-command -h'. For example:
The following is a list of some of the most relevant i-commands:
iinit -- Initialize session and store your password in a scrambled form for automatic use by other icommands.
iput -- Store a file
iget -- Get a file
imkdir -- Like mkdir, make an iRODS collection (similar to a directory or Windows folder)
ichmod -- Like chmod, allow (or later restrict) access to your data objects by other users.
icp -- Like cp or rcp, copy an iRODS data object
irm -- Like rm, remove an iRODS data object
ils -- Like ls, list iRODS data objects (files) and collections (directories)
ipwd -- Like pwd, print the iRODS current working directory
icd -- Like cd, change the iRODS current working directory
ichksum -- Checksum one or more data-object or collection from iRODS space.
imv -- Moves/renames an irods data-object or collection.
irmtrash -- Remove one or more data-objects or collections from the iRODS trash bin.
imeta -- Add, remove, list, or query user-defined Attribute-Value-Unit triplets metadata
iquest -- Query (pose a question to) the ICAT, via a SQL-like interface
Before using any of the i-commands, users need to identify themselves to the iRODS server by running the 'iinit' command and providing their HPCC password.
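A first session typically looks like this (iinit prompts for the password interactively and caches it in scrambled form for subsequent i-commands):

```shell
iinit    # prompts for your HPCC password once; later i-commands reuse it
ils      # verify the connection by listing your home collection
```
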
A typical workflow involving files stored in SR1 includes storing/getting data to and from SR1, tagging data with metadata, searching for data, and sharing (setting permissions).
Storing data to SR1
1. Create iRODS directory (aka 'collection'):
# imkdir myProject
2. Store all files 'myfile*' into this directory (collection):
# iput -r myfile* myProject/.
3. Verify that files are stored:
# ils
/cunyZone/home/user.name:
  C- /cunyZone/home/user.name/myProject
# ils myProject
/cunyZone/home/user.name/myProject:
  myfile1
  myfile2
  myfile3
The symbol 'C-' at the beginning of a line in the 'ils' output indicates that the listed item is a collection.
4. By combining 'ils', 'imkdir', 'iput', 'icp', 'ipwd', and 'imv', a user can create iRODS directories and store files in them, similarly to what is normally done with the UNIX commands 'ls', 'mkdir', 'cp', 'pwd', 'mv', etc.
Getting data from SR1
1. To copy a file from SR1 to the current working directory, run:
# iget myProject/myfile1
2. Listing the current working directory should now reveal 'myfile1':
# ls
myfile1
3. Instead of individual files, a whole directory (with its sub-directories) can be copied with the '-r' flag (which stands for 'recursive'):
# iget -r myProject
NOTE: wildcards are not supported, so something like "iget myProject/myfile*" will not work.
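One possible workaround is to expand the names on the client side with a shell loop (a sketch; it assumes the 'ils' output format shown above, with the collection header on the first line and one object name per subsequent line):

```shell
# Fetch every object in 'myProject' whose name starts with 'myfile':
ils myProject | tail -n +2 | while read -r f; do
    case "$f" in
        myfile*) iget "myProject/$f" ;;
    esac
done
```
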
Tagging data with metadata
iRODS provides users with an extremely powerful mechanism for managing data with metadata. When working with large datasets, it is sometimes easy to forget what is stored in one file or another. Metadata tags help organize data in an easy and reliable manner.
Let's tag the files from the previous example with some metadata:
# imeta add -d myProject/myfile1 zvalue 15 meters
AVU added to 1 data-objects
# imeta add -d myProject/myfile1 colorLabel RED
AVU added to 1 data-objects
# imeta add -d myProject/myfile1 comment "This is file number 1"
AVU added to 1 data-objects
# imeta add -d myProject/myfile2 zvalue 10 meters
AVU added to 1 data-objects
# imeta add -d myProject/myfile2 colorLabel RED
AVU added to 1 data-objects
# imeta add -d myProject/myfile2 comment "This is file number 2"
AVU added to 1 data-objects
# imeta add -d myProject/myfile3 zvalue 15 meters
AVU added to 1 data-objects
# imeta add -d myProject/myfile3 colorLabel BLUE
AVU added to 1 data-objects
# imeta add -d myProject/myfile3 comment "This is file number 3"
AVU added to 1 data-objects
Here we've tagged 'myfile1' with 3 metadata labels:
- zvalue 15 meters
- colorLabel RED
- comment "This is file number 1"
Similar tags were added to 'myfile2' and 'myfile3'
Metadata come in the form of AVU triplets: Attribute|Value|Unit. As seen in the above examples, the Unit field is optional.
Let's list all metadata assigned to the file 'myfile1':
# imeta ls -d myProject/myfile1
AVUs defined for dataObj myProject/myfile1:
attribute: zvalue
value: 15
units: meters
----
attribute: colorLabel
value: RED
units:
----
attribute: comment
value: This is file number 1
units:
To remove an AVU assigned to a file run:
# imeta rm -d myProject/myfile1 zvalue 15 meters
# imeta ls -d myProject/myfile1
AVUs defined for dataObj myProject/myfile1:
attribute: colorLabel
value: RED
units:
----
attribute: comment
value: This is file number 1
units:
# imeta add -d myProject/myfile1 zvalue 15 meters
Metadata may be assigned to directories as well:
# imeta add -C myProject simulationsPool 1
# imeta ls -C myProject
AVUs defined for collection myProject:
attribute: simulationsPool
value: 1
units:
Note the '-C' flag that is used instead of '-d' when working with collections.
Searching for data
The power of metadata becomes obvious when data needs to be found in large collections. Here is an illustration of how easily this task is done with iRODS via 'imeta' queries:
# imeta qu -d zvalue = 15
collection: /cunyZone/home/user.name/myProject
dataObj: myfile1
----
collection: /cunyZone/home/user.name/myProject
dataObj: myfile3
We see both files that were tagged with the label 'zvalue 15 meters'. Here is a different query:
# imeta qu -d colorLabel = RED
collection: /cunyZone/home/user.name/myProject
dataObj: myfile1
----
collection: /cunyZone/home/user.name/myProject
dataObj: myfile2
Another powerful mechanism for querying data is provided by 'iquest'. The following examples show some of its capabilities:
iquest "SELECT DATA_NAME, DATA_SIZE WHERE DATA_RESC_NAME like 'cuny%'"
iquest "For %-12.12s size is %s" "SELECT DATA_NAME , DATA_SIZE WHERE COLL_NAME = '/cunyZone/home/user.name'"
iquest "SELECT COLL_NAME WHERE COLL_NAME like '/cunyZone/home/%' AND USER_NAME like 'user.name'"
iquest "User %-6.6s has %-5.5s access to file %s" "SELECT USER_NAME, DATA_ACCESS_NAME, DATA_NAME WHERE COLL_NAME = '/cunyZone/home/user.name'"
iquest " %-5.5s access has been given to user %-6.6s for the file %s" "SELECT DATA_ACCESS_NAME, USER_NAME, DATA_NAME WHERE COLL_NAME = '/cunyZone/home/user.name'"
iquest no-distinct "select META_DATA_ATTR_NAME"
iquest "select COLL_NAME, DATA_NAME WHERE DATA_NAME like 'myfile%'"
iquest "User %-9.9s uses %14.14s bytes in %8.8s files in '%s'" "SELECT USER_NAME, sum(DATA_SIZE),count(DATA_NAME),RESC_NAME"
iquest "select sum(DATA_SIZE) where COLL_NAME = '/cunyZone/home/user.name'"
iquest "select sum(DATA_SIZE) where COLL_NAME like '/cunyZone/home/user.name%'"
iquest "select sum(DATA_SIZE), RESC_NAME where COLL_NAME like '/cunyZone/home/user.name%'"
iquest "select order_desc(DATA_ID) where COLL_NAME like '/cunyZone/home/user.name%'"
iquest "select count(DATA_ID) where COLL_NAME like '/cunyZone/home/user.name%'"
iquest "select RESC_NAME where RESC_CLASS_NAME IN ('bundle','archive')"
iquest "select DATA_NAME,DATA_SIZE where DATA_SIZE BETWEEN '100000' '100200'"
Access to the data can be controlled via the 'ichmod' command. Its behavior is similar to the UNIX 'chmod' command. For example, if there is a need to give user 'user.name1' read access to the file 'myProject/myfile1', execute the following command:
ichmod read user.name1 myProject/myfile1
To see who has access to a file or directory, use:
# ils -A myProject/myfile1
/cunyZone/home/user.name/myProject/myfile1
        ACL - user.name1#cunyZone:read object user.name#cunyZone:own
In the above example, user 'user.name1' has read access to the file, and user 'user.name' is the owner of the file.
Possible levels of access to a data object are null/read/write/own.
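For example, access granted earlier can be revoked by setting it back to 'null' (user and file names are illustrative):

```shell
ichmod read user.name1 myProject/myfile1   # grant read access
ils -A myProject/myfile1                   # inspect the resulting ACL
ichmod null user.name1 myProject/myfile1   # 'null' revokes the access
```
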