Introduction to the City University of New York High Performance Computing Center
The City University of New York (CUNY) High Performance Computing Center (HPCC) is located on the campus of the College of Staten Island, 2800 Victory Boulevard, Staten Island, New York 10314. HPCC goals are to:
- Support the scientific computing needs of CUNY faculty, their collaborators at other universities, their public and private sector partners, and CUNY students and research staff;
- Create opportunities for the CUNY research community to develop new partnerships with the government and private sectors; and
- Leverage the HPC Center capabilities to acquire additional research resources for its faculty and graduate students in existing and major new programs.
Please send comments on or corrections to the wiki to firstname.lastname@example.org
The HPC Center works to maintain a degree of uniformity in its software stack, especially at the user and application level. In general, we have standardized on OpenMPI as our MPI implementation, although vendor versions from Cray and SGI are available (on SALK, the Cray version of MPI is the default). While we support the Intel, PGI, and GNU compilers, we have made the Intel compiler suite the default on all systems except SALK. Moving down the stack to the operating systems, we are a Linux shop, although the flavor of Linux supported on each system varies somewhat, as dictated by the vendor. On PENZIAS and BOB, which are Commodity Off-The-Shelf (COTS) clusters from Dell, we support CentOS, which is part of the Rocks release. The operating system running on ANDY is SLES 11 updated with the SGI ProPack SP1 support package. The operating system on SALK, Cray Linux Environment 3.1 (CLE 3.1), is based on SLES 11. The queuing system in use on all CUNY HPC Center systems is PBS Pro 11, with a queue design that is as nearly identical as possible across the systems. The user application software stack supported on all systems includes the following compilers and parallel library software. Much more detail on each can be found below.
- GNU C, C++ and Fortran compilers;
- Portland Group, Inc. optimizing C, C++, and Fortran compilers with CUDA and GPU support;
- The Intel Cluster Studio, including the Intel C, C++, and Fortran compilers and the Math Kernel Library (MKL);
- OpenMPI 1.5.5 (Cray's custom MPICH on SALK, SGI's proprietary MPT on ANDY, and Intel's MPI are also available).
SALK, the Cray XE6m system, uses its own proprietary MPI library based on the API to its Gemini interconnect. Cray also provides its own C, C++, and Fortran compilers, which support the Partitioned Global Address Space (PGAS) parallel programming models Unified Parallel C (UPC) and CoArray Fortran (CAF), respectively.
Hours of Operation
The second and fourth Tuesday mornings of each month, from 8:00 AM to 12:00 PM, are normally reserved (but not always used) for scheduled maintenance. Please plan accordingly. Unplanned maintenance to remedy system-related problems may be scheduled as needed. Reasonable attempts will be made to inform users running on the affected systems when these needs arise.
Users are encouraged to read this Wiki carefully. In particular, the sections on compiling and running parallel programs, and the section on the PBS Pro batch queueing system will give you the essential knowledge needed to use the CUNY HPC Center systems. We have strived to maintain the most uniform user applications environment possible across the Center's systems to ease the transfer of applications and run scripts among them. Still, there are some differences, particularly with the SGI (ANDY) and Cray (SALK) systems.
The CUNY HPC Center staff, along with outside vendors, offer regular courses and workshops to the CUNY community in parallel programming techniques, HPC computing architecture, and the essentials of using our systems. Please follow our mailings on the subject and feel free to inquire about such courses. We regularly schedule training visits and classes at the various CUNY campuses. Please let us know if such a training visit is of interest. In the past, topics have included an overview of parallel programming, GPU programming and architecture, using the evolutionary biology software at the HPC Center, the PBS queueing system at the CUNY HPC Center, mixed GPU-MPI and OpenMP programming, etc. Staff have also presented guest lectures at formal classes throughout the CUNY campuses.
Users with further questions or requiring immediate assistance in use of the systems should send an email to:
Mail to this address is received by the entire CUNY HPC Center support staff. This ensures that the person on staff with the most appropriate skill set and job-related responsibility will respond to your questions. During the business week you should expect a same-day response. During the weekend you may or may not get a same-day response, depending on which staff members are reading email that weekend. Please send all technical and administrative questions (including replies) to this address.
Please do not send questions to individual CUNY HPC Center staff members directly. These will be returned to the sender with a polite request to send them to 'hpchelp'. This applies to replies to initial questions as well as to the initial questions themselves.
Send questions to: email@example.com
The CUNY HPC Center staff are focused on providing high quality support to its user community, but compared to other HPC Centers of similar size our staff is lean. Please make full use of the tools that we have provided (especially the Wiki), and feel free to offer suggestions for improved service. We hope and expect your experience in using our systems will be predictably good and productive.
Data storage, retention/deletion, and back-ups
Each user account, upon creation, is provided a home directory (currently on each system) with a default 50 GB storage ceiling or disk quota. A user may request an increase in the size of their home directory if there is a special need. The HPC Center will endeavor to satisfy reasonable requests, but storage is not unlimited and full file systems (especially ones holding large files) make backing up the system more difficult. Please regularly remove unwanted files and directories to minimize this burden, and avoid keeping duplicate copies in multiple locations. File transfer among the HPC Center systems is very fast, so duplicates are rarely needed. In addition, HPC Center users have occasionally assumed that HPC Center disks could be used to 'park' or archive data that was generated locally at their own site. This practice is strictly forbidden.
By the end of March 2014, the HPC Center will have completed upgrading its storage system and network architecture. This will create a central hub, a home directory storage location for all systems of over 1 PByte in size with tape backup, and high-speed local scratch space on each system. Look for these changes here and in HPC Center mailings.
An incremental backup of user home directories on Andy, Salk, Karle, and Bob is performed daily. These backups are retained for three weeks. Full backups are performed weekly and are retained for two months. These backups are stored in a remote location. A full backup is read off tape, bi-monthly, and verified (to ensure backups are readable and restorable).
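To keep an eye on the quota described above, a small check along the following lines may help. This is only a sketch: the function names `format_usage` and `home_usage_report` are made up for illustration, the 50 GB default is taken from this page, and `du` may count space somewhat differently from the file system's own quota accounting.

```shell
#!/bin/bash
# Sketch: compare the size of a directory with a storage quota.
# Assumptions: function names are illustrative, and du's figure may differ
# from the quota accounting used by the file system itself.

# Print "<used> GB used of <quota> GB (<percent>%)" for a usage given in KB.
format_usage() {
    local used_kb=$1 quota_gb=$2
    awk -v u="$used_kb" -v q="$quota_gb" 'BEGIN {
        kb_per_gb = 1024 * 1024
        printf "%.1f GB used of %d GB (%.0f%%)\n", u / kb_per_gb, q, 100 * u / (q * kb_per_gb)
    }'
}

# Measure a directory with du (in KB) and report it against the quota.
# Defaults: the caller's home directory and a 50 GB quota.
home_usage_report() {
    local dir=${1:-$HOME} quota_gb=${2:-50} used_kb
    used_kb=$(du -sk "$dir" | awk '{print $1}') || return 1
    format_usage "$used_kb" "$quota_gb"
}

# Example (not run here): home_usage_report "$HOME" 50
```

Calling `home_usage_report` with no arguments reports on the home directory against the default 50 GB ceiling; a different directory or quota can be passed as the first and second arguments.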
The following user and system files are backed up:
Retention/Deletion of Home Directories
For active accounts, current Home Directories are retained indefinitely. If a user account is inactive for one year, the HPCC will attempt to contact the user and request that the data be removed from the system. If there is no response from the user within three months of the initial notice, or if the user cannot be reached, the Home Directory will be purged.
System temporary/scratch directories
Files in system temporary and scratch directories, as well as home directories on Chizen, are not backed up. There is no provision for retaining data stored in these directories.
Data storage infrastructure
The CUNY HPC Center provides a three-level data storage infrastructure:
- HOME filesystem
- SCRATCH filesystems
- SR1 (long-term storage resource)
HOME and SR1 are shared filesystems. This means that they are accessible from any of the HPCC's machines. SCRATCH filesystems are local -- files stored on one machine are not visible from another one (for example, files stored in Andy's scratch are not accessible from Penzias and vice versa).
By default, users have access to HOME and SCRATCH. An allocation on SR1 is granted upon request.
- HOME is limited to 50GB of available space. Users who need more storage should request an SR1 allocation. HOME is persistent (files are not deleted by HPCC's staff) and backed up. NOTE: files stored in HOME are not visible on the compute nodes.
- SCRATCH has no quota. SCRATCH is used to run computational jobs that write large intermediate temporary files. SCRATCH is not persistent (files can be deleted by HPCC's staff) and is not backed up.
- SR1 is a long-term storage resource designed for users who need to store datasets larger than 50GB. Space in SR1 is allocated upon the user's request. SR1 is persistent (files are not deleted by HPCC's staff) and backed up. Access to SR1 is available via the iRODS interface. NOTE: files stored in SR1 are not visible on the compute nodes.
A typical workflow is described below:
1. Copy files from HOME or from SR1 to SCRATCH.
If working with HOME:
cd /scratch/user.name
mkdir myPBS_Job && cd myPBS_Job
cp /home/user.name/myProject/a.out ./
cp /home/user.name/myProject/mydatafile ./
If working with SR1:
cd /scratch/user.name
mkdir myPBS_Job && cd myPBS_Job
iget myProject/a.out
iget myProject/mydatafile
2. Prepare a PBS job script. A typical PBS script is similar to the following:
#!/bin/bash
#PBS -q production
#PBS -N test
#PBS -l select=8:ncpus=1
#PBS -l place=free
#PBS -V

echo "Starting…"
cd "$PBS_O_WORKDIR"
# Launch one MPI process per requested chunk (select=8:ncpus=1 gives 8 cores)
mpirun -np 8 ./a.out ./mydatafile > myoutputs
echo "Done…"
Your PBS script may differ depending on your needs. See the section Submitting Jobs for reference.
3. Run the job by submitting the script to PBS Pro with the qsub command.
4. Once the job has finished, clean up SCRATCH and store the outputs in HOME or SR1.
If working with HOME:
mv ./myoutputs /home/user.name/myProject/.
cd ../
rm -rf myPBS_Job
If working with SR1:
iput ./myoutputs myProject/.
cd ../
rm -rf myPBS_Job
5. If output files are stored in SR1, tag them with metadata.
imeta add -d myProject/myoutputs zvalue 15 meters
imeta add -d myProject/myoutputs colorLabel RED
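Steps 1 and 4 of the workflow above, for the HOME case, can be sketched as two shell functions. This is an illustration rather than an HPC Center tool: the names `stage_in` and `stage_out` are hypothetical, and checking each copy with `cmp` before deleting the scratch directory is an added precaution, not something the workflow above requires.

```shell
#!/bin/bash
# Sketch of steps 1 and 4 for the HOME case.
# Assumptions: function names are hypothetical; the cmp check is an extra
# safety step so that scratch is only removed after a verified copy.

# Step 1: create the job directory in scratch and copy the inputs into it.
# Usage: stage_in SOURCE_DIR JOB_DIR FILE...
stage_in() {
    local src_dir=$1 job_dir=$2 f
    shift 2
    mkdir -p "$job_dir" || return 1
    for f in "$@"; do
        cp "$src_dir/$f" "$job_dir/" || return 1
    done
}

# Step 4: copy the outputs back, verify each copy, then remove the job directory.
# Usage: stage_out JOB_DIR DEST_DIR FILE...
stage_out() {
    local job_dir=$1 dest_dir=$2 f
    shift 2
    for f in "$@"; do
        cp "$job_dir/$f" "$dest_dir/" || return 1
        cmp -s "$job_dir/$f" "$dest_dir/$f" || return 1
    done
    rm -rf "$job_dir"
}

# Example (not run here), using the paths from the workflow above:
#   stage_in  /home/user.name/myProject /scratch/user.name/myPBS_Job a.out mydatafile
#   ... submit the job and wait for it to finish ...
#   stage_out /scratch/user.name/myPBS_Job /home/user.name/myProject myoutputs
```

Because `stage_out` returns before the `rm -rf` if any copy fails or differs, a failed transfer leaves the scratch directory in place for another attempt.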
iRODS is the integrated Rule-Oriented Data-management System, a community-driven, open source, data grid software solution. iRODS is designed to abstract data services from data storage hardware and to provide users with a hardware-agnostic way to manipulate data.
iRODS is the primary tool used by CUNY HPCC users to seamlessly access the 1PB storage resource (referred to here as 'SR1') from any of the HPCC's computational systems.
Access to SR1 is provided via so-called i-commands:
A comprehensive list of i-commands with detailed descriptions is available on the iRODS wiki.
To get quick help on any of the commands while logged into any of the HPCC's machines, type 'i-command -h'. For example:
iput -h
The following is a list of some of the most relevant i-commands:
iinit -- Initialize session and store your password in a scrambled form for automatic use by other icommands.
iput -- Store a file
iget -- Get a file
imkdir -- Like mkdir, make an iRODS collection (similar to a directory or Windows folder)
ichmod -- Like chmod, allow (or later restrict) access to your data objects by other users.
icp -- Like cp or rcp, copy an iRODS data object
irm -- Like rm, remove an iRODS data object
ils -- Like ls, list iRODS data objects (files) and collections (directories)
ipwd -- Like pwd, print the iRODS current working directory
icd -- Like cd, change the iRODS current working directory
ichksum -- Checksum one or more data objects or collections in iRODS space.
imv -- Move/rename an iRODS data object or collection.
irmtrash -- Remove one or more data objects or collections from the iRODS trash bin.
imeta -- Add, remove, list, or query user-defined Attribute-Value-Unit triplets metadata
iquest -- Query (pose a question to) the ICAT, via a SQL-like interface
Before using any of the i-commands, users need to identify themselves to the iRODS server by running the command
iinit
and providing their HPCC password.
A typical workflow involving files stored in SR1 includes storing data to and getting data from SR1, tagging data with metadata, searching for data, and sharing data (setting permissions).
Storing data to SR1
1. Create iRODS directory (aka 'collection'):
# imkdir myProject
2. Store all files 'myfile*' into this directory (collection):
# iput -r myfile* myProject/.
3. Verify that files are stored:
# ils
/cunyZone/home/user.name:
  C- /cunyZone/home/user.name/myProject
# ils myProject
/cunyZone/home/user.name/myProject:
  myfile1
  myfile2
  myfile3
The symbol 'C-' at the beginning of a line of 'ils' output shows that the listed item is a collection.
4. By combining 'ils', 'imkdir', 'iput', 'icp', 'ipwd', and 'imv', users can create iRODS directories and store files in them much as they would with the UNIX commands 'ls', 'mkdir', 'cp', 'pwd', 'mv', etc.
Getting data from SR1
1. To copy a file from SR1 to the current working directory, run:
# iget myProject/myfile1
2. Listing the current working directory should now show 'myfile1':
# ls
myfile1
3. Instead of individual files, a whole directory (with its sub-directories) can be copied with the '-r' flag (which stands for 'recursive'):
# iget -r myProject
NOTE: wildcards are not supported, so a command such as "iget myProject/myfile*" will not work.
Tagging data with metadata
iRODS provides users with a powerful mechanism for managing data with metadata. When working with large datasets, it is easy to forget what is stored in a given file. Metadata tags help organize data in an easy and reliable manner.
Let's tag files from previous example with some metadata:
# imeta add -d myProject/myfile1 zvalue 15 meters
AVU added to 1 data-objects
# imeta add -d myProject/myfile1 colorLabel RED
AVU added to 1 data-objects
# imeta add -d myProject/myfile1 comment "This is file number 1"
AVU added to 1 data-objects
# imeta add -d myProject/myfile2 zvalue 10 meters
AVU added to 1 data-objects
# imeta add -d myProject/myfile2 colorLabel RED
AVU added to 1 data-objects
# imeta add -d myProject/myfile2 comment "This is file number 2"
AVU added to 1 data-objects
# imeta add -d myProject/myfile3 zvalue 15 meters
AVU added to 1 data-objects
# imeta add -d myProject/myfile3 colorLabel BLUE
AVU added to 1 data-objects
# imeta add -d myProject/myfile3 comment "This is file number 3"
AVU added to 1 data-objects
Here we've tagged myfile1 with 3 metadata labels:
- zvalue 15 meters
- colorLabel RED
- comment "This is file number 1"
Similar tags were added to 'myfile2' and 'myfile3'.
Metadata come in the form of AVUs (Attribute|Value|Unit). As the examples above show, the Unit is optional.
Let's list all metadata assigned to file 'myfile1':
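When the same AVU has to be attached to many files, the repeated 'imeta add -d' calls can be wrapped in a loop. The following is a minimal sketch: the function name `tag_files` and the `IMETA` override variable are assumptions introduced here (the override makes it possible to preview the commands with `echo` before touching SR1), and an empty third argument means that no Unit is given.

```shell
#!/bin/bash
# Sketch: attach one Attribute/Value/Unit triple to several data objects.
# Assumptions: tag_files and the IMETA variable are illustrative names.
# Set IMETA=echo to print the imeta arguments instead of running imeta.

# Usage: tag_files ATTRIBUTE VALUE UNIT OBJECT...
# Pass "" as UNIT to add an AVU without units.
tag_files() {
    local attr=$1 value=$2 units=$3 obj
    shift 3
    for obj in "$@"; do
        if [ -n "$units" ]; then
            "${IMETA:-imeta}" add -d "$obj" "$attr" "$value" "$units" || return 1
        else
            "${IMETA:-imeta}" add -d "$obj" "$attr" "$value" || return 1
        fi
    done
}

# Example (not run here):
#   tag_files colorLabel RED "" myProject/myfile1 myProject/myfile2
#   tag_files zvalue 15 meters myProject/myfile1 myProject/myfile3
```

The function stops at the first failing 'imeta' call, so a typo in an object name does not silently skip part of the list.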
# imeta ls -d myProject/myfile1
AVUs defined for dataObj myProject/myfile1:
attribute: zvalue
value: 15
units: meters
----
attribute: colorLabel
value: RED
units:
----
attribute: comment
value: This is file number 1
units:
To remove an AVU assigned to a file run:
# imeta rm -d myProject/myfile1 zvalue 15 meters
# imeta ls -d myProject/myfile1
AVUs defined for dataObj myProject/myfile1:
attribute: colorLabel
value: RED
units:
----
attribute: comment
value: This is file number 1
units:
# imeta add -d myProject/myfile1 zvalue 15 meters
The final command adds the removed AVU back to 'myfile1'.
Metadata may be assigned to directories (collections) as well:
# imeta add -C myProject simulationsPool 1
# imeta ls -C myProject
AVUs defined for collection myProject:
attribute: simulationsPool
value: 1
units:
Note the '-C' flag, which is used instead of '-d'.
Searching for data
The power of metadata becomes obvious when data needs to be found in large collections. Here is an illustration of how easily this is done in iRODS with imeta queries:
# imeta qu -d zvalue = 15
collection: /cunyZone/home/user.name/myProject
dataObj: myfile1
----
collection: /cunyZone/home/user.name/myProject
dataObj: myfile3
We see both files that were tagged with 'zvalue 15 meters'. Here is a different query:
# imeta qu -d colorLabel = RED
collection: /cunyZone/home/user.name/myProject
dataObj: myfile1
----
collection: /cunyZone/home/user.name/myProject
dataObj: myfile2
Another powerful mechanism for querying data is provided by 'iquest'. The following examples show some of its capabilities:
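The 'imeta qu' output lists the collection and the data object on separate lines, so it cannot be passed straight to 'iget'. A small filter can join them into full iRODS paths. This is a sketch under the assumption that the output has the 'collection:' / 'dataObj:' layout shown above; `qu_to_paths` is a made-up name, and names containing spaces are not handled.

```shell
#!/bin/bash
# Sketch: turn "imeta qu -d" output into full iRODS paths, one per line.
# Assumptions: the "collection:" / "dataObj:" layout shown above, and
# object names without spaces; qu_to_paths is an illustrative name.

qu_to_paths() {
    awk '
        $1 == "collection:" { coll = $2 }             # remember the collection
        $1 == "dataObj:"    { print coll "/" $2 }     # emit collection/object
    '
}

# Example (not run here): fetch every object tagged with zvalue = 15
#   imeta qu -d zvalue = 15 | qu_to_paths | while IFS= read -r path; do
#       iget "$path"
#   done
```

Lines such as the '----' separators match neither rule and are simply ignored.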
iquest "SELECT DATA_NAME, DATA_SIZE WHERE DATA_RESC_NAME like 'cuny%'"
iquest "For %-12.12s size is %s" "SELECT DATA_NAME , DATA_SIZE WHERE COLL_NAME = '/cunyZone/home/user.name'"
iquest "SELECT COLL_NAME WHERE COLL_NAME like '/cunyZone/home/%' AND USER_NAME like 'user.name'"
iquest "User %-6.6s has %-5.5s access to file %s" "SELECT USER_NAME, DATA_ACCESS_NAME, DATA_NAME WHERE COLL_NAME = '/cunyZone/home/user.name'"
iquest " %-5.5s access has been given to user %-6.6s for the file %s" "SELECT DATA_ACCESS_NAME, USER_NAME, DATA_NAME WHERE COLL_NAME = '/cunyZone/home/user.name'"
iquest no-distinct "select META_DATA_ATTR_NAME"
iquest "select COLL_NAME, DATA_NAME WHERE DATA_NAME like 'myfile%'"
iquest "User %-9.9s uses %14.14s bytes in %8.8s files in '%s'" "SELECT USER_NAME, sum(DATA_SIZE),count(DATA_NAME),RESC_NAME"
iquest "select sum(DATA_SIZE) where COLL_NAME = '/cunyZone/home/user.name'"
iquest "select sum(DATA_SIZE) where COLL_NAME like '/cunyZone/home/user.name%'"
iquest "select sum(DATA_SIZE), RESC_NAME where COLL_NAME like '/cunyZone/home/user.name%'"
iquest "select order_desc(DATA_ID) where COLL_NAME like '/cunyZone/home/user.name%'"
iquest "select count(DATA_ID) where COLL_NAME like '/cunyZone/home/user.name%'"
iquest "select RESC_NAME where RESC_CLASS_NAME IN ('bundle','archive')"
iquest "select DATA_NAME,DATA_SIZE where DATA_SIZE BETWEEN '100000' '100200'"
Access to the data can be controlled via the 'ichmod' command. Its behavior is similar to that of the UNIX 'chmod' command. For example, to give user 'user.name1' read access to the file 'myProject/myfile1', execute the following command:
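As one possible use of the 'sum(DATA_SIZE)' queries above, the total can be converted into a more readable figure. The sketch below assumes that 'iquest' called with a "%s" format string prints only the byte count; the function name `sr1_usage_gib` and the `IQUEST` override variable are introduced here for illustration, and anything other than a plain number (for example a "no rows found" message) is treated as an error.

```shell
#!/bin/bash
# Sketch: report the space used under a collection tree in GiB.
# Assumptions: iquest with a "%s" format prints just the summed byte count;
# sr1_usage_gib and IQUEST are illustrative names (IQUEST allows a stand-in
# command to be substituted for iquest).

# Usage: sr1_usage_gib COLLECTION_PATH
sr1_usage_gib() {
    local coll=$1 bytes
    bytes=$("${IQUEST:-iquest}" "%s" \
        "select sum(DATA_SIZE) where COLL_NAME like '${coll}%'") || return 1
    case "$bytes" in
        ''|*[!0-9]*)
            echo "unexpected iquest output: $bytes" >&2
            return 1 ;;
    esac
    awk -v b="$bytes" 'BEGIN { printf "%.2f GiB\n", b / (1024 * 1024 * 1024) }'
}

# Example (not run here): sr1_usage_gib /cunyZone/home/user.name
```

Because the query uses 'like' with a trailing '%', the figure includes data in all sub-collections of the given path.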
ichmod read user.name1 myProject/myfile1
To see who has access to a file or directory, use:
# ils -A myProject/myfile1
  /cunyZone/home/user.name/myProject/myfile1
        ACL - user.name1#cunyZone:read object   user.name#cunyZone:own
In the above example, user 'user.name1' has read access to the file and user 'user.name' is the owner of the file.
Possible levels of access to a data object are null/read/write/own.
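To review permissions across many objects, the ACL line printed by 'ils -A' can be reduced to one user/level pair per line. This is a sketch that assumes the 'user#zone:level' layout shown above; `acl_entries` is a made-up name, and only the first word of a level such as 'read object' is kept.

```shell
#!/bin/bash
# Sketch: list "user#zone level" pairs from "ils -A" output on stdin.
# Assumptions: ACL entries look like user#zone:level as shown above;
# acl_entries is an illustrative name.

acl_entries() {
    awk '/ACL -/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /#.*:/) {            # a user#zone:level field
                split($i, parts, ":")
                print parts[1], parts[2]
            }
        }
    }'
}

# Examples (not run here):
#   ils -A myProject/myfile1 | acl_entries
#   ichmod null user.name1 myProject/myfile1   # revoke the access granted above
```

For the listing shown above, the filter would print 'user.name1#cunyZone read' and 'user.name#cunyZone own' on separate lines.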