[[File:CUNY-HPCC-HEADER-LOGO.jpg]]
__TOC__


[[Image:hpcc-panorama3.png]]


== Mission of CUNY-HPCC ==
The City University of New York (CUNY) High Performance Computing Center (HPCC) serves as a pivotal research and educational hub for the university. Situated on the campus of the College of Staten Island, 2800 Victory Boulevard, Staten Island, New York 10314, the Center's primary objective is to enhance educational opportunities and to foster scientific research and discovery within the university. This is achieved through the management of state-of-the-art computing infrastructure and the provision of comprehensive research support services. CUNY-HPCC supports computational research and computationally intensive graduate and undergraduate courses offered at all CUNY colleges, in fields such as Computer Science, Engineering, Bioinformatics, Chemistry, Physics, Materials Science, Genetics, Computational Biology, Finance, and others. The Center also provides educational outreach to local schools and supports undergraduates who work in the research programs of the host institution (e.g., the NSF REU program). Notably, CUNY-HPCC offers domain-specific expertise in many aspects of computationally intensive research. Furthermore, CUNY's membership in the Empire AI (EAI) consortium positions CUNY-HPCC as a stepping stone for CUNY researchers seeking access to EAI's advanced facilities.

The primary mission of CUNY-HPCC is:
* To enable advanced research and scholarship at CUNY colleges by providing faculty, staff, and students with access to high-performance computing, adequate storage resources, and visualization resources;
* To enable advanced and cross-disciplinary education by providing flexible and scalable resources;
* To provide CUNY faculty and their collaborators at other universities, CUNY research staff, and CUNY graduate and undergraduate students with expertise in scientific computing, parallel scientific computing (HPC), software development, advanced data analytics, data-driven and simulation science, visualization, advanced database engineering, and other areas;
* To leverage the HPC Center's capabilities to acquire additional research resources for CUNY faculty, researchers, and students in existing and major new programs;
* To create opportunities for the CUNY research community to win grants from national funding institutions and to develop new partnerships with the government and private sectors.

== Empire AI and CUNY-HPCC ==
The Empire AI Consortium comprises the '''CUNY Graduate Center, Columbia University, Cornell University, Icahn School of Medicine, New York University, Rochester Institute of Technology, Rensselaer Polytechnic Institute, the State University of New York, University at Buffalo, and University of Rochester.''' CUNY-HPCC provides support and maintains tickets for all CUNY users with an allocation on EAI. Additionally, CUNY-HPCC serves as a stepping stone for CUNY researchers because it operates, on a smaller scale, architectures similar to those of EAI (including nodes with Hopper GPUs), such as the extended "Alpha" servers and the new "Beta" computers. The latter will consist of 288 B200 GPUs and recently added RTX 6000 Pro nodes. The anticipated cost for EAI is $0.50 per service unit (SU), a rate for CUNY PIs that is significantly lower than a typical AWS rate. One SU corresponds to one hour of H100 compute, and one hour of B200 compute corresponds to two SU. In comparison, the CUNY-HPCC cost-recovery rates for public servers are $0.015 per CPU hour (1 unit) and $0.09 per GPU hour (6 units). For further details, please refer to the section on HPCC access plans.

== CUNY-HPCC services ==
CUNY-HPCC offers a professionally maintained, modern computational environment and architectures, along with advanced storage and fast interconnects. CUNY-HPCC serves the following purposes:
* Supports research computing at CUNY, benefiting faculty, their collaborators at other universities, and their public and private sector partners. It also supports CUNY students and research staff.
* Provides state-of-the-art computing resources and comprehensive research support services, including expertise and full support for users with an allocation on Empire AI.
* Creates opportunities for the CUNY research community to establish new partnerships with the government and private sectors.
* Utilizes HPCC capabilities to acquire additional research resources for its faculty and graduate students in existing and major new programs.
* Maintains tickets for all CUNY users with an allocation on EAI.

CUNY-HPCC is a voting member of the '''Coalition for Academic Scientific Computation (CASC)'''. Originally formed in the 1990s as a small group of the heads of national supercomputing centers, CASC has expanded to more than 100 member institutions representing many of the nation's most forward-thinking universities and computing centers. CASC includes the leadership of large academic computing centers such as TACC and the San Diego Supercomputer Center, and has recently attracted a greater diversity of smaller institutions such as non-R1s, HBCUs, HSIs, TCUs, etc. CASC's mission is to be "''dedicated to advocating for the use of the most advanced computing technology to accelerate scientific discovery for national competitiveness, global security, and economic success, as well as develop a diverse and well-prepared 21st century workforce.''"


== CUNY-HPCC - Democratization of Research ==
In the last few years the cloud computing model (also called computing-on-demand) has promised that anyone, no matter where they are, can leverage almost unlimited computing resources. This was supposed to "democratize" research and level the playing field. Unfortunately, that is not entirely true (for now): although cloud computing is available to nearly anyone from nearly anywhere, it remains expensive compared with local resources and lacks the flexibility and accessibility of local support tailored to education and research that a local research HPC center offers. Indeed, every computational environment has limitations and a learning curve, and students and faculty coming from a variety of backgrounds can feel overwhelmed and helpless without close, personalized local support. In this sense the carefully designed, user-centered, academically focused CUNY-HPCC has transformative capability for rapidly evolving computational and data-driven research, creates opportunities for broad collaboration and convergence research activities, and thus provides real democratization of research.


== Pedagogical value of CUNY-HPCC ==
CUNY-HPCC supports a wide variety of graduate and undergraduate classes from all CUNY colleges, the Graduate Center, and the institutes. It is important to mention that the impact of CUNY-HPCC goes beyond the STEM disciplines. Thus, CUNY-HPCC:
[[File:NAnoBio6.jpg|right|frameless|Dr Alexander Tzanov, the director of CUNY-HPCC, speaks at a NanoBioNYC workshop]]
* '''Allows analysis of datasets that are too large to work with easily on personal devices, or that cannot easily be shared or disseminated.''' These datasets come not only from STEM fields but also from finance, economics, linguistics, and other areas. Facilitating these analyses gives students the opportunity to interact in real time with increasingly large amounts of data, enabling them to gather important skills and experience.
* '''Provides collaborative space for entire courses.''' The multi-user capabilities and environment of HPCC facilitate collaborative work among learners and support more complex, closer-to-reality learning problems.
* '''Enables analytical techniques too large for personal devices''', thanks to the large computational and visualization capabilities of HPCC. Students can run unattended parameter sweeps or workflows in order to explore a problem in detail; such self-directed exploration has a proven positive effect on learning.
* '''Provides students with the prerequisite skills and knowledge''' they may need later when they explore larger HPC environments. For instance, the CUNY-HPCC workflow and environment is very close to the environment of other research centers and ACCESS resources.
* '''Participates in educational programs such as the NSF-funded NanoBioNYC Ph.D. traineeship program at CUNY.''' This program is focused on developing groundbreaking bio-nanoscience solutions to address urgent human and planetary health issues and on preparing students to become tomorrow's leaders in diverse STEM careers.


==Research Computing Infrastructure==
[[File:HPCC structure last.png|thumb|682x682px|'''<big>The Organization of HPCC resources</big>''']]
The research computing infrastructure is depicted in the figure on the right. In order to support various types of research projects, CUNY-HPCC supports a variety of computational architectures. All computational resources are organized in '''3''' tiers - the '''''Condominium Tier (CT)''''', the '''''Free Tier (FT)''''' and the '''''Advanced Tier (AT)''''', plus visualization ('''Vz'''). All nodes in all tiers are attached to the central file system '''HPFS''', which provides '''/scratch''' and the Global Storage ('''GS''') - '''/global/u/'''.


=== Storage systems ===
The '''/scratch''' file system mentioned above is a small, fast file system mounted on all nodes. It resides on solid-state drives, so it is fast, and it has a capacity of '''256 GB'''. Note that files on '''/scratch''' are '''not backed up and are not protected.''' This file system has no quota, so users can submit large jobs. The file system is automatically purged if: '''1.''' the load of the file system exceeds 70%, or '''2.''' file(s) have not been accessed for <u>60 days, whichever comes first.</u> The partition '''/global/u''' in the main HPFS file system holds user home directories. HPFS is a hybrid file system that combines SSD and HDD (solid-state and hard disks) with capabilities for dynamic relocation of files; its capacity is 2 petabytes (PB). This file system was purchased under NSF grant OAC-2215760 and is mounted on all nodes.
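Because '''/scratch''' is purged automatically, it is good practice to check its load and the age of your files regularly. A minimal sketch with standard Linux commands, assuming the directory layout described above, is:
<pre>
# How full is /scratch? (purging can start once the load exceeds ~70%)
df -h /scratch

# List your files that have not been accessed for more than 50 days,
# i.e. files approaching the 60-day purge threshold
find /scratch/$USER -type f -atime +50

# Copy anything you still need back to your protected home directory on GS
cp -r /scratch/$USER/important_results /global/u/$USER/
</pre>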


=== Computational and Visualization Resources ===
The computational resources in the 3 tiers mentioned above are combined within the ARROW hybrid cluster. In addition, the HPCC operates a specialized visualization server that shares the file system with all nodes, which allows <u>in-situ visualization</u> of simulations. The nodes are described in the table below. Note that '''black''' denotes the '''basic''' tier, '''blue''' denotes the '''advanced''' tier, '''orange''' denotes the '''condo''' tier, and '''yellow''' marks the '''visualization''' tier.
[[File:Arrow Viz Resources.png|frameless|Computational And Visualization Resources|911x911px|left]]
===Submit jobs on HPCC servers ===
Regardless of tier, all jobs at HPCC must:


'''>>''' Start from the user's directory on the '''scratch''' file system - '''/scratch/<userid>'''. Jobs cannot be started from users' home directories - '''/global/u/<userid>'''.


'''>>''' Use the SLURM job submission system (job scheduler). All job submission scripts written for other job schedulers (e.g. PBS Pro) must be converted to SLURM syntax.


'''>>''' Start from the Master Head Node (MHN), <u>in all tiers</u>. Jobs are distributed automatically to the different tiers according to the job submission policies and the job's requirements, so users do not need to communicate directly with any of the servers. In the near future the job submission process will be improved further with the launch of an HPC job submission portal.
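A minimal SLURM batch script illustrating these rules is sketched below. The partition name, resource amounts, and executable are placeholders; use the partition and QOS you were actually granted (see the ''Partitions and jobs'' section below).
<pre>
#!/bin/bash
#SBATCH --job-name=myjob          # name shown by squeue
#SBATCH --partition=partnsf       # placeholder: use the partition/QOS granted to you
#SBATCH --ntasks=1                # a single sequential task
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=02:00:00           # wall-clock limit; must fit within the partition limit
#SBATCH --output=%x-%j.out        # stdout/stderr written to jobname-jobid.out

cd /scratch/$USER/myjob           # jobs must run from /scratch, never from /global/u
./my_program input.dat            # placeholder executable and input file
</pre>
The script is submitted from the corresponding '''/scratch''' directory with <code>sbatch myjob.slurm</code>; <code>squeue -u $USER</code> shows its state.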
== Organization of systems and data storage (architecture) ==
All tiers access 2 separate file systems: 1) the Global Storage ('''GS''') global file system and 2) the '''/scratch''' file system. '''GS''' is mounted only on the login nodes and is intended to keep user data (home directories) and project data (project directories). The '''/scratch''' file system is mounted on the login nodes and on all computational nodes (in all tiers). Thus '''GS''' holds long-lived user data (executables, scripts and data), while '''/scratch''' holds the provisional data required by particular simulation(s). Consequently, jobs can be started only from '''/scratch''' and '''never from GS''' storage. It is important to remember that '''/scratch''' is not the main storage for user accounts (home directories), but temporary storage used for job submission only. Thus:

#Data in '''/scratch''' are not protected, preserved or backed up and can be lost at any time. CUNY-HPCC has no obligation to preserve user data in '''/scratch'''.
#'''/scratch''' undergoes regular and automatic file purging when either or both conditions are satisfied:
##the load of the '''/scratch''' file system reaches '''70+%''';
##there are inactive file(s) older than 60 days.
#Only data in '''GS''' are protected and recoverable.

Upon registering with HPCC every user will get 2 directories:

:• '''<font face="courier">/scratch/<font color="red"><userid></font></font>''' – temporary workspace on the HPC systems
:• '''<font face="courier">/global/u/<font color="red"><userid></font></font>''' – the "home directory", i.e., storage space for programs, scripts, and data
:• In some instances a user will also have use of disk space in '''<font face="courier">/cunyZone/home/<font color="red"><projectid></font></font>''' (iRODS).

The '''/global/u/<userid>''' directory has a quota (see below for details), while '''/scratch/<userid>''' does not. However, the '''/scratch''' space is cleaned up following the rules described above, and there are no guarantees of any kind that files in '''/scratch''' will be preserved during hardware crashes or clean-up. Users must therefore use the "staging" procedure (copying data between '''GS''' and '''/scratch''') to ensure preservation of their data, codes and parameter files. The figure below is a schematic of the environment.
[[File:HPCC_structure.png|center|frameless|900x900px]]

''Infrastructure systems'':

o The Master Head Node ('''MHN'''/'''Arrow''') is a redundant login node from which all jobs on all servers start. This server is not directly accessible from outside the CSI campus. Note that the main server and its login nodes share the same name, Arrow, so users can address the Arrow login nodes as either Arrow or MHN.

o '''Chizen''' is a redundant gateway (bastion) server which provides access to the protected HPCC domain. Access to all HPCC resources is provided through '''Chizen'''.

o '''Cea''' is a Data Transfer Node allowing transfer of files from/to remote sites directly to/from '''<font face="courier">/global/u/<font color="red"><userid></font></font>''' or '''<font face="courier">/scratch/<font color="red"><userid></font></font>'''. '''Cea''' is accessible directly (not only via '''Chizen'''), but allows only a limited set of shell commands.
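A typical staging workflow, sketched below with standard commands, copies input data from the home directory on '''GS''' to '''/scratch''' before the run and copies results back afterwards; the data transfer node '''Cea''' is used for transfers from outside. The host name shown is a placeholder - use the address supplied with your account, or Globus for large data sets.
<pre>
# Stage inputs from the protected home directory (GS) to /scratch
mkdir -p /scratch/$USER/projectX
cp -r /global/u/$USER/projectX/inputs /scratch/$USER/projectX/

# ... submit and run the job from /scratch/$USER/projectX via SLURM ...

# Copy results back to GS, since /scratch is purged and not backed up
cp -r /scratch/$USER/projectX/results /global/u/$USER/projectX/

# From an off-campus machine, push data through the data transfer node Cea
# (placeholder host name; only a limited set of shell commands is allowed on Cea)
scp mydata.tar.gz your_userid@cea.csi.cuny.edu:/scratch/your_userid/projectX/
</pre>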


==Computing architectures at HPCC==

The HPC Center employs a diverse range of architectures to accommodate intricate and demanding workflows. All computational resources are consolidated into a single hybrid cluster known as Arrow. This cluster comprises symmetric multiprocessor (SMP) nodes with and without GPUs, distributed shared memory (NUMA) node(s), high-memory nodes, and advanced SMP nodes featuring multiple GPUs. The number of GPUs per node varies between two and eight, as do the GPU interface and GPU family: the basic GPU nodes are equipped with two Tesla K20m GPUs connected via the PCIe interface, while the most advanced nodes support eight Ampere A100 GPUs connected via the SXM interface.

''Overview of computational architectures'':

'''SMP''' servers have several processors (working under a single operating system) which "share everything": all cpu-cores address a common memory block via a shared bus or data path. SMP servers support all combinations of memory vs. cpu (up to the limits of the particular computer). SMP servers are commonly used to run sequential or thread-parallel (e.g. OpenMP) jobs, and they may or may not have GPUs.

A '''cluster''' is defined as a single system comprising a set of servers interconnected with a high-performance network. Specific software coordinates programs on and across those servers in order to perform computationally intensive tasks. The most common cluster type consists of several identical SMP servers connected via a fast interconnect. Each SMP member of the cluster is called a '''node'''. All nodes run independent copies of the same operating system (OS). Some or all of the nodes may incorporate GPUs.

Hybrid clusters combine nodes of different architectures. For instance, the main CUNY-HPCC machine is a hybrid cluster called '''Arrow'''. Sixty-two (62) of its nodes are identical GPU-enabled SMP servers, each with 2 x K20m GPUs, 3 are SMP nodes with extended memory (fat nodes), one node is a distributed shared memory node (NUMA, see below), and 2 are fat SMP servers especially designed to support 8 NVIDIA GPUs per node, connected via the SXM interface. In addition, HPCC operates the cluster '''Herbert''', dedicated only to education.

A '''distributed shared memory''' computer is a tightly coupled server in which the memory is physically distributed but logically unified as a single block. The system resembles an SMP, but the possible number of cpu cores and amount of memory are far beyond the limitations of an SMP. Because the memory is distributed, access times across the address space are non-uniform; this architecture is therefore called Non-Uniform Memory Access (NUMA). Similarly to SMP, '''NUMA''' systems are typically used for applications such as data mining and decision support, in which processing can be parceled out to a number of processors that collectively work on common data. HPCC operates one '''NUMA''' node in Arrow, named '''Appel'''. This node does not have GPUs.
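As an illustration of how these architectures are used, the sketch below requests a thread-parallel (OpenMP-style) job, which must stay within one SMP node; the resource amounts and executable are placeholders.
<pre>
#!/bin/bash
#SBATCH --job-name=omp_job
#SBATCH --nodes=1                  # OpenMP threads cannot span nodes
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16         # one core per thread
#SBATCH --mem=32G
#SBATCH --time=04:00:00

cd /scratch/$USER/omp_job
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # match the thread count to the allocated cores
./omp_program                      # placeholder thread-parallel executable
</pre>
Distributed-memory (MPI) jobs instead request several nodes (e.g. <code>--nodes</code> and <code>--ntasks-per-node</code>) and launch the program with <code>srun</code>, so the ranks may be spread across the cluster.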


===<u>Condominium Tier</u>===
The Condominium tier (called '''condo''') organizes resources purchased and owned by faculty but maintained by HPCC. Participation in this tier is '''strictly voluntary'''. Several faculty/research groups can combine funds to purchase and consequently share the hardware (a node or several nodes). To be accepted, all nodes in this tier must meet certain hardware specifications, including being fully warranted for the lifetime of the node(s). If you want to participate in the condominium, please send a request mail to hpchelp@csi.cuny.edu and consult HPCC before making a purchase. The Condominium tier:

* Promotes vertical and horizontal collaboration between research groups;
* Makes it possible to utilize small amounts of research money or "left-over" money wisely and to obtain advanced resources;
* Helps researchers to conduct large-scope, high-quality research, including collaborative projects leading to successful grants with high impact.

====Access to condo resources====
The resources are available only to condo owners and their groups. Users registered with the condo tier use the main login node of the Arrow server; to access their own node(s) they must specify their own private partition. In addition, there are partitions which operate over two or more nodes owned by condo members. Condo tier members benefit from professional HPCC support, security and maintenance. Upon approval from the node owner, any idle node(s) can be used by any other member(s); for instance, a member can borrow (for an agreed time) a node with more advanced GPUs than those installed on his/her own node(s). The owners of the equipment are responsible for any repair costs for their node(s). Other users may rent any of the condo resources described below if agreed with the owners, and unused cycles can be shared with other members of the community.

In sum, the benefits of condo are:

*'''5 year lifecycle''' - condo resources will be available for a duration of 5 years.
*'''Access to more cpu cores than purchased''', as well as access to resources which were not purchased.
*'''Support''' - HPCC staff will install, upgrade, secure and maintain condo hardware throughout its lifecycle.
*'''Access to the main application server.'''
*'''Access to HPC analytics.'''

Responsibilities of condo members:

*'''To share their resources''' (when idle or partially available) with other members of the condo;
*'''To include in their research and instrumentation grants money''' for computing, used to cover operational (non-tax-levy) expenses of the HPCC.

The table below summarizes the available resources.
{| class="wikitable sortable"
|+Resources of the Arrow cluster by tier (including the Condominium Tier)
!Master Head Node
!Sub System
!Tier
!Type
!Type of Jobs
!Nodes
!CPU Cores
!GPUs
!Mem/node
!Mem/core
!Chip Type
!GPU Type and Interface
|-
| rowspan="17" |'''<big>Arrow</big>'''
| rowspan="4" |Penzias
| rowspan="10" |Advanced
| rowspan="4" |Hybrid Cluster
|Sequential & Parallel jobs w/wo GPU
|66
|16
|2
|64 GB
|4 GB
|SB, EP 2.20 GHz
|K20m GPU, PCIe v2
|-
| rowspan="3" |Sequential & Parallel jobs
| rowspan="3" |1
|24
| -
|1500 GB
|62 GB
| rowspan="3" |HL, 2.30 GHz
| -
|-
|36
| -
|768 GB
|21 GB
| -
|-
|24
| -
|768 GB
|32 GB
| -
|-
|Appel
|NUMA
|Massive Parallel, sequential, OpenMP
|1
|384
| -
|11 TB
|28 GB
|IB, 3 GHz
| -
|-
|Cryo
|SMP
|Sequential and Parallel jobs, with GPU
|1
|40
|8
|1500 GB
|37 GB
|SL, 2.40 GHz
|V100 (32GB) GPU, SXM
|-
| rowspan="2" |Blue Moon
| rowspan="2" |Hybrid Cluster
| rowspan="2" |Sequential and Parallel jobs w/wo GPU
|24
|32
| -
| rowspan="2" |192 GB
| rowspan="2" |6 GB
| rowspan="2" |SL, 2.10 GHz
| -
|-
|2
|32
|2
|V100 (16GB) GPU, PCIe
|-
|Karle
|SMP
|Visualization, MATLAB/Mathematica
|1
|36*
| -
|768 GB
|21 GB
|HL, 2.30 GHz
| -
|-
|Chizen
|Gateway
|No jobs allowed
| colspan="7" | -
|-
| rowspan="2" |CFD
| rowspan="2" |Condo
| rowspan="2" |SMP
| rowspan="7" |Parallel, Seq, OpenMP
|1
|48
|2
|768 GB
| -
|EM, 4.8 GHz
|A40, PCIe v4
|-
|1
|48
| -
|512 GB
| -
|ER, 4.3 GHz
| -
|-
| rowspan="2" |PHYS
| rowspan="2" |Condo
| rowspan="2" |SMP
|1
|48
|2
|640 GB
| -
|ER, 4 GHz
|L40, PCIe v4
|-
|1
|64
| -
|512 GB
| -
|ER, 4.3 GHz
| -
|-
| rowspan="2" |CHEM
| rowspan="2" |Condo
| rowspan="2" |SMP
|1
|48
|2
|256 GB
| -
|EM, 2.8 GHz
|A30, PCIe v4
|-
|1
|128
|8
|512 GB
| -
|ER, 2.0 GHz
|A100/40, SXM
|-
|ASRC
|Condo
|SMP
|1
|48
|2
|256 GB
| -
|ER, 2.8 GHz
|A30, PCIe v4
|}
Note: SB = Intel(R) Sandy Bridge, HL = Intel(R) Haswell, IB = Intel(R) Ivy Bridge, SL = Intel(R) Xeon(R) Gold, ER = AMD(R) EPYC Rome, EM = AMD(R) EPYC Milan, EG = AMD(R) EPYC Genoa. <nowiki>*</nowiki> With hyperthreading.


==='''<u>Advanced Tier</u>'''===
The advanced tier holds the resources used for more advanced or large-scale research. This tier provides nodes with Volta-class GPUs with 16 GB and 32 GB of on-board memory. The table below summarizes the resources.
{| class="wikitable sortable"
|+Resources in Advanced Tier (Blue Moon, Cryo, Appel): 1256 cores and 12 Volta class GPU
!Number of Nodes
!Cores/node
!Chip
!Memory/node
!GPU/node
!Interconnect
!Use
!Association
|-
|2
|32
|2 x Intel X86_64
|192 GB
|2 x V100 (16 GB) PCIe gen 3
|100 Gbps Infiniband EDR
|Number Crunching
|Blue Moon Cluster
|-
|24
|32
|2 x Intel X86_64
|192 GB
| --
|100 Gbps Infiniband EDR
|Number Crunching
|Blue Moon Cluster
|-
|1
|40
|2 x Intel X86_64
|1500 GB
|8 x V100 (32 GB) SXM
|100 Gbps Infiniband EDR
|Number Crunching
|Cryo
|-
|1
|384
|2 x Intel X86_64
|11,000 GB
| --
|56 Gbps Infiniband QDR
|Number Crunching
|Appel
|}


===<u>Basic tier</u>===
The basic tier provides resources for sequential and moderate-size parallel jobs. OpenMP jobs can run only within the scope of a single node; distributed parallel (MPI) jobs can run across the cluster. This tier also supports the MATLAB Parallel Server, which can run across nodes. Users can also run GPU-enabled jobs, since this tier has 132 Tesla K20m GPUs. Please note that these GPUs are no longer supported by NVIDIA, and many applications may no longer support them either. The table below summarizes the resources of this tier.
{| class="wikitable sortable mw-collapsible"
|+Resources in Free Tier (Penzias, Karle): 1056 cores and 132 Tesla class GPU
!Number of nodes
!Cores/node
!Chip
!Memory/node
!GPU/node
!Interconnect
!Use
!Association
|-
|66
|16
|2 x Intel X86_64
|64 GB
| 2 x K20m, PCIe 2
|56 Gbps Infiniband
|Number crunching, Sequential jobs, General computing, Distributed Parallel Matlab
|Penzias
|-
|1
|36/72*
|2 x Intel X86_64
|768 GB
| --
|56 Gbps Infiniband
|Visualization, Matlab, Parallel Matlab (toolbox)
|Karle
|}
<nowiki>*</nowiki> Hyperthreaded


===<u>Arrow cluster and hybrid storage (NSF grant 2023 equipment)</u>===
This equipment consists of a large hybrid parallel file system and 2 computational nodes integrated into the Arrow cluster. The file system has a capacity of 2 PB (petabytes) and a bandwidth of 35 GBps for writes and 50 GBps for reads. The computational nodes are summarized in the table below.
{| class="wikitable sortable mw-collapsible"
|+Resources in NSF grant equipment (Arrow): Total of 256 cores and 16 Ampere A100/80GB GPU
!Number of Nodes
!Cores/node
!GPU/node
!Memory/node
!Chip
!Interconnect
!Use
!Association
|-
|2
|128
|8 x A100/80GB
|1024 GB
| 2 x AMD EPYC
|HDR 100 Gbps
|Molecular modeling, Data science, Number Crunching, Materials Science, AI, ML
| Arrow
|}

= Recovery of operational costs =
CUNY-HPCC, a not-for-profit core research facility affiliated with CUNY, is dedicated to supporting a wide range of research endeavors that necessitate advanced computational resources. <u>Notably, CUNY-HPCC's operations are not directly or indirectly funded by CUNY or the College of Staten Island (CSI). Consequently, CUNY-HPCC employs a cost recovery model that exclusively recoups operational expenses, without generating any profit for the HPCC.</u> The recovered costs are calculated using comprehensive documentation of actual operational expenditures and are designed to achieve a break-even point for all CUNY users. This methodology is approved by CUNY-RF and is employed in other CUNY research facilities. The cost recovery charging schema is based on unit-hour usage, encompassing both CPU and GPU units. Definitions for these units are provided in the table below.
{| class="wikitable mw-collapsible"
|+Definitions of unit-hour
!Type of resource
!Unit-hour
!For V100, A30, A40 or L40
!For A100
|-
|CPU unit
|1 cpu core/hour
| --
| --
|-
|GPU unit
|(4 cpu cores + 1 GPU thread)/hour
|4 cpu cores + 1 GPU
|4 cpu cores and 1/7 A100
|}

=== Compute on public resources ===
The cost recovery model for public (non-condominium) servers offers the following options:

# Minimal Access Plan (MAP)
# Compute on Demand (CODP)
# Lease of a node(s)

==== Minimal Access Plan (MAP) ====
Colleges may participate in any of the Minimal Access Plan tiers. The tiered pricing structure is as follows:

- Tier A: $5,000 per year

- Tier B: $10,000 per year

- Tier C: $25,000 per year

Within each tier, the cost is '''$0.015 per CPU hour and $0.09 per GPU hour.''' It is important to note that the Minimum Access Plan (MAP) tiers A, B, and C are '''not all-inclusive.''' Therefore, even if a college pays for a higher tier, it does not guarantee unlimited use by all college employees, faculty, and students throughout the year.

# The MAP plan is indirectly linked to the number of hours used. Consequently, the definition of "up to 12 users for the B tier" should not be interpreted as "all users, up to 12 per college, receive unlimited access to HPCC resources."
# The number of users per tier is determined by statistical analysis of resource usage and of the average duration of a job across all CUNY institutions. This means that if usage from a college exceeds the number of hours encoded in the MAP fee, the additional hours will be charged at the preferred rate of $0.015 per CPU hour and $0.09 per GPU thread hour.
# The <u>MAP fee may fully cover the expenses for individuals from a given college for a year,</u> but it also may not. This depends on the actual usage and type of resources, as well as on the number of additional individuals from the same college who appear during the year. It is crucial to understand that allocation on HPCC is not restricted; our focus is solely on the completion of tasks.
# The free 11,520 CPU hours and 1,440 GPU hours are allocated per PI account and project and are available only for MAP-B and MAP-C (see below). These free hours are intended to facilitate project establishment. Consequently, the PI can request that a member or members of the group utilize the time and explore new research opportunities. For instance, upon creating an account the PI will receive free hours; if the PI hires graduate student(s), they may share these free hours if they work on the same project.
# It is important to note that each GPU requires at least four CPU cores to operate. Therefore, if a user requests one GPU thread, it equates to one GPU thread plus four CPU threads, which is equivalent to $0.15 per unit-hour for that unit (units are explained above). Note that not all GPUs support virtualization, so a unit may include the whole GPU depending on the GPU type used.

==== Compute on demand (CODP) ====
Users from colleges that do not participate in MAP (A, B, C) are charged '''$0.018 per CPU hour and $0.11 per GPU hour.''' There is no free time associated with CODP.

==== Lease a public node(s) ====
Users may lease node(s) for a project. This ensures 100% access and no time or job limitations on the leased resource. The minimum lease time is 30 days (one month). Longer leases (more than 90 days) receive a 10% discount. '''MAP users''' are charged between $172 and $950 per month (see below), depending on the type of node. '''Non-MAP''' users are charged between $249.58 and $1,399 per month, depending on the type of node. Please see below for details and examples.

=== Condo ===
Condo users only pay for infrastructure support. The annual fee depends on the type of node and ranges from $1,540 to $4,520 per year.

==== Lease on condo node(s) ====
Users may lease condo nodes for a project, ensuring them 100% access and eliminating any time or job limitations on the leased resource. The minimum lease duration '''is 30 days (one month).''' Longer leases (more than 90 days) receive a 10% discount. The lease of a condo node is contingent upon the owner of the required node agreeing to lease it for a specific time period and duration. Users interested in leasing a condo node of a particular type must contact the HPCC director for options. Lease fees vary depending on the type of node and are currently (note that prices are reviewed once every six months) between $230 and $1,100 per month.

=== Storage ===
Storage costs are $60 per TB per year, backup costs are $45 per TB per year, and archive costs are $35 per TB per year. The first 50 GB of scratch storage are free. Prices are calculated at the end of each month.

# Additionally, note that '''file transfers from/to HPCC are free''', but it is important to consider the CUNY network speed, which is significantly slower than modern standards; this will impact the time required to download large data sets. For large data sets HPCC uses, and recommends using, secure parallel transfer via Globus.
# All services associated with data and storage provided by HPCC are free for CUNY users.

=== HPCC access plans details and examples ===
a.     '''Minimum access (MAP):'''

Minimum Access is designed to provide extensive support for research activities across the colleges, to foster collaboration between institutions, to facilitate the establishment of new research projects, and to serve as a testing ground for innovative studies. MAP accounts operate under a stringent fair-share policy, which determines the actual waiting time for job allocation in a queue based on the resources utilized by that account in previous cycles. Furthermore, all jobs are subject to strict time constraints; consequently, extended jobs necessitate the implementation of checkpoints.

The MAP offers three tiers of access:

· A: The Basic tier incurs a yearly fee of $5,000. It is tailored to support users from colleges with limited research activities. The fee covers infrastructure expenses associated with one to two users from these colleges.

· B: The Medium tier incurs a yearly fee of $15,000. It covers infrastructure expenses for up to twelve users from these colleges. Additionally, every account under the Medium tier receives complimentary 11,520 CPU hours and 1,440 GPU hours upon account creation.

· C: The Advanced tier incurs a yearly fee of $25,000. It covers infrastructure expenses for all users from these colleges. Furthermore, every new account from this tier receives complimentary 11,520 CPU hours and 1,440 GPU hours upon account creation.

MAP users are charged per CPU/GPU hour at the low rate of '''<u>$0.015 per cpu hour and $0.09 per GPU hour</u>.''' The examples in the following table illustrate the fee structure:
{| class="wikitable mw-collapsible"
{| class="wikitable sortable"
|+Cost recovery fees for MAP users example
|+CUNY-HPCC resources
|Job
!System
|Cpu cores
!Type
|GPU
!Type of Jobs
|Cost/hour
!Number of Nodes
|-
!Cores/node
|1 core no GPU
!Chip Type
|1
!GPU/node
|0
!Memory/node (GB)
|$0.015/hour
|-
|16 cores no GPU
|4
|0
|$0.24/hour
|-
| 4 cores + 1 GPU  
|4
|1
|$0.15/hour
|-
|16 cores + 1 GPU
|16
|1  
|$0.33/hour
|-
|-
|Penzias
|16 cores + 2 GPU
|Hybrid Cluster
|Sequential and parallel jobs with/without GPU  
|66
|16
|16
 
|2  
|2 x Intel Sandy Bridge EP 2.20 GHz
|$0.42/hour
|2 x K20M (5 GB on board), PCIe 2.0
|64 GB
|-
|-
|Blue Moon
|32 cores + 2 GPU
|Hybrid cluster
|Sequential and parallel jobs with/without GPU
|24 (CPU)  & 2 (CPU + GPU)
|32
|32
|2  
|$0.66/hour
|-
|40 cores + 8 GPU
|40
|8
|$1.32/hour
|}


|2 x Intel Skylake 2.10 GHz


|2 x V100 (16 GB on board), PCIe 3.0
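As a worked example at the MAP rates above: a job that occupies 16 cores and 1 GPU for 24 hours costs 24 × $0.33 = $7.92, while the same 24 hours on 4 cores + 1 GPU (one GPU unit) costs 24 × $0.15 = $3.60.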
b.     '''Computing on demand (CODP)'''

The Computing on Demand Plan (CODP) is open to all users from CUNY colleges that do not participate in the MAP plan but want to use HPCC resources. CODP accounts operate under a strict fair-share policy, so the actual waiting time for a job in a queue depends on the resources previously used. In addition, all jobs have time limitations, so long jobs must use checkpoints. Users in CODP are charged for the time (CPU and GPU) per hour; the current rates are '''$0.018 per cpu hour and $0.11 per GPU hour.''' In contrast to MAP, new CODP accounts do not come with free time. Invoices are generated and sent to users (PI only) at the end of each month. The examples in the following table explain the fee structure:

{| class="wikitable mw-collapsible"
|+Cost recovery fees for CODP plan
!Job
!Cpu cores
!GPU
!Cost/hour
|-
|1 core, no GPU
|1
|0
|$0.018/hour
|-
|16 cores, no GPU
|16
|0
|$0.288/hour
|-
|4 cores + 1 GPU
|4
|1
|$0.293/hour
|-
|16 cores + 1 GPU
|16
|1
|$0.334/hour
|-
|32 cores + 1 GPU
|32
|1
|$0.666/hour
|-
|32 cores + 2 GPU
|32
|2
|$0.756/hour
|}
 
 
c.  '''Leasing public node(s) (LNP)'''

The leased-node plan allows users to lease node(s) for the duration of a project. The minimum lease time is 30 days (one month), but leases of any length are possible. A 10% discount is given to users whose lease is longer than 90 days; discounts cannot be combined. In contrast to MAP and CODP, LNP users do not compete for resources and have full access to the leased resources 24/7.

{| class="wikitable mw-collapsible"
|+Fees for leased node(s) for MAP users
!Job (MAP users)
!Cpu cores
!GPU
!Cost/30 days
|-
|1 core, no GPU
|1
|0
|NA
|-
|16 cores, no GPU
|16
|0
|$172.80
|-
|32 cores, no GPU
|32
|0
|$264.96
|-
|16 cores + 2 GPU
|16
|2
|$302.40
|-
|32 cores + 2 GPU
|32
|2
|$475.20
|-
|40 cores + 8 GPU
|40
|8
|$760.00
|-
|64 cores + 8 GPU
|64
|8
|$950.40
|}


{| class="wikitable mw-collapsible"
|+Fees for leased node(s) for <span style="color:red;">non</span>-MAP users
!Job (non-MAP users)
!Cpu cores
!GPU
!Cost/month
|-
|1 core, no GPU
|1
|0
|NA
|-
|16 cores, no GPU
|16
|0
|$249.82
|-
|32 cores, no GPU
|32
|0
|$497.64
|-
|16 cores + 1 GPU
|16
|1
|$443.23
|-
|32 cores + 2 GPU
|32
|2
|$886.64
|-
|40 cores + 8 GPU
|40
|8
|$1399.68
|}
d.     '''Condo Ownership (COP)'''

Condo describes a model in which user(s) own a node/server managed by HPCC. Only full-time faculty can own a condo node. Condo nodes are fully integrated into the HPCC infrastructure. The owners pay only HPCC's infrastructure support operational fee, which includes only a proportional part of the licenses and materials needed for day-to-day operations. The fees are reviewed twice a year and are currently $0.003 per CPU hour and $0.02 per GPU hour. Condo owners can "borrow" (upon agreement) free of charge any node(s) from the condo stack and can also lease (for a higher fee - see below) their own nodes to non-condo users; the minimum lease time is 30 days. The fees collected from non-condo users offset the payments of the owner.
{| class="wikitable mw-collapsible"
|+Condo owners costs per year
!Type of condo node
!Cpu cores
!GPU
!Cost/year
|-
|Large hybrid SXM
|128
|8
|$4518.92
|-
|Small hybrid
|48
|2
|$1540.54
|-
|Medium compute
|96
|0
|$2464.86
|-
|Large compute
|128
|0
|$3286.49
|}
 


Condo owners can contract their node(s) to other, non-condo users. The renting period is unlimited, with a minimum length of 30 days. The table below shows the payments that non-condo users pay the condo owners. These fees are accumulated in the owners' account(s) and offset the owners' dues. A 10% discount is applied for leases longer than 90 days.
{| class="wikitable mw-collapsible"
|+Type of nodes and monthly lease fees for condo nodes
!Type of node
!Renters cost/month
!Long term (90+ days) rent cost/month
!CPU/node
!CPU type
!GPU/node
!GPU type
!GPU interface
|-
|Large Hybrid
|$602.52
|$564.86
|128
|EPYC, 2.2 GHz
|8
|A100/80
|SXM
|-
|Small Hybrid
|$205.41
|$192.57
|48
|EPYC, 2.8 GHz
|2
|A40, A30, L40
|PCIe v4
|-
|Medium Non GPU
|$328.65
|$308.11
|96
|EPYC, 4.11 GHz
|None
| --
|NA
|-
|Large Non GPU
|$438.20
|$410.81
|128
|EPYC, 2.0 GHz
|None
| --
|NA
|}


=== Free time ===
Any new project '''from CUNY colleges/centers that participate in MAP-B or MAP-C''' is entitled to '''11,520 free CPU hours and 1,440 free GPU hours.''' Users under '''MAP-A''' are not entitled to free time. The free compute hours are intended to help establish a project and are therefore shared by all members of the project: the free hours can be used either by the PI or by any number of the project's members. It is important to note that <u>free time is per project, not per user account, so any project can receive free time only once. External collaborators to CUNY are not normally eligible for free time.</u> Additional hours beyond the free time are charged at the MAP plan rates. '''<u>Please contact the CUNY-HPCC director for further details.</u>'''

= Support for research grants =
'''<u>All proposals dated January 1st, 2026 (<span style="color:red;">01/01/26</span>) or later</u>''' that require computational resources '''<u>must include a budget for cost recovery fees at CUNY-HPCC.</u>''' For a project the PI can choose between the following options:

* Lease node(s). This is a useful option for well-defined projects and those with a high computational component requiring 100% availability of the computational resource.
* Use "on-demand" resources. This is a flexible option, good for experimental projects or for exploring new areas of study. The drawback is that the resources are shared among all users under the fair-share policy, so immediate access to a resource cannot be guaranteed.
* Participate in the CONDO tier. This is the most beneficial option in terms of availability of resources and level of support. It fits best the focused research of group(s) (e.g. materials science).

In all cases the PI can use the appropriate rates listed above to establish a correct budget for the proposal. The PI should '''<u>contact the Director of CUNY-HPCC, Dr. Alexander Tzanov</u>''' (alexander.tzanov@csi.cuny.edu) to discuss the project's computational requirements, including optimal and most economical computational workflows, suitable hardware, shared or owned resources, CUNY-HPCC support options, and any other matter concerning a correct and optimal computational budget for the proposal.
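As a hypothetical budgeting example: a project expecting to use about 100,000 CPU hours and 2,000 GPU hours per year on public resources under MAP would budget roughly 100,000 × $0.015 + 2,000 × $0.09 = $1,500 + $180 = $1,680 per year, which can then be compared against the leasing or condo options.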


= Partitions and jobs =
The only way to submit job(s) to the HPCC servers is through the SLURM batch system. Any job, regardless of its type (interactive, batch, serial, parallel, etc.), must be submitted via SLURM, which allocates the requested resources on the proper server and starts the job(s) according to a predefined, strict fair-share policy. Computational resources (cpu cores, memory, GPUs) are organized in '''partitions'''. Users are granted permission to use one or another partition together with the corresponding QOS key. The table below describes the partitions and their limitations (in progress).
{| class="wikitable mw-collapsible"
|+Partitions and their limits
!Partition
!Max cores/job
!Max jobs/user
!Total cores/group
!Time limits
!Tier
!GPU types
|-
|partnsf
|128
|50
|256
|240 Hours
|Advanced
|K20m, V100/16, A100/40
|-
|partchem
|128
|50
|256
|No limit
|Condo
|A100/80, A30
|-
|partcfd
|96
|50
|96
|No limit
|Condo
|A40
|-
|partsym
|96
|50
|96
|No limit
|Condo
|A30
|-
|partasrc
|48
|16
|16
|No limit
|Condo
|A30
|-
|partmatlabD
|128
|50
|256
|240 Hours
|Advanced
|V100/16, A100/40
|-
|partmatlabN
|384
|50
|384
|240 Hours
|Advanced
|None
|-
|partphys
|96
|50
|96
|No limit
|Condo
|L40
|}

* '''partnsf''' is the main partition, with assigned resources across all sub-servers. Users may submit sequential, thread-parallel or distributed parallel jobs, with or without GPU.
* '''partchem''' is a CONDO partition.
* '''partcfd''' is a CONDO partition.
* '''partphys''' is a CONDO partition.
* '''partsym''' is a CONDO partition.
* '''partasrc''' is a CONDO partition.
* '''partmatlabD''' allows running MATLAB's Distributed Parallel Server across the main cluster.
* '''partmatlabN''' provides access to the large MATLAB node with 384 cores and 11 TB of shared memory. It is useful for running parallel MATLAB jobs with the Parallel Toolbox.
* '''partdev''' is dedicated to development. All HPCC users have access to this partition, with assigned resources of one computational node with 16 cores, 64 GB of memory and 2 GPUs (K20m). This partition has a time limit of 4 hours.
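A sketch of a GPU job request is shown below; the partition, the QOS, and the executable are placeholders, and whether a <code>--qos</code> flag is required depends on the permissions granted with your account. Condo users would name their private partition instead.
<pre>
#!/bin/bash
#SBATCH --job-name=gpu_job
#SBATCH --partition=partnsf        # placeholder public partition
##SBATCH --qos=your_qos            # uncomment if a QOS key was assigned to you
#SBATCH --gres=gpu:1               # request one GPU on the node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4          # at least 4 CPU cores accompany each GPU (see cost recovery)
#SBATCH --time=12:00:00

cd /scratch/$USER/gpu_job
./gpu_program                      # placeholder GPU-enabled executable
</pre>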
= Hours of Operation =
In order to maximize the use of resources HPCC applies “rolling” maintenance scheme across all systems. When downtime is needed, HPCC will notify all users a week or more in advance (unless emergency situation occur).  Typically, the fourth Tuesday mornings in the month from 8:00AM to 12PM is normally reserved (but not always used) for scheduled maintenance. Please plan accordingly.  Unplanned maintenance to remedy system related problems may be scheduled as needed out of above mentioned days. Reasonable attempts will be made to inform users running on those systems when these needs arise. Note that users are strongly encouraged to use checkpoints in their jobs.


= User Support =
Users are strongly encouraged to read this Wiki carefully before submitting a ticket for help. In particular, the sections on compiling and running parallel programs, and the section on the SLURM batch queueing system, will give you the essential knowledge needed to use the CUNY HPCC systems. We have strived to maintain the most uniform user applications environment possible across the Center's systems to ease the transfer of applications and run scripts among them.


The CUNY HPC Center staff, along with outside vendors, offer regular courses and workshops to the CUNY community in parallel programming techniques, HPC computing architecture, and the essentials of using our systems. Please follow our mailings on the subject and feel free to inquire about such courses. We regularly schedule training visits and classes at the various CUNY campuses. Please let us know if such a training visit is of interest. Past topics have included an overview of parallel programming, GPU programming and architecture, using the evolutionary biology software at the HPC Center, the SLURM queueing system at the CUNY HPC Center, mixed GPU-MPI and OpenMP programming, and more. Staff have also presented guest lectures in formal classes throughout the CUNY campuses.


If you have problems accessing your account and cannot login to the ticketing service, please send an email to:
   [mailto:hpchelp@csi.cuny.edu hpchelp@csi.cuny.edu]  


= Warnings and modes of operation =




1. hpchelp@csi.cuny.edu is for questions and account-help communication '''only''' and does not accept tickets unless the ticketing system is not operational. For tickets, please use the ticketing system mentioned above. This ensures that the staff member with the most appropriate skill set and job-related responsibility will respond to your questions. During the business week you should expect a response within 48 hours, often even the same day. During the weekend you may not get any response.


2. '''E-mails to hpchelp@csi.cuny.edu must have a valid CUNY e-mail as the reply address.''' Messages originating from public mailers (Gmail, Hotmail, etc.) are filtered out.
the Wiki), and feel free to offer suggestions for improved service.  We hope and expect your experience in using
our systems will be predictably good and productive.
= User Manual =
The old version of the user manual provides PBS, not SLURM, batch scripts as examples. CUNY-HPCC currently uses the SLURM scheduler, so users must rely only on the updated brief SLURM manual distributed with new accounts, or ask CUNY-HPCC for a copy.

Latest revision as of 16:36, 19 April 2026

Hpcc-panorama3.png

Mission of CUNY-HPCC

The City University of New York (CUNY) High Performance Computing Center (HPCC) serves as a pivotal research and educational hub for the university. Situated on the campus of the College of Staten Island, located at 2800 Victory Boulevard, Staten Island, New York 10314, the center’s primary objective is to enhance educational opportunities and foster scientific research and discovery within the university. This is achieved through the management of state-of-the-art computing infrastructure and the provision of comprehensive research support services. Notably, CUNY-HPCC offers domain-specific expertise in various aspects of computationally intensive research. Furthermore, CUNY’s membership in the Empire AI (EAI) consortium positions CUNY-HPCC as a stepping stone for CUNY researchers seeking access to EAI advanced facilities.

Empire AI and CUNY-HPCC

The Empire AI Consortium comprises the CUNY Graduate Center, Columbia University, Cornell University, Icahn School of Medicine, New York University, Rochester Institute of Technology, Rensselaer Polytechnic Institute, the State University of New York, University at Buffalo, and University of Rochester. CUNY-HPCC provides support and maintains tickets for all CUNY users with allocation on EAI. Additionally, CUNY-HPCC serves as a stepping stone for CUNY researchers as it operates (on a smaller scale) architectures (including nodes with Hopper) similar to EAI, including extended “Alpha” servers and new “Beta” computers. The latter will consist of 288 B200 GPUs and recently added RTX 6000 Pro nodes. The anticipated cost for EAI is $0.50 per unit (SU), which will provide CUNY PIs with a rate that is significantly lower than a typical AWS rate. One SU corresponds to one hour of H100 compute, and one hour of B200 compute corresponds to two SU. In comparison, the CUNY-HPCC recovery costs for public servers are $0.015 per CPU hour (1 unit) and $0.09 per GPU hour (6 units). For further details, please refer to the section on HPCC access plans.

CUNY-HPCC services

CUNY-HPCC offers a professionally maintained, modern computational environment with diverse architectures, advanced storage, and fast interconnects. CUNY-HPCC serves the following purposes:

* Supports research computing at CUNY, benefiting faculty, their collaborators at other universities, and their public and private sector partners. It also supports CUNY students and research staff.

* Provides state-of-the-art computing resources and comprehensive research support services, including expertise and full support for users with allocation on EMPIRE-AI.

* Creates opportunities for the CUNY research community to establish new partnerships with the government and private sectors.

* Utilizes HPCC capabilities to acquire additional research resources for its faculty and graduate students in existing and major new programs.

* Maintains tickets for all CUNY users with allocation on EAI.

Organization of systems and data storage (architecture)

All user data and project data are kept on Parallel File System Storage (PFSS), which is mounted only on the login node(s) of all servers. It holds both user directories and a specific partition called /scratch (see below). The main features of the /scratch partition are: (1) it is mounted on all computational nodes and on all login nodes; (2) it is fast; (3) it is temporary space: it is not a home directory for accounts, nor can it be used for long-term data preservation. Users must use the "staging" procedure described below to ensure preservation of their data, codes, and parameter files. The figure below is a schematic of the environment.

Upon registering with HPCC every user will get 2 directories:

/scratch/<userid> – temporary workspace on the HPC systems. Currently, scratch resides on the same file system as /global/u.
/global/u/<userid> – the “home directory”, i.e., storage space on the DSMS for programs, scripts, and data.
• In some instances a user will also have use of disk space on the DSMS in /cunyZone/home/<projectid> (iRODS).
HPCC structure.png

The /global/u/<userid> directory has a quota (see below for details), while /scratch/<userid> does not. However, the /scratch space is cleaned up following the rules described below. There are no guarantees of any kind that files in /scratch will be preserved through hardware crashes or cleanup. Access to all HPCC resources is provided by the bastion host called Chizen. The Data Transfer Node, called Cea, allows file transfer from/to remote sites directly to/from /global/u/<userid> or /scratch/<userid>.
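A typical staging workflow is sketched below. All directory and file names are illustrative only; the point is that input is copied from the preserved /global/u area to /scratch before the run, and results are copied back afterwards because /scratch is temporary.

 # Illustrative staging sketch; adjust paths and program names to your project.
 WORKDIR=/scratch/$USER/myproject_run1
 mkdir -p "$WORKDIR"
 # Stage input from the permanent home area to the fast scratch space.
 cp -r /global/u/$USER/myproject/input "$WORKDIR"/
 cd "$WORKDIR"
 ./run_analysis input/ > results.out    # placeholder for the actual computation
 # Copy results back to /global/u, which is quota-controlled but preserved.
 cp results.out /global/u/$USER/myproject/results/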

Computing architectures at HPCC

The HPC Center employs a diverse range of architectures to accommodate intricate and demanding workflows. All computational resources are consolidated into a single hybrid cluster known as Arrow. This cluster comprises symmetric multiprocessor (SMP) nodes with and without GPUs, distributed shared memory (NUMA) node(s), high-memory nodes, and advanced SMP nodes featuring multiple GPUs. The number of GPUs per node varies between two and eight, as do the GPU interface and GPU family. The basic GPU nodes are equipped with two Tesla K20m GPUs connected via the PCIe interface, while the most advanced nodes support eight Ampere A100 GPUs connected via the SXM interface.

Overview of Computational architectures:

SMP servers have several processors (working under a single operating system) which "share everything": all CPU cores address a common memory block via a shared bus or data path. SMP servers support any combination of memory vs. CPU (up to the limits of the particular computer). They are commonly used to run sequential or thread-parallel (e.g. OpenMP) jobs and may or may not have GPUs.

A cluster is defined as a single system comprising a set of servers interconnected with a high-performance network. Specific software coordinates programs on and across these servers in order to perform computationally intensive tasks. The most common cluster type consists of several identical SMP servers connected via a fast interconnect. Each SMP member of the cluster is called a node. All nodes run independent copies of the same operating system (OS). Some or all of the nodes may incorporate GPUs.

Hybrid clusters combine nodes of different architectures. For instance, the main CUNY-HPCC machine is a hybrid cluster called Arrow. Sixty-two (62) of its nodes are identical GPU-enabled SMP servers, each with 2 x K20m GPUs; 3 are SMP nodes with extended memory (fat nodes); one node is a distributed shared memory node (NUMA, see below); and 2 are fat SMP servers specially designed to support 8 NVIDIA GPUs per node, connected via the SXM interface. In addition, HPCC operates the cluster Herbert, dedicated only to education.

A distributed shared memory computer is a tightly coupled server in which the memory is physically distributed but logically unified as a single block. The system resembles an SMP, but the possible number of CPU cores and amount of memory are far beyond the limits of SMP. Because the memory is distributed, access times across the address space are non-uniform; hence this architecture is called Non-Uniform Memory Access (NUMA). Like SMP systems, NUMA systems are typically used for applications such as data mining and decision support, in which processing can be parceled out to a number of processors that collectively work on common data. HPCC operates the NUMA node of Arrow, named Appel. This node does not have GPUs.
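For thread-parallel (e.g. OpenMP) work on an SMP or NUMA node, the request is usually expressed as a single task with many CPUs. The sketch below is illustrative only: the partition, thread count, and program name are assumptions, and the right partition for your access should be taken from the partition table later on this page.

 #!/bin/bash
 #SBATCH --partition=partmatlabN      # example only: the partition serving the large shared-memory node
 #SBATCH --nodes=1
 #SBATCH --ntasks=1
 #SBATCH --cpus-per-task=32           # illustrative thread count
 #SBATCH --time=04:00:00
 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
 ./my_openmp_program                  # placeholder thread-parallel executable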

Infrastructure systems:

o Master Head Node (MHN/Arrow) is a redundant login node from which all jobs on all servers start. This server is not directly accessible from outside the CSI campus. Note that the main server and its login nodes share the same name, Arrow, so users can access the Arrow login nodes under the name Arrow or MHN.

o Chizen is a redundant gateway server which provides access to the protected HPCC domain.

o Cea is a file transfer node allowing transfer of files between users’ computers and /scratch or /global/u/<userid>. Cea is accessible directly (not only via Chizen), but allows only a limited set of shell commands.
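For example, files can be moved through Cea with standard tools such as scp or rsync. The hostname below is a placeholder, not the real address; use the Cea address provided with your account.

 # Copy a local input archive directly into scratch space via the data transfer node.
 # "cea.example.cuny.edu" is a placeholder hostname.
 scp ./input_data.tar.gz <userid>@cea.example.cuny.edu:/scratch/<userid>/
 # Pull results back from the home area; rsync can resume interrupted transfers.
 rsync -avP <userid>@cea.example.cuny.edu:/global/u/<userid>/myproject/results/ ./results/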

Table 1 below provides a quick summary of the attributes of each of the sub-clusters of the main HPC Center system, Arrow.

{|
|+ Table 1. Sub-clusters of the main HPC Center system (Master Head Node: Arrow)
! Sub-system !! Tier !! Type !! Type of jobs !! Nodes !! CPU cores !! GPUs !! Mem/node !! Mem/core !! Chip type !! GPU type and interface
|-
| Penzias || Advanced || Hybrid cluster || Sequential & parallel jobs w/wo GPU || 66 || 16 || 2 || 64 GB || 4 GB || SB, EP 2.20 GHz || K20m GPU, PCIe v2
|-
| || || || Sequential & parallel jobs || 1 || 24 || - || 1500 GB || 62 GB || HL, 2.30 GHz || -
|-
| || || || || 1 || 36 || - || 768 GB || 21 GB || || -
|-
| || || || || 1 || 24 || - || 768 GB || 32 GB || || -
|-
| Appel || || NUMA || Massive parallel, sequential, OpenMP || 1 || 384 || - || 11 TB || 28 GB || IB, 3 GHz || -
|-
| Cryo || || SMP || Sequential and parallel jobs, with GPU || 1 || 40 || 8 || 1500 GB || 37 GB || SL, 2.40 GHz || V100 (32 GB) GPU, SXM
|-
| Blue Moon || || Hybrid cluster || Sequential and parallel jobs w/wo GPU || 24 || 32 || - || 192 GB || 6 GB || SL, 2.10 GHz || -
|-
| || || || || 2 || 32 || 2 || || || || V100 (16 GB) GPU, PCIe
|-
| Karle || || SMP || Visualization, MATLAB/Mathematica || 1 || 36* || - || 768 GB || 21 GB || HL, 2.30 GHz || -
|-
| Chizen || || Gateway || No jobs allowed || || || || || || || -
|-
| CFD || Condo || SMP || Parallel, sequential, OpenMP || 1 || 48 || 2 || 768 GB || || EM, 4.8 GHz || A40, PCIe v4
|-
| || || || || 1 || 48 || - || 512 GB || || ER, 4.3 GHz || -
|-
| PHYS || Condo || SMP || || 1 || 48 || 2 || 640 GB || || ER, 4 GHz || L40, PCIe v4
|-
| || || || || 1 || 48 || - || 512 GB || || ER, 4.3 GHz || -
|-
| CHEM || Condo || SMP || || 1 || 48 || 2 || 256 GB || || EM, 2.8 GHz || A30, PCIe v4
|-
| || || || || 1 || 128 || 8 || 512 GB || || ER, 2.0 GHz || A100/40, SXM
|-
| ASRC || Condo || SMP || || 1 || 48 || 2 || 256 GB || || ER, 2.8 GHz || A30, PCIe v4
|}

Note: SB = Intel(R) Sandy Bridge, HL = Intel (R) Haswell, IB = Intel (R) Ivy Bridge, SL = Intel (R) Xeon(R) Gold, ER = AMD(R) EPYC ROMA, EM = AMD(R) EPYC MILAN, EG = AMD (R) EPYC GENOA

Recovery of operational costs

CUNY-HPCC, a not-for-profit core research facility affiliated with CUNY, is dedicated to supporting a wide range of research endeavors that necessitate advanced computational resources. Notably, CUNY-HPCC’s operations are not directly or indirectly funded by CUNY or the College of Staten Island (CSI). Consequently, CUNY-HPCC employs a cost recovery model that exclusively recoups operational expenses, without generating any profit for the HPCC. The recovered costs are meticulously calculated using comprehensive documentation of actual operational expenditures and are designed to achieve a break-even point for all CUNY users. This methodology is approved by CUNY-RF and is employed in other CUNY research facilities. The cost recovery charging schema is based on unit-hour usage, encompassing both CPU and GPU units. Definitions for these units are provided in the accompanying table.

{|
|+ Definitions of unit-hour
! Type of resource !! Unit-hour !! For V100, A30, A40 or L40 !! For A100
|-
| CPU unit || 1 CPU core/hour || -- || --
|-
| GPU unit || (4 CPU cores + 1 GPU thread)/hour || 4 CPU cores + 1 GPU || 4 CPU cores and 1/7 A100
|}

Compute on public resources

The cost recovery model for public (non-condominium) servers offers the following options:

  1. Minimal Access Plan (MAP)
  2. Compute on Demand (CODP)
  3. Lease a node(s)

Minimal Access Plan (MAP)

Colleges may participate in any of the Minimal Access Plan tiers. The tiered pricing structure is as follows:

- Tier A: $5,000 per year

- Tier B: $10,000 per year

- Tier C: $25,000 per year

Within each tier, the cost is $0.015 per CPU hour and $0.09 per GPU hour. It is important to note that the Minimum Access Plan (MAP) tiers A, B, and C are not all-inclusive. Therefore, even if a college pays for a higher tier, this does not guarantee unlimited use by all of the college's employees, faculty, and students throughout the year.

  1. The MAP plan is indirectly linked to the number of hours used. Consequently, the definition of “up to 12 users for the B tier” should not be interpreted as “all users, up to 12 per college, receive unlimited access to HPCC resources.”
  2. The number of users per tier is determined by statistical analysis of resource usage and of the average duration of a job across all CUNY institutions. This means that if usage from a college exceeds the number of hours encoded in the MAP fee, the additional hours will be charged at the preferred rate of $0.015 per CPU hour and $0.09 per GPU thread hour.
  3. Furthermore, the MAP fee may fully cover the expenses for individuals from a given college for a year, but it may also not. This depends on the actual usage and type of resources, as well as the number of additional individuals from the same college who appear during the year. It is crucial to understand that allocation on HPCC is not restricted; our focus is solely on the completion of tasks.
  4. The free 11,520 CPU hours and 1,440 GPU hours are allocated per PI account and project and are available only for MAP-B and MAP-C (see below). These free hours are intended to facilitate project establishment. Consequently, the PI can request that one or more members of the group use this time to explore new research opportunities. For instance, upon creating an account the PI will receive free hours; if the PI hires graduate students, they may share these free hours provided they work on the same project.
  5. It is important to note that each GPU requires at least four CPU cores to operate. Therefore, a request for one GPU thread equates to one GPU thread plus four CPU threads, which is $0.15 per unit hour for that unit (units are explained above). Note that not all GPUs support virtualization, so a unit may include the whole GPU, depending on the GPU type used.

Compute on demand (CODP)

Users from colleges that do not participate in MAP (A,B,C) are charged $0.018 per CPU hour and $0.11 per GPU hour. There is no free time associated with CODP.

Lease a public node(s)

Users may lease node(s) for a project. This ensures 100% access and no time or job limitations on the leased resources. The minimum lease time is 30 days (one month). Longer leases (more than 90 days) receive a 10% discount. MAP users are charged between $172 and $950 per month (see below), depending on the type of node. Non-MAP users are charged between $249.58 and $1,399 per month, depending on the type of node. Please see below for details and examples.

Condo

Condo users only pay for infrastructure support. The annual fee depends on the type of node and ranges from $1,540 to $4,520 per year.

Lease on condo node(s)

Users may lease condo nodes for a project, ensuring them 100% access and eliminating any time or job limitations on the leased resource. The minimum lease duration is 30 days (one month). Longer leases (more than 90 days) receive a 10% discount. The lease of a condo node is contingent upon the owner of the required node agreeing to lease for a specific time period and duration. Users interested in leasing a condo node of a particular type must contact the HPCC director for options. Lease fees vary depending on the type of node and are currently (note that prices are reviewed once every six months) between $230 and $1100 per month.  

Storage

Storage costs are $60 per TB per year, backup costs are $45 per TB per year, and archive costs are $35 per TB per year. The first 50 GB of scratch storage are free. Prices are calculated at the end of each month.

  1. Additionally, note that file transfers from/to HPCC are free, but it is important to consider the CUNY network speed, which is significantly slower than modern standards. This will impact the time required to download large data sets. For large data sets, HPCC uses and recommends secure parallel transfers via Globus.
  2. All services associated with data and storage provided by HPCC are free for CUNY users.

HPCC access plans details and examples

a.     Minimum access (MAP):

Minimum Access is designed to provide extensive support for research activities across various colleges, foster collaboration between institutions, facilitate the establishment of new research projects, and serve as a testing ground for innovative studies. MAP accounts operate under a stringent fair share policy, which determines the actual waiting time for job allocation in a queue based on the resources utilized by that account in previous cycles. Furthermore, all jobs are subject to strict time constraints. Consequently, extended jobs necessitate the implementation of checkpoints.

The MAP offers three tiers of access:

· A: The Basic tier incurs a yearly fee of $5,000. It is tailored to support users from colleges with limited research activities. The fee covers infrastructure expenses associated with one to two users from these colleges.

· B: The Medium tier incurs a yearly fee of $15,000. It covers infrastructure expenses for up to twelve users from these colleges. Additionally, every account under the Medium tier receives complimentary 11,520 CPU hours and 1,440 GPU hours upon account creation.

· C: The Advanced tier incurs a yearly fee of $25,000. It covers infrastructure expenses for all users from these colleges. Furthermore, every new account from this tier receives complimentary 11,520 CPU hours and 1,440 GPU hours upon account creation.

MAP users are charged per CPU/GPU hour at a low rate of $0.015 per CPU hour and $0.09 per GPU hour.

{|
|+ Cost recovery fees for MAP users (examples)
! Job !! CPU cores !! GPUs !! Cost/hour
|-
| 1 core, no GPU || 1 || 0 || $0.015/hour
|-
| 16 cores, no GPU || 16 || 0 || $0.24/hour
|-
| 4 cores + 1 GPU || 4 || 1 || $0.15/hour
|-
| 16 cores + 1 GPU || 16 || 1 || $0.33/hour
|-
| 16 cores + 2 GPU || 16 || 2 || $0.42/hour
|-
| 32 cores + 2 GPU || 32 || 2 || $0.66/hour
|-
| 40 cores + 8 GPU || 40 || 8 || $1.32/hour
|}
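The per-hour figures in the MAP table follow directly from the rates above: cost/hour = (CPU cores × $0.015) + (GPUs × $0.09). The small shell helper below, shown only as an illustration, reproduces the table values.

 # Illustrative helper: MAP cost per hour for a given number of cores and GPUs.
 map_cost() {
   awk -v c="$1" -v g="$2" 'BEGIN { printf "$%.3f/hour\n", c*0.015 + g*0.09 }'
 }
 map_cost 16 2    # prints $0.420/hour, matching the "16 cores + 2 GPU" row
 map_cost 40 8    # prints $1.320/hour, matching the "40 cores + 8 GPU" row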


b.     Computing on demand (CODP)

The Computing on Demand Plan (CODP) is open to all users from CUNY colleges that do not participate in the MAP plan but want to use HPCC resources. CODP accounts operate under a strict fair share policy, so the actual waiting time for a job in a queue depends on the resources previously used. In addition, all jobs have time limitations, so long jobs must use checkpoints. CODP users are charged for the time (CPU and GPU) per hour; the current rates are $0.018 per CPU hour and $0.11 per GPU hour. Unlike MAP, new CODP accounts do not come with free time. Invoices are generated and sent to users (PIs only) at the end of each month. The examples in the following table explain the fee structure:

{|
|+ Cost recovery fees for the CODP plan
! Job !! CPU cores !! GPUs !! Cost/hour
|-
| 1 core, no GPU || 1 || 0 || $0.018/hour
|-
| 16 cores, no GPU || 16 || 0 || $0.288/hour
|-
| 4 cores + 1 GPU || 4 || 1 || $0.293/hour
|-
| 16 cores + 1 GPU || 16 || 1 || $0.334/hour
|-
| 32 cores + 1 GPU || 32 || 1 || $0.666/hour
|-
| 32 cores + 2 GPU || 32 || 2 || $0.756/hour
|}


c. Leasing public node(s) (LNP)

The lease-node plan allows users to lease node(s) for the duration of a project. The minimum lease time is 30 days (one month), but leases of any length are possible. A 10% discount is given to users whose lease is longer than 90 days; discounts cannot be combined. Unlike MAP and CODP users, LNP users do not compete for resources and have full access to the rented resources 24/7.

{|
|+ Fees for leased node(s) for MAP users
! Job (MAP users) !! CPU cores !! GPUs !! Cost/30 days
|-
| 1 core, no GPU || 1 || 0 || NA
|-
| 16 cores, no GPU || 16 || 0 || $172.80
|-
| 32 cores, no GPU || 32 || 0 || $264.96
|-
| 16 cores + 2 GPU || 16 || 2 || $302.40
|-
| 32 cores + 2 GPU || 32 || 2 || $475.20
|-
| 40 cores + 8 GPU || 40 || 8 || $760.00
|-
| 64 cores + 8 GPU || 64 || 8 || $950.40
|}

{|
|+ Fees for leased node(s) for non-MAP users
! Job (non-MAP users) !! CPU cores !! GPUs !! Cost/month
|-
| 1 core, no GPU || 1 || 0 || NA
|-
| 16 cores, no GPU || 16 || 0 || $249.82
|-
| 32 cores, no GPU || 32 || 0 || $497.64
|-
| 16 cores + 1 GPU || 16 || 1 || $443.23
|-
| 32 cores + 2 GPU || 32 || 2 || $886.64
|-
| 40 cores + 8 GPU || 40 || 8 || $1,399.68
|}


d.     Condo Ownership (COP)

Condo describes a model in which user(s) own a node/server managed by HPCC. Only full-time faculty can own a condo node. Condo nodes are fully integrated into the HPCC infrastructure. The owners pay only HPCC’s infrastructure-support operational fee, which covers a proportional share of the licenses and materials needed for day-to-day operations. The fees are reviewed twice a year and are currently $0.003 per CPU hour and $0.02 per GPU hour. Condo owners can “borrow” (upon agreement), free of charge, any node(s) from the condo stack and can also lease (for a higher fee – see below) their own nodes to non-condo users. The minimum lease time is 30 days. The fees collected from non-condo users offset the owner's payments.

{|
|+ Condo owner costs per year
! Type of condo node !! CPU cores !! GPUs !! Cost/year
|-
| Large hybrid SXM || 128 || 8 || $4,518.92
|-
| Small hybrid || 48 || 2 || $1,540.54
|-
| Medium compute || 96 || 0 || $2,464.86
|-
| Large compute || 128 || 0 || $3,286.49
|}


Condo owners can rent their node(s) to non-condo users. The rental period is unlimited, with a minimum length of 30 days. The table below shows the payments that non-condo users make to the condo owners. These fees accumulate in the owner's account and offset the owner's obligations. A discount of 10% applies to leases longer than 90 days.

{|
|+ Types of nodes and monthly lease fees for condo nodes
! Type of node !! Renter's cost/month !! Long-term (90+ days) cost/month !! CPUs/node !! CPU type !! GPUs/node !! GPU type !! GPU interface
|-
| Large hybrid || $602.52 || $564.86 || 128 || EPYC, 2.2 GHz || 8 || A100/80 || SXM
|-
| Small hybrid || $205.41 || $192.57 || 48 || EPYC, 2.8 GHz || 2 || A40, A30, L40 || PCIe v4
|-
| Medium non-GPU || $328.65 || $308.11 || 96 || EPYC, 4.11 GHz || 0 || None || NA
|-
| Large non-GPU || $438.20 || $410.81 || 128 || EPYC, 2.0 GHz || 0 || None || NA
|}

Free time

Any new project from a CUNY college or center that participates in MAP-B or MAP-C is entitled to 11,520 free CPU hours and 1,440 free GPU hours. Users under MAP-A are not entitled to free time. The free compute hours are intended to help establish a project and are therefore shared by all members of the project: they can be used either by the PI or by any number of the project's members. It is important to note that free time is per project, not per user account, so any project can receive free time only once. External collaborators of CUNY are not normally eligible for free time. Additional hours beyond the free time are charged at MAP plan rates. Please contact the CUNY-HPCC director for further details.

Support for research grants

All proposals dated January 1, 2026 (01/01/26) or later that require computational resources must include a budget for cost recovery fees at CUNY-HPCC. For a project, the PI can choose between the following options:

  • leasing node(s). This is a useful option for well-defined projects and those with a large computational component requiring 100% availability of the computational resource.
  • using "on-demand" resources. This is a flexible option suited to experimental projects or exploring new areas of study. The downside is that resources are shared among all users under the fair share policy, so immediate access to a resource cannot be guaranteed.
  • participating in the CONDO tier. This is the most beneficial option in terms of availability of resources and level of support. It best fits the focused research of a group (e.g. materials science).

In all cases the PI can use the appropriate rates listed above to establish a correct budget for the proposal. The PI should contact the Director of CUNY-HPCC, Dr. Alexander Tzanov (alexander.tzanov@csi.cuny.edu), to discuss the project's computational requirements, including optimal and most economical computational workflows, suitable hardware, shared or owned resources, CUNY-HPCC support options, and any other matter concerning a correct and optimal computational budget for the proposal.

Partitions and jobs

The only way to submit job(s) to HPCC servers is through the SLURM batch system. Any job, regardless of its type (interactive, batch, serial, parallel, etc.), must be submitted via SLURM, which allocates the requested resources on the proper server and starts the job(s) according to a predefined strict fair share policy. Computational resources (CPU cores, memory, GPUs) are organized in partitions. Users are granted permission to use one or another partition and the corresponding QOS key. The table below describes the partitions and their limitations (in progress).

{|
|+ Partition limits
! Partition !! Max cores/job !! Max jobs/user !! Total cores/group !! Time limit !! Tier !! GPU types
|-
| partnsf || 128 || 50 || 256 || 240 hours || Advanced || K20m, V100/16, A100/40
|-
| partchem || 128 || 50 || 256 || No limit || Condo || A100/80, A30
|-
| partcfd || 96 || 50 || 96 || No limit || Condo || A40
|-
| partsym || 96 || 50 || 96 || No limit || Condo || A30
|-
| partasrc || 48 || 16 || 16 || No limit || Condo || A30
|-
| partmatlabD || 128 || 50 || 256 || 240 hours || Advanced || V100/16, A100/40
|-
| partmatlabN || 384 || 50 || 384 || 240 hours || Advanced || None
|-
| partphys || 96 || 50 || 96 || No limit || Condo || L40
|}
  • partnsf is the main partition with assigned resources across all sub-servers. Users may submit sequential, thread-parallel, or distributed-parallel jobs, with or without GPU.
  • partchem is a CONDO partition.
  • partphys is a CONDO partition.
  • partsym is a CONDO partition.
  • partasrc is a CONDO partition.
  • partmatlabD allows running MATLAB's Distributed Parallel Server across the main cluster.
  • partmatlabN gives access to the large MATLAB node with 384 cores and 11 TB of shared memory. It is useful for running parallel MATLAB jobs with the Parallel Toolbox.
  • partdev is dedicated to development. All HPCC users have access to this partition, with assigned resources of one computational node with 16 cores, 64 GB of memory, and 2 GPUs (K20m). This partition has a time limit of 4 hours (see the interactive example after this list).
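For quick testing on the development partition, an interactive allocation is often more convenient than a batch job. The line below is a sketch; the requested cores, GPU, and session length are illustrative and must stay within the 4-hour limit of partdev.

 # Request an interactive shell on the development partition.
 srun --partition=partdev --ntasks=4 --gres=gpu:1 --time=01:00:00 --pty /bin/bash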

Hours of Operation

To maximize the use of resources, HPCC applies a “rolling” maintenance scheme across all systems. When downtime is needed, HPCC will notify all users a week or more in advance (unless an emergency situation occurs). The fourth Tuesday morning of the month, from 8:00 AM to 12:00 PM, is normally reserved (but not always used) for scheduled maintenance. Please plan accordingly. Unplanned maintenance to remedy system-related problems may be scheduled as needed outside the above-mentioned window. Reasonable attempts will be made to inform users running on the affected systems when these needs arise. Note that users are strongly encouraged to use checkpoints in their jobs.

User Support

Users are strongly encouraged to read this Wiki carefully before submitting ticket(s) for help. In particular, the sections on compiling and running parallel programs, and the section on the SLURM batch queueing system will give you the essential knowledge needed to use the CUNY HPCC systems. We have strived to maintain the most uniform user applications environment possible across the Center's systems to ease the transfer of applications and run scripts among them.

The CUNY HPC Center staff, along with outside vendors, offer regular courses and workshops to the CUNY community in parallel programming techniques, HPC computing architecture, and the essentials of using our systems. Please follow our mailings on the subject and feel free to inquire about such courses. We regularly schedule training visits and classes at the various CUNY campuses. Please let us know if such a training visit is of interest. Past topics have included an overview of parallel programming, GPU programming and architecture, using the evolutionary biology software at the HPC Center, the SLURM queueing system at the CUNY HPC Center, mixed GPU-MPI and OpenMP programming, and more. Staff have also presented guest lectures in formal classes throughout the CUNY campuses.

If you have problems accessing your account and cannot login to the ticketing service, please send an email to:

 hpchelp@csi.cuny.edu 

Warnings and modes of operation

1. hpchelp@csi.cuny.edu is for questions and account-help communication only and does not accept tickets unless the ticketing system is not operational. For tickets, please use the ticketing system mentioned above. This ensures that the staff member with the most appropriate skill set and job-related responsibility will respond to your questions. During the business week you should expect a response within 48 hours, often even the same day. During the weekend you may not get any response.

2. E-mails to hpchelp@csi.cuny.edu must have a valid CUNY e-mail as the reply address. Messages originating from public mailers (Gmail, Hotmail, etc.) are filtered out.

3. Do not send questions to individual CUNY HPC Center staff members directly. These will be returned to the sender with a polite request to submit a ticket or email the Helpline. This applies to replies to initial questions as well.

The CUNY HPC Center staff members are focused on providing high quality support to its user community, but compared to other HPC Centers of similar size our staff is extremely lean. Please make full use of the tools that we have provided (especially the Wiki), and feel free to offer suggestions for improved service. We hope and expect your experience in using our systems will be predictably good and productive.

User Manual

The old version of the user manual provides PBS, not SLURM, batch scripts as examples. CUNY-HPCC currently uses the SLURM scheduler, so users must rely only on the updated brief SLURM manual distributed with new accounts, or ask CUNY-HPCC for a copy.
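Until you obtain the updated manual, the sketch below shows the general shape of a SLURM batch script that replaces an old PBS one. Every value is a placeholder to be adapted from the current SLURM documentation for your partition and software.

 #!/bin/bash
 #SBATCH --job-name=example           # replaces "#PBS -N example" in old PBS scripts
 #SBATCH --partition=partnsf          # replaces the PBS queue selection (-q)
 #SBATCH --ntasks=1                   # replaces "#PBS -l nodes=..." style resource requests
 #SBATCH --time=01:00:00              # replaces "#PBS -l walltime=01:00:00"
 ./my_program                         # placeholder executable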