[[File:CUNY-HPCC-HEADER-LOGO.jpg|center|frameless|789x789px]]
[[File:Hpcc-panorama3.png|center|frameless|1000x1000px]]
__TOC__
The City University of New York (CUNY) High Performance Computing Center (HPCC) is located on the campus of the College of Staten Island, 2800 Victory Boulevard, Staten Island, New York 10314. The CUNY-HPCC supports computational research and computationally intensive courses at the graduate and undergraduate levels offered at all CUNY colleges, in fields such as Computer Science, Engineering, Bioinformatics, Chemistry, Physics, Materials Science, Genetics, Genomics, Proteomics, Computational Biology, Finance, Economics, Linguistics, Anthropology, Psychology, Neuroscience, Computational Fluid Mechanics and many others. CUNY-HPCC provides educational outreach to local schools and supports undergraduates who work in the research programs of the host institution (e.g., the NSF REU program). The primary mission of CUNY-HPCC is:
 


* To enable advanced research and scholarship at CUNY colleges by providing faculty, staff, and students with access to high-performance computing, adequate storage and visualization resources;
* To enable advanced education and cross-disciplinary education by providing flexible and scalable resources;
* To provide CUNY faculty and their collaborators at other universities, CUNY research staff, and CUNY graduate and undergraduate students with expertise in scientific computing, parallel scientific computing (HPC), software development, advanced data analytics, data-driven science and simulation science, visualization, advanced database engineering, and more;
* To leverage the HPC Center's capabilities to acquire additional research resources for CUNY faculty, researchers and students in existing and major new programs;
* To create opportunities for the CUNY research community to win grants from national funding institutions and to develop new partnerships with the government and private sectors.
CUNY-HPCC is a voting member of the '''Coalition for Academic Scientific Computation (CASC)'''. Originally formed in the 1990s as a small group of the heads of national supercomputing centers, CASC has expanded to more than 100 member institutions representing many of the nation’s most forward-thinking universities and computing centers. CASC includes the leadership of large academic computing centers such as TACC and the San Diego Supercomputer Center, and has recently attracted a greater diversity of smaller institutions (non-R1s, HBCUs, HSIs, TCUs, etc.). CASC’s mission is to be “''dedicated to advocating for the use of the most advanced computing technology to accelerate scientific discovery for national competitiveness, global security, and economic success, as well as develop a diverse and well-prepared 21st century workforce''.”


== CUNY-HPCC - Democratization of Research ==
In the last few years the model of cloud computing (also called computing-on-demand) has promised that anyone, no matter where the user is, can leverage almost unlimited computing resources. This model was supposed to “democratize” research and level the playing field. Unfortunately, that is not entirely true (for now): cloud computing, even though it is available to nearly anyone from nearly anywhere, remains expensive compared to local resources and lacks the flexibility and accessibility of local support tailored to education and research that a local research HPC center offers. Indeed, every computational environment has limitations and a learning curve, so students and faculty coming from a variety of backgrounds might feel crushed and helpless without close and personalized local support. In this sense the carefully designed, user-centered, academically focused CUNY-HPCC has the transformative capability for rapidly evolving computation and data-driven research, creates opportunities for broad collaboration and convergence research activities, and thus provides real democratization of research.


== Pedagogical Value of CUNY-HPCC ==
CUNY-HPCC supports a wide variety of graduate and undergraduate classes from all CUNY colleges, the Graduate Center and the institutes. It is important to mention that the CUNY-HPCC impact goes beyond the STEM disciplines. Thus CUNY-HPCC:
[[File:NAnoBio6.jpg|right|frameless|Dr. Alexander Tzanov, the director of CUNY-HPCC, speaks at a NanoBioNYC workshop]]
* '''Allows analysis of datasets that are too large to work with easily on personal devices, or that cannot easily be shared or disseminated.''' These datasets do not come only from STEM fields (e.g. finance, economics, linguistics). Facilitating these analyses gives students the opportunity to interact in real time with increasingly large amounts of data, enabling them to gain important skills and experience.
* '''CUNY-HPCC provides a collaborative space for entire courses.''' The multi-user capabilities and environment of HPCC facilitate collaborative work among learners and support more complex, closer-to-reality learning problems.
* '''The large computational and visualization capabilities of CUNY-HPCC enable analytical techniques too demanding for personal devices.''' Students can run unattended parameter sweeps or workflows in order to explore a problem in detail. Such self-exploration has a proven positive effect on learning.
* '''Use of CUNY-HPCC resources provides students with prerequisite skills and knowledge''' they may need later when exploring larger HPC environments. For instance, the CUNY-HPCC workflow and environment is very close to the environment of other research centers offering ACCESS resources.
* '''CUNY-HPCC participates in educational programs such as the NSF-funded NanoBioNYC Ph.D. traineeship program at CUNY.''' This program focuses on developing groundbreaking bio-nanoscience solutions to address urgent human and planetary health issues and on preparing students to become tomorrow’s leaders in diverse STEM careers.
 
== Research Value of CUNY-HPCC ==
High-performance computing is a backbone of any modern simulation research. About 80% of the
 
==Available Computational Architectures and Storage Systems==
 
The HPC Center operates a variety of architectures in order to support complex and demanding workflows. The deployed systems include distributed memory computers (also referred to as “clusters”), symmetric multiprocessors (also referred to as SMP) and shared memory machines (also referred to as NUMA machines).
 
=== Computational Systems ===
'''SMP''' servers have several processors (working under a single operating system) which "share everything": all CPU cores access a common memory block via a shared bus or data path. SMP servers support all combinations of memory vs. CPU (up to the limits of the particular computer). SMP servers are commonly used to run sequential or thread-parallel (e.g. OpenMP) jobs, and they may or may not have GPUs. Currently, HPCC operates several detached SMP servers named '''Math, Cryo''' and '''Karle'''. Karle is a server without GPUs that is used for visualizations, visual analytics and interactive MATLAB/Mathematica jobs. '''Math''' is a condominium server without GPUs as well. Cryo (a CPU+GPU server) is a specialized server with eight (8) NVIDIA V100 (32 GB) GPUs designed to support large-scale multi-core, multi-GPU jobs.
 
A '''cluster''' is a single system comprising a set of SMP servers interconnected by a high-performance network. Specific software coordinates programs on and across these servers in order to perform computationally intensive tasks. Each SMP member of the cluster is called a '''node'''. All nodes run independent copies of the same operating system (OS). Some or all of the nodes may incorporate GPUs. The main cluster at HPCC is a hybrid (CPU+GPU) cluster called '''Arrow'''. Sixty-two (62) of its nodes (P-nodes) have 2 NVIDIA K20m GPUs each, one node has 8 Volta-series V100 GPUs with 32 GB per board, and 2 nodes have 2 Volta V100 GPUs with 16 GB per card. The remaining 24 nodes (BM-nodes) do not have GPUs.
 
A '''distributed shared memory''' computer is a tightly coupled server in which the memory is physically distributed but logically unified as a single block. The system resembles an SMP, but the possible number of CPU cores and amount of memory are far beyond the limitations of an SMP. Because the memory is distributed, the access times across the address space are non-uniform; thus this architecture is called Non-Uniform Memory Access (NUMA). Similarly to SMPs, '''NUMA''' systems are typically used for applications such as data mining and decision support systems, in which processing can be parceled out to a number of processors that collectively work on common data. CUNY-HPCC operates a '''NUMA''' server called '''Appel'''. This server has only K20m GPUs.
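In practice, the difference between these architectures shows up in how resources are requested from the SLURM scheduler described later on this page. The sketch below is illustrative only (the executables are placeholders, not actual CUNY-HPCC software): a thread-parallel (OpenMP) job must fit on a single SMP or cluster node, while a distributed-memory (MPI) job may span several cluster nodes.

 #!/bin/bash
 ## Thread-parallel (OpenMP) job: all cores share the memory of ONE node
 #SBATCH --nodes=1
 #SBATCH --ntasks=1
 #SBATCH --cpus-per-task=16       # 16 threads sharing one memory block
 #SBATCH --time=01:00:00
 export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
 ./my_openmp_program              # placeholder executable

 #!/bin/bash
 ## Distributed-memory (MPI) job: ranks spread across several cluster nodes
 #SBATCH --nodes=4
 #SBATCH --ntasks-per-node=16     # 64 MPI ranks in total
 #SBATCH --time=01:00:00
 srun ./my_mpi_program            # placeholder executable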
 
=== Storage Systems ===
The '''/scratch''' file system is a small, fast file system mounted on all nodes. It resides on solid state drives and has a capacity of '''256 GB'''. Note that files on '''/scratch''' are '''not backed up and are not protected.''' This file system does not have quotas, so users can submit large jobs. The file system is automatically purged if: '''1.''' the load of the file system exceeds 70%, or '''2.''' files have not been accessed for <u>60 days, whichever comes first.</u> The partition '''/global/u''' in the main HPFS file system holds user home directories. HPFS is a hybrid file system that combines SSD and HDD (solid state and hard disks) with capabilities for dynamic relocation of files. Its capacity is 2 petabytes (PB). This file system was purchased under NSF grant OAC-2215760 and is mounted on all nodes.
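Standard shell tools are enough to see whether the purge conditions above apply to your own files (a quick sketch; paths follow the /scratch/<userid> convention used on this page):

 # How full is /scratch? (purging may be triggered above 70% load)
 df -h /scratch
 # List your files that have not been accessed for more than 60 days
 # (these are candidates for automatic purging)
 find /scratch/$USER -type f -atime +60 -ls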
 
=== Support Systems ===
These systems provide access to HPCC resources, job submission, and file transfers.
 
o Master Head Node ('''MHN''') is a redundant login node from which all jobs on all servers start. This server is not directly accessible from outside the CSI campus.
 
o '''Chizen''' is a redundant gateway server which provides access to the protected HPCC domain.


o '''Cea''' is a file transfer node allowing transfer of files between users’ computers and /scratch space or /global/u/<userid>. '''Cea''' is accessible directly (not only via '''Chizen'''), but allows only a limited set of shell commands.
 
==Overview of  HPCC's Research Computing Infrastructure (RCI)==
[[File:HPCC_structure_12_24.png|right|frameless|682x682px|Organization of HPCC resources]]
The above resources are organized in the research computing infrastructure depicted in the figure on the right. In order to support various types of research projects, CUNY-HPCC supports a variety of computational architectures. All computational resources are organized in '''3''' tiers - '''''Condominium Tier (CT)''''', '''''Free Tier (FT)''''' and '''''Advanced Tier (AT)''''', plus visualization ('''''Viz''''') - described in the next section. All nodes in all tiers are attached to the central file system '''HPFS''', which provides '''/scratch''' and Global Storage ('''GS''') - '''/global/u/'''. The table below shows the tiers and their use. Note that * denotes hyper-threads and '''**''' denotes outdated GPUs not suitable for large-scale research, but useful for education.
{| class="wikitable mw-collapsible"
|+Tiers and their use
!Tier
!# Cores
!# GPU
!Use
|-
|Condo
|704 (1408*)
|21
|Heavy instruction-parallel and distributed-parallel or hybrid (OMP + MPI) calculations which can be GPU accelerated; massive GPU-enabled simulations requiring a matrix of modern GPUs; advanced AI and ML, Big Data in all disciplines, advanced CFD, Genomics, Finance, Econometrics, Neuroscience, etc. Virtualization of the GPUs is possible; support for GPU virtualization and unification of GPUs across nodes over a 400 Gbps network.
|-
|Advanced
|1336
|12
|Large instruction-parallel and distributed-parallel or hybrid (OMP + MPI) calculations which can be GPU accelerated; Big Data jobs in all disciplines, e.g. Genomics, Proteomics, Genetics, AI, CFD, Finance, etc.
|}


=== Organization of Computational and Visualization Resources ===
The computational resources in the 3 tiers mentioned above are combined within the ARROW hybrid cluster. In addition, CUNY-HPCC operates a specialized visualization server which shares the file system with all nodes. That allows users to conduct <u>in-situ visualizations</u> of simulations. The nodes are described in the table below. Note that '''black''' denotes the '''basic''' tier, '''blue''' denotes the '''condo''' tier, '''orange''' denotes the '''advanced''' tier, and '''yellow''' marks the '''visualization''' tier.
[[File:Arrow_Viz_Resources.png|frameless|911x911px]]


==== Condominium Tier ====
The condominium tier (called '''condo''') organizes resources purchased and owned by faculty, but '''fully maintained and supported by CUNY-HPCC'''. The list of available resources is given in the table above, marked in '''blue'''. All condo nodes have large shared memory and advanced GPUs with 24 to 40 GB of memory per GPU board. Participation in this tier is '''strictly voluntary'''. Several faculty/research groups can combine funds to purchase and consequently share the hardware (a node or several nodes). To be accepted, all nodes in this tier must meet certain hardware specifications, including being fully warranted for the lifetime of the node(s). If you want to participate in the condominium, please send a request e-mail to hpchelp@csi.cuny.edu and consult CUNY-HPCC before making a purchase. The condominium tier:


* Promotes vertical and horizontal collaboration between research groups;
* Makes it possible to utilize small amounts of research money or "left-over" money wisely and to obtain advanced resources;
* Helps researchers conduct large-scope, high-quality research, including collaborative projects leading to successful grants with high impact.
The owners (and their groups) of condo resources have guaranteed access to their servers. All users registered with the condo tier must use the main login node of MHN/Arrow. To access their own node, condo users must specify their own private partition and use a specific QOS qualifier. In addition, there are partitions which operate over two or more nodes owned by condo members. Condo tier members benefit from professional support from HPCC staff, in addition to professional maintenance and security hardening of the servers. Upon approval from the node owner, any idle node(s) can be used by any other condo member(s). For instance, a condo member can borrow (for an agreed time) a node with more advanced GPUs than those installed on his/her own node(s). The owners of the equipment are responsible for any repair costs for their node(s). Other users may rent any of the condo resources described in the table from the owners of the server/node. Upon agreement with the owners, CUNY-HPCC may harvest unused cycles and provide other members of the CUNY community with CPU time.
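As an illustration only (the actual private partition and QOS names are assigned per owner and are not listed here), a condo owner's submission could look like this:

 # Placeholder partition/QOS names: use the ones assigned to your condo node(s)
 sbatch --partition=condo_mygroup --qos=condo_mygroup job.slurm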


In sum, the benefits of the condo tier are:


*'''5-year lifecycle''' - condo resources will be available for a duration of 5 years.
*'''Access to more CPU cores than purchased''' by sharing resources with other condo members.
*'''Support''' - HPCC staff will install, upgrade, secure and maintain condo hardware throughout its lifecycle.
*'''Access to the main application server.'''
*'''Access to HPC analytics.'''
Responsibilities of condo members:


*'''To share their resources''' (when idle or partially available) with other members of the condo;
*'''To include money for computing in their research and instrumentation grants''' to cover operational (non-tax-levy) expenses of the HPCC.
The table below summarizes the available resources.


==== Advanced Tier ====
The advanced tier holds the resources used for more advanced or large-scale simulation and visual computing research: distributed parallel codes and/or instruction-parallel (OMP) jobs with or without GPUs, very large memory jobs on 3 fat nodes, and GPU-enabled jobs. Note that this tier does not support NVSwitch technology, so the GPUs (all Volta class) cannot be shared across nodes. The resources for this tier are detailed in the table above ('''orange'''). This tier provides nodes with Volta-class GPUs with 16 GB and 32 GB on board.
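Because the GPUs in this tier cannot be pooled across nodes, a GPU job should request all of its GPUs on a single node. A minimal sketch (the executable is a placeholder, not actual CUNY-HPCC software):

 #!/bin/bash
 #SBATCH --nodes=1            # no NVSwitch: keep all requested GPUs on one node
 #SBATCH --gres=gpu:2         # e.g. 2 Volta V100 GPUs on that node
 #SBATCH --cpus-per-task=8
 #SBATCH --time=04:00:00
 ./my_gpu_program             # placeholder executable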


==== Basic Tier ====
The basic tier provides resources for sequential and moderate-size parallel jobs. The resources in this tier are described in the table above ('''black font'''). OpenMP jobs can run only within the scope of a single node, while distributed parallel (MPI) jobs can run across the cluster. This tier also supports MATLAB Parallel Server, which can run across nodes. Users can also run GPU-enabled jobs, since this tier has 124 Tesla K20m GPUs. Please note that these GPUs are no longer supported by NVIDIA, and many applications may not support them either. The table below summarizes the resources for this tier.


==== Visualization ====
CUNY-HPCC supports a specialized visualization server. That server shares the main file system with all nodes and thus allows users to conduct in-situ and/or post-processing visualization. The parameters of the server are listed in the table above ('''yellow''').
== Quick Start to HPCC ==

=== Access to HPCC Resources ===
Access to all HPCC resources is the same from inside or outside the CSI campus. The HPCC resources are placed in a secure domain accessible only via a bastion host called '''''chizen'''''. This server is redundant and runs a very limited shell. It is dedicated only to providing access and cannot be used to store any data; all data placed on chizen will be deleted automatically. However, chizen allows secure tunneling of data (without saving data on chizen itself) between user machines and the HPFS/scratch file systems. Please check the section "File transfer" below for details. For data transfer it is preferable to use the File Transfer Node (FTN) called '''Cea''', which allows direct secure file transfer to/from '''<font face="courier">/global/u/<font color="red"><userid></font></font>''' or '''<font face="courier">/scratch/<font color="red"><userid></font></font>'''.
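For command-line users, access typically looks like the sketch below: an SSH hop through the bastion host, and direct transfers through the FTN. The host names shown are placeholders; use the addresses provided with your account.

 # Log in through the bastion host and hop to the Master Head Node (placeholder host names)
 ssh -J <userid>@chizen.csi.cuny.edu <userid>@mhn
 # Copy an input file directly to your scratch directory via the file transfer node Cea
 scp input.dat <userid>@cea.csi.cuny.edu:/scratch/<userid>/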


=== Accounts ===
Every user must register with HPCC and obtain an account. Please see the section "Administrative information" for further details on how to register. Upon registering with HPCC, every user will get 2 directories:


:• '''<font face="courier">/scratch/<font color="red"><userid></font></font>''' – temporary workspace on the HPC systems;
:• '''<font face="courier">/global/u/<font color="red"><userid></font></font>''' – the “home directory”, i.e., storage space on the HPFS for programs, scripts, and data;
:• In some instances a user may also have disk space on iRODS in '''<font face="courier">/cunyZone/home/<font color="red"><projectid></font></font>'''.
The '''/global/u/<userid>''' directory has a quota (see the "Administrative information" section for details).


=== Jobs ===
All jobs must be submitted for execution from the master head node (MHN), regardless of the tier. However, it is important to mention that users '''do not need to address a particular resource/node directly''', since jobs are automatically placed in the proper tier and on the proper node based on the job submission policy and available resources. All jobs are subject to a '''strict fair-share policy''' which allows all users to get an equal share of resources. There are '''no "privileged" queues of any kind.''' In brief, all jobs at HPCC must:


* '''>>''' Start from the user's directory on the '''scratch''' file system - '''/scratch/<userid>'''. Jobs cannot be started from users' home directories - '''/global/u/<userid>''';
* '''>>''' Use the SLURM job submission system (job scheduler). All job submission scripts written for other job schedulers (e.g. PBS Pro) <u>must be converted to SLURM syntax</u>;
* '''>>''' Start from the Master Head Node (MHN), in all tiers; a minimal submission script is sketched below. In the near future the job submission process will be further improved with the launch of an HPC job submission portal.
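The sketch below shows what a minimal submission could look like under these rules. The script uses the '''production''' routing partition described in the "SLURM Partitions" section; the executable and directory names are placeholders.

 #!/bin/bash
 #SBATCH --job-name=example_job
 #SBATCH --partition=production    # routing partition; SLURM picks the sub-partition
 #SBATCH --ntasks=1
 #SBATCH --cpus-per-task=1
 #SBATCH --time=02:00:00
 #SBATCH --output=example_job.%j.out
 cd /scratch/$USER/example_job     # jobs must run from /scratch/<userid>
 ./my_program                      # placeholder executable

The job is then submitted from the scratch directory on MHN, and its placement can be checked with squeue:

 cd /scratch/$USER/example_job
 sbatch job.slurm
 squeue -u $USER                   # shows the partition and node the job landed on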


All useful user data must be kept in the user's home directory '''/global/u/<userid>'''. This file system is mounted only on the login node. In contrast, '''/scratch''' is mounted on all nodes, and thus all jobs can be submitted only from '''/scratch'''. It is important to remember that '''/scratch''' is not the main storage for users' accounts (home directories), but <u>temporary storage used for job submission only.</u> Thus:
#data in '''/scratch''' are not protected, preserved or backed up and can be lost at any time. CUNY-HPCC has no obligation to preserve user data in '''/scratch''';
#'''/scratch''' undergoes regular and automatic file purging when either or both of the following conditions are satisfied:
##the load of the '''/scratch''' file system reaches '''70+%''';
##there are inactive file(s) older than '''60 days.'''
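In practice this means staging data out of the home directory before a run and copying results back afterwards, along the lines of the sketch below (directory names are illustrative):

 # Stage input data from the home directory to scratch before submitting
 mkdir -p /scratch/$USER/myrun
 cp -r /global/u/$USER/projects/myrun/input /scratch/$USER/myrun/
 cd /scratch/$USER/myrun
 sbatch job.slurm
 # After the job finishes, copy the results back to protected storage
 cp -r /scratch/$USER/myrun/results /global/u/$USER/projects/myrun/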
=== SLURM Partitions ===
The only way to submit job(s) to CUNY-HPCC servers is through the SLURM batch system. Any job, regardless of its type (interactive, batch, serial, parallel, etc.), must be submitted via SLURM, which allocates the requested resources on the proper server and starts the job(s) according to a predefined strict fair-share policy. Computational resources (CPU cores, memory, GPUs) are organized in partitions. The main partition is called '''production'''. This is a routing partition which distributes jobs among several sub-partitions depending on the job's requirements. Thus, a serial job submitted to '''production''' will land in the '''partsequential''' partition. No PBS Pro scripts should ever be used, and all existing PBS Pro scripts must be converted to SLURM syntax before use. The table below shows the limitations of the partitions. The condo tier operates over 7 private partitions; for more details see the section "Running jobs". The '''basic and advanced tiers utilize only the public partitions described in the table below.'''
{| class="wikitable sortable mw-collapsible"
|+Public partitions for Basic and Advanced Tier
o '''partmatlab''' partition allows running MATLAB Parallel Server across the main cluster. Note, however, that Parallel Computing Toolbox programs can be submitted via the production partition, but only as thread-parallel jobs.


o '''partdev''' is dedicated to development. All HPCC users have access to this partition, with assigned resources of one computational node with 16 cores, 64 GB of memory and 2 GPUs (K20m). This partition has a time limit of '''4 hours.'''
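For quick tests, both submission styles can be combined with the partitions above; a short sketch (the script name is a placeholder):

 # Batch job: let the production routing partition pick the right sub-partition
 sbatch --partition=production job.slurm
 # Interactive development session on partdev (within its 4-hour limit)
 srun --partition=partdev --cpus-per-task=4 --gres=gpu:1 --time=01:00:00 --pty bash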


'''NB!''' The '''condo''' tier operates over '''7 private partitions.''' For more details see the section "Running jobs".


=== Hours of Operation ===
The HPCC operates 24/7, with the goal of being online for a minimum of 250 days. The second Tuesday morning of each month, from 8:00 AM to 12:00 PM, is normally reserved (but not always used) for scheduled maintenance. Please plan accordingly. Unplanned maintenance to remedy system-related problems may be scheduled as needed. Reasonable attempts will be made to inform users running on those systems when these needs arise.


=== User Support ===
Users are encouraged to read this Wiki carefully. In particular, the sections on compiling and running parallel programs, and the section on the SLURM batch queueing system, will give you the essential knowledge needed to use the CUNY-HPCC systems. We have strived to maintain the most uniform user applications environment possible across the Center's systems to ease the transfer of applications and run scripts among them.


The CUNY HPC Center staff, along with outside vendors, offer regular courses and workshops to the CUNY community on parallel programming techniques, HPC computing architecture, and the essentials of using our systems. Please follow our mailings on the subject and feel free to inquire about such courses. We regularly schedule training visits and classes at the various CUNY campuses; please let us know if such a training visit is of interest. In the past, topics have included an overview of parallel programming, GPU programming and architecture, using the evolutionary biology software at the HPC Center, the SLURM queueing system at the CUNY-HPCC, and mixed GPU-MPI and OpenMP programming. Staff have also presented guest lectures in formal classes throughout the CUNY campuses.


If you have problems accessing your account and cannot log in to the ticketing service, please send an email to:

   [mailto:hpchelp@csi.cuny.edu hpchelp@csi.cuny.edu]


=== Additional Information ===




1. '''hpchelp@csi.cuny.edu is for questions and account help only and does not accept tickets.''' For tickets please use the ticketing system, which ensures that the staff member with the most appropriate skill set and job-related responsibility will respond to your questions. During the business week you should expect a response within 48 hours, quite often even a same-day response. During the weekend you may not get any response.


2. '''E-mails to hpchelp@csi.cuny.edu must have a valid CUNY e-mail as the reply address.''' Messages originating from public mailers (Gmail, Hotmail, etc.) are filtered out.


3. '''Do not send questions to individual CUNY-HPCC staff members directly.''' These will be returned to the sender with a polite request to submit a ticket or email the Helpline. This applies to replies to initial questions as well.


The CUNY-HPCC staff is focused on providing high-quality support to its user community, but compared to other academic research computing centers of similar size in the country, we operate with 90% less personnel. Because our staff is extremely lean, please make full use of the tools that we have provided (especially this Wiki), and feel free to offer suggestions for improved service. We hope and expect that your experience using our systems will be predictably good and productive.

Latest revision as of 17:12, 17 January 2025

CUNY-HPCC-HEADER-LOGO.jpg
Hpcc-panorama3.png

The City University of New York (CUNY) High Performance Computing Center (HPCC) is located on the campus of the College of Staten Island, 2800 Victory Boulevard, Staten Island, New York 10314. The CUNY-HPCC supports computational research and computational intensive courses on graduate and undergraduate level offered at all CUNY colleges in fields such as Computer Science, Engineering, Bioinformatics, Chemistry, Physics, Materials Science, Genetics, Genomics, Proteomics, Computational Biology, Finance, Economics, Linguistics, Anthropology, Psychology, Neuroscience, Computational Fluid Mechanics and many others. CUNY-HPCC provides educational outreach to local schools and supports undergraduates who work in the research programs of the host institution (e.g. REU program from NSF). The primary mission of CUNY-HPCC is:

  • To enable advanced research and scholarship at CUNY colleges by providing faculty, staff, and students with access to high-performance computing, adequate storage resources and visualization resources;
  • To enable advanced education and cross disciplinary education by providing flexible and scalable resources;
  • To provide CUNY faculty and their collaborators at other universities, CUNY research staff and CUNY graduate and undergraduate students with expertise in scientific computing, parallel scientific computing (HPC), software development, advanced data analytics, data driven science and simulation science, visualization, advanced database engineering, and others;
  • Leverage the HPC Center capabilities to acquire additional research resources for CUNY faculty, researchers and students in existing and major new programs;
  • Create opportunities for the CUNY research community to win grants from national funding institutions and to develop new partnerships with the government and private sectors.

CUNY-HPPC is voting member of Coalition for Academic Scientific Computation (CASC). Originally formed in the 1990s as a small group of the heads of national supercomputing centers, CASC expanded to more than 100 member institutions representing many of the nation’s most forward-thinking universities and computing centers. CASC includes the leadership of large academic computing centers such as TACC or San Diego SC and recently attracts a greater diversity of smaller institutions such as non-R1s, HBCUs, HSIs, TCUs, etc. CASC’s mission as to be “dedicated to advocating for the use of the most advanced computing technology to accelerate scientific discovery for national competitiveness, global security, and economic success, as well as develop a diverse and well-prepared 21st century workforce.”

CUNY-HPCC - Democratization of Research

In last few years the model of cloud computing (called also computing-on-demand) made the promise that anyone, no matter where the user is, could leverage almost unlimited computing resources. This computing supposed to “democratize” research and level the playing field, as it were. Unfortunately that is not entirely true (for now) because the cloud computing even available to nearly anyone, from nearly anywhere remains comparatively expensive to local resources and lacks the flexibility and accessibility of local support tailored to education and research offered by the local research HPCC. Indeed, every computational environment has limitations and a learning curve such that students and faculty coming from variety of backgrounds and having might feel crushed and helpless without close and personalized local support. in this sense the carefully designed, user centered, academically focused HPC has the transformative capability for rapidly evolving computation and data-driven research, and creates opportunities for vast collaboration and convergence research activities and thus provides the real democratization of the research.

Pedagogical Value of CUNY- HPCC

CUNY-HPPC supports whole variety of classes on graduate and undergraduate level from all CUNY-colleges, Graduate Center and institutes. It is important to mention that CUNY-HPCC impact goes beyond the STEM disciplines. Thus the CUNY-HPCC:

Dr Alexander Tzanov, the director of CUNY-HPCC speaks on NanoBioNYC workshop
  • Allows to conduct analysis of datasets that are too large to work with easily on personal devices, or that cannot easily be shared or disseminated. These data sets are not coming only from STEM fields (i.e. finance, economics, linguistics etc.). Facilitating these analyses provides the students with opportunity to interact in real time with increasingly large amounts of data, enabling them to gather important skills and experiences.
  • CUNY-HPCC provides collaborative space for entire courses. The multi-user capabilities and environment of HPCC facilitates collaborative work among learners and promotes more complex closer to reality learning problems.
  • The large computational and visualization capabilities of CUNY-HPCC are enabler for applying analytical techniques too large for personal devices. Students can run unattended parameter sweeps or workflows in order to explore the problem in detail. That self exploration has proven positive effect on learning.
  • Use of CUNY-HPCC resources provides students with needed pre request skills and knowledge they may need later when explore larger HPC environments. For instance the CUNY-HPCC workflow and environment is extremely close to the environment of other research centers of ACCESS resources.
  • CUNY-HPCC participates in educational programs such as NSF funded NanoBioNYC Ph.D. traineeship program at CUNY. This program is focused on developing groundbreaking bio-nano science solutions to address urgent human and planetary health issues and preparing students to become tomorrow’s leaders in diverse STEM careers.

Research Value of CUNY- HPCC

High performance computing is a backbone of any modern simulation research. About 80% of the

Available Computational Architectures and Storage Systems

The HPC Center operates variety of architectures in order to support complex and demanding workflows. The deployed systems include: distributed memory (also referred to as “cluster”) computers, symmetric multiprocessor (also referred as SMP) and shared memory (also reffred as NUMA machines).

Computational Systems

SMP servers have several processors (working under a single operating system) which "share everything". Thus all cpu-cores allocate a common memory block via shared bus or data path. SMP servers support all combinations of memory VS cpu (up to the limits of the particular computer). The SMP servers are commonly used to run sequential or thread parallel (e.g. OpenMP) jobs and they may have or may not have GPU. Currently, HPCC operates several detached SMP servers named Math, Cryo and Karle. Karle is a server which does not have GPU and is used for visualizations, visual analytics and interactive MATLAB/Mathematica jobs. Math is a condominium server without GPU as well. Cryo (CPU+GPU server) is specialized server with eight (8) NVIDIA V100 (32G) GPU designed to support large scale multi-core multi-GPU jobs.

Cluster is defined as a single system comprizing a set of SMP servers interconnected with high performance network. Specific software coordinates programs on and/or across those in order to perform computationally intensive tasks. Each SMP member of the cluster is called a node. All nodes run independent copies of the same operating system (OS). Some or all of the nodes may incorporate GPU. The main cluster at HPCC is a hybrid (CPU+GPU) cluster called Arrow. Sixty two (62) of its nodes (P-nodes) have 2 x GPU K20m, one node has 8 GPU, Volta series V100 with 32 GB per board, and 2 nodes have 2 GPU Volta V100 with 16GB per card. Remaining 24 (BM-nodes) does not have GPU.

A distributed shared memory computer is a tightly coupled server in which the memory is physically distributed but logically unified as a single block. The system resembles an SMP, but the possible number of CPU cores and amount of memory are far beyond the limitations of an SMP. Because the memory is distributed, access times across the address space are non-uniform; hence this architecture is called Non-Uniform Memory Access (NUMA). Similarly to SMP, NUMA systems are typically used for applications such as data mining and decision support systems in which processing can be parceled out to a number of processors that collectively work on common data. CUNY-HPCC operates a NUMA server called Appel. This server has only K20m GPUs.
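On a NUMA machine, memory placement can noticeably affect performance. The sketch below shows how a user might inspect the NUMA topology and interleave memory across NUMA nodes with the standard numactl utility; the program name is a placeholder.

 # show the NUMA topology of the machine (NUMA nodes, their CPUs and memory)
 numactl --hardware
 
 # run a memory-intensive program with its pages interleaved across all NUMA nodes
 numactl --interleave=all ./my_analysis_program   # hypothetical executable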

Storage Systems

The /scratch file system mentioned above is a small, fast file system mounted on all nodes. It resides on solid state drives and has a capacity of 256 GB. Note that files on /scratch are not backed up and are not protected. This file system does not impose quotas, so users can submit large jobs. The file system is automatically purged if: 1. the load of the file system exceeds 70%, or 2. files have not been accessed for 60 days, whichever comes first. The partition /global/u in the main HPFS file system holds user home directories. HPFS is a hybrid file system that combines SSDs and HDDs (solid state and hard disks) with capabilities for dynamic relocation of files. Its capacity is 2 petabytes (PB). This file system was purchased under NSF grant OAC-2215760 and is mounted on all nodes.
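Because of the automatic purging, it is worth checking how full /scratch is and which of your files are at risk. A minimal sketch using standard shell tools:

 # check how full the scratch file system is (purging starts above ~70% load)
 df -h /scratch
 
 # list your own scratch files that have not been accessed for 60+ days
 # (these are candidates for automatic purging)
 find /scratch/$USER -type f -atime +60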

Support Systems

These systems provide access to HPCC resources, job submission and file transfers.

o Master Head Node (MHN) is a redundant login node from which all jobs on all servers start. This server is not directly accessible from outside the CSI campus.

o Chizen is a redundant gateway server which provides access to protected HPCC domain.

o Cea is a file transfer node allowing transfer of files between users' computers and the /scratch space or /global/u/<userid>. Cea is accessible directly (not only via Chizen), but allows only a limited set of shell commands.
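File transfers through Cea typically use scp or rsync. The fully qualified host name shown below is an assumption for illustration; use the host name provided when your account was created.

 # copy a local file to your scratch area via the file transfer node
 # (exact host name of Cea is an assumption; use the one given at registration)
 scp mydata.tar.gz <userid>@cea.csi.cuny.edu:/scratch/<userid>/
 
 # synchronize a results directory from your home directory back to your laptop
 rsync -av <userid>@cea.csi.cuny.edu:/global/u/<userid>/myproject/results/ ./results/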

Overview of HPCC's Research Computing Infrastructure (RCI)

Organization of HPCC resources

The above resources are organized in the research computing infrastructure depicted in the figure on the right. In order to support various types of research projects, CUNY-HPCC provides a variety of computational architectures. All computational resources are organized in 3 tiers, Condominium Tier (CT), Free Tier (FT) and Advanced Tier (AT), plus visualization (Viz) (see the next section for details). All nodes in all tiers are attached to the central HPFS file system, which provides /scratch and Global Storage (GS), /global/u/. The table below shows the tiers and their use. Note that * denotes hyper-threaded core counts and ** denotes outdated GPUs not suitable for large-scale research but useful for education.

Tiers and their use

  • Condo: 704 cores (1408*), 21 GPUs. Heavy instruction-parallel and distributed-parallel or hybrid (OMP + MPI) calculations which can be GPU accelerated; massive GPU-enabled simulations requiring a matrix of modern GPUs; advanced AI and ML; Big Data in all disciplines; advanced CFD, Genomics, Finance, Econometrics, Neuroscience, etc. GPU virtualization and unification of GPUs across nodes over a 400 Gbps network are supported.
  • Advanced: 1336 cores, 12 GPUs. Large instruction-parallel and distributed-parallel or hybrid (OMP + MPI) calculations which can be GPU accelerated; big data jobs in all disciplines, e.g. Genomics, Proteomics, Genetics, AI, CFD, Finance, etc.
  • Basic: 992 cores, 124 GPUs**. Parametric studies, sequential jobs, small to medium distributed-parallel jobs, small hybrid (OMP + MPI) jobs up to 16 cores per node, education and hands-on experience including GPU programming principles.
  • Viz: 36 cores (72*), no GPU. In-situ (real-time) and post-processing visualization.

Organization of Computational and Visualization Resources

The computational resources in the 3 tiers mentioned above are combined within the Arrow hybrid cluster. In addition, CUNY-HPCC operates a specialized visualization server which shares the file system with all nodes; this allows in-situ visualization of running simulations. The description of the nodes is given in the table below. Note that black denotes the basic tier, blue denotes the condo tier, orange denotes the advanced tier, and yellow marks the visualization tier.

[[File:Arrow Viz Resources.png|center|frameless]]

Condominium Tier

The condominium tier (called condo) organizes resources purchased and owned by faculty but fully maintained and supported by CUNY-HPCC. The list of available resources is given in the table above, marked in blue. All condo nodes have large shared memory and advanced GPUs with 24 to 40 GB of memory per GPU board. Participation in this tier is strictly voluntary. Several faculty/research groups can combine funds to purchase and subsequently share the hardware (a node or several nodes). To be accepted, all nodes in this tier must meet certain hardware specifications, including being fully warranted for the lifetime of the node(s). If you want to participate in the condominium, please send a request email to hpchelp@csi.cuny.edu and consult CUNY-HPCC before making a purchase. The condominium tier:

  • Promotes vertical and horizontal collaboration between research groups;
  • Makes it possible to utilize small amounts of research money or "left-over" funds wisely to obtain advanced resources;
  • Helps researchers conduct large-scope, high-quality research, including collaborative projects leading to successful, high-impact grants.

The owners (and their groups) of condo resources have guaranteed access to their servers. All users registered with the condo tier must use the main login node (MHN) of Arrow. To access their own node, condo users must specify their private partition and the corresponding QOS qualifier (a sketch is given below). In addition, there are partitions which span two or more nodes owned by condo members. Condo tier members benefit from professional support by HPCC staff, in addition to professional maintenance and security hardening of the servers. Upon approval from the node owner, any idle node(s) can be used by other condo member(s); for instance, a condo member can borrow (for an agreed time) a node with more advanced GPUs than those installed on his/her own node(s). The owners of the equipment are responsible for any repair costs for their node(s). Other users may rent any of the condo resources described in the table from the owners of the server/node. Upon agreement with the owners, CUNY-HPCC may harvest unused cycles and provide other members of the CUNY community with CPU time.
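A condo job script differs from a public-tier script mainly in the partition and QOS directives. The names below are placeholders, since each group receives its own private partition and QOS from CUNY-HPCC; the resource sizes are examples only.

 #!/bin/bash
 #SBATCH --job-name=condo_run
 #SBATCH --partition=partgroupX     # placeholder: your group's private condo partition
 #SBATCH --qos=qosgroupX            # placeholder: the QOS qualifier assigned to your group
 #SBATCH --nodes=1
 #SBATCH --ntasks-per-node=32       # example value; match your own node's core count
 #SBATCH --gres=gpu:2               # example GPU request, if your node has GPUs
 #SBATCH --time=24:00:00
 
 srun ./my_application              # hypothetical executable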

In sum, the benefits of the condo tier are:

  • 5 year lifecycle - condo resources will be available for a duration of 5 years.
  • Access to more cpu cores than purchased by sharing resources with other condo members.
  • Support - HPCC staff will install, upgrade, secure and maintain condo hardware throughout its lifecycle.
  • Access to main application server.
  • Access to HPC analytics.

Responsibilities of condo members

  • To share their resources (when idle or partially available) with other members of a condo;
  • To include in their research and instrumentation grants money for computing, used to cover operational (non-tax-levy) expenses of the HPCC.

The table below summarizes the available resources.

Advanced Tier

The advanced tier holds the resources used for more advanced or large-scale simulation and visual computing research: distributed-parallel codes and/or instruction-parallel (OMP) jobs with or without GPUs, very large memory jobs on 3 fat nodes, and GPU-enabled jobs. Note that this tier does not support NVSwitch technology, so the GPUs (all Volta class) cannot be shared across nodes. The resources for this tier are detailed in the table above (orange). This tier provides nodes with Volta-class GPUs with 16 GB and 32 GB on board.

Basic Tier

The basic tier provides resources for sequential and moderate-size parallel jobs. The resources in this tier are described in the table above (black font). OpenMP jobs can run only within the scope of a single node, while distributed-parallel (MPI) jobs can run across the cluster. This tier also supports MATLAB Parallel Server, which can run across nodes. Users can also run GPU-enabled jobs, since this tier has 124 Tesla K20m GPUs. Please note that these GPUs are no longer supported by NVIDIA, and many applications may not support them either. The table below summarizes the resources for this tier.
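A GPU-enabled job on the basic tier can request one of the K20m boards with a generic resource directive. The sketch below assumes the GPUs are exposed simply as "gpu" in SLURM's gres configuration; the exact gres string and partition behavior should be confirmed on the system.

 #!/bin/bash
 #SBATCH --job-name=gpu_test
 #SBATCH --partition=production     # routing partition; see the partition table below
 #SBATCH --nodes=1
 #SBATCH --ntasks=1
 #SBATCH --cpus-per-task=4
 #SBATCH --gres=gpu:1               # request one GPU; type names, if required, are site specific
 #SBATCH --time=02:00:00
 
 nvidia-smi                         # print the GPU actually assigned to the job
 ./my_cuda_program                  # hypothetical GPU-enabled executable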

Visualization

HPCC operates a specialized visualization server. That server shares the main file system with all nodes and thus allows in-situ and/or post-processing visualization. The parameters of the server are listed in the table above (yellow).

Quick Start to HPCC

Access to HPCC Resources

Access to all HPCC resources is the same from inside or outside the CSI campus. The HPCC resources are placed in a secure domain accessible only via a bastion host called chizen. This server is redundant and runs a very limited shell. It is dedicated solely to providing access and cannot be used to store any data; note that all data placed on chizen will be deleted automatically. However, chizen allows secure tunneling of data (without saving data on chizen itself) between user machines and the HPFS and /scratch file systems. Please check the section "File transfer" below for details. For data transfer it is preferable to use the File Transfer Node (FTN) called Cea, which allows direct secure file transfer to/from /global/u/<userid> or /scratch/<userid>.
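Logging in therefore involves two hops: from the user's machine to chizen, and from chizen to the login node. A minimal sketch using OpenSSH is shown below; the fully qualified host names are assumptions and should be replaced by those provided at registration.

 # one-off, two-hop login (host names are assumptions; use the ones from your account letter)
 ssh -J <userid>@chizen.csi.cuny.edu <userid>@arrow.csi.cuny.edu
 
 # or put the equivalent into ~/.ssh/config so that "ssh hpcc" performs both hops
 Host hpcc
     HostName arrow.csi.cuny.edu
     User <userid>
     ProxyJump <userid>@chizen.csi.cuny.edu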

Accounts

Every user must register with HPCC and obtain an account. Please see the section "Administrative information" for further details on how to register with HPCC. Upon registering, every user will get 2 directories:

  • /scratch/<userid> – temporary workspace on the HPC systems;
  • /global/u/<userid> – the "home directory", i.e., storage space on the HPFS for programs, scripts, and data;
  • In some instances a user may also have disk space on iRODS in /cunyZone/home/<projectid>.

The /global/u/<userid> directory has a quota (see the "Administrative information" section for details).

Jobs

All jobs must be submitted for execution from the master head node (MHN), regardless of the tier. However, it is important to mention that users do not need to address a particular resource/node directly, since jobs are automatically placed in the proper tier and node based on the job submission policy and available resources. All jobs are subject to a strict fair-share policy which gives all users an equal share of the resources; there are no "privileged" queues of any kind. In brief, all jobs at HPCC must:

  • Start from the user's directory on the scratch file system, /scratch/<userid>. Jobs cannot be started from users' home directories (/global/u/<userid>).
  • Use the SLURM job submission system (job scheduler). All job submission scripts written for other job schedulers (e.g. PBS Pro) must be converted to SLURM syntax; see the sketch after this list.
  • Start from the Master Head Node (MHN), for all tiers. In the near future the job submission process will be further improved with the launch of an HPC job submission portal.
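As a rough guide to the conversion, the sketch below pairs a few common PBS Pro directives and commands with their SLURM counterparts. It covers only the most basic options and is not an exhaustive mapping.

 # PBS Pro                       ->  SLURM equivalent
 # #PBS -N myjob                 ->  #SBATCH --job-name=myjob
 # #PBS -l select=2:ncpus=16     ->  #SBATCH --nodes=2 --ntasks-per-node=16
 # #PBS -l walltime=04:00:00     ->  #SBATCH --time=04:00:00
 # #PBS -q workq                 ->  #SBATCH --partition=production
 # qsub job.pbs                  ->  sbatch job.slurm
 # qstat                         ->  squeue -u $USER
 # qdel <jobid>                  ->  scancel <jobid>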

All of a user's persistent data must be kept in the home directory /global/u/<userid>. This file system is mounted only on the login node. In contrast, /scratch is mounted on all nodes, and thus all jobs can be submitted only from /scratch. It is important to remember that /scratch is not the main storage for users' accounts (home directories), but temporary storage used for job submission only (a complete example of this workflow is sketched after the list below). Thus:

  1. data in /scratch are not protected, preserved or backed up and can be lost at any time. CUNY-HPCC has no obligation to preserve user data in /scratch.
  2. /scratch undergoes regular and automatic file purging when either or both of the following conditions are satisfied:
    1. the load of the /scratch file system reaches 70+%;
    2. there are inactive files older than 60 days.
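Putting this together, a typical job cycle stages input from the home directory to /scratch, submits from there, and copies results back afterwards. The sketch below uses generic directory and script names as placeholders.

 # 1. stage input from the home directory to the scratch area
 mkdir -p /scratch/$USER/run01
 cp -r /global/u/$USER/myproject/input /scratch/$USER/run01/
 cp /global/u/$USER/myproject/job.slurm /scratch/$USER/run01/
 
 # 2. submit the job from /scratch (jobs cannot start from /global/u)
 cd /scratch/$USER/run01
 sbatch job.slurm
 
 # 3. after the job finishes, move results back before they are purged
 cp -r /scratch/$USER/run01/output /global/u/$USER/myproject/results/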

SLURM Partitions

The only way to submit job(s) to CUNY-HPCC servers is through the SLURM batch system. Any job, regardless of its type (interactive, batch, serial, parallel, etc.), must be submitted via SLURM, which allocates the requested resources on the proper server and starts the job(s) according to a predefined strict fair-share policy. Computational resources (CPU cores, memory, GPUs) are organized in partitions. The main partition is called production. This is a routing partition which distributes jobs to several sub-partitions depending on the job's requirements; thus, a serial job submitted to production will land in the partsequential partition. No PBS Pro scripts should ever be used, and all existing PBS scripts must be converted to SLURM before use. The table below shows the limits of the partitions. The condo tier operates over 7 private partitions; for more details see the section "Running jobs". The basic and advanced tiers utilize only the public partitions described in the table below.

Public partitions for Basic and Advanced Tier
 Partition    Max cores/job   Max jobs/user   Total cores/group   Time limits
 production   128             50              256                 240 hours
 partedu      16              2               216                 72 hours
 partmath     128             128             128                 240 hours
 partmatlab   1972            50              1972                240 hours
 partdev      16              16              16                  4 hours

o production is the main partition, with assigned resources across all servers (except Math and Cryo). It is a routing partition, so jobs will be placed in the proper sub-partition automatically. Users may submit sequential, thread-parallel or distributed-parallel jobs with or without GPUs.

o partedu is a partition for education only. Its assigned resources are on the educational server Herbert. partedu is accessible only to students (graduate and/or undergraduate) and their professors who are registered for a class supported by HPCC. Access to this partition is limited to the duration of the class.

o partmatlab allows running MATLAB Parallel Server jobs across the main cluster. Note, however, that Parallel Computing Toolbox programs can also be submitted via the production partition, but only as thread-parallel jobs.

o partdev is dedicated to development. All HPCC users have access to this partition, whose assigned resources are one computational node with 16 cores, 64 GB of memory and 2 GPUs (K20m). This partition has a time limit of 4 hours.
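For quick interactive testing on the development partition, a user might request a short interactive shell with srun, as sketched below. The option values are examples within the partdev limits described above, and the module name is an assumption.

 # request an interactive shell on the development partition (4 hour limit)
 srun --partition=partdev --ntasks=1 --cpus-per-task=4 --mem=8G \
      --gres=gpu:1 --time=01:00:00 --pty /bin/bash
 
 # inside the interactive session, build and test as usual, e.g.:
 #   module load cuda        # module name is an assumption
 #   make && ./my_test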

NB! Condo tier operates over 7 private partitions. For more details see the section "Running jobs".

Hours of Operation

The HPCC operates 24/7, with the goal of being online a minimum of 250 days. The second Tuesday morning of each month, from 8:00 AM to 12:00 PM, is normally reserved (but not always used) for scheduled maintenance. Please plan accordingly. Unplanned maintenance to remedy system-related problems may be scheduled as needed; reasonable attempts will be made to inform users running on the affected systems when such needs arise.

User Support

Users are encouraged to read this Wiki carefully. In particular, the sections on compiling and running parallel programs, and the section on the SLURM batch queueing system will give you the essential knowledge needed to use the CUNY-HPCC systems. We have strived to maintain the most uniform user applications environment possible across the Center's systems to ease the transfer of applications and run scripts among them.

The CUNY HPC Center staff, along with outside vendors, offer regular courses and workshops to the CUNY community in parallel programming techniques, HPC computing architecture, and the essentials of using our systems. Please follow our mailings on the subject and feel free to inquire about such courses. We regularly schedule training visits and classes at the various CUNY campuses. Please let us know if such a training visit is of interest. In the past, topics have included an overview of parallel programming, GPU programming and architecture, using the evolutionary biology software at the HPC Center, the SLURM queueing system at the CUNY-HPCC, mixed GPU-MPI and OpenMP programming, etc. Staff have also presented guest lectures at formal classes throughout the CUNY campuses.

If you have problems accessing your account and cannot log in to the ticketing service, please send an email to:

 hpchelp@csi.cuny.edu 

Additional Information

1. hpchelp@csi.cuny.edu is for questions and account-help communication only and does not accept tickets. For tickets, please use the ticketing system, which ensures that the staff member with the most appropriate skill set and job-related responsibility will respond to your questions. During the business week you should expect a response within 48 hours, quite often the same day. During the weekend you may not get a response.

2. E-mails to hpchelp@csi.cuny.edu must have a valid CUNY e-mail as the reply address. Messages originating from public mailers (Gmail, Hotmail, etc.) are filtered out.

3. Do not send questions to individual CUNY-HPC Center staff members directly. These will be returned to the sender with a polite request to submit a ticket or email the Helpline. This applies to replies to initial questions as well.

The CUNY-HPCC staff members are focused on providing high-quality support to their user community, but compared to other academic research computing centers of similar size in the country, we operate with 90% fewer personnel. Because our staff is extremely lean, please make full use of the tools we have provided (especially this Wiki), and feel free to offer suggestions for improved service. We hope and expect your experience using our systems will be predictably good and productive.