Global Data Storage (GDS)

Key features of the GDS system include:

• User home directories in a standard Unix file system called /global/u.
• Enhanced parallel scratch space on the HPC systems.
• Project directories in an Integrated Rule-Oriented Data-management System (iRODS) managed resource. Project directories exist in a “virtual file space” called cunyZone, which contains a resource called Storage Resource 1 (SR1). For the purpose of this document, we will use the terminology SR1 to describe Project file space.
• Automated backups.


The DSMS (Data Storage and Management System, an earlier name for the GDS) is the HPC Center’s primary file system and is accessible from all existing HPC systems, except for HERBERT. It will similarly be accessible from all future HPC systems.

The DSMS provides a three-level data storage infrastructure: the HOME filesystem, the SCRATCH filesystems, and SR1 (the long-term storage resource).

DSMS features are explained below.


"Home" directories are on /global/u

/global/u is a standard Unix file system that holds the home directories of individual users. When users request and are granted an allocation of HPC resources, they are assigned a <userid> and a 50 GB allocation of disk space for their home directory, /global/u/<userid>. These home directories are on the DSMS, not on the HPC systems, but can be accessed from any Center system. All home directories are backed up on a weekly basis.
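
To see how much of this allocation is currently in use, a standard disk-usage query can be run from any login node (a minimal sketch; replace <userid> with your own user name):

  du -sh /global/u/<userid>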

/scratch

Disk storage on the HPC systems is used only for scratch files. Scratch files are temporary and are not backed up. /scratch is used by jobs queued for or in execution. Output from jobs may temporarily be located in /scratch.

In order to submit a job for execution, a user must stage or mount the files required by the job to /scratch from /global/u using UNIX commands and/or from SR1 using iRODS commands.

Files in /scratch on a system are automatically purged when (1) usage reaches 70% of available space, or (2) file residence on scratch exceeds two weeks, whichever occurs first.
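
A quick way to see which of your files are at risk of being purged is a standard find query (a minimal sketch; replace <userid> with your own user name; this lists regular files more than 10 days old):

  find /scratch/<userid> -type f -mtime +10 -ls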


“Project” directories

“Project” directories are managed through iRODS and are accessible through iRODS commands, not standard UNIX commands. In iRODS terminology, a “collection” is the equivalent of a “directory”.

A “Project” is an activity that usually involves multiple users and/or many individual data files. A “Project” is normally led by a “Principal Investigator” (PI), who is a faculty member or a research scientist. The PI is the individual responsible to the University or a granting agency for the “Project”. The PI has overall responsibility for “Project” data and “Project” data management. To establish a Project, the PI completes and submits the online “Project Application Form”.

Additional information on the DSMS is available in Section 4 of the User Manual: http://www.csi.cuny.edu/cunyhpc/pdf/User_Manual.pdf


Typical Workflow

Typical workflows are described below:

1. Copy files from a user’s home directory or from SR1 to SCRATCH.
If working with HOME:

  cd /scratch/<userid>
  mkdir mySLURM_Job && cd mySLURM_Job
  cp /global/u/<userid>/myProject/a.out ./
  cp /global/u/<userid>/myProject/mydatafile ./

If working with SR1:

  cd /scratch/<userid>
  mkdir mySLURM_Job && cd mySLURM_Job
  iget myProject/a.out 
  iget myProject/mydatafile


2. Prepare the SLURM job script. A typical SLURM script is similar to the following:

  #!/bin/bash 
  #SBATCH --partition production 
  #SBATCH -J test 
  #SBATCH --nodes 1 
  #SBATCH --ntasks 8 
  #SBATCH --mem 4000
  echo "Starting…" 
  # run from the directory the job was submitted from
  cd $SLURM_SUBMIT_DIR
  # launch as many MPI ranks as tasks requested above
  mpirun -np 8 ./a.out ./mydatafile > myoutputs
  echo "Done…"

Your SLURM script may differ depending on your needs. Read the Submitting Jobs section for reference.


3. Run the job

  sbatch ./mySLURM_script
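
To follow the job after submission, the standard SLURM utilities can be used (a minimal sketch; <jobid> is the job number printed by sbatch):

  squeue -u <userid>
  sacct -j <jobid>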


4. Once the job is finished, clean up SCRATCH and store the outputs in your user home directory or in SR1.

If working with HOME:

  mv ./myoutputs /global/u/<userid>/myProject/.
  cd ../
  rm -rf mySLURM_Job

If working with SR1:

  iput ./myoutputs myProject/. 
  cd ../
  rm -rf mySLURM_Job


5. If the output files are stored in SR1, tag them with metadata.

  imeta add -d myoutputs zvalue 15 meters
  imeta add -d myoutputs colorLabel RED
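
To confirm that the tags were attached, the metadata on the stored object can be listed (see the iRODS section below for more on imeta):

  imeta ls -d myoutputs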


iRODS

iRODS is the integrated Rule-Oriented Data-management System, a community-driven, open-source data grid software solution. iRODS is designed to abstract data services from data storage hardware and to provide users with a hardware-agnostic way to manipulate data.


iRODS is the primary tool used by CUNY HPCC users to seamlessly access the 1 PB storage resource (referred to as SR1 below) from any of the HPCC's computational systems.

Access to SR1 is provided via so-called i-commands:

iinit
ils
imv

A comprehensive list of i-commands with detailed descriptions can be found in the iRODS wiki.


To obtain quick help on any of the commands while logged into any of the HPCC's machines, type the i-command followed by -h. For example:

ils -h

Following is a list of some of the most relevant i-commands:

iinit -- Initialize a session and store your password in a scrambled form for automatic use by other i-commands.
iput -- Store a file.
iget -- Get a file.
imkdir -- Like mkdir, make an iRODS collection (similar to a directory or Windows folder).
ichmod -- Like chmod, allow (or later restrict) access to your data objects by other users.
icp -- Like cp or rcp, copy an iRODS data object.
irm -- Like rm, remove an iRODS data object.
ils -- Like ls, list iRODS data objects (files) and collections (directories).
ipwd -- Like pwd, print the iRODS current working directory.
icd -- Like cd, change the iRODS current working directory.
ichksum -- Checksum one or more data objects or collections in iRODS space.
imv -- Move/rename an iRODS data object or collection.
irmtrash -- Remove one or more data objects or collections from the iRODS trash bin.
imeta -- Add, remove, list, or query user-defined Attribute-Value-Unit (AVU) metadata triplets.
iquest -- Query (pose a question to) the ICAT via a SQL-like interface.


Before using any of the i-commands, users need to identify themselves to the iRODS server by running the command

# iinit

and providing their HPCC password.
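
After a successful iinit, a quick way to verify the session is to list your home collection and print the client environment settings (both are standard i-commands):

# ils
# ienv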

A typical workflow involving files stored in SR1 includes storing/getting data to and from SR1, tagging data with metadata, searching for data, and sharing data (setting permissions).

Storing data to SR1

1. Create an iRODS directory (aka 'collection'):

  # imkdir myProject

2. Store all files 'myfile*' into this directory (collection):

  # iput myfile* myProject/.

3. Verify that files are stored:

  # ils
  /cunyZone/home/<userid>:
  C- /cunyZone/home/<userid>/myProject
  # ils myProject
  /cunyZone/home/<userid>/myProject:
     myfile1
     myfile2
     myfile3

The symbol 'C-' at the beginning of a line in the 'ils' output indicates that the listed item is a collection.

4. By combining 'ils', 'imkdir', 'iput', 'icp', 'ipwd', and 'imv', a user can create iRODS directories and store files in them, much as is normally done with the UNIX commands 'ls', 'mkdir', 'cp', 'pwd', 'mv', etc.
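
For example, a hypothetical session that creates a sub-collection, copies a file into it, and inspects the result might look like this (collection and file names are placeholders):

  # imkdir myProject/run1
  # icp myProject/myfile1 myProject/run1/myfile1
  # ils myProject/run1
  # icd myProject/run1
  # ipwd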

Getting data from SR1

1. To copy a file from SR1 to the current working directory, run

  # iget myProject/myfile1

2. Listing the current working directory should now show myfile1:

  # ls
  myfile1

3. Instead of individual files, a whole directory (with sub-directories) can be copied with the '-r' flag ('-r' stands for 'recursive'):

  # iget -r myProject


NOTE: wildcards are not supported by iget; therefore, the command below will not work:

  # iget myProject/myfile*
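
A possible workaround is either to fetch the whole collection recursively as shown above, or to loop over a client-side listing; the sketch below assumes the objects to fetch all contain 'myfile' in their names:

  for f in $(ils myProject | grep myfile); do
      iget myProject/$f
  done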

Tagging data with metadata

iRODS provides users with an extremely powerful mechanism for managing data with metadata. When working with large datasets, it is easy to forget what is stored in which file. Metadata tags help organize data in an easy and reliable manner.

Let's tag the files from the previous example with some metadata:

# imeta add -d myProject/myfile1 zvalue 15 meters
AVU added to 1 data-objects
# imeta add -d myProject/myfile1 colorLabel RED
AVU added to 1 data-objects
# imeta add -d myProject/myfile1 comment "This is file number 1"
AVU added to 1 data-objects
# imeta add -d myProject/myfile2 zvalue 10 meters
AVU added to 1 data-objects
# imeta add -d myProject/myfile2 colorLabel RED
AVU added to 1 data-objects
# imeta add -d myProject/myfile2 comment "This is file number 2"
AVU added to 1 data-objects
# imeta add -d myProject/myfile3 zvalue 15 meters
AVU added to 1 data-objects
# imeta add -d myProject/myfile3 colorLabel BLUE
AVU added to 1 data-objects
# imeta add -d myProject/myfile3 comment "This is file number 3"
AVU added to 1 data-objects

Here we've tagged myfile1 with 3 metadata labels:

- zvalue 15 meters

- colorLabel RED

- comment "This is file number 1"

Similar tags were added to 'myfile2' and 'myfile3'.

Metadata come in the form of AVUs -- Attribute|Value|Unit triplets. As seen from the above examples, the Unit is optional.

Let's list all metadata assigned to the file 'myfile1':

# imeta ls -d myProject/myfile1
AVUs defined for dataObj myProject/myfile1:
attribute: zvalue
value: 15
units: meters
----
attribute: colorLabel
value: RED
units:
----
attribute: comment
value: This is file number 1
units:

To remove an AVU assigned to a file, run:

# imeta rm -d myProject/myfile1 zvalue 15 meters
# imeta ls -d myProject/myfile1
AVUs defined for dataObj myProject/myfile1:
attribute: colorLabel
value: RED
units:
----
attribute: comment
value: This is file number 1
units:
To add the AVU back, run:

# imeta add -d myProject/myfile1 zvalue 15 meters

Metadata may be assigned to directories (collections) as well:

# imeta add -C myProject simulationsPool 1
# imeta ls -C myProject
AVUs defined for collection myProject:
attribute: simulationsPool
value: 1
units:

Note the '-C' flag that is used instead of '-d' for collections.
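
Collection-level AVUs are removed in the same way as data-object AVUs, again using '-C':

# imeta rm -C myProject simulationsPool 1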


Searching for data

The power of metadata becomes obvious when data needs to be found in large collections. Here is an illustration of how easily this task is done with iRODS via imeta queries:

# imeta qu -d zvalue = 15
collection: /cunyZone/home/<userid>/myProject
dataObj: myfile1
----
collection: /cunyZone/home/<userid>/myProject
dataObj: myfile3


We see both files that were tagged with the label 'zvalue 15 meters'. Here is a different query:

# imeta qu -d colorLabel = RED
collection: /cunyZone/home/<userid>/myProject
dataObj: myfile1
----
collection: /cunyZone/home/<userid>/myProject
dataObj: myfile2


Another powerful mechanism for querying data is provided by 'iquest'. The following examples show some of the capabilities of 'iquest':

iquest "SELECT DATA_NAME, DATA_SIZE WHERE DATA_RESC_NAME like 'cuny%'"
iquest "For %-12.12s size is %s" "SELECT DATA_NAME ,  DATA_SIZE  WHERE COLL_NAME = '/cunyZone/home/<userid>'"
iquest "SELECT COLL_NAME WHERE COLL_NAME like '/cunyZone/home/%' AND USER_NAME like '<userid>'"
iquest "User %-6.6s has %-5.5s access to file %s" "SELECT USER_NAME,  DATA_ACCESS_NAME, DATA_NAME WHERE COLL_NAME = '/cunyZone/home/<userid>'"
iquest " %-5.5s access has been given to user %-6.6s for the file %s" "SELECT DATA_ACCESS_NAME, USER_NAME, DATA_NAME WHERE COLL_NAME = '/cunyZone/home/<userid>>'"
iquest no-distinct "select META_DATA_ATTR_NAME"
iquest  "select COLL_NAME, DATA_NAME WHERE DATA_NAME like 'myfile%'"
iquest "User %-9.9s uses %14.14s bytes in %8.8s files in '%s'" "SELECT USER_NAME, sum(DATA_SIZE),count(DATA_NAME),RESC_NAME"
iquest "select sum(DATA_SIZE) where COLL_NAME = '/cunyZone/home/<userid>'"
iquest "select sum(DATA_SIZE) where COLL_NAME like '/cunyZone/home/<userid>%'"
iquest "select sum(DATA_SIZE), RESC_NAME where COLL_NAME like '/cunyZone/home/<userid>%'"
iquest "select order_desc(DATA_ID) where COLL_NAME like '/cunyZone/home/<userid>%'"
iquest "select count(DATA_ID) where COLL_NAME like '/cunyZone/home/<userid>%'"
iquest "select RESC_NAME where RESC_CLASS_NAME IN ('bundle','archive')"
iquest "select DATA_NAME,DATA_SIZE where DATA_SIZE BETWEEN '100000' '100200'"

Sharing data

Access to data can be controlled via the 'ichmod' command. Its behavior is similar to the UNIX 'chmod' command. For example, if there is a need to provide user <userid1> with read access to the file myProject/myfile1, execute the following command:

  ichmod read <userid1> myProject/myfile1

To see who has access to a file/directory use:

  # ils -A myProject/myfile1
  /cunyZone/home/<userid>/myProject/myfile1
        ACL - <userid1>#cunyZone:read object   <userid>#cunyZone:own

In the above example, user <userid1> has read access to the file and user <userid> is the owner of the file.

Possible levels of access to a data object are null/read/write/own.
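
To revoke access, the same command is used with the 'null' level; the '-r' flag applies a change recursively to a whole collection (the names below are the same placeholders used above):

  ichmod -r read <userid1> myProject
  ichmod null <userid1> myProject/myfile1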



Backups

/global/u user directories and SR1 Project files are backed up automatically to a remote tape silo system over a fiber optic network. Backups are performed daily.

If a user deletes a file from /global/u or SR1, it will remain on the tape silo system for 30 days, after which it will be deleted and cannot be recovered. If, within the 30-day window, a user finds it necessary to recover a file, the user must expeditiously submit a request to hpchelp@csi.cuny.edu.

Less frequently accessed files are automatically transferred to the HPC Center robotic tape system, freeing up space in the disk storage pool and making it available for more actively used files. The selection criteria for the migration are age and size of a file. If a file is not accessed for 90 days, it may be moved to a tape in the tape library – in fact to two tapes, for backup. This is fully transparent to the user. When a file is needed, the system will copy the file back to the appropriate disk directory. No user action is required.



Data retention and account expiration policy

Project directories on SR1 are retained as long as the project is active. The HPC Center will coordinate with the Principal Investigator of the project before deleting a project directory. If the PI is no longer with CUNY, the HPC Center will coordinate with the PI’s departmental chair or Research Dean, whichever is appropriate.

For user accounts, current user directories under /global/u are retained as long as the account is active. If a user account is inactive for one year, the HPC Center will attempt to contact the user and request that the data be removed from the system. If there is no response from the user within three months of the initial notice, or if the user cannot be reached, the user directory will be purged.



DSMS Technical Summary

Scratch: /scratch/<userid> on PENZIAS, ANDY, SALK, BOB
   Purpose: High-performance parallel scratch filesystems. Work area for jobs, datasets, restart files, and files to be pre-/post-processed. Temporary space for data that will be removed within a short amount of time.
   Accessibility: Not globally accessible. A separate /scratch/<userid> exists on each system. Visible on the login and compute nodes of each system and on the data transfer nodes.
   Quota: None
   Backups: None
   Purges: Files older than 2 weeks are automatically deleted, or files are purged when the scratch filesystem reaches 70% utilization.

Home: /global/u/<userid>
   Purpose: User home filespace. Essential data, such as a user's source code, documents, and data structures, should be stored here.
   Accessibility: Globally accessible on the login and data transfer nodes through native GPFS or NFS mounts.
   Quota: Nominally 50 GB
   Backups: Yes, backed up nightly to tape. If the active copy is deleted, the most recent backup is stored for 30 days.
   Purges: Not purged

Project: /SR1/<PID>
   Purpose: Project space allocations.
   Accessibility: Accessible on the login and data transfer nodes. Accessible outside the CUNY HPC Center through iRODS.
   Quota: Allocated according to project needs
   Backups: Yes, backed up nightly to tape. If the active copy is deleted, the most recent backup is stored for 30 days and is retrievable on request, but the iRODS metadata may be lost.

SR1 is tuned for high bandwidth, redundancy, and resilience. It is not optimal for handling large quantities of small files. If you need to archive more than a thousand files on SR1, please create a single archive using tar (see the example below).
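
For example, a directory containing many small files can be bundled into one archive before it is stored in SR1 (the file and collection names below are placeholders):

  tar -czf mySmallFiles.tar.gz ./mySmallFilesDir
  iput mySmallFiles.tar.gz myProject/.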

• A separate /scratch/<userid> exists on each system. On PENZIAS, SALK, KARLE, and ANDY this is a Lustre parallel file system; on HERBERT it is NFS. These /scratch directories are visible on the login and compute nodes of the respective system and on the data transfer nodes, but are not shared across HPC systems.

• /scratch/<userid> is used as a high-performance parallel scratch filesystem; temporary files (e.g. restart files) should be stored here.

• There are no quotas on /scratch/<userid>; however, any files older than 2 weeks are automatically deleted. A cleanup script is scheduled to run every two weeks or whenever the /scratch disk space utilization exceeds 70%. Dot-files are generally left intact by these cleanup jobs.

• /scratch space is available to all users. If the scratch space is exhausted, jobs will not be able to run. Purge any files in /scratch/<userid> that are no longer needed, even before the automatic deletion kicks in.

• Your /scratch/<userid> directory may be empty when you log in; you will need to copy any files required for submitting your jobs (submission scripts, data sets) from /global/u or from SR1. Once your jobs complete, copy any files you need to keep back to /global/u or SR1 and remove all files from /scratch.

• Do not use /tmp for storing temporary files. The file system where /tmp resides is very small and slow, and files there will be regularly deleted by automatic procedures.

• /scratch/<userid> is not backed up and there is no provision for retaining data stored in these directories.

Good data handling practices

DSMS, i.e., /global/u and SR1

• The DSMS is not an archive for non-HPC users. It is an archive for users who are processing data at the HPC Center. “Parking” files on the DSMS as a back-up to local data stores is prohibited.

• Do not store more than 1,000 files in a single directory. Store collections of small files in an archive (for example, tar). Note that for every file, a stub of about 4 MB is kept on disk even if the rest of the file is migrated to tape, meaning that even migrated files take up some disk space. It also means that files smaller than the stub size are never migrated to tape, because that would not make sense. Storing a large number of small files in a single directory degrades file system performance.

/scratch

• Please regularly remove unwanted files and directories and avoid keeping duplicate copies in multiple locations. File transfer among the HPC Center systems is very fast. It is forbidden to use "touch jobs" to prevent the cleaning policy from automatically deleting your files from the /scratch directories. Use tar -xmvf, not tar -xvf, to unpack files: tar -xmvf updates the time stamp on the unpacked files, while tar -xvf preserves the time stamp of the original file rather than the time when the archive was unpacked. Consequently, the automatic deletion mechanism may remove files unpacked with tar -xvf even if they are only a few days old.
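
For example, on /scratch unpack archives with the -m flag so that the extracted files get fresh time stamps (the archive name below is a placeholder):

  tar -xmvf mydata.tar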