SAMTOOLS
Latest revision as of 20:36, 20 October 2022
SAMTOOLS is part of a sequence alignment and analysis tool chain developed at Johns Hopkins University, the University of California at Berkeley, and Harvard, and distributed through the Center for Bioinformatics and Computational Biology. The other tools in this collection, BOWTIE, CUFFLINKS, and TOPHAT, are also installed at the CUNY HPC Center. Additional information can be found at the SAMTOOLS home page [http://samtools.sourceforge.net/index.shtml].
At the CUNY HPC Center, SAMTOOLS is installed on ANDY. SAMTOOLS is a collection of utilities for extracting, reformatting, and displaying nucleotide sequences. The primary tool is called 'samtools' and offers a large number of command-line options. For smaller tasks, SAMTOOLS can be run interactively, but it should be run in SLURM batch mode when larger, longer tasks are anticipated. NOTE: Display tasks cannot be run in pure SLURM batch mode because their output is displayed on the terminal. Larger display tasks should be run in SLURM interactive mode, as described in the SLURM section elsewhere in this document.
Below is an example SLURM script that will convert the 'toy.sam' file provided with the distribution from SAM to BAM format. This and all the example files can be copied from the local installation directory to your current location as follows:
cp /share/apps/samtools/default/examples/* .
To set all required environment variables and add the SAMTOOLS executables to your path, run the module load command (the modules utility is discussed in detail above):
module load samtools
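As a quick sanity check after loading the module, the shell can be asked whether the executable is now on the path. The sketch below is only an illustration: 'check_tool' is a hypothetical helper, not part of SAMTOOLS or the modules utility, and the hint it prints assumes the module is named after the tool.

```shell
#!/bin/bash
# check_tool is a hypothetical helper used only for illustration.
# It reports whether a command can be found on the current PATH.
check_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool found at: $(command -v "$tool")"
        return 0
    else
        # Assumption: the module carries the same name as the tool.
        echo "$tool not found on PATH; try 'module load $tool' first" >&2
        return 1
    fi
}

# Report on samtools without aborting the shell if it is missing.
check_tool samtools || true
```

If the check fails in a batch job, the 'module load' line most likely needs to appear inside the job script itself.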
Running 'samtools' from the interactive prompt without any options will print a brief description of its command-line arguments and options. Here is a SLURM batch script that does a short format conversion in batch mode:
#!/bin/bash
#SLURM -q production
#SLURM -N SAMTLS_serial
#SLURM -l select=1:ncpus=1:mem=2880mb
#SLURM -l place=free
#SLURM -V

# Find out name of master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname

# Change to the directory from which the job was submitted
cd "$SLURM_SUBMIT_DIR"

# Point to the execution directory to run
echo ">>>> Begin SAMTLS Serial Run ..."
# Write the BAM output to toy.bam and standard error to toy.err
samtools view -bS toy.sam > toy.bam 2> toy.err
echo ">>>> End SAMTLS Serial Run ..."
This script can be saved to a file (say, samtools_ser.job) and submitted with the command:
sbatch samtools_ser.job
Running this conversion test case should take less than a minute and will produce SLURM output and error files beginning with the job name 'SAMTLS_serial'. The primary SAMTOOLS application results will be written into the user-specified file at the end of the samtools command line, after the greater-than sign. Here it is named 'toy.bam'. The expression '2> toy.err' (written with no space between '2' and '>') at the end of the command line directs Unix standard error to the file 'toy.err'. Users should always explicitly specify the names of the application's output files to ensure that they are written directly into the user's working directory, which has much more disk space than the SLURM spool directory on /var.
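To illustrate how '>' and '2>' separate the two output streams, here is a small sketch that does not need SAMTOOLS at all. 'fake_tool' is a hypothetical stand-in for any program that writes results to standard output and diagnostics to standard error. Note that the shell treats '2>' as a single operator: with a space after the '2', the '2' would be passed to the program as an ordinary argument instead.

```shell
#!/bin/bash
# fake_tool is a hypothetical stand-in for a program such as samtools:
# it writes its result to stdout and a diagnostic message to stderr.
fake_tool() {
    echo "result data"
    echo "warning: something to note" >&2
}

# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)

# '>' captures standard output; '2>' (no space) captures standard error.
fake_tool > "$workdir/out.txt" 2> "$workdir/err.txt"

echo "stdout file contains: $(cat "$workdir/out.txt")"
echo "stderr file contains: $(cat "$workdir/err.txt")"
```

Keeping the two streams in separate files means an error message can never end up inside a binary output file such as a BAM.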
Details on the meaning of the SLURM script are covered below in the SLURM section. The most important lines are '#SLURM -l select=1:ncpus=1:mem=2880mb' and '#SLURM -l place=free'. The first instructs SLURM to select 1 resource 'chunk' with 1 processor (core) and 2,880 MB of memory for the job. The second instructs SLURM to place the job on the least busy node where the requested resources can be found (freely). The master compute node that SLURM finally selects to run your job will be printed in the SLURM output file by the 'hostname' command.
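The '-q', '-N', '-l select' and '-l place' options discussed above follow PBS-style syntax, whereas SLURM's sbatch command reads directives that begin with '#SBATCH'. The following is a minimal sketch of how the same request might be written with native SLURM directives. It rests on several assumptions: that a partition named 'production' exists (the name is taken from the queue above), that one task with one CPU and 2880 MB of memory matches the intent of the 'select' line, and that '--export=ALL' stands in for '-V'. The 'place=free' setting has no direct counterpart here and is omitted. Check the site's SLURM section before relying on any of these values.

```shell
#!/bin/bash
#SBATCH --partition=production   # assumed partition name, from the queue name above
#SBATCH --job-name=SAMTLS_serial
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=2880M              # memory per node, in megabytes
#SBATCH --export=ALL             # pass the submission environment to the job

# Find out name of master execution host (compute node)
echo -n ">>>> SLURM Master compute node is: "
hostname

# SLURM_SUBMIT_DIR is set by SLURM to the directory where sbatch was run;
# fall back to the current directory when the script is run by hand.
cd "${SLURM_SUBMIT_DIR:-.}"

echo ">>>> Begin SAMTLS Serial Run ..."
# Convert SAM to BAM; results go to toy.bam, standard error to toy.err
samtools view -bS toy.sam > toy.bam 2> toy.err
echo ">>>> End SAMTLS Serial Run ..."
```

A script written this way would be submitted with 'sbatch', and by default SLURM names its output file after the job ID rather than the job name unless '--output' is given.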