<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.csi.cuny.edu/cunyhpc/index.php?action=history&amp;feed=atom&amp;title=TOPHAT</id>
	<title>TOPHAT - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.csi.cuny.edu/cunyhpc/index.php?action=history&amp;feed=atom&amp;title=TOPHAT"/>
	<link rel="alternate" type="text/html" href="https://wiki.csi.cuny.edu/cunyhpc/index.php?title=TOPHAT&amp;action=history"/>
	<updated>2026-05-02T22:04:46Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.38.4</generator>
	<entry>
		<id>https://wiki.csi.cuny.edu/cunyhpc/index.php?title=TOPHAT&amp;diff=106&amp;oldid=prev</id>
		<title>James: Created page with &quot;The other tools in this collection, BOWTIE, CUFFLINKS, and SAMTOOLS are also installed at the CUNY HPC Center.  Additional information can be found at the TOPHAT home page here [http://tophat.cbcb.umd.edu/index.html].  At the CUNY HPC Center TOPHAT is installed on ANDY.  TOPHAT is a parallel threaded code (pthreads) that takes its input from a simple text file provided on the command line.  Below is an example SLURM script that will run the mRNA test case provided with t...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.csi.cuny.edu/cunyhpc/index.php?title=TOPHAT&amp;diff=106&amp;oldid=prev"/>
		<updated>2022-10-20T20:22:57Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;The other tools in this collection, BOWTIE, CUFFLINKS, and SAMTOOLS are also installed at the CUNY HPC Center.  Additional information can be found at the TOPHAT home page here [http://tophat.cbcb.umd.edu/index.html].  At the CUNY HPC Center TOPHAT is installed on ANDY.  TOPHAT is a parallel threaded code (pthreads) that takes its input from a simple text file provided on the command line.  Below is an example SLURM script that will run the mRNA test case provided with t...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;The other tools in this collection, BOWTIE, CUFFLINKS,&lt;br /&gt;
and SAMTOOLS, are also installed at the CUNY HPC Center.  Additional information can be found at the TOPHAT&lt;br /&gt;
home page here [http://tophat.cbcb.umd.edu/index.html].&lt;br /&gt;
&lt;br /&gt;
At the CUNY HPC Center TOPHAT is installed on ANDY.  TOPHAT is a parallel threaded code (pthreads)&lt;br /&gt;
that takes its input from a simple text file provided on the command line.  The example SLURM scripts below run&lt;br /&gt;
the mRNA test case provided with the distribution, which can be copied from the local installation directory&lt;br /&gt;
to your current location as follows:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cp /share/apps/tophat/default/examples/* .&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
To set the required environment variables and add the TOPHAT executable to your path, run the module load command (the&lt;br /&gt;
modules utility is discussed in detail above):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
module load tophat&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running &amp;#039;tophat&amp;#039; from the interactive prompt without any options prints a brief description of its&lt;br /&gt;
command-line arguments and options. Here is a SLURM batch script that builds the index and aligns the sequences in serial mode:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition production&lt;br /&gt;
#SBATCH --job-name TOPHAT_serial&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=1&lt;br /&gt;
#SBATCH --mem=2880&lt;br /&gt;
&lt;br /&gt;
# Find out name of master execution host (compute node)&lt;br /&gt;
echo -n &amp;quot;&amp;gt;&amp;gt;&amp;gt;&amp;gt; SLURM Master compute node is: &amp;quot;&lt;br /&gt;
hostname&lt;br /&gt;
&lt;br /&gt;
# You must explicitly change to the working directory in SLURM&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
# Point to the execution directory to run&lt;br /&gt;
echo &amp;quot;&amp;gt;&amp;gt;&amp;gt;&amp;gt; Begin TOPHAT Serial Run ...&amp;quot;&lt;br /&gt;
tophat -r 20 test_ref reads_1.fq reads_2.fq &amp;gt; tophat_mrna.out 2&amp;gt;&amp;amp;1&lt;br /&gt;
echo &amp;quot;&amp;gt;&amp;gt;&amp;gt;&amp;gt; End   TOPHAT Serial Run ...&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This script can be saved to a file (say tophat_ser.job) and submitted with the command:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
sbatch tophat_ser.job&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Running the mRNA test case should take less than 2 minutes. Because the script sets no &amp;#039;--output&amp;#039; option, SLURM writes the batch script&amp;#039;s own&lt;br /&gt;
output and error streams to its default file, slurm-JOBID.out, where JOBID is the numeric job ID. The TOPHAT application output is written to the user-specified file at the end&lt;br /&gt;
of the TOPHAT command line after the greater-than sign. Here it is named &amp;#039;tophat_mrna.out&amp;#039;.  The expression &amp;#039;2&amp;gt;&amp;amp;1&amp;#039; at the end&lt;br /&gt;
of the command line redirects Unix standard error to the same file as standard output. Users should always explicitly&lt;br /&gt;
specify the name of the application&amp;#039;s output file in this way to ensure that it is written directly into the user&amp;#039;s working directory,&lt;br /&gt;
which has much more disk space than the SLURM spool directory on /var.&lt;br /&gt;
&lt;br /&gt;
Details on the meaning of the SLURM script are covered below in the SLURM section. The most important lines are the three resource requests: &amp;#039;#SBATCH --nodes=1&amp;#039;, &amp;#039;#SBATCH --ntasks=1&amp;#039;, and &amp;#039;#SBATCH --mem=2880&amp;#039;.&lt;br /&gt;
Together they instruct SLURM to allocate 1 processor (core) on a single compute node with 2,880 MBs of memory for the job. SLURM then places the job on any node in the &amp;#039;production&amp;#039; partition that can satisfy the request.&lt;br /&gt;
The master compute node that SLURM finally selects to run your job will be printed in the SLURM output file by the &amp;#039;hostname&amp;#039;&lt;br /&gt;
command.&lt;br /&gt;
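&lt;br /&gt;
As a rough sketch of how these three requests fit together, the hypothetical helper below (sbatch_resources is not part of SLURM or TOPHAT) prints&lt;br /&gt;
the resource lines for a given core count. It assumes memory is requested at 2,880 MBs per core, the figure used in the serial script; adjust MB_PER_CORE if your jobs need more.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: print the #SBATCH resource lines for a TOPHAT job.
# Assumption: memory is requested at 2880 MB per core, as in the serial script.
MB_PER_CORE=2880

sbatch_resources() {
    local cores="$1"
    echo "#SBATCH --nodes=1"
    echo "#SBATCH --ntasks=${cores}"
    echo "#SBATCH --mem=$((cores * MB_PER_CORE))"
}

# Print the request used by the serial example (1 core, 2880 MB).
sbatch_resources 1
```
&lt;br /&gt;
Calling the helper with 2 instead of 1 would print a request for 2 cores and 5,760 MBs of memory.&lt;br /&gt;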
&lt;br /&gt;
To run TOPHAT in parallel-threads mode several changes to the script are required.  Here is a modified script&lt;br /&gt;
that shows how to run TOPHAT using two threads.  ANDY has as many as 8 physical compute cores per compute&lt;br /&gt;
node, and therefore as many as 8 cores-threads might be chosen.  Once a parallel job starts it will generally (not&lt;br /&gt;
always) complete in less time, but jobs requesting a larger number of cores-threads or more memory per node&lt;br /&gt;
may wait longer to start on a busy system as SLURM looks for a compute node with all the resources requested.  &lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
#!/bin/bash&lt;br /&gt;
#SBATCH --partition production&lt;br /&gt;
#SBATCH --job-name TOPHAT_threads&lt;br /&gt;
#SBATCH --nodes=1&lt;br /&gt;
#SBATCH --ntasks=2&lt;br /&gt;
#SBATCH --mem=5760&lt;br /&gt;
&lt;br /&gt;
# Find out name of master execution host (compute node)&lt;br /&gt;
echo -n &amp;quot;&amp;gt;&amp;gt;&amp;gt;&amp;gt; SLURM Master compute node is: &amp;quot;&lt;br /&gt;
hostname&lt;br /&gt;
&lt;br /&gt;
# You must explicitly change to the working directory in SLURM&lt;br /&gt;
cd $SLURM_SUBMIT_DIR&lt;br /&gt;
&lt;br /&gt;
# Point to the execution directory to run&lt;br /&gt;
echo &amp;quot;&amp;gt;&amp;gt;&amp;gt;&amp;gt; Begin TOPHAT Threaded Run ...&amp;quot;&lt;br /&gt;
tophat -p 2 -r 20 test_ref reads_1.fq reads_2.fq &amp;gt; tophat_thrds.out 2&amp;gt;&amp;amp;1&lt;br /&gt;
echo &amp;quot;&amp;gt;&amp;gt;&amp;gt;&amp;gt; End   TOPHAT Threaded Run ...&amp;quot;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Notice the difference in the &amp;#039;#SBATCH&amp;#039; resource lines, where the request now includes 2 cores (ntasks=2) and&lt;br /&gt;
twice as much memory as before.  Also, notice that the TOPHAT command-line now includes the &amp;#039;-p 2&amp;#039; option to&lt;br /&gt;
run the code with 2 threads working in parallel.   Perfectly or &amp;#039;embarrassingly&amp;#039; parallel workloads can run close to&lt;br /&gt;
2, 4, or more times as fast as the same workload in serial mode depending on the number of threads requested, but&lt;br /&gt;
workloads cannot be counted on to be perfectly parallel. &lt;br /&gt;
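&lt;br /&gt;
One way to check how close a run comes to this ideal is to compare wall-clock times. The sketch below uses a hypothetical helper (scaling is not part of TOPHAT or SLURM)&lt;br /&gt;
that takes the serial time, the threaded time, and the thread count, and reports speedup (serial time divided by threaded time) and efficiency (speedup divided by thread count).&lt;br /&gt;
The timings shown are made-up illustrative values, not measurements from ANDY.&lt;br /&gt;
&lt;br /&gt;
```shell
#!/bin/bash
# Hypothetical helper: report speedup and parallel efficiency from two timings.
# Arguments: serial seconds, threaded seconds, thread count.
scaling() {
    awk -v t1="$1" -v tp="$2" -v p="$3" \
        'BEGIN { printf "speedup=%.2f efficiency=%.2f\n", t1 / tp, t1 / (tp * p) }'
}

# Illustrative timings only: 100 s serial, 60 s with 2 threads.
scaling 100 60 2
```
&lt;br /&gt;
With these illustrative values the helper prints speedup=1.67 efficiency=0.83. An efficiency close to 1.00 indicates near-perfect scaling.&lt;br /&gt;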
&lt;br /&gt;
The speed ups that you observe will typically be less than perfect and diminish as you ask for more cores-threads.&lt;br /&gt;
Large data jobs will typically scale more efficiently as you add cores-threads, but users should take note of the performance&lt;br /&gt;
gains that they see as cores-threads are added and select a core-thread count that provides efficient scaling and avoids&lt;br /&gt;
diminishing returns.&lt;/div&gt;</summary>
		<author><name>James</name></author>
	</entry>
</feed>