Applications Environment/itasser

From HPCC Wiki

I-TASSER



Description: I-TASSER is a platform for protein structure and function prediction. 3D models are built from multiple-threading alignments by LOMETS and iterative template fragment assembly simulations; function insights are derived by matching the 3D models against the BioLiP protein function database.

Additional Notes: Package Contents include:

I-TASSER: A standalone I-TASSER package for protein 3D structure prediction and refinement.
COACH: A function annotation program based on COFACTOR, TM-SITE and S-SITE.
COFACTOR: A program for ligand-binding site, EC number & GO term prediction.
TM-SITE: A structure-based approach for ligand-binding site prediction.
S-SITE: A sequence-based approach for ligand-binding site prediction.
LOMETS: A set of locally installed threading programs for meta-server protein fold-recognition.
MUSTER: A threading program to identify templates from a non-redundant protein structure library.
SPICKER: A clustering program to identify near-native protein models from structure decoys.
HAAD: A program for quickly adding hydrogen atoms to protein heavy-atom structures.
EDTSurf: A program to construct triangulated surfaces of protein molecules.
ModRefiner: A program to construct and refine atomic-level protein models from C-alpha traces.
NW-align: A robust program for protein sequence-to-sequence alignment using the Needleman-Wunsch algorithm.
PSSpred: A highly accurate program for protein secondary structure prediction.

Availability: PENZIAS

Module file: itasser

Citation: Include the following citations in any published paper that uses I-TASSER:

Zhang, Y. and Skolnick, J. PNAS (2004) (for the TASSER method)
J. Yang, R. Yan, R. Roy, D. Xu, J. Poisson, Y. Zhang. The I-TASSER Suite: Protein structure and function prediction. Nature Methods (2014)
Wu S, Zhang Y. Nucleic Acids Res. 2007; 35(10):3375-82 (for MUSTER)
Wu S, Zhang Y. Nucleic Acids Res. 2007; 35(10):3375-82 (for LOMETS)

Documentation: There is no complete documentation for the package. Some examples are available on the I-TASSER web server site: http://zhanglab.ccmb.med.umich.edu/I-TASSER/
Tutorials: There are no comprehensive tutorials either. Useful information for beginners can be found in the I-TASSER forum at: http://zhanglab.ccmb.med.umich.edu/bbs/?q=forum/2

Related Packages:

HHpred
RaptorX
Modeller

Use: Load module with command:

 
module load itasser 

Loading the module sets up the path to all programs installed under the I-TASSER umbrella (see the list above).
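To confirm that the module actually put the programs on the path, a small check like the one below can help. This is only a sketch: the helper name check_on_path is made up for illustration, and the exact executable names provided by the module on a given system may differ.

```shell
#!/bin/bash
# Report which of the named programs are reachable through PATH.
# Returns 0 only if every program was found.
# check_on_path is a hypothetical helper, not part of I-TASSER.
check_on_path() {
    local missing=0 prog
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog"
        else
            echo "missing: $prog"
            missing=1
        fi
    done
    return "$missing"
}

# On the cluster, after `module load itasser`, one might run:
#   check_on_path runI-TASSER.pl
```

If the script is reported as missing, the module is most likely not loaded in the current shell.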

Example: The I-TASSER prediction pipeline includes four general steps: template identification, structure re-assembly, atomic model construction, and final model selection.

Step 1. Template identification.
The I-TASSER suite includes two programs dedicated to template identification. The first, MUSTER, uses an extended sequence profile-profile alignment algorithm in which the alignment score is assisted by secondary structure match, fragment structure profile, solvent accessibility, backbone torsion angle, and a hydrophobic scoring matrix. The second, LOMETS, performs local meta-threading.
Step 2. Structure assembly.
The cluster centroids are generated by SPICKER, which clusters all the trajectories. By default, the I-TASSER structure assembly simulations consist of 14 independent runs. This number can be changed if the user wants to run more simulations, especially for large proteins without good templates.
Step 3. Atomic model construction.
The SPICKER cluster centroids from I-TASSER are reduced models, with each residue represented by its Cα and side-chain center. Full-atom models can be constructed with another application, e.g. REMO, which is not part of the installed suite.
Step 4. Model Selection.
The model selection procedure is a matter of choice and depends on the research workflow.


I-TASSER can be run in sequential or parallel mode. The application is a Perl script, runI-TASSER.pl, which uses the default Perl interpreter on the cluster.

For serial mode the following script can be used. Note that a sequential job needs only one task (ntasks=1), placed on a single node.


For the meaning and proper choice of values in the <chunks> and <tasks> fields, please read the section “Writing a job submit script.”

PENZIAS

Sample submit script
#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name TASSER
#SBATCH --nodes=<chunks>
#SBATCH --ntasks=<tasks>

# Explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# Start job

runI-TASSER.pl -pkgdir /share/apps/itasser/4.2 -libdir /share/apps/itasser/4.2/ITLIB -seqname <name_of_input_sequence> -datadir $HOME -java_home /usr 2>&1

echo ">>>> End <<<<"
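Before submitting, the query sequence has to be in place in the directory given to -datadir. The sketch below stages a FASTA file under the name seq.fasta, which is an assumption about how runI-TASSER.pl expects its input (it is not stated on this page), so please check it against the I-TASSER README. The helper name stage_sequence, the directory layout, and the sequence itself are made up for illustration.

```shell
#!/bin/bash
# Sketch: stage a query sequence for runI-TASSER.pl.
# Assumption: the data directory must hold the query as "seq.fasta".
# stage_sequence is a hypothetical helper, not part of I-TASSER.
stage_sequence() {
    local datadir=$1 fasta=$2
    # A FASTA file starts with a ">" header line.
    if ! head -n 1 "$fasta" | grep -q '^>'; then
        echo "error: $fasta does not look like FASTA" >&2
        return 1
    fi
    mkdir -p "$datadir" && cp "$fasta" "$datadir/seq.fasta"
}

# Demo with a throwaway directory and an invented ten-residue sequence
workdir=$(mktemp -d)
printf '>example\nMKTAYIAKQR\n' > "$workdir/query.fa"
stage_sequence "$workdir/example" "$workdir/query.fa"
ls "$workdir/example"    # prints: seq.fasta
```

With such a layout, -datadir would point at the staged directory rather than at $HOME.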
Parallel submit script

For a parallel I-TASSER run the SBATCH job script below can be used. The parallelism is quite primitive: the I-TASSER script submits several parts of itself at the same time to different cores. The number of cores used is therefore determined by the resource request (the #SBATCH --nodes line) in the job script. For example, to run the script below on 4 cores, the request could be:

 nodes=4 ntasks=1


#!/bin/bash
#SBATCH --partition production
#SBATCH --job-name ITASSERJOB
#SBATCH --nodes=<chunks>


# Explicitly change to the working directory in SLURM
cd $SLURM_SUBMIT_DIR

# Start job

runI-TASSER.pl -pkgdir /share/apps/itasser/4.2 -libdir /share/apps/itasser/4.2/ITLIB -seqname <name_of_input_sequence> -datadir $HOME -runstyle parallel -java_home /usr 2>&1

echo ">>>> End <<<<"


Additional Notes: I-TASSER is a collection of several Perl scripts. How the scripts are used depends on the particular research workflow. The main Perl script is runI-TASSER.pl. In addition to the parameters mentioned above, it accepts the following:

-homoflag, [real or benchmark], "real" will use all templates, "benchmark" will exclude homologous templates
-idcut, sequence identity cutoff for "benchmark" runs, default value is 0.3, range is in [0,1]
-ntemp, number of top templates output for each threading program, default is 20, range is in [1,50]
-nmodel, number of final models output by I-TASSER, default value is 5, range is in [1,10]
-LBS, [true or false], whether to predict ligand-binding site, default is false
-EC, [true or false], whether to predict EC number, default is false
-GO, [true or false], whether to predict GO terms, default is false
-restraint1, specify distance/contact restraints
-restraint2, specify template with alignment
-restraint3, specify template name without alignment
-restraint4, specify template file without alignment
-temp_excl, exclude specific templates from template library
-traj, deposit (keep) the trajectory files
-hours, specify maximum hours of simulations (default=5 when -light=true). Must be less than or equal to the walltime requested from SLURM.
-outdir, where the final results should be saved. The default is the start-up directory.
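As an illustration of how these options combine, the sketch below assembles a command line and refuses an -hours value larger than the job's walltime. It only prints the command (a dry run) and does not start I-TASSER. The helper name build_cmd, the sequence name, the directories, and the chosen values are hypothetical; the -light flag is taken from the -hours description above and its exact behaviour should be checked against the I-TASSER documentation.

```shell
#!/bin/bash
# Sketch: print a runI-TASSER.pl command line with a few optional flags.
# build_cmd is a hypothetical helper; all values are illustrative only.
# Arguments: <seqname> <datadir> <hours> <walltime in whole hours>
build_cmd() {
    local seqname=$1 datadir=$2 hours=$3 walltime_hours=$4
    # -hours must not exceed the walltime requested from SLURM.
    if [ "$hours" -gt "$walltime_hours" ]; then
        echo "error: -hours $hours exceeds walltime of $walltime_hours h" >&2
        return 1
    fi
    echo "runI-TASSER.pl -pkgdir /share/apps/itasser/4.2" \
         "-libdir /share/apps/itasser/4.2/ITLIB" \
         "-seqname $seqname -datadir $datadir -java_home /usr" \
         "-nmodel 3 -LBS true -light true -hours $hours" \
         "-outdir $datadir/results"
}

# Dry run: 4 hours of simulation inside a 6-hour SLURM allocation
build_cmd example /tmp/itasser_example 4 6
```

The printed line can then be pasted into the "Start job" section of a submit script once the values have been adapted.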