Osnat Penn
Eyal Privman
Haim Ashkenazy
Itamar Sela
Giddy Landan
Dan Graur
Tal Pupko
GUIDANCE is a software package for aligning biological sequences (DNA or amino acids) using either MAFFT, PRANK, or CLUSTALW, and calculating confidence scores for each column, sequence and residue in the alignment.
-
Sela, I., Ashkenazy, H., Katoh, K. and Pupko, T. (2015) GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Research, 2015 Jul 1; 43: W7-W14.; doi: 10.1093/nar/gkq443
-
Landan, G., and D. Graur. (2008). Local reliability measures from sets of co-optimal multiple sequence alignments. Pac Symp Biocomput 13:15-24
- Penn, O., Privman, E., Landan, G., Graur, D. and Pupko, T. (2010). An alignment confidence score capturing robustness to guide-tree uncertainty. Molecular Biology and Evolution, 2010 Aug;27(8):1759-67; doi:10.1093/molbev/msq066
- Landan, G., and D. Graur. (2008). Local reliability measures from sets of co-optimal multiple sequence alignments. Pac Symp Biocomput 13:15-24
-
Unpack the archive.
$ tar -xzf guidance.v2.01.tgz
-
Compile the package.
$ cd guidance.v2.01 $ make # Running `make' takes a while
-
Set <path/to/guidance>/guidance/www/Guidance in the $PATH environment variable in your .bash_profile.
-
Check for the external dependencies.
-
BioInformatics:
- MAFFT: Type "mafft" and check that you have version 6.712 or newer.
- PRANK: Type "prank" and check that you have it insalled.
- CLUSTALW: Type "clustalw" and check that you have it insalled.
- MUSCLE: Type "muscle" and check that you have it insalled.
- PAGAN: Type "pagan" and check that you have it installed.
- In case PAGAN is used not for user provided alignment (using --msaFile), mafft is also required to be installed.
-
Other Languages:
Notes:
* Guidance scripts use relative paths. Please do not move them out of their packaged directories.
* GUIDANCE uses flags in the command line arguments
* For help, type: "perl guidance"
$ perl guidance --seqFile SEQFILE --msaProgram [MAFFT|PRANK|CLUSTALW|MUSCLE] --seqType [aa|nuc|codon] --outDir FULL_PATH_OUTDIR
Required parameters:
--seqFile Input sequence file in FASTA format
--seqType Sequence type may be either of: nuc (nucleotides), aa (amino acids),
or codon (nucleotides that will be treated as whole codons)
--msaProgram The alignment program - may be either MAFFT, PRANK, CLUSTALW or MUSCLE
--outDir The output directory were all output files will be created [please provide full and not relative path]
(will be created automatically)
Optional parameters:
--program The confidence measure may be GUIDANCE2, GUIDANCE or HoT. default=GUIDANCE2
--bootstraps Number of bootstrap iterations. default=100
--genCode Genetic code for use in codon sequence. default=1
1> Nuclear Standard
15> Nuclear Blepharisma
6> Nuclear Ciliate
10> Nuclear Euplotid
2> Mitochondria Vertebrate
5> Mitochondria Invertebrate
3> Mitochondria Yeast
13> Mitochondria Ascidian
9> Mitochondria Echinoderm
14> Mitochondria Flatworm
4> Mitochondria Protozoan
--outOrder May be either aligned or as_input. default=aligned
--msaFile Input alignment file - not recommended, see documentation online at: guidance.tau.ac.il
--seqCutoff Sequence confidence cutoff between 0 to 1. default=0.6
--colCutoff Columnd confidence cutoff between 0 to 1. default=0.93
--mafft Path to mafft executable. default=mafft
--prank Path to prank executable. default=prank
--clustalw path to clustalw executable. default=clustalw
--muscle path to muscle executable. default=muscle
--pagan path to pagan executable. default=pagan
--ruby path to ruby executable. default=ruby
--dataset Unique name for the Dataset - will be used as prefix to outputs (default=MSA)
--MSA_Param: Passing parameters for the alignment program e.g -F to prank. To pass parameter containning '-' in it, add \ before each '-' e.g. \-F for PRANK
--proc_num: Number of processors to use (default=1)
$ perl guidance.pl --seqFile protein.fas --msaProgram MAFFT --seqType aa --outDir /somedir/protein.guidance
# Will align the amino acid sequences in the fasta file "protein.fas" using MAFFT and output all results to the diretory
# "/somedir/protein.guidance"
$ perl guidance.pl --seqFile codingSeq.fas --msaProgram PRANK --seqType codon --outDir /somedir/codingSeq.guidance --genCode 2 --bootstraps 30
# Will align the codon sequences in the fasta file "codingSeq.fas" using PRANK after translation using the vertebrate
# mitochondrial genetic code and output all results to the diretory "/somedir/codingSeq.guidance". Only 30 bootstrap iterations
# will be done instead of the default 100 (cut run-time by a factor of 3)
* To modify the code, or use parts of it for other purposes, permission should be requested. Please contact Tal Pupko: [email protected]
* Please note that the use of the GUIDANCE program is for academic use only