The folder "data" contains the smaller data files for the morning session of day 2. However, we advise you to use the data provided on SAGA, as this makes the transfer to your folder much faster.
The folder "scripts" contains all the scripts to be run on SAGA. However, we advise you to copy them directly on SAGA, as described in the exercise below.
The folder "Lecture" contains the lecture from this session.
-
For the TreSpEx analyses, we first need to install some Perl modules in your home directory, as they are not included in the modules available on SAGA.
module load Perl/5.32.0-GCCcore-10.2.0
export PERL_CPANM_HOME=/tmp/cpanm_$USER # $USER is set by the system, so it can be used as is.
cpanm Statistics::LineFit
cpanm --force Statistics::Test::WilcoxonRankSum
export PERL5LIB=/cluster/home/$USER/perl5/lib/perl5:/cluster/home/$USER/perl5/:$PERL5LIB # $USER is set by the system, so it can be used as is.
nano ~/.bashrc
Copy the "export PERL5LIB=..." line from above to the end of the file, save it under the same name (Ctrl+O, then Enter) and close the editor (Ctrl+X).
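Instead of editing ~/.bashrc by hand in nano, the PERL5LIB line can also be appended from the command line. The sketch below uses a helper of our own, `append_once` (not part of the course material), which only adds the line if it is not already present, so running it twice does not duplicate the entry.

```shell
# Append a line to a file unless that exact line is already present.
# append_once is a hypothetical helper written for this example.
append_once() {
  local line=$1
  local file=$2
  touch "$file"   # create the file if it does not exist yet
  # -x: match whole lines, -F: treat the line as a fixed string (no regex)
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage on SAGA (single quotes keep $USER and $PERL5LIB unexpanded,
# so they are evaluated each time a new shell starts):
# append_once 'export PERL5LIB=/cluster/home/$USER/perl5/lib/perl5:/cluster/home/$USER/perl5/:$PERL5LIB' ~/.bashrc
```

New login shells then pick up the PERL5LIB setting automatically.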
-
To get started, copy all data from the folder "Day2Morning" to your folder in the project area on SAGA.
cd /cluster/projects/nn9458k/phylogenomics/$YOURNAME # replace $YOURNAME with the name of your own folder
cp -r ../week1/Day2Morning .
cd Day2Morning
-
First, we need a tree with bootstrap values for each individual locus.
cd SingleGenes
sh SingleGene_Analyses.sh
-
For the actual paralogy screening, we need the program TreSpEx and the "blast" folder that comes with it. TreSpEx also needs the alignments in relaxed PHYLIP format as input, so we first convert them with FASconCAT-G and then copy TreSpEx and its "blast" folder into the Day2Morning folder.
perl /cluster/projects/nn9458k/phylogenomics/week1/Programs/FASconCAT-G/FASconCAT-G_v1.05.pl -o -a -p -p -s
cd ../
cp -r /cluster/projects/nn9458k/phylogenomics/week1/Programs/TreSpEx/TreSpEx.v1.2_SAGA.pl /cluster/projects/nn9458k/phylogenomics/week1/Programs/TreSpEx/blast .
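To check that the conversion produced usable input before starting TreSpEx, a relaxed PHYLIP file can be validated with a few lines of awk. This is a sketch that assumes the sequential (non-interleaved) layout: a header line with the number of taxa and the number of sites, then one line per taxon with a name (which, unlike strict PHYLIP, may be longer than 10 characters but contains no spaces), whitespace, and the sequence. `check_relaxed_phylip` is a hypothetical helper, and the file name in the usage comment is only a placeholder.

```shell
# Validate a sequential relaxed PHYLIP file:
#   line 1: <number of taxa> <number of sites>
#   then:   <name> <sequence>, one taxon per line
# check_relaxed_phylip is a hypothetical helper written for this example.
check_relaxed_phylip() {
  awk '
    BEGIN { n = 0; bad = 0 }
    NR == 1 { ntax = $1; nchar = $2; next }   # header line
    NF == 0 { next }                          # ignore blank lines
    {
      n++
      if (NF != 2) {
        printf "line %d: expected a name and a sequence\n", NR
        bad = 1
      } else if (length($2) != nchar) {
        printf "line %d: %s has %d sites, header says %d\n", NR, $1, length($2), nchar
        bad = 1
      }
    }
    END {
      if (n != ntax) {
        printf "found %d sequences, header says %d\n", n, ntax
        bad = 1
      }
      exit bad
    }
  ' "$1"
}

# Usage (placeholder file name):
# check_relaxed_phylip SingleGenes/FcC_example.phy && echo "format looks fine"
```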
-
For the paralogy screening and cleaning you will run TreSpEx on the trees with bootstrap values.
sbatch sbatch_TreSpEx_Paralogy.sh
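TreSpEx runs as a batch job, so `sbatch` returns immediately with the message "Submitted batch job <ID>". The sketch below extracts that job ID so you can follow the job with the standard SLURM commands `squeue` and `sacct`; the same approach works for the other `sbatch` calls in this exercise. The function name `job_id_from_sbatch` is our own.

```shell
# Read sbatch's standard message "Submitted batch job <ID>" from stdin
# and print only the numeric job ID (the 4th field).
# job_id_from_sbatch is a hypothetical helper written for this example.
job_id_from_sbatch() {
  awk '/^Submitted batch job/ { print $4 }'
}

# Usage on SAGA:
# jobid=$(sbatch sbatch_TreSpEx_Paralogy.sh | job_id_from_sbatch)
# squeue -j "$jobid"   # state while the job is pending or running
# sacct -j "$jobid"    # accounting information once it has finished
```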
-
The alignment files have been sorted into the folders "FilesPruned" and "FilesNotPruned" within the "Results" folder. Now they have to be concatenated into a single supermatrix file.
mkdir SingleGenesPara
cp Results/FilesNotPruned/FcC_* Results/FilesPruned/FcC_* SingleGenesPara/
cd SingleGenesPara
cp ../../../week1/Programs/FASconCAT-G/FASconCAT-G_v1.05.pl .
cp ../../Day1Afternoon/SingleGenes/sbatch_Concatenation.sh .
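Before concatenating, it can be informative to see how many alignments TreSpEx pruned. A minimal sketch, assuming you are inside SingleGenesPara (so the TreSpEx output is in ../Results) and that the alignments keep the FcC_ prefix used in the cp command above; `count_fcc` and `RESULTS_DIR` are names introduced here for illustration.

```shell
# Run from inside SingleGenesPara; RESULTS_DIR points at the TreSpEx output folder.
RESULTS_DIR=${RESULTS_DIR:-../Results}

# Print the number of FcC_* files directly inside a folder (0 if none or missing).
# count_fcc is a hypothetical helper written for this example.
count_fcc() {
  find "$1" -maxdepth 1 -type f -name 'FcC_*' 2>/dev/null | wc -l | tr -d ' '
}

for sub in FilesPruned FilesNotPruned; do
  printf '%s: %s alignments\n' "$sub" "$(count_fcc "$RESULTS_DIR/$sub")"
done
printf 'copied to SingleGenesPara: %s\n' "$(count_fcc .)"
```

The last number should equal the sum of the first two; if it does not, the cp step above missed some files.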
- Modify "sbatch_Concatenation.sh" to fit your needs for this analysis.
sbatch sbatch_Concatenation.sh
-
Next, you will run a tree reconstruction to assess the effect of this paralogy-pruning approach.
cd Supermatrix
cp ../../sbatch_Supermatrix_Cleaned_tree.sh .
- Check that the supermatrix file name in the script matches the name of your supermatrix. If not, change it accordingly.
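A quick way to do this check is to search the job script for the file name with `grep`. The sketch below wraps this in a hypothetical helper, `check_name_in_script`; the supermatrix name in the usage comment is a placeholder, so use `ls` to see the actual name of your file.

```shell
# Report whether a job script mentions a given file name (fixed-string search).
# check_name_in_script is a hypothetical helper written for this example.
check_name_in_script() {
  local script=$1
  local name=$2
  if grep -qF -- "$name" "$script"; then
    echo "OK: $script refers to $name"
  else
    echo "WARNING: $name not found in $script" >&2
    return 1
  fi
}

# Usage inside the Supermatrix folder (file name is a placeholder):
# ls
# check_name_in_script sbatch_Supermatrix_Cleaned_tree.sh YOUR_SUPERMATRIX.phy
```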
sbatch sbatch_Supermatrix_Cleaned_tree.sh
The folder "Results" contains the most important results from this session.