The folder "data" contains the smaller data files for the afternoon session of day 2. However, we advise you to use the data provided on SAGA, as this makes the data transfer to your folder much faster.
The folder "scripts" contains all the scripts to be run on SAGA. However, we advise you to transfer them from SAGA as described in the exercise below.
The folder "Lecture" contains the lecture from this session.
-
To get started, copy all data from the folder "Day2Afternoon" to your folder in the project area on SAGA:
cd /cluster/projects/nn9458k/phylogenomics/$YOURNAME
cp -r ../week1/Day2Afternoon .
cd Day2Afternoon
-
From the concatenation this morning we have a concatenated dataset in FASTA format as well as a corresponding partition file. However, here we will proceed with the original orthologous files from the study, so that we work with a subset of the files the authors used rather than a new dataset. These files are already provided in the folder. The FASconCAT output format differs slightly from the input format that MARE needs, so we must first convert the partition file:
sed "s/DNA,/charset/" < Matrix_original_supermatrix_partition.txt | sed "s/$/ ;/" > Matrix_original_supermatrix_partition_MARE.txt
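To see what the two sed substitutions do, the following minimal sketch applies them to a single partition line. The gene name and coordinates are made up for illustration; the real FASconCAT output may differ in spacing.

```shell
# Hypothetical partition line (gene name and range are made up)
line="DNA, gene1 = 1-300"

# Same two substitutions as in the command above:
# replace the "DNA," prefix with "charset" and append " ;" to the end of the line
converted=$(printf '%s\n' "$line" | sed "s/DNA,/charset/" | sed "s/$/ ;/")

echo "$converted"
# prints: charset gene1 = 1-300 ;
```

Comparing one converted line with the example in the MARE manual is a quick way to check the partition file before submitting the job.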
-
We need the program MARE.
wget http://software.zfmk.de/MARE_v0.1.2-rc.zip
unzip MARE_v0.1.2-rc.zip
cd MARE_v0.1.2-rc/
make
cd ../
-
For the matrix reduction, we will run MARE with different settings for -d and -t to test their influence on the exclusion of taxa and loci
sbatch sbatch_MARE.sh
-
Next we want to see what effect this has on the tree reconstruction
-
Copy the file "sbatch_Supermatrix_MARE_tree.sh" to the three results folders
-
Edit each copy so that it points to the appropriate FASTA file (check the MARE manual in the Lecture folder or at the link above for this; what is the name of the reduced supermatrix?)
-
Submit the sbatch files
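Copying the script to several folders can be done in a loop. The sketch below runs in a scratch directory with stand-in folders; the names results_A, results_B and results_C are hypothetical, so substitute the results folders that your MARE runs actually created.

```shell
# Scratch directory so the sketch does not touch real results
workdir=$(mktemp -d)

# Stand-ins: hypothetical folder names and an empty placeholder script
mkdir "$workdir/results_A" "$workdir/results_B" "$workdir/results_C"
touch "$workdir/sbatch_Supermatrix_MARE_tree.sh"

# Copy the sbatch script into each results folder
for dir in results_A results_B results_C; do
    cp "$workdir/sbatch_Supermatrix_MARE_tree.sh" "$workdir/$dir/"
done

# List the copies to confirm that every folder received one
ls "$workdir"/results_*/sbatch_Supermatrix_MARE_tree.sh
```

Each copy still has to be edited by hand so that it names the reduced supermatrix in its own folder.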
-
-
Next we will calculate the average bootstrap support for each orthologous locus included in the original dataset
-
Copy the .treefile files of the first 100 loci from this morning's exercise to the folder "Day2Afternoon"
-
Run "sbatch sbatch_TreSpEx_AveBoot.sh" from the folder "Day2Afternoon"
-
-
Let's take a look at the values
cat Average_BS_perPartition.txt
cut -f2 Average_BS_perPartition.txt | sort -n
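For a quick overview of the values, the minimum, maximum and mean can be computed with awk. This is only a sketch: it assumes a two-column file without a header (file name, then average bootstrap support), and the three rows are made-up values, not results from the exercise.

```shell
# Made-up example table: name in column 1, average bootstrap support in column 2
tmpfile=$(mktemp)
printf 'locus1.fas.treefile\t85.0\nlocus2.fas.treefile\t62.0\nlocus3.fas.treefile\t93.0\n' > "$tmpfile"

# Track the minimum, maximum and running sum of column 2, then report the mean
summary=$(LC_ALL=C awk '
    { v = $2 + 0; sum += v
      if (NR == 1 || v < min) min = v
      if (NR == 1 || v > max) max = v }
    END { if (NR > 0) printf "min=%.1f max=%.1f mean=%.1f\n", min, max, sum / NR }
' "$tmpfile")

echo "$summary"
# prints: min=62.0 max=93.0 mean=80.0
```

To use it on the real data, replace "$tmpfile" with Average_BS_perPartition.txt.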
-
Now we extract all files that have an average bootstrap support above 70
awk -F" " '{if($2>70)print$1}' < Average_BS_perPartition.txt | sed "s/fas.treefile/fas/" > Average_BS_above70.txt
mkdir Above70
while read -r LINE; do cp ../Day2Morning/SingleGenes/"$LINE" Above70; done < Average_BS_above70.txt
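The filtering step can be tried on a small made-up table before running it on the real file. The sketch below assumes the same two-column layout (tree file name, average bootstrap support); the locus names and values are hypothetical, and the awk condition is written in pattern form, which should be equivalent to the if-statement used above.

```shell
# Scratch directory with a made-up table
workdir=$(mktemp -d)
printf 'locus1.fas.treefile\t85\nlocus2.fas.treefile\t62\nlocus3.fas.treefile\t93\n' \
    > "$workdir/Average_BS_perPartition.txt"

# Keep names whose average support is strictly above 70,
# then strip the ".treefile" suffix to recover the alignment file name
awk '$2 > 70 { print $1 }' "$workdir/Average_BS_perPartition.txt" \
    | sed 's/fas\.treefile/fas/' > "$workdir/Average_BS_above70.txt"

cat "$workdir/Average_BS_above70.txt"
# prints:
# locus1.fas
# locus3.fas
```

Note that the comparison is strict, so a locus with an average support of exactly 70 is excluded.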
-
Now we need to concatenate these again and run a tree reconstruction of the new supermatrix
cd Above70
- Copy FASconCAT-G to this folder as well as the "sbatch_Concatenation.sh" we used yesterday (see yesterday's exercise for this)
- Modify the .sh file to suit your needs
sbatch sbatch_Concatenation.sh
-
When it is done, run a tree reconstruction again on the supermatrix as explained above in point 5 (the concatenated files should be in the folder "Supermatrix"); I would suggest renaming "sbatch_Supermatrix_MARE_tree.sh" to "sbatch_Supermatrix_AveBoot_tree.sh"
The folder "Results" contains the most important results from this session.