DATA

The folder "data" contains the smaller data files for the afternoon session of day 2. However, we advise you to use the data provided on SAGA, as this makes the data transfer to your folder much faster.

SCRIPTS

The folder "scripts" contains all the scripts to be run on SAGA. However, we advise you to transfer them from SAGA as described in the exercise below.

LECTURE

The folder "Lecture" contains the lecture from this session.

Missingness & Phylogenetic signal
MARE manual

EXERCISE

To get started, copy all data from the folder "Day2Afternoon" to your folder in the project area on SAGA:
```
cd /cluster/projects/nn9458k/phylogenomics/$YOURNAME
cp -r ../week1/Day2Afternoon .
cd Day2Afternoon
```
From the concatenation this morning, we have a concatenated dataset in FASTA format as well as a corresponding partition file. Here, however, we will proceed with the original orthologous files from the study, so that we work with a subset of the files the authors used rather than with something new. These files are already provided in the folder. The FASconCAT output format differs slightly from the input format that MARE needs, so we must first convert the partition file:
```
sed "s/DNA,/charset/" < Matrix_original_supermatrix_partition.txt | sed "s/$/ ;/" > Matrix_original_supermatrix_partition_MARE.txt
```
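
To see what the conversion does, the sketch below applies the same two substitutions to a single made-up partition line. The locus name and the coordinates are illustrative assumptions, and the line is only assumed to follow the `DNA, name = start-end` form that the command above expects.

```shell
# Hypothetical partition line (name and range are made up for illustration).
example_line="DNA, locus1.fas = 1-300"

# Same two substitutions as above: replace the "DNA," prefix with "charset",
# then append " ;" to the end of the line.
converted=$(printf '%s\n' "$example_line" | sed "s/DNA,/charset/" | sed "s/$/ ;/")

printf '%s\n' "$converted"
```

Comparing the first line of your own converted file with the example in the MARE manual is a quick way to confirm that the format is what MARE expects.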

We need the program MARE. Download and compile it:

wget http://software.zfmk.de/MARE_v0.1.2-rc.zip
unzip MARE_v0.1.2-rc.zip
cd MARE_v0.1.2-rc/
make
cd ../

For the matrix reduction, we will run MARE with different settings for -d and -t to test their influence on the exclusion of taxa and loci:
```
sbatch sbatch_MARE.sh
```
Next, we want to see what effect this has on the tree reconstruction:
- Copy the file "sbatch_Supermatrix_MARE_tree.sh" to the three results folders
- Change the copies so that they point to the appropriate FASTA file (check the MARE manual in the Lecture folder or at the link above; what is the reduced supermatrix called?)
- Submit the sbatch files
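
The copy step above can be done in one loop. This is a minimal sketch that runs in a scratch directory with stand-in files; the folder names `results_A`, `results_B` and `results_C` are placeholders, not the names MARE actually produces, so replace them with your three result folders.

```shell
# Work in a scratch directory so nothing in the course folder is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Placeholder names: replace them with the three result folders from the MARE runs.
result_dirs="results_A results_B results_C"

# Stand-ins for the real script and folders, only so the sketch runs on its own.
touch sbatch_Supermatrix_MARE_tree.sh
mkdir $result_dirs

# Copy the tree-reconstruction script into each result folder.
for dir in $result_dirs; do
    cp sbatch_Supermatrix_MARE_tree.sh "$dir"/
done

ls results_*/
```

In the real project folder only the `for` loop is needed, since the script and the result folders already exist there.
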
Next, we will calculate the average bootstrap support for each orthologous locus included in the original dataset:
- Copy the .treefile files of the first 100 loci from this morning's exercise to the folder "Day2Afternoon"
- Run "sbatch sbatch_TreSpEx_AveBoot.sh" from the folder "Day2Afternoon"

Let's take a look at the values:

cat Average_BS_perPartition.txt
cut -f2 Average_BS_perPartition.txt | sort -n
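
It can help to keep the locus names next to the values when sorting. The sketch below uses made-up values and assumes a two-column layout (tree file name, a tab, then the average bootstrap support); check your own file before relying on that assumption.

```shell
# Made-up example values; the real file is written by sbatch_TreSpEx_AveBoot.sh.
# Assumed layout: tree file name, a tab, then the average bootstrap support.
example=$(mktemp)
printf 'locusA.fas.treefile\t85.2\nlocusB.fas.treefile\t9.5\nlocusC.fas.treefile\t70.0\n' > "$example"

# Sort numerically on the second column so that 9.5 comes before 70.0
# (a plain text sort would compare the values character by character).
sorted=$(LC_ALL=C sort -t "$(printf '\t')" -k2,2n "$example")
printf '%s\n' "$sorted"

# Count how many loci lie strictly above the threshold of 70.
n_above=$(awk '$2 > 70' "$example" | wc -l | tr -d ' ')
echo "Loci above 70: $n_above"
```

Replacing `"$example"` with `Average_BS_perPartition.txt` applies the same commands to the real values.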

Now we extract all files that have an average bootstrap support above 70:

awk -F" " '{if($2>70)print$1}' < Average_BS_perPartition.txt | sed "s/fas.treefile/fas/" > Average_BS_above70.txt
mkdir Above70
while read -r LINE; do cp "../Day2Morning/SingleGenes/$LINE" Above70; done < Average_BS_above70.txt
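
Before concatenating, it is worth checking that every listed alignment was actually copied. The following sketch rebuilds the folder layout in a scratch directory with made-up file names, one of which is deliberately missing, and reports what was copied and what was not found.

```shell
# Scratch copy of the folder layout with made-up file names, so the sketch runs on its own.
workdir=$(mktemp -d)
mkdir -p "$workdir/Day2Morning/SingleGenes" "$workdir/Day2Afternoon/Above70"
cd "$workdir/Day2Afternoon" || exit 1
touch ../Day2Morning/SingleGenes/locusA.fas ../Day2Morning/SingleGenes/locusB.fas
printf 'locusA.fas\nlocusB.fas\nlocusC.fas\n' > Average_BS_above70.txt

# Copy each listed alignment and record any that cannot be found.
missing=""
while read -r LINE; do
    if [ -f "../Day2Morning/SingleGenes/$LINE" ]; then
        cp "../Day2Morning/SingleGenes/$LINE" Above70/
    else
        missing="$missing $LINE"
    fi
done < Average_BS_above70.txt

copied=$(ls Above70 | wc -l | tr -d ' ')
echo "Copied: $copied"
echo "Missing:$missing"
```

In the real folder, only the `while` loop and the two `echo` lines are needed; an empty "Missing:" line means all alignments were found.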

Now we need to concatenate these again and run a tree reconstruction of the new supermatrix:
```
cd Above70
```
- Copy FASconCAT-G to this folder as well as the "sbatch_Concatenation.sh" we used yesterday (see yesterday's exercise for this)
- Modify the .sh file to suit your needs
```
sbatch sbatch_Concatenation.sh
```
When it is done, run a tree reconstruction on the new supermatrix as explained above in point 5 (the concatenated files should be in the folder "Supermatrix"). I suggest renaming "sbatch_Supermatrix_MARE_tree.sh" to "sbatch_Supermatrix_AveBoot_tree.sh".

RESULTS

The folder "Results" contains the most important results from this session.
