DATA

The folder "data" contains the smaller data files for the morning session of day 4. However, we advise you to use the data provided on SAGA, as this makes the data transfer to your folder much faster.

SCRIPTS

The folder "scripts" contains all the scripts to be run on SAGA. However, we advise you to transfer them from SAGA as described in the exercise below.

LECTURE

The folder "Lecture" contains the lecture from this session.

Evolutionary rate & Saturation

EXERCISE

To get started, copy all data from the folder "Day4Morning" to your folder in the project area on SAGA:
```
cd /cluster/projects/nn9458k/phylogenomics/$YOURNAME
cp -r ../week1/Day4Morning .
cd Day4Morning
```
For the calculation of the saturation indices based on slope, we need the tree files and PHYLIP alignment files for each orthologous locus included in the original dataset.
- Copy the .treefile files and relaxed PHYLIP files of the first 100 loci from the exercise of the morning of Day 2 to this folder
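The copy step can be sketched as follows. This is only an illustration under stated assumptions: the file names (`FcC_locus*.phy` with a matching `.phy.treefile`) are guessed from the paths used later in this exercise, "first" is taken to mean alphabetical order, and the sketch works on a small stand-in directory with `N=3` so it can be tried without touching real data.

```shell
# Stand-in for the Day 2 results; on SAGA, point SRC at your real folder instead
# (for example ../Day2Morning/SingleGenes, which is an assumption) and DEST at ".".
SRC=$(mktemp -d)
DEST=$(mktemp -d)
for i in 1 2 3 4 5; do
    touch "$SRC/FcC_locus$i.phy" "$SRC/FcC_locus$i.phy.treefile"
done

N=3   # use N=100 for the exercise

# Walk through the alignments in alphabetical order and copy the first N,
# each together with its tree file.
COUNT=0
for ALN in "$SRC"/*.phy; do
    [ "$COUNT" -ge "$N" ] && break
    cp "$ALN" "$ALN.treefile" "$DEST"/
    COUNT=$((COUNT + 1))
done

ls "$DEST"   # lists the copied alignments and tree files
```

If your tree files are named differently, adjust the `cp` line accordingly before running it on the real data.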
```
sbatch sbatch_TreSpEx_Saturation.sh
```
For the c indices, we only need the alignment file of the supermatrix together with its partitions. Both are already in the folder.
```
sbatch sbatch_BaCoCa.sh
```
Download the following files to your own computer using scp
- Correlation_Results/Correlation_Slope_Summary.txt
- BaCoCa_Results/summarized_frequencies.txt

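A possible way to build the two scp commands is sketched below. The user name, the folder name and the login host `saga.sigma2.no` are assumptions that you need to replace with your own details; the sketch only prints the commands so you can check them before running anything.

```shell
# Assumptions: USERNAME is your SAGA user name, YOURNAME is your folder in the
# project area, and saga.sigma2.no is the login host you normally connect to.
USERNAME="your_user"
YOURNAME="your_folder"
REMOTE="$USERNAME@saga.sigma2.no:/cluster/projects/nn9458k/phylogenomics/$YOURNAME/Day4Morning"

# Print the commands first; remove "echo" to actually run them on your own computer.
for F in Correlation_Results/Correlation_Slope_Summary.txt BaCoCa_Results/summarized_frequencies.txt; do
    echo scp "$REMOTE/$F" .
done
```

Run this in a terminal on your own computer, not on SAGA, so that the files end up in your local working directory.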
On your own computer

* Open "summarized_frequencies.txt" in a text editor, delete the first line and save the file
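If you prefer the command line to a text editor, the first line can also be dropped with `tail`. The sketch below uses a toy file with hypothetical names and content so that nothing real is overwritten; on your own data you would replace the file names.

```shell
# Toy stand-in for summarized_frequencies.txt (the real content will differ).
printf 'extra title line\nlocus\tc.value\nlocus1\t12.5\n' > toy_frequencies.txt

# Write everything from line 2 onwards to a new file, keeping the original untouched.
tail -n +2 toy_frequencies.txt > toy_frequencies_noheader.txt

# The column header is now the first line.
head -n 1 toy_frequencies_noheader.txt
```

Writing to a new file rather than editing in place keeps the original download available in case something goes wrong.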

Import these two txt files into RStudio using "Import Dataset/From Text (base)"; set Heading to "Yes" and Row names to "Use first column".
We now create density plots in R to explore the distribution of the data.
- Create a new R script and type the following into it:
```
x <- density(Correlation_Slope_Summary$Slope)
plot(x)
y <- density(Correlation_Slope_Summary$R2)
plot(y)
z <- density(log10(summarized_frequencies$c.value))
plot(z)
```
- Execute the R script
- Explore the plots. What could be a reasonable threshold?
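Alongside the plots, a quick numeric summary of a column can help when deciding on a threshold. The following is a minimal sketch on a toy table; the file name, its values and the column number are hypothetical, and `sort -n` assumes plain decimal numbers without scientific notation.

```shell
# Toy table: locus name and one numeric column (real files have more columns).
printf 'locus\tvalue\nlocus1\t0.2\nlocus2\t0.9\nlocus3\t0.5\nlocus4\t0.7\nlocus5\t0.4\n' > toy_values.txt

COL=2   # column to summarise (hypothetical; check the header of your own file)

# Skip the header, sort numerically, then report minimum, median and maximum.
SUMMARY=$(tail -n +2 toy_values.txt | cut -f "$COL" | sort -n | awk '
    { v[NR] = $1 }
    END {
        if (NR % 2) med = v[(NR + 1) / 2]
        else        med = (v[NR / 2] + v[NR / 2 + 1]) / 2
        print "min=" v[1], "median=" med, "max=" v[NR]
    }')
echo "$SUMMARY"
```

The summary does not replace the density plots, but it gives concrete numbers to compare a candidate threshold against.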

Back on SAGA

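We now extract all loci whose values fall on the side of your chosen threshold that you want to keep, so that they can be included in the new dataset. Do the following steps for all three values (c value, R2 and slope). One example is given for the c value; it keeps the loci with a value below 100 in column 26: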

```
awk -F"\t" '{if($26<100)print$1}' < BaCoCa_Results/summarized_frequencies.txt | sed "s/locus/FcC_locus/" | sed "s/$/.phy/" > summarized_frequencies_below100.txt
mkdir Cvalue_below100
while read LINE; do cp ../Day2Morning/SingleGenes/$LINE Cvalue_below100; done < summarized_frequencies_below100.txt
```
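The same pattern can be adapted for R2 and slope. The sketch below is one possible variant, not the course's own script: the column number, the cut-off, the direction of the comparison and all file names are hypothetical and run on toy data. It also reports files that are listed but cannot be found, which is easy to miss with a plain `cp` loop.

```shell
# Toy stand-ins (assumptions): a two-column table and a folder of alignments.
WORK=$(mktemp -d)
mkdir "$WORK/SingleGenes" "$WORK/kept"
printf 'locus\tR2\nlocus1\t0.95\nlocus2\t0.40\nlocus3\t0.88\n' > "$WORK/table.txt"
touch "$WORK/SingleGenes/FcC_locus1.phy"    # locus3 is deliberately missing

COL=2            # hypothetical column number
THRESHOLD=0.8    # hypothetical cut-off

# Keep loci whose value in column COL is above the threshold (skip the header row).
awk -F"\t" -v col="$COL" -v t="$THRESHOLD" 'NR > 1 && $col > t { print $1 }' "$WORK/table.txt" \
    | sed "s/locus/FcC_locus/" | sed 's/$/.phy/' > "$WORK/keep_list.txt"

# Copy the listed files and report any that cannot be found.
while read -r LINE; do
    if [ -f "$WORK/SingleGenes/$LINE" ]; then
        cp "$WORK/SingleGenes/$LINE" "$WORK/kept"/
    else
        echo "missing: $LINE"
    fi
done < "$WORK/keep_list.txt"
```

Change `>` to `<` in the awk condition when you want to keep the loci below the threshold instead, and check the header of each results file to find the right column.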

Now we need to concatenate these files again and run a tree reconstruction on the new supermatrix:
```
cd Cvalue_below100
```
- Copy FASconCAT-G to this folder, as well as the sbatch_Concatenation.sh we used yesterday (see yesterday's exercise for this)
- Modify the .sh file to suit your needs
```
sbatch sbatch_Concatenation.sh
```
When the job is done, run a tree reconstruction on the new supermatrix as done before.

RESULTS

The folder "Results" contains the most important results from this session.
