week1_day2_morning


The folder "data" contains the smaller data files for the morning session of day 2. However, we advise you to use the data provided on SAGA, as this makes the transfer to your folder much faster.

The folder "scripts" contains all the scripts to be run on SAGA. However, we advise you to copy them from SAGA as described in the exercise below.

The folder "Lecture" contains the lecture from this session.

EXERCISE

  1. For the TreSpEx analyses, we first need to install some Perl modules in your home directory, as they are not among the modules available on SAGA.

    module load Perl/5.32.0-GCCcore-10.2.0
    export PERL_CPANM_HOME=/tmp/cpanm_$USER # $USER is a system variable; use it exactly as written.
    cpanm Statistics::LineFit
    cpanm --force Statistics::Test::WilcoxonRankSum
    export PERL5LIB=/cluster/home/$USER/perl5/lib/perl5:/cluster/home/$USER/perl5/:$PERL5LIB # $USER is again a system variable; use it exactly as written.
    nano ~/.bashrc
    # paste the "export PERL5LIB=..." line from above at the end of the file
    # save the file under the same name and close the editor
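
If step 1 is repeated, the PERL5LIB line can end up in .bashrc more than once. The following is a minimal sketch of one way to append the line only if it is not already there; the helper name `append_once` is hypothetical and not part of the course scripts.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line is already present.
# (append_once is a hypothetical helper, not part of the course material.)
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string, not a regex
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (single quotes keep $USER and $PERL5LIB unexpanded inside the file):
# append_once ~/.bashrc 'export PERL5LIB=/cluster/home/$USER/perl5/lib/perl5:/cluster/home/$USER/perl5/:$PERL5LIB'
```

Editing the file by hand in nano, as described above, gives the same result; the helper only saves a check for duplicates.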
    
  2. To get started, copy all data from the folder "Day2Morning" to your folder in the project area on SAGA (replace $YOURNAME with the name of your folder).

    cd /cluster/projects/nn9458k/phylogenomics/$YOURNAME
    cp -r ../week1/Day2Morning .
    cd Day2Morning
    
  3. First, we need a tree with bootstrap values for each individual locus.

    cd SingleGenes
    sh SingleGene_Analyses.sh
    
  4. For the actual paralogy screening, we need the program TreSpEx and the blast folder that comes with it. We also need the alignments in relaxed PHYLIP format as input.

    perl /cluster/projects/nn9458k/phylogenomics/week1/Programs/FASconCAT-G/FASconCAT-G_v1.05.pl -o -a -p -p -s
    cd ../
    cp -r /cluster/projects/nn9458k/phylogenomics/week1/Programs/TreSpEx/TreSpEx.v1.2_SAGA.pl /cluster/projects/nn9458k/phylogenomics/week1/Programs/TreSpEx/blast .
    
  5. For the paralogy screening and cleaning you will run TreSpEx on the trees with bootstrap values.

    sbatch sbatch_TreSpEx_Paralogy.sh
    
  6. The alignment files have been sorted into the folders "FilesPruned" and "FilesNotPruned" within the "Results" folder. They have to be concatenated into one file now.

    mkdir SingleGenesPara
    cp Results/FilesNotPruned/FcC_* Results/FilesPruned/FcC_* SingleGenesPara/
    cd SingleGenesPara
    cp ../../../week1/Programs/FASconCAT-G/FASconCAT-G_v1.05.pl .
    cp ../../Day1Afternoon/SingleGenes/sbatch_Concatenation.sh .
    
    • Modify the ".sh" file to fit your needs for this analysis
    sbatch sbatch_Concatenation.sh
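
A quick sanity check, best run right after the cp commands in step 6 and before the job is submitted, is to compare the number of alignments in SingleGenesPara with the number in the two result folders. This is a minimal sketch, assuming the FcC_ prefix used in the cp command above; `count_files` is a hypothetical helper.

```shell
# count_files DIR PREFIX: print how many regular files directly inside DIR start with PREFIX.
# (count_files is a hypothetical helper, not part of the course material.)
count_files() {
    find "$1" -maxdepth 1 -type f -name "$2*" | wc -l | tr -d ' '
}

# Example, run from inside SingleGenesPara before concatenation:
# expected=$(( $(count_files ../Results/FilesNotPruned FcC_) + $(count_files ../Results/FilesPruned FcC_) ))
# [ "$(count_files . FcC_)" -eq "$expected" ] && echo "all alignments copied"
```

If the two numbers differ, a file was probably missed or overwritten during copying.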
    
  7. Next, you will run a tree reconstruction to assess the effect of this paralogy pruning approach.

    cd Supermatrix
    cp ../../sbatch_Supermatrix_Cleaned_tree.sh .
    
    • Check that the supermatrix file name used in the script matches the name of your supermatrix. If it does not, change it accordingly.
    sbatch sbatch_Supermatrix_Cleaned_tree.sh
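
The file name check in step 7 can also be done from the command line before submitting. This is a minimal sketch; `script_mentions` is a hypothetical helper, and YOUR_SUPERMATRIX_FILE is a placeholder for the actual name of your supermatrix file.

```shell
# script_mentions SCRIPT NAME: succeed if SCRIPT contains the literal string NAME.
# (script_mentions is a hypothetical helper, not part of the course material.)
script_mentions() {
    # -q: quiet, -F: treat NAME as a fixed string, not a regex
    grep -qF -- "$2" "$1"
}

# Example (replace YOUR_SUPERMATRIX_FILE with the real file name):
# script_mentions sbatch_Supermatrix_Cleaned_tree.sh YOUR_SUPERMATRIX_FILE || echo "file name not found in script; edit it first"
```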
    

The folder "Results" contains the most important results from this session.