Please direct all questions or comments to Stephanie via spielman <AT> rowan <DOT> edu.
- All directories and SLURM scripts are named as one of the following, representing the data they process:
  - simulation: MutSel simulations
  - simulation_control: WAG+I+G simulations
  - pandit: PANDIT analysis
- scripts/ contains all code for generating data and results, including SLURM (*sbatch) submission scripts for HPC.
- simulations/ contains all simulated alignments, DMS preferences, and phylogenies used for simulation. Simulations were performed with scripts/simulate.py and scripts/simulate_control.py and the associated scripts/submit_simulations.sbatch (for MutSel only, as those are slower than the WAG simulations, which run quickly locally).
- selected_models_* contains all results from model selection using ModelFinder with option -m TESTONLY. Contents created with scripts/submit_model_selection_*.sbatch.
- processed_model_selection/ contains CSVs of results from model selection, including scores for all models as well as the specific "quantile" models. Contents created with scripts/process_model_selection.sh (calls scripts/selected_models_to_csv.py and scripts/parse_selected_models.R).
- fitted_trees_ufb_* contains all phylogenies inferred with IQTREE and various associated logfiles. Contents created with scripts/infer_trees.py and the associated scripts/submit_tree_inference_*.sbatch.
- topology_tests_* contains results from AU tests as produced by IQTREE. Contents created with scripts/run_topology_tests.py.
- results_analysis/ contains a post-processing R script for stats+viz, relevant CSV files to post-process, and a directory of figures used in the MS produced by said script.
  - csv_files/ contains all processed CSVs created with one of the scripts in scripts/
    - all_models_pearson.csv contains Pearson correlations among rate matrices (exchangeabilities only!), created by scripts/compare_all_models.py.
    - rf_fit_*.csv contains Robinson-Foulds distances from the true tree, and BIC, for all simulations. Contents created with scripts/calculate_rf_ufb.py.
    - rf_pandit.csv contains all-to-all RF distances for PANDIT data. Contents created with scripts/calculate_rf_ufb.py.
    - topology_tests_*.csv contains summarized results for topology tests. Contents created with scripts/parse_topology_tests.py.
  - load.R loads libraries and data, and formats the data for analysis in:
    - build_main_figures.R creates all manuscript figures, saved in main_figures/
    - build_si_figures_tables.R creates all SI manuscript figures and exports CSV tables, saved in si_figures_tables/
    - linear_model_and_misc_wrangling.Rmd performs all linear models presented in the manuscript, and some miscellaneous data wrangling.
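
The entry for all_models_pearson.csv says the Pearson correlations are computed on exchangeabilities only. As a rough illustration of what such a comparison involves, here is a minimal sketch that correlates the off-diagonal upper-triangle entries of two symmetric matrices. It is not the code in scripts/compare_all_models.py: the function names are hypothetical, the toy matrices are made-up values, and it assumes each model's exchangeabilities are already available as a symmetric square array (for 20 amino acids that would give 190 values per model).

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def exchangeability_correlation(matrix_a, matrix_b):
    """Correlate two symmetric matrices using only their exchangeabilities."""
    return pearson(upper_triangle(matrix_a), upper_triangle(matrix_b))


# Toy 4x4 symmetric matrices (hypothetical values, not from any real model).
toy_a = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 4.0, 5.0],
    [2.0, 4.0, 0.0, 6.0],
    [3.0, 5.0, 6.0, 0.0],
]
# toy_b rescales toy_a, so the two share the same relative exchangeabilities.
toy_b = [[2.0 * value for value in row] for row in toy_a]

if __name__ == "__main__":
    print(exchangeability_correlation(toy_a, toy_b))
```

Restricting the comparison to the upper triangle leaves out the diagonal and avoids counting each symmetric pair twice, which is one plausible reading of "exchangeabilities only".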