00_gff_parsing
01_oma
02_hmm
03_protein_coding_alignment
04_add_cormorant
HOG_final_alignment_seqids
README.md
alignment_summary_stats
all_hog_info.tsv
check_homology_matrix.R
good_PAML_HOG_protein_transcript_info
hog_qc.R
make_protein_info_table.R
new_hog_list.txt
oma_species_list.csv
sp_list.txt
updated_hog_matrix.txt

README.md

Homology Inference

Homology inference and protein alignment in 4 steps:

1. Extract the longest protein translation for each gene from the GFF files (00_gff_parsing)
2. Run OMA to infer homologous groups among those proteins (01_oma)
3. Extract proteins not assigned to a group, and use HMMER to screen for possible merges, producing new homologous groups (02_hmm)
4. Run alignment and filtering on those groups to produce the final analysis set (03_protein_coding_alignment)
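
As an illustration of step 1, here is a minimal Python sketch of picking the longest translation per gene. It is not the code in 00_gff_parsing. It assumes the GFF and protein files have already been parsed into `(gene_id, protein_id, sequence)` tuples, and the function name, record layout, and tie-breaking rule are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (gene_id, protein_id, sequence); hypothetical layout


def longest_protein_per_gene(records: Iterable[Record]) -> Dict[str, Tuple[str, str]]:
    """Map each gene ID to the (protein_id, sequence) of its longest translation."""
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, protein_id, seq in records:
        # Assumption: a trailing '*' marks a stop codon and should not count toward length.
        seq = seq.rstrip("*")
        current = best.get(gene_id)
        # Strict '>' keeps the first-seen protein on ties, so the result is deterministic.
        if current is None or len(seq) > len(current[1]):
            best[gene_id] = (protein_id, seq)
    return best


# Toy input: two isoforms of geneA, one of geneB.
records = [
    ("geneA", "protA.1", "MKT"),
    ("geneA", "protA.2", "MKTLLV"),
    ("geneB", "protB.1", "MSS*"),
]
longest = longest_protein_per_gene(records)
```

Keeping one protein per gene means each gene contributes a single sequence to the homology inference in step 2.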

Files in the main directory are largely useful output files and code.

- alignment_summary_stats: summary statistics from initial alignments, used for error checking homology assignments and filtering
- all_hog_info.tsv: primary data table of homologous groups for filtering and processing; columns with species codes (from oma_species_list.csv) represent the number of proteins assigned to that HOG for that species (NA = 0)
- good_PAML_HOG_protein_transcript_info: transcript info for HOGs that passed filtering for PAML and related analysis
- HOG_final_alignment_seqids: seqids for further analysis
- new_hog_list.txt: HOG assignments in long format
- oma_species_list.csv: list of species used for OMA analysis, with data sources
- sp_list.txt: list of all species abbreviations that should be present in output files
- updated_hog_matrix.txt: HOG assignments as a matrix (wide format)
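
The relationship between the long-format and wide-format assignments, and the NA = 0 convention in the per-species count columns, can be sketched in Python as follows. This is an illustrative example only. The real column layouts of new_hog_list.txt, updated_hog_matrix.txt, and all_hog_info.tsv are not documented here, so the `(hog, species, protein)` tuple layout, the species codes `spA` and `spB`, and both function names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Assignment = Tuple[str, str, str]  # (hog_id, species_code, protein_id); hypothetical layout


def hog_count_matrix(assignments: Sequence[Assignment],
                     species: List[str]) -> Dict[str, Dict[str, int]]:
    """Pivot long-format HOG assignments into per-HOG, per-species protein counts."""
    counts = Counter((hog, sp) for hog, sp, _protein in assignments)
    hogs = sorted({hog for hog, _sp, _protein in assignments})
    # Every species gets a column; species with no proteins in a HOG get 0.
    return {hog: {sp: counts.get((hog, sp), 0) for sp in species} for hog in hogs}


def parse_count(cell: str) -> int:
    """Read a count cell from a table where a missing value (NA) means zero proteins."""
    return 0 if cell.strip() == "NA" else int(cell)


# Toy long-format input with made-up identifiers.
assignments = [
    ("HOG1", "spA", "p1"),
    ("HOG1", "spA", "p2"),
    ("HOG1", "spB", "p3"),
    ("HOG2", "spB", "p4"),
]
matrix = hog_count_matrix(assignments, ["spA", "spB"])
```

Treating NA as 0 when reading the table makes it possible to filter on per-species counts, for example keeping only HOGs with exactly one protein in every species, without special-casing missing values.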

Code

- hog_qc.R
- check_homology_matrix.R
- make_protein_info_table.R

Note

updated_hog_matrix.txt contains the homologous group assignments after the HMM step. The original matrix is in the OMA directory (01_oma).
