Identifying genetic variants in the pangenome using a reference tree

Introduction

This repository is very much a work in progress. If you use our code, please do reach out with questions and feedback.

Usage

from graph import PangenomeGraph

# Read a .gfa file
gfa_path = "/path/to/graph.gfa"
reference_path_index = 1
G, walks, walk_sample_names = PangenomeGraph.from_gfa(gfa_path, 
                                                return_walks=True, compressed=False, 
                                                reference_path_index=reference_path_index)

# Generate vcf file
vcf_path = "/path/to/output.vcf"
tree_path = "/path/to/output.tree"
chr_id = "chr1"
G.write_vcf(walks, walk_sample_names, vcf_path, tree_path, chr_id, exclude_terminus=True)

# Enumerate variants of different types
edge_type_count: dict = G.variant_edges_summary()

# Get the genotype of a walk, then reconstruct edge visit counts
genotype: dict = G.genotype(walks[0])
edge_visit_counts: dict = G.count_edge_visits(genotype)

Command Line Interface

graph_var <gfa_file> [options]

Required Arguments

gfa_file: Path to the input GFA file containing the pangenome graph

Optional Arguments

--nodeinfo: Path to input node info file (CSV format)
--edgeinfo: Path to input edge info file (CSV format)
--out-nodeinfo: Output path for node info file (CSV format)
--out-edgeinfo: Output path for edge info file (CSV format)
--vcf: Output path for VCF file
--chr-id: Chromosome ID for VCF file (default: "chr0")
--ref-path-index: Index of the reference path (default: None, will try to find GRCh38)
--summary [file]: Print summary of variant types. If a file path is provided, write the summary to that file instead of stdout

Example Usage

# Basic usage - analyze a GFA file and output VCF
# Will automatically try to find GRCh38 reference path
graph_var input.gfa --vcf output.vcf

# Generate node and edge info files (CSV format)
graph_var input.gfa --out-nodeinfo nodes.csv --out-edgeinfo edges.csv

# Analyze with specific reference path and chromosome ID
graph_var input.gfa --vcf output.vcf --ref-path-index 2 --chr-id chr20

# Print variant summary to console
graph_var input.gfa --summary

# Write variant summary to a file
graph_var input.gfa --summary summary.txt

Name		Name	Last commit message	Last commit date
Latest commit History 130 Commits
.idea		.idea
data		data
graph_var		graph_var
notebooks		notebooks
output		output
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
analyze_gfa.py		analyze_gfa.py
c4a_test_script.py		c4a_test_script.py
reference_tree_variation_methods_rough.docx		reference_tree_variation_methods_rough.docx
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Identifying genetic variants in the pangenome using a reference tree

Table of Contents

Introduction

Usage

Command Line Interface

Required Arguments

Optional Arguments

Example Usage

About

Releases

Packages

Contributors 3

Languages

lukejoconnor/graph_var

Folders and files

Latest commit

History

Repository files navigation

Identifying genetic variants in the pangenome using a reference tree

Table of Contents

Introduction

Usage

Command Line Interface

Required Arguments

Optional Arguments

Example Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages