More documentation
snayfach committed Oct 23, 2015
1 parent 9c35ef1 commit f9addc9
Showing 5 changed files with 48 additions and 2 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -23,6 +23,9 @@ PhyloCNV is an integrated pipeline that estimates bacterial species abundance an
5. Scripts to merge results across samples:
* [Merge copy-number variants] (https://github.com/snayfach/PhyloCNV/blob/master/docs/merge_cnvs.md)
* [Merge single-nucleotide variants] (https://github.com/snayfach/PhyloCNV/blob/master/docs/merge_snvs.md)
6. Functionally annotate results:
* [Annotate copy-number variants] (https://github.com/snayfach/PhyloCNV/blob/master/docs/annotate_cnvs.md)
* [Annotate single-nucleotide variants] (https://github.com/snayfach/PhyloCNV/blob/master/docs/annotate_snvs.md)

## Citation
If you use this tool, please cite:
39 changes: 39 additions & 0 deletions docs/annotate_cnvs.md
@@ -0,0 +1,39 @@
## Annotate CNVs
Functionally annotate gene copy number matrix. Aggregate presence/absence values or copy number values by function_id.

## Usage
```
usage: annotate_genes.py [options]
optional arguments:
-h, --help show this help message and exit
-i CNV_MATRIX Gene CNV matrix. Expected file name:
{species_id}.presabs or {species_id}.copynum
-o FUNCTION_MATRIX Function CNV matrix
-f {kegg,figfams,go,ec}
kegg=KEGG pathways, figfams=FIGfams, go=Gene Ontology,
ec=Enzyme Commission
-v, --verbose
```

## Example
Run using defaults:
`annotate_genes.py -i 57955.presabs -o 57955.kegg -f kegg`

## Output
A matrix where row names are function_ids and column names are either sample_ids or genome_ids.

Example of a function matrix for one species:

| function_id | sample_1 | sample_2 | ... | sample_n | genome_1 | ... | genome_n |
| :----------:|:-------: | :-------:| :--: | :-------:| :-------:| :--: | :-------:|
| 00720 | 1.0 | 1.0 | ... | 1.0 | 1.0 | ... | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 00920 | 1.0 | 2.0 | ... | 0.0 | 2.0 | ... | 3.0 |
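
The aggregation step described above can be sketched roughly as follows. This is an illustration only, not the code in `annotate_genes.py`: the function name `aggregate_by_function`, the nested-dictionary layout, the gene ids, and the choice to sum values across genes are all assumptions made for the example (for presence/absence data, taking the maximum might be more appropriate).

```python
from collections import defaultdict


def aggregate_by_function(gene_matrix, gene_to_functions):
    """Sum per-gene values into per-function values for each sample.

    gene_matrix:        {gene_id: {sample_id: value}}
    gene_to_functions:  {gene_id: [function_id, ...]}
    Summation is an assumption; the real script may aggregate differently.
    """
    function_matrix = defaultdict(lambda: defaultdict(float))
    for gene_id, sample_values in gene_matrix.items():
        # Genes without an annotation contribute to no function.
        for function_id in gene_to_functions.get(gene_id, []):
            for sample_id, value in sample_values.items():
                function_matrix[function_id][sample_id] += value
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {f: dict(v) for f, v in function_matrix.items()}


# Toy input with hypothetical gene ids; gene_c has no annotation.
gene_matrix = {
    "gene_a": {"sample_1": 1.0, "sample_2": 1.0},
    "gene_b": {"sample_1": 0.0, "sample_2": 1.0},
    "gene_c": {"sample_1": 1.0, "sample_2": 0.0},
}
gene_to_functions = {"gene_a": ["00920"], "gene_b": ["00920", "00720"]}

function_matrix = aggregate_by_function(gene_matrix, gene_to_functions)
```

A gene mapped to several functions contributes its value to each of them, which is why `gene_b` is counted under both `00920` and `00720` here.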

Descriptions of function ids can be found in the following files:

* PhyloCNV/ref_db/ontologies/kegg.txt
* PhyloCNV/ref_db/ontologies/figfams.txt
* PhyloCNV/ref_db/ontologies/go.txt
* PhyloCNV/ref_db/ontologies/ec.txt
4 changes: 4 additions & 0 deletions docs/merge_cnvs.md
@@ -47,3 +47,7 @@ This module generates the following outputs:

## Example


## Next steps
[Functionally annotate CNVs] (https://github.com/snayfach/PhyloCNV/blob/master/docs/annotate_cnvs.md)

1 change: 0 additions & 1 deletion scripts/annotate_genes.py
@@ -39,7 +39,6 @@ def read_gene_map(args):
         if gene_id not in gene_to_functions:
             gene_to_functions[gene_id] = []
         gene_to_functions[gene_id].append(function_id)
-        if index == 1000: break
     return gene_to_functions
 
 def compute_abundances(args, gene_to_function):
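
For context, the hunk above drops what looks like a debugging limit (`if index == 1000: break`), so the gene map is presumably now built from the whole input rather than a truncated prefix. A minimal sketch of that kind of reader is shown below. The name `read_gene_map_sketch`, the two-column tab-separated layout, and the example ids are assumptions for illustration; the actual input format used by `annotate_genes.py` is not shown in this commit.

```python
import io
from collections import defaultdict


def read_gene_map_sketch(handle):
    """Build {gene_id: [function_id, ...]} from tab-separated lines.

    Assumes one "gene_id<TAB>function_id" pair per line (hypothetical format).
    Every line is read; there is no early break after a fixed number of rows.
    """
    gene_to_functions = defaultdict(list)
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        gene_id, function_id = line.split("\t")[:2]
        gene_to_functions[gene_id].append(function_id)
    return dict(gene_to_functions)


# In-memory stand-in for a mapping file.
example = io.StringIO("gene_a\t00920\ngene_b\t00920\ngene_b\t00720\n")
gene_map = read_gene_map_sketch(example)
```

Using `defaultdict(list)` replaces the explicit `if gene_id not in gene_to_functions` check in the original code while producing the same mapping.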
3 changes: 2 additions & 1 deletion scripts/download_ref_db.py
@@ -51,11 +51,12 @@ def decompress(tar, file, remove=True):
 refdb_dir = '%s/ref_db' % main_dir
 if not os.path.isdir(refdb_dir): os.mkdir(refdb_dir)
 os.chdir(refdb_dir)
-files = ["README.txt", "annotations.txt", "membership.txt", "marker_genes.tar.gz", "genome_clusters.tar.gz", ]
+files = ["README.txt", "annotations.txt", "membership.txt", "marker_genes.tar.gz", "genome_clusters.tar.gz", "ontologies.tar.gz"]
 for file in files:
     download('%s/%s' % (url_base, file), file, progress=True)
 decompress("marker_genes.tar.gz", "marker_genes")
 decompress("genome_clusters.tar.gz", "genome_clusters")
+decompress("ontologies.tar.gz", "ontologies")
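
The body of `decompress` is not part of this diff. As a rough idea of what such a helper might do, here is a sketch built on the standard `tarfile` module. The name `decompress_sketch`, its `out_dir` parameter, and the placeholder archive contents are assumptions; the real function takes `(tar, file, remove=True)` and may behave differently.

```python
import os
import tarfile
import tempfile


def decompress_sketch(tar_path, out_dir, remove=True):
    """Extract a .tar.gz archive into out_dir, optionally deleting the archive."""
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(tar_path, "r:gz") as archive:
        archive.extractall(out_dir)
    if remove:
        os.remove(tar_path)


# Self-contained demo: build a small archive, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "kegg.txt")
    with open(src, "w") as handle:
        handle.write("placeholder\n")  # stand-in content, not real ontology data

    tar_path = os.path.join(tmp, "ontologies.tar.gz")
    with tarfile.open(tar_path, "w:gz") as archive:
        archive.add(src, arcname="ontologies/kegg.txt")

    out_dir = os.path.join(tmp, "out")
    decompress_sketch(tar_path, out_dir)

    extracted = os.path.exists(os.path.join(out_dir, "ontologies", "kegg.txt"))
    archive_removed = not os.path.exists(tar_path)
```

Deleting the archive after extraction mirrors the `remove=True` default in the original signature and keeps the `ref_db` directory free of leftover tarballs.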


