More documentation

khemlalnirmalkar · Oct 23, 2015 · f9addc9
1 parent 9c35ef1
commit f9addc9
Showing 5 changed files with 48 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -23,6 +23,9 @@ PhyloCNV is an integrated pipeline that estimates bacterial species abundance an
 5. Scripts to merge results across samples:
  * [Merge copy-number variants] (https://github.com/snayfach/PhyloCNV/blob/master/docs/merge_cnvs.md)
  * [Merge single-nucleotide variants] (https://github.com/snayfach/PhyloCNV/blob/master/docs/merge_snvs.md)
+6. Functionally annotate results:
+ * [Annotate copy-number variants] (https://github.com/snayfach/PhyloCNV/blob/master/docs/annotate_cnvs.md)
+ * [Annotate single-nucleotide variants] (https://github.com/snayfach/PhyloCNV/blob/master/docs/annotate_snvs.md)
 
 ## Citation
 If you use this tool, please cite:

diff --git a/docs/annotate_cnvs.md b/docs/annotate_cnvs.md
@@ -0,0 +1,39 @@
+## Annotate CNVs
+Functionally annotate a gene copy-number matrix by aggregating presence/absence or copy-number values by function_id.
+
+## Usage
+```
+usage: annotate_genes.py [options]
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -i CNV_MATRIX         Gene CNV matrix. Expected file name:
+                        {species_id}.presabs or {species_id}.copynum
+  -o FUNCTION_MATRIX    Function CNV matrix
+  -f {kegg,figfams,go,ec}
+                        kegg=KEGG pathways, figfams=FIGfams, go=Gene Ontology,
+                        ec=Enzyme Commission
+  -v, --verbose
+```
+
+## Example
+Annotate a presence/absence matrix with KEGG pathways:  
+`annotate_genes.py -i 57955.presabs -o 57955.kegg -f kegg`
+
+## Output
+A matrix where row names are function_ids and column names are either sample_ids or genome_ids.
+
+Example of a function matrix for one species:
+
+| function_id | sample_1 | sample_2 | ...  | sample_n | genome_1 | ...  | genome_n |
+| :----------:|:-------: | :-------:| :--: | :-------:| :-------:| :--: | :-------:|
+| 00720       | 1.0      | 1.0      | ...  | 1.0      | 1.0      | ...  | 1.0      |
+| ...         | ...      | ...      | ...  | ...      | ...      | ...  | ...      |
+| 00920       | 1.0      | 2.0      | ...  | 0.0      | 2.0      | ...  | 3.0      |
+
+Descriptions of function ids can be found in the following files:
+
+* PhyloCNV/ref_db/ontologies/kegg.txt
+* PhyloCNV/ref_db/ontologies/figfams.txt
+* PhyloCNV/ref_db/ontologies/go.txt
+* PhyloCNV/ref_db/ontologies/ec.txt
diff --git a/docs/merge_cnvs.md b/docs/merge_cnvs.md
@@ -47,3 +47,7 @@ This module generates the following outputs:
 
 ## Example
 
+
+## Next steps
+[Functionally annotate CNVs] (https://github.com/snayfach/PhyloCNV/blob/master/docs/annotate_cnvs.md)
+
diff --git a/scripts/annotate_genes.py b/scripts/annotate_genes.py
@@ -39,7 +39,6 @@ def read_gene_map(args):
 			if gene_id not in gene_to_functions:
 				gene_to_functions[gene_id] = []
 			gene_to_functions[gene_id].append(function_id)
-		if index == 1000: break
 	return gene_to_functions
 
 def compute_abundances(args, gene_to_function):

diff --git a/scripts/download_ref_db.py b/scripts/download_ref_db.py
@@ -51,11 +51,12 @@ def decompress(tar, file, remove=True):
 	refdb_dir = '%s/ref_db' % main_dir
 	if not os.path.isdir(refdb_dir): os.mkdir(refdb_dir)
 	os.chdir(refdb_dir)
-	files = ["README.txt", "annotations.txt", "membership.txt", "marker_genes.tar.gz", "genome_clusters.tar.gz", ]
+	files = ["README.txt", "annotations.txt", "membership.txt", "marker_genes.tar.gz", "genome_clusters.tar.gz", "ontologies.tar.gz"]
 	for file in files:
 		download('%s/%s' % (url_base, file), file, progress=True)
 	decompress("marker_genes.tar.gz", "marker_genes")
 	decompress("genome_clusters.tar.gz", "genome_clusters")
+	decompress("ontologies.tar.gz", "ontologies")
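
The new docs describe aggregating gene-level values by function_id, but the diff only shows the tail of `read_gene_map`, not the aggregation itself. As a rough illustration only, the sketch below sums per-sample values over all genes mapped to each function. The summing rule, the function name `aggregate_by_function`, and the in-memory dict layout are assumptions made for this example; they are not taken from `annotate_genes.py`, whose `compute_abundances` may work differently.

```python
def aggregate_by_function(gene_matrix, gene_to_functions):
    """Sum per-sample values of every gene mapped to each function_id.

    gene_matrix:       {gene_id: [value for each sample or genome]} (assumed layout)
    gene_to_functions: {gene_id: [function_id, ...]}, the shape built by read_gene_map
    """
    function_matrix = {}
    for gene_id, values in gene_matrix.items():
        # Genes without any annotation contribute to no function.
        for function_id in gene_to_functions.get(gene_id, []):
            totals = function_matrix.setdefault(function_id, [0.0] * len(values))
            for i, value in enumerate(values):
                totals[i] += value
    return function_matrix


# Hypothetical toy input: three genes, two samples.
gene_matrix = {
    "gene_a": [1.0, 0.0],
    "gene_b": [1.0, 2.0],
    "gene_c": [0.0, 1.0],  # unannotated, so it is ignored
}
gene_to_functions = {
    "gene_a": ["00720"],
    "gene_b": ["00720", "00920"],
}

result = aggregate_by_function(gene_matrix, gene_to_functions)
for function_id in sorted(result):
    print(function_id, result[function_id])
```

A gene mapped to several functions counts toward each of them, which is why `gene_b` contributes to both rows here. Whether presence/absence input should be summed or collapsed to 0/1 is left open in the documentation, so treat the summing above as one possible reading.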