Automated gene annotation based on sequence coverage: takes start positions as input and writes annotated files (DNA, protein, GFF).

caldetas/anotomat

anotomat

Annotates genes based on sequence coverage. It takes start positions as input and writes annotated files (DNA, protein, GFF).
Introns are detected from coverage differences and the GT...AG pattern. With the less stringent option ("--gt"), the G+...+T pattern defines the intron. Use "-h" or "--help" for further instructions.
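As a rough illustration of the idea described above, the following sketch lists spans that start with GT and end with AG, then checks whether coverage drops inside a span. It is not taken from anotomat.py: the function names, the `min_len`/`max_len` limits, the 10 bp upstream window and the `ratio` threshold are all assumptions made for this example.

```python
def find_gt_ag_spans(seq, min_len=4, max_len=1000):
    """Return half-open (start, end) spans that begin with GT and end with AG.

    min_len and max_len are hypothetical limits, not anotomat options.
    """
    seq = seq.upper()
    spans = []
    for start in range(len(seq) - 1):
        if seq[start:start + 2] != "GT":
            continue
        stop = min(len(seq), start + max_len)
        for end in range(start + min_len, stop + 1):
            if seq[end - 2:end] == "AG":
                spans.append((start, end))
    return spans


def coverage_drops(coverage, start, end, ratio=0.5, flank=10):
    """True if mean depth inside the span is below ratio * mean upstream depth.

    coverage is a list of per-base depths (0-based); ratio and flank are
    assumed values chosen only for illustration.
    """
    inside = coverage[start:end]
    upstream = coverage[max(0, start - flank):start]
    if not inside or not upstream:
        return False
    mean_in = sum(inside) / len(inside)
    mean_up = sum(upstream) / len(upstream)
    return mean_up > 0 and mean_in < ratio * mean_up


if __name__ == "__main__":
    seq = "ATGGTAAAAGCC"
    depth = [10, 10, 10, 1, 1, 1, 1, 1, 1, 1, 10, 10]
    for start, end in find_gt_ag_spans(seq):
        print(start, end, seq[start:end], coverage_drops(depth, start, end))
```

A real implementation would also need to handle the reverse strand and the relaxed "--gt" pattern, which this sketch leaves out.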

MANUAL

anotomat.py


	--genome <genome.fasta>
	--pos <start_positions.txt> *for the exact format, see below*
	--cov <coverage> *file with '#chr bp coverage' from "samtools depth -a -Q 0 <file.bam>"*

	--mincov <int> minimum coverage at the START position for a reannotation *default=0*
	--mincov_exon <int> minimum coverage for annotating an exon after an intron *default=0*
	--gt use less stringency in defining introns *G._.G defines intron* (default=GT_AG)
	--cores <int> number of cores used *default=(cpu_count-1)*
	--name <gene name> *default='genes' + <int>[0000-9999]*
	--out <name for output files> *default='genes'*
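The coverage file passed to "--cov" has three columns (chromosome, base position, coverage), as produced by "samtools depth". The sketch below shows one way such a file could be read and checked against a minimum coverage. `read_depth` and `passes_mincov` are hypothetical helpers, not functions from anotomat.py, and the "greater than or equal" comparison is an assumption about how "--mincov" is applied.

```python
from collections import defaultdict


def read_depth(lines):
    """Parse 'chr<TAB>pos<TAB>coverage' lines into {chr: {pos: coverage}}.

    Blank lines and lines starting with '#' (e.g. a header) are skipped.
    Positions are kept 1-based, as written by samtools depth.
    """
    depth = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, pos, cov = line.split()[:3]
        depth[chrom][int(pos)] = int(cov)
    return dict(depth)


def passes_mincov(depth, chrom, pos, mincov=0):
    """Assumed rule: a position passes if its coverage is >= mincov."""
    return depth.get(chrom, {}).get(pos, 0) >= mincov


if __name__ == "__main__":
    example = [
        "#chr\tbp\tcoverage",
        "Bgt_chr-01\t39471\t0",
        "Bgt_chr-01\t39472\t12",
    ]
    depth = read_depth(example)
    print(passes_mincov(depth, "Bgt_chr-01", 39472, mincov=5))
```

For a whole genome, a nested dict is memory-hungry; a per-chromosome list or array would be a more practical choice.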

EXAMPLE

EXAMPLE for start_positions.txt:

Bgt_chr-01	39472
Bgt_chr-01	47323	~	#
Bgt_chr-01	49207	~BgtAcSP-31373	#make a comment
Bgt_chr-01	85828    #only comment
Bgt_chr-01	284818	~BgtE-20066	#tab separated???
Bgt_chr-01	300256	~BgtE-20114	#special sign "~" before gene name!
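The example above suggests that each line holds a chromosome and a start position, optionally followed by a gene name prefixed with "~" and a comment introduced by "#". A minimal parser under those assumptions could look like this. `parse_start_line` is a hypothetical helper, and treating a bare "~" as "no name given" is a guess based on the second example line.

```python
def parse_start_line(line):
    """Parse one line of start_positions.txt into (chrom, pos, name).

    Returns None for blank or comment-only lines. name is None when no
    '~name' field is present or when the field is a bare '~'.
    """
    body = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not body:
        return None
    fields = body.split()  # tolerates tabs as well as runs of spaces
    chrom, pos = fields[0], int(fields[1])
    name = None
    if len(fields) > 2 and fields[2].startswith("~"):
        name = fields[2][1:] or None
    return chrom, pos, name


if __name__ == "__main__":
    lines = [
        "Bgt_chr-01\t39472",
        "Bgt_chr-01\t47323\t~\t#",
        "Bgt_chr-01\t49207\t~BgtAcSP-31373\t#make a comment",
        "Bgt_chr-01\t85828    #only comment",
    ]
    for line in lines:
        print(parse_start_line(line))
```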
