agotron_detector

Introduction

The agotron_detector toolkit identifies and quantifies agotrons in Ago CLIPseq datasets.

Prerequisites

bowtie2 (tested with v2.2.8)

samtools (tested with v1.3)

python (tested with v2.7.11) including modules: pysam, numpy, and mySQL

R (tested with v3.2.5) including packages: ggplot2, ggrepel, dplyr, tidyr, and optparse

Scripts / Usage

The agotron_detector repository contains three scripts that should be run sequentially:

(1) UCSC_intron_retriever.py, a python script to extract short introns from the UCSC MySQL server

Usage:
  UCSC_intron_retriever.py [ARGUMENTS] > [OUTPUT]
  
Options:
  -db <string>      MySQL database (default='hg19')
  -table <string>   MySQL table (default='refGene')
  -max <int>        Maximum intron length (default=150)
  -min <int>        Minimum intron length (default=50)
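To illustrate the length filter applied by UCSC_intron_retriever.py, the sketch below derives introns from the exonStarts/exonEnds columns of a single refGene row and keeps those whose length falls within -min and -max. UCSC stores these columns as 0-based, half-open, comma-separated lists with a trailing comma. The helper name `introns_from_exons` and the inclusive length bounds are assumptions for illustration, not taken from the script itself.

```python
def introns_from_exons(exon_starts, exon_ends, min_len=50, max_len=150):
    """Return (start, end) introns whose length lies within [min_len, max_len].

    exon_starts / exon_ends are UCSC-style comma-separated strings such as
    "1000,1200," (0-based, half-open, genomic order). Hypothetical helper:
    the real script may use strict rather than inclusive bounds.
    """
    starts = [int(x) for x in exon_starts.split(",") if x]
    ends = [int(x) for x in exon_ends.split(",") if x]
    introns = []
    # The intron between exon i and exon i+1 runs from the end of exon i
    # to the start of exon i+1.
    for left_end, right_start in zip(ends[:-1], starts[1:]):
        length = right_start - left_end
        if min_len <= length <= max_len:
            introns.append((left_end, right_start))
    return introns


# Three exons separated by a 100-nt intron (kept) and a 1,000-nt intron (dropped).
print(introns_from_exons("1000,1200,2300,", "1100,1300,2400,"))  # [(1100, 1200)]
```

Querying the UCSC server is left out so that the example runs offline. Only the row parsing and length filter are shown.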

Alternatively, use UCSC_mirna_retriever.py to retrieve RefSeq-annotated miRNA coordinates, or tophat_intron_retriever.py to retrieve intron coordinates from a TopHat-produced junctions.bed file.
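For the TopHat route, each junction in junctions.bed is a BED12 record whose two blocks are the read overhangs flanking the splice site. The intron is therefore the gap between the two blocks. The sketch below shows that conversion. `intron_from_junction` is a hypothetical helper that assumes column 5 holds the supporting read count, as in TopHat output. tophat_intron_retriever.py may handle strand, filtering, or the score column differently.

```python
def intron_from_junction(bed_line):
    """Convert one TopHat junctions.bed (BED12) line into an intron interval.

    Returns (chrom, start, end, strand, read_count) in 0-based, half-open
    coordinates, or None for the 'track' header line.
    """
    if bed_line.startswith("track"):
        return None
    fields = bed_line.rstrip("\n").split("\t")
    chrom, chrom_start = fields[0], int(fields[1])
    read_count, strand = int(fields[4]), fields[5]
    block_sizes = [int(x) for x in fields[10].split(",") if x]
    block_starts = [int(x) for x in fields[11].split(",") if x]
    # Intron = end of the first block up to the start of the second block.
    intron_start = chrom_start + block_sizes[0]
    intron_end = chrom_start + block_starts[1]
    return chrom, intron_start, intron_end, strand, read_count


line = "chr1\t1000\t1300\tJUNC00000001\t12\t+\t1000\t1300\t255,0,0\t2\t50,60\t0,240\n"
print(intron_from_junction(line))  # ('chr1', 1050, 1240, '+', 12)
```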

(2) analyzer.py, a python script to intersect mapped reads with coordinates of interest and output agotron-relevant features

Usage:
  analyzer.py [ARGUMENTS] < [INPUT] > [OUTPUT]
  
Required arguments:	
  -g <file>         Path to the reference genome FASTA file; must be indexed with samtools faidx
Optional arguments:	
  -f <files…>       Input BAM files (default=*.bam)
  -c <file>         Filename for coverage output (if empty, no coverage file is produced) 
                    (default='coverage.txt')
  -tr <int>         RPMM (reads per million mapped reads) expression threshold for output
                    (default=5)
  -ts <int/'all'>   Number of samples that must meet the RPMM threshold; use 'all' to require every sample
                    (default=2)
  -m <float>        Tolerance for reads mapping partly outside the locus
                    (default=0.1, i.e. up to 10% of the reads may map partly outside the locus)
  -q <int>          Threshold for mapping quality 
                    (default=13)
  -a <int>          Add <int> flanking nucleotides (upstream and downstream) to the locus sequence output
                    (default=10)
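The following is a rough sketch of how the -q, -m, -tr and -ts options could combine when analyzer.py decides whether to report a locus. It rests on several assumptions. Reads at or above the -q cutoff are kept. A locus is rejected when more than the -m fraction of its reads extend past either boundary. RPMM is the locus count scaled by each sample's total mapped reads. These semantics, the function names, and the use of plain (start, end, mapq) tuples instead of pysam reads are all illustrative assumptions, not the script's actual code.

```python
def locus_read_filter(reads, locus_start, locus_end, min_mapq=13, tolerance=0.1):
    """Return (n_reads, passes) for reads overlapping a locus.

    reads: iterable of (start, end, mapq) tuples, 0-based and half-open.
    Assumed semantics: reads with mapq < min_mapq (-q) are ignored, and the
    locus passes if at most `tolerance` (-m) of the remaining reads extend
    past either locus boundary.
    """
    kept = [
        (start, end)
        for start, end, mapq in reads
        if mapq >= min_mapq and start < locus_end and end > locus_start
    ]
    if not kept:
        return 0, False
    outside = sum(1 for start, end in kept if start < locus_start or end > locus_end)
    return len(kept), outside / float(len(kept)) <= tolerance


def rpmm(count, total_mapped):
    """Reads per million mapped reads."""
    return count * 1e6 / total_mapped


def meets_expression(counts, totals, threshold=5, min_samples=2):
    """Check the -tr/-ts rule: enough samples must reach `threshold` RPMM.

    counts[i] is the locus read count and totals[i] the number of mapped
    reads in sample i; min_samples may also be the string 'all'.
    """
    passing = sum(1 for c, t in zip(counts, totals) if rpmm(c, t) >= threshold)
    needed = len(counts) if min_samples == "all" else min_samples
    return passing >= needed


reads = [(100, 150, 20), (105, 190, 30), (100, 140, 5)]
print(locus_read_filter(reads, 100, 200))  # (2, True): the MAPQ-5 read is dropped
print(meets_expression([10, 3], [1000000, 1000000]))  # False: only 1 of 2 samples reaches 5 RPMM
```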

(3) annotater.R, an R script that annotates agotrons and produces several plots

Usage: 
  annotater.R [ARGUMENTS] < [INPUT]	

Optional arguments:
  -c <file>         Input coverage file (the -c output from analyzer.py) 
                    (default='coverage.txt')
  -p <string>       Prefix used in output files 
                    (default='agotron')
Agotron definition:
  -m <int>          Threshold for median read length 
                    (default=30)
  -h <float>        Minimum fraction of reads whose 5’ end lies within the -d distance of the locus 5’ end
                    (default=0.7)
  -d <int>          Maximum distance (nt) allowed between the predominant read 5’ end and the locus 5’ end
                    (default=1)
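The three agotron criteria can be read as a single test per locus. The Python 3 sketch below restates them; annotater.R itself is written in R. It assumes that -m is a lower bound on the median read length and that the predominant read 5’ end must lie within -d nt of the locus 5’ end. It also assumes that at least a fraction -h of reads must start within that same distance. `is_agotron` and its inputs are hypothetical; the inputs are strand-aware 5’ offsets relative to the locus.

```python
from collections import Counter
from statistics import median


def is_agotron(read_lengths, five_prime_offsets, min_median_len=30,
               min_fraction=0.7, max_dist=1):
    """Classify a locus using the -m, -h and -d criteria (hypothetical sketch).

    read_lengths: lengths of the reads at the locus.
    five_prime_offsets: signed distance (nt) from each read's 5' end to the
    locus 5' end, already corrected for strand (0 = exact match).
    """
    if not read_lengths or not five_prime_offsets:
        return False
    # -m: require a median read length of at least min_median_len (assumed >=).
    if median(read_lengths) < min_median_len:
        return False
    # -d: the most common read 5' end must lie within max_dist of the locus 5' end.
    predominant, _ = Counter(five_prime_offsets).most_common(1)[0]
    if abs(predominant) > max_dist:
        return False
    # -h: the fraction of reads starting within max_dist must reach min_fraction.
    near = sum(1 for off in five_prime_offsets if abs(off) <= max_dist)
    return near / len(five_prime_offsets) >= min_fraction


print(is_agotron([80, 85, 90, 30, 22], [0, 0, 1, 0, 5]))  # True
print(is_agotron([22, 22, 23], [0, 0, 0]))  # False: median length 22 < 30
```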

Example

Clone the repository and run the example script, all_commands.sh, which:

Downloads and prepares the reference genome (hg19)
Downloads and trims an Ago CLIPseq dataset (GSE78059)
Maps the dataset to the reference genome using bowtie2
Intersects the mapped reads (BAM files) with annotated short introns to detect and annotate agotrons

sh all_commands.sh

To annotate agotrons in your own mapped data, pipe the three scripts together (use the -db and -g options to specify the reference genome the reads were mapped to):

python UCSC_intron_retriever.py -db hg19 | python analyzer.py -g /path/to/hg19.fa -f /path/to/*.bam | Rscript annotater.R

Citation

Hansen TB. Detecting agotrons in Ago CLIPseq data. MiMB, 2017, submitted
