Skip to content

cpockrandt/genmap

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GenMap - Fast and Exact Computation of Genome Mappability BUILDSTATUS

GenMap is a tool to compute the mappability respectively frequency of nucleotide sequences. In particular, it computes the (k,e)-frequency, i.e., how often each k-mer from the sequence occurs with up to e errors in the sequence itself. The (k,e)-mappability is the inverse of the (k,e)-frequency. Hence, a mappability value of 1 at position i indicates that the k-mer in the sequence at position i occurs only once in the sequence with up to e errors. A low mappability value indicates that this k-mer belongs to a repetitive region.

A small example on how to run GenMap is listed below, for detailed examples such as marker sequence computation on multiple fasta files, please check out our GitHub Wiki pages.

For questions or feature requests feel free to open an issue on GitHub or send an e-mail to christopher.pockrandt [ÄT] fu-berlin.de.

The corresponding paper will be uploaded to biorxiv.org in mid-March. Until then major design changes of the interface and minor changes to its specification are possible.

Binaries

Your CPU must support the POPCNT instruction. If you have a modern CPU, you can go with the optimized 64 bit version that additionally uses SSE4. To verify whether your CPU supports POPCNT and SSE4, you can check the output of cat /proc/cpuinfo | grep 'popcnt\|sse4'.

Download binaries 64 bit requires POPCNT
64 bit optimized requires POPCNT and SSE4

Building from source

NOTE: Building from source can take up to 10 minutes depending on your machine.

$ git clone --recursive https://github.com/cpockrandt/genmap.git
$ mkdir genmap-build && cd genmap-build
$ cmake ../genmap -DCMAKE_BUILD_TYPE=Release
$ make genmap
$ ./bin/genmap

If you are using a very old version of Git (< 1.6.5) the flag --recursive does not exist. In this case you need to clone the submodule separately before you can run cmake:

$ git clone https://github.com/cpockrandt/genmap.git
$ cd genmap
$ git submodule update --init --recursive

Requirements

Operating System
GNU/Linux, Mac
Architecture
Intel/AMD platforms that support POPCNT
Compiler
GCC ≥ 4.9, LLVM/Clang ≥ 3.9
Build system
CMake ≥ 3.0
Language support
C++14

Getting started

Building the index

At first you have to build an index of the fasta file(s) whose mappability you want to compute. This step only has to performed once.

$ ./genmap index -G /path/to/fasta.fasta -I /path/to/index/folder

A new folder /path/to/index/folder will be created to store the index and all associated files.

There are two algorithms that can be chosen for index construction. One uses RAM (radix), one uses secondary memory (skew). Depending on the quota and main memory limitations you can choose the appropriate algorithm with -A radix or -A skew. For skew you can change the location of the temp directory via the environment variable (e.g., to choose a directory with more quota):

$ export TMPDIR=/somewhere/else/with/more/space

Computing the mappability

To compute the (30,2)-mappability of the previously indexed genome, simply run:

$ ./genmap map -E 2 -K 30 -I /path/to/index/folder -O /path/to/output/folder -t -w -b

This will create a text, wig and bed file in /path/to/output/folder storing the computed mappability in different formats. You can remove not required formats by ommitting the corresponding flags -t -w or -b.

Instead of the mappability, the frequency can be outputted, you only have to add the flag -fl to the previous command.

Help pages and examples

A detailed list of arguments and explanations can be retrieved using --help:

$ ./genmap --help
$ ./genmap index --help
$ ./genmap map --help

More detailed examples can be found in the Wiki.

About

GenMap - Fast and Exact Computation of Genome Mappability

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •