FxTools: a comprehensive toolkit for FASTA and FASTQ file manipulation

sd880131/FxTools

 
 

FxTools

FxTools: a comprehensive toolkit for FASTA and FASTQ file manipulation

FxTools is a full-featured toolkit for working with FASTA and FASTQ files, covering users' needs from sequence modification to data analysis. It consists of three parts, Fatools, Fqtools and Formtools, grouped by the type of file they handle. Formtools provides simple conversion and file manipulation for other NGS data files, including BAM, SAM and SOAP. FxTools is implemented in C/C++ and is available for Linux and Mac OS X.

1) Download and Install


Download
 
Four libraries must be installed before building FxTools:  
1 htslib: samtools-1.6/htslib-1.6  
2 boost: boost with g++ > 4.9 is recommended  
3 zlib: zlib > 1.2.8 is recommended  
4 ncurses: ncurses > 5.7 is recommended

For Linux/Unix (static binary)

  • you can use the statically compiled programs directly
       git clone https://github.com/BGI-shenzhen/FxTools.git
       cd FxTools-XXX;  
       chmod 775 bin/FxTools_Linux ; 
       ./bin/FxTools_Linux

  • To compile FxTools, run [./configure] first and then [make]
  • The final program can be found at [bin/FxTools]

For Linux/Unix or macOS

        git clone https://github.com/BGI-shenzhen/FxTools.git
        cd FxTools-XXX;
        chmod 755 configure ; ./configure
        make ;
        mv FxTools bin/;
        ./bin/FxTools

2) Features


Parameter description

Program: FxTools
Version: 0.16   [email protected]/[email protected]     2018-5-20

        Usage:

                Fatools        Tools For Fasta
                Fqtools        Tools For Fastq
                Formtools      Tools For Form convert

                Help           Show help in detail

Fatools

| Module | Function | Description |
|---|---|---|
| Summary | stat | statistics of FASTA |
| | dict | generate a header file for FASTA |
| Split | split | split FASTA (by ID by default) |
| | rand | randomly sample FASTA by proportion |
| Search | findN | find the regions of N in a FASTA file |
| | findSubSeq | find the region containing the subsequences |
| | grep | search for the target subsequence |
| | extractP | extract sequences with a specific ID |
| | extractN | extract sequences by a specified order range |
| | getCdsPep | find CDS & peptide sequences (GFF required) |
| | sort | sort the FASTA by sequence ID or length |
| Modify | filter | remove sequences that are too short or contain too many N (missing) bases |
| | reform | edit the FASTA (reverse, complement, etc.) |
| | mergaSca | reform the current FASTA into new scaffolds |
| | JoinSca | join scaffolds into pseudo-chromosomes |
| | BaseModify | modify a single base in FASTA |
| | ChangePosi | locate SNPs on original scaffolds based on the current FASTA |
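
To make the `sort` entry above concrete, here is a minimal Python sketch of sorting FASTA records by sequence ID or by length. It is only an illustration of the idea, not FxTools' own implementation (FxTools is written in C/C++). The helper names `parse_fasta` and `sort_fasta` are hypothetical, and the sketch assumes the whole file fits in memory.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sort_fasta(records: List[Record], key: str = "length",
               reverse: bool = False) -> List[Record]:
    """Sort records by sequence length or by header (hypothetical helper)."""
    if key == "length":
        return sorted(records, key=lambda r: len(r[1]), reverse=reverse)
    if key == "name":
        return sorted(records, key=lambda r: r[0], reverse=reverse)
    raise ValueError("key must be 'length' or 'name'")


example = """>chr2
ACGT
ACG
>chr1
AC
""".splitlines()

records = list(parse_fasta(example))
by_length = sort_fasta(records, key="length", reverse=True)  # longest first
by_name = sort_fasta(records, key="name")                    # alphabetical by ID
```

Python's `sorted` is stable, so records that tie on length keep their original relative order.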

Fqtools

| Module | Function | Description |
|---|---|---|
| Summary | valid | check the validity of the input FASTQ |
| | stat | statistics of FASTQ |
| | fqcheck | base and quality distribution |
| Split | splitpool | split pooled FASTQ into samples for RAD (GBS) |
| | splitFq | split FASTQ by specifying the number of sequences per output file |
| | cut | extract a subsequence from FASTQ |
| | rand | randomly sample FASTQ by proportion |
| Modify | filter | filter FASTQ to a clean dataset |
| | rmAdapter | remove adapters from FASTQ |
| | reform | edit the FASTQ file (reverse/complement) |
| | Mul2Sin | convert multi-line FASTQ sequences to single line |
| | bubble | filter regions with a large number of N |
| | changeQ | update the quality of FASTQ |
| | rmDup | remove duplicated sequences |

Formtools

| Function | Description |
|---|---|
| CDS2Pep | convert CDS to Pep format |
| Soap2fq | convert SOAP to FASTQ format |
| Bam2Fq | convert BAM to FASTQ format |
| Soap2Bam | convert SOAP to BAM/SAM format |
| Bam2SOAP | convert BAM/SAM to SOAP format |
| Fa2Fq | convert FASTA to FASTQ format |
| Fq2Fa | convert FASTQ to FASTA format |
| SF | find intersections or differences of two files |
| Merge | merge sorted files into one |
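
As an illustration of what a FASTQ-to-FASTA conversion such as `Fq2Fa` involves, the following Python sketch drops the `+` and quality lines of each record and rewrites the `@` header as a `>` header. It is a sketch under stated assumptions, not the FxTools code: `fastq_to_fasta` is a hypothetical name, and the input is assumed to be single-line FASTQ (four lines per record, with no empty sequence lines).

```python
from typing import Iterable, List


def fastq_to_fasta(lines: Iterable[str]) -> List[str]:
    """Convert 4-line FASTQ records to FASTA lines (hypothetical helper)."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    fasta = []
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        fasta.append(">" + header[1:])  # '@id' becomes '>id'
        fasta.append(seq)               # quality string is discarded
    return fasta


fastq = ["@read1", "ACGT", "+", "IIII", "@read2", "GG", "+", "!!"]
fasta = fastq_to_fasta(fastq)
```

The reverse direction (FASTA to FASTQ) has no quality information to draw on, so any such conversion has to assign placeholder quality values.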

3) Examples


    1. Sort FASTA files
    # sort by sequence length
    ./bin/FxTools  Fatools sort    -i  ref.fa      -s   length   -r    > ref.sort.fa
    # sort by sequence ID and write gzip-compressed output
    ./bin/FxTools  Fatools sort    -i  ref.fa      -s  name  -o  ref.sort.fa.gz
    2. Split FASTA files
    # split into one sequence per file
    ./FxTools  Fatools split   -i   in.fa.gz    -o outDir/ -g
    # split into a fixed number of sub-files
    ./FxTools  Fatools split   -i   in.fa       -o outDir/  -f 12
    # split into sub-files with a fixed number of sequences each
    ./FxTools  Fatools split   -i   in.fa       -o outDir/  -s 12
    3. Check the quality of FASTQ files
    # produce a PDF of the FASTQ base and quality distribution, plus a statistics report
    ./FxTools  Fqtools   fqcheck  -i A_1.fq.gz    A_2.fq.gz  -o out1Prefix  out2Prefix
    # single-end (SE) FASTQ is also supported
    ./FxTools  Fqtools   fqcheck  -i A.fq.gz   -o outPrefix
    4. Change the quality encoding of FASTQ
    # FASTQ quality change: ASCII33 --> ASCII64 (+31), with ResetID & MaxQ:h
    ./FxTools  Fqtools   changeQ   -i in.fq.gz   -o out.fq  -s 4
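
The ASCII33 to ASCII64 change in the last example shifts every quality character by +31. The Python sketch below shows only that arithmetic. It makes no claim about how `changeQ` is implemented or what its options do; `shift_quality` is a hypothetical helper introduced for illustration.

```python
def shift_quality(qual: str, delta: int = 31) -> str:
    """Shift each quality character by `delta` ASCII codes.

    delta=+31 maps a Phred+33 string to Phred+64; delta=-31 maps it back.
    """
    shifted = []
    for ch in qual:
        code = ord(ch) + delta
        if not 33 <= code <= 126:  # stay within printable ASCII
            raise ValueError(f"quality character {ch!r} out of range after shift")
        shifted.append(chr(code))
    return "".join(shifted)


phred33 = "!5I"                      # qualities 0, 20, 40 with offset 33
phred64 = shift_quality(phred33)     # same qualities with offset 64
restored = shift_quality(phred64, delta=-31)
```

Here `I` (code 73, quality 40 with offset 33) becomes `h` (code 104, quality 40 with offset 64).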

See the Documentation for more usage examples.

4) Format


Format Introduction

5) Discussion


######################swimming in the sky and flying in the sea ########################### ##
