calc_fastq-stats.pl
calc_fastq-stats.pl is a script to calculate basic statistics for bases and reads in a FASTQ file.
- Synopsis
- Description
- Usage
- Options
- Output
- Run environment
- Dependencies
- Author - contact
- Citation, installation, and license
- Changelog
Synopsis
perl calc_fastq-stats.pl -i reads.fastq
or
gzip -dc reads.fastq.gz | perl calc_fastq-stats.pl -i -
Description
The script calculates simple statistics for a FASTQ file: individual and total base counts, GC content, and basic statistics for read lengths and read/base qualities. Ns are excluded from the GC content calculation. Stats are printed to STDOUT and, optionally, to an output file.
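To illustrate how base counts and an N-free GC content relate, here is a minimal sketch. It is not the script's own code: the subs `count_bases` and `gc_content` and the hash layout are hypothetical, and IUPAC ambiguity codes other than N are simply ignored.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not taken from calc_fastq-stats.pl): adds the counts of
# A, C, G, T and N in one read sequence to the running totals in %$counts.
# Other IUPAC ambiguity codes are not counted in this sketch.
sub count_bases {
    my ($seq, $counts) = @_;
    $seq = uc $seq;    # treat soft-masked (lowercase) bases like uppercase
    $counts->{A} += ($seq =~ tr/A//);
    $counts->{C} += ($seq =~ tr/C//);
    $counts->{G} += ($seq =~ tr/G//);
    $counts->{T} += ($seq =~ tr/T//);
    $counts->{N} += ($seq =~ tr/N//);
    return $counts;
}

# GC content in percent; Ns are left out of the denominator.
sub gc_content {
    my ($counts) = @_;
    my $acgt = 0;
    $acgt += $counts->{$_} // 0 for qw(A C G T);
    return 0 if $acgt == 0;
    return 100 * (($counts->{G} // 0) + ($counts->{C} // 0)) / $acgt;
}

my %counts;
count_bases($_, \%counts) for qw(ACGTNN GGCCAT);    # two toy reads
my $total = 0;
$total += $_ for values %counts;
printf "total bases: %d, GC: %.2f%%\n", $total, gc_content(\%counts);
```

For the two toy reads this prints `total bases: 12, GC: 60.00%`; counting the two Ns in the denominator would instead give 50%.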
Because read quality degrades along the read length on all NGS platforms, it is advisable to also plot the quality per cycle, as implemented in tools like FastQC or the fastx-toolkit.
If the sequence and quality strings are wrapped over several lines (i.e. a read is not represented by exactly four lines), unwrap the file first with Heng Li's seqtk:
seqtk seq -l 0 infile.fastq > outfile.fastq
A much faster alternative tool is fastq-stats from ea-utils.
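Since the statistics depend on every read occupying exactly four lines, a quick way to see whether a file meets that assumption is a reader that validates each record. The following is a minimal sketch; the sub `next_record` and its specific checks are hypothetical and do not describe how calc_fastq-stats.pl itself parses input.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical reader: returns (id, seq, qual) for the next four-line FASTQ
# record from $fh, an empty list at end of file, and dies if the record is
# not laid out as header / sequence / '+' separator / quality.
sub next_record {
    my ($fh) = @_;
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        last unless defined $line;
        chomp $line;
        push @lines, $line;
    }
    return () unless @lines;    # clean end of file
    die "Truncated record at line $.\n" if @lines < 4;
    my ($head, $seq, $sep, $qual) = @lines;
    die "Header does not start with '\@' at line ", $. - 3, "\n"
        unless $head =~ /^\@/;
    die "Separator does not start with '+' at line ", $. - 1,
        "; is the file line-wrapped?\n"
        unless $sep =~ /^\+/;
    die "Sequence and quality lengths differ in record ending at line $.\n"
        unless length($seq) == length($qual);
    return (substr($head, 1), $seq, $qual);
}

# Demo on an in-memory FASTQ string instead of a real file.
my $fastq = "\@read1\nACGT\n+\nIIII\n\@read2\nGGC\n+\nII5\n";
open my $fh, '<', \$fastq or die "Cannot open in-memory file: $!";
while (my ($id, $seq, $qual) = next_record($fh)) {
    print "$id\t", length($seq), "\n";
}
close $fh;
```

A wrapped file fails the separator check on its first multi-line record, which is the situation the seqtk command above fixes.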
Usage
zcat reads.fastq.gz | perl calc_fastq-stats.pl -i - -q 64 -c 175000000 -n 3000000
Options
- -i, -input
Input FASTQ file, or - to read from STDIN (e.g. a decompressed gzipped file piped in)
- -q, -qual_offset
ASCII quality offset of the Phred (Sanger) quality values [default 33]
- -h, -help
Help (perldoc POD)
- -c, -coverage_limit
Number of bases to sample from the top of the file
- -n, -num_read
Number of reads to sample from the top of the file
- -o, -output
Print stats to the specified output file in addition to STDOUT
- -v, -version
Print version number to STDERR
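Option -q sets the ASCII offset that is subtracted from each quality character to obtain its Phred score (Q = ord(char) - offset), while -n and -c stop reading after a given number of reads or bases from the top of the file. The following is a minimal sketch of both ideas; the subs `phred_scores` and `limit_reached` are hypothetical, and the exact stopping rule (checking the limits before each read) is an assumption, not documented behavior of the script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Convert a FASTQ quality string into Phred scores: Q = ord(char) - offset.
# Offset 33 is Sanger/Illumina 1.8+; older Illumina pipelines used 64.
sub phred_scores {
    my ($qual, $offset) = @_;
    return map { ord($_) - $offset } split //, $qual;
}

# Hypothetical sampling rule mirroring -n and -c: stop once either limit
# has been reached. A limit of 0 or undef means "no limit".
sub limit_reached {
    my ($reads, $bases, $max_reads, $max_bases) = @_;
    return 1 if $max_reads && $reads >= $max_reads;
    return 1 if $max_bases && $bases >= $max_bases;
    return 0;
}

my @quals = ('IIII', '5?#I', 'IIII');    # toy quality strings, offset 33
my ($reads, $bases) = (0, 0);
for my $q (@quals) {
    last if limit_reached($reads, $bases, 0, 6);    # behaves like -c 6
    my @scores = phred_scores($q, 33);
    $reads++;
    $bases += @scores;    # array in scalar context = number of bases
    print join(',', @scores), "\n";
}
print "sampled $reads reads, $bases bases\n";
```

Here the third read is skipped because 8 bases have already been read; whether the real script also finishes the read that crosses the -c limit is not stated in this README.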
Output
- STDOUT
Calculated stats are printed to STDOUT
- (outfile)
Optional outfile for stats
Run environment
The Perl script runs under Windows and UNIX flavors.
Dependencies
If the following modules are not installed, get them from CPAN:
Statistics::Descriptive
Perl module to calculate basic descriptive statistics
Statistics::Descriptive::Discrete
Perl module to calculate descriptive statistics for discrete data sets
Statistics::Descriptive::Weighted
Perl module to calculate descriptive statistics for weighted variates
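To find out which of these modules are missing before running the script, a small check that tries to load each one at run time can help. This is a minimal sketch, not part of the repository; the sub name `check_modules` is hypothetical. Missing modules can then be installed with the standard CPAN client, e.g. `cpan Statistics::Descriptive`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Modules listed as dependencies of calc_fastq-stats.pl.
my @modules = qw(
    Statistics::Descriptive
    Statistics::Descriptive::Discrete
    Statistics::Descriptive::Weighted
);

# Try to load each module at run time; returns name => 1 (found) or
# 0 (missing) without dying when a module is absent.
sub check_modules {
    my %found;
    for my $mod (@_) {
        (my $file = "$mod.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        $found{$mod} = eval { require $file; 1 } ? 1 : 0;
    }
    return %found;
}

my %status = check_modules(@modules);
for my $mod (@modules) {
    print $status{$mod}
        ? "ok       $mod\n"
        : "MISSING  $mod (install from CPAN)\n";
}
```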
Author - contact
Andreas Leimbach (aleimba[at]gmx[dot]de; Microbial Genome Plasticity, Institute of Hygiene, University of Muenster)
Citation, installation, and license
For citation, installation, and license information, please see the main README.md of the repository.
Changelog
- v0.1 (28.10.2014)