This library provides support for BGZF (Blocked GZip Format) in Ruby. BGZF, originally defined as part of the SAM/BAM specification, is used to compress record-oriented bioinformatics data in a way that, unlike plain gzip, allows random access. A BGZF file consists of concatenated blocks of at most 64 KB each, and every block is an independent gzip stream. The whole file can still be decompressed with gzip, but this library adds random access using 'virtual offsets' as defined in SAM/BAM.
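The gzip compatibility described above can be illustrated with Ruby's standard zlib library alone. The following is a minimal sketch, not part of bio-bgzf: it concatenates two ordinary gzip members rather than real BGZF blocks (which also carry an extra header field), and it assumes a Ruby version recent enough to provide `Zlib::GzipReader.zcat`.

```ruby
require 'zlib'
require 'stringio'

# Two independently compressed gzip members, concatenated in the same way
# BGZF concatenates its blocks. These are plain gzip members used for
# illustration, not genuine BGZF blocks.
member_a = Zlib.gzip("first block\n")
member_b = Zlib.gzip("second block\n")
stream   = member_a + member_b

# Zlib::GzipReader.zcat keeps reading members until the end of the IO,
# so the concatenation decompresses as one logical file.
whole = Zlib::GzipReader.zcat(StringIO.new(stream))

# Each member can also be decompressed on its own, which is the property
# that makes seeking to a block boundary useful.
only_b = Zlib.gunzip(member_b)

puts whole
puts only_b
```

Because every member is self-contained, a reader that knows where a block starts does not need any of the preceding data to decompress it.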
A virtual offset is a 64-bit quantity. Its upper 48 bits hold the block offset, which is the position in the compressed file where the block starts. Its lower 16 bits hold the data offset, which is a position within that block's uncompressed data.
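The bit layout of a virtual offset can be sketched in a few lines of Ruby. The `VirtualOffset` module and its `pack` and `unpack` methods below are hypothetical names chosen for illustration; they are not the bio-bgzf API.

```ruby
# Hypothetical helper module for illustration only (not part of bio-bgzf).
module VirtualOffset
  DATA_BITS = 16
  DATA_MASK = (1 << DATA_BITS) - 1   # largest data offset (0xFFFF)
  BLOCK_MAX = (1 << 48) - 1          # largest block offset (48 bits)

  # Combine a block offset (position of the block in the compressed file)
  # and a data offset (position inside the uncompressed block).
  def self.pack(block_offset, data_offset)
    unless (0..BLOCK_MAX).cover?(block_offset)
      raise ArgumentError, 'block offset does not fit in 48 bits'
    end
    unless (0..DATA_MASK).cover?(data_offset)
      raise ArgumentError, 'data offset does not fit in 16 bits'
    end
    (block_offset << DATA_BITS) | data_offset
  end

  # Split a virtual offset back into [block_offset, data_offset].
  def self.unpack(virtual_offset)
    [virtual_offset >> DATA_BITS, virtual_offset & DATA_MASK]
  end
end

vo = VirtualOffset.pack(65_536, 42)
block_offset, data_offset = VirtualOffset.unpack(vo)
puts "virtual offset #{vo} -> block #{block_offset}, data #{data_offset}"
```

To use such an offset, a reader would seek the compressed file to `block_offset`, decompress the block found there, and then skip `data_offset` bytes of the decompressed data.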
gem install bio-bgzf
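require 'bio-bgzf'

File.open('example.gz') do |f|
  r = Bio::BGZF::Reader.new(f)
  last_vo = nil
  loop do
    block_vo = r.tell        # virtual offset of the block about to be read
    block = r.read_block     # nil once the end of the file is reached
    break unless block
    last_vo = block_vo       # remember the offset of the last block actually read
  end
  # Random access: re-read the last block through the reader, not the File
  block = r.read_block_at(last_vo) if last_vo
end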
The API doc is online. For more code examples see the test files in the source tree.
For information on the source tree, documentation, examples, issues, and how to contribute, see
http://github.com/csw/bioruby-bgzf
The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
If you use this software, please cite one of
- BioRuby: bioinformatics software for the Ruby programming language
- Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics
This Biogem is published at #bio-bgzf
Copyright (c) 2012 Artem Tarasov and Clayton Wheeler. See LICENSE.txt for further details.