slowION

slowION is a benchmark tool that can simulate data rates of a nanopore sequencer (e.g., PromethION) in chunks and see if a simple strategy coupled with a simple binary format like BLOW5 could meet the real-time writing requirement (plus reading back to cater real-time basecalling).

There seem to be a bit of a misconception that writing data from 144,000 channels in parallel cannot be done with simple binary binary formats like BLOW5 and instead requires fancy (potentially over-engineered) solutions. My biased opinion is, don't bring a chain saw when a hand saw would suffice. If you do, it looks fancy, but unanticipated issues will start appearing.

This is a quick benchmark tool that I wrote within a couple of days that demonstrate how BLOW5 can handle chunks, inspired by a strategy proposed by @lh3. This basic implementation that deliberately simulates a worse case scenario for BLOW5, even without doing any optimisations, seems to be able to handle 96,000 channels on my laptop (~2000 AUD) and 216,000 channels on a gaming work station (~10,000 AUD). The actual limit could be even much higher - I/O capacity never reached the maximum, instead, overheads in simulating the aquisition (random number generation, memory buffer copying, etc.) started becoming the bottleneck. Nevertheless, this is adequate to demonstrate the point.

Again, I would like to emphasise that this was a very quick implementation done in 2 days. So, don't mix implementation limitations with the limitations in the actual concept. For instance, some think BLOW5 signal records cannot be stored as a series of separately compressed chunks - probably from making their mind up before reading the format specification.

Building

You will need to install zlib and zstd 1.3 or higher development libraries. You will need gcc that supports c99.

sudo apt-get install zlib1g-dev libzstd1-dev #in latest Ubuntu: libzstd-dev
git clone --recursive https://github.com/hasindu2008/slowION && cd slowION
make
./slowION

On CentOS use sudo yum install libzstd-devel. On Mac, use brew install zstd. On Mac M1, if zstd.h is still not found, give the location to make as LDFLAGS=-L/opt/homebrew/lib/ CPPFLAGS=-I/opt/homebrew/include/ make.

Running

When you run example commands below, if many yellow colour WARNING are printed, that means there is a lag.

# Minion (1 position, 512 channels per position)
./slowION -p 1 -c 512

# PromethION P24 (24 positions, 3000 channels per position)
./slowION -p 24 -c 3000

# PromethION P48 (48 positions, 3000 channels per position)
./slowION -p 48 -c 3000

Because of the naive way I implemented for this proof of concept benchmark, you may get an error like "too many files open" on some systems. The easiest trick to get it working will be:

sudo su
./slowION -p 24 -c 3000

Results

The following tables shows how many sequencing positions (3000 channels per position) and many total channels each system could keep up.

system	CPU (cores/threads)	RAM	O/S	Disk System	File system	positions	total channels
laptop (Dell Inspiron)	Intel 11th Gen i7-11800H (8/16)	32 GB	Ubuntu 18	SSD	EXT4	32	96000
laptop (Dell XPS)	Intel 11th Gen i7-11800H (8/16)	32 GB	Pop OS 22	SSD	EXT4	48	144000
gaming desktop	AMD Ryzen Threadripper 3970X (32/64)	128 GB	Ubuntu 18	SSD	EXT4	72	216000
NVIDIA Jetson Xavier AGX	Armv8.2 (8/8)	16 GB	Ubuntu 18	SSD	EXT4	14	42000
server with HDD	Intel Xeon Gold 6154 (36/72)	377 GB	Ubuntu 18	HDD (12 disks RAID 6)!	EXT4	72	216000
server with network mount	Intel Xeon Silver 4114 CPU (20/40)	377 GB	Ubuntu 18	An HDD NAS (12 disks RAID 10)!	EXT4 over NFS	32	96000
Mac M1 mini	ARM M1 (8/8)	8 GB	macOS 12.1	SSD	APFS	10	30000
MacBook M2	ARM M2 (8/8)	24 GB	macOS 12.1	SSD	APFS	24	72000

Note: The benchmark was run for an hour on the gaming desktop and the servers, while only for 5 minutes on others due to limited free storage space.

Methods

Do not have time to write this at the moment. One may ask for clarification on GitHub issues, as the code lacks comments and would be a bit hard to follow.

Options

-p INT: number of positions [1]
-c INT: channels per position [512]
-T INT: simulation time in seconds [300]
-r INT: mean read length (num bases) [10000]
-b INT: average translocation speed (bases per second) [400]
-f INT: sample rate [4000]
-d DIR: output directory [./output]
--verbose INT: verbosity level [4]

Note that only the default parameters and values in the above examples in running section have been tested.

Random Notes

Documentation and error checking are minimal as it takes too much time. For clarification you can use GitHub issues.
Not optimised or feature rich - so perhaps underestimate the capability of a binary format.
mean read length (-r) is not the mean value of all the reads that is modelled by gamma distribution. It is NOT the max read length.
This is not an API. A chunk based writing API is a matter of implementation.
Only tested on Linux and Mac with limited gcc and clang versions Getting these working on Windows is just a simple implementation matter.
There could be bugs or mistakes, feel free to point them out.
This is NOT a signal simulator intended for basecalling. See Squigulator instead.
Opening many files was only for easy implementation.
Multiple threads can be used for writing to a single SLOW5 file, though not implemented here.
The 2-pass concept is inherently suitable for using expensive SSD as a cache before writing to cheap HDD or even a NAS (not implemented).

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
build		build
slow5lib @ f658218		slow5lib @ f658218
src		src
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

slowION

Building

Running

Results

Methods

Options

Random Notes

About

Releases

Packages

Contributors 2

Languages

License

hasindu2008/slowION

Folders and files

Latest commit

History

Repository files navigation

slowION

Building

Running

Results

Methods

Options

Random Notes

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages