Slow assembly for diverse metagenome #278

Puumanamana · 2020-06-19T01:51:07Z

Hi,

I have used megahit multiple times for my analyses, and in general it works very well. Lately, I tried using it on a very diverse dataset, composed of bacteria, archaea, fungi and viruses. I have 9 samples with about 20M paired-end reads each.
I'm running megahit with default parameters on a machine with 500 GB of RAM and 60 CPU threads. However, after 3 days, the assembly is still stuck at k=21. Memory is not saturated, so there is probably little swap in use. I also checked the intermediate assembly file for k=21, and it contains about 2.4 billion contigs.

I have multiple questions regarding this issue:

Do you know what the bottleneck is here? Is it I/O, or simply the number of CPU threads?
I read in issue #152 ("problem running with gpu") that GPUs are not really supported, but I was wondering whether this has been fixed since then.
In general, do you have any recommendations for improving runtime?

Thank you for your time,
Cedric
