Fix two bugs: macOS MMseqs2 version and integer contig names #88
macOS MMseqs2 version
Currently, Dnaapler checks for an MMseqs2 version string in the format `major.minor`. However, the macOS MMseqs2 v13.45111 binary (downloaded from the MMseqs2 release page) reports its version as a hash (`45111b641859ed0ddd875b94d6fd1aef1a675b7e`). This causes Dnaapler's version check to fail.
Fix: Updated the version-check logic to support hash-based version numbers.
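For illustration, a check that tolerates both formats could look roughly like the sketch below. It is not the code in this PR: the function name `classify_mmseqs_version` and both regular expressions are assumptions made for the example, and it only classifies a version string that has already been captured from the binary.

```python
import re

# Hypothetical patterns: neither is taken from Dnaapler's source.
MAJOR_MINOR = re.compile(r"^\d+\.\d+$")         # e.g. "13.45111"
COMMIT_HASH = re.compile(r"^[0-9a-f]{7,40}$")   # e.g. a 40-character git hash


def classify_mmseqs_version(version_output: str) -> str:
    """Return "release" for major.minor strings and "hash" for commit hashes."""
    text = version_output.strip()
    if MAJOR_MINOR.match(text):
        return "release"
    if COMMIT_HASH.match(text):
        return "hash"
    raise ValueError(f"Unrecognised MMseqs2 version string: {text!r}")


release_kind = classify_mmseqs_version("13.45111")
hash_kind = classify_mmseqs_version("45111b641859ed0ddd875b94d6fd1aef1a675b7e\n")
```

Treating a hash as a recognised but unordered version means a minimum-version comparison cannot be applied to it; how that case is handled is a design decision for the real check.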
Integer contig names
When running Dnaapler on an Autocycler assembly with integer-named sequences, the following issue arose:
In this case, `short_contig` was a string, but the `qseqid` column in the MMseqs2 results dataframe was inferred as an integer type. This type mismatch caused the filtered dataframe to always be empty.
Fix: Enforced the `qseqid` column to always load as a string (object type) when reading MMseqs2 results with Pandas.
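The mismatch can be reproduced with a small, made-up results table. This is a sketch of the behaviour rather than Dnaapler's actual code: the in-memory TSV and the `sseqid` and `pident` column names are illustrative assumptions, while `qseqid` and `short_contig` come from the description above.

```python
import io

import pandas as pd

# Made-up MMseqs2-style results for two contigs named "1" and "2".
tsv = "1\ttargetA\t98.5\n2\ttargetB\t91.0\n"
cols = ["qseqid", "sseqid", "pident"]  # only qseqid is from the PR text

# Default type inference: qseqid is read as an integer column.
inferred = pd.read_csv(io.StringIO(tsv), sep="\t", names=cols)

# Forcing qseqid to be read as a string keeps contig names as text.
as_string = pd.read_csv(
    io.StringIO(tsv), sep="\t", names=cols, dtype={"qseqid": str}
)

short_contig = "1"  # contig names are handled as strings

# Integer column compared with a string: no row matches.
empty_hits = inferred[inferred["qseqid"] == short_contig]

# String column compared with a string: the expected row is found.
hits = as_string[as_string["qseqid"] == short_contig]
```

Passing `dtype` for just the one column leaves inference untouched for numeric columns such as identity scores.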