Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix two bugs: macOS MMseqs2 version and integer contig names #88

Merged
merged 3 commits into from
Nov 22, 2024

Conversation

rrwick
Copy link
Contributor

@rrwick rrwick commented Nov 22, 2024

macOS MMseqs2 version

Currently, Dnaapler checks for an MMseqs2 version string in the format major.minor. However, the macOS MMseqs2 v13.45111 binary (downloaded from MMseqs2 release page) reports its version as a hash (45111b641859ed0ddd875b94d6fd1aef1a675b7e). This causes Dnaapler's version check to fail.

Fix: Updated the version-check logic to support hash-based version numbers.

Integer contig names

When running Dnaapler on an Autocycler assembly with integer-named sequences, the following issue arose:

filtered_df = MMseqs2_df[MMseqs2_df["qseqid"] == short_contig]

In this case, short_contig was a string, but the qseqid column in the MMseqs2 results dataframe was inferred as an integer type. This type mismatch caused the filtered dataframe to always be empty.

Fix: Enforced the qseqid column to always load as a string (object type) when reading MMseqs2 results with Pandas.

@rrwick
Copy link
Contributor Author

rrwick commented Nov 22, 2024

I tested the results on my Mac using an Autocycler assembly, and it all worked well! But more thorough testing may be warranted if you've got some varied test cases 😄

Copy link

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 6 lines in your changes missing coverage. Please review.

Project coverage is 85.89%. Comparing base (7ef230e) to head (c516810).
Report is 50 commits behind head on main.

Files with missing lines Patch % Lines
src/dnaapler/utils/util.py 33.33% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #88      +/-   ##
==========================================
- Coverage   87.27%   85.89%   -1.38%     
==========================================
  Files           9        9              
  Lines        1108     1177      +69     
  Branches      143      154      +11     
==========================================
+ Hits          967     1011      +44     
- Misses        103      118      +15     
- Partials       38       48      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

@gbouras13
Copy link
Owner

Legend thank you @rrwick ! - this would have been a pre v1.0.0 bug too

@gbouras13 gbouras13 merged commit af14bac into gbouras13:main Nov 22, 2024
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants