bug in get_proteins_by_id affecting pfam annotator #78
Comments
Hi @LuisFF, thanks for reporting this. I'll need to think for a bit since it's been a while since I wrote the code :) I would be careful about changing the logic of […]. If I understand it correctly, this could also be solved by removing the […]
Lines 503 to 505 in c836f08
Great, would you be up to implementing that behavior in your PR? @LuisFF
Hi @prihoda, I pushed the suggested changes. Please let me know if there's anything else needed.
First of all, thanks for developing DeepBGC and making it available to the community.

I came across a bug in `HmmscanPfamRecordAnnotator` when generating the `proteins_by_id` dictionary. The `util` function `get_proteins_by_id` currently loops through all the potential protein ids of a feature (e.g. `unique_protein_id`, `protein_id` and `locus_tag`), and this can cause a feature whose id is based on the `protein_id` qualifier to be overwritten by another feature that shares the same `protein_id` but was deduplicated using `unique_protein_id`. As a result, `PFAM_domain` features are placed incorrectly in the genomic sequence, because the `protein_id` used in the `hmmscan` output file matches a different feature and the wrong feature location is picked.
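
To make the overwrite concrete, here is a minimal, self-contained sketch of the behavior described above. It is not the DeepBGC code: the `Feature` class, `resolve_id`, `index_by_all_ids` and `index_by_resolved_id` are hypothetical names made up for illustration, and qualifiers are simplified to plain strings. It assumes the id of a feature is taken from the first qualifier present, in the order `unique_protein_id`, `protein_id`, `locus_tag`.

```python
from dataclasses import dataclass, field

# Assumed priority order of id qualifiers (illustrative, not taken from DeepBGC).
ID_QUALIFIERS = ("unique_protein_id", "protein_id", "locus_tag")


@dataclass
class Feature:
    """Hypothetical stand-in for a CDS feature with a location and qualifiers."""
    start: int
    end: int
    qualifiers: dict = field(default_factory=dict)


def resolve_id(feature):
    """Return the single id of a feature: the first id qualifier that is set."""
    for key in ID_QUALIFIERS:
        value = feature.qualifiers.get(key)
        if value:
            return value
    return None


def index_by_all_ids(features):
    """Problematic variant: register each feature under every id qualifier."""
    by_id = {}
    for feature in features:
        for key in ID_QUALIFIERS:
            value = feature.qualifiers.get(key)
            if value:
                # A later feature sharing this value replaces the earlier one.
                by_id[value] = feature
    return by_id


def index_by_resolved_id(features):
    """Alternative: register each feature only under its resolved id."""
    by_id = {}
    for feature in features:
        protein_id = resolve_id(feature)
        if protein_id is not None:
            by_id[protein_id] = feature
    return by_id


# Two features share protein_id "P1"; the second one was deduplicated
# with unique_protein_id "P1_2".
first = Feature(100, 400, {"protein_id": "P1"})
second = Feature(900, 1200, {"protein_id": "P1", "unique_protein_id": "P1_2"})
features = [first, second]

buggy = index_by_all_ids(features)
fixed = index_by_resolved_id(features)

# With the problematic variant, looking up "P1" returns the second feature,
# so a domain hit for "P1" would get the wrong location.
print(buggy["P1"].start, fixed["P1"].start, fixed["P1_2"].start)
```

In this sketch, indexing each feature under exactly one id keeps the lookup for `P1` pointing at the first feature, while the deduplicated feature stays reachable through `P1_2`.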