Hi, authors.
I’m encountering an issue when trying to read a file using esl.SequenceFile with the digital=True parameter. Here is the code I’m using for testing:
The test.fa file contains the following sequence in FASTA format:
When I set digital=True, I get the following error:
Traceback (most recent call last):
  File "/home/gavin/bigslice-cj/debug/read_fa.py", line 7, in <module>
    sequences = esl.SequenceFile(in_fasta_path, digital=True)
  File "pyhmmer/easel.pyx", line 6289, in pyhmmer.easel.SequenceFile.__init__
  File "pyhmmer/easel.pyx", line 6283, in pyhmmer.easel.SequenceFile.__init__
ValueError: Could not determine alphabet of file: 'test.fa'
If I don't set digital, the script runs successfully and prints this output:
/home/gavin/miniconda3/envs/bigslice/bin/python /home/gavin/bigslice-cj/debug/read_fa.py
Name: bgc:465365|cds:8530054|hsp:8934241|18-46
TYYGNGVSCDDKKCTVDWGKAWSCGADR
Process finished with exit code 0
Here is the version information for the pyhmmer installation I used:
Name: pyhmmer
Version: 0.10.15
Summary: Cython bindings and Python interface to HMMER3.
Home-page: https://github.com/althonos/pyhmmer
Author: Martin Larralde
Author-email: [email protected]
License: MIT
Location: /home/gavin/miniconda3/envs/bigslice/lib/python3.8/site-packages
Requires: psutil
Required-by: bigslice
I understand that the digital=True parameter is intended to convert amino acid letters to numeric values in the range 0-19. I have carefully checked my input sequence to ensure there are no invalid amino acid letters; all characters in the sequence conform to the standard protein alphabet. Despite this, I am still encountering the ValueError: Could not determine alphabet of file error. This is quite puzzling, and I would appreciate any guidance or insight you could provide on this issue.
Thank you for your help!
This most likely happens because HMMER cannot determine the alphabet of your sequence file: the sequence is too short. Since digital=True requires an alphabet, the parser fails in digital mode but not in text mode.
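One observation that may help explain why guessing is hard for this particular input: every letter in the sequence printed earlier in this thread is also a valid IUPAC nucleotide code, so the letters alone do not rule out a nucleotide alphabet. The check below is a standard-library sketch, not Easel's actual guessing logic, and the thread does not confirm whether this ambiguity plays a role in the failure.

```python
# IUPAC nucleotide codes, including degenerate symbols such as N, R, Y, K.
IUPAC_NUCLEOTIDE = set("ACGTURYSWKMBDHVN")

# The sequence reported in the issue output.
sequence = "TYYGNGVSCDDKKCTVDWGKAWSCGADR"

# Letters that could only belong to a protein alphabet (E, F, I, L, P, Q, ...).
protein_only = set(sequence) - IUPAC_NUCLEOTIDE

print("length:", len(sequence))
print("protein-only letters:", sorted(protein_only))
```

An empty list here means the sequence contains no letter that is exclusive to proteins, which is consistent with the alphabet being undeterminable from so little data.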
If you know your sequences are always protein sequences, you can provide an alphabet yourself: