ValueError: Could Not Determine Alphabet of File When Using digital=True in esl.SequenceFile #80

BioGavin · 2024-10-26T08:59:24Z

Hi, authors.
I’m encountering an issue when reading a file using esl.SequenceFile with the digital=True parameter. Here is the code I’m using to test it:

import pyhmmer.easel as esl
in_fasta_path = "test.fa"
sequences = esl.SequenceFile(in_fasta_path, digital=True)
for sequence in sequences:
    print(f"Name: {sequence.name.decode('utf-8')}")
    print(sequence.sequence)

The test.fa file contains the following sequence in FASTA format:

>bgc:465365|cds:8530054|hsp:8934241|18-46
TYYGNGVSCDDKKCTVDWGKAWSCGADR

When I set digital=True, I get the following error:

Traceback (most recent call last):
  File "/home/gavin/bigslice-cj/debug/read_fa.py", line 7, in <module>
    sequences = esl.SequenceFile(in_fasta_path, digital=True)
  File "pyhmmer/easel.pyx", line 6289, in pyhmmer.easel.SequenceFile.__init__
  File "pyhmmer/easel.pyx", line 6283, in pyhmmer.easel.SequenceFile.__init__
ValueError: Could not determine alphabet of file: 'test.fa'

If I don't set digital=True, the script runs successfully. Here is the output:

/home/gavin/miniconda3/envs/bigslice/bin/python /home/gavin/bigslice-cj/debug/read_fa.py 
Name: bgc:465365|cds:8530054|hsp:8934241|18-46
TYYGNGVSCDDKKCTVDWGKAWSCGADR

Process finished with exit code 0

Here is the version information of pyhmmer I used:

Name: pyhmmer
Version: 0.10.15
Summary: Cython bindings and Python interface to HMMER3.
Home-page: https://github.com/althonos/pyhmmer
Author: Martin Larralde
Author-email: [email protected]
License: MIT
Location: /home/gavin/miniconda3/envs/bigslice/lib/python3.8/site-packages
Requires: psutil
Required-by: bigslice

I understand that the digital=True parameter is intended to convert amino acid letters to numeric values in the range 0-19. I have carefully checked my input sequence to ensure there are no invalid amino acid letters; all characters in the sequence conform to the standard protein alphabet. Despite this, I am still encountering the ValueError: Could not determine alphabet of file error. This is quite puzzling, and I would appreciate any guidance or insight you could provide on this issue.

Thank you for your help!


althonos · 2024-10-26T17:44:38Z

Hi @BioGavin

This is most likely because HMMER cannot determine the alphabet of your sequence file: the file is too short for the alphabet to be guessed. Since digital=True requires an alphabet, the parser fails in digital mode but not in text mode, where no alphabet is needed.
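To illustrate why a short file can be ambiguous, here is a minimal sketch in plain Python. It is not Easel's actual guessing heuristic, and whether this is exactly what trips the guesser here is an assumption. The helper name `only_nucleotide_codes` is hypothetical. It shows that every letter in the example sequence is also a valid IUPAC nucleotide code, so the letters alone do not rule out a nucleotide alphabet:

```python
# Illustration only: this is NOT Easel's alphabet-guessing logic.
# IUPAC nucleotide codes: the four bases, U, and the degenerate symbols.
IUPAC_NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")


def only_nucleotide_codes(sequence: str) -> bool:
    """Return True if every letter is also a valid IUPAC nucleotide code."""
    return set(sequence.upper()) <= IUPAC_NUCLEOTIDE_CODES


# The sequence from test.fa in this issue.
peptide = "TYYGNGVSCDDKKCTVDWGKAWSCGADR"

print(len(peptide))                    # 28
print(only_nucleotide_codes(peptide))  # True: no protein-only letter present
print(only_nucleotide_codes("MKLE"))   # False: L and E are protein-only
```

With only 28 residues and no protein-only letter such as E, F, I, L, P or Q, there is little evidence for a guesser to work with, which is why passing the alphabet explicitly is the safer option.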

If you know your sequences are always protein sequences you can provide an alphabet yourself:

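import pyhmmer.easel as esl
in_fasta_path = "test.fa"
# Pass the alphabet explicitly so the parser does not have to guess it.
alphabet = esl.Alphabet.amino()
sequences = esl.SequenceFile(in_fasta_path, digital=True, alphabet=alphabet)
for sequence in sequences:
    print(f"Name: {sequence.name.decode('utf-8')}")
    # In digital mode this prints the encoded residues, not the letters.
    print(sequence.sequence)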
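For readers wondering what digital mode produces once the alphabet is known, the following plain-Python sketch mimics the encoding without using pyhmmer. It assumes the canonical residue order of the Easel amino alphabet is `ACDEFGHIKLMNPQRSTVWY` (codes 0-19). The helpers `digitize` and `textize` are hypothetical names for this illustration and are not the library's implementation:

```python
# Assumed canonical residue order of the Easel amino alphabet (codes 0-19).
AMINO_CANONICAL = "ACDEFGHIKLMNPQRSTVWY"
CODE_OF = {letter: code for code, letter in enumerate(AMINO_CANONICAL)}


def digitize(sequence: str) -> list:
    """Map each residue letter to its integer code."""
    return [CODE_OF[residue] for residue in sequence.upper()]


def textize(codes: list) -> str:
    """Map integer codes back to residue letters."""
    return "".join(AMINO_CANONICAL[code] for code in codes)


# The sequence from test.fa in this issue.
peptide = "TYYGNGVSCDDKKCTVDWGKAWSCGADR"
codes = digitize(peptide)

print(codes[:4])                  # [16, 19, 19, 5] for T, Y, Y, G
print(textize(codes) == peptide)  # True: the encoding round-trips
```

The mapping depends entirely on which alphabet is chosen, which is why the parser cannot produce digital sequences until an alphabet has been either guessed or supplied.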

BioGavin · 2024-10-27T01:07:17Z

Thank you for your response. This solution worked perfectly, and the code now runs successfully.

BioGavin mentioned this issue Oct 27, 2024

ValueError: Could not determine alphabet of file medema-group/bigslice#82

Closed

althonos added the question Further information is requested label Oct 27, 2024

althonos closed this as completed Oct 27, 2024
