Load more than 1024 HMM files #48

Closed
chtsai0105 opened this issue Jul 24, 2023 · 2 comments
Labels
question Further information is requested

Comments

@chtsai0105

Hi, I have a question related to loading multiple HMM files, as discussed in #24.

I'm working on a tool that uses the pyhmmer hmmsearch module to find orthologs among several sequences against the BUSCO datasets. It worked great when searching against the fungi_odb10 dataset (758 markers). But when I tested it with the mammalia_odb10 (9,226 markers) and vertebrata_odb10 (3,354 markers) marker sets, a file-not-found error occurred even though the file does exist.

I ran a number of tests and found that the file-not-found error always occurred at the 1020th marker. I later realized it might be related to a system constraint: many systems limit a user to 1024 open files at the same time (according to the command ulimit -a).

Although this might not be directly related to your package, do you have any suggestions for opening more than 1024 files through the context manager? Or is it possible to keep the HMM information after closing the file?
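
To confirm the limit described above from within Python, the standard-library resource module (Unix only) reports the same per-process value that ulimit -n shows. This is a minimal sketch; the helper name open_file_limits is made up for illustration. A failure slightly below 1024 would be consistent with the process already holding a few descriptors of its own (standard input, output, and error, plus any other open files), though that explanation is an assumption rather than something verified in this thread.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    # RLIMIT_NOFILE is the per-process cap on simultaneously open descriptors.
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
```

The soft limit is the one that is actually enforced; it can usually be raised up to the hard limit, but avoiding thousands of simultaneously open files is the more portable fix.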

@althonos
Owner

Hi, you can indeed pre-load all your HMMs into memory first (into a list) so that you don't have to keep all the files open; hmmsearch can be given any iterable of HMM, not just an HMMFile. You could also change the implementation from #24 into an iterator that opens and closes each file in turn instead, e.g.

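import typing
from pyhmmer.plan7 import HMM, HMMFile

class HMMFiles(typing.Iterable[HMM]):
    def __init__(self, files):
        # Store only the paths; no file is opened yet.
        self.files = list(files)
    def __iter__(self):
        for file in self.files:
            # Only one file is open at a time; it is closed before the next is opened.
            with HMMFile(file) as hmm_file:
                yield from hmm_file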
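
The other option mentioned above, pre-loading every HMM into a list, could be sketched as follows. The function name preload_hmms and its opener parameter are hypothetical and only here for illustration; the opener defaults to pyhmmer.plan7.HMMFile, and the parameter exists so the loop can be tried out without pyhmmer installed.

```python
def preload_hmms(paths, opener=None):
    """Read every HMM from every path into one list, one open file at a time."""
    if opener is None:
        # Imported lazily so the sketch can be exercised with a stand-in opener.
        from pyhmmer.plan7 import HMMFile
        opener = HMMFile
    hmms = []
    for path in paths:
        # The file is closed as soon as its HMMs have been copied into the list.
        with opener(path) as hmm_file:
            hmms.extend(hmm_file)
    return hmms


# Hypothetical usage (marker_paths and sequences are assumed to exist):
#   hmms = preload_hmms(marker_paths)
#   for hits in pyhmmer.hmmsearch(hmms, sequences):
#       ...
```

Compared with the iterator class, the list keeps every HMM in memory at once, which costs more memory for large marker sets but lets the same list be reused across several searches without re-reading the files.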

@chtsai0105
Author

Thank you! I almost forgot that hmmsearch can also take an iterable of HMMs as input.

@althonos althonos added the question Further information is requested label Jul 25, 2023