Hi - I have a related question about loading multiple HMM files, as mentioned in #24.
I'm working on a tool that uses the pyhmmer hmmsearch module to find orthologs among several sequences against the BUSCO datasets. It worked great when searching against the fungi_odb10 dataset (758 markers), but when I tested with the mammalia_odb10 (9,226 markers) and vertebrata_odb10 (3,354 markers) marker sets, a file-not-found error occurred even though the file does exist.
I did a bunch of tests and found that the file-not-found error always occurred on the 1020th marker, and I later realized it might be related to a system constraint: many systems limit a user to at most 1024 open files at the same time (according to `ulimit -a`).
Although this might not be directly related to your package, do you have any suggestion for opening more than 1024 files through the context manager? Or is it possible to keep the HMM information after closing the file?
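For reference, the per-process descriptor limit described above can also be queried from inside Python with the standard-library `resource` module (Unix only), which reports the same value as `ulimit -n`. A minimal check:

```python
import resource

# Soft and hard limits on open file descriptors for this process;
# the soft limit is what `ulimit -n` reports and what an application
# hits when it keeps too many files open at once.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
```

On many Linux systems the soft limit defaults to 1024, which is consistent with the failure appearing around the 1020th marker (a few descriptors are already used by stdin/stdout/stderr and other open files).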
Hi, you can indeed pre-load all your HMMs into memory first (into a list) so that you don't have to keep all the files open; hmmsearch can be given any iterable of HMM objects, not just a HMMFile. You could also change the implementation from #24 into an iterator that opens and closes each file in turn instead, e.g.
```python
import typing

from pyhmmer.plan7 import HMM, HMMFile


class HMMFiles(typing.Iterable[HMM]):
    def __init__(self, files):
        self.files = list(files)

    def __iter__(self):
        # Open each file only while its HMMs are being consumed,
        # so at most one file descriptor is open at any time.
        for file in self.files:
            with HMMFile(file) as hmm_file:
                yield from hmm_file
```
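The first suggestion (pre-loading everything into a list) follows the same one-descriptor-at-a-time pattern, just eagerly instead of lazily. A generic sketch of that pattern, using plain `open()` on throwaway text files as a stand-in for `HMMFile` so the shape of the idea is clear without pyhmmer installed:

```python
import pathlib
import tempfile


def preload(paths, opener=open):
    """Read every record from every file into memory up front.

    Each file is closed before the next one is opened, so only a
    single descriptor is ever live, regardless of how many paths
    are given.
    """
    records = []
    for path in paths:
        with opener(path) as handle:
            records.extend(handle)
    return records


# Demo with small throwaway text files (one "record" per file).
tmp = pathlib.Path(tempfile.mkdtemp())
paths = [tmp / f"marker{i}.txt" for i in range(5)]
for i, path in enumerate(paths):
    path.write_text(f"marker {i}\n")

records = preload(paths)
print(len(records))  # one record per file
```

With pyhmmer, the same function called as `preload(paths, opener=HMMFile)` would return a plain list of `HMM` objects, which `hmmsearch` accepts directly since it takes any iterable of HMMs. The trade-off versus the lazy `HMMFiles` iterator above is memory: pre-loading holds every profile in RAM at once, which is fine for a few thousand BUSCO markers but may matter for very large profile databases.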