
Work with multiprocessing #21

Closed
jpjarnoux opened this issue Jun 20, 2022 · 6 comments
Labels
question Further information is requested

Comments

@jpjarnoux

Hi,
I would like to work with multiple CPUs, but I don't understand how to give more than one CPU to pyhmmer.
So I tried to use the multiprocessing package, but pyhmmer objects have a non-trivial `__cinit__` and cannot be pickled.
Example:
multiprocessing.pool.MaybeEncodingError: Error sending result: '<pyhmmer.plan7.TopHits object at 0x561959114ad0>'. Reason: 'TypeError('no default __reduce__ due to non-trivial __cinit__')'

Could you give me an example of using pyhmmer with more than one CPU, if that's possible?
Thanks

@althonos
Owner

Hi @jpjarnoux

pyhmmer releases the GIL where applicable, so you don't have to use processes to get parallelism; threads will work efficiently as well. Try using multiprocessing.pool.ThreadPool instead of multiprocessing.pool.Pool: this should already give you decent performance (or use pyhmmer.hmmsearch, which does it for you). Otherwise, I'll try adding pickle support to TopHits when I have some time.
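A minimal sketch of the thread-pool pattern suggested above. This is not the pyhmmer API itself: `run_search` and the database names are placeholders standing in for a real call such as `pyhmmer.hmmsearch(...)`, so the sketch runs without pyhmmer installed. The point is only the structure: because the C code releases the GIL during a search, threads in a `ThreadPool` genuinely overlap.

```python
from multiprocessing.pool import ThreadPool

def run_search(db_name):
    # Placeholder worker: in real code this would load an HMM database
    # and run a pyhmmer search, returning the hits. Here it just returns
    # a tag so the pattern is runnable as-is.
    return f"hits-from-{db_name}"

# Hypothetical database names, one search task per database.
databases = ["db1.hmm", "db2.hmm", "db3.hmm"]

with ThreadPool(processes=3) as pool:
    # map() keeps results in input order; with GIL-releasing workers the
    # three searches run concurrently on separate threads.
    results = pool.map(run_search, databases)

print(results)
```

With pickle-free threads there is no `__reduce__` error, since nothing crosses a process boundary.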

@althonos added the question (Further information is requested) label Jun 20, 2022
@jpjarnoux
Author

jpjarnoux commented Jun 20, 2022

Okay, thanks, that's what I was reading. However, with 16 CPUs available, it looks like they are not fully used. Maybe it's possible to tell that to the GIL?
I will try your advice tomorrow and keep you posted.
Thanks

@althonos
Owner

Then it really depends on what you are trying to achieve; I can't really guess without seeing your use case. Perhaps you don't have enough target sequences to make full use of all your CPUs.

In my benchmarks, I also noticed that HMMER has a hard time using more than the number of physical CPUs, because it uses too many SIMD registers to benefit from hyper-threading. It could be that you're on a machine with 8 physical / 16 logical cores; in that case, you'll see no improvement using 16 jobs instead of 8.
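One way to check this: `os.cpu_count()` reports *logical* cores, hyper-threads included, so an 8-physical/16-logical machine reports 16. If the workload only scales to physical cores, halving the logical count is a reasonable heuristic on typical 2-way SMT machines (an assumption; some CPUs have no SMT or 4-way SMT).

```python
import os

# Logical core count (includes hyper-threads); fall back to 1 if unknown.
logical = os.cpu_count() or 1

# Heuristic: assume 2 hyper-threads per physical core, so cap the number
# of jobs at half the logical count, but never below 1.
jobs = max(1, logical // 2)

print(logical, jobs)
```

On an 8-physical/16-logical machine this suggests 8 jobs rather than 16.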

@jpjarnoux
Author

Sorry, I should have explained more clearly what I'm doing. I'm trying to annotate proteins with 4,000 HMMs, with one file per HMM. Before, I built a single DB containing all my HMMs. Now, to be more efficient, I'm trying to split them across multiple DBs and concatenate the results.
I'll keep you posted, thank you.
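The splitting idea above can be sketched as follows: divide the list of per-HMM files into batches, search each batch independently, and concatenate the per-batch results afterwards. The file names and batch size here are illustrative, not taken from the author's setup.

```python
def chunked(items, size):
    # Yield successive slices of `items` of length `size` (last may be shorter).
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical per-HMM file names, one per profile.
hmm_files = [f"hmm_{i:04d}.hmm" for i in range(4000)]

# 4000 files in batches of 500 gives 8 independent search jobs.
batches = list(chunked(hmm_files, 500))

print(len(batches), len(batches[0]))
```

Each batch can then be handed to one worker thread, and the per-batch hit lists concatenated at the end.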

@jpjarnoux
Author

Hi,
I finally used concurrent.futures.ThreadPoolExecutor and everything works very efficiently.
Thanks for your help.
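For reference, the same pattern with `concurrent.futures.ThreadPoolExecutor`, the API the author settled on. As before, `annotate` and the database names are placeholders for the actual pyhmmer search, kept stdlib-only so the sketch runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

def annotate(db_name):
    # Placeholder for a pyhmmer search against one HMM database.
    return f"results-{db_name}"

# Hypothetical split databases from the previous step.
databases = [f"db{i}.hmm" for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map preserves input order and reuses worker threads;
    # GIL-releasing search code lets the four jobs run concurrently.
    results = list(executor.map(annotate, databases))

print(results)
```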

@althonos
Owner

Happy to hear this!
