
Work with multiprocessing #21

Closed
jpjarnoux opened this issue Jun 20, 2022 · 6 comments
Labels
question Further information is requested

Comments

@jpjarnoux

Hi,
I would like to work with multiple CPUs, but I don't understand how to give more than one CPU to pyhmmer.
So I tried to use the multiprocessing package, but pyhmmer objects have a non-trivial `__cinit__` and cannot be pickled.
Example:
multiprocessing.pool.MaybeEncodingError: Error sending result: '<pyhmmer.plan7.TopHits object at 0x561959114ad0>'. Reason: 'TypeError('no default __reduce__ due to non-trivial __cinit__')'

Could you give me an example of using pyhmmer with more than one CPU, if that's possible?
Thanks

@althonos
Owner

Hi @jpjarnoux

pyhmmer releases the GIL where applicable, so you don't have to use processes to get parallelism; threads will work efficiently as well. Try using multiprocessing.pool.ThreadPool instead of multiprocessing.pool.Pool: this should already give you decent performance (or use pyhmmer.hmmsearch, which does it for you). Otherwise, I'll try adding pickle support to TopHits when I have some time.
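A minimal sketch of the thread-pool pattern suggested above. This is not the pyhmmer API itself: `run_search` and the database names are placeholders standing in for a real call such as `pyhmmer.hmmsearch(...)`, so the sketch runs without pyhmmer installed. The point is only the structure: because the C code releases the GIL during a search, threads in a `ThreadPool` genuinely overlap.

```python
from multiprocessing.pool import ThreadPool

def run_search(db_name):
    # Placeholder worker: in real code this would load an HMM database
    # and run a pyhmmer search, returning the hits. Here it just returns
    # a tag so the pattern is runnable as-is.
    return f"hits-from-{db_name}"

# Hypothetical database names, one search task per database.
databases = ["db1.hmm", "db2.hmm", "db3.hmm"]

with ThreadPool(processes=3) as pool:
    # map() keeps results in input order; with GIL-releasing workers the
    # three searches run concurrently on separate threads.
    results = pool.map(run_search, databases)

print(results)
```

With pickle-free threads there is no `__reduce__` error, since nothing crosses a process boundary.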

@althonos added the question (Further information is requested) label Jun 20, 2022
@jpjarnoux
Author

jpjarnoux commented Jun 20, 2022

Okay, thanks, that's what I was reading. However, with 16 CPUs available, it looks like they are not fully used. Maybe it's possible to tell that to the GIL?
I will try your advice tomorrow and keep you posted.
Thanks

@althonos
Owner

Then it really depends on what you are trying to achieve; I can't really guess without seeing your use case. Perhaps you don't have enough target sequences to make full use of all your CPUs.

In my benchmarks, I also noticed that HMMER has a hard time using more than the number of physical CPUs, because it uses too many SIMD registers to benefit from hyper-threading. It could be that you're on a machine with 8 physical / 16 logical cores; in that case, you'll see no improvement using 16 jobs instead of 8.
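One way to check this: `os.cpu_count()` reports *logical* cores, hyper-threads included, so an 8-physical/16-logical machine reports 16. If the workload only scales to physical cores, halving the logical count is a reasonable heuristic on typical 2-way SMT machines (an assumption; some CPUs have no SMT or 4-way SMT).

```python
import os

# Logical core count (includes hyper-threads); fall back to 1 if unknown.
logical = os.cpu_count() or 1

# Heuristic: assume 2 hyper-threads per physical core, so cap the number
# of jobs at half the logical count, but never below 1.
jobs = max(1, logical // 2)

print(logical, jobs)
```

On an 8-physical/16-logical machine this suggests 8 jobs rather than 16.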

@jpjarnoux
Author

Sorry, I should have explained more clearly what I'm doing. I'm trying to annotate proteins with 4,000 HMMs, with one file per HMM. Before, I built a single DB containing all my HMMs. Now, to be more efficient, I'm trying to split them across multiple DBs and concatenate the results.
I'll keep you posted, thank you.
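The splitting idea above can be sketched as follows: divide the list of per-HMM files into batches, search each batch independently, and concatenate the per-batch results afterwards. The file names and batch size here are illustrative, not taken from the author's setup.

```python
def chunked(items, size):
    # Yield successive slices of `items` of length `size` (last may be shorter).
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical per-HMM file names, one per profile.
hmm_files = [f"hmm_{i:04d}.hmm" for i in range(4000)]

# 4000 files in batches of 500 gives 8 independent search jobs.
batches = list(chunked(hmm_files, 500))

print(len(batches), len(batches[0]))
```

Each batch can then be handed to one worker thread, and the per-batch hit lists concatenated at the end.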

@jpjarnoux
Author

Hi,
I finally used concurrent.futures.ThreadPoolExecutor and everything works very efficiently.
Thanks for your help.
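For reference, the same pattern with `concurrent.futures.ThreadPoolExecutor`, the API the author settled on. As before, `annotate` and the database names are placeholders for the actual pyhmmer search, kept stdlib-only so the sketch runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor

def annotate(db_name):
    # Placeholder for a pyhmmer search against one HMM database.
    return f"results-{db_name}"

# Hypothetical split databases from the previous step.
databases = [f"db{i}.hmm" for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map preserves input order and reuses worker threads;
    # GIL-releasing search code lets the four jobs run concurrently.
    results = list(executor.map(annotate, databases))

print(results)
```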

@althonos
Owner

Happy to hear this!
