Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize predict subcommand #117

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

koute
Copy link

@koute koute commented Mar 13, 2022

Predicting from the CLI is painfully slow since it runs only on a single thread, so here's a quick fix.

Caveat: this essentially preloads the whole CSV into memory. It is still a net improvement though since prediction on a single core becomes a problem far earlier (for a CSV which is less than 100MB it's already unusably slow) than consuming too much memory does.

@nitsky
Copy link
Contributor

nitsky commented Mar 13, 2022

@koute thanks for the PR! We decided to make prediction single threaded for exactly the reason you identified: to avoid buffering the whole CSV. I agree that it is a good idea to provide the option to parallelize predictions, but I think we should put it behind a command line argument, --parallel, that explains the tradeoff. Would you be willing to update this PR with that change?

@koute
Copy link
Author

koute commented Mar 13, 2022

Okay, now it doesn't read the whole CSV but will still run in parallel. (:

Speed-wise it's the same; the old version with preloading the CSV:

real	0m11.739s
user	10m29.132s
sys	0m14.350s

the new version without preloading the CSV:

real	0m11.425s
user	9m53.702s
sys	0m51.725s

Is this acceptable or do you still want me to add an argument to disable parallelization?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants