

Purpose of "This process waited for (...) seconds without receiving a command from main." #2737

Closed
thomasbbrunner opened this issue Jan 30, 2025 · 1 comment · Fixed by #2752


@thomasbbrunner
Contributor

thomasbbrunner commented Jan 30, 2025

Hello,

We've recently been using the MultiSyncDataCollector and encountered this error (raised in collectors.py):

RuntimeError: This process waited for 1000.0 seconds without receiving a command from main. Consider increasing the maximum idle count if this is expected via the environment variable MAX_IDLE_COUNT (current value is 1000).

In our case, this error does not indicate an actual failure: we run other tasks in between data collection calls, so the worker processes legitimately stay idle long enough to trigger it.

The error message itself points to the solution (setting MAX_IDLE_COUNT). However, we feel this is not a very ergonomic interface.
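To make the failure mode concrete, here is a minimal sketch of the kind of idle watchdog the error message describes. It assumes the worker polls its pipe once per second and counts consecutive empty polls, which is consistent with the "1000.0 seconds / current value is 1000" numbers above. It is not TorchRL's actual implementation: `wait_for_command` and `poll_interval` are hypothetical names, and only the MAX_IDLE_COUNT variable and the error wording come from the issue.

```python
import os
from multiprocessing.connection import Connection
from typing import Any


def wait_for_command(conn: Connection, poll_interval: float = 1.0) -> Any:
    """Block until main sends a command, or raise after too many idle polls.

    Hypothetical sketch: name and poll_interval are illustrative, not TorchRL's API.
    """
    # Assumption: the limit is read on each call; a library might read it once at import.
    max_idle_count = int(os.environ.get("MAX_IDLE_COUNT", 1000))
    idle_polls = 0
    # conn.poll(t) returns False if nothing arrived within t seconds.
    while not conn.poll(poll_interval):
        idle_polls += 1
        if idle_polls >= max_idle_count:
            raise RuntimeError(
                f"This process waited for {idle_polls * poll_interval} seconds without "
                "receiving a command from main. Consider increasing the maximum idle count "
                "if this is expected via the environment variable MAX_IDLE_COUNT "
                f"(current value is {max_idle_count})."
            )
    return conn.recv()
```

With the default limit and a one-second poll, any gap longer than about 1000 seconds between commands from main is treated as a hang, regardless of whether the main process is simply busy with other work.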

To the questions:

  • Out of curiosity, what is the motivation behind this error?
  • Could this be potentially replaced by a warning instead of an error?
  • Would you be open to changing this interface? For instance, passing the timeout value as an argument and optionally disabling it altogether.
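The interface suggested in the last two questions could look roughly like the sketch below: the timeout is an explicit argument, `None` disables the check entirely, and a "warn" mode reports a long idle period without killing the worker. Everything here (`wait_for_command`, `idle_timeout`, `on_timeout`, `poll_interval`) is a hypothetical illustration of the proposal, not an existing TorchRL signature.

```python
import time
import warnings
from multiprocessing.connection import Connection
from typing import Any, Optional


def wait_for_command(
    conn: Connection,
    idle_timeout: Optional[float] = None,  # hypothetical: None disables the watchdog
    on_timeout: str = "raise",  # hypothetical: "raise" or "warn"
    poll_interval: float = 1.0,
) -> Any:
    """Wait for a command from main, with an optional, configurable idle timeout."""
    if on_timeout not in ("raise", "warn"):
        raise ValueError(f"on_timeout must be 'raise' or 'warn', got {on_timeout!r}")
    if idle_timeout is None:
        # No watchdog: block until main sends something, however long that takes.
        return conn.recv()
    window_start = time.monotonic()
    while not conn.poll(poll_interval):
        waited = time.monotonic() - window_start
        if waited >= idle_timeout:
            msg = (
                f"No command received from main for {waited:.1f} seconds "
                f"(idle_timeout={idle_timeout})."
            )
            if on_timeout == "raise":
                raise RuntimeError(msg)
            warnings.warn(msg, RuntimeWarning)
            # Start a new window so the warning fires once per timeout period, not every poll.
            window_start = time.monotonic()
    return conn.recv()
```

Making `None` the "off" value keeps the common case (nondeterministic gaps between batches) free of configuration, while callers that want a safeguard opt into it explicitly.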
@vmoens
Contributor

vmoens commented Jan 30, 2025

Out of curiosity, what is the motivation behind this error?

In the early days of the lib we were very concerned about an env erroring and leaving a worker mysteriously hanging; this was intended as a safeguard (mainly for the tests, I must admit!).
Now that more and more of us are doing agentic stuff where the time it takes to get a batch out of a collector is nondeterministic, this makes less and less sense.
I'd be open to setting that time to infinity by default and to a short span for the tests: that way we can still quickly kill stalling processes, but users won't be impacted as much.
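A minimal sketch of the default proposed here, assuming the limit keeps coming from an environment variable: an unset variable means "no limit", and the test suite sets a short value. `resolve_idle_timeout` is a hypothetical helper, and treating the `MAX_IDLE_COUNT` value as a float limit is an assumption for illustration.

```python
import math
import os


def resolve_idle_timeout(env_var: str = "MAX_IDLE_COUNT", default: float = math.inf) -> float:
    """Return the idle limit from the environment, defaulting to 'wait forever'.

    float() accepts "inf", so users can also opt out explicitly with MAX_IDLE_COUNT=inf.
    """
    raw = os.environ.get(env_var)
    if raw is None or raw.strip() == "":
        return default
    value = float(raw)
    if math.isnan(value) or value <= 0:
        raise ValueError(f"{env_var} must be a positive number, got {raw!r}")
    return value


# Hypothetical test-suite setup (e.g. in a conftest.py), so stalled workers fail fast there
# while user code keeps the infinite default:
#     os.environ.setdefault("MAX_IDLE_COUNT", "60")
```

Since any finite wait compares as less than `math.inf`, a watchdog loop that checks `waited >= resolve_idle_timeout()` never fires unless a limit has been set.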

(in the meantime you can do MAX_IDLE_COUNT=A_SUPER_BIG_NUMBER python myscript.py)
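If the script is started from Python rather than from a shell (for example from a job launcher), the same workaround can be sketched with `subprocess`, setting the variable only for the child process. `run_with_max_idle_count` is a hypothetical helper, and `10**9` is an arbitrary stand-in for A_SUPER_BIG_NUMBER; an integer is used on the assumption that the value may be parsed with `int()`.

```python
import os
import subprocess
import sys
from typing import Any, Sequence


def run_with_max_idle_count(
    args: Sequence[str], max_idle_count: int, **run_kwargs: Any
) -> subprocess.CompletedProcess:
    """Run the current Python interpreter with MAX_IDLE_COUNT set for that child only."""
    # Copy the current environment so PATH, CUDA settings, etc. are preserved.
    env = {**os.environ, "MAX_IDLE_COUNT": str(max_idle_count)}
    return subprocess.run([sys.executable, *args], env=env, check=True, **run_kwargs)


# Usage, equivalent to `MAX_IDLE_COUNT=1000000000 python myscript.py`:
#     run_with_max_idle_count(["myscript.py"], 10**9)
```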

@vmoens vmoens linked a pull request Feb 3, 2025 that will close this issue
@vmoens vmoens closed this as completed Feb 3, 2025