Why? If the Airflow scheduler is restarted (e.g. due to the rollout of a new Kubernetes version), the timeout behaviour of a sensor should not be affected. Before this change, however, the timer started from zero again whenever the sensor was retried, which was unexpected.
Solution: when a sensor is retried, it uses the start date of its earliest try to decide whether it has timed out. To stay backwards-compatible, the new behaviour is only applied when it is explicitly enabled for that sensor.
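To make the idea concrete, here is a minimal sketch of a poke loop whose timeout window spans all tries. This is not Airflow's actual implementation; `poke` and `get_first_try_start_ts` are hypothetical stand-ins for the sensor's poke callable and for a lookup of the first try's start date (e.g. from the metadata database).

```python
import time


def run_sensor(poke, timeout_seconds, get_first_try_start_ts, poke_interval=60.0):
    """Simplified poke loop whose timeout window spans all tries.

    `poke` returns True once the awaited condition is met.
    `get_first_try_start_ts` is a hypothetical lookup returning the UNIX
    timestamp at which the *first* try of this task instance started,
    so the window survives retries and scheduler restarts.
    """
    window_start = get_first_try_start_ts()
    while not poke():
        if time.time() - window_start > timeout_seconds:
            # The total waiting time across all tries is exhausted, so fail
            # instead of letting a retry restart the clock from zero.
            raise TimeoutError("sensor timed out (measured from the first try)")
        time.sleep(poke_interval)
```

In these terms, the backwards-compatible switch mentioned above decides whether `window_start` is the start date of the earliest try (new behaviour) or of the current try (old behaviour).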
Note: the exponential backoff feature for poking still uses the start date of the current try. This keeps the code change small, and no issues are expected from it.
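For comparison, here is an illustrative sketch of an exponential poke backoff that is tied to the current try. It is not Airflow's exact formula; it only shows why the backoff naturally resets to short intervals when the sensor is retried, since the delay is derived from time spent in the current try.

```python
import random


def next_poke_interval(base_interval: float, seconds_in_current_try: float, cap: float = 3600.0) -> float:
    """Illustrative exponential backoff for the poke interval (not Airflow's exact formula).

    The delay grows with the time already spent in the *current* try, so a
    retried sensor starts again with short intervals.
    """
    # Roughly double the delay for every base interval already spent in this
    # try; cap the exponent to avoid float overflow on very long tries.
    doublings = min(int(seconds_in_current_try // max(base_interval, 1.0)), 20)
    delay = base_interval * (2 ** doublings)
    jitter = random.uniform(0.0, base_interval)  # spread out concurrent sensors
    return min(delay + jitter, cap)
```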
Related: apache#9232 (the linked issue is about execution_timeout for tasks in general, not only sensors).
Description
When execution_timeout is reached, a task fails but is still retried. Either the task should not be retried in this situation, or it should be possible to define a separate timeout for the "total" task execution that takes all retries into account.
Use case / motivation
In our case, the current behavior makes the execution_timeout feature useless: we have retries in place to guard against transient issues such as network connectivity, but at the same time we want to make sure that tasks don't run for too long. With the timeout resetting on every retry, execution_timeout combined with retries only makes tasks run even longer.
See also https://stackoverflow.com/questions/53830604/airflow-execution-timeout-resetting-every-retry: one more request for the same feature.
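To make the interaction concrete, here is a minimal example (import paths and parameter names as in Airflow 2.x; the DAG, task, and callable names are made up). With execution_timeout of 10 minutes and retries=3, each individual try is capped at 10 minutes, but because the timeout clock restarts on every retry the task can keep running for far longer overall.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def wait_for_upstream_data():
    """Placeholder for work whose runtime we want to bound."""


with DAG(
    dag_id="timeout_vs_retries_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
) as dag:
    bounded_task = PythonOperator(
        task_id="bounded_task",
        python_callable=wait_for_upstream_data,
        execution_timeout=timedelta(minutes=10),  # cap for a *single* try
        retries=3,                                # each retry restarts that cap
        retry_delay=timedelta(minutes=5),
    )
    # Worst case wall-clock time: 4 tries x 10 min + 3 x 5 min retry delay
    # = ~55 minutes, even though execution_timeout suggests a 10-minute bound.
```

A separate "total" timeout, as requested above, would bound the whole sequence of tries instead of each try individually.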