Don't retry failures because of execution timeouts #9232

OleksandrBerchenko · 2020-06-11T13:41:09Z

Description

When a task reaches execution_timeout, it fails but is still retried. Either it should not be retried in this situation, or there should be a way to define another timeout for the "total" task execution that takes all retries into account.

Use case / motivation

In our case, the current behavior makes the execution_timeout feature useless: we have retries in place to guard against random issues such as network connectivity problems. At the same time, we want to make sure that tasks don't run for too long, and execution_timeout combined with retries just makes them run even longer.
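To make the concern concrete, here is a rough sketch of the worst-case wall-clock time when every attempt runs into execution_timeout and is then retried. The helper name `worst_case_runtime` is hypothetical, and the calculation assumes a fixed retry delay (no exponential backoff) and ignores scheduling latency, so treat it as an estimate rather than a description of Airflow internals.

```python
from datetime import timedelta


def worst_case_runtime(execution_timeout, retries, retry_delay=timedelta(0)):
    """Upper bound on total time if every attempt hits execution_timeout.

    Hypothetical helper for illustration only. Assumes a constant
    retry_delay between attempts and no scheduling overhead.
    """
    attempts = retries + 1  # the first try plus each retry
    return attempts * execution_timeout + retries * retry_delay


# A task with a 1 hour timeout, 3 retries and a 5 minute retry delay
# can occupy up to 4 hours 15 minutes in total.
total = worst_case_runtime(
    execution_timeout=timedelta(hours=1),
    retries=3,
    retry_delay=timedelta(minutes=5),
)
print(total)
```

With retries enabled, the per-attempt timeout bounds each try but not the task as a whole, which is why a separate "total" timeout is requested above.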

See also https://stackoverflow.com/questions/53830604/airflow-execution-timeout-resetting-every-retry: one more request for the same feature.

boring-cyborg · 2020-06-11T13:41:10Z

Thanks for opening your first issue here! Be sure to follow the issue template!

Why? If the Airflow scheduler is restarted (e.g. due to the rollout of a new Kubernetes version), the timeout behaviour of a sensor should not be affected. Before this code change, however, the timer started from zero again whenever the sensor was retried. This was unexpected.

Solution: When a sensor is retried, it uses the start date of the earliest try to decide whether it has timed out. To stay backwards-compatible, the new behaviour is only active when explicitly enabled for that sensor.

Note: The exponential backoff feature for poking still uses the start date of the current try. This keeps the code change small. No issues are expected from that.

Related: apache#9232 (the linked issue concerns execution_timeout for tasks in general, not only sensors)
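The difference between the two behaviours described in this commit message can be sketched roughly as follows. This is not the code from the linked change: the function `sensor_timed_out` and the flag `use_first_try` are hypothetical names chosen for illustration, and the sketch only assumes that the start dates of all tries are available.

```python
from datetime import datetime, timedelta


def sensor_timed_out(try_start_dates, now, timeout, use_first_try=False):
    """Return True if the sensor should be considered timed out.

    Hypothetical sketch. try_start_dates holds the start date of every
    try so far, in chronological order. With use_first_try=False the
    timer restarts on each retry (old behaviour); with True the timer
    runs from the earliest try (opt-in behaviour).
    """
    if not try_start_dates:
        raise ValueError("at least one try start date is required")
    reference = min(try_start_dates) if use_first_try else try_start_dates[-1]
    return now - reference > timeout


starts = [datetime(2020, 10, 27, 8, 0), datetime(2020, 10, 27, 9, 30)]
now = datetime(2020, 10, 27, 10, 15)
timeout = timedelta(hours=2)

# Old behaviour: only 45 minutes have passed since the latest try started.
print(sensor_timed_out(starts, now, timeout))
# Opt-in behaviour: 2 hours 15 minutes have passed since the first try.
print(sensor_timed_out(starts, now, timeout, use_first_try=True))
```

Keeping the flag off by default mirrors the backwards-compatibility point above: existing sensors keep restarting their timer unless the new behaviour is explicitly requested.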

OleksandrBerchenko added the kind:feature Feature Requests label Jun 11, 2020

rwitzel mentioned this issue Oct 27, 2020

Support consistent timeout for retried sensor rwitzel/airflow#1

Open

13 tasks

rwitzel mentioned this issue Oct 27, 2020

Support consistent timeout for retried sensor #11887

Closed

jscheffl added the "area:Scheduler including HA (high availability) scheduler" and "area:core" labels Aug 21, 2023
