Xcomms with Taskflow API cause unwanted dependencies in UI #17686
Comments
I'm going to take a look at this one.
I made a bunch of mistakes trying to make this work until I looked at the non-TaskFlow example.

```python
from datetime import datetime, timedelta

from airflow.decorators import dag, task


def get_run_date():
    return '01-01-2021'


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}


@dag(default_args=default_args,
     schedule_interval=None,
     start_date=datetime(2021, 8, 1),
     dagrun_timeout=timedelta(hours=4),
     max_active_runs=1)
def xcomm_taskflow_dag():
    @task()
    def set_date():
        date = "1-2-21"
        return date  # the return value is pushed as an XCom

    @task()
    def xcomm_task2(**kwargs):
        # Pull the XCom inside the task instead of receiving it as an
        # argument; xcom_pull does not add any DAG dependency
        date = kwargs['ti'].xcom_pull(task_ids="set_date")
        print(f"xcomm_task2:{date}")

    @task()
    def xcomm_task3(**kwargs):
        date = kwargs['ti'].xcom_pull(task_ids="set_date")
        print(f"xcomm_task3:{date}")

    # Only the dependencies written here are created
    set_date() >> xcomm_task2() >> xcomm_task3()


xcomm_taskflow_dag = xcomm_taskflow_dag()
```
Another way that also works:

```python
from datetime import datetime, timedelta

from airflow.decorators import dag, task

# default_args: the same dict as in the previous example


@dag(default_args=default_args,
     schedule_interval=None,
     start_date=datetime(2021, 8, 1),
     dagrun_timeout=timedelta(hours=4),
     max_active_runs=1)
def xcomm_taskflow_dag():
    @task()
    def set_date():
        date = "1-2-21"
        return date

    @task()
    def xcomm_task2(date):
        # `date` arrives via the XComArg returned by set_date(); passing it
        # also sets the set_date -> xcomm_task2 dependency
        print(f"xcomm_task2:{date}")

    @task()
    def xcomm_task3(**kwargs):
        date = kwargs['ti'].xcom_pull(task_ids="set_date")
        print(f"xcomm_task3:{date}")

    xcomm_task2(set_date()) >> xcomm_task3()


xcomm_taskflow_dag = xcomm_taskflow_dag()
```

I think the documentation needs to be improved.
I'm still inclined to call this a bug @ephraimbuddy. Here's why. I believe that all of the methods to define this should have the same behavior.
should output the same DAG as
which should output the same DAG as
I'm going to keep digging into this and see what differences there are between the 3 methods.
I believe that no matter how we define the DAG, as long as the code is valid, it should deterministically create the same result.
I believe that the culprit for this bug is the
Sorry about the comment spam, I just keep figuring out more and more about this issue. Here's what I've come up with from the research that I've done today. By doing I believe that it would make more sense within this context to not automatically set this dependency, even if it makes more sense logically to do so. We know logically that in order to pull an XCOM from a task, that previous task which pushes the XCOM must have been run first. So why not set that dependency automatically? The main diagram of the DAG and use case in the original post for this ticket gives the answer to that. By implicitly setting the dependencies with If we change to explicitly setting the dependencies, that would require the user to logically understand the relationship between tasks and XCOMs but also falls more in line with Zen of Python guidelines about explicit vs. implicit. Thoughts? |
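A minimal sketch of the point above, using a toy model rather than Airflow's internals: the DAG's edges are treated as the union of the edges written with `>>` and the edges implied by passing one task's return value into another. The helper `dag_edges` is hypothetical. It only shows where the extra `set_date -> xcomm_task3` edge comes from when both downstream tasks consume `set_date`'s output.

```python
# Toy model, not Airflow code: a DAG's edges are the union of the edges
# declared with ">>" and the edges implied by data flow between tasks.


def dag_edges(explicit, data_flow):
    """Return the sorted union of explicit and data-flow edges.

    explicit:  (upstream, downstream) pairs written with ">>"
    data_flow: (producer, consumer) pairs where the consumer receives the
               producer's return value (an XComArg in TaskFlow)
    """
    return sorted(set(explicit) | set(data_flow))


# The chain the user intended: set_date >> xcomm_task2 >> xcomm_task3
explicit = [("set_date", "xcomm_task2"), ("xcomm_task2", "xcomm_task3")]

# Both downstream tasks consume set_date's return value
data_flow = [("set_date", "xcomm_task2"), ("set_date", "xcomm_task3")]

edges = dag_edges(explicit, data_flow)
extra = sorted(set(edges) - set(explicit))
print(extra)  # [('set_date', 'xcomm_task3')], the edge nobody wrote with ">>"
```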
I have thought about this, and I think it's OK as it is; we just don't currently know how to use it. Once we understand that calling a task-decorated function in Airflow produces an XComArg, the relationship it sets up is no longer really implicit. In the PythonOperator code, XComs were passed explicitly. In the TaskFlow example, we didn't know we could pass XComs explicitly as we did with the PythonOperator, so we created extra tasks while setting up the dependencies. Every call to a task-decorated function gives us an XComArg. This needs documentation, especially around passing XComs, because users think they can call the task-decorated function many times and it remains the same task. Where possible, we should use
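To make the "every call creates a new task" behaviour concrete, here is a toy model in plain Python. The `ToyDag` and `OutputRef` classes are hypothetical and are not Airflow's implementation. Each call to a decorated function registers a new task and returns a reference. Passing that reference to another call records an upstream edge. The `__1` suffix imitates how Airflow de-duplicates repeated task IDs in the versions we have looked at, but treat the exact naming as an assumption.

```python
# Toy model, not Airflow code: each call to a decorated function registers
# a NEW task and returns a reference to that task's output. Function bodies
# are never executed; only the DAG structure is modeled.


class OutputRef:
    """Stand-in for an XComArg: remembers which task produced the value."""

    def __init__(self, task_id):
        self.task_id = task_id


class ToyDag:
    def __init__(self):
        self.tasks = []     # task ids, in creation order
        self.edges = set()  # (upstream, downstream) pairs

    def task(self, func):
        def wrapper(*args):
            base = func.__name__
            # Count earlier copies to build set_date, set_date__1, ...
            copies = sum(1 for t in self.tasks
                         if t == base or t.startswith(base + "__"))
            task_id = base if copies == 0 else f"{base}__{copies}"
            self.tasks.append(task_id)
            # Passing an OutputRef as an argument implies an upstream edge
            for arg in args:
                if isinstance(arg, OutputRef):
                    self.edges.add((arg.task_id, task_id))
            return OutputRef(task_id)
        return wrapper


def build(reuse_output):
    dag = ToyDag()

    @dag.task
    def set_date():
        return "1-2-21"

    @dag.task
    def xcomm_task2(date):
        print(date)

    @dag.task
    def xcomm_task3(date):
        print(date)

    if reuse_output:
        date = set_date()        # one producer, reused by both consumers
        xcomm_task2(date)
        xcomm_task3(date)
    else:
        xcomm_task2(set_date())  # each set_date() call is a new task
        xcomm_task3(set_date())
    return dag


print(build(reuse_output=False).tasks)
# ['set_date', 'xcomm_task2', 'set_date__1', 'xcomm_task3']
print(build(reuse_output=True).tasks)
# ['set_date', 'xcomm_task2', 'xcomm_task3']
```

Reusing the single reference keeps one `set_date` task. It still draws an edge to each consumer, which is the data dependency discussed in the comments above.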
We need to document the difference between passing
Can someone assign me to this? TY!
Quite agree - there are a number of things in TaskFlow that are "implicit" or not really documented (I just fixed one of those seemingly obvious things about context, #18868, which was not obvious at all to new users, or even to seasoned committers :D). So more docs on how TaskFlow works are really needed.
In progress right now! Actively writing as we speak @potiuk
@alex-astronomer, are you still working on this issue?
Yes, I am. I should have more time this coming week to get a PR in. Thanks for your patience.
Hey @alex-astronomer, is this still happening in the latest Airflow version?
Hey there! @alex-astronomer, any updates on this? The XCom dependency edges in more complex DAGs are really annoying and make the DAGs (mainly dynamic DAGs) really hard to understand. It makes any developer question whether it's better to use TaskFlow (and the returned outputs/wrapped XComs) for more readable DAG Python code, or to fall back on Jinja templating to pass values so that the graph is not rendered with so many edges.
Is there any progress on this? I've been using TaskFlow and Dynamically Mapped Tasks as they seem to be the current best practice, but the dependencies are much harder to get working than with "old school" task declaration. It's been unfortunate to bump into this issue, as almost everything else about TaskFlow is easy to use and understand.
Apache Airflow version: 2.1.1
OS: Debian GNU/Linux 10
Apache Airflow Provider versions:
Deployment: Astronomer
What happened:
When using the TaskFlow API to implement XComs, an unwanted dependency can be created in the UI.
[Screenshot: XCom DAG using TaskFlow API]
Because both downstream tasks use the variable created by the first task, there is a direct connection from each of those tasks to the first task.
This also duplicates tasks in the Tree View (xcomm_task3).

This is a simple example, but one can imagine it could become quite unwieldy with complex DAGs
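Why the Tree View repeats `xcomm_task3` can be sketched with a toy renderer. The `render_tree` function is hypothetical, and it assumes the view expands downstream from `set_date`, which matches the duplication reported here. A tree lists a task once for every path that reaches it, so the extra `set_date -> xcomm_task3` edge adds a second path and therefore a second row.

```python
# Toy tree rendering, not Airflow's Tree View code: expand the DAG
# downstream from a root task, listing a task once per path that reaches it.


def render_tree(edges, root):
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)

    lines = []

    def walk(task_id, depth):
        lines.append("  " * depth + task_id)
        for child in sorted(children.get(task_id, [])):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


explicit_only = [("set_date", "xcomm_task2"), ("xcomm_task2", "xcomm_task3")]
with_data_edge = explicit_only + [("set_date", "xcomm_task3")]

print("\n".join(render_tree(explicit_only, "set_date")))
print("\n".join(render_tree(with_data_edge, "set_date")))
```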
What you expected to happen:
If we create an XCom without using the TaskFlow API, no unwanted dependency is created, only the one explicitly stated with the bitshift operators.
[Screenshot: XCom DAG without using TaskFlow API]
There are no duplicated tasks in the Tree View.
How to reproduce it:
Upload the included DAGs and view them in the UI.
Anything else we need to know: I understand why it could be designed this way, since there is not only a downstream dependency but also a direct dependency between the first task and the downstream task. The old approach did not create this dependency; the new TaskFlow API does. This may be a feature request along the lines of a config setting to disable these downstream dependencies, or to not create dependencies based on the TaskFlow API, or at least to not show them in the UI. It would be nice to have clearer code like in the TaskFlow API example and also a clearer UI like in the non-TaskFlow API example.
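One possible display-only approach to the last request is to hide any edge that a longer path already implies, which is the transitive reduction of the DAG. This is a sketch under that assumption and not an existing Airflow option. The hypothetical `transitive_reduction` leaves the real dependencies untouched and only computes which edges a UI could skip drawing.

```python
# Display-only sketch: drop edges implied by a longer path (transitive
# reduction). Assumes the input is acyclic, as an Airflow DAG must be.


def transitive_reduction(edges):
    edges = set(edges)
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, set()).add(downstream)

    def has_longer_path(start, target):
        # Search from start's other children; reaching target means the
        # direct edge start -> target is redundant for drawing purposes.
        stack = [c for c in children.get(start, ()) if c != target]
        seen = set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
        return False

    return sorted(e for e in edges if not has_longer_path(*e))


edges = [
    ("set_date", "xcomm_task2"),
    ("xcomm_task2", "xcomm_task3"),
    ("set_date", "xcomm_task3"),  # implied by the path through xcomm_task2
]
print(transitive_reduction(edges))
# [('set_date', 'xcomm_task2'), ('xcomm_task2', 'xcomm_task3')]
```

Only the drawing would change. Trigger rules are evaluated against a task's direct upstream tasks, so removing the edge from the actual DAG could change behaviour for rules other than the default.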
Are you willing to submit a PR?