---
title: Integrate Apache Airflow with Timescale Cloud
excerpt: Apache Airflow is a platform to programmatically author, schedule, and monitor workflows. Integrate Apache Airflow with Timescale Cloud and create a data pipeline.
products:
keywords:
---
import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx";
Apache Airflow® is a platform created by the community to programmatically author, schedule, and monitor workflows.

A DAG (Directed Acyclic Graph) is the core concept of Airflow. A DAG collects tasks together and organizes them with dependencies and relationships that specify how they should run. You declare a DAG in a Python file in the `$AIRFLOW_HOME/dags` folder of your Airflow instance.

This page shows you how to use a Python connector in a DAG to integrate Apache Airflow with a $SERVICE_LONG.
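For example, a DAG file can be as small as the following sketch. The DAG ID `example_dag`, the task, and the daily schedule are illustrative placeholders, not part of this integration:

```python
# A minimal, illustrative DAG: one Python task that runs on a daily schedule.
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def say_hello():
    print("Hello from Airflow")

# Saving this module in $AIRFLOW_HOME/dags makes the DAG visible to Airflow.
dag = DAG('example_dag', start_date=datetime(2023, 1, 1), schedule_interval='@daily')

hello_task = PythonOperator(task_id='say_hello', python_callable=say_hello, dag=dag)
```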
<IntegrationPrereqs />

- Install Python3 and pip3.
- Install Apache Airflow. Ensure that your Airflow instance has network access to $CLOUD_LONG.

This example DAG uses the `crypto_assets` table you create in [Optimize time-series data in hypertables](https://docs.timescale.com/getting-started/latest/try-key-features-timescale-products/#optimize-time-series-data-in-hypertables).
To install the Python libraries required to connect to $CLOUD_LONG:

1. Enable PostgreSQL connections between Airflow and $CLOUD_LONG:

   ```bash
   pip install psycopg2-binary
   ```

1. Enable PostgreSQL connection types in the Airflow UI:

   ```bash
   pip install apache-airflow-providers-postgres
   ```
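Optionally, before you configure Airflow, you can confirm that `psycopg2` can reach your $SERVICE_LONG with a short check like the following sketch. The connection string is a placeholder for your own connection info:

```python
# Minimal connectivity check, assuming psycopg2-binary is installed.
# Replace the connection string with the connection info for your service.
import psycopg2

conn = psycopg2.connect("postgres://tsdbadmin:<password>@<host>:<port>/tsdb")
with conn.cursor() as cursor:
    cursor.execute("SELECT 1")
    print(cursor.fetchone())  # (1,) confirms the round trip works
conn.close()
```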
In your Airflow instance, securely connect to your $SERVICE_LONG:

1. **Run Airflow**

   On your development machine, run the following command:

   ```bash
   airflow standalone
   ```

   The username and password for the Airflow UI are displayed in the `standalone | Login with username` line in the output.
1. **Add a connection from Airflow to your $SERVICE_LONG**

   1. In your browser, navigate to `localhost:8080`, then select `Admin` > `Connections`.
   1. Click `+` (Add a new record), then use your connection info to fill in the form. The `Connection Type` is `Postgres`.
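To check the new connection from code rather than the UI, you can run a short script like the following sketch. The connection ID `timescale_conn` is an illustrative placeholder for the ID you entered in the form:

```python
# Open the Airflow connection with PostgresHook and run a trivial query.
# 'timescale_conn' is a placeholder for the connection ID you created.
from airflow.hooks.postgres_hook import PostgresHook

hook = PostgresHook(postgres_conn_id='timescale_conn')
conn = hook.get_conn()
with conn.cursor() as cursor:
    cursor.execute("SELECT version()")
    print(cursor.fetchone())  # prints the PostgreSQL version string
conn.close()
```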
To exchange data between Airflow and your $SERVICE_LONG:

1. **Create and execute a DAG**

   To insert data into your $SERVICE_LONG from Airflow:

   1. In `$AIRFLOW_HOME/dags/timescale_dag.py`, add the following code:

      ```python
      from airflow import DAG
      from airflow.operators.python_operator import PythonOperator
      from airflow.hooks.postgres_hook import PostgresHook
      from datetime import datetime

      def insert_data_to_timescale():
          # Open the connection you created in the Airflow UI.
          hook = PostgresHook(postgres_conn_id='the ID of the connection you created')
          conn = hook.get_conn()
          cursor = conn.cursor()
          # This could be any query. This example inserts data into the table
          # you create in:
          # https://docs.timescale.com/getting-started/latest/try-key-features-timescale-products/#optimize-time-series-data-in-hypertables
          cursor.execute("INSERT INTO crypto_assets (symbol, name) VALUES (%s, %s)",
                         ('NEW/Asset', 'New Asset Name'))
          conn.commit()
          cursor.close()
          conn.close()

      default_args = {
          'owner': 'airflow',
          'start_date': datetime(2023, 1, 1),
          'retries': 1,
      }

      # Schedule the insert task to run once a day.
      dag = DAG('timescale_dag', default_args=default_args, schedule_interval='@daily')

      insert_task = PythonOperator(
          task_id='insert_data',
          python_callable=insert_data_to_timescale,
          dag=dag,
      )
      ```

      This DAG uses the `crypto_assets` table created in Create regular PostgreSQL tables for relational data. An alternative sketch that uses the Postgres provider's SQL operator appears at the end of this page.

   1. In your browser, refresh the Airflow UI.
   1. In `Search DAGS`, type `timescale_dag` and press ENTER.
   1. Press the play icon to trigger the DAG.
1. **Verify that the data appears in $CLOUD_LONG**

   1. In Timescale Console, navigate to your service and click `SQL editor`.
   1. Run a query to view your data. For example:

      ```sql
      SELECT symbol, name FROM crypto_assets;
      ```

      You see the new rows inserted in the table.
You have successfully integrated Apache Airflow with $CLOUD_LONG and created a data pipeline.
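As a closing note, the `apache-airflow-providers-postgres` package you installed also ships a `PostgresOperator` that runs SQL directly, so the same insert can be written without a Python callable. The following is a sketch of that approach; the DAG ID `timescale_sql_dag` and connection ID `timescale_conn` are illustrative placeholders:

```python
# Alternative sketch: run the same INSERT with PostgresOperator from
# apache-airflow-providers-postgres instead of a PythonOperator.
# 'timescale_sql_dag' and 'timescale_conn' are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

dag = DAG('timescale_sql_dag', start_date=datetime(2023, 1, 1), schedule_interval='@daily')

insert_task = PostgresOperator(
    task_id='insert_data_sql',
    postgres_conn_id='timescale_conn',
    sql="INSERT INTO crypto_assets (symbol, name) VALUES ('NEW/Asset', 'New Asset Name')",
    dag=dag,
)
```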