It is year two for our jaffle shop, and the shop owner has started collecting advanced attributes about the orders.
We are tasked with understanding what kind of jaffles we make the most money from.
So we decided to run a clustering algorithm to separate the orders into three clusters and then calculate the total revenue for each cluster. Once the clustering is done, we want to send a notification to a Slack channel letting our team know that the results are ready for viewing.
The fal + dbt-fal combination is the perfect tool for the task at hand:

- The `dbt-fal` adapter lets us iterate on and ship our clustering algorithm right within our dbt project (a sketch of such a model follows below). This Python model works with any data warehouse, including ones without dbt Python support, such as Postgres and Redshift. For BigQuery, it makes it easier to run Python models without having to stand up a Dataproc cluster.
- The `fal flow` CLI command lets us send a Slack notification via a Python post-hook. We could also execute any arbitrary Python here, for example to push data to external services.

With this combo, you can add more capabilities to your stack without ever leaving your dbt project.
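As a preview of the first point, here is a minimal sketch of what the clustering Python model could look like. The model and column names (`orders`, `order_total`, `item_count`) and the choice of scikit-learn's KMeans are illustrative assumptions rather than the actual jaffle shop code, and the `model(dbt, fal)` entry point is assumed here; check the dbt-fal docs for the exact signature.

```python
import pandas as pd
from sklearn.cluster import KMeans


def model(dbt, fal):
    # Hypothetical upstream model and feature columns; adjust to your project.
    orders_df: pd.DataFrame = dbt.ref("orders")
    features = orders_df[["order_total", "item_count"]]

    # Separate the orders into 3 clusters, as described above.
    kmeans = KMeans(n_clusters=3, random_state=42)
    orders_df["cluster"] = kmeans.fit_predict(features)

    # Revenue per cluster can then be aggregated downstream
    # (e.g. orders_df.groupby("cluster")["order_total"].sum()).
    return orders_df
```

The returned dataframe is written back to the warehouse like any other model, so a downstream SQL model can sum up revenue per cluster.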
- Install `fal` and `dbt-fal`:

```bash
$ pip install fal dbt-fal[postgres]
# Add your favorite adapter here
```
- Specify the `fal` adapter in your `profiles.yml`:

```yaml
jaffle_shop:
  target: fal_dev
  outputs:
    pg_dev:
      type: postgres
      host: localhost
      port: 5432
      user: pguser
      password: pass
      dbname: test
      schema: public
      threads: 4
    fal_dev:
      type: fal
      db_profile: pg_dev # This points to the adapter to use for SQL-related tasks
```

With this profiles configuration, fal will run all the Python models and leave the SQL models to the `db_profile` adapter.
- Install the data science libraries needed to run the clustering script. We now use `fal_project.yml` to create isolated environments for Python models; a sample configuration is sketched below.
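A minimal `fal_project.yml` for this project might look like the following. The environment name and the requirement list are assumptions for this example; see the fal docs for the full set of supported options.

```yaml
# fal_project.yml (illustrative)
environments:
  - name: clustering
    type: venv
    requirements:
      - pandas
      - scikit-learn
```

A Python model can then be pointed at this environment from its model configuration (the fal docs describe a `fal_environment` setting for this), keeping its dependencies isolated from the rest of the project.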
- Seed the test data:

```bash
$ dbt seed
```
- Run your dbt models:

```bash
$ dbt run
## Runs the SQL models on the data warehouse and Python models locally with fal
```
- Run `fal flow run` to execute the full graph, including fal Python scripts, in this case the `fal_scripts/notify.py` script (sketched below). You can use the dbt graph selectors and much more. With `fal flow run`, you will not have to run `dbt run` separately, since fal handles the full execution.

```bash
$ fal flow run
## Runs dbt run and the associated scripts; in this case a Slack notification is triggered
```
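For reference, a script like `fal_scripts/notify.py` could be as simple as the sketch below. The use of `slack_sdk`, the environment variable names, and the exact `context` fields are illustrative assumptions based on common fal examples rather than the actual script; such scripts are typically attached to a model in `schema.yml` under the model's `meta: fal: post-hook:` key.

```python
import os

from slack_sdk import WebClient

# Environment variable names are assumptions for this sketch.
SLACK_TOKEN = os.environ["SLACK_BOT_TOKEN"]
CHANNEL_ID = os.environ["SLACK_CHANNEL_ID"]

# `context` is injected by fal into hook scripts and describes the model
# that just ran.
message = (
    f"Model {context.current_model.name} finished with "
    f"status {context.current_model.status}. "
    "Cluster revenue results are ready for viewing!"
)

WebClient(token=SLACK_TOKEN).chat_postMessage(channel=CHANNEL_ID, text=message)
```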
If you would like to learn more about managing multiple environments for Python models and more, check out the docs!
Or say hi 👋 in the #tools-fal Slack channel.