The Open Source Repository of Flyte-based Projects
The purpose of this repository is to showcase Flyte's capabilities in end-to-end applications that do some form of data processing or machine learning.
The source code for each project can be found in the projects directory, where each project has its own set of dependencies.
Fork the repo on github, then clone it:
git clone https://github.com/<your-username>/flytelab
Note: Make sure you're using Python > 3.7.
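If you want to check the interpreter before going further, a minimal sketch follows. It reads "Python > 3.7" as "a minor version newer than 3.7", which is an assumption about the intended requirement; `is_supported` is a hypothetical helper, not part of the repo.

```python
import sys

# Assumption: "Python > 3.7" means the (major, minor) pair must be newer than (3, 7).
MIN_EXCLUSIVE = (3, 7)


def is_supported(version=None):
    """Return True if the given (major, minor, ...) version is newer than 3.7."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) > MIN_EXCLUSIVE


if __name__ == "__main__":
    if not is_supported():
        raise SystemExit("This Python is too old; use a version newer than 3.7")
    print("Python version OK")
```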
Create a new branch for your project:
git checkout -b my_project # replace this with your project name
Note: For MLOps Community Engineering Labs Hackathon participants: each team will have its own branch on the main flyteorg/flytelab repo. If you're part of a team of more than one person, assign one teammate to create a project directory and push it into your team's branch.
We use cookiecutter to manage project templates.
Install prerequisites:
pip install cookiecutter
In the root of the repo, create a new project:
cookiecutter templates/basic -o projects
Note: There are more templates in the templates directory depending on the requirements of your project.
Answer the project setup questions:
project_name: my_project # replace this with your project name (can only contain alphanumeric characters and `_`)
project_author: foobar # replace this with your name
github_username: my_username # replace this with your github username
description: project description # optional
Note: For MLOps Community Engineering Labs Hackathon participants: project_author should be your team name.
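The naming rule for project_name (alphanumeric characters and `_` only) can be checked before running cookiecutter. The sketch below assumes "alphanumeric" means ASCII letters and digits; `is_valid_project_name` is a hypothetical helper and is not something the template provides.

```python
import re

# Assumption: "alphanumeric" is limited to ASCII letters and digits.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_project_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, and underscores."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my_project", "my-project", "my project"):
        print(candidate, is_valid_project_name(candidate))
```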
The project structure looks like the following:
.
├── Dockerfile
├── README.md
├── dashboard
│   ├── app.py  # streamlit app
│   ├── remote.config
│   └── sandbox.config
├── deploy.py  # deployment script
├── my_project
│   ├── __init__.py
│   └── workflows.py  # flyte workflows
├── requirements-dev.txt
└── requirements.txt
Go into the project directory, then create your project's virtual environment:
cd projects/my_project
# create and activate virtual environment, name the venv whatever you want
python -m venv ~/venvs/my_project
source ~/venvs/my_project/bin/activate
# install requirements
pip install -r requirements.txt -r requirements-dev.txt
Run Flyte workflows locally:
python my_project/workflows.py
You should see something like this in the output (you can ignore the warnings):
trained model: LogisticRegression()
Congrats! You just set up your flytelab project.
You can now modify and iterate on the workflows.py file to create your very own Flyte workflows using flytekit. You can refer to the User Guide, Tutorials, and Flytekit API Reference to learn more about all of Flyte's capabilities.
So far you've probably been running your workflows locally by invoking python my_project/workflows.py.
The first step to deploying your workflows to a Flyte cluster is to test them out on a local sandbox cluster.
Make sure you have docker installed.
Then install flytectl:
OSX
brew install flyteorg/homebrew-tap/flytectl
Other Operating Systems
curl -sL https://ctl.flyte.org/install | sudo bash -s -- -b /usr/local/bin # you can change /usr/local/bin to any other file system path
export PATH=$(pwd)/bin:$PATH # only required if you installed to a path other than /usr/local/bin
Start the sandbox cluster from your projects/my_project directory:
flytectl sandbox start --source .
Interacting with Flyte sandbox
Get the status of sandbox:
flytectl sandbox status
Tear down the sandbox:
flytectl sandbox teardown
Note: If you're having trouble getting the Flyte sandbox to start, see the troubleshooting guide.
You should now be able to go to http://localhost:30081/console in your browser to see the Flyte UI.
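If you'd rather confirm from a script that the console is reachable, a minimal sketch using only the standard library follows. The URL is the one given above; `console_is_up` is a hypothetical helper, and treating any 2xx or 3xx response as "up" is an assumption about what a healthy console returns.

```python
import http.client
import urllib.request

CONSOLE_URL = "http://localhost:30081/console"  # sandbox console URL from this README


def console_is_up(url: str = CONSOLE_URL, timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to the console URL succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Assumption: any 2xx/3xx status means the console is serving.
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or an HTTP error status: treat as "not up".
        return False


if __name__ == "__main__":
    print("console reachable:", console_is_up())
```

If this reports that the console is not reachable, flytectl sandbox status is the next thing to check.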
Commit your changes with git commit, then deploy your project's workflows with:
python deploy.py
Expected output
You should see something like:
Successfully packaged 4 flyte objects into /Users/nielsbantilan/git/flytelab/projects/my_project/flyte-package.tgz
Registering Flyte workflows
---------------------------------------------------------------- --------- ------------------------------
| NAME (4) | STATUS | ADDITIONAL INFO |
---------------------------------------------------------------- --------- ------------------------------
| /tmp/register724861421/0_my_project.workflows.get_dataset_1.pb | Success | Successfully registered file |
---------------------------------------------------------------- --------- ------------------------------
| /tmp/register724861421/1_my_project.workflows.train_model_1.pb | Success | Successfully registered file |
---------------------------------------------------------------- --------- ------------------------------
| /tmp/register724861421/2_my_project.workflows.main_2.pb | Success | Successfully registered file |
---------------------------------------------------------------- --------- ------------------------------
| /tmp/register724861421/3_my_project.workflows.main_3.pb | Success | Successfully registered file |
---------------------------------------------------------------- --------- ------------------------------
4 rows
What just happened?
The python deploy.py command just did the following:
- Built the docker image specified in your project's Dockerfile from within the sandbox docker container.
- flytekit serialized your tasks and workflows into a flyte-package.tgz file.
- flytectl registered those Flyte-compatible artifacts to the sandbox cluster.
On the Flyte UI, you'll see a flytelab-<project-name> project namespace on the homepage.
Navigate to the my_project.workflows.main workflow and hit the Launch Workflow button, then the Launch button on the model form.
Congrats! You just kicked off your first workflow on your local Flyte sandbox cluster.
By default, Flyte uses docker images to encapsulate all the system and python dependencies of your application. If you update those dependencies then you'll need to re-build the docker image. However, if you want to quickly deploy code changes in your tasks/workflows, you can go through fast registration:
python deploy.py --fast
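The choice between a full deploy and fast registration can be sketched as a small decision function. This is only an illustration: `deploy_command` and `DEPENDENCY_FILES` are hypothetical names, the set of files that count as "dependencies" is an assumption, and the sketch assumes the image has already been built once by a full deploy.

```python
from typing import Iterable, List

# Assumption: these files (relative to the project directory) define the image's dependencies.
DEPENDENCY_FILES = {"Dockerfile", "requirements.txt"}


def deploy_command(changed_files: Iterable[str]) -> List[str]:
    """Pick the deploy command based on which project files changed."""
    command = ["python", "deploy.py"]
    if any(name in DEPENDENCY_FILES for name in changed_files):
        # Dependencies changed: the docker image has to be rebuilt.
        return command
    # Only task/workflow code changed: fast registration is enough.
    return command + ["--fast"]


if __name__ == "__main__":
    print(" ".join(deploy_command(["my_project/workflows.py"])))
    print(" ".join(deploy_command(["requirements.txt"])))
```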
The Union.ai team maintains a playground Flyte cluster that you can use to run your workflows.
When you're ready to deploy your workflows to a full-fledged production Flyte cluster, first you'll need to request an account on the Flyte OSS Slack #flytelab channel.
Note: For MLOps Community Engineering Labs Hackathon participants: you will receive these credentials after all teams have been finalized.
You'll receive a username and password to sign into the Union.ai Playground, in addition to a client_id and client_secret if you want to use the FlyteRemote object to get the input and output data of your workflow executions from the playground.
Create a personal access token (PAT) on github. Make sure to give your PAT read and write access to packages.
Then authenticate to the ghcr.io registry:
export CONTAINER_REPO_TOKEN="<your-token>"
echo "$CONTAINER_REPO_TOKEN" | docker login ghcr.io -u <your-username> --password-stdin
Then, deploying to the playground is as simple as:
python deploy.py --remote
What just happened?
The python deploy.py --remote command just did the following:
- Built the docker image specified in your project's Dockerfile.
- Pushed the image to the github container registry under your username's package namespace.
- flytekit serialized your tasks and workflows into a flyte-package.tgz file.
- flytectl registered those Flyte-compatible artifacts to the playground cluster.
Go to https://github.com/<your-username>/flytelab/pkgs/container/flytelab, where you should see a package called flytelab. Then:
- Click Add Repository to link your fork of the flytelab repo.
- Scroll down to the Danger Zone, click Change visibility, and make the package public.
Finally, go to https://playground.hosted.unionai.cloud and authenticate with your union.ai playground username and password. From there you can navigate to your flytelab-<project-name> project to run your workflows.
Note: Fast registering is currently not enabled in the Union.ai playground.
The basic project template ships with a dashboard/app.py script that uses streamlit as a UI for interacting with your model.
pip install streamlit
streamlit run dashboard/app.py
To access the data on the Union.ai playground, first export your client_id and client_secret to your terminal session.
export FLYTE_CREDENTIALS_CLIENT_ID="<client_id>"
export FLYTE_CREDENTIALS_CLIENT_SECRET="<client_secret>"
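To fail early when one of these variables is missing, a script could read them as sketched below. This is not how dashboard/app.py is known to consume the credentials; `load_credentials` is a hypothetical helper that only uses the two variable names shown above.

```python
import os
from typing import Mapping, Tuple

ID_VAR = "FLYTE_CREDENTIALS_CLIENT_ID"
SECRET_VAR = "FLYTE_CREDENTIALS_CLIENT_SECRET"


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (client_id, client_secret), raising if either is unset or empty."""
    missing = [name for name in (ID_VAR, SECRET_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[ID_VAR], env[SECRET_VAR]


if __name__ == "__main__":
    client_id, _ = load_credentials()
    # Never print the secret itself; confirming the id is enough.
    print("credentials found for client id:", client_id)
```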
Then start serving your streamlit app with:
streamlit run dashboard/app.py -- --remote
If you want to use streamlit cloud to deploy your app to share with the world, push your changes to the remote github branch you're working from and point streamlit cloud to the streamlit app script:
flytelab/projects/my_project/dashboard/app.py
You'll need to use their Secrets management system on the streamlit cloud UI to add your client id and secret credentials so that it has access to the playground cluster:
FLYTE_BACKEND = "remote" # point the app to the playground backend
FLYTE_CREDENTIALS_CLIENT_ID = "<client_id>" # replace this with your client id
FLYTE_CREDENTIALS_CLIENT_SECRET = "<client_secret>" # replace this with your client secret
You can also add other secrets to the secrets file if needed.