Merge pull request fuzzylabs#6 from lmwilki/adding-bailo

Added governance section along with Bailo

archena authored Apr 1, 2022; 2 parents f51aa1b + bfa9642; commit b873fbb

Showing 1 changed file (README.md) with 35 additions and 29 deletions (64 changes).

@@ -4,16 +4,16 @@ This is the [Fuzzy Labs](https://fuzzylabs.ai) guide to the universe of free and

## Contents

- [What is MLOps, anyway?](#what-is-mlops-anyway)
- [What counts as open source?](#what-counts-as-open-source)
- [Data version control](#data-version-control)
- [Experiment tracking](#experiment-tracking)
- [Model training](#model-training)
- [Feature stores](#feature-stores)
- [Model deployment and serving](#model-deployment-and-serving)
- [Model monitoring](#model-monitoring)
- [Full stacks](#full-stacks)
- [Governance](#governance)
- [More resources](#more-resources)

# What is MLOps, anyway?

@@ -42,46 +42,46 @@ Just like code, data grows and evolves over time. Data versioning tools help you

You might wonder why you can't just store data in Git (or equivalent). There are a few reasons this doesn't work, but the main one is size: Git is designed for small text files, and typical datasets used in machine learning are just too big. Some tools, like DVC, store the data externally, but also integrate with Git so that data versions can be linked to code versions.
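
To make the "store the data externally, link it to Git" idea concrete, here is a minimal sketch of content-addressed data versioning in Python. It is not DVC's actual on-disk format: the `.ptr` pointer-file format and the functions `file_md5`, `cache_path`, `add_to_cache` and `checkout` are hypothetical. The large file is copied into a cache keyed by its MD5 hash, and only a tiny pointer file recording that hash would be committed to Git, so each Git commit pins one exact version of the data.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large datasets never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_path(cache_dir: Path, md5: str) -> Path:
    """Content-addressed location: the first two hex digits become a subdirectory."""
    return cache_dir / md5[:2] / md5[2:]


def add_to_cache(data_file: Path, cache_dir: Path) -> Path:
    """Store data_file in the cache and write a small pointer file next to it.

    The pointer file (hypothetical format) is what would be committed to Git;
    the data itself stays in the cache, outside the repository.
    """
    md5 = file_md5(data_file)
    target = cache_path(cache_dir, md5)
    if not target.exists():  # identical content is only stored once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"path": data_file.name, "md5": md5}))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the exact data version that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    restored = pointer.with_name(meta["path"])
    shutil.copy2(cache_path(cache_dir, meta["md5"]), restored)
    return restored
```

Because the cache is keyed by content, re-adding an unchanged file stores nothing new, while changing the data changes the hash inside the pointer file, which then shows up as an ordinary, tiny Git diff.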

| Name | License | Description |
| ---------------------------------------------------------- | ---------- | ------------------------------------------------------------------------------------------- |
| [DVC](https://dvc.org) | Apache 2.0 | One of the most popular general-purpose data versioning tools. |
| [Delta Lake](https://delta.io) | Apache 2.0 | Data versioning for data warehouses. |
| [LakeFS](https://lakefs.io) | Apache 2.0 | Transform your object storage into a Git-like repository. |
| [Git LFS](https://www.atlassian.com/git/tutorials/git-lfs) | MIT | Not specialised in machine learning use-cases, but another popular way to version datasets. |

# Experiment tracking

Machine learning involves a lot of experimentation. We end up training many models, most of which are never intended to go into production but represent progressive steps towards something production-worthy. Experiment tracking tools help us keep a record of each experiment. What exactly do we need to track? Typically this includes the code version, data version, input parameters and training performance metrics, as well as the final model assets.
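
As an illustration of what such a tool records, the sketch below appends one JSON record per training run to a local JSON Lines file, capturing exactly the items listed above. The functions `log_run` and `best_run`, the record fields and the log-file layout are hypothetical choices for this example; the tools below add things like a web UI for comparing runs on top of the same basic idea.

```python
from __future__ import annotations

import json
import time
import uuid
from pathlib import Path
from typing import Any


def log_run(
    log_path: Path,
    *,
    code_version: str,
    data_version: str,
    params: dict[str, Any],
    metrics: dict[str, float],
    model_path: str,
) -> dict[str, Any]:
    """Append one experiment record to a JSON Lines log and return it."""
    record = {
        "run_id": uuid.uuid4().hex,  # unique id for this training run
        "timestamp": time.time(),
        "code_version": code_version,  # e.g. a Git commit hash
        "data_version": data_version,  # e.g. a dataset content hash
        "params": params,
        "metrics": metrics,
        "model_path": model_path,  # where the trained model asset was saved
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_run(log_path: Path, metric: str) -> dict[str, Any]:
    """Return the logged run with the highest value of the given metric."""
    with log_path.open(encoding="utf-8") as f:
        runs = [json.loads(line) for line in f if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])
```

Because every record ties a metric to a specific code version, data version and parameter set, `best_run(log, "accuracy")` tells you not just which model did best but exactly how to reproduce it.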

| Name | License | Description |
| ----------------------------------------------------- | ---------- | ----------------------------------------------------------------------------------------- |
| [Sacred](https://github.com/IDSIA/sacred) | MIT | |
| [Tensorboard](https://www.tensorflow.org/tensorboard) | Apache 2.0 | |
| [Guild.AI](https://guild.ai) | Apache 2.0 | |
| [MLFlow](https://mlflow.org) | Apache 2.0 | |
| [Kedro](https://kedro.readthedocs.io/) | Apache 2.0 | A Python framework for creating reproducible, maintainable and modular data science code. |

# Model training

| Name | License | Description |
| ------------------------------------------ | ---------- | ------------------------------------------------------------------------------------------------- |
| [MLFlow](https://mlflow.org) | Apache 2.0 | |
| [Kubeflow](https://www.kubeflow.org) | Apache 2.0 | |
| [Metaflow](https://metaflow.org) | Apache 2.0 | |
| [ZenML](https://github.com/zenml-io/zenml) | Apache 2.0 | An extensible, open-source MLOps framework to create production-ready machine learning pipelines. |

# Feature stores

| Name | License | Description |
| ------------------------------------------------------- | ---------- | ----------------------------------------------- |
| [Feast](https://feast.dev) | Apache 2.0 | A complete open source feature store. |
| [Hopsworks](https://github.com/logicalclocks/hopsworks) | AGPL-3.0 | A feature store, feature engineering, and more. |

# Model deployment and serving

Model serving is the process of taking a trained model and exposing it behind a REST API so that other software components can interact with it. To make deploying these model servers as simple as possible, it's common practice to run them inside Docker containers and deploy them to a container orchestration system such as Kubernetes.
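
The core of a model server can be sketched with nothing but Python's standard library: a `POST /predict` endpoint that accepts JSON features and returns a prediction as JSON. This is an illustrative sketch, not how any of the tools below work internally. The `predict` function is a hypothetical stand-in for a real trained model, and the endpoint path, payload shape and port are assumptions.

```python
from __future__ import annotations

import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: list[float]) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear model."""
    weights = [0.5, -1.0, 2.0]
    return sum(w * x for w, x in zip(weights, features))


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = predict([float(x) for x in payload["features"]])
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing "features" key or non-numeric values.
            self.send_error(400, 'Expected JSON like {"features": [1.0, 2.0, 3.0]}')
            return
        body = json.dumps({"prediction": result}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for this example


def make_server(port: int = 8080) -> HTTPServer:
    """Bind the prediction handler to localhost on the given port."""
    return HTTPServer(("127.0.0.1", port), PredictionHandler)
```

To try it, call `make_server().serve_forever()` and send `curl -X POST http://127.0.0.1:8080/predict -d '{"features": [2, 1, 0.5]}'`; with the toy weights this returns `{"prediction": 1.0}`. A script like this is what would then be packaged into a Docker image for deployment.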

| Name | License | Description |
| ------------------------------------------------------ | ---------- | -------------------------------------------------------- |
| [Seldon Core](https://github.com/SeldonIO/seldon-core) | Apache 2.0 | Turn your models into microservices to run on Kubernetes |
| [BentoML](https://github.com/bentoml/BentoML) | Apache 2.0 | |
| [Bodywork](https://www.bodyworkml.com) | AGPL-3.0 | |

@@ -92,7 +92,7 @@ Model serving is the process of taking a trained model and presenting it behind

Monitoring means making sure that each deployed model is both functioning and producing sensible results. We don't just want to check for errors in the traditional sense, but also for things like drift (live data gradually moving away from the data the model was trained on) and signs of bias in the predictions and decisions that come from a model.
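
One common way to quantify drift in a single numeric feature is the two-sample Kolmogorov-Smirnov (KS) statistic: the largest vertical gap between the empirical distribution of the training data and that of recent live inputs. The sketch below computes it in plain Python. It is not the implementation used by any tool below, and the fixed `threshold` of 0.2 is an arbitrary illustrative value; libraries such as Alibi Detect offer KS-based detectors that use a p-value and significance level instead of a fixed cut-off.

```python
from __future__ import annotations

from bisect import bisect_right


def ks_statistic(reference: list[float], current: list[float]) -> float:
    """Two-sample KS statistic: max |ECDF_reference(x) - ECDF_current(x)|."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    # Empirical CDFs are step functions that only change at observed values,
    # so evaluating at every observed value is enough to find the maximum gap.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


def has_drifted(
    reference: list[float], current: list[float], threshold: float = 0.2
) -> bool:
    """Flag drift when the KS statistic exceeds a (hypothetical) fixed threshold."""
    return ks_statistic(reference, current) > threshold
```

In practice you would run a check like this per feature over a rolling window of production inputs: a statistic near 0 means the two distributions look alike, and 1 means the live values do not overlap the training values at all.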

| Name | License | Description |
| -------------------------------------------------------------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| [Evidently](https://evidentlyai.com) | Apache 2.0 | |
| [Boxkite ML](https://github.com/boxkite-ml/boxkite) | Apache 2.0 | |
| [Alibi Detect](https://github.com/SeldonIO/alibi-detect) (by Seldon) | Apache 2.0 | |

@@ -101,14 +101,20 @@ Monitoring means making sure that each deployed model is both functioning, and p
# Full stacks

| Name | License | Description |
| ---------------------------------------------------------------------------------------------- | ------- | ----------- |
| [Open MLOps](https://github.com/datarevenue-berlin/OpenMLOps) | MIT | |
| [You Don't Need a Bigger Boat](https://github.com/jacopotagliabue/you-dont-need-a-bigger-boat) | MIT | |

# Governance

| Name | License | Description |
| -------------------------------------- | ---------- | ----------------------------------------------------------------------------------------------------------------- |
| [Bailo](https://github.com/gchq/Bailo) | Apache 2.0 | Managing the lifecycle of machine learning to support scalability, impact, collaboration, compliance and sharing. |

# More resources

Here are some more resources for MLOps, both open-source and proprietary.

- [Top 10 Open Source MLOps Tools](https://thechief.io/c/editorial/top-10-open-source-mlops-tools)
- [Awesome MLOps](https://github.com/visenger/awesome-mlops) - a mixture of open source and proprietary tools and platforms.
- [Best open source MLOps tools](https://neptune.ai/blog/best-open-source-mlops-tools)
