Machine Learning Operations involves the infrastructure required to scale your ML capabilities. I cover the motivations and concepts in my talk at the 2018 EuroSciPy Conference on Scalable Data Science: The State of MLOps in 2018.
- This repository contains a curated list of awesome resources that will help you kick-start or enhance your machine learning operations
- Machine Learning Operations involves everything required to serve your ML models, including deploying, monitoring, scaling, versioning, etc
- PMML - The Predictive Model Markup Language standard in XML - (Video)
- Data Version Control (DVC) - A tool that works alongside Git to allow for version management of data and models
- ModelDB - Framework to track every step in your ML code, recording which version of your model obtained which accuracy, so you can visualise and query the results via the UI
- Pachyderm - Open source distributed processing framework built on Kubernetes, focused mainly on dynamic building of production machine learning pipelines - (Video)
- Jupyter Notebooks - Web-based Python sandbox environments for reproducible development
- H2O Flow - Jupyter notebook-like interface for H2O to create, save and re-use "flows"
- EdgeDB - NoSQL interface for Postgres that allows object-style interaction with stored data
- BayesDB - Database that allows for built-in non-parametric Bayesian model discovery and querying of data through a database-like interface - (Video)
- Apache Arrow - In-memory columnar representation of data compatible with Pandas, Hadoop-based systems, etc
- Apache Parquet - On-disk columnar representation of data compatible with Pandas, Hadoop-based systems, etc
- Kafka
- auto-sklearn - Framework to automate algorithm selection and hyperparameter tuning for sklearn
- TPOT - Automation of sklearn pipeline creation (including feature selection, pre-processor, etc)
- Featuretools - An open source framework for automated feature engineering
- Columbus - A scalable framework to perform exploratory feature selection implemented in R
- automl - Automated feature engineering, feature/model selection, hyperparameter optimisation
- Seldon - Open source platform for deploying and monitoring machine learning models in Kubernetes - (Video)
- Redis-ML - Module available from the unstable branch that supports a subset of ML models as Redis data types
- MLeap - Standardisation of pipeline and model serialization for Spark, TensorFlow and sklearn
- Skytree 16.0 - End-to-end machine learning platform - (Video)
- Algorithmia - Cloud platform to build, deploy and serve machine learning models - (Video)
- y-hat - Deployment, updating and monitoring of predictive models in multiple languages - (Video)
- Airflow
- Luigi
- Pinball
- Genie - Job orchestration engine to interface and trigger the execution of jobs from Hadoop-based systems
- Oozie - Workflow scheduler for Hadoop jobs
- Talend Studio