In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
- PyTorch Production Level Tutorials [Fantastic]
- The road to 1.0: production ready PyTorch
- PyTorch 1.0 tracing JIT and LibTorch C++ API to integrate PyTorch into NodeJS [Good Article]
- Model Serving in PyTorch
- PyTorch Summer Hackathon [Very Important]
- Deploying PyTorch and Building a REST API using Flask [Important]
- PyTorch model recognizing hotdogs and not-hotdogs deployed on Flask
- Serving PyTorch 1.0 Models as a Web Server in C++ [Useful Example]
- PyTorch Internals [Interesting & Useful Article]
- Flask application to support PyTorch model prediction
- Serving PyTorch Model on Flask Thread-Safety
- Serving PyTorch Models on AWS Lambda with Caffe2 & ONNX
- Serving PyTorch Models on AWS Lambda with Caffe2 & ONNX (Another Version)
- EuclidesDB - multi-model machine learning feature database with PyTorch
- EuclidesDB - GitHub
- WebDNN: Fastest DNN Execution Framework on Web Browser
- FastAI PyTorch Serverless API (with AWS Lambda)
- FastAI PyTorch in Production (discussion)
- OpenMMLab Model Deployment Framework
- TorchServe [Great Tool]
- TorchServe Video Tutorial
- Loading a PyTorch Model in C++ [Fantastic]
- PyTorch C++ API [Bravo]
- An Introduction To Torch (Pytorch) C++ Front-End [Very Good]
- Blogs on using PyTorch C++ API [Good]
- ATen: A TENsor library
- Important Issue about PyTorch-like C++ interface
- PyTorch C++ API Test
- PyTorch via C++ [Useful Notes]
- AUTOGRADPP
- PyTorch C++ Library
- Direct C++ Interface to PyTorch
- A Python module for compiling PyTorch graphs to C
- How to deploy Machine Learning models with TensorFlow - Part1
- How to deploy Machine Learning models with TensorFlow - Part2
- How to deploy Machine Learning models with TensorFlow - Part3
- Neural Structured Learning (NSL) in TensorFlow [Great]
- Building Robust Production-Ready Deep Learning Vision Models
- Creating REST API for TensorFlow models
- "How to Deploy a Tensorflow Model in Production" by Siraj Raval on YouTube
- Code for "How to Deploy a Tensorflow Model in Production" by Siraj Raval on YouTube
- How to deploy an Object Detection Model with TensorFlow serving [Very Good Tutorial]
- Freeze Tensorflow models and serve on web [Very Good Tutorial]
- How to deploy TensorFlow models to production using TF Serving [Good]
- How Zendesk Serves TensorFlow Models in Production
- TensorFlow Serving Example Projects
- Serving Models in Production with TensorFlow Serving [TensorFlow Dev Summit 2017 Video]
- Building TensorFlow as a Standalone Project
- TensorFlow C++ API Example
- TensorFlow.js
- Introducing TensorFlow.js: Machine Learning in Javascript
- Deep learning in production with Keras, Redis, Flask, and Apache [Rank: 1st & Generally Useful Tutorial]
- Deploying a Keras Deep Learning Model as a Web Application in Python [Very Good]
- Deploying a Python Web App on AWS [Very Good]
- Deploying Deep Learning Models Part 1: Preparing the Model
- Deploying your Keras model
- Deploying your Keras model using Keras.JS
- "How to Deploy a Keras Model to Production" by Siraj Raval on YouTube
- Deploy Keras Model with Flask as Web App in 10 Minutes [Good Repository]
- Deploying Keras Deep Learning Models with Flask
- keras2cpp
- Model Server for Apache MXNet
- Running the Model Server
- Multi Model Server (MMS) Documentation
- Introducing Model Server for Apache MXNet
- Single Shot Multi Object Detection Inference Service
- Amazon SageMaker
- How can we serve MXNet models built with the Gluon API
- MXNet C++ Package
- MXNet C++ Package Examples
- MXNet Image Classification Example in C++
- MXNet C++ Tutorial
- An introduction to the MXNet API [Very Good Tutorial for Learning MXNet]
- GluonCV
- GluonNLP
- Model Quantization for Production-Level Neural Network Inference [Excellent]
- Cortex: Deploy machine learning models in production
- Cortex - Main Page
- Why we deploy machine learning models with Go — not Python
- Go-Torch
- Gotch - Go API for PyTorch
- TensorFlow Go Lang
- Go-onnx
- OpenVINO Toolkit - Deep Learning Deployment Toolkit repository [Great]
- ClearML - ML/DL development and production suite
- Model Deployment Using Heroku: A Complete Guide on Heroku [Good]
- NVIDIA Triton Inference Server [Great]
- NVIDIA Triton Inference Server - GitHub [Great]
- Cohere Boosts Inference Speed With NVIDIA Triton Inference Server
- NVIDIA Deep Learning Examples for Tensor Cores [Interesting]
- Deploying the Jasper Inference model using Triton Inference Server [Useful]
- NVIDIA MLOps Course via Triton
- Awesome Production Machine Learning [Great]
- ONNX (Open Neural Network Exchange)
- Tutorials for using ONNX
- MMdnn [Fantastic]
- Convert Full ImageNet Pre-trained Model from MXNet to PyTorch [Fantastic; the full ImageNet model is the one trained on ~14M images]
- MNIST using Caffe2
- Caffe2 C++ Tutorials and Examples
- Make Transfer Learning of SqueezeNet on Caffe2
- Build a basic program using the Caffe2 framework in C++
- ReactJS vs Angular5 vs Vue.js
- A comparison between Angular and React and their core languages
- A Guide to Becoming a Full-Stack Developer [Very Good Tutorial]
- Roadmap to becoming a web developer in 2018 [Very Good Repository]
- Modern Frontend Developer in 2018
- Roadmap to becoming a React developer in 2018
- 2019 UI and UX Design Trends [Good]
- Streamlit [The fastest way to build custom ML tools]
- Gradio [Good]
- Web Developer Monthly
- 23 Best React UI Component Frameworks
- 9 React Styled-Components UI Libraries for 2018
- 35 New Tools for UI Design
- 5 Tools To Speed Up Your App Development [Very Good]
- How to use ReactJS with Webpack 4, Babel 7, and Material Design
- Adobe Typekit [Great fonts, where you need them]
- Build A Real World Beautiful Web APP with Angular 6
- You Don't Know JS
- JavaScript Top 10 Articles
- Web Design with Adobe XD
- INSPINIA Bootstrap Web Theme
- A Learning Tracker for Front-End Developers
- The best front-end hacking cheatsheets — all in one place [Useful & Interesting]
- GUI-fying the Machine Learning Workflow (Machine Flow)
- Electron - Build cross platform desktop apps with JavaScript [Very Good]
- Opyrator - Turns Python functions into microservices with web API [Great]
- A First Look at PyScript: Python in the Web Browser [Interesting]
- PyTorch Mobile [Excellent]
- Mobile UI Design Trends In 2018
- ncnn - high-performance neural network inference framework optimized for the mobile platform [Useful]
- Alibaba - MNN
- Awesome Mobile Machine Learning
- EMDL - Embedded and Mobile Deep Learning
- Fritz - machine learning platform for iOS and Android
- TensorFlow Lite
- Tiny Machine Learning: The Next AI Revolution
- TLT - NVIDIA Transfer Learning Toolkit
- NVIDIA Jetson Inference [Great]
- Modern Backend Developer in 2018
- Deploying frontend applications — the fun way [Very Good]
- RabbitMQ [Message Broker Software]
- Celery [Distributed Task Queue]
- Kafka [Distributed Streaming Platform]
- Docker training with DockerMe
- Kubernetes - GitHub
- Deploy Machine Learning Pipeline on Google Kubernetes Engine
- An introduction to Kubernetes for Data Scientists
- Jenkins and Kubernetes with Docker Desktop
- Helm: The package manager for Kubernetes
- Create a Cluster using Docker Swarm
- deepo - Docker Image for all DL Frameworks
- Kubeflow [deployments of ML workflows on Kubernetes]
- kubespray - Deploy a Production Ready Kubernetes Cluster
- KFServing - Kubernetes for Serving ML Models
- Deploying a HuggingFace NLP Model with KFServing [Interesting]
- Seldon Core - Deploying Machine Learning Models on Kubernetes
- Seldon Core - GitHub
- Machine Learning: serving models with Kubeflow on Ubuntu, Part 1
- CoreWeave Kubernetes Cloud
- MLOps References [DevOps for ML]
- Data Version Control - DVC [Great]
- MLEM: package and deploy machine learning models
- PySyft - A library for encrypted, privacy preserving deep learning
- LocalStack - A fully functional local AWS cloud stack
- poetry: Python packaging and dependency management
- GPUtil
- py3nvml [Python 3 binding to the NVIDIA Management Library]
- PyCUDA - GitHub
- PyCUDA
- PyCUDA Tutorial
- setGPU
- Monitor your GPUs [Excellent]
- GPU-Burn - Multi-GPU CUDA stress test [Useful]
- Grafana - Monitoring and Observability [Excellent]
- Prometheus [Excellent monitoring solution for extracting the required metrics]
- Numba - makes Python code fast
- Dask - natively scales Python
- What is Dask
- Ray - running distributed applications
- Neural Network Distiller [Distillation & Quantization of Deep Learning Models in PyTorch]
- PyTorch Pruning Tutorial
- Can you remove 99% of a neural network without losing accuracy? - An introduction to weight pruning
- PocketFlow - An Automatic Model Compression (AutoMC) framework [Great]
- Introducing the Model Optimization Toolkit for TensorFlow
- TensorFlow Model Optimization Toolkit — Post-Training Integer Quantization
- TensorFlow Post-training Quantization
- Dynamic Quantization in PyTorch
- Static Quantization in PyTorch
- NVIDIA DALI - highly optimized data pre-processing in deep learning
- Horovod - Distributed training framework
- ONNX Float32 to Float16
- Speeding Up Deep Learning Inference Using TensorRT
- Speed up Training
- Native PyTorch automatic mixed precision for faster training on NVIDIA GPUs
- JAX - Composable transformations of Python+NumPy programs
- TensorRTx - popular DL networks with TensorRT
- Speeding up Deep Learning Inference Using TensorFlow, ONNX, and TensorRT
- TensorRT Developer Guide
- How to Convert a Model from PyTorch to TensorRT and Speed Up Inference [Good]
- MLOps-Basics [Great]
- MLOps-Zoomcamp [Great]
- A collection of resources to learn about MLOps [Great]
- MLEM: package and deploy machine learning models
- DevOps Exercises
- MLOps Sample Project
- prefect: Orchestrate and observe all of your workflows
- A Guide to Production Level Deep Learning
- Facebook Says Developers Will Love PyTorch 1.0
- Some PyTorch Workflow Changes
- wandb - A tool for visualizing and tracking your machine learning experiments
- PyTorch and Caffe2 repos getting closer together
- PyTorch or TensorFlow?
- Choosing a Deep Learning Framework in 2018: Tensorflow or Pytorch?
- Deep Learning War between PyTorch & TensorFlow
- Embedding Machine Learning Models to Web Apps (Part-1)
- Deploying deep learning models: Part 1 an overview
- Machine Learning in Production
- How you can get a 2–6x speed-up on your data pre-processing with Python
- Making your C library callable from Python
- MIL WebDNN
- Multi-GPU Framework Comparisons [Great]