This repository contains references and code for the book Distributed Machine Learning Patterns from Manning Publications by Yuan Tang.
🔥 The book is now available on the Manning Early Access Program (40% off with promo code mltang through July 10). You can read the book chapter-by-chapter while it's being written and get the final eBook as soon as it's finished. If you pre-order the pBook, you'll get it long before it's available in stores.
🔔 Stay tuned for any updates and announcements by following the author on Twitter and LinkedIn.
In Distributed Machine Learning Patterns you will learn how to:
- Apply patterns to build scalable and reliable machine learning systems.
- Construct machine learning pipelines with data ingestion, distributed training, model serving, and more.
- Automate machine learning tasks with Kubernetes, TensorFlow, Kubeflow, and Argo Workflows.
- Make trade-off decisions between different patterns and approaches.
- Manage and monitor machine learning workloads at scale.
This book teaches you how to take machine learning models from your personal laptop to large distributed clusters. You’ll explore key concepts and patterns behind successful distributed machine learning systems, and learn technologies like TensorFlow, Kubernetes, Kubeflow, and Argo Workflows directly from a key maintainer and contributor. Real-world scenarios, hands-on projects, and clear, practical DevOps techniques let you easily launch, manage, and monitor cloud-native distributed machine learning pipelines.
Scaling up models from personal devices to large distributed clusters is one of the biggest challenges faced by modern machine learning practitioners. Distributing machine learning systems allows developers to handle extremely large datasets across multiple clusters, take advantage of automation tools, and benefit from hardware acceleration. In this book, Yuan Tang shares patterns, techniques, and experience gained from years spent building and managing cutting-edge distributed machine learning infrastructure.
Distributed Machine Learning Patterns is filled with practical patterns for running machine learning systems on distributed Kubernetes clusters in the cloud. Each pattern is designed to help solve common challenges faced when building distributed machine learning systems, including supporting distributed model training, handling unexpected failures, and coping with dynamic model serving traffic. Real-world scenarios provide clear examples of how to apply each pattern, alongside the potential trade-offs of each approach. Once you’ve mastered these cutting-edge techniques, you’ll put them all into practice and finish up by building a comprehensive distributed machine learning system.
This book is for data analysts, data scientists, and software engineers familiar with the basics of machine learning algorithms and running machine learning in production. Readers should also be familiar with the basics of Bash, Python, and Docker.
Yuan Tang is a senior software engineer at Ant Group, where he works on AI infrastructure and AutoML platforms on Kubernetes. He is a key maintainer of and contributor to many of the technologies used in this book: he is a co-chair of Kubeflow, a top contributor to Argo Workflows, and a committer to TensorFlow. He is the co-author of the Chinese-language book TensorFlow in Practice and the author of the TensorFlow implementation of Dive into Deep Learning.