Apache Submarine (Submarine for short) is the ONE PLATFORM that allows Data Scientists to create end-to-end machine learning workflows. ONE PLATFORM means that Data Scientists can finish their jobs on the same platform without frequently switching toolsets: from dataset exploration and data pipeline creation, to model training (experiments), to pushing models to production (model serving and monitoring). All of these steps can be completed within the ONE PLATFORM.
Many open-source and commercial projects are already trying to create end-to-end machine-learning/deep-learning platforms, so what is the vision of Submarine?
- Existing products lack good user interfaces (API, SDK, etc.) for running training workloads at scale in a way that is repeatable and easy for data scientists to understand, whether on cloud or on premises.
- Data Scientists want to focus on domain-specific targets (e.g. improving click-through rate), but available products always give users a platform (e.g. an SDK to run a distributed PyTorch script).
- Many products provide functionality for data exploration, model training, and serving/monitoring. However, these functionalities are largely disconnected from each other and cannot work together organically.
Theodore Levitt once said:
“People don’t want to buy a quarter-inch drill. They want a quarter-inch hole.”
- Run experiments (training jobs) on premises or in the cloud via easy-to-use user interfaces.
- Make it easy for Data Scientists (DS) to manage training code and dependencies (Docker, Python dependencies, etc.).
- ML-focused APIs to run and track experiments from the Python SDK (notebook), REST API, and CLI.
- Provide APIs to run training jobs using popular frameworks (standalone/distributed TensorFlow/PyTorch/Horovod).
- Pre-packaged training templates that let Data Scientists focus on domain-specific tasks (like using DeepFM to build a CTR prediction model).
- Support GPUs and other compute acceleration devices.
- Support running on K8s/YARN or other resource management systems.
- Pipelines are also on the backlog; we will look into pipelines for training in the future.
- Submarine aims to provide a notebook service, which allows users to create/edit/delete notebook instances (such as Jupyter notebooks) running on the cluster.
- Users can submit experiments and manage models using the Submarine SDK.
- Model management for model-serving/versioning/monitoring is on the roadmap.
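To make the REST API bullet above more concrete, here is a minimal Python sketch that assembles an experiment request body as JSON. The schema (a `meta` section, an `environment` with a Docker image, and a `spec` with a `Worker` role) reflects our understanding of Submarine's experiment spec and is an assumption; the helper `build_experiment_spec`, the image name, and the command are hypothetical placeholders. Check the REST API documentation for your Submarine version before relying on any field names.

```python
import json


# Hypothetical helper: the field names below mirror the experiment spec shape
# we ASSUME Submarine's REST API accepts; verify them against your version's docs.
def build_experiment_spec(name, framework, cmd, image,
                          worker_replicas=1,
                          worker_resources="cpu=1,memory=1024M"):
    """Return a dict describing a single-role training experiment (assumed schema)."""
    return {
        "meta": {
            "name": name,             # experiment name shown in the UI
            "namespace": "default",   # assumed default namespace
            "framework": framework,   # e.g. TensorFlow or PyTorch
            "cmd": cmd,               # entry point executed in each container
        },
        "environment": {"image": image},  # Docker image holding code + deps
        "spec": {
            "Worker": {
                "replicas": worker_replicas,
                "resources": worker_resources,
            }
        },
    }


# Placeholder values: the image and script path are hypothetical.
spec = build_experiment_spec(
    name="mnist-example",
    framework="TensorFlow",
    cmd="python train.py",
    image="my-registry/tf-mnist:latest",
)
body = json.dumps(spec, indent=2)
print(body)
```

The sketch stops at producing the JSON body; submitting it (for example with an HTTP POST to the server's experiment endpoint) is left out because the endpoint path and authentication depend on the deployment.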
As mentioned above, Submarine aims to bring Data-Scientist-friendly user interfaces that make their lives easier. Here are some examples of Submarine user interfaces.
<FIXME: Add/FIX more contents below>
(Available in 0.6.0; see Roadmap)
If you want to know more about Submarine's architecture, components, requirements, and design docs, they can be found on Architecture-and-requirement.
Detailed design documentation and implementation notes can be found at Implementation notes.
Read the Apache Submarine Community Guide
How to contribute: Contributing Guide
Issue Tracking: https://issues.apache.org/jira/projects/SUBMARINE
See the Developer Guide Home Page
Want to know more about what's coming for Submarine? Please check out the roadmap: https://cwiki.apache.org/confluence/display/SUBMARINE/Roadmap
The Apache Submarine project is licensed under the Apache 2.0 License. See the LICENSE file for details.