forked from spotify/klio
Merge pull request spotify#97 from spotify/lynn/more-faqs
[docs] Add more introductory FAQs
Showing 12 changed files with 144 additions and 65 deletions.
@@ -0,0 +1,9 @@
How are DAGs described within Klio?
===================================

A DAG (directed acyclic graph) of streaming Klio jobs is defined in a job's ``klio-job.yaml`` configuration file.
The :doc:`output </userguide/io/index>` of one job can be used as the input of another job.

Learn more about how DAGs are used in Klio :doc:`here </userguide/anatomy/graph>`.

Learn more about setting up a DAG of Klio jobs through configuration :doc:`here </userguide/config/job_config>`.
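
As a rough illustration of how chaining one job's output to another job's input yields a DAG, the sketch below models two jobs as plain Python dictionaries and derives the edges between them. The dictionary layout, job names, and topic names are hypothetical and simplified for illustration; they are not Klio's actual ``klio-job.yaml`` schema, which is covered in the job configuration guide linked above.

```python
# Hypothetical, simplified stand-ins for two jobs' configurations.
# These keys are illustrative only and do not mirror the real klio-job.yaml schema.
jobs = {
    "job-a": {"inputs": ["topic-raw"], "outputs": ["topic-a-done"]},
    "job-b": {"inputs": ["topic-a-done"], "outputs": ["topic-b-done"]},
}


def derive_edges(jobs):
    """Return sorted (upstream, downstream) pairs where an output feeds an input."""
    edges = []
    for upstream, up_cfg in jobs.items():
        for downstream, down_cfg in jobs.items():
            if upstream == downstream:
                continue
            # An edge exists when any output of one job is an input of another.
            if set(up_cfg["outputs"]) & set(down_cfg["inputs"]):
                edges.append((upstream, downstream))
    return sorted(edges)


edges = derive_edges(jobs)
print(edges)
```

Here ``job-b`` lists the output of ``job-a`` as its input, so the only edge runs from ``job-a`` to ``job-b``.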
@@ -0,0 +1,6 @@
Does Klio take care of downloading input/uploading output data files to/from workers?
=====================================================================================

Currently, Klio has some :doc:`basic utilities </reference/audio/api/io>` for downloading and uploading audio to and from memory.
We will certainly be building this out, but we welcome contributions to that effect as well.
Klio also provides :ref:`built-in transforms <pipeline-using-builtins>` to ensure media is not unnecessarily processed.
@@ -0,0 +1,8 @@
How does Klio compare to Kubeflow?
==================================

`Kubeflow <https://www.kubeflow.org/docs/about/kubeflow/>`_ is a powerful platform that uses `Kubernetes <https://kubernetes.io/>`_ under the hood to help construct workflows.
Kubeflow allows its users to process data and to use that data to experiment with and train ML models.


Klio, on the other hand, takes complex algorithms, whether a trained ML model or a media processing algorithm, and enables deploying them within research or production pipelines, with a focus on optimizing for heavy file I/O and its related resources.
@@ -1,4 +1,4 @@
How does Klio relate to Spotify Scio?
=====================================
How does Klio compare to Spotify Scio?
======================================

Both projects bring Apache Beam to new domains: `Scio <https://github.com/spotify/scio>`_ brings Beam pipelines to Scala, while Klio focuses Beam on analyzing, manipulating, and transforming large binary media (e.g. images, audio, video) whose content, in its native form, cannot meaningfully be stored or analyzed in a database.
@@ -0,0 +1,7 @@
How does Klio compare to TensorFlow Serving?
============================================

`TensorFlow Serving <https://www.tensorflow.org/tfx/guide/serving>`_ enables creating a service around a TensorFlow-based ML model.
Although a streaming Klio job could be compared to serving a model with a service, Klio is meant for media processing pipelines, not necessarily for serving a model.
Klio enables heavy file I/O for processing media, whether or not a model is involved.
Klio is also agnostic to the type of ML model used (TensorFlow, PyTorch, scikit-learn, etc.).
@@ -0,0 +1,5 @@
Does Klio allow you to use native Beam components?
==================================================

Yes, definitely. Klio's design is meant to enhance Beam's primitives (`Pipeline <https://beam.apache.org/documentation/programming-guide/#creating-a-pipeline>`_, `PCollections <https://beam.apache.org/documentation/programming-guide/#pcollections>`_, `Transforms <https://beam.apache.org/documentation/programming-guide/#transforms>`_, etc.).
Writing a Klio pipeline should feel very similar to writing a Beam pipeline.
@@ -0,0 +1,8 @@
Why not other open source frameworks?
=====================================

There are a number of well-developed, supported data processing frameworks available in the open.
At Spotify, we've standardized on `Apache Beam <https://beam.apache.org/>`_ with our sister open source framework, `Scio <https://spotify.github.io/scio/>`_.
We've found that Beam is a framework that engineers and researchers alike can pick up quickly to create `embarrassingly parallel <https://en.wikipedia.org/wiki/Embarrassingly_parallel>`_ pipelines.
However, no existing solution handled the resource and environment demands of processing media.

@@ -0,0 +1,4 @@
What's the performance of a Klio pipeline?
==========================================

As a simple test, we've `downsampled <https://en.wikipedia.org/wiki/Downsampling_(signal_processing)>`_ tens of millions of songs in :violetemph:`6 days` using 600 `n1-standard-16 <https://cloud.google.com/compute/docs/machine-types#n1_machine_types>`_ machines (16 vCPUs, 60GB memory).
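
To put those figures in perspective, the aggregate resources can be worked out with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes all 600 machines ran for the full six days, which the statement above does not confirm, so the vCPU-hour figure should be read as an upper bound.

```python
# Figures taken from the performance statement above.
machines = 600
vcpus_per_machine = 16
memory_gb_per_machine = 60
days = 6

total_vcpus = machines * vcpus_per_machine
total_memory_gb = machines * memory_gb_per_machine
# Assumption: every machine ran for the whole six days (an upper bound).
vcpu_hours = total_vcpus * days * 24

print(total_vcpus, total_memory_gb, vcpu_hours)
```

Under that assumption, the test used 9,600 vCPUs and 36,000 GB of memory in total, for at most 1,382,400 vCPU-hours.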
@@ -0,0 +1,5 @@
Can Klio be used for smaller loads for ongoing research, or just production loads?
==================================================================================

Klio is meant for processing media, regardless of how large the collection of media files is.
It can be used on cloud infrastructure, or locally on one's computer.