[docs] Add FAQ on migrating from setup.py to FnAPI for packaging
fallonchen committed Jan 19, 2021
1 parent 1ce919f commit 1dee0f4
Showing 3 changed files with 176 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/src/faqs/index.rst
@@ -27,6 +27,8 @@ FAQs
custom_proto_def
dags_in_klio
migrate_from_fnapi
migrate_from_setup
setup_packaging


General
@@ -76,3 +78,5 @@ Technical
publish_kmsgs_from_non_klio_job
custom_proto_def
migrate_from_fnapi
migrate_from_setup
setup_packaging
2 changes: 2 additions & 0 deletions docs/src/faqs/migrate_from_fnapi.rst
@@ -1,3 +1,5 @@
.. _migrate-from-fnapi:

How Do I Migrate from FnAPI to ``setup.py``?
============================================

170 changes: 170 additions & 0 deletions docs/src/faqs/migrate_from_setup.rst
@@ -0,0 +1,170 @@
.. _migrate-from-setup:

How Do I Migrate from ``setup.py`` to FnAPI?
============================================


The FnAPI (pronounced "fun API") is what allows Klio to use custom Docker images
on Dataflow workers.
However, it's still considered experimental.
The fully supported way to run a job that has dependencies
(both Python and OS-level dependencies) is via `setup.py <https://beam.apache.org/documentation/sdks/python-pipeline-dependencies>`_.

The steps below describe the changes needed to move an existing job from
``setup.py`` to FnAPI.

To create a new Klio job that uses FnAPI instead of ``setup.py`` from the start, run:

.. code-block:: console

    $ klio job create --use-fnapi true

This command creates a job that already uses FnAPI for packaging, so you will not
need to follow any of the steps below. Those steps convert an existing job that uses ``setup.py`` for
packaging to one that uses FnAPI.

Required Setup/Changes
----------------------

Update: ``klio-job.yaml``
^^^^^^^^^^^^^^^^^^^^^^^^^

Under ``pipeline_options``:

1. **DELETE** the key ``setup_file``.
2. **ADD** the list key ``experiments``, containing the item ``beam_fn_api``. Together with ``worker_harness_container_image``, the ``beam_fn_api`` experiment tells Klio and Dataflow to use FnAPI rather than the setup file to package the job.

.. collapsible:: Minimal Example ``klio-job.yaml``

    .. code-block:: yaml

        job_name: my-job
        pipeline_options:
          # NOTE! setup_file is absent
          experiments:
            - beam_fn_api
          worker_harness_container_image: gcr.io/my-project/my-job-image
          runner: DataflowRunner
        # <-- snip -->

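
The two ``klio-job.yaml`` edits can be checked mechanically before deploying. The following is a minimal sketch and not part of Klio: ``fnapi_config_issues`` is a hypothetical helper that takes the already-parsed contents of ``klio-job.yaml`` as a dictionary (loaded with whatever YAML parser you prefer) and reports anything that still looks like ``setup.py`` packaging.

```python
def fnapi_config_issues(config):
    """Return problems that keep a parsed klio-job.yaml from using FnAPI.

    Hypothetical helper for illustration only; it is not part of Klio.
    """
    issues = []
    options = config.get("pipeline_options") or {}

    # Step 1: the setup_file key must be deleted.
    if "setup_file" in options:
        issues.append("pipeline_options.setup_file is still set")

    # Step 2: experiments must be a list that contains beam_fn_api.
    experiments = options.get("experiments") or []
    if "beam_fn_api" not in experiments:
        issues.append("pipeline_options.experiments is missing beam_fn_api")

    # The experiment is used in conjunction with a custom worker image.
    if not options.get("worker_harness_container_image"):
        issues.append("pipeline_options.worker_harness_container_image is not set")

    return issues


# Mirrors the minimal example, already parsed into a dictionary.
migrated = {
    "job_name": "my-job",
    "pipeline_options": {
        "experiments": ["beam_fn_api"],
        "worker_harness_container_image": "gcr.io/my-project/my-job-image",
        "runner": "DataflowRunner",
    },
}
print(fnapi_config_issues(migrated))  # []
```

An empty list means the configuration matches the steps in this section; it says nothing about whether the job itself runs correctly on Dataflow.
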
Update: ``Dockerfile``
^^^^^^^^^^^^^^^^^^^^^^

Required Changes
~~~~~~~~~~~~~~~~


1. **MOVE** the ``COPY`` line that copies ``job-requirements.txt`` into the image ahead of the rest of the lines that copy in Python files.

2. **UPDATE** ``RUN pip install .`` to ``RUN pip install -r job-requirements.txt``

   .. collapsible:: Why is this needed?

      We now install dependencies directly on the worker image.

3. **MOVE** ``RUN pip install -r job-requirements.txt`` to the line right after the one from step 1, that copies in ``job-requirements.txt``.

   .. collapsible:: Why is this needed?

      This is an image build optimization. Your job's Python files are more likely to change than the dependencies in ``job-requirements.txt``, so installing the dependencies first lets Docker reuse the cached dependency layer when only your code changes.

4. **ADD** any system-level dependencies using ``RUN apt-get update && apt-get install ...``.

   .. collapsible:: Why is this needed?

      These dependencies were previously installed by specifying them in ``setup.py`` and running ``pip install .``. They now need to be installed directly on the worker image for your Klio job to use.

.. collapsible:: Example of Required Changes

    .. code-block:: diff

          FROM apache/beam_python3.6_sdk:2.24.0
          WORKDIR /usr/src/app
          ENV GOOGLE_CLOUD_PROJECT my-project \
              PYTHONPATH /usr/src/app
        + RUN apt-get update && apt-get install my-package
          RUN pip install --upgrade pip setuptools
        + COPY job-requirements.txt job-requirements.txt
        + RUN pip install -r job-requirements.txt
          COPY __init__.py \
               run.py \
               transforms.py \
        -      job-requirements.txt \
               /usr/src/app/
        - RUN pip install .

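
The ordering rules in the required changes can also be checked with a short script. This is a sketch under stated assumptions: ``required_change_issues`` is a hypothetical helper (not part of Klio or Docker) that does simple line matching on the Dockerfile text rather than parsing Dockerfile syntax in full, and it only verifies relative order, not that the install line sits directly after the ``COPY`` line.

```python
def required_change_issues(dockerfile_text):
    """Check a Dockerfile against the required FnAPI changes.

    Hypothetical helper for illustration; it only does simple line matching.
    """
    # Join backslash-continued lines so each instruction is one string.
    joined = dockerfile_text.replace("\\\n", " ")
    instructions = [line.split() for line in joined.splitlines() if line.strip()]

    def first_index(predicate):
        # Index of the first instruction matching predicate, or None.
        return next((i for i, w in enumerate(instructions) if predicate(w)), None)

    # For COPY, every word between the keyword and the destination is a source.
    copy_reqs = first_index(
        lambda w: w[0] == "COPY" and "job-requirements.txt" in w[1:-1]
    )
    install_reqs = first_index(
        lambda w: w[0] == "RUN"
        and w[1:] == ["pip", "install", "-r", "job-requirements.txt"]
    )
    copy_code = first_index(
        lambda w: w[0] == "COPY" and any(src.endswith(".py") for src in w[1:-1])
    )
    install_dot = first_index(
        lambda w: w[0] == "RUN" and w[1:] == ["pip", "install", "."]
    )

    issues = []
    if install_dot is not None:
        issues.append("'RUN pip install .' is still present")
    if install_reqs is None:
        issues.append("'RUN pip install -r job-requirements.txt' is missing")
    elif copy_reqs is None or copy_reqs > install_reqs:
        issues.append("job-requirements.txt is not copied before it is installed")
    elif copy_code is not None and copy_code < install_reqs:
        issues.append("Python files are copied before dependencies are installed")
    return issues


migrated = """\
FROM apache/beam_python3.6_sdk:2.24.0
WORKDIR /usr/src/app
RUN pip install --upgrade pip setuptools
COPY job-requirements.txt job-requirements.txt
RUN pip install -r job-requirements.txt
COPY __init__.py \\
     run.py \\
     transforms.py \\
     /usr/src/app/
"""
print(required_change_issues(migrated))  # []
```
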
Suggested Changes
~~~~~~~~~~~~~~~~~

The following suggested changes optimize Docker builds by removing layers that are no longer used, and bring the image closer to the runtime environment on Dataflow.

.. caution::

    **Most of these changes are incompatible with using** ``setup.py``.

    The following changes will break your job if you return to using ``setup.py`` to package your dependencies. If you choose to switch back, undo these deletions.

* **DELETE** the lines that copy ``MANIFEST.in`` and ``setup.py``, since those files are no longer used. If you remove the files from your job directory without also removing them from the ``COPY`` commands in your Dockerfile, your build will break.

.. collapsible:: Example of Suggested Changes

    .. code-block:: diff

          FROM apache/beam_python3.6_sdk:2.24.0
          WORKDIR /usr/src/app
          ENV GOOGLE_CLOUD_PROJECT my-project \
              PYTHONPATH /usr/src/app
          RUN pip install --upgrade pip setuptools
          COPY __init__.py \
        -      setup.py \
        -      MANIFEST.in \
               klio-job.yaml \
               run.py \
               transforms.py \
               job-requirements.txt \
               /usr/src/app/
          RUN pip install .


.. collapsible:: Combined Example of Required & Suggested Changes

    .. code-block:: diff

          FROM apache/beam_python3.6_sdk:2.24.0
          WORKDIR /usr/src/app
          ENV GOOGLE_CLOUD_PROJECT my-project \
              PYTHONPATH /usr/src/app
        + RUN apt-get update && apt-get install my-package
          RUN pip install --upgrade pip setuptools
        + COPY job-requirements.txt job-requirements.txt
        + RUN pip install -r job-requirements.txt
          COPY __init__.py \
        -      setup.py \
        -      MANIFEST.in \
               klio-job.yaml \
               run.py \
               transforms.py \
        -      job-requirements.txt \
               /usr/src/app/
        - RUN pip install .
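
The build break described in the suggested changes (a ``COPY`` line that still names a deleted file) can be caught before running a build. The sketch below assumes a hypothetical helper, ``missing_copy_sources``, which is not part of Klio; it handles only plain ``COPY src... dest`` lines and ignores flags such as ``--from`` and wildcard sources.

```python
import tempfile
from pathlib import Path


def missing_copy_sources(dockerfile_text, job_dir):
    """List files that a COPY instruction names but job_dir does not contain.

    Hypothetical helper for illustration; handles only plain COPY lines.
    """
    # Join backslash-continued lines so each instruction is one string.
    joined = dockerfile_text.replace("\\\n", " ")
    missing = []
    for line in joined.splitlines():
        words = line.split()
        if len(words) >= 3 and words[0] == "COPY":
            # Everything between the keyword and the destination is a source.
            for source in words[1:-1]:
                if not (Path(job_dir) / source).exists():
                    missing.append(source)
    return missing


# Demonstration in a throwaway directory where setup.py has been deleted.
with tempfile.TemporaryDirectory() as job_dir:
    for name in ("__init__.py", "run.py"):
        (Path(job_dir) / name).touch()
    dockerfile = "COPY __init__.py setup.py run.py /usr/src/app/\n"
    print(missing_copy_sources(dockerfile, job_dir))  # ['setup.py']
```
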
