Improved pip package structure

Status: Implemented
RFC #: 182
Author(s): Anna Revinskaya ([email protected])
Sponsor: Alex Passos ([email protected])
Updated: 2020-02-04

Objective

We propose to simplify the TensorFlow pip package structure to enable IDE features such as autocomplete, jump-to-definition and quick documentation.

Motivation

Current package structure

The TensorFlow package structure has grown quite complex over time as we started to support multiple versions (1.x and 2.x) and to import external sub-packages (such as tensorflow_estimator and tensorboard). This complexity is expected to grow if we split out more components into separate pip packages.

Sources of complexity:

  • Versioning: the tensorflow_core API lives under the _api/v1 or _api/v2 directory depending on the version.
  • Virtual pip package: Installing TensorFlow actually installs two directories under site-packages/: tensorflow/ and tensorflow_core/. TensorFlow code lives under tensorflow_core/. TensorFlow uses lazy loading to import everything from tensorflow_core/ into tensorflow/. The two-directory structure helps work around circular imports caused by tensorflow_estimator.

Outline of the current structure:

tensorflow
    __init__.py (contains "from tensorflow_core import *")

tensorflow_core
    python/...
    lite/...
    _api/v2
        __init__.py
        audio/__init__.py
        autograph/__init__.py
        ...

Rationale behind current package structure

Multiple version support

To prepare for the TensorFlow 2.0 launch, we added a way to build two versions: 1.x and 2.x. Each version has its own genrule that outputs the API files for 1.x or 2.x, since the API modules differ (e.g. tensorflow/manip/__init__.py exists only in the 1.x API). Bazel does not allow two genrules to output files to the same directory. Therefore, we have _api/v1/ and _api/v2/ subdirectories.

Note that we could still place the API directly under tensorflow/ in the pip package, since a pip package contains a single version of TensorFlow. This option became out of reach when tensorflow/contrib/lite/ was migrated to tensorflow/lite/. The tensorflow/lite/ API directory would now conflict with the tensorflow/lite/ source directory if the API were under tensorflow/ instead of _api/vN/.

Circular dependencies

Estimator depends on TensorFlow. At the same time, TensorFlow includes estimator as a part of its API. This creates a cycle.

[Figure: dependency cycle between TensorFlow and Estimator]
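
The cycle described above can be reproduced in miniature. The sketch below is illustrative only: toy_tf and toy_estimator are hypothetical stand-ins for tensorflow and tensorflow_estimator, written to a temporary directory. The core package eagerly imports the add-on, which in turn imports a symbol from the core before the core has finished initializing.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def write_package(root: Path, name: str, source: str) -> None:
    """Create a one-file package `name` under `root`."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(textwrap.dedent(source))


def import_cycle_error() -> str:
    """Import a toy package with an eager cycle and return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical stand-in for tensorflow: exposes the add-on as part of
        # its API before it has defined its own symbols.
        write_package(root, "toy_tf", """
            from toy_estimator import Estimator
            VERSION = "0.0"
        """)
        # Hypothetical stand-in for tensorflow_estimator: depends on the core.
        write_package(root, "toy_estimator", """
            from toy_tf import VERSION
            class Estimator:
                pass
        """)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("toy_tf")
        except ImportError as err:
            return str(err)
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for name in ("toy_tf", "toy_estimator"):
                sys.modules.pop(name, None)
    return ""


print(import_cycle_error())
```

The import fails because toy_estimator asks for VERSION while toy_tf is only partially initialized. Deferring one of the two imports is what breaks the cycle.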

Metapackage vs base package plans

The Modular TensorFlow RFC proposes to keep two pip packages: tensorflow-base, which would contain only core TensorFlow (e.g. no estimator), and the TensorFlow Metapackage, a thin package defining the composition of TensorFlow, which includes base, estimator, keras and tensorboard. Note that this two-package approach is not implemented yet. However, the proposal demonstrates how keeping a virtual pip package could be beneficial in the future.

[Figure: tensorflow-base and TensorFlow Metapackage layout]

The current structure looks more like this (except that tensorflow/ and tensorflow_core/ are directories rather than separate pip packages) and is meant to be the first step towards the structure above:

[Figure: current tensorflow/ and tensorflow_core/ layout]

Current state of IDE code features

PyCharm 2019.1.1

  • Autocomplete:
    • Works in most cases after switching to use relative imports.
    • Doesn’t work for tf.compat.v1.keras and tf.compat.v2.keras.
    • Doesn’t work for keras when it is imported with a from-import (i.e. from tensorflow import keras).
  • Jump-to-definition doesn’t work.
  • Quick documentation doesn’t work.

PyCharm 2019.3 EAP build 193.3793.14

The latest version of PyCharm added custom handling for TensorFlow.

  • Autocomplete works in most cases.
  • Autocomplete doesn’t work for keras when it is imported with a from-import (i.e. from tensorflow import keras).
  • Jump-to-definition works.
  • Quick documentation works.

VS Code 1.40 (October 2019 release)

  • Autocomplete:
    • Works in most cases.
    • Doesn’t work for tf.estimator or tf.keras.
    • Doesn’t work for tf.compat.v1.keras and tf.compat.v2.keras.
    • Doesn’t work for keras when it is imported with a from-import (i.e. from tensorflow import keras).
  • Jump-to-definition doesn’t work.
  • Quick documentation doesn’t work.

User Benefit

The TensorFlow package structure creates difficulties for those who use IDEs. Autocomplete, quick documentation and jump-to-definition features often rely on the module structure matching the directory structure. For example, TensorFlow code uses from tensorflow.foo imports but lives under the tensorflow_core package. Simplifying the package structure would improve productivity for TensorFlow users.
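
The mismatch between import path and on-disk location can be seen with a small experiment. This is a simplified sketch, not TensorFlow's actual loading code: toy_core and toy_facade are hypothetical stand-ins for tensorflow_core and tensorflow, and the facade uses a plain star import.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def reexported_module_name() -> str:
    """Return the real __name__ of a module reached through a facade package."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical stand-in for tensorflow_core/: where the code lives.
        core = root / "toy_core"
        (core / "audio").mkdir(parents=True)
        (core / "__init__.py").write_text("from . import audio\n")
        (core / "audio" / "__init__.py").write_text(
            "def decode():\n    return 'ok'\n")
        # Hypothetical stand-in for tensorflow/: only re-exports the core.
        facade = root / "toy_facade"
        facade.mkdir()
        (facade / "__init__.py").write_text("from toy_core import *\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            toy = importlib.import_module("toy_facade")
            # Users write toy_facade.audio, but no toy_facade/audio/ exists.
            return toy.audio.__name__
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith("toy_")]:
                del sys.modules[name]


print(reexported_module_name())  # toy_core.audio
```

At runtime the attribute resolves without trouble, but a tool that maps toy_facade.audio to a directory finds nothing there. That is the kind of lookup that static IDE features tend to depend on.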

Design Proposal

The best way I can think of to fix the autocomplete issues is to make our package structure as clean as possible. In this case, autocomplete will work out of the box.

Short term: Remove virtual pip package

The primary purpose of keeping the virtual pip package is to work around circular estimator imports. Alternatively, we can resolve this issue by lazy loading estimator.

Estimator import in root __init__.py file:

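import sys as _sys
from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader

_current_module = _sys.modules[__name__]  # the root tensorflow module
# Defer importing tensorflow_estimator until tf.estimator is first accessed.
estimator = _LazyLoader(
    "estimator", globals(),
    "tensorflow_estimator.python.estimator.api._v2.estimator")
setattr(_current_module, "estimator", estimator)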
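
For readers unfamiliar with the pattern, here is a minimal sketch of how a lazy module proxy of this kind can work. It is a simplified illustration under our own assumptions, not TensorFlow's LazyLoader implementation; the class name LazyModule is hypothetical, and the standard-library json module stands in for the deferred dependency.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Module proxy that imports its target on first attribute access.

    A simplified illustration; not TensorFlow's actual LazyLoader.
    """

    def __init__(self, local_name, parent_globals, target_name):
        super().__init__(local_name)
        self._local_name = local_name
        self._parent_globals = parent_globals
        self._target_name = target_name

    def _load(self):
        module = importlib.import_module(self._target_name)
        # Replace the proxy in the parent namespace so later lookups are direct.
        self._parent_globals[self._local_name] = module
        # Copy the target's attributes so this proxy also resolves them directly.
        self.__dict__.update(module.__dict__)
        return module

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. before the first load
        # or for names the target does not define.
        return getattr(self._load(), name)


# "json" stands in for the heavy or cyclic dependency.
json_lazy = LazyModule("json_lazy", globals(), "json")
print(json_lazy.dumps({"a": 1}))  # the real import happens on this line
```

Because nothing is imported until the first attribute access, the parent package can finish initializing before the dependency is loaded, which avoids the cycle.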

Lazy loading by itself would mean that we no longer have autocomplete for estimator. As a workaround, we can import estimator without lazy loading if typing.TYPE_CHECKING is True.
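
The TYPE_CHECKING workaround might look roughly like the sketch below. typing.TYPE_CHECKING is False at runtime, while static analyzers assume it is True. The _Deferred proxy is a hypothetical minimal helper, and json again stands in for tensorflow_estimator.

```python
import importlib
import typing

if typing.TYPE_CHECKING:
    # Static analyzers (IDEs, type checkers) treat TYPE_CHECKING as True and
    # follow this ordinary import, so completion and navigation work.
    # "json" is a stand-in for tensorflow_estimator in this sketch.
    import json as estimator
else:
    # At runtime TYPE_CHECKING is False, so the import stays deferred.
    class _Deferred:
        """Hypothetical minimal proxy; resolves the target on first use."""

        def __init__(self, target_name):
            self._target_name = target_name

        def __getattr__(self, name):
            module = importlib.import_module(self._target_name)
            return getattr(module, name)

    estimator = _Deferred("json")

print(estimator.dumps([1, 2]))
```

The two branches bind the same name, so user code is identical in both cases. Only the tooling sees the eager import.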

After building a pip package with this change, all of the following work in PyCharm (both released and EAP builds) and VS Code:

  • jump-to-definition
  • quick documentation
  • autocomplete for compat.v1.keras, compat.v2.keras
  • autocomplete for keras when using from tensorflow import keras
  • ...basically any import I tested works with autocompletion

To support the TensorFlow Metapackage plans, we could add a new pip package that specifies dependencies on tensorflow, tensorflow_estimator, tensorboard, etc. Its sole purpose would be to get all dependencies installed.

[Figure: package structure with a dependency-only pip package]
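
As a rough illustration, the setup arguments for such a dependency-only package could look like the sketch below. The distribution name and version are assumptions made for this example, and the dependency names are spelled as in the text above, so they may differ from the published pip names.

```python
# Arguments for the setup.py of a hypothetical dependency-only metapackage.
# The name and version are illustrative assumptions, not a published package.
METAPACKAGE = dict(
    name="tensorflow-metapackage-example",
    version="0.0.1",
    description="Installs TensorFlow together with its companion packages.",
    packages=[],  # ships no code of its own
    install_requires=[
        "tensorflow",
        "tensorflow_estimator",
        "tensorboard",
    ],
)

# In a real setup.py this dict would be passed on:
#     setuptools.setup(**METAPACKAGE)
print(METAPACKAGE["install_requires"])
```

Since the package contains no modules, installing it only pulls in the listed dependencies and adds nothing to the import structure that IDEs need to resolve.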

Long term (optional): Import from external package directly

The short-term change would fix the IDE issues, but the package structure would still not be as clean as it could be. We resolve cycles with lazy loading, but it would be even better not to have this circular structure at all.

Therefore, I propose that we don’t import external packages into TensorFlow 3.0. Users who want to use estimator, tensorboard summaries or keras could import them separately:

Current code that looks like:

import tensorflow as tf

tf.estimator
tf.keras
tf.summary

Would be changed to:

import tensorflow as tf
import tensorflow_estimator as estimator
import keras
from tensorboard import summaries

Rationale for this change:

  • One-way dependencies (estimator depends on tensorflow and not vice versa).
  • Minimal overhead for users. Adding an extra import is easy.

Note that this change cannot be made in TensorFlow 2.x due to API guarantees. Also, accessing these packages from tf. would match familiar workflows. Therefore, we can keep tf.estimator, tf.keras (once it is moved out of TensorFlow) and tf.summary available as an alternative to importing the pip packages directly. This would require some work to make sure these packages contain the right API (e.g. tensorflow_estimator.estimator currently always contains the V1 API).

Alternatives Considered

Alternatively, we could solve IDE autocomplete issues by changing all imports in TensorFlow to import from tensorflow_core instead of tensorflow.

Advantages:

  • Keep supporting external libraries included as a sub-namespace, e.g. tf.estimator.

Disadvantages:

  • This is a more invasive change since it requires updating every Python file in TensorFlow. It would also mean that external packages such as tensorflow_estimator need to use imports of the form from tensorflow_core instead of from tensorflow.

The main proposal in this document seems simpler to me (it removes complexity instead of adding it) and is therefore preferred.

Performance Implications

I am not expecting major performance changes since this is just a package structure proposal.

Dependencies

This proposal does not add new dependencies. The rest of the proposal largely describes how we plan to handle dependencies.

Engineering Impact

We don't expect changes to binary size / startup time / build time / test time.

Platforms and Environments

This should work on all platforms and we will test it to make sure.

Best Practices, Tutorials and Examples

There are no user-visible changes other than fixes to enable IDE features.

Compatibility

The short-term proposal does not raise any compatibility concerns. The long-term proposal, however, removes tf.estimator and similar namespaces, which is not a backwards-compatible change. We can make this change only at the next major release.

User Impact

There are no user-visible changes other than fixes to enable IDE features.