Hypothesis strategy for generating Variable objects (pydata#8404)
* copied files defining strategies over to this branch

* placed testing functions in their own directory

* moved hypothesis strategies into new testing directory

* begin type hinting strategies

* renamed strategies for consistency with hypothesis conventions

* added strategies to public API (with experimental warning)

* strategies for chunking patterns

* rewrote variables strategy to have same signature as Variable constructor

* test variables strategy

* fixed most tests

* added helpers so far to API docs

* add hypothesis to docs CI env

* add todo about attrs

* draft of new user guide page on testing

* types for dataarrays strategy

* draft for chained chunking example

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* only accept strategy objects

* fixed failure with passing in two custom strategies that must be compatible

* syntax error in example

* allow sizes dict as argument to variables

* copied subsequences_of strategy

* coordinate_variables generates non-dimensional coords

* dataarrays strategy given nothing working!

* improved docstrings

* datasets strategy works (given nothing)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pass dims or data to dataarrays() strategy

* importorskip hypothesis in tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added warning about inefficient example generation

* remove TODO about deterministic examples in docs

* un-restrict names strategy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed convert kwarg

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* avoid using subsequences_of

* refactored into separate function for unique subset of dims

* removed subsequences_of

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix draw(st.booleans())

* remove all references to chunking until chunks strategy merged upstream in dask

* added example of complicated strategy for dims dict

* remove superfluous utils file

* removed elements strategy

* removed np_arrays strategy from public API

* min_ndims -> min_dims

* forbid non-matching dims and data completely

* simple test for data_variables strategy

* passing arguments to datasets strategy

* whatsnew

* add attrs strategy

* autogenerate attrs for all objects

* attempt to make attrs strategy quicker

* extend deadline

* attempt to speed up attrs strategy

* promote all strategies to be functions

* valid_dtypes -> numeric_dtypes

* changed hypothesis error type

* make all strategies keyword-arg only

* min_length -> min_side

* correct error type

* remove coords kwarg

* test different types of coordinates are sometimes generated

* zip dict

Co-authored-by: Zac Hatfield-Dodds <[email protected]>

* add dim_names kwarg to dimension_sizes strategy

* return a dict from _alignable_variables

* add coord_names arg to coordinate_variables strategy

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change typing of dims arg

* support dims as list to datasets strat when data not given

* put coord and data var generation in optional branch to try to improve shrinking

* improve simple test example

* add documentation on creating duck arrays

* okexcept for sparse examples

* fix sparse dataarrays example

* todo about building a duck array dataset

* fix imports and cross-links

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add hypothesis library to intersphinx mapping

* fix many links

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed all local mypy errors

* move numpy strategies import

* reduce sizes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix some api links in docs

* remove every strategy beyond variables

* variable strategy now accepts callable generating array strategies

* use only readable unicode characters in names

* examples

* only use unicode characters that docs can deal with

* docs: dataarrays -> variables

* update tests for variables strategy

* test values in attrs dict

* duck array type examples

* altered whatsnew

* maybe fix mypy

* fix some mypy errors

* more typing changes

* fix import

* skip doctests in docstrings

* fix link to duckarrays page

* don't actually try to run cupy in docs env

* missed a skip

* okwarning

* just remove the cupy example

* ensure shape is always passed to array_strategy_fn

* test using make_strategies_namespace

* test catching array_strategy_fn that returns different dtype

* test catching array_strategy_fn that returns different shape

* generalise test of attrs strategy

* remove misguided comments

* save working version of test_mean

* expose unique_subset_of

* generalize unique_subset_of to handle iterables

* type hint unique_subset_of using overloads

* use iterables in test_mean example

* test_mean example in docs now uses iterable of dimension_names

* fix some warnings in docs build

* example of passing list to unique_subset_of

* fix import in docs page

* try to satisfy sphinx

* Minor corrections to docs

* Add supported_dtypes to list of public strategies in docs

* Generate number of dimensions in test_given_arbitrary_dims_list

Co-authored-by: Zac Hatfield-Dodds <[email protected]>

* Update minimum version of hypothesis

Co-authored-by: Zac Hatfield-Dodds <[email protected]>

* fix incorrect indentation in autosummary

* link to docs page on testing

* use warning imperative for array API non-compliant dtypes

* fix bugs in sparse examples

* add tag for array API standard info

* move no-dependencies-on-other-values-inputs to given decorator

* generate everything that can be generated

* fix internal link to page on strategies

* split up TypeError messages for each arg

* use hypothesis.errors.InvalidArgument

* generalize tests for generating specific number of dimensions

* fix some typing errors

* test that reduction example in docs actually works

* fix typing errors

* simplify generation of sparse arrays in example

* fix import in docs example

* correct type hints in sparse example

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use .copy in convert_to_sparse

Co-authored-by: Justus Magin <[email protected]>

* Use st.builds in sparse example

Co-authored-by: Justus Magin <[email protected]>

* correct intersphinx link in whatsnew

* rename module containing assertion functions

* clarify sentence

* add general ImportError if hypothesis not installed

* add See Also link to strategies docs page from docstring of every strategy

* typo in ImportError message

* remove extra blank lines in examples

* remove smallish_arrays

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zac Hatfield-Dodds <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
4 people authored Dec 5, 2023
1 parent 1f94829 commit ab6a255
Showing 14 changed files with 1,077 additions and 10 deletions.
1 change: 1 addition & 0 deletions ci/requirements/doc.yml
@@ -9,6 +9,7 @@ dependencies:
- cartopy
- cfgrib
- dask-core>=2022.1
- hypothesis>=6.75.8
- h5netcdf>=0.13
- ipykernel
- ipywidgets # silence nbsphinx warning
21 changes: 21 additions & 0 deletions doc/api.rst
@@ -1069,6 +1069,27 @@ Testing
testing.assert_allclose
testing.assert_chunks_equal

Hypothesis Testing Strategies
=============================

.. currentmodule:: xarray

See the :ref:`documentation page on testing <testing.hypothesis>` for a guide on how to use these strategies.

.. warning::
    These strategies should be considered highly experimental, and liable to change at any time.

.. autosummary::
   :toctree: generated/

   testing.strategies.supported_dtypes
   testing.strategies.names
   testing.strategies.dimension_names
   testing.strategies.dimension_sizes
   testing.strategies.attrs
   testing.strategies.variables
   testing.strategies.unique_subset_of

Exceptions
==========

1 change: 1 addition & 0 deletions doc/conf.py
@@ -326,6 +326,7 @@
"dask": ("https://docs.dask.org/en/latest", None),
"cftime": ("https://unidata.github.io/cftime", None),
"sparse": ("https://sparse.pydata.org/en/latest/", None),
"hypothesis": ("https://hypothesis.readthedocs.io/en/latest/", None),
"cubed": ("https://tom-e-white.com/cubed/", None),
"datatree": ("https://xarray-datatree.readthedocs.io/en/latest/", None),
"xarray-tutorial": ("https://tutorial.xarray.dev/", None),
2 changes: 2 additions & 0 deletions doc/internals/duck-arrays-integration.rst
@@ -31,6 +31,8 @@ property needs to obey `numpy's broadcasting rules <https://numpy.org/doc/stable
(see also the `Python Array API standard's explanation <https://data-apis.org/array-api/latest/API_specification/broadcasting.html>`_
of these same rules).

.. _internals.duckarrays.array_api_standard:

Python Array API standard support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1 change: 1 addition & 0 deletions doc/user-guide/index.rst
@@ -25,4 +25,5 @@ examples that describe many common tasks that you can accomplish with xarray.
dask
plotting
options
testing
duckarrays
303 changes: 303 additions & 0 deletions doc/user-guide/testing.rst
@@ -0,0 +1,303 @@
.. _testing:

Testing your code
=================

.. ipython:: python
    :suppress:

    import numpy as np
    import pandas as pd
    import xarray as xr

    np.random.seed(123456)

.. _testing.hypothesis:

Hypothesis testing
------------------

.. note::

    Testing with hypothesis is a fairly advanced topic. Before reading this section, we recommend that you read
    our guide to xarray's :ref:`data structures`, become familiar with conventional unit testing in
    `pytest <https://docs.pytest.org/>`_, and look through the
    `hypothesis library documentation <https://hypothesis.readthedocs.io/>`_.

`The hypothesis library <https://hypothesis.readthedocs.io/>`_ is a powerful tool for property-based testing.
Instead of writing tests for one example at a time, it allows you to write tests parameterized by a source of many
dynamically generated examples. For example you might have written a test which you wish to be parameterized by the set
of all possible integers via :py:func:`hypothesis.strategies.integers()`.

Property-based testing is extremely powerful, because (unlike more conventional example-based testing) it can find bugs
that you did not even think to look for!
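
As a minimal sketch of the idea (plain hypothesis, no xarray involved; the test name is purely illustrative), a property-based test states something that should hold for every input and lets hypothesis generate the inputs:

```python
from hypothesis import given
import hypothesis.strategies as st


@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    # Property: sorting an already-sorted list changes nothing,
    # whatever list of integers hypothesis happens to generate.
    once = sorted(xs)
    assert sorted(once) == once


# Calling the decorated function directly runs it against many generated lists.
test_sorting_is_idempotent()
```

Under pytest the final call is unnecessary, since the test runner invokes the decorated function itself.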

Strategies
~~~~~~~~~~

Each source of examples is called a "strategy", and xarray provides a range of custom strategies which produce xarray
data structures containing arbitrary data. You can use these to efficiently test downstream code,
quickly ensuring that your code can handle xarray objects of all possible structures and contents.

These strategies are accessible in the :py:mod:`xarray.testing.strategies` module, which provides

.. currentmodule:: xarray

.. autosummary::

   testing.strategies.supported_dtypes
   testing.strategies.names
   testing.strategies.dimension_names
   testing.strategies.dimension_sizes
   testing.strategies.attrs
   testing.strategies.variables
   testing.strategies.unique_subset_of

These build upon the numpy and array API strategies offered in :py:mod:`hypothesis.extra.numpy` and :py:mod:`hypothesis.extra.array_api`:

.. ipython:: python

    import hypothesis.extra.numpy as npst

Generating Examples
~~~~~~~~~~~~~~~~~~~

To see an example of what each of these strategies might produce, you can call one followed by the ``.example()`` method,
which is a general hypothesis method valid for all strategies.

.. ipython:: python

    import xarray.testing.strategies as xrst

    xrst.variables().example()
    xrst.variables().example()
    xrst.variables().example()

You can see that calling ``.example()`` multiple times will generate different examples, giving you an idea of the wide
range of data that the xarray strategies can generate.

In your tests, however, you should not use ``.example()``. Instead, parameterize your tests with the
:py:func:`hypothesis.given` decorator:

.. ipython:: python

    from hypothesis import given

.. ipython:: python

    @given(xrst.variables())
    def test_function_that_acts_on_variables(var):
        assert func(var) == ...

Chaining Strategies
~~~~~~~~~~~~~~~~~~~

Xarray's strategies can accept other strategies as arguments, allowing you to customise the contents of the generated
examples.

.. ipython:: python

    # generate a Variable containing an array with a complex number dtype, but all other details still arbitrary
    from hypothesis.extra.numpy import complex_number_dtypes

    xrst.variables(dtype=complex_number_dtypes()).example()

This also works with custom strategies, or strategies defined in other packages.
For example you could imagine creating a ``chunks`` strategy to specify particular chunking patterns for a dask-backed array.

Fixing Arguments
~~~~~~~~~~~~~~~~

If you want to fix one aspect of the data structure, whilst allowing variation in the generated examples
over all other aspects, then use :py:func:`hypothesis.strategies.just()`.

.. ipython:: python

    import hypothesis.strategies as st

    # Generates only variable objects with dimensions ["x", "y"]
    xrst.variables(dims=st.just(["x", "y"])).example()

(This is technically another example of chaining strategies - :py:func:`hypothesis.strategies.just()` is simply a
special strategy that just contains a single example.)

To fix the length of dimensions you can instead pass ``dims`` as a mapping of dimension names to lengths
(i.e. following xarray objects' ``.sizes`` property), e.g.

.. ipython:: python

    # Generates only variables with dimensions ["x", "y"], of lengths 2 & 3 respectively
    xrst.variables(dims=st.just({"x": 2, "y": 3})).example()

You can also use this to specify that you want examples which are missing some part of the data structure, for instance

.. ipython:: python

    # Generates a Variable with no attributes
    xrst.variables(attrs=st.just({})).example()

Through a combination of chaining strategies and fixing arguments, you can specify quite complicated requirements on the
objects your chained strategy will generate.

.. ipython:: python

    fixed_x_variable_y_maybe_z = st.fixed_dictionaries(
        {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
    )
    fixed_x_variable_y_maybe_z.example()

    special_variables = xrst.variables(dims=fixed_x_variable_y_maybe_z)

    special_variables.example()
    special_variables.example()

Here we have used one of hypothesis' built-in strategies :py:func:`hypothesis.strategies.fixed_dictionaries` to create a
strategy which generates mappings of dimension names to lengths (i.e. the ``sizes`` of the xarray object we want).
This particular strategy will always generate an ``x`` dimension of length 2, and a ``y`` dimension of
length either 3 or 4, and will sometimes also generate a ``z`` dimension of length 2.
By feeding this strategy for dictionaries into the ``dims`` argument of xarray's :py:func:`~testing.strategies.variables` strategy,
we can generate arbitrary :py:class:`~xarray.Variable` objects whose dimensions will always match these specifications.
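
To check that such a dictionary strategy really honours its specification before chaining it into another strategy, it can itself be put under ``given``. The sketch below uses only hypothesis; the test name is illustrative and not part of xarray:

```python
from hypothesis import given
import hypothesis.strategies as st

# Same specification as above: fixed "x", variable "y", optional "z".
dim_sizes = st.fixed_dictionaries(
    {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
)


@given(dim_sizes)
def test_dim_sizes_match_specification(sizes):
    # Required keys are always present with the requested lengths.
    assert sizes["x"] == 2
    assert sizes["y"] in (3, 4)
    # No keys other than the declared ones may appear.
    assert set(sizes) <= {"x", "y", "z"}
    # The optional key, when present, has its fixed length.
    if "z" in sizes:
        assert sizes["z"] == 2


test_dim_sizes_match_specification()
```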

Generating Duck-type Arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Xarray objects don't have to wrap numpy arrays; in fact, they can wrap any array type which presents the same API as a
numpy array (so-called "duck array wrapping", see :ref:`wrapping numpy-like arrays <internals.duckarrays>`).

Imagine we want to write a strategy which generates arbitrary ``Variable`` objects, each of which wraps a
:py:class:`sparse.COO` array instead of a ``numpy.ndarray``. How could we do that? There are two ways:

1. Create an xarray object with numpy data and use hypothesis' ``.map()`` method to convert the underlying array to a
different type:

.. ipython:: python

    import sparse

.. ipython:: python

    def convert_to_sparse(var):
        return var.copy(data=sparse.COO.from_numpy(var.to_numpy()))

.. ipython:: python

    sparse_variables = xrst.variables(dims=xrst.dimension_names(min_dims=1)).map(
        convert_to_sparse
    )

    sparse_variables.example()
    sparse_variables.example()

2. Pass a function which returns a strategy which generates the duck-typed arrays directly to the ``array_strategy_fn`` argument of the xarray strategies:

.. ipython:: python

    def sparse_random_arrays(
        shape: tuple[int, ...],
    ) -> st.SearchStrategy[sparse._coo.core.COO]:
        """Strategy which generates random sparse.COO arrays"""
        if shape is None:
            shape = npst.array_shapes()
        else:
            shape = st.just(shape)
        density = st.integers(min_value=0, max_value=1)
        # note sparse.random does not accept a dtype kwarg
        return st.builds(sparse.random, shape=shape, density=density)


    def sparse_random_arrays_fn(
        *, shape: tuple[int, ...], dtype: np.dtype
    ) -> st.SearchStrategy[sparse._coo.core.COO]:
        return sparse_random_arrays(shape=shape)

.. ipython:: python

    sparse_random_variables = xrst.variables(
        array_strategy_fn=sparse_random_arrays_fn, dtype=st.just(np.dtype("float64"))
    )
    sparse_random_variables.example()

Either approach is fine, but one may be more convenient than the other depending on the type of the duck array which you
want to wrap.

Compatibility with the Python Array API Standard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Xarray aims to be compatible with any duck-array type that conforms to the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_
(see our :ref:`docs on Array API Standard support <internals.duckarrays.array_api_standard>`).

.. warning::

    The strategies defined in :py:mod:`testing.strategies` are **not** guaranteed to use array API standard-compliant
    dtypes by default.
    For example arrays with the dtype ``np.dtype('float16')`` may be generated by :py:func:`testing.strategies.variables`
    (assuming the ``dtype`` kwarg was not explicitly passed), despite ``np.dtype('float16')`` not being in the
    array API standard.

If the array type you want to generate has an array API-compliant top-level namespace
(e.g. that which is conventionally imported as ``xp`` or similar),
you can use this neat trick:

.. ipython:: python
    :okwarning:

    from numpy import array_api as xp  # available in numpy 1.26.0 (removed in numpy 2.0)

    from hypothesis.extra.array_api import make_strategies_namespace

    xps = make_strategies_namespace(xp)

    xp_variables = xrst.variables(
        array_strategy_fn=xps.arrays,
        dtype=xps.scalar_dtypes(),
    )
    xp_variables.example()

Another array API-compliant duck array library would replace the import, e.g. ``import cupy as cp`` instead.

Testing over Subsets of Dimensions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A common task when testing xarray user code is checking that your function works for all valid input dimensions.
We can chain strategies to achieve this, for which the helper strategy :py:func:`~testing.strategies.unique_subset_of`
is useful.

It works for lists of dimension names

.. ipython:: python

    dims = ["x", "y", "z"]
    xrst.unique_subset_of(dims).example()
    xrst.unique_subset_of(dims).example()

as well as for mappings of dimension names to sizes

.. ipython:: python

    dim_sizes = {"x": 2, "y": 3, "z": 4}
    xrst.unique_subset_of(dim_sizes).example()
    xrst.unique_subset_of(dim_sizes).example()

This is useful because operations like reductions can be performed over any subset of the xarray object's dimensions.
For example we can write a pytest test that checks that a reduction gives the expected result when applied
along any valid subset of the Variable's dimensions.

.. code-block:: python

    import numpy.testing as npt


    @given(st.data(), xrst.variables(dims=xrst.dimension_names(min_dims=1)))
    def test_mean(data, var):
        """Test that the mean of an xarray Variable is always equal to the mean of the underlying array."""

        # specify arbitrary reduction along at least one dimension
        reduction_dims = data.draw(xrst.unique_subset_of(var.dims, min_size=1))

        # create expected result (using nanmean because arrays with NaNs will be generated)
        reduction_axes = tuple(var.get_axis_num(dim) for dim in reduction_dims)
        expected = np.nanmean(var.data, axis=reduction_axes)

        # assert property is always satisfied
        result = var.mean(dim=reduction_dims).data
        npt.assert_equal(expected, result)
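
For intuition about the space of reductions such a test samples from, here is a deterministic sketch in plain NumPy (no xarray or hypothesis; the array shape and seed are arbitrary choices made for illustration). It enumerates every non-empty subset of axes of one fixed array and checks that reducing over the whole subset at once agrees with reducing one axis at a time:

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(2, 3, 4))

# Every non-empty subset of the three axes: 2**3 - 1 = 7 subsets.
axis_subsets = [
    subset
    for size in range(1, data.ndim + 1)
    for subset in combinations(range(data.ndim), size)
]

for subset in axis_subsets:
    # Reduce over all axes in the subset in a single call.
    joint = data.mean(axis=subset)

    # Reduce one axis at a time, highest axis first, so that the
    # remaining axis numbers stay valid after each reduction.
    stepwise = data
    for axis in sorted(subset, reverse=True):
        stepwise = stepwise.mean(axis=axis)

    np.testing.assert_allclose(joint, stepwise)
```

A property-based test does not enumerate the subsets like this; it draws them, together with arbitrary array contents, which is what lets it cover shapes and values nobody wrote down by hand.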
4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -23,6 +23,10 @@ v2023.11.1 (unreleased)
New Features
~~~~~~~~~~~~

- Added hypothesis strategies for generating :py:class:`xarray.Variable` objects containing arbitrary data, useful for parametrizing downstream tests.
Accessible under :py:mod:`testing.strategies`, and documented in a new page on testing in the User Guide.
(:issue:`6911`, :pull:`8404`)
By `Tom Nicholas <https://github.com/TomNicholas>`_.
- :py:meth:`rolling` uses `numbagg <https://github.com/numbagg/numbagg>`_ for
most of its computations by default. Numbagg is up to 5x faster than bottleneck
where parallelization is possible. Where parallelization isn't possible — for
3 changes: 2 additions & 1 deletion xarray/core/types.py
@@ -173,7 +173,8 @@ def copy(

# Temporary placeholder for indicating an array api compliant type.
# hopefully in the future we can narrow this down more:
-T_DuckArray = TypeVar("T_DuckArray", bound=Any)
+T_DuckArray = TypeVar("T_DuckArray", bound=Any, covariant=True)


ScalarOrArray = Union["ArrayLike", np.generic, np.ndarray, "DaskArray"]
VarCompatible = Union["Variable", "ScalarOrArray"]
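
Marking a ``TypeVar`` as covariant affects static type checking only, not runtime behaviour. As a rough illustration of what covariance permits (the ``Animal``, ``Dog``, ``Box`` and ``describe`` names are hypothetical and unrelated to xarray), a generic container parameterized by a covariant type variable can be passed where a container of the parent type is expected:

```python
from typing import Generic, TypeVar

T_co = TypeVar("T_co", covariant=True)


class Animal:
    pass


class Dog(Animal):
    pass


class Box(Generic[T_co]):
    """Read-only container: T_co only appears in return positions of its methods."""

    def __init__(self, item: T_co) -> None:
        self._item = item

    def get(self) -> T_co:
        return self._item


def describe(box: Box[Animal]) -> str:
    # Returns the class name of whatever the box holds.
    return type(box.get()).__name__


# A static type checker accepts Box[Dog] here because T_co is covariant;
# with an invariant TypeVar the same call would be flagged.
print(describe(Box(Dog())))
```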