Skip to content

Commit

Permalink
Examples added to docstrings (pydata#7936)
Browse files Browse the repository at this point in the history
* xarray.dataset.tail

* xarray.dataset.head

* xarray.dataset.dropna

* xarray.dataset.ffill

* xarray.dataset.bfill

* xarray.dataset.set_Coords

* xarray.dataset.reset_coords

* indentation changes

* indentation

* reset_coords example

* tail_edited

* bfill change

* change

* changes

* indented

* indented

* minute_changes

* doctest failure change

* changes_

* change

* change

* change

* indented

* .

* what's new

* .

* head & tail

* bfill & ffill

* dropna

* head & tail

* doctest

* doctest error

* Update xarray/core/dataset.py

Co-authored-by: Tom Nicholas <[email protected]>

* .

* Fix doctest

---------

Co-authored-by: Tom Nicholas <[email protected]>
  • Loading branch information
harshitha1201 and TomNicholas authored Jul 11, 2023
1 parent 743486f commit 799f12d
Show file tree
Hide file tree
Showing 2 changed files with 316 additions and 2 deletions.
3 changes: 3 additions & 0 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,9 @@ Bug fixes
Documentation
~~~~~~~~~~~~~

- Added examples to docstrings of :py:meth:`Dataset.tail`, :py:meth:`Dataset.head`, :py:meth:`Dataset.dropna`,
:py:meth:`Dataset.ffill`, :py:meth:`Dataset.bfill`, :py:meth:`Dataset.set_coords`, :py:meth:`Dataset.reset_coords`
(:issue:`6793`, :pull:`7936`) By `Harshitha <https://github.com/harshitha1201>`_ .
- Added page on wrapping chunked numpy-like arrays as alternatives to dask arrays.
(:pull:`7951`) By `Tom Nicholas <https://github.com/TomNicholas>`_.
- Expanded the page on wrapping numpy-like "duck" arrays.
Expand Down
315 changes: 313 additions & 2 deletions xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -1742,6 +1742,33 @@ def set_coords(self: T_Dataset, names: Hashable | Iterable[Hashable]) -> T_Datas
names : hashable or iterable of hashable
Name(s) of variables in this dataset to convert into coordinates.
Examples
--------
>>> dataset = xr.Dataset(
... {
... "pressure": ("time", [1.013, 1.2, 3.5]),
... "time": pd.date_range("2023-01-01", periods=3),
... }
... )
>>> dataset
<xarray.Dataset>
Dimensions: (time: 3)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 2023-01-03
Data variables:
pressure (time) float64 1.013 1.2 3.5
>>> dataset.set_coords("pressure")
<xarray.Dataset>
Dimensions: (time: 3)
Coordinates:
pressure (time) float64 1.013 1.2 3.5
* time (time) datetime64[ns] 2023-01-01 2023-01-02 2023-01-03
Data variables:
*empty*
On calling ``set_coords`` , these data variables are converted to coordinates, as shown in the final dataset.
Returns
-------
Dataset
Expand Down Expand Up @@ -1780,9 +1807,66 @@ def reset_coords(
If True, remove coordinates instead of converting them into
variables.
Examples
--------
>>> dataset = xr.Dataset(
... {
... "temperature": (
... ["time", "lat", "lon"],
... [[[25, 26], [27, 28]], [[29, 30], [31, 32]]],
... ),
... "precipitation": (
... ["time", "lat", "lon"],
... [[[0.5, 0.8], [0.2, 0.4]], [[0.3, 0.6], [0.7, 0.9]]],
... ),
... },
... coords={
... "time": pd.date_range(start="2023-01-01", periods=2),
... "lat": [40, 41],
... "lon": [-80, -79],
... "altitude": 1000,
... },
... )
# Dataset before resetting coordinates
>>> dataset
<xarray.Dataset>
Dimensions: (time: 2, lat: 2, lon: 2)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02
* lat (lat) int64 40 41
* lon (lon) int64 -80 -79
altitude int64 1000
Data variables:
temperature (time, lat, lon) int64 25 26 27 28 29 30 31 32
precipitation (time, lat, lon) float64 0.5 0.8 0.2 0.4 0.3 0.6 0.7 0.9
# Reset the 'altitude' coordinate
>>> dataset_reset = dataset.reset_coords("altitude")
# Dataset after resetting coordinates
>>> dataset_reset
<xarray.Dataset>
Dimensions: (time: 2, lat: 2, lon: 2)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02
* lat (lat) int64 40 41
* lon (lon) int64 -80 -79
Data variables:
temperature (time, lat, lon) int64 25 26 27 28 29 30 31 32
precipitation (time, lat, lon) float64 0.5 0.8 0.2 0.4 0.3 0.6 0.7 0.9
altitude int64 1000
Returns
-------
Dataset
See Also
--------
Dataset.set_coords
"""
if names is None:
names = self._coord_names - set(self._indexes)
Expand Down Expand Up @@ -2742,6 +2826,50 @@ def head(
The keyword arguments form of ``indexers``.
One of indexers or indexers_kwargs must be provided.
Examples
--------
>>> dates = pd.date_range(start="2023-01-01", periods=5)
>>> pageviews = [1200, 1500, 900, 1800, 2000]
>>> visitors = [800, 1000, 600, 1200, 1500]
>>> dataset = xr.Dataset(
... {
... "pageviews": (("date"), pageviews),
... "visitors": (("date"), visitors),
... },
... coords={"date": dates},
... )
>>> busiest_days = dataset.sortby("pageviews", ascending=False)
>>> busiest_days.head()
<xarray.Dataset>
Dimensions: (date: 5)
Coordinates:
* date (date) datetime64[ns] 2023-01-05 2023-01-04 ... 2023-01-03
Data variables:
pageviews (date) int64 2000 1800 1500 1200 900
visitors (date) int64 1500 1200 1000 800 600
# Retrieve the 3 most busiest days in terms of pageviews
>>> busiest_days.head(3)
<xarray.Dataset>
Dimensions: (date: 3)
Coordinates:
* date (date) datetime64[ns] 2023-01-05 2023-01-04 2023-01-02
Data variables:
pageviews (date) int64 2000 1800 1500
visitors (date) int64 1500 1200 1000
# Using a dictionary to specify the number of elements for specific dimensions
>>> busiest_days.head({"date": 3})
<xarray.Dataset>
Dimensions: (date: 3)
Coordinates:
* date (date) datetime64[ns] 2023-01-05 2023-01-04 2023-01-02
Data variables:
pageviews (date) int64 2000 1800 1500
visitors (date) int64 1500 1200 1000
See Also
--------
Dataset.tail
Expand Down Expand Up @@ -2788,6 +2916,48 @@ def tail(
The keyword arguments form of ``indexers``.
One of indexers or indexers_kwargs must be provided.
Examples
--------
>>> activity_names = ["Walking", "Running", "Cycling", "Swimming", "Yoga"]
>>> durations = [30, 45, 60, 45, 60] # in minutes
>>> energies = [150, 300, 250, 400, 100] # in calories
>>> dataset = xr.Dataset(
... {
... "duration": (["activity"], durations),
... "energy_expenditure": (["activity"], energies),
... },
... coords={"activity": activity_names},
... )
>>> sorted_dataset = dataset.sortby("energy_expenditure", ascending=False)
>>> sorted_dataset
<xarray.Dataset>
Dimensions: (activity: 5)
Coordinates:
* activity (activity) <U8 'Swimming' 'Running' ... 'Walking' 'Yoga'
Data variables:
duration (activity) int64 45 45 60 30 60
energy_expenditure (activity) int64 400 300 250 150 100
# Activities with the least energy expenditures using tail()
>>> sorted_dataset.tail(3)
<xarray.Dataset>
Dimensions: (activity: 3)
Coordinates:
* activity (activity) <U8 'Cycling' 'Walking' 'Yoga'
Data variables:
duration (activity) int64 60 30 60
energy_expenditure (activity) int64 250 150 100
>>> sorted_dataset.tail({"activity": 3})
<xarray.Dataset>
Dimensions: (activity: 3)
Coordinates:
* activity (activity) <U8 'Cycling' 'Walking' 'Yoga'
Data variables:
duration (activity) int64 60 30 60
energy_expenditure (activity) int64 250 150 100
See Also
--------
Dataset.head
Expand Down Expand Up @@ -5617,6 +5787,70 @@ def dropna(
Which variables to check for missing values. By default, all
variables in the dataset are checked.
Examples
--------
>>> dataset = xr.Dataset(
... {
... "temperature": (
... ["time", "location"],
... [[23.4, 24.1], [np.nan, 22.1], [21.8, 24.2], [20.5, 25.3]],
... )
... },
... coords={"time": [1, 2, 3, 4], "location": ["A", "B"]},
... )
>>> dataset
<xarray.Dataset>
Dimensions: (time: 4, location: 2)
Coordinates:
* time (time) int64 1 2 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 nan 22.1 21.8 24.2 20.5 25.3
# Drop NaN values from the dataset
>>> dataset.dropna(dim="time")
<xarray.Dataset>
Dimensions: (time: 3, location: 2)
Coordinates:
* time (time) int64 1 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 21.8 24.2 20.5 25.3
# Drop labels with any NAN values
>>> dataset.dropna(dim="time", how="any")
<xarray.Dataset>
Dimensions: (time: 3, location: 2)
Coordinates:
* time (time) int64 1 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 21.8 24.2 20.5 25.3
# Drop labels with all NAN values
>>> dataset.dropna(dim="time", how="all")
<xarray.Dataset>
Dimensions: (time: 4, location: 2)
Coordinates:
* time (time) int64 1 2 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 nan 22.1 21.8 24.2 20.5 25.3
# Drop labels with less than 2 non-NA values
>>> dataset.dropna(dim="time", thresh=2)
<xarray.Dataset>
Dimensions: (time: 3, location: 2)
Coordinates:
* time (time) int64 1 3 4
* location (location) <U1 'A' 'B'
Data variables:
temperature (time, location) float64 23.4 24.1 21.8 24.2 20.5 25.3
Returns
-------
Dataset
Expand Down Expand Up @@ -5877,18 +6111,56 @@ def ffill(self: T_Dataset, dim: Hashable, limit: int | None = None) -> T_Dataset
Parameters
----------
dim : Hashable
Specifies the dimension along which to propagate values when
filling.
Specifies the dimension along which to propagate values when filling.
limit : int or None, optional
The maximum number of consecutive NaN values to forward fill. In
other words, if there is a gap with more than this number of
consecutive NaNs, it will only be partially filled. Must be greater
than 0 or None for no limit. Must be None or greater than or equal
to axis length if filling along chunked axes (dimensions).
Examples
--------
>>> time = pd.date_range("2023-01-01", periods=10, freq="D")
>>> data = np.array(
... [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, 8, np.nan, 10]
... )
>>> dataset = xr.Dataset({"data": (("time",), data)}, coords={"time": time})
>>> dataset
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 nan nan nan 5.0 nan nan 8.0 nan 10.0
# Perform forward fill (ffill) on the dataset
>>> dataset.ffill(dim="time")
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 1.0 1.0 1.0 5.0 5.0 5.0 8.0 8.0 10.0
# Limit the forward filling to a maximum of 2 consecutive NaN values
>>> dataset.ffill(dim="time", limit=2)
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 1.0 1.0 nan 5.0 5.0 5.0 8.0 8.0 10.0
Returns
-------
Dataset
See Also
--------
Dataset.bfill
"""
from xarray.core.missing import _apply_over_vars_with_dim, ffill

Expand All @@ -5912,9 +6184,48 @@ def bfill(self: T_Dataset, dim: Hashable, limit: int | None = None) -> T_Dataset
than 0 or None for no limit. Must be None or greater than or equal
to axis length if filling along chunked axes (dimensions).
Examples
--------
>>> time = pd.date_range("2023-01-01", periods=10, freq="D")
>>> data = np.array(
... [1, np.nan, np.nan, np.nan, 5, np.nan, np.nan, 8, np.nan, 10]
... )
>>> dataset = xr.Dataset({"data": (("time",), data)}, coords={"time": time})
>>> dataset
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 nan nan nan 5.0 nan nan 8.0 nan 10.0
# filled dataset, fills NaN values by propagating values backward
>>> dataset.bfill(dim="time")
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 5.0 5.0 5.0 5.0 8.0 8.0 8.0 10.0 10.0
# Limit the backward filling to a maximum of 2 consecutive NaN values
>>> dataset.bfill(dim="time", limit=2)
<xarray.Dataset>
Dimensions: (time: 10)
Coordinates:
* time (time) datetime64[ns] 2023-01-01 2023-01-02 ... 2023-01-10
Data variables:
data (time) float64 1.0 nan 5.0 5.0 5.0 8.0 8.0 8.0 10.0 10.0
Returns
-------
Dataset
See Also
--------
Dataset.ffill
"""
from xarray.core.missing import _apply_over_vars_with_dim, bfill

Expand Down

0 comments on commit 799f12d

Please sign in to comment.