Speed up decode_cf_datetime (pydata#1414)
* Speed up `decode_cf_datetime`

Instead of casting the input numeric dates to float, they are
cast to integer nanoseconds, which makes `pd.to_timedelta()`
work much faster (roughly a 100x speedup on my machine)

* Moved _NS_PER_TIME_DELTA to top of module file

* Added entry to `whats-new.rst`
cchwala authored and Joe Hamman committed Jul 25, 2017
1 parent 5d245b2 commit d275ad6
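
The idea in the commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name `decode_with_int_ns` and the table name `NS_PER_UNIT` are hypothetical, and the sketch assumes the dates fit in the `datetime64[ns]` range.

```python
import numpy as np
import pandas as pd

# Nanoseconds per time unit (hypothetical name; mirrors the lookup table
# that this commit adds to the module).
NS_PER_UNIT = {'us': 1e3, 'ms': 1e6, 's': 1e9,
               'm': 60e9, 'h': 3600e9, 'D': 86400e9}


def decode_with_int_ns(num_dates, unit, ref_date):
    """Hypothetical helper: decode numeric offsets via integer nanoseconds."""
    flat = np.asarray(num_dates).ravel()
    # Scale to nanoseconds, then cast to int64 so pandas receives integers.
    # Fractions below one nanosecond are truncated by the cast.
    ns = (flat * NS_PER_UNIT[unit]).astype(np.int64)
    dates = (pd.to_timedelta(ns, unit='ns') + pd.Timestamp(ref_date)).values
    return dates.reshape(np.shape(num_dates))


decoded = decode_with_int_ns([[0, 1], [2.5, 3]], 'D', '2000-01-01')
```

The input keeps its shape, and fractional offsets such as 2.5 days survive because the scaling happens before the integer cast.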
Showing 2 changed files with 24 additions and 3 deletions.
3 changes: 3 additions & 0 deletions doc/whats-new.rst
@@ -25,6 +25,9 @@ Enhancements
raster files are opened with :py:func:`~xarray.open_rasterio`.
By `Greg Brener <https://github.com/gbrener>`_

+- Speed-up (x 100) of :py:func:`~xarray.conventions.decode_cf_datetime`.
+  By `Christian Chwala <https://github.com/cchwala>`_.

Bug fixes
~~~~~~~~~

24 changes: 21 additions & 3 deletions xarray/conventions.py
@@ -24,6 +24,13 @@
# standard calendars recognized by netcdftime
_STANDARD_CALENDARS = set(['standard', 'gregorian', 'proleptic_gregorian'])

+_NS_PER_TIME_DELTA = {'us': 1e3,
+'ms': 1e6,
+'s': 1e9,
+'m': 1e9 * 60,
+'h': 1e9 * 60 * 60,
+'D': 1e9 * 60 * 60 * 24}


def mask_and_scale(array, fill_value=None, scale_factor=None, add_offset=None,
dtype=float):
@@ -126,11 +133,14 @@ def decode_cf_datetime(num_dates, units, calendar=None):
operations, which makes it much faster than netCDF4.num2date. In such a
case, the returned array will be of type np.datetime64.
+Note that the time unit in `units` must not be smaller than microseconds and
+not larger than days.
See also
--------
netCDF4.num2date
"""
-num_dates = np.asarray(num_dates, dtype=float)
+num_dates = np.asarray(num_dates)
flat_num_dates = num_dates.ravel()
if calendar is None:
calendar = 'standard'
@@ -155,10 +165,18 @@ def decode_cf_datetime(num_dates, units, calendar=None):
pd.to_timedelta(flat_num_dates.min(), delta) + ref_date
pd.to_timedelta(flat_num_dates.max(), delta) + ref_date

-dates = (pd.to_timedelta(flat_num_dates, delta) + ref_date).values
+# Cast input dates to integers of nanoseconds because `pd.to_timedelta`
+# works much faster when dealing with integers
+flat_num_dates_ns_int = (flat_num_dates *
+_NS_PER_TIME_DELTA[delta]).astype(np.int64)
+
+dates = (pd.to_timedelta(flat_num_dates_ns_int, 'ns') +
+ref_date).values

except (OutOfBoundsDatetime, OverflowError):
-dates = _decode_datetime_with_netcdf4(flat_num_dates, units, calendar)
+dates = _decode_datetime_with_netcdf4(flat_num_dates.astype(np.float),
+units,
+calendar)

return dates.reshape(num_dates.shape)
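
The `except` branch above exists because `datetime64[ns]` only spans roughly the years 1677 to 2262, and the diff probes the minimum and maximum offsets in their original units before the integer cast. The sketch below shows the same range check done explicitly. It is an illustration under assumptions, not what the commit does: the commit relies on pandas raising an exception, while the helper `fits_in_ns_range` and the table `NS_PER_UNIT` here are hypothetical names.

```python
import numpy as np
import pandas as pd

# Nanoseconds per time unit (hypothetical name, same values as the diff).
NS_PER_UNIT = {'us': 1e3, 'ms': 1e6, 's': 1e9,
               'm': 60e9, 'h': 3600e9, 'D': 86400e9}


def fits_in_ns_range(num_dates, unit, ref_date):
    """Hypothetical check: do all decoded dates fit in datetime64[ns]?"""
    flat = np.asarray(num_dates, dtype=float).ravel()
    # Reference date as nanoseconds since the Unix epoch.
    ref_ns = pd.Timestamp(ref_date).value
    # Only the extremes matter; float arithmetic is approximate very close
    # to the bounds, which is acceptable for an illustrative check.
    lowest = ref_ns + flat.min() * NS_PER_UNIT[unit]
    highest = ref_ns + flat.max() * NS_PER_UNIT[unit]
    return bool(pd.Timestamp.min.value <= lowest
                and highest <= pd.Timestamp.max.value)


in_range = fits_in_ns_range([0, 365], 'D', '2000-01-01')
too_far = fits_in_ns_range([0, 200000], 'D', '2000-01-01')
```

When a check like this fails, the fast integer path cannot be used, which is the situation the netCDF4-based fallback in the diff handles. The sketch assumes a non-empty input, since `min()` and `max()` fail on empty arrays.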
