
BUG: pd.to_datetime failing to parse with exception error 01-Jun-2025 in sequence with 31-May-2025 #61395


Closed
3 tasks done
johndrummond opened this issue May 3, 2025 · 4 comments
Labels
Bug, Datetime

Comments

@johndrummond

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import sys

print(f"Pandas version: {pd.__version__}")
print(f"Python version: {sys.version}")

df = pd.DataFrame({'day': ["31-May-2025","01-Jun-2025","02-Jun-2025"]})
pd.to_datetime(df['day'])

Issue Description

gives
'Pandas version: 2.2.3'
'Python version: 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0]'

ValueError: time data "01-Jun-2025" doesn't match format "%d-%B-%Y", at position 1. You might want to try:
- passing format if your strings have a consistent format;
- passing format='ISO8601' if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing format='mixed', and the format will be inferred for each element individually. You might want to use dayfirst alongside this.
File , line 2
1 df = pd.DataFrame({'day': ["31-May-2025","01-Jun-2025","02-Jun-2025"]})
----> 2 pd.to_datetime(df['day'])
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pandas/core/tools/datetimes.py:1067, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
1065 result = arg.map(cache_array)
1066 else:
-> 1067 values = convert_listlike(arg._values, format)
1068 result = arg._constructor(values, index=arg.index, name=arg.name)
1069 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pandas/core/tools/datetimes.py:433, in _convert_listlike_datetimes(arg, format, name, utc, unit, errors, dayfirst, yearfirst, exact)
431 # format could be inferred, or user didn't ask for mixed-format parsing.
432 if format is not None and format != "mixed":
--> 433 return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
435 result, tz_parsed = objects_to_datetime64(
436 arg,
437 dayfirst=dayfirst,
(...)
441 allow_object=True,

Expected Behavior

It should parse correctly with no exception.
Interestingly, the failure only occurs at the transition from the end of May to the start of June. A series starting with 01-Jun-2025 works, and a series ending with 31-May-2025 works.
dateparser.parse handles these strings without complaint.
I'm guessing it infers a full month name from "May" when it is in fact a three-character abbreviation.

Installed Versions

Running in a Databricks notebook; also checked locally in a separate Python installation with pandas 2.2.1.
'Pandas version: 2.2.3'
'Python version: 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0]' for the notebook.
pd.show_versions() doesn't return anything in the notebook.

Locally:
Pandas version: 2.2.1
Python version: 3.12.2 (main, Mar 25 2024, 11:48:28) [Clang 15.0.0 (clang-1500.3.9.4)]

and pd.show_versions() gives:

FileNotFoundError Traceback (most recent call last)
File /Users/J.Drummond/Documents/wip/python/truth_soc_1.py:2
1 # %%
----> 2 pd.show_versions()

File ~/Documents/wip/python/.venv/lib/python3.12/site-packages/pandas/util/_print_versions.py:141, in show_versions(as_json)
104 """
105 Provide useful information, important for bug reports.
106
(...)
138 ...
139 """
140 sys_info = _get_sys_info()
--> 141 deps = _get_dependency_info()
143 if as_json:
144 j = {"system": sys_info, "dependencies": deps}

File ~/Documents/wip/python/.venv/lib/python3.12/site-packages/pandas/util/_print_versions.py:98, in _get_dependency_info()
96 result: dict[str, JSONSerializable] = {}
97 for modname in deps:
---> 98 mod = import_optional_dependency(modname, errors="ignore")
99 result[modname] = get_version(mod) if mod else None
100 return result
...

johndrummond added the Bug and Needs Triage labels on May 3, 2025
@rhshadrach
Member

Thanks for the report.

interestingly it's having the transition end of may. start of June. Starting with 01-Jun-2025 works, ending with 31-May-2025 works

When given no other information, pandas needs to infer the format from the first value. Starting with May, the short-form and the long-form of the month are the same. Thus pandas needs to guess. Regardless of how pandas guesses, some guesses will be wrong.
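The ambiguity can be seen outside pandas with the standard library's datetime.strptime. This is only a sketch of the underlying problem, not pandas' actual inference code; the helper name matches is hypothetical, and it assumes the default C/English locale for month names.

```python
from datetime import datetime


def matches(value: str, fmt: str) -> bool:
    """Return True if `value` parses under `fmt` (hypothetical helper)."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# "May" is both the abbreviated (%b) and the full (%B) month name,
# so the first value alone cannot tell the two formats apart.
for value in ["31-May-2025", "01-Jun-2025"]:
    results = {fmt: matches(value, fmt) for fmt in ("%d-%b-%Y", "%d-%B-%Y")}
    print(value, results)
```

"31-May-2025" is accepted by both formats, while "01-Jun-2025" is accepted only by the %b variant, so a format guessed from the May value alone can be wrong for the rest of the column.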

The resolution is provided in the error message: pass a format string. In this case, it's format="%d-%b-%Y".
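Applied to the reproducible example from the report, the suggested fix might look like the following. The variable name parsed is just for illustration, and the sketch assumes month abbreviations are in English.

```python
import pandas as pd

df = pd.DataFrame({"day": ["31-May-2025", "01-Jun-2025", "02-Jun-2025"]})

# Passing the format explicitly skips inference from the first value,
# so "May" is treated as an abbreviated month name (%b), not a full one (%B).
parsed = pd.to_datetime(df["day"], format="%d-%b-%Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
```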

Closing.

rhshadrach added the Datetime label and removed the Needs Triage label on May 3, 2025
@asishm
Contributor

asishm commented May 3, 2025

see #58328 for additional context - duplicate of this

@johndrummond
Author

Sorry to have missed the previous discussion. Interestingly, if the data starts with any month other than May, it works fine, which is what happened for us. Then, once the data starts in May, it throws an exception. But that's not a bug :)

@johndrummond
Author

johndrummond commented May 4, 2025 via email


3 participants