BUG: pd.to_datetime failing to parse with exception error 01-Jun-2025 in sequence with 31-May-2025 #61395
Comments
Thanks for the report.
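When given no other information, pandas needs to infer the format from the first value. Starting with May, the short-form and the long-form of the month are the same. Thus pandas needs to guess. Regardless of how pandas guesses, some guesses will be wrong. The resolution is provided in the error message: pass a format string. In this case, it's format="%d-%b-%Y". Closing.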
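A minimal sketch of that resolution, using the three values from the report and the `%d-%b-%Y` format string quoted in this thread. The variable names are illustrative only; `%b` is the abbreviated month name, which is what all three strings contain.

```python
import pandas as pd

# The three values from the report. The first one ("May") is ambiguous
# between the abbreviated (%b) and the full (%B) month name.
days = pd.Series(["31-May-2025", "01-Jun-2025", "02-Jun-2025"])

# Passing an explicit format means pandas does not have to infer one
# from the first element.
parsed = pd.to_datetime(days, format="%d-%b-%Y")
print(parsed)
```

With the format given explicitly, the order of the values no longer matters.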
See #58328 for additional context; this is a duplicate of that issue.
Sorry to have missed the previous discussion. Interestingly, if the data starts with any month other than May it works fine, which is what happened for us. Then, once the data starts in May, it throws an exception. But that's not a bug :)
Just wondering whether, when the first value is ambiguous, it could guess from more than the first value.
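To make the suggestion concrete, here is a rough sketch of what "guessing from more than the first value" could look like. It is not how pandas works: `CANDIDATE_FORMATS`, `parses`, and `pick_format` are hypothetical names, the candidate list is limited to the two formats in question, and the standard library's `strptime` stands in for pandas' own parser (assuming an English locale for month names).

```python
from datetime import datetime
from typing import List, Optional

# Hypothetical candidate list for illustration; real format inference
# would have to consider far more than these two formats.
CANDIDATE_FORMATS = ["%d-%B-%Y", "%d-%b-%Y"]


def parses(value: str, fmt: str) -> bool:
    """Return True if `value` can be parsed with `fmt`."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def pick_format(values: List[str]) -> Optional[str]:
    """Return the first candidate format that every value satisfies."""
    for fmt in CANDIDATE_FORMATS:
        if all(parses(value, fmt) for value in values):
            return fmt
    return None


chosen = pick_format(["31-May-2025", "01-Jun-2025", "02-Jun-2025"])
print(chosen)
```

The trade-off is cost: checking every value against every candidate is more work than inspecting only the first element, and some inputs would still be ambiguous (a column containing only May dates, for instance).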
…On Sat, 3 May 2025, 16:12 Richard Shadrach, ***@***.***> wrote:
*rhshadrach* left a comment (pandas-dev/pandas#61395)
<#61395 (comment)>
Thanks for the report.
interestingly it's having the transition end of may. start of June.
Starting with 01-Jun-2025 works, ending with 31-May-2025 works
When given no other information, pandas needs to infer the format from the
first value. Starting with May, the short-form and the long-form of the
month are the same. Thus pandas needs to guess. Regardless of how pandas
guesses, some guesses will be wrong.
The resolution is provided in the error message: pass a format string. In
this case, it's format="%d-%b-%Y".
Closing.
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
gives
'Pandas version: 2.2.3'
'Python version: 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0]'
ValueError: time data "01-Jun-2025" doesn't match format "%d-%B-%Y", at position 1. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
File , line 2
1 df = pd.DataFrame({'day': ["31-May-2025","01-Jun-2025","02-Jun-2025"]})
----> 2 pd.to_datetime(df['day'])
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pandas/core/tools/datetimes.py:1067, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
1065 result = arg.map(cache_array)
1066 else:
-> 1067 values = convert_listlike(arg._values, format)
1068 result = arg._constructor(values, index=arg.index, name=arg.name)
1069 elif isinstance(arg, (ABCDataFrame, abc.MutableMapping)):
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/pandas/core/tools/datetimes.py:433, in _convert_listlike_datetimes(arg, format, name, utc, unit, errors, dayfirst, yearfirst, exact)
431 # `format` could be inferred, or user didn't ask for mixed-format parsing.
432 if format is not None and format != "mixed":
--> 433 return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
435 result, tz_parsed = objects_to_datetime64(
436 arg,
437 dayfirst=dayfirst,
(...)
441 allow_object=True,
Expected Behavior
It parses happily and correctly, with no exception.
Interestingly, the failure occurs at the transition from the end of May to the start of June. Starting with 01-Jun-2025 works, and ending with 31-May-2025 works.
dateparser.parse is happy.
I'm guessing it infers a full month name from "May" when in fact it is a three-character abbreviation.
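That guess can be checked outside pandas. The sketch below uses the standard library's `datetime.strptime` as a stand-in (an assumption: pandas has its own parser, but `%b` and `%B` carry the same meaning there) and assumes an English locale. The helper name `matches` is made up for this example.

```python
from datetime import datetime


def matches(value: str, fmt: str) -> bool:
    """Return True if `value` can be parsed with `fmt`."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


# Check each value against the full-name (%B) and abbreviated (%b) formats.
results = {
    value: {fmt: matches(value, fmt) for fmt in ("%d-%B-%Y", "%d-%b-%Y")}
    for value in ("31-May-2025", "01-Jun-2025")
}

for value, by_format in results.items():
    print(value, by_format)
```

"May" is both the full and the abbreviated month name, so it satisfies either format, while "Jun" only satisfies the abbreviated one. That matches the error message, which reports the inferred format as "%d-%B-%Y".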
Installed Versions
running in databricks notebook - checked in a separate version of python locally, with pandas 2.2.1
'Pandas version: 2.2.3'
'Python version: 3.11.11 (main, Dec 4 2024, 08:55:07) [GCC 11.4.0]' for the notebook.
pd.show_versions() doesn't return anything
locally
Pandas version: 2.2.1
Python version: 3.12.2 (main, Mar 25 2024, 11:48:28) [Clang 15.0.0 (clang-1500.3.9.4)]
and pd.show_versions() gives.
FileNotFoundError Traceback (most recent call last)
File /Users/J.Drummond/Documents/wip/python/truth_soc_1.py:2
1 # %%
----> 2 pd.show_versions()
File ~/Documents/wip/python/.venv/lib/python3.12/site-packages/pandas/util/_print_versions.py:141, in show_versions(as_json)
104 """
105 Provide useful information, important for bug reports.
106
(...)
138 ...
139 """
140 sys_info = _get_sys_info()
--> 141 deps = _get_dependency_info()
143 if as_json:
144 j = {"system": sys_info, "dependencies": deps}
File ~/Documents/wip/python/.venv/lib/python3.12/site-packages/pandas/util/_print_versions.py:98, in _get_dependency_info()
96 result: dict[str, JSONSerializable] = {}
97 for modname in deps:
---> 98 mod = import_optional_dependency(modname, errors="ignore")
99 result[modname] = get_version(mod) if mod else None
100 return result
...