
BUG: to_datetime raises "AttributeError: 'NoneType' object has no attribute 'total_seconds'" even with errors='coerce' #59769


Closed
2 of 3 tasks
enemyleft opened this issue Sep 10, 2024 · 12 comments
Labels
Bug Datetime Datetime data dtype

Comments

@enemyleft

enemyleft commented Sep 10, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
# next line OK
date = pd.to_datetime("Wed, 1 Dec 2021 08:00:00 -0600 (CST)", errors='coerce', utc=True, format='mixed')
# next line raises Exception -> AttributeError: 'NoneType' object has no attribute 'total_seconds'
date = pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format='mixed')

Issue Description

I have many dates to parse. Some carry a time zone abbreviation such as "(CET)", "(CST)", or one of many others; some carry none. The format is not predictable, so I cannot pass a predefined format string. The examples shown here look similar to each other, but that is not the case in real life. After some hours of analysis I finally found one specific date that actually raises an exception:

Sun, 14 Apr 2024 20:00:00 +0200 (CET)

Expected Behavior

First, I would expect that with errors='coerce' no error is raised even if the format is completely wrong; it should instead return "NaT", as the documentation suggests.

Second, to me there is no "big" difference between the working date string Wed, 1 Dec 2021 08:00:00 -0600 (CST) and the one that raises an error, Sun, 14 Apr 2024 20:00:00 +0200 (CET). Both have the same format; the biggest difference is the time zone abbreviation, which is present in both cases but differs. In fact, if I omit (CET), the string is parsed correctly.

As a workaround I could manually check whether a time zone abbreviation is present and remove it before calling to_datetime. Especially when the UTC offset is present as well, the abbreviation is redundant, i.e. it should not even be of interest to to_datetime. But this workaround should not be necessary in my opinion, as I think this is a bug that should be resolved in to_datetime.
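The workaround described above could look roughly like the following sketch. The helper name strip_tz_abbreviation and the regular expression are illustrative assumptions, not part of pandas, and the pattern only covers a trailing abbreviation in parentheses such as "(CET)".

```python
import re

# Assumption: the abbreviation appears at the very end of the string,
# wrapped in parentheses, e.g. "... +0200 (CET)".
_TRAILING_ABBREV = re.compile(r"\s*\([A-Za-z]{1,6}\)\s*$")


def strip_tz_abbreviation(text):
    """Drop a trailing '(ABC)' style abbreviation; leave other strings unchanged."""
    return _TRAILING_ABBREV.sub("", text)


cleaned = strip_tz_abbreviation("Sun, 14 Apr 2024 20:00:00 +0200 (CET)")
print(cleaned)
```

The cleaned string still carries the numeric UTC offset, so it can be passed to to_datetime in the same way as the string without (CET) mentioned above.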

See also a similar issue here: #54479, although I cannot reproduce that one in my environment.

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.12.5.final.0
python-bits : 64
OS : Linux
OS-release : 6.10.7-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Thu, 29 Aug 2024 16:48:57 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 2.1.1
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : None
pip : 24.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None

@enemyleft enemyleft added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 10, 2024
@enemyleft enemyleft changed the title BUG: BUG: to_datetime raises "AttributeError: 'NoneType' object has no attribute 'total_seconds'" even with errors='coerce' Sep 10, 2024
@rhshadrach
Member

Thanks for the report. I cannot reproduce on 64-bit Linux with either pandas 2.2.2 or pandas 2.2.x, using the same versions of NumPy, pytz, and dateutil. Can you post a full stack trace of the error?

@rhshadrach rhshadrach added Needs Info Clarification about behavior needed to assess issue datetime.date stdlib datetime.date support Datetime Datetime data dtype and removed Needs Triage Issue that has not been reviewed by a pandas team member datetime.date stdlib datetime.date support labels Sep 15, 2024
@enemyleft
Author

Thanks for the reply. I just reproduced it with a colleague who is using macOS, and he ran into the same error. Here is the stack trace from macOS:

test/import pandas as pd.py:10: FutureWarning: Parsing 'CET' as tzlocal (dependent on system timezone) is deprecated and will raise in a future version. Pass the 'tz' keyword or call tz_localize after construction instead
  date = pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format=None)
/Users/test/import pandas as pd.py:10: UserWarning: Could not infer format, so each element will be parsed individually, falling back to dateutil. To ensure parsing is consistent and as-expected, please specify a format.
  date = pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format=None)
Traceback (most recent call last):
  File "/Users/test/import pandas as pd.py", line 10, in
    date = pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format=None)
  File "/Users/test/.venv/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 1099, in to_datetime
    result = convert_listlike(argc, format)
  File "/Users/test/.venv/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 435, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64(
  File "/Users/test/.venv/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2398, in objects_to_datetime64
    result, tz_parsed = tslib.array_to_datetime(
  File "tslib.pyx", line 414, in pandas._libs.tslib.array_to_datetime
  File "tslib.pyx", line 578, in pandas._libs.tslib.array_to_datetime
AttributeError: 'NoneType' object has no attribute 'total_seconds'

@rhshadrach
Member

Thanks, from the Python docs the call to utcoffset here:

nsecs = tz.utcoffset(None).total_seconds()

can return None if the UTC offset isn't known. I'm not sure when this might happen, but it appears the logic should handle this case.
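As a rough, standard-library-only illustration of that point: the sketch below shows a tzinfo whose utcoffset(None) returns None, and a guard that avoids calling total_seconds() on it. ToyDstZone and offset_seconds are hypothetical names invented for this example; this is not the pandas code path or its fix.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyDstZone(tzinfo):
    """Hypothetical zone: UTC+1 in winter, UTC+2 from April through September."""

    def utcoffset(self, dt):
        if dt is None:
            # Without a timestamp the zone cannot tell which offset applies.
            return None
        return timedelta(hours=1) + self.dst(dt)

    def dst(self, dt):
        if dt is None:
            return None
        return timedelta(hours=1) if 4 <= dt.month <= 9 else timedelta(0)

    def tzname(self, dt):
        return "TOY"


def offset_seconds(tz, dt=None):
    """Return the UTC offset in seconds, or None when it cannot be determined."""
    offset = tz.utcoffset(dt)
    return None if offset is None else offset.total_seconds()


toy = ToyDstZone()
fixed_secs = offset_seconds(timezone(timedelta(hours=2)))    # fixed offset: always known
unknown = offset_seconds(toy)                                # no timestamp: unknown
spring = offset_seconds(toy, datetime(2024, 4, 14, 20, 0))   # summer offset applies
winter = offset_seconds(toy, datetime(2021, 12, 1, 8, 0))    # winter offset applies
print(fixed_secs, unknown, spring, winter)
```

A fixed-offset zone answers without a timestamp, while the toy DST zone only answers once it is given one, which is the situation the guard has to cover.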

cc @jbrockmendel

@rhshadrach rhshadrach removed the Needs Info Clarification about behavior needed to assess issue label Sep 22, 2024
@jbrockmendel
Member

what kind of tzinfo object are you getting back? might be fixable by passing an appropriate pydatetime object to utcoffset, but we wouldn't want to pay the cost of constructing that in the general case.

@HolzmanoLagrene

I was doing some debugging, and it seems that the problem only arises if my own timezone matches the timezone in the brackets! For example, my own timezone is CET/CEST, and any date string with one of those abbreviations in the brackets fails. If I switch my system time zone, everything works perfectly.

So if my local tz is ('CET', 'CEST') according to the output of print(time.tzname), the line

pd.to_datetime("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", errors='coerce', utc=True, format='mixed')

fails whereas

pd.to_datetime("Wed, 1 Dec 2021 08:00:00 -0600 (CST)", errors='coerce', utc=True, format='mixed')

works fine.

Interestingly, if I switch my system time zone to CST, both lines work fine. The issue seems to be with CET?
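One way to check whether a given string falls into this case is to compare its trailing abbreviation with time.tzname. The sketch below is only a diagnostic aid under stated assumptions: trailing_abbreviation and names_local_zone are hypothetical helpers, and the regular expression is naive (it would also match a trailing "PM", for instance).

```python
import re
import time

# Assumption: the abbreviation is 2-5 capital letters at the end of the string,
# optionally wrapped in parentheses.
_ABBREV = re.compile(r"\(?\b([A-Z]{2,5})\)?\s*$")


def trailing_abbreviation(text):
    """Return the trailing zone abbreviation, or None if there is none."""
    match = _ABBREV.search(text)
    return match.group(1) if match else None


def names_local_zone(text, local_names=None):
    """True if the trailing abbreviation is one of the local zone names."""
    if local_names is None:
        local_names = time.tzname
    abbrev = trailing_abbreviation(text)
    return abbrev is not None and abbrev in local_names


# Passing the names explicitly simulates a machine whose local zone is CET/CEST.
cet_hit = names_local_zone("Sun, 14 Apr 2024 20:00:00 +0200 (CET)", ("CET", "CEST"))
cst_hit = names_local_zone("Wed, 1 Dec 2021 08:00:00 -0600 (CST)", ("CET", "CEST"))
print(cet_hit, cst_hit)
```

Called without local_names, the helper uses the time.tzname of the machine it runs on, so the result varies between systems in the same way the failure does.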

@iroddis

iroddis commented Apr 30, 2025

Yes, the issue arises when trying to parse a timestamp that contains a timezone in time.tzname, but that isn't a valid timezone in the system zone files (or pytz's zone files). For instance:

>>> import time
>>> time.tzname
('EST', 'EDT')               # I am running on a machine using US/Eastern timezone

In that case, dateutil returns the tzinfo as dateutil.tz.tzlocal(). So it's not None, but it cannot determine the UTC offset without a timestamp.

In the 2.3.x branch tslib.pyx calls:

nsecs = tz.utcoffset(None).total_seconds()

which only succeeds with fixed-offset timezones. A DST timezone can only resolve the offset if it knows the timestamp.

Somehow this behaviour changed in 2.x: version 1.5.3 parsed to the correct offset without warnings. The latest master looks like it's got different code, but if the issue persists there I'll submit a PR with a fix.

Ultimately it's better not to parse any XDY short codes like this. It won't work for DST zones defined outside of the system's own zone (e.g. ADT will never work on a host running in US/Eastern; pandas will silently strip the timezone and return a naive timestamp).

Better options:

  • Use fixed offsets
  • Parse the timestamps as naive, then use tz_localize (what many of the Future warnings suggest)
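The second option might be sketched as follows. The zone name "America/Halifax" is an assumption standing in for the AST/ADT zone discussed in this thread, and splitting off the last token with rsplit is a simplification that only suits strings ending in a single abbreviation.

```python
import pandas as pd

raw = ["2025-01-17 09:19 ADT", "2025-03-17 09:19 ADT"]

# Drop the trailing abbreviation so that pandas never sees it.
naive_text = [s.rsplit(" ", 1)[0] for s in raw]

# Parse as naive timestamps with an explicit format.
naive = pd.to_datetime(naive_text, format="%Y-%m-%d %H:%M")

# Attach the zone explicitly; the zone rules, not the abbreviation,
# decide whether standard or daylight time applies to each value.
localized = naive.tz_localize("America/Halifax")
as_utc = localized.tz_convert("UTC")
print(as_utc)
```

Because the abbreviation is discarded, a value labelled "ADT" in January is treated as standard time under the zone rules; whether that is the desired interpretation depends on how trustworthy the labels in the data are.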

@thyoon7

thyoon7 commented Apr 30, 2025

The above is reproducible as follows:

import pandas as pd
pd.to_datetime('2024-03-11 16:00:00.123456000 EDT', errors='coerce', utc=True)

This produces the traceback below:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/thyoon/test/anaconda3/envs/nvs3.12/lib/python3.12/site-packages/pandas/core/tools/datetimes.py", line 1101, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/thyoon/test/anaconda3/envs/nvs3.12/lib/python3.12/site-packages/pandas/core/tools/datetimes.py", line 435, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/thyoon/test/anaconda3/envs/nvs3.12/lib/python3.12/site-packages/pandas/core/arrays/datetimes.py", line 2398, in objects_to_datetime64
    result, tz_parsed = tslib.array_to_datetime(
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "tslib.pyx", line 414, in pandas._libs.tslib.array_to_datetime
  File "tslib.pyx", line 578, in pandas._libs.tslib.array_to_datetime
AttributeError: 'NoneType' object has no attribute 'total_seconds'

But it doesn't happen with pd.Timestamp('2024-03-11 16:00:00.123456000 EDT').

Verified that it exists in pandas 2.*.

@jbrockmendel
Member

#50791 deprecated (now enforced in main) parsing strings to tzlocal based on the user's time.tzname. I suspect that addresses this issue.

@iroddis

iroddis commented May 1, 2025

Unfortunately not, at least as of 2.2.3. The issue is that the current parse method tries to determine the UTC offset of the parsed timezone before parsing the naive portion of the timestamp. If the parsed timezone is tzlocal(), then it cannot determine the UTC offset without knowing the naive timestamp.

To reproduce, try to parse a timestamp in the format %Y-%m-%d %H:%M %Z with a DST short zone name and a timestamp that falls within the DST range.

[ins] In [1]: import pandas as pd

[ins] In [2]: pd.__version__
Out[2]: '2.2.3'

[ins] In [3]: import time

[ins] In [4]: time.tzname
Out[4]: ('AST', 'ADT')

[ins] In [5]: pd.to_datetime(["2025-01-17 09:19 ADT"])   # Will work, because ADT doesn't start until 2025-03-09
<ipython-input-5-dc275161194d>:1: FutureWarning: Parsing 'ADT' as tzlocal (dependent on system timezone) is deprecated and will raise in a future version. Pass the 'tz' keyword or call tz_localize after construction instead
  pd.to_datetime(["2025-01-17 09:19 ADT"])   # Will work, because ADT doesn't start until 2025-03-09
Out[5]: DatetimeIndex(['2025-01-17 09:19:00'], dtype='datetime64[ns]', freq=None)

[ins] In [6]: pd.to_datetime(["2025-03-17 09:19 ADT"])   # Will NOT work, because time is _actually_ in ADT
<ipython-input-6-761725df89ed>:1: FutureWarning: Parsing 'ADT' as tzlocal (dependent on system timezone) is deprecated and will raise in a future version. Pass the 'tz' keyword or call tz_localize after construction instead
  pd.to_datetime(["2025-03-17 09:19 ADT"])   # Will NOT work, because time is _actually_ in ADT
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 pd.to_datetime(["2025-03-17 09:19 ADT"])   # Will NOT work, because time is _actually_ in ADT

File ~/.asdf/installs/python/3.11.3/lib/python3.11/site-packages/pandas/core/tools/datetimes.py:1099, in to_datetime(arg, errors, dayfirst, yearfirst, utc, format, exact, unit, infer_datetime_format, origin, cache)
   1097         result = _convert_and_box_cache(argc, cache_array)
   1098     else:
-> 1099         result = convert_listlike(argc, format)
   1100 else:
   1101     result = convert_listlike(np.array([arg]), format)[0]

File ~/.asdf/installs/python/3.11.3/lib/python3.11/site-packages/pandas/core/tools/datetimes.py:435, in _convert_listlike_datetimes(arg, format, name, utc, unit, errors, dayfirst, yearfirst, exact)
    432 if format is not None and format != "mixed":
    433     return _array_strptime_with_fallback(arg, name, utc, format, exact, errors)
--> 435 result, tz_parsed = objects_to_datetime64(
    436     arg,
    437     dayfirst=dayfirst,
    438     yearfirst=yearfirst,
    439     utc=utc,
    440     errors=errors,
    441     allow_object=True,
    442 )
    444 if tz_parsed is not None:
    445     # We can take a shortcut since the datetime64 numpy array
    446     # is in UTC
    447     out_unit = np.datetime_data(result.dtype)[0]

File ~/.asdf/installs/python/3.11.3/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:2398, in objects_to_datetime64(data, dayfirst, yearfirst, utc, errors, allow_object, out_unit)
   2395 # if str-dtype, convert
   2396 data = np.asarray(data, dtype=np.object_)
-> 2398 result, tz_parsed = tslib.array_to_datetime(
   2399     data,
   2400     errors=errors,
   2401     utc=utc,
   2402     dayfirst=dayfirst,
   2403     yearfirst=yearfirst,
   2404     creso=abbrev_to_npy_unit(out_unit),
   2405 )
   2407 if tz_parsed is not None:
   2408     # We can take a shortcut since the datetime64 numpy array
   2409     #  is in UTC
   2410     return result, tz_parsed

File tslib.pyx:414, in pandas._libs.tslib.array_to_datetime()

File tslib.pyx:578, in pandas._libs.tslib.array_to_datetime()

AttributeError: 'NoneType' object has no attribute 'total_seconds'

@jbrockmendel
Member

jbrockmendel commented May 1, 2025 via email

@iroddis

iroddis commented May 1, 2025

Sorry, I didn't read your response closely enough.

main does throw on any variation of DST zones. Thanks very much.

@rhshadrach
Member

Thanks @jbrockmendel and @iroddis - closing.


6 participants