BUG: race condition in Index.is_unique #21150
Comments
Virtually nothing is thread-safe in pandas; you have to be very careful. You are welcome to submit a patch.
See #2728.
…ue with multiple threads accessing an Index
It's not surprising that modifying a frame while trying to use it from multiple threads is unsafe, but it's kind of weird that "read only" operations like |
Would this be an acceptable fix? I tested acquiring locks inside the |
We would need to measure the cost of acquiring the lock relative to the operation. Do you know, if I do Also, this feels like too low of a level for acquiring the lock, though I may be wrong about that. |
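One rough way to put a number on the locking cost raised above is to time an uncontended `threading.Lock` against a plain no-op. This is only a sketch using the standard library: `locked_noop` and `plain_noop` are illustrative names, and it measures neither pandas itself nor a contended lock.

```python
import threading
import timeit

lock = threading.Lock()


def locked_noop():
    # Acquire and release an uncontended lock around an empty body.
    with lock:
        pass


def plain_noop():
    # Baseline: the same call overhead without any locking.
    pass


n = 200_000
locked = timeit.timeit(locked_noop, number=n) / n
plain = timeit.timeit(plain_noop, number=n) / n
print(f"with lock: {locked * 1e9:.0f} ns/call, without: {plain * 1e9:.0f} ns/call")
```

The difference between the two figures approximates the per-call overhead to compare against the cost of the operation being protected.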
Note that this will affect not only "is_unique" but all other methods that use the same decorator. Also, it locks on all calls, which is excessive. That should be easily fixable, though. Do you have a better suggestion of where to add the locking?
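The "locks on all calls" concern can in principle be avoided with double-checked locking, where the lock is taken only while the cached value is still missing. The sketch below is an assumption-laden illustration: `LazyIndex` and its `computations` counter are hypothetical stand-ins, not pandas code and not the patch discussed in this thread.

```python
import threading


class LazyIndex:
    """Hypothetical stand-in for an index with a lazily computed flag (not pandas code)."""

    def __init__(self, values):
        self._values = list(values)
        self._lock = threading.Lock()
        self._is_unique = None  # None means "not computed yet"
        self.computations = 0   # counts how often the slow path actually ran

    @property
    def is_unique(self):
        result = self._is_unique
        if result is None:  # only callers that see no cached value take the lock
            with self._lock:
                result = self._is_unique
                if result is None:  # re-check: another thread may have finished first
                    self.computations += 1
                    result = len(set(self._values)) == len(self._values)
                    self._is_unique = result  # publish only the finished result
        return result


idx = LazyIndex(range(1000))
results = []
threads = [
    threading.Thread(target=lambda: results.append(idx.is_unique)) for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results, idx.computations)
```

Once the value is cached, readers return it without touching the lock, so the locking cost is paid only around the first computation.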
Maybe _ensure_mapping_populated? I'm not sure.
I cannot reproduce on Python 3.12. |
It seems unlikely that what you are observing is reliable, right? The only reliable way (i.e. robust to thread scheduling changes) to fix this is to add a lock on the pandas side?
Just to be clear, the issue is being closed but the problem is not actually fixed, right? Because |
Looking over this again: as mentioned above, essentially none of pandas is thread-safe. Even operations that appear to the user to be read-only can modify caches for performance. I am negative on making modifications to support thread-safety that have a performance impact unless there is a PDEP that addresses (a) the level of effort, (b) the single-thread performance cost, and (c) the benefit across the entire pandas API. Doing this in a one-off manner that has performance costs is not the appropriate way to address this.
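To make the point about read-only operations concrete, here is a minimal, single-threaded sketch of a property that looks like a pure read but writes to an internal cache on first access. `CachedUnique` and its `_cache` dict are hypothetical names chosen for illustration; pandas' actual caching machinery is not reproduced here.

```python
class CachedUnique:
    """Hypothetical container whose 'read-only' property mutates internal state."""

    def __init__(self, values):
        self._values = list(values)
        self._cache = {}

    @property
    def is_unique(self):
        # The first access computes and stores the answer; later accesses reuse it.
        if "is_unique" not in self._cache:
            self._cache["is_unique"] = len(set(self._values)) == len(self._values)
        return self._cache["is_unique"]


obj = CachedUnique([1, 2, 3])
before = dict(obj._cache)  # snapshot of the cache before the "read"
answer = obj.is_unique
after = dict(obj._cache)   # snapshot after the "read"
print(before, answer, after)  # {} True {'is_unique': True}
```

Because the first read performs a write, two threads reading at the same time are effectively two concurrent writers, which is where unsynchronized caches can go wrong.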
Code Sample, a copy-pastable example if possible
Input:
Output:
Problem description
When calling Index.is_unique from multiple threads simultaneously, the wrong answer is returned.
Expected Output
Shouldn't raise.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.14-200.fc26.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None
pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 4.2.1
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.2
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None