Thread safety: need lock for both read and write (_cache and other global variables) #471

Open
makyen opened this issue May 25, 2022 · 0 comments

makyen commented May 25, 2022

Currently, only the _cache variable has a threading lock, and it's only used for write operations. All global variables need to use a lock for both read and write accesses.

To be thread safe, any value that can be changed in one thread and read or written from another must be protected by a threading lock for both reads and writes. If the lock is taken only for writes, another thread can read the value while the writer is in the middle of an update, which can result in undefined behavior. While it would be possible to design Python so that reads don't need a lock, Python isn't designed that way, and it's definitely not something that is guaranteed. Lock-free reads would only be safe if every change to every readable value were accomplished with a single memory write, and that's just not the case in Python. So, for thread safety, a lock needs to be obtained before both reads and writes of any variable that is accessible to more than one thread (i.e., basically any value that isn't local to a function).
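To illustrate the point about multi-step writes, here is a minimal sketch. The names `_state`, `_state_lock`, `add_pattern`, and `snapshot` are hypothetical and are not taken from the regex package. The writer needs two statements to complete one logical update, so a reader that skipped the lock could observe the state between them.

```python
import threading

# Hypothetical shared state (not from the regex package).
# Invariant: _state["pattern_count"] == len(_state["patterns"])
_state_lock = threading.RLock()
_state = {"pattern_count": 0, "patterns": []}


def add_pattern(pattern):
    """Writer: one logical update that takes two separate statements."""
    with _state_lock:
        _state["patterns"].append(pattern)
        # Between these two statements the invariant is temporarily broken.
        _state["pattern_count"] += 1


def snapshot():
    """Reader: takes the same lock, so it never sees a half-finished update."""
    with _state_lock:
        return _state["pattern_count"], len(_state["patterns"])
```

If `snapshot()` read the two fields without taking `_state_lock`, nothing would stop it from running between the `append` and the increment in `add_pattern()`.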

Specifically, every access to _cache needs to be protected by the _cache_lock RLock which is currently used only for writes. In addition, all other global variables need to use thread locking semantics for both reading and writing.
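As a rough sketch of what that could look like, assuming a module-level dict and an `RLock` named after the globals mentioned above: the helper functions `cache_get`, `cache_set`, and `cache_get_or_create` are hypothetical, and this is not the regex package's actual code.

```python
import threading

# Hypothetical stand-ins for the package's globals; the real module differs.
_cache = {}
_cache_lock = threading.RLock()


def cache_get(key, default=None):
    """Read from the shared cache while holding the lock."""
    with _cache_lock:
        return _cache.get(key, default)


def cache_set(key, value):
    """Write to the shared cache while holding the same lock."""
    with _cache_lock:
        _cache[key] = value


def cache_get_or_create(key, factory):
    """Check-then-insert as one atomic step under the lock."""
    with _cache_lock:
        try:
            return _cache[key]
        except KeyError:
            value = _cache[key] = factory()
            return value
```

Because the lock is an `RLock`, a function that already holds it can call another of these helpers without deadlocking.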

Ideally, everything which can potentially be changed and accessed by multiple threads should be protected by a lock of some sort (RLock() is normally good). You don't need to lock things like constants, which are never changed once initially set upon package initialization, but things like _named_args, _replacement_cache, _locale_sensitive, and even _cache_all, and any other global settings, should only be accessed (read or write) once a threading lock has been obtained. A potential alternative for some variables would be to use a value which is only accessible to the current thread (i.e., thread-local storage).
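For the per-thread alternative, one option in the standard library is `threading.local`. The sketch below is only an illustration: the class `_ThreadSettings`, the attribute `cache_all`, and the two accessor functions are hypothetical, and whether a given setting should be per-thread rather than shared is a design decision for the maintainers.

```python
import threading


class _ThreadSettings(threading.local):
    """Hypothetical per-thread settings holder (not the package's code)."""

    def __init__(self):
        # Runs once in each thread, the first time that thread touches the object.
        self.cache_all = True


_settings = _ThreadSettings()


def set_cache_all(value):
    """Change the setting for the calling thread only."""
    _settings.cache_all = bool(value)


def get_cache_all():
    """Return the calling thread's own value; no lock is needed."""
    return _settings.cache_all
```

Each thread sees its own copy of `cache_all`, so a change made in one thread is invisible to the others. That removes the need for a lock, but it also means the setting is no longer global.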

I don't have specific thread-safety test cases for the regex package, but in other Python code I have tested obtaining locks only for writes, and not for reads, and experienced issues.
