ThreadFactory

High-performance thread-safe (No-GIL–friendly) data structures and parallel operations for Python 3.13+.

ThreadFactory is a concurrency framework for Python 3.13+ (No-GIL). It provides custom Work future objects and thread-safe collections, laying the foundation for scalable parallel execution in modern Python.

NOTE
ThreadFactory is designed and tested against Python 3.13+ in No-GIL (free-threaded) mode.
The library will only function on Python 3.13 and higher.

If you're interested, see the benchmarks at the bottom of this page; more are available in the repository.
Jump to Benchmarks 🔥


🚀 Features

Concurrent Data Structures

ConcurrentBag

  • A thread-safe “multiset” collection that allows duplicates.
  • Methods like add, remove, discard, etc.
  • Ideal for collections where duplicate elements matter.
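
A minimal usage sketch; the import path is an assumption based on the thread_factory package name used in the benchmarks below:

from thread_factory import ConcurrentBag  # assumed import path

bag = ConcurrentBag()
bag.add("job")
bag.add("job")               # duplicates are kept, unlike a plain set
bag.remove("job")            # removes one occurrence
bag.discard("never-added")   # assumed set-like semantics: no error if absent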

ConcurrentDict

  • A thread-safe dictionary.
  • Supports typical dict operations (update, popitem, etc.).
  • Provides map, filter, and reduce for safe, bulk operations.
  • Freeze support: When frozen, the dictionary becomes read-only. Lock acquisition is skipped during reads, dramatically improving performance in high-read workloads.
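
A minimal sketch of the operations listed above; the import path, the reduce signature, and the freeze() entry point are assumptions:

from thread_factory import ConcurrentDict  # assumed import path

d = ConcurrentDict({"a": 1, "b": 2})
d.update({"c": 3})                                    # typical dict operations
total = d.reduce(lambda acc, item: acc + item[1], 0)  # hypothetical signature: (accumulator, (key, value))
d.freeze()                                            # assumed freeze() method; reads now skip the lock
print(d["a"], total)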

ConcurrentList

  • A thread-safe list supporting concurrent access and modification.
  • Slice assignment, in-place operators (+=, *=), and advanced operations (map, filter, reduce).
  • Freeze support: Prevents structural modifications while enabling safe, lock-free reads (e.g., __getitem__, iteration, and slicing). Ideal for caching and broadcast scenarios.
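
A minimal sketch, assuming the import path below and a filter method that returns a new collection:

from thread_factory import ConcurrentList  # assumed import path

lst = ConcurrentList([1, 2, 3])
lst += [4, 5]                             # thread-safe in-place operator
lst[1:3] = [20, 30]                       # slice assignment
evens = lst.filter(lambda x: x % 2 == 0)  # hypothetical filter signature
lst.freeze()                              # assumed freeze() method; reads become lock-free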

ConcurrentQueue

  • A thread-safe FIFO queue built atop collections.deque.
  • Outperforms a bare collections.deque by up to 64% in our benchmarks.
  • Supports enqueue, dequeue, peek, map, filter, and reduce.
  • Raises Empty when dequeue or peek is called on an empty queue.
  • Outperforms multiprocessing queues by over 400% in some cases; clone the repository and run the unit tests to see for yourself.
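
A minimal sketch. The import paths are assumptions; the docs above say only that an Empty exception is raised, so the stdlib queue.Empty is used here as a stand-in:

from queue import Empty  # assumption: the library may export its own Empty instead
from thread_factory import ConcurrentQueue  # assumed import path

q = ConcurrentQueue()
q.enqueue("task")
assert q.peek() == "task"      # peek does not remove the item
assert q.dequeue() == "task"
try:
    q.dequeue()
except Empty:                  # raised when the queue is empty
    print("queue drained")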

ConcurrentStack

  • A thread-safe LIFO stack.
  • Supports push, pop, peek operations.
  • Ideal for last-in, first-out (LIFO) workloads.
  • Built on deque for fast appends and pops.
  • Similar performance to ConcurrentQueue.
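
A minimal sketch using the push/pop/peek surface listed above (import path assumed):

from thread_factory import ConcurrentStack  # assumed import path

s = ConcurrentStack()
s.push("first")
s.push("second")
assert s.peek() == "second"   # LIFO: the most recent push is on top
assert s.pop() == "second"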

ConcurrentBuffer

  • A high-performance, thread-safe buffer using sharded deques for low-contention access.
  • Designed to handle massive producer/consumer loads with better throughput than standard queues.
  • Supports enqueue, dequeue, peek, clear, and bulk operations (map, filter, reduce).
  • Timestamp-based ordering ensures approximate FIFO behavior across shards.
  • Outperforms ConcurrentQueue by up to 60% at mid-range concurrency in a balanced producer/consumer thread configuration with 10 shards.
  • Automatically balances items across shards; ideal for parallel pipelines and low-latency workloads.
  • Best used with shard_count ≈ thread_count / 2 for optimal performance; keep the shard count at or below 10. A usage sketch follows below.
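
A sketch following the tuning advice above for ~20 worker threads; the shard_count constructor keyword is an assumption:

from thread_factory import ConcurrentBuffer  # assumed import path

buf = ConcurrentBuffer(shard_count=10)  # ~20 threads -> shard_count ≈ thread_count / 2
buf.enqueue("item")
assert buf.dequeue() == "item"          # approximate FIFO across shards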

ConcurrentCollection

  • An unordered, thread-safe alternative to ConcurrentBuffer.
  • Optimized for high-concurrency scenarios where strict FIFO is not required.
  • Uses fair circular scans seeded by bit-mixed monotonic clocks to distribute dequeues evenly.
  • Benchmarks (10 producers / 20 consumers, 2M ops) show ~5.6% higher throughput than ConcurrentBuffer:
    • ConcurrentCollection: 108,235 ops/sec
    • ConcurrentBuffer: 102,494 ops/sec
    • Better scaling under thread contention.
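
A sketch assuming ConcurrentCollection shares ConcurrentBuffer's enqueue/dequeue surface, which the docs above do not spell out:

from thread_factory import ConcurrentCollection  # assumed import path

coll = ConcurrentCollection()
coll.enqueue("work")       # assumed API, mirroring ConcurrentBuffer
item = coll.dequeue()      # ordering is not guaranteed: the collection is unordered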

ConcurrentSet

  • A thread-safe set implementation supporting all standard set algebra operations.
  • Supports add, discard, remove, and all bitwise set operations (|, &, ^, -) along with their in-place forms.
  • Provides map, filter, reduce, and batch_update to safely perform bulk transformations.
  • Freeze support: Once frozen, the set cannot be modified — but read operations become lock-free and extremely efficient.
  • Ideal for workloads where the set is mutated during setup but then used repeatedly in a read-only context (e.g., filters, routing tables, permissions).
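
A minimal sketch of the setup-then-freeze pattern described above (import path and freeze() name assumed):

from thread_factory import ConcurrentSet  # assumed import path

perms = ConcurrentSet({"read", "write"})
perms |= {"admin"}        # in-place set algebra is thread-safe
perms.discard("write")
perms.freeze()            # assumed freeze() method; reads become lock-free
print("read" in perms)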

Parallel Utilities

ThreadFactory provides a collection of parallel programming utilities inspired by .NET's Task Parallel Library (TPL).

parallel_for

  • Executes a traditional for loop in parallel across multiple threads.
  • Accepts start, stop, and a body function to apply to each index.
  • Supports:
    • Automatic chunking to balance load.
    • Optional local_init / local_finalize for per-thread local state.
    • Optional stop_on_exception to abort on the first error.
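
A sketch of the start/stop/body shape described above; the exact signature is an assumption:

from thread_factory import parallel_for  # assumed import path

results = [0] * 1_000

def body(i):
    results[i] = i * i   # the body receives each index in [start, stop)

parallel_for(0, 1_000, body)   # start, stop, body
print(results[10])             # 100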

parallel_foreach

  • Executes an action function on each item of an iterable in parallel.
  • Supports:
    • Both pre-known-length and streaming iterables.
    • Optional chunk_size to tune batch sizes.
    • Optional stop_on_exception to halt execution when an exception occurs.
    • Efficient when processing large datasets or streaming data without loading everything into memory.
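
A sketch of the per-item action described above; the argument order and keyword names are assumptions:

from thread_factory import parallel_foreach  # assumed import path

def action(item):
    print(item * 2)   # stand-in for real per-item work

parallel_foreach(action, range(100), chunk_size=16, stop_on_exception=True)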

parallel_invoke

  • Executes multiple independent functions concurrently.
  • Accepts an arbitrary number of functions as arguments.
  • Returns a list of futures representing the execution of each function.
  • Optionally waits for all functions to finish (or fail).
  • Simplifies running unrelated tasks in parallel with easy error propagation.
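
A sketch of running two unrelated tasks concurrently; the import path and the Future-like .result() API are assumptions:

from thread_factory import parallel_invoke  # assumed import path

def load_config():
    return {"retries": 3}

def warm_cache():
    return "cache warmed"

futures = parallel_invoke(load_config, warm_cache)   # returns a list of futures
print([f.result() for f in futures])                 # assumed .result() accessor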

parallel_map

  • Parallel equivalent of Python’s built-in map().
  • Applies a transform function to each item in an iterable concurrently.
  • Maintains the order of results.
  • Automatically splits the work into chunks for efficient multi-threaded execution.
  • Returns a fully materialized list of results.
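
A sketch mirroring the built-in map() as described above (import path assumed):

from thread_factory import parallel_map  # assumed import path

squares = parallel_map(lambda x: x * x, range(10))
print(squares)   # [0, 1, 4, ...]: order is preserved and the result is a fully built list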

Notes

  • All utilities automatically default to max_workers = os.cpu_count() if unspecified.
  • chunk_size can be manually tuned or defaults to roughly 4 × #workers for balanced performance.
  • Exceptions raised inside tasks are properly propagated to the caller.

📖 Documentation

Full API reference and usage examples are available at:

➡️ https://threadfactory.readthedocs.io


⚙️ Installation

Option 1: Clone and Install Locally (Recommended for Development)

# Clone the repository
git clone https://github.com/Synaptic724/ThreadFactory.git
cd ThreadFactory

# Create a Python 3.13+ virtual environment (free-threaded/No-GIL build recommended)
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows

# Install the library in editable mode
pip install -e .

Option 2: Install from PyPI

# Install the latest release
pip install threadfactory

📈 Real-World Benchmarking

Below are benchmark results from live multi-threaded scenarios using 10–40 real threads,
with millions of operations processed under load.

These benchmarks aren't just numbers: they show ThreadFactory's concurrent collections outperforming traditional Python structures by 2x-5x, especially in the new No-GIL world that Python 3.13+ is unlocking.

Performance under pressure.
Architecture built for the future.

These results cover only the concurrent data structures; the full ThreadFactory runtime is coming soon...


All of the benchmarks below can be reproduced by cloning the repository and running the tests.
See the Benchmark Details 🚀 for more benchmark stats.

🔥 Benchmark Results (10,000,000 ops — 10 producers / 10 consumers)

| Queue Type                      | Time (sec) | Throughput (ops/sec) | Notes                                                                     |
|---------------------------------|------------|----------------------|---------------------------------------------------------------------------|
| multiprocessing.Queue           | 119.99     | ~83,336              | Not suited for thread-only workloads; incurs unnecessary overhead.         |
| thread_factory.ConcurrentBuffer | 23.27      | ~429,651             | ⚡ Dominant here. Consistent and efficient under moderate concurrency.     |
| thread_factory.ConcurrentQueue  | 37.87      | ~264,014             | Performs solidly. Shows stable behavior even at higher operation counts.   |
| collections.deque               | 64.16      | ~155,876             | Suffers from contention. Simplicity comes at the cost of throughput.       |

✅ Highlights:

  • ConcurrentBuffer outperformed multiprocessing.Queue by 96.72 seconds.
  • ConcurrentBuffer outperformed ConcurrentQueue by 14.6 seconds.
  • ConcurrentBuffer outperformed collections.deque by 40.89 seconds.

💡 Observations:

  • ConcurrentBuffer continues to be the best performer under moderate concurrency.
  • ConcurrentQueue maintains a consistent performance but is outperformed by ConcurrentBuffer.
  • All queues emptied correctly (final length = 0).

🔥 Benchmark Results (20,000,000 ops — 20 Producers / 20 Consumers)

| Queue Type                                  | Time (sec) | Throughput (ops/sec) | Notes                                                                          |
|---------------------------------------------|------------|----------------------|---------------------------------------------------------------------------------|
| multiprocessing.Queue                       | 249.92     | ~80,020              | Severely limited by thread-unfriendly IPC locks.                                |
| thread_factory.ConcurrentBuffer (10 shards) | 138.64     | ~144,270             | Solid under moderate producer/consumer balance. Benefits from shard windowing.  |
| thread_factory.ConcurrentBuffer (20 shards) | 173.89     | ~115,010             | Too many shards increased internal complexity, leading to lower throughput.     |
| thread_factory.ConcurrentQueue              | 77.69     | ~257,450             | ⚡ Fastest overall. Ideal for large-scale multi-producer, multi-consumer loads. |
| collections.deque                           | 190.91     | ~104,771             | Still usable, but scalability is poor compared to specialized implementations.  |

✅ Notes:

  • ConcurrentBuffer performs better with 10 shards than 20 shards at this concurrency level.
  • ConcurrentQueue continues to be the most stable performer under moderate-to-high thread counts.
  • multiprocessing.Queue remains unfit for threaded-only workloads due to its heavy IPC-oriented design.

💡 Observations:

  • Shard count tuning in ConcurrentBuffer is crucial — too many shards can reduce performance.
  • Bit-flip balancing in ConcurrentBuffer helps under moderate concurrency but hits diminishing returns with excessive sharding.
  • ConcurrentQueue is proving to be the general-purpose winner for most balanced threaded workloads.
  • At ~40 threads, ConcurrentBuffer slows by ~25% when the shard count is doubled, due to increased dequeue complexity.
  • All queues emptied correctly (final length = 0).
