Stars
Multi-platform high-performance compute language extension for Rust.
A no_std + serde compatible message library for Rust
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
A PyTorch native library for large model training
Servo aims to empower developers with a lightweight, high-performance alternative for embedding web technologies in applications.
A web browser that plays old world blues to build new world hope
pytrace is a fast Python tracer. It records function calls, arguments, and return values, and can be used for debugging and profiling.
Collection of crates to deal with crashes
A debugging toolset and library for debugging embedded ARM and RISC-V targets on a separate host
Schedule-Free Optimization in PyTorch
How to optimize some algorithms in CUDA.
llama3 implementation one matrix multiplication at a time
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
A scalable, distributed, collaborative, document-graph database, for the realtime web
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
A library for building fast, reliable and evolvable network services.
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
A GPU-driven system framework for scalable AI applications
PyTorch emulation library for Microscaling (MX)-compatible data formats