Stars
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
A beautiful, simple, clean, and responsive Jekyll theme for academics
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
nnScaler: Compiling DNN models for Parallel Training
MSVBASE is a system that efficiently supports complex queries of both approximate similarity search and relational operators. It integrates high-dimensional vector indices into PostgreSQL, a relati…
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
A unified 3D Transformer Pipeline for visual synthesis
Tutel MoE: An Optimized Mixture-of-Experts Implementation
A validation and profiling tool for AI infrastructure
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
A decoupled transaction component providing transaction processing for applications
Resource scheduling and cluster management for AI
Extension to connect OpenPAI clusters, submit AI jobs, simulate jobs locally, manage files, and so on.
A marketplace that stores examples and job templates for OpenPAI. Users can use openpaimarketplace to share their own jobs or to run and learn from jobs shared by others.
Leinao / pai
Forked from microsoft/pai
Resource scheduling and cluster management for AI
A distributed approximate nearest neighbor (ANN) search library that provides high-quality vector index build, search, and distributed online serving toolkits for large-scale vector search sc…
General-Purpose Kubernetes Pod Controller
High-performance container overlay networks on Linux, enabling RDMA (on both InfiniBand and RoCE) and accelerating TCP to bare-metal performance. Freeflow requires zero modification on application …