BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 500 38 Updated Jan 11, 2025

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

C++ 175 10 Updated Nov 18, 2024
Python 22 4 Updated Dec 21, 2024

A beautiful, simple, clean, and responsive Jekyll theme for academics

HTML 11,911 11,482 Updated Jan 20, 2025

[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Python 134 9 Updated Sep 21, 2024

nnScaler: Compiling DNN models for Parallel Training

Python 87 13 Updated Jan 10, 2025

Low-bit LLM inference on CPU with lookup table

C++ 649 49 Updated Jan 9, 2025

MSVBASE is a system that efficiently supports complex queries of both approximate similarity search and relational operators. It integrates high-dimensional vector indices into PostgreSQL, a relati…

C++ 90 12 Updated Nov 19, 2024
Python 134 11 Updated Jul 22, 2024
Python 73 1 Updated Feb 22, 2023

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.

Python 14,102 1,819 Updated Jul 3, 2024

A unified 3D Transformer Pipeline for visual synthesis

2,808 163 Updated May 29, 2023

Tutel MoE: An Optimized Mixture-of-Experts Implementation

Python 748 94 Updated Jan 18, 2025

A validation and profiling tool for AI infrastructure

Python 286 60 Updated Jan 8, 2025

System for AI Education Resource.

Python 3,777 469 Updated Oct 25, 2024

An experimental parallel training platform

54 13 Updated Mar 25, 2024

Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.

C++ 452 46 Updated Jan 11, 2025

A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.

C++ 972 162 Updated Sep 19, 2024

A decoupled transaction component providing transaction processing for applications

C++ 6 2 Updated Jul 15, 2020

Resource scheduling and cluster management for AI

JavaScript 2,646 549 Updated Jun 6, 2024

OpenPAI SDK

TypeScript 19 16 Updated Dec 10, 2022

Extension to connect OpenPAI clusters, submit AI jobs, simulate jobs locally, manage files, and so on.

TypeScript 14 5 Updated Dec 10, 2022

A marketplace that stores examples and job templates for OpenPAI. Users can use openpaimarketplace to share their own jobs or to run and learn from jobs shared by others.

JavaScript 32 21 Updated Dec 13, 2022

Runtime for deep learning workload

Python 20 16 Updated May 24, 2022

Kubernetes Scheduler for Deep Learning

Go 255 39 Updated May 22, 2022

Resource scheduling and cluster management for AI

Java 2 Updated Jan 15, 2019

A distributed approximate nearest neighbor (ANN) search library which provides high-quality vector index build, search, and distributed online serving toolkits for large-scale vector search sc…

C++ 4,857 580 Updated Jan 7, 2025

General-Purpose Kubernetes Pod Controller

Go 174 44 Updated Apr 4, 2023

High performance container overlay networks on Linux. Enabling RDMA (on both InfiniBand and RoCE) and accelerating TCP to bare metal performance. Freeflow requires zero modification on application …

C 614 92 Updated Jun 12, 2023