
An open-source C++ library developed and used at Facebook.

C++ 28,994 5,640 Updated Feb 20, 2025

Tile primitives for speedy kernels

Cuda 2,055 115 Updated Feb 20, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 2,195 364 Updated Feb 20, 2025

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Python 8,546 1,424 Updated Feb 5, 2025

Materials for learning SGLang

273 16 Updated Feb 6, 2025

A guidance language for controlling large language models.

Jupyter Notebook 19,694 1,077 Updated Feb 20, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 38,673 5,799 Updated Feb 21, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 10,356 1,003 Updated Feb 20, 2025

HugeCTR is a high efficiency GPU framework designed for Click-Through-Rate (CTR) estimating training

C++ 971 200 Updated Sep 22, 2024

Futu OpenAPI Python SDK

Python 1,059 222 Updated Aug 24, 2023

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 10,723 1,021 Updated Jan 22, 2025

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 27,447 3,444 Updated Jul 23, 2024

Development repository for the Triton language and compiler

MLIR 14,523 1,802 Updated Feb 21, 2025

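📺 IPTV live TV source update project『✨instant-play experience🚀』: supports IPv4/IPv6; supports custom channels; supports local sources, multicast sources, hotel sources, subscription sources, and keyword search; updates automatically twice a day, with results usable in players such as TVBox; can be run via workflow, Docker (amd64/arm64/arm v7), command line, or GUI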

Python 13,569 3,679 Updated Feb 20, 2025

Codes & examples for "CUDA - From Correctness to Performance"

C++ 80 19 Updated Oct 24, 2024

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 525 39 Updated Feb 14, 2025

All Algorithms implemented in Python

Python 197,477 46,282 Updated Feb 20, 2025

Learn Low Level Design (LLD) and prepare for interviews using free resources.

Java 11,582 2,827 Updated Jan 4, 2025

A library for transfer learning by reusing parts of TensorFlow models.

Python 3,493 1,661 Updated Jan 17, 2025

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 31,357 2,897 Updated Feb 21, 2025

Tensor library for machine learning

C++ 11,910 1,137 Updated Feb 12, 2025

How to optimize some algorithms in CUDA.

Cuda 1,906 165 Updated Feb 19, 2025

The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)

C++ 42,474 10,658 Updated Feb 21, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,110 218 Updated Feb 20, 2025

LLM inference in C/C++

C++ 74,855 10,818 Updated Feb 20, 2025

NVIDIA CUDA plugin for XMRig miner

C++ 392 168 Updated Dec 18, 2024

☄🌌️ The minimal, blazing-fast, and infinitely customizable prompt for any shell!

Rust 47,113 2,044 Updated Feb 20, 2025

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM

TypeScript 2,902 381 Updated Aug 21, 2024