Institute of Computing Technology, CAS
- https://tfruan2000.github.io/
Starred repositories
DeepEP: an efficient expert-parallel communication library
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Goal: Enable awesome tooling for Bazel users of the C language family.
The book "Performance Analysis and Tuning on Modern CPUs"
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
FlashInfer: Kernel Library for LLM Serving
🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring dazzling arithmetic magic)
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
A model compilation solution for various hardware
DeepSeek Coder: Let the Code Write Itself
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
openxla / triton
Forked from triton-lang/triton
Fork of the Triton repository for OpenXLA uses of the Triton language and compiler
A machine learning compiler for GPUs, CPUs, and ML accelerators
FlagPerf is an open-source software platform for benchmarking AI chips.
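📚 A summary of fundamental knowledge for C/C++ technical interviews, covering the language, libraries, data structures, algorithms, systems, networking, and linking/loading libraries, plus interview experience, recruitment, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…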
Backward compatible ML compute opset inspired by HLO/MHLO
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
FlagGems is an operator library for large language models implemented in Triton Language.
C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)
An awesome & curated list of best LLMOps tools for developers
Development repository for the Triton-Linalg conversion