tfruan2000

Starred repositories

76 results for source starred repositories

DeepEP: an efficient expert-parallel communication library

Cuda 4,782 304 Updated Feb 25, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,521 1,113 Updated Feb 25, 2025

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 11,234 2,158 Updated Feb 1, 2025

Goal: Enable awesome tooling for Bazel users of the C language family.

Python 748 128 Updated Oct 8, 2024

Tenstorrent MLIR compiler

C++ 92 16 Updated Feb 25, 2025

The book "Performance Analysis and Tuning on Modern CPU"

TeX 2,829 198 Updated Feb 20, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 151 6 Updated Oct 30, 2024

A PyTorch Native LLM Training Framework

Python 734 41 Updated Dec 27, 2024

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 748 44 Updated Feb 25, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,173 224 Updated Feb 24, 2025

MLIR-based partitioning system

C++ 67 15 Updated Feb 25, 2025

🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain massive game, featuring dazzling arithmetic magic)

TypeScript 22,211 4,108 Updated Feb 25, 2025

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 846 165 Updated Dec 30, 2024
MLIR 405 72 Updated Feb 25, 2025

A model compilation solution for various hardware

MLIR 406 44 Updated Feb 20, 2025

DeepSeek Coder: Let the Code Write Itself

Python 20,462 2,280 Updated May 21, 2024

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,504 240 Updated Feb 24, 2025

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,457 255 Updated Feb 24, 2025

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 2,976 509 Updated Feb 25, 2025

FlagPerf is an open-source software platform for benchmarking AI chips.

Python 324 109 Updated Feb 6, 2025

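📚 A summary of basic knowledge for C/C++ technical interviews, covering languages, libraries, data structures, algorithms, systems, networking, and linking and loading of libraries, plus interview experience, recruiting, and referral information. This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, in…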

C++ 35,495 8,023 Updated Mar 19, 2024

Backward compatible ML compute opset inspired by HLO/MHLO

MLIR 446 126 Updated Feb 25, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,631 161 Updated Feb 23, 2025

FlagGems is an operator library for large language models implemented in Triton Language.

Python 426 65 Updated Feb 25, 2025

The Mojo Programming Language

Mojo 23,744 2,590 Updated Feb 25, 2025

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

C++ 2,967 335 Updated Jul 31, 2024

An awesome & curated list of best LLMOps tools for developers

Shell 4,455 428 Updated Feb 11, 2025

Development repository for the Triton-Linalg conversion

C++ 175 17 Updated Feb 7, 2025

The ultimate Vim configuration (vimrc)

Vim Script 31,041 7,314 Updated Oct 6, 2024