xwhzz

Starred repositories

A Python framework for high performance GPU simulation and graphics

Python 4,494 258 Updated Feb 1, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 186 14 Updated Feb 2, 2025

🚀🚀 "Large Model": train a small 26M-parameter GPT entirely from scratch in just 3 hours! 🌏

Python 7,330 757 Updated Dec 13, 2024

This is the Rust course used by the Android team at Google. It provides you the material to quickly teach Rust.

Rust 28,909 1,725 Updated Feb 1, 2025

An open-source C++ library developed and used at Facebook.

C++ 28,931 5,638 Updated Feb 2, 2025

Efficient, Flexible and Portable Structured Generation

C++ 633 38 Updated Jan 31, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 504 39 Updated Feb 2, 2025

Examples for using ONNX Runtime for model training.

C# 325 63 Updated Oct 23, 2024

Blazingly fast LLM inference.

Rust 4,924 342 Updated Feb 2, 2025

Large Language Model (LLM) Systems Paper List

755 26 Updated Jan 27, 2025

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 2,958 647 Updated Feb 2, 2025

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 16,017 1,026 Updated Jan 31, 2025

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Python 18,391 1,929 Updated Oct 15, 2024

Guidelines Support Library

C++ 6,291 743 Updated Jan 7, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,276 1,089 Updated Feb 2, 2025

GPU programming related news and material links

1,349 80 Updated Jan 6, 2025

Material for gpu-mode lectures

Jupyter Notebook 3,611 366 Updated Jan 6, 2025

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,190 230 Updated Jan 27, 2025

Solve puzzles. Learn CUDA.

Jupyter Notebook 10,431 804 Updated Sep 1, 2024

Code samples used on cloud.google.com

Jupyter Notebook 7,568 6,471 Updated Jan 31, 2025

LLM inference in C/C++

C++ 72,714 10,475 Updated Feb 2, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 1,910 192 Updated Feb 1, 2025

A throughput-oriented high-performance serving framework for LLMs

Cuda 715 29 Updated Sep 21, 2024

Lightning-fast serving engine for any AI model of any size. Flexible. Easy. Enterprise-scale.

Python 2,807 181 Updated Jan 21, 2025

A curated list of Rust code and resources.

Rust 48,450 2,833 Updated Jan 19, 2025

⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python Cli, Wechat Applet) / If you find it useful, please star this project, thanks~

Vue 6,812 467 Updated Feb 1, 2025

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 14,083 1,155 Updated May 23, 2024

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 7,382 571 Updated Aug 18, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 6,104 1,053 Updated Feb 2, 2025