XLzed: Starred repositories

Pipeline Parallelism Emulation and Visualization

Python 36 3 Updated Apr 21, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,368 598 Updated May 20, 2025

Enabling PyTorch on XLA Devices (e.g. Google TPU)

Python 2,606 527 Updated May 23, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 21,515 1,421 Updated May 22, 2025

Examples for Recommenders - easy to train and deploy on accelerated infrastructure.

C++ 36 10 Updated May 21, 2025

NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…

Python 164 24 Updated May 22, 2025

Distributed RL System for LLM Reasoning

Python 1,271 58 Updated May 21, 2025

Official Repo for Open-Reasoner-Zero

Python 1,927 99 Updated Apr 8, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 8,351 1,024 Updated May 23, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 8,911 886 Updated May 21, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,781 295 Updated Mar 10, 2025

GPUDirect Async support for IB Verbs

C++ 114 16 Updated Nov 10, 2022

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,100 159 Updated Mar 26, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,678 772 Updated May 23, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,564 835 Updated Apr 29, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,776 277 Updated May 15, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…

Python 7,707 655 Updated May 23, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & RFT & Dynamic Sampling & Async Agent RL)

Python 6,790 660 Updated May 23, 2025

Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥

Python 39,194 3,073 Updated May 22, 2025

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 7,568 660 Updated Feb 10, 2025

Python 3,848 360 Updated May 6, 2025

Open source platform for the machine learning lifecycle

Python 20,614 4,538 Updated May 23, 2025

A user-friendly launcher for Bazel.

Go 2,288 345 Updated May 20, 2025

DLRover: An Automatic Distributed Deep Learning System

Python 1,451 179 Updated May 22, 2025

C++ 513 82 Updated May 7, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 7,315 650 Updated May 31, 2024

GoogleTest - Google Testing and Mocking Framework

C++ 36,073 10,369 Updated May 22, 2025

Fast C++ logging library.

C++ 26,207 4,773 Updated May 12, 2025