Starred repositories
Pipeline Parallelism Emulation and Visualization
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Enabling PyTorch on XLA Devices (e.g. Google TPU)
Qwen3 is a series of large language models developed by the Qwen team at Alibaba Cloud.
Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves effective training time by minimizing downtime due to failures.
Official Repo for Open-Reasoner-Zero
verl: Volcano Engine Reinforcement Learning for LLMs
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Use PEFT or full-parameter training to run CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
An easy-to-use, scalable, and high-performance RLHF framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & RFT & Dynamic Sampling & Async Agent RL)
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Open source platform for the machine learning lifecycle
DLRover: An Automatic Distributed Deep Learning System
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
GoogleTest - Google Testing and Mocking Framework