
- Beijing, China
- haozheji.github.io
- @HaozJi
Stars
- DeepEP: an efficient expert-parallel communication library
- Minimal RLHF implementation built on top of minGPT.
- Create Epic Math and Physics Animations From Text.
- verl: Volcano Engine Reinforcement Learning for LLMs
- Detailed proofs of "Bias Variance Decomposition for KL Divergence".
- Scalable toolkit for efficient model alignment
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
- An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & LoRA & vLLM & RFT)
- Repository for "Generative Flow Networks as Entropy-Regularized RL" (AISTATS 2024, Oral)
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
- An in-browser, local-first Markdown resume builder.
- A PowerPoint add-in to insert LaTeX equations into PowerPoint presentations on Windows and Mac
- Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
- Paper list about multimodal and large language models, only used to record papers I read in the daily arXiv for personal needs.
- ICLR 2023 - Tailoring Language Generation Models under Total Variation Distance
- Some preliminary explorations of Mamba's context scaling.
- Easy TOC creation for GitHub README.md
- Example models using DeepSpeed
- A high-throughput and memory-efficient inference and serving engine for LLMs
- Robust recipes to align language models with human and AI preferences
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
- Reference implementation for DPO (Direct Preference Optimization)
- DSPy: The framework for programming—not prompting—language models
- A curated list of reinforcement learning with human feedback resources (continually updated)