Stars
hkxIron / ReST-MCTS
Forked from THUDM/ReST-MCTS
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
Bringing BERT into modernity via both architecture changes and scaling
This project shares the technical principles and hands-on experience of large language models (LLM engineering and production deployment of LLM applications).
RLHF implementation details of OpenAI's 2019 codebase
Pretrain, finetune ANY AI model of ANY size on multiple GPUs and TPUs with zero code changes.
The official Python SDK for Model Context Protocol servers and clients
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"
A series of technical reports on Slow Thinking with LLMs
Create beautiful, publication-quality books and documents from computational content.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
An Open Large Reasoning Model for Real-World Solutions
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
The repository containing the source code for Self-Evaluation Guided MCTS for online DPO.
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
A simple and well-styled PPO implementation. Based on my Medium series: https://medium.com/@eyyu/coding-ppo-from-scratch-with-pytorch-part-1-4-613dfc1b14c8.
Example models using DeepSpeed
A mirror of RL_Coding_Exercise.