Stars
hkxIron / ReST-MCTS
Forked from THUDM/ReST-MCTS
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
Bringing BERT into modernity via both architecture changes and scaling
This project shares the technical principles and hands-on experience of large language models (LLM engineering and production deployment of LLM applications).
RLHF implementation details of OpenAI's 2019 codebase
Pretrain, finetune ANY AI model of ANY size on multiple GPUs and TPUs with zero code changes.
The official Python SDK for Model Context Protocol servers and clients
A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
The source code for the blog post "The 37 Implementation Details of Proximal Policy Optimization"
A series of technical reports on Slow Thinking with LLMs
Create beautiful, publication-quality books and documents from computational content.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
An Open Large Reasoning Model for Real-World Solutions
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
The repository containing the source code for Self-Evaluation Guided MCTS for online DPO.
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
A simple and well-styled PPO implementation. Based on my Medium series: https://medium.com/@eyyu/coding-ppo-from-scratch-with-pytorch-part-1-4-613dfc1b14c8.
Example models using DeepSpeed
A mirror of RL_Coding_Exercise.