Python 356 28 Updated Jan 27, 2025

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)

Python 1 Updated Jan 20, 2025

Bringing BERT into modernity via both architecture changes and scaling

Python 1,139 73 Updated Jan 21, 2025

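This project aims to share the technical principles and hands-on experience behind large models (large-model engineering and deploying large-model applications)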

HTML 13,589 1,531 Updated Jan 15, 2025

RLHF implementation details of OAI's 2019 codebase

Python 172 9 Updated Jan 14, 2024

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.

Python 28,900 3,427 Updated Feb 3, 2025

The official Python SDK for Model Context Protocol servers and clients

Python 1,639 166 Updated Feb 4, 2025

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Jupyter Notebook 10,165 1,127 Updated Jan 28, 2025

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

Python 679 101 Updated Mar 23, 2024

A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute

Python 419 57 Updated Oct 24, 2023

A series of technical reports on Slow Thinking with LLM

Python 370 17 Updated Jan 26, 2025

Create beautiful, publication-quality books and documents from computational content.

Python 3,956 674 Updated Jan 31, 2025

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)

Python 556 44 Updated Jan 20, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python 4,247 414 Updated Feb 4, 2025
Python 454 56 Updated Jan 2, 2025
Python 108 19 Updated Jun 18, 2024
Python 2,545 305 Updated Feb 5, 2025

An Open Large Reasoning Model for Real-World Solutions

Python 1,420 75 Updated Nov 28, 2024

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Python 8,695 358 Updated Feb 4, 2025

RL algorithms

Python 140 29 Updated Mar 7, 2021

This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.

Jupyter Notebook 285 29 Updated Aug 6, 2024

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.

Jupyter Notebook 7,153 468 Updated Nov 6, 2024

Code for Quiet-STaR

Python 710 89 Updated Aug 21, 2024
Python 8 3 Updated Jul 18, 2023

g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains

Python 4,172 378 Updated Jan 27, 2025

A simple and well styled PPO implementation. Based on my Medium series: https://medium.com/@eyyu/coding-ppo-from-scratch-with-pytorch-part-1-4-613dfc1b14c8.

Python 848 123 Updated Oct 1, 2024

Example models using DeepSpeed

Python 1 Updated Sep 17, 2024

The mirror of RL_Coding_Exercise.

Python 66 5 Updated Sep 4, 2024

Deep Reinforcement Learning

3,503 601 Updated Dec 10, 2022