- Tsinghua University
- Beijing
- https://jackory.github.io/
Stars
Understanding R1-Zero-Like Training: A Critical Perspective
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
[Support 0.48.x] (Reset Cursor AI MachineID & Bypass Higher Token Limit) Cursor AI: automatically resets the machine ID to unlock Pro features for free. Addresses: You've reached your trial request limit. / Too many free trial accounts used on this machi…
A lightweight, powerful framework for multi-agent workflows
Recipes to train the self-rewarding reasoning LLMs.
Democratizing Reinforcement Learning for LLMs
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Official Repo for Open-Reasoner-Zero
Paper reproduction of Google's SCoRE (Training Language Models to Self-Correct via Reinforcement Learning)
MR.Q is a general-purpose model-free reinforcement learning algorithm.
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
verl: Volcano Engine Reinforcement Learning for LLMs
Awesome list of framework figures in papers
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
My learning notes/codes for ML SYS.
Fully open reproduction of DeepSeek-R1
NeurIPS 2024 tutorial on LLM Inference
Make websites accessible for AI agents
Repository for the paper Stream of Search: Learning to Search in Language
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
Automating the Search for Artificial Life with Foundation Models!
Clean single-file implementation of offline RL algorithms in JAX
Scalable RL solution for advanced reasoning of language models
Recipes to scale inference-time compute of open models