Stars
A benchmark to evaluate situated inductive reasoning
Learn online intrinsic rewards from LLM feedback
Sandboxed code execution for AI agents, locally or on the cloud.
A lightweight task engine for building stateful AI agents that prioritizes simplicity and flexibility.
Benchmarking Agentic LLM and VLM Reasoning On Games
RAG that intelligently adapts to your use case, data, and queries
The first AI agent that builds third-party integrations through reverse engineering platforms' internal APIs.
Synthetic data derived via templating, few-shot prompting, transformations of public-domain corpora, and Monte Carlo tree search.
Long context evaluation for large language models
Code for Quiet-STaR
Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
Playing Pokemon Red with Reinforcement Learning
Open-source framework for exporting your personal data.
rbren / OpenHands
Forked from All-Hands-AI/OpenHands. OpenDevin: Code Less, Make More
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
Source code for Self-Evaluation Guided MCTS for online DPO.
Code for paper "Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent"
A comprehensive repository of reasoning tasks for LLMs (and beyond)
A research-friendly codebase for fast experimentation with single-agent reinforcement learning in JAX • End-to-End JAX RL