Stars
Lightweight coding agent that runs in your terminal
OpenPipe ART (Agent Reinforcement Trainer): train LLM agents
huggingface / yourbench
Forked from sumukshashidhar/yourbench
🤗 Benchmark Large Language Models Reliably On Your Data
A collection of prompts to challenge the reasoning abilities of large language models in the presence of misleading information
My neovim configs (yes I use neovim instead of tmux and it is good 😱)
Performant, batteries-included completion plugin for Neovim
Exploring Applications of GRPO
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Clue-inspired puzzles for testing LLM deduction abilities
Train your own SOTA deductive reasoning model
Letting Claude Code develop his own MCP tools :)
Verdict is a library for scaling judge-time compute.
A framework for pitting LLMs against each other in an evolving library of games ⚔
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
qpwo / dsv3-lowmem
Forked from deepseek-ai/DeepSeek-V3
run deepseek v3 on a single node. Drops unused experts from memory.
A user-friendly, feature-rich UI enhancing interaction with Anthropic's Claude AI, enabling model selection, chat saving, and improved prompt editing.
A native Jupyter notebook frontend with local + remote kernels, reactive cells, and IDE features, implemented in Rust