Stars
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Fully open reproduction of DeepSeek-R1
Robust recipes to align language models with human and AI preferences
Curated list of datasets and tools for post-training.
WildEval / ZeroEval
Forked from allenai/WildBench
A simple unified framework for evaluating LLMs
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
A framework for few-shot evaluation of language models.
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
Ongoing research training transformer models at scale
High-performance, multi-platform VNC client and server
An educational resource to help anyone learn deep reinforcement learning.
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
800,000 step-level correctness labels on LLM solutions to MATH problems
A library for benchmarking the long-term memory and continual-learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blog post:
Build context-aware reasoning applications
Training Sparse Autoencoders on Language Models
shehper / scaling_laws
Forked from karpathy/nanoGPT
An open-source implementation of Scaling Laws for Neural Language Models using nanoGPT
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
A library for mechanistic interpretability of GPT-style language models
Modeling, training, eval, and inference code for OLMo
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
Everything about the SmolLM2 and SmolVLM family of models
Minimalistic large language model 3D-parallelism training