Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Python 1,412 217 Updated Apr 14, 2025

Fully open reproduction of DeepSeek-R1

Python 23,921 2,183 Updated Apr 14, 2025
Python 23 6 Updated Mar 6, 2025

Robust recipes to align language models with human and AI preferences

Python 5,127 440 Updated Nov 21, 2024

Curated list of datasets and tools for post-training.

2,938 254 Updated Jan 29, 2025

A simple unified framework for evaluating LLMs

HTML 210 23 Updated Apr 11, 2025
Python 2,639 238 Updated Apr 14, 2025

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 15,891 2,686 Updated Dec 18, 2024

ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)

Python 613 48 Updated Jan 20, 2025

A framework for few-shot evaluation of language models.

Python 8,629 2,303 Updated Apr 14, 2025

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

Jupyter Notebook 1,133 72 Updated Jan 7, 2025

Ongoing research training transformer models at scale

Python 12,073 2,704 Updated Apr 14, 2025

s1: Simple test-time scaling

Python 6,166 723 Updated Apr 4, 2025

High performance, multi-platform VNC client and server

C++ 5,883 1,028 Updated Apr 8, 2025

An educational resource to help anyone learn deep reinforcement learning.

Python 10,777 2,315 Updated Aug 5, 2024

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Python 16,210 4,900 Updated Aug 1, 2024

800,000 step-level correctness labels on LLM solutions to MATH problems

Python 1,972 116 Updated Jun 1, 2023
Python 921 104 Updated Jan 23, 2025

A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you need to evaluate your own agents. See more in the blogpost:

HTML 67 12 Updated Dec 17, 2024

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook 105,608 17,147 Updated Apr 14, 2025

Training Sparse Autoencoders on Language Models

Jupyter Notebook 717 163 Updated Apr 14, 2025

An open-source implementation of Scaling Laws for Neural Language Models using nanoGPT

Python 41 5 Updated Dec 8, 2023

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,646 365 Updated Apr 14, 2025
Jupyter Notebook 502 315 Updated Apr 12, 2025

A library for mechanistic interpretability of GPT-style language models

Python 2,053 366 Updated Apr 4, 2025

Modeling, training, eval, and inference code for OLMo

Python 5,492 591 Updated Apr 10, 2025

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 87 3 Updated Dec 3, 2024

Go ahead and axolotl questions

Python 9,092 990 Updated Apr 14, 2025

Everything about the SmolLM2 and SmolVLM family of models

Python 2,171 127 Updated Mar 31, 2025

Minimalistic large language model 3D-parallelism training

Python 1,785 176 Updated Apr 14, 2025