Stars
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
A library for advanced large language model reasoning
Retrieval-Augmented Theorem Provers for Lean
Llemma formal2formal (tactic prediction) theorem proving experiments
https://albertqjiang.github.io/Portal-to-ISAbelle/
A paper list on data contamination in large language model evaluation.
Mix of Minimal Optimal Sets (MMOS), a dataset for math reasoning with two advantages: higher performance and lower construction cost.
A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
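A minimal two-agent sketch in the style of the classic AutoGen (pyautogen) API; the newer autogen-agentchat package uses a different layout, so treat the import paths, config keys, and signatures below as version-dependent assumptions rather than the framework's canonical usage.

```python
# Sketch of a two-agent conversation, assuming the classic pyautogen-style API.
from autogen import AssistantAgent, UserProxyAgent

# Placeholder credentials; real runs need a valid config_list entry.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",      # run without asking a human for input
    code_execution_config=False,   # keep local code execution disabled in this sketch
)

# The user proxy drives the conversation; the assistant replies until termination.
user_proxy.initiate_chat(assistant, message="Summarize what program-aided reasoning is.")
```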
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
800,000 step-level correctness labels on LLM solutions to MATH problems
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
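A minimal sketch of the Accelerate workflow: wrap a standard PyTorch training loop so the same script runs on a single GPU, multiple GPUs, or TPUs. The tiny linear model and random dataset are placeholders standing in for a real model and dataloader.

```python
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data; any torch.nn.Module and DataLoader work the same way.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

accelerator = Accelerator()  # picks up device / distributed config from the launch environment
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # replaces loss.backward() so gradients sync correctly
    optimizer.step()
```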
PaL: Program-Aided Language Models (ICML 2023)
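An illustrative sketch of the program-aided reasoning idea behind PaL (not the repo's own API): the LLM writes a short Python program as its "reasoning", and the final answer comes from executing that program rather than from parsing free-form text. Here the generated program is hard-coded; in practice it would be an LLM completion.

```python
def run_generated_program(program: str):
    """Execute LLM-generated code and return its solution(). Sandbox this in real use."""
    namespace: dict = {}
    exec(program, namespace)
    return namespace["solution"]()

# In PaL this string would come from the language model; it is hard-coded for illustration.
program = '''
def solution():
    blue_fiber = 2
    white_fiber = blue_fiber / 2
    return blue_fiber + white_fiber
'''

print(run_generated_program(program))  # -> 3.0
```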
Train transformer language models with reinforcement learning.
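A minimal sketch in the style of TRL's classic PPO quickstart: generate a response to a query, assign it a scalar reward, and run one PPO update. TRL's API has changed substantially across releases, so the class names and signatures below should be read as version-dependent assumptions, and the constant reward stands in for a real reward model.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Small policy model with a value head, plus a frozen reference copy for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), model, ref_model, tokenizer)

# One PPO step: generate, score, update.
query = tokenizer.encode("This morning I went to the", return_tensors="pt")[0]
response = ppo_trainer.generate(
    query, return_prompt=False, max_new_tokens=16, pad_token_id=tokenizer.eos_token_id
)[0]
reward = torch.tensor(1.0)  # stand-in for a reward model or heuristic score
stats = ppo_trainer.step([query], [response], [reward])
```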
Example models using DeepSpeed
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.