Stars
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
A library for advanced large language model reasoning
Retrieval-Augmented Theorem Provers for Lean
Llemma formal2formal (tactic prediction) theorem proving experiments
https://albertqjiang.github.io/Portal-to-ISAbelle/
A paper list on data contamination in large language model evaluation.
Mix of Minimal Optimal Sets (MMOS), a dataset for math reasoning with two advantages: higher performance and lower construction cost.
A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
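A minimal two-agent sketch in the style of the classic AutoGen (pyautogen) API; the newer autogen-agentchat package uses a different layout, so treat the import paths, config keys, and signatures below as version-dependent assumptions rather than the framework's canonical usage.

```python
# Sketch of a two-agent conversation, assuming the classic pyautogen-style API.
from autogen import AssistantAgent, UserProxyAgent

# Placeholder credentials; real runs need a valid config_list entry.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_API_KEY"}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",      # run without asking a human for input
    code_execution_config=False,   # keep local code execution disabled in this sketch
)

# The user proxy drives the conversation; the assistant replies until termination.
user_proxy.initiate_chat(assistant, message="Summarize what program-aided reasoning is.")
```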
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
800,000 step-level correctness labels on LLM solutions to MATH problems
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
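A minimal sketch of the Accelerate workflow: wrap a standard PyTorch training loop so the same script runs on a single GPU, multiple GPUs, or TPUs. The tiny linear model and random dataset are placeholders standing in for a real model and dataloader.

```python
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data; any torch.nn.Module and DataLoader work the same way.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

accelerator = Accelerator()  # picks up device / distributed config from the launch environment
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # replaces loss.backward() so gradients sync correctly
    optimizer.step()
```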
PaL: Program-Aided Language Models (ICML 2023)
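An illustrative sketch of the program-aided reasoning idea behind PaL (not the repo's own API): the LLM writes a short Python program as its "reasoning", and the final answer comes from executing that program rather than from parsing free-form text. Here the generated program is hard-coded; in practice it would be an LLM completion.

```python
def run_generated_program(program: str):
    """Execute LLM-generated code and return its solution(). Sandbox this in real use."""
    namespace: dict = {}
    exec(program, namespace)
    return namespace["solution"]()

# In PaL this string would come from the language model; it is hard-coded for illustration.
program = '''
def solution():
    blue_fiber = 2
    white_fiber = blue_fiber / 2
    return blue_fiber + white_fiber
'''

print(run_generated_program(program))  # -> 3.0
```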
Train transformer language models with reinforcement learning.
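A minimal sketch in the style of TRL's classic PPO quickstart: generate a response to a query, assign it a scalar reward, and run one PPO update. TRL's API has changed substantially across releases, so the class names and signatures below should be read as version-dependent assumptions, and the constant reward stands in for a real reward model.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Small policy model with a value head, plus a frozen reference copy for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), model, ref_model, tokenizer)

# One PPO step: generate, score, update.
query = tokenizer.encode("This morning I went to the", return_tensors="pt")[0]
response = ppo_trainer.generate(
    query, return_prompt=False, max_new_tokens=16, pad_token_id=tokenizer.eos_token_id
)[0]
reward = torch.tensor(1.0)  # stand-in for a reward model or heuristic score
stats = ppo_trainer.step([query], [response], [reward])
```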
Example models using DeepSpeed
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24].
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.