Insights: open-compass/opencompass
November 26, 2024 – December 26, 2024
Overview
41 Pull requests merged by 12 people
-
[Fix] Fix model summarizer abbr
#1789 merged
Dec 27, 2024 -
[CI] Pypi deploy workflow update
#1786 merged
Dec 27, 2024 -
[CI] Update deploy python version
#1784 merged
Dec 27, 2024 -
[ci] remove daily step retry and update pr score
#1782 merged
Dec 26, 2024 -
[Update] Volc status exception handle
#1780 merged
Dec 26, 2024 -
[ci] remove testcase into volc engine
#1777 merged
Dec 25, 2024 -
[Update] Update OC academic 202412
#1771 merged
Dec 19, 2024 -
Customizable tokenizer for RULER
#1731 merged
Dec 19, 2024 -
[Fix] Fix Local Runner Params Save Path
#1768 merged
Dec 19, 2024 -
[Fix] Fix lark report is None
#1769 merged
Dec 18, 2024 -
[ci] add fullbench testcase
#1766 merged
Dec 18, 2024 -
[Fix] fix Order error
#1767 merged
Dec 18, 2024 -
[Bump] Bump version to 0.3.8
#1765 merged
Dec 17, 2024 -
[Update] Update requirement and deepseek configurations
#1764 merged
Dec 17, 2024 -
[Fix] Fix vllm max_seq_len parameter transfer
#1745 merged
Dec 16, 2024 -
[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model
#1751 merged
Dec 16, 2024 -
add new dataset summarizer
#1758 merged
Dec 13, 2024 -
[ci] add common_summarizer return
#1724 merged
Dec 11, 2024 -
[Fix] Fix ChineseSimpleQA max_out_len
#1757 merged
Dec 11, 2024 -
[Update] Update dataset configurations with no max_out_len
#1754 merged
Dec 11, 2024 -
Add Chinese SimpleQA config
#1697 merged
Dec 11, 2024 -
[Feature] Add OC academic 2412
#1750 merged
Dec 10, 2024 -
[Change] Change Compassarena metric
#1749 merged
Dec 10, 2024 -
[Update] Update O1-style Benchmark and Prompts
#1742 merged
Dec 9, 2024 -
[Update] Add MATH500 & AIME2024 to LiveMathBench
#1741 merged
Dec 6, 2024 -
[Fix] Fix error in subjective default summarizer
#1740 merged
Dec 6, 2024 -
[Update] Update Skywork/Qwen-QwQ
#1728 merged
Dec 5, 2024 -
[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation
#1730 merged
Dec 5, 2024 -
[Update] Update Manifest
#1738 merged
Dec 5, 2024 -
[Update] Update init file for Korbench
#1737 merged
Dec 5, 2024 -
KOR-BENCH readme supplementation
#1734 merged
Dec 5, 2024 -
[Feature] DLC runner Lark report
#1735 merged
Dec 4, 2024 -
[Bump] Bump version to 0.3.7
#1733 merged
Dec 3, 2024 -
KOR-Bench
#1729 merged
Dec 2, 2024 -
[Update] Update max_out_len for datasets
#1726 merged
Dec 2, 2024 -
[Feature] Support LiveMathBench
#1727 merged
Nov 29, 2024 -
[Fix] Update P-MMEVAL OSS data
#1722 merged
Nov 28, 2024 -
[Feature] Add Openai Simpleqa dataset
#1720 merged
Nov 28, 2024 -
[Fix] Fix pmmeval_gen config
#1719 merged
Nov 28, 2024 -
[Feature] Add P-MMEval
#1714 merged
Nov 27, 2024 -
[Update] Support Arc Prize Public Evaluation
#1690 merged
Nov 27, 2024
4 Pull requests opened by 4 people
-
[Draft] Async pipeline
#1763 opened
Dec 15, 2024 -
[Feature] Support G-Pass@k and LiveMathBench
#1772 opened
Dec 20, 2024 -
[Feature] Support MMLU-CF Benchmark
#1775 opened
Dec 24, 2024 -
[Update] Add Theorem QA 0shot CoT config
#1783 opened
Dec 26, 2024
2 Issues closed by 2 people
-
[Bug] The mbpp dataset is too small, with only 500 entries; the Google paper "Program Synthesis with Large Language Models" has 974
#1770 closed
Dec 19, 2024 -
Why are there two config paths, ./config and ./opencompass/config? The second one seems to be the one that takes effect, so why is the first one still needed?
#1739 closed
Dec 5, 2024
21 Issues opened by 20 people
-
[Bug] gsm8k_gen_17d0dc-BOT thinking prompt is incorrect
#1788 opened
Dec 27, 2024 -
[Bug] Please make sure `./data/safety.txt` is correct
#1787 opened
Dec 27, 2024 -
[Bug] Academic Leaderboard results cannot be aligned; the Summary shows only part of them
#1779 opened
Dec 25, 2024 -
[Feature] Are there plans to support evaluation of mamba or mamba2?
#1778 opened
Dec 25, 2024 -
[Bug] torch.OutOfMemoryError
#1773 opened
Dec 23, 2024 -
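[Bug] Inference finished, but evaluation failed because of a plugin version; the prediction files all still exist, so how do I resume from there? Also, it seems that all datasets are tested by default at runtime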
#1761 opened
Dec 14, 2024 -
[Bug] cannot use GPU when conduct evaluation
#1759 opened
Dec 12, 2024 -
[Bug] Newly written Dataset is reported as not registered
#1756 opened
Dec 11, 2024 -
[Bug] Unable to evaluate data in parallel across multiple GPUs
#1755 opened
Dec 11, 2024 -
[Bug] Custom dataset: model evaluation score is 0
#1752 opened
Dec 11, 2024 -
[Feature] Evaluate multiple metrics simultaneously
#1748 opened
Dec 9, 2024 -
[Bug] core dump error when running on NPU
#1747 opened
Dec 9, 2024 -
[Bug] Custom model evaluation configuration
#1746 opened
Dec 9, 2024 -
Which version of evalplus does HumanEvalPlusEvaluator use?
#1743 opened
Dec 9, 2024 -
[Bug] Unsupported operand types in OpenAISDK
#1736 opened
Dec 4, 2024 -
[Bug] The output of fine-tuned Qwen2.5-7b is all exclamation marks ('!!!!!!')
#1732 opened
Dec 3, 2024 -
[Bug] Running a case on a machine without a GPU fails at runtime with a dataset-not-registered error (the dataset is in fact registered)
#1725 opened
Nov 29, 2024 -
[Bug] Dataset location on Windows, and evaluation result is 0.0
#1723 opened
Nov 29, 2024 -
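Hello, is the final score for L-Eval's subjective questions the rougeLsum score? Also, the L-Eval dataset is missing codeU and sci_fi; are there evaluation config files for them?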
#1721 opened
Nov 28, 2024 -
[Feature] How can custom datasets be supported when evaluating via API?
#1718 opened
Nov 28, 2024
3 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[Bug] stop_at_stop_token removes the generated method body, leaving no evaluation result
#1710 commented on
Nov 27, 2024 • 0 new comments -
[Bug] MBPP evaluator cannot extract the correct answer
#1407 commented on
Dec 12, 2024 • 0 new comments -
Abnormal CMMLU evaluation results for internlm2-7B-base
#1281 commented on
Dec 27, 2024 • 0 new comments