Pulse · open-compass/opencompass · GitHub

November 26, 2024 – December 26, 2024

Overview

45 Active pull requests

23 Active issues

2 Releases published by 1 person

0.3.7
published Dec 4, 2024
0.3.8
published Dec 17, 2024

41 Pull requests merged by 12 people

[Fix] Fix model summarizer abbr
#1789 merged Dec 27, 2024
[CI] Pypi deploy workflow update
#1786 merged Dec 27, 2024
[CI] Update deploy python version
#1784 merged Dec 27, 2024
[ci] remove daily step retry and update pr score
#1782 merged Dec 26, 2024
[Update] Volc status exception handle
#1780 merged Dec 26, 2024
[ci] remove testcase into volc engine
#1777 merged Dec 25, 2024
[Update] Update OC academic 202412
#1771 merged Dec 19, 2024
Customizable tokenizer for RULER
#1731 merged Dec 19, 2024
[Fix] Fix Local Runner Params Save Path
#1768 merged Dec 19, 2024
[Fix] Fix lark report is None
#1769 merged Dec 18, 2024
[ci] add fullbench testcase
#1766 merged Dec 18, 2024
[Fix] fix Order error
#1767 merged Dec 18, 2024
[Bump] Bump version to 0.3.8
#1765 merged Dec 17, 2024
[Update] Update requirement and deepseek configurations
#1764 merged Dec 17, 2024
[Fix] Fix vllm max_seq_len parameter transfer
#1745 merged Dec 16, 2024
[Feature] Added CompassArena-SubjectiveBench with Bradley-Terry Model
#1751 merged Dec 16, 2024
add new dataset summarizer
#1758 merged Dec 13, 2024
[ci] add common_summarizer return
#1724 merged Dec 11, 2024
[Fix] Fix ChineseSimpleQA max_out_len
#1757 merged Dec 11, 2024
[Update] Update dataset configurations with no max_out_len
#1754 merged Dec 11, 2024
Add Chinese SimpleQA config
#1697 merged Dec 11, 2024
[Feature] Add OC academic 2412
#1750 merged Dec 10, 2024
[Change] Change Compassarena metric
#1749 merged Dec 10, 2024
[Update] Update O1-style Benchmark and Prompts
#1742 merged Dec 9, 2024
[Update] Add MATH500 & AIME2024 to LiveMathBench
#1741 merged Dec 6, 2024
[Fix] Fix error in subjective default summarizer
#1740 merged Dec 6, 2024
[Update] Update Skywork/Qwen-QwQ
#1728 merged Dec 5, 2024
[Update] Update LiveMathBench Evaluation to Support Single Dataset Split Metric Computation
#1730 merged Dec 5, 2024
[Update] Update Manifest
#1738 merged Dec 5, 2024
[Update] Update init file for Korbench
#1737 merged Dec 5, 2024
KOR-BENCH readme supplementation
#1734 merged Dec 5, 2024
[Feature] DLC runner Lark report
#1735 merged Dec 4, 2024
[Bump] Bump version to 0.3.7
#1733 merged Dec 3, 2024
KOR-Bench
#1729 merged Dec 2, 2024
[Update] Update max_out_len for datasets
#1726 merged Dec 2, 2024
[Feature] Support LiveMathBench
#1727 merged Nov 29, 2024
[Fix] Update P-MMEVAL OSS data
#1722 merged Nov 28, 2024
[Feature] Add Openai Simpleqa dataset
#1720 merged Nov 28, 2024
[Fix] Fix pmmeval_gen config
#1719 merged Nov 28, 2024
[Feature] Add P-MMEval
#1714 merged Nov 27, 2024
[Update] Support Arc Prize Public Evaluation
#1690 merged Nov 27, 2024

4 Pull requests opened by 4 people

[Draft] Async pipeline
#1763 opened Dec 15, 2024
[Feature] Support G-Pass@k and LiveMathBench
#1772 opened Dec 20, 2024
[Feature] Support MMLU-CF Benchmark
#1775 opened Dec 24, 2024
[Update] Add Theorem QA 0shot CoT config
#1783 opened Dec 26, 2024

2 Issues closed by 2 people

[Bug] The mbpp dataset is too small, with only 500 entries; the Google paper "Program Synthesis with Large Language Models" has 974
#1770 closed Dec 19, 2024
Why are there two config paths, ./config and ./opencompass/config? The second one seems to be the one that takes effect, so why is the first still needed?
#1739 closed Dec 5, 2024

21 Issues opened by 20 people

[Bug] gsm8k_gen_17d0dc: BOT thinking prompt is incorrect
#1788 opened Dec 27, 2024
[Bug] Please make sure `./data/safety.txt` is correct
#1787 opened Dec 27, 2024
[Bug] Academic Leaderboard results do not match; Summary shows only some of them
#1779 opened Dec 25, 2024
[Feature] Are there plans to support evaluating mamba or mamba2?
#1778 opened Dec 25, 2024
[Bug] torch.OutOfMemoryError
#1773 opened Dec 23, 2024
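[Bug] Inference finished, but evaluation failed because of a plugin version; the prediction files all still exist, so how do I resume? Also, by default a run seems to test all datasets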
#1761 opened Dec 14, 2024
[Bug] The commonsenseqa_datasets dataset in commonsenseqa_gen_c946f2 raises TypeError: 'ListWrapper' object is not iterable
#1760 opened Dec 14, 2024
[Bug] cannot use GPU when conducting evaluation
#1759 opened Dec 12, 2024
[Bug] Newly written Dataset is reported as not registered
#1756 opened Dec 11, 2024
[Bug] Cannot evaluate data in parallel across multiple GPUs
#1755 opened Dec 11, 2024
[Bug] Model evaluation on a custom dataset scores 0
#1752 opened Dec 11, 2024
[Feature] Evaluate multiple metrics simultaneously
#1748 opened Dec 9, 2024
[Bug] Running on NPU fails with a core dump
#1747 opened Dec 9, 2024
[Bug] Custom model evaluation configuration
#1746 opened Dec 9, 2024
Which version of evalplus does HumanEvalPlusEvaluator use?
#1743 opened Dec 9, 2024
[Bug] Unsupported operand types in OpenAISDK
#1736 opened Dec 4, 2024
[Bug] The output of the fine-tuned Qwen2.5-7b is all exclamation marks ('！！！！！！')
#1732 opened Dec 3, 2024
[Bug] Running a case on a machine without a GPU fails at runtime with a dataset-not-registered error (it is actually registered)
#1725 opened Nov 29, 2024
[Bug] Dataset location on Windows, and evaluation result is 0.0
#1723 opened Nov 29, 2024
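Hello, is the final score for L-Eval's subjective questions the rougeLsum score? Also, the L-Eval dataset is missing codeU and sci_fi; are there evaluation config files for them?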
#1721 opened Nov 28, 2024
[Feature] How can custom datasets be supported when evaluating via API?
#1718 opened Nov 28, 2024

3 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[Bug] stop_at_stop_token removes the generated method body, leaving no evaluation results
#1710 commented on Nov 27, 2024 • 0 new comments
[Bug] MBPP evaluator cannot extract the correct answer
#1407 commented on Dec 12, 2024 • 0 new comments
internlm2-7B-base: abnormal CMMLU evaluation results
#1281 commented on Dec 27, 2024 • 0 new comments