Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 36 changed files with 412 additions and 30 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,43 @@ | ||
|
||
# bigbenchhard | ||
+ **source**: github | ||
+ **url**: [https://github.com/suzgunmirac/BIG-Bench-Hard](https://github.com/suzgunmirac/BIG-Bench-Hard) | ||
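- Consists of the 23 tasks, out of the more than 200 tasks in BIG-Bench, on which models did not surpass the human raters | ||
- Made up of 23 tasks and 27 sub-tasks | ||
- Models perform well when few-shot prompting is used | ||
- Includes CoT reasoning so that prompts stronger than plain few-shot can be evaluated | ||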
- [CoT Prompt](https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/cot-prompts) | ||
--- | ||
+ **source**: huggingface | ||
+ **hf_path**: maveriq/bigbenchhard | ||
+ **hf_name**: | ||
<details> | ||
<summary>Click</summary> | ||
<div> - <code>boolean_expressions</code></div> | ||
<div> - <code>causal_judgement</code></div> | ||
<div> - <code>date_understanding</code></div> | ||
<div> - <code>disambiguation_qa</code></div> | ||
<div> - <code>dyck_languages</code></div> | ||
<div> - <code>formal_fallacies</code></div> | ||
<div> - <code>geometric_shapes</code></div> | ||
<div> - <code>hyperbaton</code></div> | ||
<div> - <code>logical_deduction_five_objects</code></div> | ||
<div> - <code>logical_deduction_seven_objects</code></div> | ||
<div> - <code>logical_deduction_three_objects</code></div> | ||
<div> - <code>movie_recommendation</code></div> | ||
<div> - <code>multistep_arithmetic_two</code></div> | ||
<div> - <code>navigate</code></div> | ||
<div> - <code>object_counting</code></div> | ||
<div> - <code>penguins_in_a_table</code></div> | ||
<div> - <code>reasoning_about_colored_objects</code></div> | ||
<div> - <code>ruin_names</code></div> | ||
<div> - <code>salient_translation_error_detection</code></div> | ||
<div> - <code>snarks</code></div> | ||
<div> - <code>sports_understanding</code></div> | ||
<div> - <code>temporal_sequences</code></div> | ||
<div> - <code>tracking_shuffled_objects_five_objects</code></div> | ||
<div> - <code>tracking_shuffled_objects_seven_objects</code></div> | ||
<div> - <code>tracking_shuffled_objects_three_objects</code></div> | ||
<div> - <code>web_of_lies</code></div> | ||
<div> - <code>word_sorting</code></div> | ||
</details> | ||
|
||
+ **url**: [https://huggingface.co/datasets/maveriq/bigbenchhard](https://huggingface.co/datasets/maveriq/bigbenchhard) | ||
+ **paper**: [https://arxiv.org/pdf/2210.09261](https://arxiv.org/pdf/2210.09261) |
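The "23 tasks and 27 sub-tasks" figures can be checked against the `hf_name` list above. The sketch below assumes that the `_three_objects`, `_five_objects`, and `_seven_objects` suffixes mark size variants of a single task; the helper name `base_task` is hypothetical and not part of any library.

```python
import re

# Config names copied from the hf_name list for maveriq/bigbenchhard.
BBH_CONFIGS = [
    "boolean_expressions", "causal_judgement", "date_understanding",
    "disambiguation_qa", "dyck_languages", "formal_fallacies",
    "geometric_shapes", "hyperbaton",
    "logical_deduction_five_objects", "logical_deduction_seven_objects",
    "logical_deduction_three_objects", "movie_recommendation",
    "multistep_arithmetic_two", "navigate", "object_counting",
    "penguins_in_a_table", "reasoning_about_colored_objects", "ruin_names",
    "salient_translation_error_detection", "snarks", "sports_understanding",
    "temporal_sequences",
    "tracking_shuffled_objects_five_objects",
    "tracking_shuffled_objects_seven_objects",
    "tracking_shuffled_objects_three_objects",
    "web_of_lies", "word_sorting",
]

# Assumption: these suffixes distinguish size variants of the same task.
_VARIANT_SUFFIX = re.compile(r"_(three|five|seven)_objects$")


def base_task(config_name: str) -> str:
    """Map a sub-task config name to its parent task name."""
    return _VARIANT_SUFFIX.sub("", config_name)


tasks = sorted({base_task(name) for name in BBH_CONFIGS})
print(f"{len(BBH_CONFIGS)} sub-tasks grouped into {len(tasks)} tasks")
```

Grouping this way collapses the three `logical_deduction_*` and three `tracking_shuffled_objects_*` configs into one task each, which accounts for the difference between the two counts.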
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,23 @@ | ||
|
||
# gpqa | ||
- Graduate-Level Google-Proof Q&A benchmark | ||
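- A dataset of 448 questions in biology, physics, and chemistry, written by domain experts | ||
- An extremely difficult benchmark: experts reach 65% accuracy and non-experts 34% | ||
- Built on the premise that LLMs may go beyond human ability, so the questions are hard even for people | ||
- Three dataset variants are provided | ||
  - `GPQA Extended` : the full version of the dataset (546 questions) | ||
  - `GPQA` : the main dataset (448 questions), with questions removed during validation if the experts all answered them incorrectly and the non-experts all answered them correctly | ||
  - `GPQA Diamond` : the highest-quality subset (198 questions), which the experts all answered correctly and the non-experts all answered incorrectly | ||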
--- | ||
+ **source**: huggingface | ||
+ **hf_path**: Idavidrein/gpqa | ||
+ **hf_name**: | ||
<details> | ||
<summary>Click</summary> | ||
<div> - <code>gpqa_extended</code></div> | ||
<div> - <code>gpqa_main</code></div> | ||
<div> - <code>gpqa_diamond</code></div> | ||
<div> - <code>gpqa_experts</code></div> | ||
</details> | ||
|
||
+ **url**: [https://huggingface.co/datasets/Idavidrein/gpqa](https://huggingface.co/datasets/Idavidrein/gpqa) | ||
+ **paper**: [https://arxiv.org/pdf/2311.12022](https://arxiv.org/pdf/2311.12022) |
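Because each GPQA question is multiple choice, an evaluation harness typically has to shuffle the options and remember where the correct one landed. The following is a minimal sketch under stated assumptions: the column names `Question`, `Correct Answer`, and `Incorrect Answer 1` to `Incorrect Answer 3` are assumed to match the Hugging Face dataset and should be verified against it, and the record shown is a made-up toy example, not an actual GPQA item.

```python
import random


def build_mc_prompt(record: dict, rng: random.Random) -> tuple[str, str]:
    """Return (prompt, gold_letter) for one record with shuffled options."""
    # Assumed column names; check them against the real dataset schema.
    options = [
        record["Correct Answer"],
        record["Incorrect Answer 1"],
        record["Incorrect Answer 2"],
        record["Incorrect Answer 3"],
    ]
    rng.shuffle(options)
    letters = "ABCD"
    lines = [record["Question"]]
    lines += [f"({letter}) {text}" for letter, text in zip(letters, options)]
    gold_letter = letters[options.index(record["Correct Answer"])]
    return "\n".join(lines), gold_letter


# Toy record for illustration only (not taken from GPQA).
toy_record = {
    "Question": "Which particle mediates the electromagnetic force?",
    "Correct Answer": "Photon",
    "Incorrect Answer 1": "Gluon",
    "Incorrect Answer 2": "W boson",
    "Incorrect Answer 3": "Graviton",
}

prompt, gold = build_mc_prompt(toy_record, random.Random(0))
print(prompt)
print("gold:", gold)
```

Passing an explicit `random.Random` instance keeps the option order reproducible across evaluation runs.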
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,17 @@ | ||
|
||
# gsm8k | ||
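- A multi-step mathematical reasoning benchmark | ||
- Grade-school-level math problems; the dataset contains 8.5K problems | ||
- Solutions avoid heavy mathematical notation and are written as conversational, natural-language question-and-answer steps | ||
- Designed to catch cases where a single wrong generation step during multi-step reasoning sends the model off in the wrong direction | ||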
--- | ||
+ **source**: huggingface | ||
+ **hf_path**: openai/gsm8k | ||
+ **hf_name**: | ||
<details> | ||
<summary>Click</summary> | ||
<div> - <code>main</code></div> | ||
<div> - <code>socratic</code></div> | ||
</details> | ||
|
||
+ **url**: [https://huggingface.co/datasets/openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) | ||
+ **paper**: [https://arxiv.org/pdf/2110.14168](https://arxiv.org/pdf/2110.14168) |
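GSM8K reference solutions end with a `#### <answer>` line and embed calculator annotations of the form `<<expression=result>>`, so scoring usually starts by parsing those two markers. The sketch below shows one way to do that; the helper names are hypothetical, and the sample solution is written here for illustration rather than copied from the dataset.

```python
import re
from typing import Optional

# Calculator annotations look like <<4*12=48>> inside the solution text.
_CALC_ANNOTATION = re.compile(r"<<[^>]*>>")
# The final answer follows a "####" marker on the last line.
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def strip_calculator_annotations(solution: str) -> str:
    """Remove <<...>> annotations, leaving the natural-language steps."""
    return _CALC_ANNOTATION.sub("", solution)


def extract_final_answer(solution: str) -> Optional[str]:
    """Return the number after '####' with thousands separators removed."""
    match = _FINAL_ANSWER.search(solution)
    if match is None:
        return None
    return match.group(1).replace(",", "")


# Illustrative solution text in the GSM8K style (not a dataset item).
sample_solution = (
    "Each box holds 12 eggs, so 4 boxes hold 4*12 = <<4*12=48>>48 eggs.\n"
    "After using 6 eggs, 48-6 = <<48-6=42>>42 eggs remain.\n"
    "#### 42"
)

print(strip_calculator_annotations(sample_solution))
print(extract_final_answer(sample_solution))
```

The same `extract_final_answer` helper can be applied to a model's output if the prompt asks it to finish with a `####` line, which makes exact-match comparison straightforward.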
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,33 @@ | ||
|
||
# haerae | ||
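- Unlike existing benchmarks that focus on math and logical reasoning, it focuses on the cultural characteristics of Korean | ||
- Consists of 1,500 questions specific to the Korean domain | ||
- Includes 6 subcategories | ||
  - Loan Words (LW) | ||
  - Standard Nomenclature (SN) | ||
  - Rare Words (RW) | ||
  - General Knowledge (GK) : covers tradition, law, K-pop, K-drama, and similar topics | ||
  - History (HI) | ||
  - Reading Comprehension (RC) | ||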
--- | ||
+ **source**: huggingface | ||
+ **hf_path**: HAERAE-HUB/HAE_RAE_BENCH_1.1 | ||
+ **hf_name**: | ||
<details> | ||
<summary>Click</summary> | ||
<div> - <code>correct_definition_matching</code></div> | ||
<div> - <code>csat_geo</code></div> | ||
<div> - <code>csat_law</code></div> | ||
<div> - <code>csat_socio</code></div> | ||
<div> - <code>date_understanding</code></div> | ||
<div> - <code>general_knowledge</code></div> | ||
<div> - <code>history</code></div> | ||
<div> - <code>loan_words</code></div> | ||
<div> - <code>lyrics_denoising</code></div> | ||
<div> - <code>proverbs_denoising</code></div> | ||
<div> - <code>rare_words</code></div> | ||
<div> - <code>standard_nomenclature</code></div> | ||
<div> - <code>reading_comprehension</code></div> | ||
</details> | ||
|
||
+ **url**: [https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.1](https://huggingface.co/datasets/HAERAE-HUB/HAE_RAE_BENCH_1.1) | ||
+ **paper**: [https://arxiv.org/pdf/2309.02706](https://arxiv.org/pdf/2309.02706) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,13 @@ | ||
|
||
# hellaswag | ||
+ **source**: github | ||
+ **url**: [https://github.com/rowanz/hellaswag](https://github.com/rowanz/hellaswag) | ||
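- An NLI benchmark based on SWAG | ||
- In HellaSwag, the input is a context and the task is to choose what happens next | ||
- SWAG is a benchmark in which, given a video caption, the model picks one of four possible next events | ||
- The dataset draws on ActivityNet and WikiHow | ||
- SWAG uses the ActivityNet and LSMDC datasets, whereas HellaSwag keeps only ActivityNet of the two | ||
- WikiHow is used to measure commonsense reasoning | ||
- Adversarial Filtering (AF) is used to generate more plausible wrong answers, raising the difficulty of the benchmark | ||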
--- | ||
+ **source**: huggingface | ||
+ **hf_path**: Rowan/hellaswag | ||
+ **url**: [https://huggingface.co/datasets/Rowan/hellaswag](https://huggingface.co/datasets/Rowan/hellaswag) | ||
+ **paper**: [https://arxiv.org/pdf/1905.07830](https://arxiv.org/pdf/1905.07830) |
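Choosing "what happens next" among four endings reduces to scoring each ending against the context and taking the best one. The sketch below illustrates that loop under stated assumptions: the field names `ctx`, `endings`, and `label` (stored as a string) are assumed to match the Hugging Face dataset, `overlap_score` is a toy stand-in for a real model score such as the log-likelihood of the ending given the context, and the example record is made up.

```python
from typing import Callable, Sequence


def pick_ending(
    ctx: str,
    endings: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> int:
    """Return the index of the ending with the highest score."""
    scores = [score_fn(ctx, ending) for ending in endings]
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(examples: Sequence[dict], score_fn: Callable[[str, str], float]) -> float:
    """Fraction of examples where the top-scored ending matches the label."""
    correct = sum(
        pick_ending(ex["ctx"], ex["endings"], score_fn) == int(ex["label"])
        for ex in examples
    )
    return correct / len(examples)


def overlap_score(ctx: str, ending: str) -> float:
    """Toy scorer: count ending words that also appear in the context."""
    ctx_words = {w.strip(".,").lower() for w in ctx.split()}
    return float(sum(w.strip(".,").lower() in ctx_words for w in ending.split()))


# Made-up example in the assumed record layout (not a dataset item).
toy_examples = [
    {
        "ctx": "A man pours pancake batter into a hot pan. He",
        "endings": [
            "flips the pancake in the pan when it bubbles.",
            "drives the car to the airport.",
            "paints the fence bright blue.",
            "throws the ball to the dog.",
        ],
        "label": "0",
    }
]

print(accuracy(toy_examples, overlap_score))
```

Keeping the scorer as a parameter means the same loop works for a real language model; only `score_fn` needs to change.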
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,60 @@ | ||
|
||
# kmmlu | ||
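- A Korean multiple-choice QA (MCQA) benchmark of 45 tasks spanning the humanities, STEM, and applied science | ||
- Machine-translated benchmarks read unnaturally, which lowers dataset quality | ||
- English-language benchmarks such as MMLU also carry cultural differences (for example, US common law vs. Korean statutory law) | ||
- Questions are drawn from civil service exams, Korean license exams, the CSAT, and similar sources | ||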
--- | ||
+ **source**: huggingface | ||
+ **hf_path**: HAERAE-HUB/KMMLU | ||
+ **hf_name**: | ||
<details> | ||
<summary>Click</summary> | ||
<div> - <code>Accounting</code></div> | ||
<div> - <code>Agricultural-Sciences</code></div> | ||
<div> - <code>Aviation-Engineering-and-Maintenance</code></div> | ||
<div> - <code>Biology</code></div> | ||
<div> - <code>Chemical-Engineering</code></div> | ||
<div> - <code>Chemistry</code></div> | ||
<div> - <code>Civil-Engineering</code></div> | ||
<div> - <code>Computer-Science</code></div> | ||
<div> - <code>Construction</code></div> | ||
<div> - <code>Criminal-Law</code></div> | ||
<div> - <code>Ecology</code></div> | ||
<div> - <code>Economics</code></div> | ||
<div> - <code>Education</code></div> | ||
<div> - <code>Electrical-Engineering</code></div> | ||
<div> - <code>Electronics-Engineering</code></div> | ||
<div> - <code>Energy-Management</code></div> | ||
<div> - <code>Environmental-Science</code></div> | ||
<div> - <code>Fashion</code></div> | ||
<div> - <code>Food-Processing</code></div> | ||
<div> - <code>Gas-Technology-and-Engineering</code></div> | ||
<div> - <code>Geomatics</code></div> | ||
<div> - <code>Health</code></div> | ||
<div> - <code>Industrial-Engineer</code></div> | ||
<div> - <code>Information-Technology</code></div> | ||
<div> - <code>Interior-Architecture-and-Design</code></div> | ||
<div> - <code>Law</code></div> | ||
<div> - <code>Machine-Design-and-Manufacturing</code></div> | ||
<div> - <code>Management</code></div> | ||
<div> - <code>Maritime-Engineering</code></div> | ||
<div> - <code>Marketing</code></div> | ||
<div> - <code>Materials-Engineering</code></div> | ||
<div> - <code>Mechanical-Engineering</code></div> | ||
<div> - <code>Nondestructive-Testing</code></div> | ||
<div> - <code>Patent</code></div> | ||
<div> - <code>Political-Science-and-Sociology</code></div> | ||
<div> - <code>Psychology</code></div> | ||
<div> - <code>Public-Safety</code></div> | ||
<div> - <code>Railway-and-Automotive-Engineering</code></div> | ||
<div> - <code>Real-Estate</code></div> | ||
<div> - <code>Refrigerating-Machinery</code></div> | ||
<div> - <code>Social-Welfare</code></div> | ||
<div> - <code>Taxation</code></div> | ||
<div> - <code>Telecommunications-and-Wireless-Technology</code></div> | ||
<div> - <code>Korean-History</code></div> | ||
<div> - <code>Math</code></div> | ||
</details> | ||
|
||
+ **url**: [https://huggingface.co/datasets/HAERAE-HUB/KMMLU](https://huggingface.co/datasets/HAERAE-HUB/KMMLU) | ||
+ **paper**: [https://arxiv.org/pdf/2402.11548](https://arxiv.org/pdf/2402.11548) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,20 @@ | ||
|
||
# mbpp | ||
- Mostly Basic Programming Problems | ||
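- Google released two benchmarks for measuring program synthesis: MBPP and MathQA-Python | ||
- MBPP contains 974 programming tasks that an entry-level programmer can solve | ||
- Each task consists of a Python function and a text description | ||
- Mainly covers loops, conditionals, and similar basics | ||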
<img src="assets/mbpp.png" width=360> | ||
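- Built through crowdsourcing followed by manual editing; the edited subset is provided as the `sanitized` version with 426 problems | ||
- HumanEval writes its prompts as docstrings, whereas MBPP descriptions are written in natural language | ||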
--- | ||
+ **source**: huggingface | ||
+ **hf_path**: google-research-datasets/mbpp | ||
+ **hf_name**: | ||
<details> | ||
<summary>Click</summary> | ||
<div> - <code>sanitized</code></div> | ||
</details> | ||
|
||
+ **url**: [https://huggingface.co/datasets/google-research-datasets/mbpp](https://huggingface.co/datasets/google-research-datasets/mbpp) | ||
+ **paper**: [https://arxiv.org/pdf/2108.07732](https://arxiv.org/pdf/2108.07732) |
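Since each MBPP task pairs a text description with a Python function, a generated solution is usually judged by running assert-style test cases against it. The sketch below shows that check under stated assumptions: the `test_list` and `test_imports` parameter names mirror fields assumed to exist in the dataset, the helper `passes_tests` is hypothetical, and the toy task is invented for illustration. Running `exec` on model output is unsafe outside a sandbox.

```python
from typing import Sequence


def passes_tests(
    candidate_code: str,
    test_list: Sequence[str],
    test_imports: Sequence[str] = (),
) -> bool:
    """Run candidate code, then each assert statement, in one namespace.

    WARNING: exec runs arbitrary code. Use an isolated sandbox for real
    model outputs; this sketch is only meant for trusted toy inputs.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the candidate function
        for import_line in test_imports:  # imports the tests rely on
            exec(import_line, namespace)
        for test in test_list:            # each test is an assert statement
            exec(test, namespace)
    except Exception:
        # Covers failed asserts, syntax errors, and runtime errors alike.
        return False
    return True


# Toy task for illustration only (not taken from MBPP).
toy_tests = [
    "assert max_of_two(1, 2) == 2",
    "assert max_of_two(-3, -7) == -3",
]
good_candidate = "def max_of_two(a, b):\n    return a if a > b else b\n"
bad_candidate = "def max_of_two(a, b):\n    return a\n"

print(passes_tests(good_candidate, toy_tests))
print(passes_tests(bad_candidate, toy_tests))
```

A task counts as solved only if every assert passes, so a single failing test marks the whole candidate as incorrect.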