Fixmath (deepseek-ai#14)
* fix math hungarian exam eval error

* fix yi math score
DeepSeekPH authored Dec 4, 2023
1 parent 4f22977 commit 99dd569
Showing 3 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -126,7 +126,7 @@ In line with Grok-1, we have evaluated the model's mathematical capabilities usi
<img src="images/mathexam.png" alt="result" width="70%">
</div>

-**Remark:** Some results are obtained by DeepSeek LLM authors, while others are done by Grok-1 authors. We found some models count the score of the last question (Llemma 34b and Mammoth) while some (MetaMath-7B) are not in the original evaluation. In our evaluation, we count the last question score. Evaluation details are [here](https://github.com/deepseek-ai/DeepSeek-LLM/tree/HEAD/evaluation/hungarian_national_hs_solutions).
+**Remark:** We have rectified an error from our initial evaluation. In this revised version, we have omitted the lowest scores for questions 16, 17, 18, as well as for the aforementioned image. Evaluation details are [here](https://github.com/deepseek-ai/DeepSeek-LLM/tree/HEAD/evaluation/hungarian_national_hs_solutions).


---
2 changes: 1 addition & 1 deletion evaluation/more_results.md
@@ -22,7 +22,7 @@

| Model | DeepSeek LLM 67B Chat | Qwen-14B-Chat | ChatGLM3-6B | Baichuan2-Chat-13B | Yi-Chat-34B | GPT-3.5-Turbo | Grok-1 | Claude 2 | GPT-4 |
|:-----------------------------------:|:---------------------:|:-------------:|:-----------:|:------------------:|:-----------:|:-------------:|:------:|:--------:|:-----:|
-| Hungarian National High-School Exam | **65** | 38.5 | 37 | 20.5 | 44 | 41 | 59 | 55 | 68 |
+| Hungarian National High-School Exam | **58** | 36.5 | 32 | 19.5 | 39 | 41 | 59 | 55 | 68 |


| Model | Qwen-14B-Chat | ChatGLM3-6B | Baichuan2-Chat-13B | Yi-Chat-34B | PaLM2 Small | DeepSeek LLM 67B Chat | GPT-4 |
Binary file modified images/mathexam.png
