owenliang/qwen2.5-0.5b-grpo

qwen2.5-0.5b-grpo

GRPO Training based on Qwen2.5 0.5B

hardware

  • NVIDIA A10 24GB x 1

dataset

openai gsm8k
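As a rough sketch of how gold answers can be read from this dataset: each GSM8K solution ends with a line of the form `#### <number>`. The helper name `extract_gsm8k_answer` is hypothetical and is not taken from the notebooks.

```python
import re
from typing import Optional


def extract_gsm8k_answer(solution: str) -> Optional[str]:
    """Return the final numeric answer that follows '####', or None if absent.

    Hypothetical helper: assumes the standard GSM8K convention that the
    reference solution ends with '#### <number>'.
    """
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", solution)
    if match is None:
        return None
    # Drop thousands separators so "1,234" compares equal to "1234".
    return match.group(1).replace(",", "")
```

Normalizing the answer to a plain string keeps the later comparison against the model's `<answer>` block a simple equality check.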

training time

45 minutes

code

  • QwenGRPO.ipynb: training code
  • QwenTest.ipynb: test code
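For readers unfamiliar with GRPO, the core idea is that several completions are sampled per prompt and each one is scored relative to the others in its group. The following is a minimal sketch of that normalization step only, not the code in `QwenGRPO.ipynb`; the function name, the `eps` value, and the choice of population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize the rewards of one prompt's sampled completions.

    Each advantage is (reward - group mean) / (group std + eps).
    eps (an assumed value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ here
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Completions that score above their group's mean receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline.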

results

The "aha moment" occurred at step 500.


query

There are 7 birds in a tree, and 1 more bird flies in. How many birds are there in total?

completion

<reasoning>
Initially, there are 7 birds in the tree. Then 1 more bird flies in, so the total number of birds is \(7 + 1 = 8\).
</reasoning>
<answer>
8
</answer>
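
A completion in this `<reasoning>` / `<answer>` layout can be checked mechanically. The sketch below parses the two blocks and scores the answer against a gold label; the function names and the 1.0 / 0.0 reward values are illustrative assumptions, not the reward functions used in the notebooks.

```python
import re
from typing import Optional

# Matches a <reasoning> block followed by an <answer> block, across newlines.
COMPLETION_PATTERN = re.compile(
    r"<reasoning>\s*(.*?)\s*</reasoning>\s*<answer>\s*(.*?)\s*</answer>",
    re.DOTALL,
)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside <answer>, or None if the layout is not followed."""
    match = COMPLETION_PATTERN.search(completion)
    return match.group(2) if match else None


def correctness_reward(completion: str, gold: str) -> float:
    """Hypothetical reward: 1.0 for an exact answer match, otherwise 0.0."""
    return 1.0 if extract_answer(completion) == gold else 0.0
```

For the sample above, `extract_answer` would return the string `8`, so a gold label of `8` earns the full reward under these assumed values.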

ref
