Stars
An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO
"Improving Mathematical Reasoning with Process Supervision" by OpenAI
simon-cogni / verilog-eval
Forked from NVlabs/verilog-eval
Verilog evaluation benchmark for large language models
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning