[pull] main from huggingface:main #11
Conversation
* Pin t
* Pin t
* Set top p
* C
* Tune math prompt
* Improve math prompt
* Update tables
Reviewer's Guide by Sourcery

This pull request updates the evaluation setup to align with DeepSeek's reported results, introduces standardized prompt templates for math and GPQA tasks, and updates dependencies to specific versions.

Sequence diagram for the evaluation process:

```mermaid
sequenceDiagram
    participant User
    participant lighteval
    participant vllm
    participant Model
    User->>lighteval: Runs evaluation with specified MODEL_ARGS (temperature=0.6, top_p=0.95)
    lighteval->>vllm: Sends evaluation request with prompt
    vllm->>Model: Generates response based on prompt
    Model-->>vllm: Returns generated response
    vllm-->>lighteval: Returns response
    lighteval->>lighteval: Evaluates response
    lighteval-->>User: Returns evaluation results
```
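For context, here is a minimal sketch of how the sampling settings from the diagram above map onto vLLM directly. The temperature and top_p values come from the MODEL_ARGS in this PR; the model name, prompt, and max_tokens budget are placeholders for illustration, not taken from this PR.

```python
# Minimal sketch: sampling settings matching the evaluation setup above.
# temperature=0.6 and top_p=0.95 are from the PR's MODEL_ARGS; the model
# name and max_tokens value below are placeholders for illustration.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")  # placeholder model

sampling_params = SamplingParams(
    temperature=0.6,   # sampling temperature recommended by the DeepSeek-R1 paper
    top_p=0.95,        # nucleus-sampling cutoff recommended by the DeepSeek-R1 paper
    max_tokens=32768,  # placeholder generation budget
)

outputs = llm.generate(["Solve: what is 2 + 2?"], sampling_params)
print(outputs[0].outputs[0].text)
```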
Updated class diagram for LightevalTaskConfig:

```mermaid
classDiagram
    class LightevalTaskConfig {
        +name: str
        +suite: list[str]
        +prompt_function: function
        +hf_repo: str
        +hf_subset: str
        +hf_avail_splits: list[str]
        +metric_function: function
    }
    note for LightevalTaskConfig "The prompt_function attribute now uses standardized prompt templates for math and GPQA tasks."
```
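As a rough illustration of the shape this diagram describes, here is a hedged sketch of a task config whose prompt function applies a standardized math template. The field names mirror the diagram above; the template wording, dataset coordinates, and task name are illustrative assumptions rather than the values this PR uses, and the real `LightevalTaskConfig` constructor may require additional fields (e.g. metrics and evaluation splits) depending on the lighteval version.

```python
# Illustrative sketch of a task config shaped like the diagram above.
# The prompt wording, dataset repo/subset, and task name are assumptions
# for illustration; they are not copied from this PR.
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc

# A standardized math prompt template (illustrative wording).
MATH_PROMPT_TEMPLATE = (
    "Solve the following math problem step by step. "
    "Put your final answer within \\boxed{{}}.\n\n{problem}"
)

def math_prompt_fn(line: dict, task_name: str) -> Doc:
    """Build a Doc from a dataset row using the standardized template."""
    return Doc(
        task_name=task_name,
        query=MATH_PROMPT_TEMPLATE.format(problem=line["problem"]),
        choices=[line["solution"]],
        gold_index=0,
    )

math_task = LightevalTaskConfig(
    name="math_500",                   # illustrative task name
    suite=["custom"],
    prompt_function=math_prompt_fn,
    hf_repo="HuggingFaceH4/MATH-500",  # illustrative dataset coordinates
    hf_subset="default",
    hf_avail_splits=["test"],
)
```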
See Commits and Changes for more details.
Created by pull[bot] (v2.0.0-alpha.1).
Summary by Sourcery

This pull request focuses on improving the reproducibility of DeepSeek's evaluation results and updating dependencies. It introduces new prompt templates for the MATH and GPQA tasks, and aligns the evaluation settings with the DeepSeek-R1 paper by sampling with temperature 0.6 and top_p 0.95. The README is updated with detailed instructions and commands to reproduce the results, along with tables comparing the reproduced results against those reported by DeepSeek. The dependencies in `setup.py` are also pinned to specific versions.

Bug Fixes:
* Use `max_new_tokens` instead of `generation_size`.

Enhancements:

Documentation:

Tests:

Chores:
* Pin the dependencies in `setup.py` to specific versions, including `accelerate`, `transformers`, and `trl` (see the sketch below).
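For the Chores item above, a generic sketch of what version pinning in `setup.py` looks like. The version strings and package name below are placeholders, not the versions this PR actually pins.

```python
# Generic sketch of pinned dependencies in setup.py.
# The version strings below are placeholders, NOT the versions this PR pins.
from setuptools import setup, find_packages

setup(
    name="open-r1",  # placeholder package name
    packages=find_packages(),
    install_requires=[
        "accelerate==X.Y.Z",    # placeholder version
        "transformers==X.Y.Z",  # placeholder version
        "trl==X.Y.Z",           # placeholder version
    ],
)
```

Pinning exact versions trades upgrade flexibility for reproducibility, which fits this PR's goal of matching DeepSeek's reported evaluation numbers.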