Insights: stanford-crfm/helm
Overview
57 Pull requests merged by 11 people
- Release MMLU v1.13.0 (#3265, merged Jan 10, 2025)
- Add caching to Nova client (#3264, merged Jan 10, 2025)
- Updating request time (inference time) to report in seconds instead of ms (#3263, merged Jan 9, 2025)
- Release Lite v1.13.0 (#3260, merged Jan 9, 2025)
- Build frontend (#3259, merged Jan 9, 2025)
- Refactor agreement frontend (#3258, merged Jan 9, 2025)
- Add cti_to_mitre scenario (#3249, merged Jan 8, 2025)
- New arg for quasi-exact match (#3257, merged Jan 8, 2025)
- Lint fixes for African MMLU and Winogrande (#3256, merged Jan 8, 2025)
- Adding MMLU and Winogrande human-translated into 11 African languages (#3237, merged Jan 8, 2025)
- Simplify credential management for Bedrock client (#3255, merged Jan 7, 2025)
- Update requirements.txt (#3254, merged Jan 7, 2025)
- Add DeepSeek v3 model (#3253, merged Jan 7, 2025)
- Fix Amazon dependencies versions (#3252, merged Jan 7, 2025)
- Adding functionality to run benchmarks for Amazon nova models (#3251, merged Jan 7, 2025)
- Bump jinja2 from 3.1.4 to 3.1.5 in the pip group across 1 directory (#3245, merged Jan 7, 2025)
- Update requirements.txt (#3248, merged Jan 6, 2025)
- New QWEN 2 VLM + o1 fixes for VHELM (#3247, merged Jan 5, 2025)
- Fix LiveQA import (#3244, merged Dec 23, 2024)
- New safety scenario: HarmBench GCG-T (#3035, merged Dec 21, 2024)
- Clean up Markdown formatting in "Enterprise benchmark" documentation (#3243, merged Dec 20, 2024)
- Make encrypt_scenario_states script idempotent (#3242, merged Dec 20, 2024)
- Build frontend (#3236, merged Dec 20, 2024)
- Add MedAlign scenario (#3038, merged Dec 20, 2024)
- Add landing page for HELM Capabilities leaderboard (#3241, merged Dec 20, 2024)
- Fix incorrect file path in CzechBankQAAnnotator (#3240, merged Dec 20, 2024)
- Bump jinja2 version (#3239, merged Dec 20, 2024)
- Adding TweetSentBR Scenario (#3219, merged Dec 20, 2024)
- Fix incorrect package name in WildBenchAnnotator (#3234, merged Dec 20, 2024)
- Add Encryption for GPQA (#3216, merged Dec 20, 2024)
- Update run entries config for Nova (#3235, merged Dec 20, 2024)
- Add more instructions customization to output_format_instructions run expander (#3233, merged Dec 20, 2024)
- Add Safety to the Reproducing Leaderboards documentation (#3232, merged Dec 20, 2024)
- Fix template import in WildBenchAnnotator (#3231, merged Dec 20, 2024)
- Build frontend (#3155, merged Dec 20, 2024)
- Add links to enterprise benchmarks documentation (#3230, merged Dec 20, 2024)
- Added openai/o1-2024-12-17 and openai/gpt-4o-2024-11-20 (#3228, merged Dec 20, 2024)
- Update CzechBankQA (#3227, merged Dec 20, 2024)
- Add IBM logo to partners (#3229, merged Dec 20, 2024)
- Add run entries for HELM capabilities (#3226, merged Dec 19, 2024)
- Fixes to HELM capabilities (#3225, merged Dec 19, 2024)
- Enterprise benchmark README (#3211, merged Dec 19, 2024)
- Make boolean arguments for run spec functions case-insensitive (#3224, merged Dec 19, 2024)
- Set max_tokens to 512 and stop_sequences to empty list for Unitxt (#3223, merged Dec 19, 2024)
- Add experimental CzechBankQA scenario (#3222, merged Dec 19, 2024)
- Example scripts for AutoClient and ServerService make_request usage (#3221, merged Dec 19, 2024)
- Re-release Lite v1.12.0 (#3220, merged Dec 18, 2024)
- Revert Lite to v1.11.0 (#3218, merged Dec 18, 2024)
- Release Lite and MMLU v1.12.0 (#3217, merged Dec 18, 2024)
- Bump nanoid from 3.3.6 to 3.3.8 in /helm-frontend in the npm_and_yarn group across 1 directory (#3209, merged Dec 18, 2024)
- Revert triton version to 2.2.0 (#3215, merged Dec 18, 2024)
- Update requirements.txt (#3214, merged Dec 17, 2024)
- Limit anthropic to <0.39 (#3213, merged Dec 17, 2024)
- Adding BigCodeBench (#3186, merged Dec 16, 2024)
- Add Omni-MATH (#3187, merged Dec 16, 2024)
- Add gemini-2.0-flash-exp model (#3210, merged Dec 14, 2024)
- Adding WildBench (#3150, merged Dec 13, 2024)
2 Pull requests opened by 2 people
- MedHelm: Add VQA-RAD scenario and specs (#3246, opened Dec 24, 2024)
- Add support for Granite 3.1 model family (IBM) (#3261, opened Jan 9, 2025)
2 Issues closed by 1 person
- Missing score_util in live_qa_annotator (#3238, closed Dec 23, 2024)
- Add new model (upstage/solar-pro-preview-instruct) (#3040, closed Dec 18, 2024)
1 Issue opened by 1 person
- Switch AnthropicTokenizer to use client.beta.messages.count_tokens() (#3212, opened Dec 17, 2024)
4 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- MedHelm: Implement medcalc bench scenario, metrics and specs (#3207, commented on Jan 8, 2025; 7 new comments)
- Add validation for schema YAML files (#2528, commented on Dec 17, 2024; 0 new comments)
- Reproducing WMT14 (#3175, commented on Jan 8, 2025; 0 new comments)
- Issue with running HEIM (#3080, commented on Jan 12, 2025; 0 new comments)