Insights: stanford-crfm/helm
Overview
- 31 Merged pull requests
- 3 Open pull requests
- 2 Closed issues
- 0 New issues
31 Pull requests merged by 8 people
- Fix LiveQA import (#3244, merged Dec 23, 2024)
- New safety scenario: HarmBench GCG-T (#3035, merged Dec 21, 2024)
- Clean up Markdown formatting in "Enterprise benchmark" documentation (#3243, merged Dec 20, 2024)
- Make encrypt_scenario_states script idempotent (#3242, merged Dec 20, 2024)
- Build frontend (#3236, merged Dec 20, 2024)
- Add MedAlign scenario (#3038, merged Dec 20, 2024)
- Add landing page for HELM Capabilities leaderboard (#3241, merged Dec 20, 2024)
- Fix incorrect file path in CzechBankQAAnnotator (#3240, merged Dec 20, 2024)
- Bump jinja2 version (#3239, merged Dec 20, 2024)
- Adding TweetSentBR Scenario (#3219, merged Dec 20, 2024)
- Fix incorrect package name in WildBenchAnnotator (#3234, merged Dec 20, 2024)
- Add Encryption for GPQA (#3216, merged Dec 20, 2024)
- Update run entries config for Nova (#3235, merged Dec 20, 2024)
- Add more instructions customization to output_format_instructions run expander (#3233, merged Dec 20, 2024)
- Add Safety to the Reproducing Leaderboards documentation (#3232, merged Dec 20, 2024)
- Fix template import in WildBenchAnnotator (#3231, merged Dec 20, 2024)
- Build frontend (#3155, merged Dec 20, 2024)
- Add links to enterprise benchmarks documentation (#3230, merged Dec 20, 2024)
- Added openai/o1-2024-12-17 and openai/gpt-4o-2024-11-20 (#3228, merged Dec 20, 2024)
- Update CzechBankQA (#3227, merged Dec 20, 2024)
- Add IBM logo to partners (#3229, merged Dec 20, 2024)
- Add run entries for HELM capabilities (#3226, merged Dec 19, 2024)
- Fixes to HELM capabilities (#3225, merged Dec 19, 2024)
- Enterprise benchmark README (#3211, merged Dec 19, 2024)
- Make boolean arguments for run spec functions case-insensitive (#3224, merged Dec 19, 2024)
- Set max_tokens to 512 and stop_sequences to empty list for Unitxt (#3223, merged Dec 19, 2024)
- Add experimental CzechBankQA scenario (#3222, merged Dec 19, 2024)
- Example scripts for AutoClient and ServerService make_request usage (#3221, merged Dec 19, 2024)
- Re-release Lite v1.12.0 (#3220, merged Dec 18, 2024)
- Revert Lite to v1.11.0 (#3218, merged Dec 18, 2024)
- Release Lite and MMLU v1.12.0 (#3217, merged Dec 18, 2024)
3 Pull requests opened by 3 people
- Adding MMLU and Winogrande human-translated into 11 African languages (#3237, opened Dec 20, 2024)
- Bump jinja2 from 3.1.4 to 3.1.5 in the pip group across 1 directory (#3245, opened Dec 24, 2024)
- MedHelm: Add VQA-RAD scenario and specs (#3246, opened Dec 24, 2024)
2 Issues closed by 1 person
- Missing score_util in live_qa_annotator (#3238, closed Dec 23, 2024)
- Add new model (upstage/solar-pro-preview-instruct) (#3040, closed Dec 18, 2024)
1 Unresolved conversation
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- MedHelm: Implement medcalc bench scenario, metrics and specs (#3207, commented on Dec 24, 2024; 0 new comments)