Stars
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
g588928812 / FastChat_eval
Forked from lm-sys/FastChat
using eval part of FastChat to evaluate the current mess of open-source LLMs