This repository is a fork of the Lighteval project, adjusted for evaluating large language models (LLMs) on the LUMI supercomputer.
- The current recommended setup is easily done with pip.
- The legacy branch is based on Lighteval version 0.3.0 and is no longer maintained.
- Instructions for source code installation will be provided later.
- Instructions for running evals on LUMI with transformers backend will be provided later.
# Load modules so that a correct virtual environment can be created
module use /appl/local/csc/modulefiles/
module purge
module load LUMI
module load pytorch
# Set your project ID
export PROJECTID="your_project_id"
# Create a folder and a virtual environment in it
mkdir -p /projappl/$PROJECTID/$USER/lighteval
cd /projappl/$PROJECTID/$USER/lighteval
python -m venv .venv --system-site-packages
source .venv/bin/activate
# Make sure the virtual environment's site-packages is found before the module's own packages
unset PYTHONPATH
export PYTHONPATH=/projappl/$PROJECTID/$USER/lighteval/.venv/lib/python$(python -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')")/site-packages
# Keep the Hugging Face cache on the scratch filesystem
export HF_HOME=/scratch/$PROJECTID/$USER/hf_cache
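# Optionally create the cache directory up front; the Hugging Face libraries
# would also create it on first use, so this step is just a convenience
mkdir -p "$HF_HOME"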
pip install "lighteval[accelerate,vllm,extended_tasks]"
Lighteval is now installed.
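A quick sanity check that the installation worked; the exact version and subcommand list depend on the Lighteval release you installed:
pip show lighteval   # confirm the installed version
lighteval --help     # should list the available subcommands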
You can set up a multi-GPU configuration with:
accelerate config
To work interactively on a GPU node, start a session with srun, for example:
srun \
--account=$PROJECTID \
--partition=dev-g \
--nodes=1 \
--gres=gpu:mi250:2 \
--time=2:00:00 \
--mem=0 \
--pty \
bash -l -c \
"module use /appl/local/csc/modulefiles/ && \
module purge && \
module load LUMI && \
module load pytorch && \
module load wget && \
cd /projappl/$PROJECTID/$USER/lighteval && \
export PS1='\u@\h:\w> ' && \
source ./.venv/bin/activate && \
exec bash"
Check the setup with:
accelerate launch --multi_gpu --num_processes=2 -m \
lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
Fineweb evals use a custom task. The original custom task and instructions are available here.
For the new version of Lighteval, the custom task had to be modified. Follow these instructions to get Fineweb evaluations working with the fast vLLM backend:
mkdir -p evals/tasks
# Download the custom task
wget -P evals/tasks/ https://raw.githubusercontent.com/JousiaPiha/Lighteval-on-LUMI/refs/heads/main/evals/tasks/lighteval_tasks.py
# Download the task list
wget -P evals/tasks/ https://raw.githubusercontent.com/JousiaPiha/Lighteval-on-LUMI/refs/heads/main/evals/tasks/fineweb.txt
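Before running, it is worth checking that both files are in place and that the task list looks sane; the task names come from the downloaded file, not from this guide:
ls evals/tasks/
head evals/tasks/fineweb.txt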
Then you should be able to run the evaluations with:
lighteval accelerate \
--model_args="vllm,pretrained=HuggingFaceFW/ablation-model-fineweb-edu,dtype=bfloat16" \
--custom_tasks "./evals/tasks/lighteval_tasks.py" --max_samples 1000 \
--tasks "./evals/tasks/fineweb.txt" \
--output_dir "./evals/"
Currently, parallelism is not working well with vLLM and ROCm. We are investigating it.
Example: the following assumes 4 GPUs are available. With --num_processes=2 and model_parallel=True, both model parallelism and data parallelism are used: two data-parallel processes each split the model across two GPUs. If the model fits on one GPU, --num_processes could be 4 and model_parallel=False. Note that enough RAM must be requested.
accelerate launch --multi_gpu --num_processes=2 -m \
lighteval accelerate \
--model_args="pretrained=pretrained=HuggingFaceFW/ablation-model-fineweb-edu,dtype=bfloat16,trust_remote_code=True,model_parallel=True" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--output_dir "./evals/" \
--override_batch_size 1
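For longer evaluations, an interactive session is impractical and the same commands can be wrapped in a batch script. The sketch below reuses the modules and virtual environment from the installation section; the account, partition, time limit, and GPU count are placeholders you should adapt to your project:
#!/bin/bash
#SBATCH --account=your_project_id
#SBATCH --partition=small-g
#SBATCH --nodes=1
#SBATCH --gres=gpu:mi250:4
#SBATCH --time=6:00:00
#SBATCH --mem=0

module use /appl/local/csc/modulefiles/
module purge
module load LUMI
module load pytorch

export PROJECTID="your_project_id"
export HF_HOME=/scratch/$PROJECTID/$USER/hf_cache
cd /projappl/$PROJECTID/$USER/lighteval
source ./.venv/bin/activate

accelerate launch --multi_gpu --num_processes=2 -m \
lighteval accelerate \
--model_args="pretrained=HuggingFaceFW/ablation-model-fineweb-edu,dtype=bfloat16,trust_remote_code=True,model_parallel=True" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--output_dir "./evals/" \
--override_batch_size 1
Submit it with sbatch, for example sbatch run_eval.sh (the file name is arbitrary).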
Your go-to toolkit for lightning-fast, flexible LLM evaluation, from Hugging Face's Leaderboard and Evals Team.
Documentation: Lighteval's Wiki
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends—whether it's transformers, tgi, vllm, or nanotron—with ease. Dive deep into your model’s performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.
Customization at your fingertips: browse all our existing tasks and metrics, or effortlessly create your own tailored to your needs.
Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.
- Speed: Use vLLM as the backend for fast evals.
- Completeness: Use the accelerate backend to launch any models hosted on Hugging Face.
- Seamless Storage: Save results in S3 or Hugging Face Datasets.
- Python API: Simple integration with the Python API.
- Custom Tasks: Easily add custom tasks.
- Versatility: Tons of metrics and tasks ready to go.
pip install lighteval[accelerate]
Lighteval allows for many extras when installing, see here for a complete list.
If you want to push results to the Hugging Face Hub, log in with your access token:
huggingface-cli login
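Alternatively, the token can be provided through the HF_TOKEN environment variable, which the Hugging Face libraries read at runtime (the value below is a placeholder):
export HF_TOKEN="hf_your_token_here"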
Lighteval offers two main entry points for model evaluation:
- lighteval accelerate: evaluate models on CPU or one or more GPUs using 🤗 Accelerate.
- lighteval nanotron: evaluate models in distributed settings using ⚡️ Nanotron.
Here’s a quick command to evaluate using the Accelerate backend:
lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
Lighteval started as an extension of the fantastic Eleuther AI Harness (which powers the Open LLM Leaderboard) and draws inspiration from the amazing HELM framework.
While evolving Lighteval into its own standalone tool, we are grateful to the Harness and HELM teams for their pioneering work on LLM evaluations.
Got ideas? Found a bug? Want to add a task or metric? Contributions are warmly welcomed!
If you're adding a new feature, please open an issue first.
If you open a PR, don't forget to run the styling!
pip install -e .[dev]
pre-commit install
pre-commit run --all-files
@misc{lighteval,
author = {Fourrier, Clémentine and Habib, Nathan and Wolf, Thomas and Tunstall, Lewis},
title = {LightEval: A lightweight framework for LLM evaluation},
year = {2023},
version = {0.5.0},
url = {https://github.com/huggingface/lighteval}
}