OpenInterpreter/benchmarks-v0


This repo is used to run various AI benchmarks on Open Interpreter.

There is currently support for GAIA and SWE-bench.


Setup

  1. Make sure Docker is installed and running on your computer.

  2. Copy-paste the following into your terminal:

```shell
git clone https://github.com/OpenInterpreter/benchmarks.git \
  && cd benchmarks \
  && python -m venv .venv \
  && source .venv/bin/activate \
  && python -m pip install -r requirements.txt \
  && docker build -t worker . \
  && python setup.py
```
  3. Enter your Huggingface token when prompted.
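
As a quick sanity check before moving on, a short Python sketch can confirm that the virtualenv is active and the Docker CLI is on your PATH. This is purely illustrative and not part of the repo; `setup_ok` is a made-up name:

```python
import shutil
import sys

def setup_ok() -> bool:
    """Rough post-setup check: virtualenv active and docker CLI findable."""
    # In a venv, sys.prefix points into .venv while base_prefix points
    # at the interpreter the venv was created from.
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    docker_found = shutil.which("docker") is not None
    return in_venv and docker_found

print(setup_ok())
```

If this prints False, re-run the corresponding setup step before continuing.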

Running Benchmarks

This section assumes:

  • benchmarks (cloned via git in the previous section) is set as the current working directory.
  • You've activated the virtualenv with the installed prerequisite packages.
  • If using an OpenAI model, your OPENAI_API_KEY environment variable is set with a valid OpenAI API key.
  • If using a Groq model, your GROQ_API_KEY environment variable is set with a valid Groq API key.
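
A small pre-flight helper can verify that the relevant key is set before launching a run. This is an illustrative sketch, not something `run_benchmarks.py` provides; `missing_keys` is a hypothetical name:

```python
import os

# Map each provider to the environment variables a run needs.
REQUIRED = {
    "openai": ["OPENAI_API_KEY"],
    "groq": ["GROQ_API_KEY"],
}

def missing_keys(provider: str) -> list[str]:
    """Return the env vars for `provider` that are unset or empty."""
    return [k for k in REQUIRED.get(provider, []) if not os.environ.get(k)]
```

If `missing_keys("openai")` returns a non-empty list, export the listed variables in your shell before starting the benchmark.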

Note: to run GAIA, you must first accept the conditions for accessing its files and content on Huggingface.

Example: gpt-3.5-turbo, first 16 GAIA tasks, 8 docker containers

This command will output a file called output.csv containing the results of the benchmark.

```shell
python run_benchmarks.py \
  --command gpt35turbo \
  --ntasks 16 \
  --nworkers 8
```
  • --command gpt35turbo: Replace gpt35turbo with any existing key in the commands Dict in commands.py. Defaults to gpt35turbo.
  • --ntasks 16: Grabs the first 16 GAIA tasks to run. Defaults to all 165 GAIA validation tasks.
  • --nworkers 8: Number of Docker containers to run concurrently. Defaults to ThreadPoolExecutor's built-in max_workers default.
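
The ThreadPoolExecutor fallback mentioned above is deterministic: since Python 3.8, its default max_workers is min(32, os.cpu_count() + 4), so on an 8-core machine omitting --nworkers would run 12 containers. A quick way to see the value on your machine:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Since Python 3.8, ThreadPoolExecutor defaults max_workers to
# min(32, os.cpu_count() + 4); this is what --nworkers falls back to
# when omitted.
default_workers = min(32, (os.cpu_count() or 1) + 4)
print(default_workers)
```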

Troubleshooting

  • `ModuleNotFoundError: No module named '_lzma'` when running the example.
  • `ModuleNotFoundError: No module named 'pkg_resources'` when running the example.
    • Refer to this stackoverflow post for now. A quick workaround is `python -m pip install setuptools` inside the virtualenv, since `pkg_resources` is provided by setuptools.
    • OpenInterpreter should probably include setuptools in its list of dependencies, or switch to another module that's in Python's standard library.