Auto-GPT Benchmark

A repo for benchmarking the performance of any agent, regardless of how it is set up or how it works.

Scores:

Radar chart for each agent coming soon!

⚠️ These results are constantly evolving at the moment. We will publish an official benchmark result very soon.

Interface

Task	Auto-GPT	gpt-engineer	mini-agi	smol-developer
Write File	❌	✅	tbd	✅
Read File	❌	❌	tbd	❌
Search File	❌	❌	tbd	❌

Code

Task	Auto-GPT	gpt-engineer	mini-agi	smol-developer
Debug Simple Typo With Guidance	❌	❌	tbd	❌
Debug Simple Typo Without Guidance	❌	❌	tbd	❌
Basic Code Generation	❌	✅	tbd	✅
Create Simple Web Server	❌	❌	tbd	❌

Memory

Task	Auto-GPT
Basic Memory	❌
Remember Multiple Ids	❌
Remember Multiple Ids With Noise	❌
Remember Multiple Phrases With Noise	❌
