feat: Add benchmarks API-Bank, APIBench, Nexus #1136

HHHHHejia · 2024-10-30T10:00:55Z

I've added the APIbank APIbench and Nexus benchmark, main method see benchmark test and utils folder (benchmark_base.py)

There're some problem to be solved for the APIBank, APIbench(gorilla) and Nexus benchmark. listed as below.

For Nexus:
run python nexus_test.py. You'll get error
1.OpenAI limits the size of the function passed into the function call api (function name, function description length, number of functions, etc.). You need to add judgment logic in Camel. If OpenAI does not allow function call, use structure output instead.

2.Critical: while true bug in camel.chatagent.step. When the incoming api is not executed correctly, while true will not terminate.The while true logic should be eliminated. You cannot assume that the function passed by the user will always be executed correctly.

For APIbench
There're three datasets 'torchhub', 'tensorhub', 'huggingface’ . "torchhub"works well. BUT
3.'tensorhub', 'huggingface’ could not be correctly evaluted by the ast matching program. This is a problem within the original repo. I have already proposed an issue. [(https://github.com/ShishirPatil/gorilla/issues/729)]

It could be version problem of tree_sitter, but if you don't use tree_sitter==0.20.4, you'll get an another bug.

For APIbank
There're three datasets 'level1', 'level2', 'level3’ . BUT
4.NO ONE knows how to eveluate 'level3'. See the issue in original repo:
[https://github.com/AlibabaResearch/DAMO-ConvAI/issues/167]
[https://github.com/AlibabaResearch/DAMO-ConvAI/issues/102]
[https://github.com/AlibabaResearch/DAMO-ConvAI/issues/114]

5.APIbank involves multiple "User-Assistant-System" messages as History Records. Camel ChatAgent does not support adding multiple rounds of system messages yet. Temporary solution: Use record_message and make_assistant_message instead of system messages.

6.The version conflict between openai in camel, Https, and Google translate in original repo, see
[https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/api-bank#demo]. Camel, Https and Google translate lib doesn't work together.
For now two way works:
-use original repo without camel, Google translate and Https works well.
-use camel, remove Google translate, it works but without Google translate tool.
See:
[https://github.com/microsoft/TaskWeaver/issues/172]

7.Some datasets need to be hosted on GitHub/HuggingFace. The original author did not do this, but we do not want to include these data in Camel's GitHub.

…A branch

harryeqs · 2024-12-09T02:36:59Z

@Wendong-Fan Hi Wendong, the three functional calling benchmarks have been integrated following the pattern of the GAIA benchmark integration. Sorry that the retriever has not yet been integrated into the APIBench benchmark due to time constraints and that can be done by Wednesday at the earliest, but the other parts are ready for review.

I also have a few questions regarding the integration:

The code of all three benchmarks is under Apache 2.0 license, are there any patterns to follow when referencing the code I copied/adapted?
What should be included in the unit test for these three benchmarks? I feel like the GAIA benchmark test is hard to reference for these benchmarks.
The level 3 dataset of APIBank still can't run since there haven't been any updates from the authors on how to use the evaluator.
How do you run the pre-commit test when committing to this fork?

Thanks!

Wendong-Fan

Thanks @HHHHHejia @harryeqs ! Left some comments for APIBank

camel/benchmarks/apibank.py

camel/benchmarks/apibench.py

Wendong-Fan · 2024-12-14T16:39:40Z

camel/benchmarks/apibench.py

+            ast_database.append(ast_tree)
+        self._data['ast'] = ast_database
+
+    def run(  # type: ignore[override]


I think the current run method in BaseBenchmark should be refactored. cc @liuxukun2000

Hi Wendong should this be done in this PR, or shall we set up a new issue and PR for the refactoring?

yeah we can do it in another pr, issue created here:#1338

camel/benchmarks/apibench.py

camel/benchmarks/apibank.py

Wendong-Fan · 2024-12-14T17:56:24Z

some errors with running pre-commit run --all-files need to be fixed
camel/benchmarks/utils/ast_eval.py:28: error: Cannot find implementation or library stub for module named "tree_sitter_python"  [import-not-found]
camel/benchmarks/utils/ast_eval.py:28: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
camel/benchmarks/utils/ast_eval.py:29: error: Library stubs not installed for "tree_sitter"  [import-untyped]
Hi Wendong sorry I tried but could not reproduce this error. Could you please check if the tree-sitter-python package has been successfully installed in your venv after running poetry install --with dev,docs -E all. This package was added to the poetry lock and its stub check should have been disabled so there shouldn't be any errors?

Sorry @harryeqs , my bad, I didn't switch my environment properly

harryeqs · 2024-12-15T17:33:22Z

Thanks @HHHHHejia @harryeqs ! Left some comments for APIBank

Thanks @Wendong-Fan for the comprehensive review! I have made some changes and please have a look when possible.
Thanks!

Wendong-Fan

thanks @harryeqs and sorry for the late review, left some comments below

camel/benchmarks/apibench.py

camel/benchmarks/utils/ast_eval.py

camel/benchmarks/apibench.py

camel/benchmarks/nexus.py

test/benchmarks/test_apibank_benchmark.py

test/benchmarks/test_apibench_benchmark.py

test/benchmarks/test_nexus_benchmark.py

Wendong-Fan · 2024-12-18T20:28:16Z

pyproject.toml

+tree-sitter = "*"
+tree-sitter-python = "*"
+googletrans-py = "4.0.0"


where did these dependencies used?

Hi Wendong, the googletrans-py is used in the APIs defined for the API-Bank, while the tree-sitter and tree-sitter-python are used for evaluation of APIBench.

Wendong-Fan · 2024-12-23T05:39:59Z

hey @harryeqs , I noticed some review comments were marked as resolved, but the updates don’t appear to have been pushed. Did you forget to push the code? Please ensure all comments are fully addressed before marking them as resolved.

harryeqs · 2024-12-23T05:48:35Z

hey @harryeqs , I noticed some review comments were marked as resolved, but the updates don’t appear to have been pushed. Did you forget to push the code? Please ensure all comments are fully addressed before marking them as resolved.

Hi Wendong I am very sorry that the update was delayed due to the hackathon. I've addressed some comments locally but have not pushed since I am still adding the tests. I will finish the tests asap this afternoon and push the code. Sorry for the wait!

harryeqs · 2024-12-23T14:39:53Z

thanks @harryeqs and sorry for the late review, left some comments below

Thank you @Wendong-Fan very much for the review! Sorry for the delay in updating the benchmarks. The tests have been added but they are based on a number of mocks as downloading the actual datasets is time-consuming and requires extra storage.

camel/benchmarks/apibench.py

Wendong-Fan · 2024-12-30T13:19:17Z

from Guohao: in the example we should add tools to the ChatAgent

HHHHHejia and others added 5 commits October 30, 2024 02:56

add benchmark gorilla, nexus

713a2d2

add apibank

9398ffb

Merge branch 'master' into benchmark_hejia

f8ba0d7

Merge branch 'master' into benchmark_hejia

5352cea

Merge branch 'master' into benchmark_hejia

f6d3436

harryeqs self-assigned this Nov 25, 2024

HHHHHejia changed the title ~~add benchmark gorilla, nexus~~ add benchmark apibank, gorilla, nexus Nov 26, 2024

Convert gorilla and nexusraven to regular directories

638dc57

Wendong-Fan marked this pull request as draft November 30, 2024 20:34

harryeqs and others added 12 commits December 2, 2024 21:21

Merge branch 'master' into benchmark_hejia

679225e

refactor: Constructed NexusBenchmark following BaseBenchmark from GAI…

1c113d1

…A branch

Merge branch 'master' into benchmark_hejia

ca1e490

Merge branch 'master' into benchmark_hejia

b5a7d0b

refactor: Refactored the integration of APIBench (Gorilla) benchmark

9083048

Merge branch 'master' into benchmark_hejia

1b151e0

refactor: Modified code structure

14ee80a

refactor: Integrated APIBank

42b7407

refactor: Change directory name for smoother merge

f908681

Merge branch 'master' into benchmark_hejia

9a77c78

docs: Update docs and put into benchmarks directory

0d71313

docs: Included examples

7745c37

harryeqs marked this pull request as ready for review December 9, 2024 02:20

update poetry lock

8ad922a

harryeqs requested a review from Wendong-Fan December 9, 2024 02:41

fix: Fix tree_sitter_import issue

acd676c

Wendong-Fan requested review from willshang76 and liuxukun2000 December 9, 2024 15:16

harryeqs changed the title ~~add benchmark apibank, gorilla, nexus~~ feat: Add benchmarks API-Bank, APIBench, Nexus Dec 9, 2024

Merge branch 'master' into benchmark_hejia

1511711

Wendong-Fan reviewed Dec 14, 2024

View reviewed changes

harryeqs and others added 2 commits December 16, 2024 01:21

clean code and update docstrings

f534efe

Merge branch 'master' into benchmark_hejia

1eab6e3

harryeqs and others added 4 commits December 16, 2024 16:22

Merge branch 'master' into benchmark_hejia

34e33cf

Merge branch 'master' into benchmark_hejia

1d280a9

Merge branch 'master' into benchmark_hejia

41fe3ed

update pyproject.toml and poetry.lock

9a0dba3

Wendong-Fan added the New Feature label Dec 18, 2024

Wendong-Fan added this to the Sprint 19 milestone Dec 18, 2024

Wendong-Fan linked an issue Dec 18, 2024 that may be closed by this pull request

[Feature Request] Integrate RAGBench to evaluate RAG performance #1203

Closed

2 tasks

Wendong-Fan removed a link to an issue Dec 18, 2024

[Feature Request] Integrate RAGBench to evaluate RAG performance #1203

Closed

2 tasks

Wendong-Fan reviewed Dec 18, 2024

View reviewed changes

Merge branch 'master' into benchmark_hejia

a367501

harryeqs and others added 4 commits December 23, 2024 22:10

improve structure and add unit tests

a5127f9

Merge branch 'master' into benchmark_hejia

511fbe0

resolve conflicts

f3da00c

update poetry.lock

8907b94

Merge branch 'master' into benchmark_hejia

c38f619

Wendong-Fan approved these changes Dec 29, 2024

View reviewed changes

camel/benchmarks/apibench.py Outdated Show resolved Hide resolved

Wendong-Fan enabled auto-merge (squash) December 29, 2024 08:56

Wendong-Fan disabled auto-merge December 29, 2024 08:56

Merge branch 'master' into benchmark_hejia

d2b7e9d

Wendong-Fan merged commit 926596e into camel-ai:master Dec 29, 2024
0 of 6 checks passed

Wendong-Fan mentioned this pull request Dec 29, 2024

[Feature Request] Add function calling benchmark and tools #1013

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Add benchmarks API-Bank, APIBench, Nexus #1136

feat: Add benchmarks API-Bank, APIBench, Nexus #1136

HHHHHejia commented Oct 30, 2024 •

edited

Loading

harryeqs commented Dec 9, 2024

Wendong-Fan left a comment

Wendong-Fan Dec 14, 2024

harryeqs Dec 15, 2024

Wendong-Fan Dec 18, 2024

Wendong-Fan commented Dec 14, 2024

harryeqs commented Dec 15, 2024

Wendong-Fan left a comment

Wendong-Fan Dec 18, 2024

harryeqs Dec 23, 2024

Wendong-Fan commented Dec 23, 2024

harryeqs commented Dec 23, 2024

harryeqs commented Dec 23, 2024

Wendong-Fan commented Dec 30, 2024

feat: Add benchmarks API-Bank, APIBench, Nexus #1136

feat: Add benchmarks API-Bank, APIBench, Nexus #1136

Conversation

HHHHHejia commented Oct 30, 2024 • edited Loading

harryeqs commented Dec 9, 2024

Wendong-Fan left a comment

Choose a reason for hiding this comment

Wendong-Fan Dec 14, 2024

Choose a reason for hiding this comment

harryeqs Dec 15, 2024

Choose a reason for hiding this comment

Wendong-Fan Dec 18, 2024

Choose a reason for hiding this comment

Wendong-Fan commented Dec 14, 2024

harryeqs commented Dec 15, 2024

Wendong-Fan left a comment

Choose a reason for hiding this comment

Wendong-Fan Dec 18, 2024

Choose a reason for hiding this comment

harryeqs Dec 23, 2024

Choose a reason for hiding this comment

Wendong-Fan commented Dec 23, 2024

harryeqs commented Dec 23, 2024

harryeqs commented Dec 23, 2024

Wendong-Fan commented Dec 30, 2024

HHHHHejia commented Oct 30, 2024 •

edited

Loading