chore(tests): accuracy tests for MongoDB tools exposed by MCP server #341

himanshusinghs · 2025-07-07T14:02:55Z

Note:

Still a WIP. I will finalize the accuracy snapshot logic and then mark this PR ready for review, with a description and the motivation behind the changes.

Until then, feel free to take a look at the tests and the testing harness.

Proposed changes

Checklist

I have signed the MongoDB CLA

coveralls · 2025-07-07T14:12:56Z

Pull Request Test Coverage Report for Build 16141478615

Details

28 of 28 (100.0%) changed or added relevant lines in 7 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.2%) to 75.833%

Totals
Change from base Build 16135914646:	0.2%
Covered Lines:	853
Relevant Lines:	1041

💛 - Coveralls

LangChain's ToolCalling agent did not return a structured tool call response, and different model providers produced entirely different tool calls for the same tool definition. That was too unstable for us to establish any accuracy baseline. Vercel's AI SDK improves on this: its tool call responses have so far always been well structured. This commit replaces the LangChain-based implementation with one based on Vercel's AI SDK.

While writing test cases, I realized that writing and maintaining mocks duplicates too much effort. So instead of having only a mocked MCP client, this commit introduces a real MCP client that talks to our MCP server and is still mockable. The real MCP client is now set up with test data in a MongoDB database spun up for the test suites. Mocking is still an option, but we will likely never need it.
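To illustrate the "real but still mockable" idea, here is a minimal sketch. It is not the harness code from this PR: `MockableToolClient`, `mockTool`, `resetMocks`, and the `ToolResult` shape are hypothetical names chosen for the example, and the real transport is passed in as a plain function so the sketch stays self-contained.

```typescript
// Hypothetical types for illustration only; not the PR's actual client API.
type ToolArgs = Record<string, unknown>;
type ToolResult = { content: { type: "text"; text: string }[] };
type ToolHandler = (args: ToolArgs) => Promise<ToolResult>;
type RealToolCaller = (name: string, args: ToolArgs) => Promise<ToolResult>;

class MockableToolClient {
  // Per-tool overrides; an empty map means every call reaches the real server.
  private readonly mocks = new Map<string, ToolHandler>();
  private readonly callRealTool: RealToolCaller;

  constructor(callRealTool: RealToolCaller) {
    this.callRealTool = callRealTool;
  }

  // Register a mock for a single tool; other tools stay real.
  mockTool(name: string, handler: ToolHandler): void {
    this.mocks.set(name, handler);
  }

  // Drop all mocks, e.g. between test cases.
  resetMocks(): void {
    this.mocks.clear();
  }

  // Use the mock when one is registered, otherwise delegate to the real call.
  async callTool(name: string, args: ToolArgs): Promise<ToolResult> {
    const mock = this.mocks.get(name);
    return mock ? mock(args) : this.callRealTool(name, args);
  }
}
```

With this shape, a test suite can run against seeded test data by default and only override an individual tool when a specific response is hard to reproduce.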

Introduces the following necessary env variables:

- MDB_ACCURACY_RUN_ID: The accuracy run id
- MDB_ACCURACY_MDB_URL: The connection string to mongodb instance where the snapshots will be stored
- MDB_ACCURACY_MDB_DB: The database for snapshots
- MDB_ACCURACY_MDB_COLLECTION: The collection for snapshots
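As a rough sketch of how these variables could be read and validated up front: the variable names come from the commit message above, but `AccuracySnapshotConfig`, its field names, and `loadAccuracySnapshotConfig` are assumptions made for this example, not the PR's code.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface AccuracySnapshotConfig {
  runId: string;
  mongoUrl: string;
  database: string;
  collection: string;
}

// Env variable names as listed in the commit message.
const REQUIRED_ENV_VARS = [
  "MDB_ACCURACY_RUN_ID",
  "MDB_ACCURACY_MDB_URL",
  "MDB_ACCURACY_MDB_DB",
  "MDB_ACCURACY_MDB_COLLECTION",
] as const;

// Reads the config from an env-like record and fails fast,
// naming every missing (or empty) variable in one error.
function loadAccuracySnapshotConfig(
  env: Record<string, string | undefined>
): AccuracySnapshotConfig {
  const missing = REQUIRED_ENV_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required env variables: ${missing.join(", ")}`);
  }
  return {
    runId: env["MDB_ACCURACY_RUN_ID"] as string,
    mongoUrl: env["MDB_ACCURACY_MDB_URL"] as string,
    database: env["MDB_ACCURACY_MDB_DB"] as string,
    collection: env["MDB_ACCURACY_MDB_COLLECTION"] as string,
  };
}
```

In a Node test setup this would typically be called once with `process.env`, so a misconfigured accuracy run fails before any model is prompted.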


The new field `accuracyRunStatus` guards against cases where jest fails partway through a run, for example due to LLM rate limit errors, and leaves a partially saved accuracy run behind. With this field we can safely look for the last runs where `accuracyRunStatus` is done and be sure they hold the complete state of the accuracy snapshot.
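A small sketch of that lookup, done in memory so it stays self-contained. Only the `accuracyRunStatus` field and its done state come from the commit message; the `"in-progress"` value, the `accuracyRunId` and `createdOn` fields, and the function name are assumptions for illustration.

```typescript
// "done" is from the commit message; "in-progress" is an assumed counterpart.
type AccuracyRunStatus = "in-progress" | "done";

// Hypothetical snapshot entry shape for this example.
interface AccuracySnapshotEntry {
  accuracyRunId: string;
  accuracyRunStatus: AccuracyRunStatus;
  createdOn: number; // epoch milliseconds
}

// Returns the id of the most recent run marked done, skipping runs
// that were interrupted and therefore hold only partial state.
function latestCompleteRunId(
  entries: AccuracySnapshotEntry[]
): string | undefined {
  let latest: AccuracySnapshotEntry | undefined;
  for (const entry of entries) {
    if (entry.accuracyRunStatus !== "done") continue;
    if (latest === undefined || entry.createdOn > latest.createdOn) {
      latest = entry;
    }
  }
  return latest?.accuracyRunId;
}
```

Against a real collection the same idea would be a filter on `accuracyRunStatus` plus a descending sort on the timestamp field; the in-memory version only shows the selection rule.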

himanshusinghs force-pushed the chore/issue-307-proposal-2 branch 2 times, most recently from 0fd7c20 to aca0abe Compare July 8, 2025 09:11

himanshusinghs added 27 commits July 8, 2025 23:38

chore: LangChain based accuracy tests

6727e94

chore: integrate capturing accuracy snapshots

656e630

chore: correct env names

6f9e956

chore: more consolidated prompt tests

6349234

chore: add a few more tests and some more models

cae734a

chore: add AzureOpenAI model in the model list

93e967e

chore: use ListDatabasesTool response creator for tests

0cd594c

chore: use ListCollectionsTool response creators in tests

29b3be9

chore: tests for collection-indexes tool

3d2d54d

modify prompt for list-collections prompt and log tools provided

99d8ad2

chore: have mock generators return Promise of ToolResult as well

6b5fcbb

chore: tests for collection-schema tool

332cb32

chore: do not fail tests on dropped accuracy

bb6c05b

chore: added tests for find tool

324fed3

chore: tests for insert-many tool

0c89851

chore: tests for delete-many tool

7de7a02

chore: add openai provider

ef7fd59

chore: fixes accuracy scorer for position independent matching

b9d6dd2

chore: moved all existing tests to vercel mcp client

e8bc19d

chore: adds tests for the rest of the tools

2a22f51

chore: adds missed out tests for tools

0e89932

chore: remove file based snapshot

38fdc84

wip: snapshot summary generator

1562a90

chore: single entry point for running accuracy tests with different c…

1ddff0d

…onfig

himanshusinghs added 6 commits July 8, 2025 23:39

chore: reformat

7e607b7

chore: lint fixes

312b2a5

chore: simplified toolCallingAccuracy calculation

7cd61aa

chore: account for types moved around

7f931dc

chore: add disk based accuracy storage for local runs

da15e6d

himanshusinghs force-pushed the chore/issue-307-proposal-2 branch from 7d494ad to da15e6d Compare July 8, 2025 21:39

chore: revert changes done to any of the src files

58bc8a5
