chore(tests): accuracy tests for MongoDB tools exposed by MCP server #341

Draft: wants to merge 34 commits into base: main

himanshusinghs
Collaborator

Note:

Still a WIP. I will finalize the accuracy snapshot logic and then mark this PR ready for review, with a description and the motivation behind the changes.

Until then feel free to take a look at the tests and the testing harness.

Proposed changes

Checklist

@coveralls
Collaborator

coveralls commented Jul 7, 2025

Pull Request Test Coverage Report for Build 16141478615

Details

  • 28 of 28 (100.0%) changed or added relevant lines in 7 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 75.833%

Totals Coverage Status
Change from base Build 16135914646: 0.2%
Covered Lines: 853
Relevant Lines: 1041

💛 - Coveralls

@himanshusinghs himanshusinghs force-pushed the chore/issue-307-proposal-2 branch 2 times, most recently from 0fd7c20 to aca0abe Compare July 8, 2025 09:11
LangChain's ToolCalling agent did not return a structured tool call
response, and different model providers returned entirely different
tool calls for the same tool definition. That was too inconsistent for
us to establish any accuracy baseline.

Vercel's AI SDK addresses that problem: the tool call responses so far
have always been well structured.

This commit replaces the LangChain-based implementation with one based
on Vercel's AI SDK.
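
To illustrate why well-structured tool calls matter for a baseline, here is a minimal sketch of comparing expected and actual tool calls. The `ToolCall` shape and the `scoreToolCalls` helper are assumptions made for illustration; they are not taken from this PR or from Vercel's AI SDK.

```typescript
// Hypothetical shape of one structured tool call (an assumption, not the SDK's type).
interface ToolCall {
  toolName: string;
  parameters: Record<string, unknown>;
}

// Two calls match when the tool name and the serialized parameters are equal.
// JSON.stringify is key-order sensitive, so this is a deliberately strict check.
function sameToolCall(a: ToolCall, b: ToolCall): boolean {
  return (
    a.toolName === b.toolName &&
    JSON.stringify(a.parameters) === JSON.stringify(b.parameters)
  );
}

// Hypothetical scoring rule: the fraction of expected calls found in the actual calls.
function scoreToolCalls(expected: ToolCall[], actual: ToolCall[]): number {
  if (expected.length === 0) {
    return actual.length === 0 ? 1 : 0;
  }
  const matched = expected.filter((e) => actual.some((a) => sameToolCall(e, a)));
  return matched.length / expected.length;
}
```

A comparison like this only works when every provider returns tool calls in one predictable structure, which is the problem described above.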
When writing test cases, I realized that writing and maintaining mocks is too much duplicated effort. So instead of having only a mocked MCP client, this commit introduces a real MCP client that talks to our MCP server and is still mockable.

We now set up a real MCP client with test data in a MongoDB database spun up for the test suites. Mocking is still an option, but we will likely never need it.
Introduces the following required environment variables:
- MDB_ACCURACY_RUN_ID: The accuracy run id
- MDB_ACCURACY_MDB_URL: The connection string to the MongoDB instance where the snapshots will be stored
- MDB_ACCURACY_MDB_DB: The database for snapshots
- MDB_ACCURACY_MDB_COLLECTION: The collection for snapshots
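
As a rough sketch of how a test harness might read these variables, the helper below validates all four up front and reports every missing one. The `AccuracySnapshotConfig` shape and the `readAccuracyConfig` function are hypothetical names used for illustration; only the four variable names come from the list above.

```typescript
// The four variable names listed above.
const REQUIRED_VARS = [
  "MDB_ACCURACY_RUN_ID",
  "MDB_ACCURACY_MDB_URL",
  "MDB_ACCURACY_MDB_DB",
  "MDB_ACCURACY_MDB_COLLECTION",
] as const;

// Hypothetical config shape; the field names are illustrative only.
interface AccuracySnapshotConfig {
  runId: string;
  mdbUrl: string;
  mdbDb: string;
  mdbCollection: string;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: fails fast and names every missing or empty variable.
function readAccuracyConfig(env: Env): AccuracySnapshotConfig {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    runId: env["MDB_ACCURACY_RUN_ID"] as string,
    mdbUrl: env["MDB_ACCURACY_MDB_URL"] as string,
    mdbDb: env["MDB_ACCURACY_MDB_DB"] as string,
    mdbCollection: env["MDB_ACCURACY_MDB_COLLECTION"] as string,
  };
}
```

In a Node.js test setup this could be called as `readAccuracyConfig(process.env)`; taking the environment as a parameter keeps the helper easy to test.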
The new field `accuracyRunStatus` guards against cases where Jest
fails partway through a run, for example due to LLM rate limit errors,
and leaves a partially saved accuracy run behind. With this field we
can safely look up the most recent runs whose `accuracyRunStatus` is
done and get the complete state of an accuracy snapshot.
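
The lookup described above could be sketched as follows, using an in-memory list instead of a MongoDB query. Only the `accuracyRunStatus` field and its done value come from the commit message; the `"in-progress"` status, the other fields, and the `latestCompleteRun` helper are assumptions for illustration.

```typescript
// "done" is from the commit message; "in-progress" is a hypothetical second state.
type AccuracyRunStatus = "done" | "in-progress";

// Hypothetical snapshot document; only accuracyRunStatus is named in the source.
interface AccuracySnapshot {
  accuracyRunId: string;
  accuracyRunStatus: AccuracyRunStatus;
  createdOn: number; // epoch milliseconds, assumed for ordering
}

// Returns the most recent snapshot whose run completed, or undefined if none did.
// filter() produces a new array, so sorting does not mutate the input.
function latestCompleteRun(
  snapshots: AccuracySnapshot[]
): AccuracySnapshot | undefined {
  return snapshots
    .filter((s) => s.accuracyRunStatus === "done")
    .sort((a, b) => b.createdOn - a.createdOn)[0];
}
```

A run interrupted midway never reaches the done status, so its partial snapshot is skipped even if it is the newest one stored.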
@himanshusinghs himanshusinghs force-pushed the chore/issue-307-proposal-2 branch from 7d494ad to da15e6d Compare July 8, 2025 21:39
2 participants