Agentic Documents Assistant

The Agentic Documents Assistant is an LLM assistant that lets users answer complex questions about their business documents through natural conversations. It answers factual questions by retrieving information directly from the documents using semantic search, following the popular retrieval-augmented generation (RAG) design pattern. It also answers analytical questions, such as "Which contracts will expire in the next 3 months?", by translating user questions into SQL queries and running them against a database of entities extracted from the documents by a batch process. Finally, it can answer complex multi-step questions by combining retrieval, analytical, and other tools and data sources using an LLM agent design pattern.
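To make the analytical path concrete, here is a minimal sketch of the kind of SQL query the example question could translate to. The `contracts` table, its columns, and the sample rows are hypothetical and not taken from this solution's schema; SQLite stands in for the real database, and "3 months" is approximated as 90 days.

```python
import sqlite3
from datetime import date, timedelta


def contracts_expiring_within(conn, today, days=90):
    """Return (name, expiry_date) rows for contracts expiring in the next `days` days."""
    cutoff = today + timedelta(days=days)
    # ISO-8601 date strings sort chronologically, so a string range comparison is enough.
    return conn.execute(
        "SELECT name, expiry_date FROM contracts "
        "WHERE expiry_date BETWEEN ? AND ? ORDER BY expiry_date",
        (today.isoformat(), cutoff.isoformat()),
    ).fetchall()


# Hypothetical table of entities extracted from the documents (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (name TEXT, expiry_date TEXT)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [
        ("Acme supply", "2024-02-15"),
        ("Globex lease", "2024-09-01"),
        ("Initech support", "2023-12-31"),
    ],
)

result = contracts_expiring_within(conn, date(2024, 1, 1))
print(result)
```

In the assistant itself the query text would be produced by the LLM from the user's question; the sketch only shows the shape of the query and how it runs against extracted entities.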

To learn more about the design and architecture of this solution, check the accompanying AWS ML blog post: Boosting RAG-based intelligent document assistants using entity extraction, SQL querying, and agents with Amazon Bedrock.

Key Features

Semantic search to augment response generation with relevant documents.
Structured metadata and entity extraction, with SQL queries for analytical reasoning.
An agent built with the Reason and Act (ReAct) instruction format that determines whether to use search or SQL to answer a given question.
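The tool-selection behavior of a ReAct-style agent can be sketched as a small loop that alternates model output, tool calls, and observations. This is an illustration under stated assumptions, not the solution's implementation: the `Action: tool[input]` text format, the two stub tools, and the `scripted_llm` stand-in for a real model are all hypothetical.

```python
import re


def search_tool(query):
    # Hypothetical stand-in for semantic search over the documents.
    return "Passage: the Acme contract covers hardware maintenance."


def sql_tool(query):
    # Hypothetical stand-in for running SQL against the extracted entities.
    return "Rows: [('Acme supply', '2024-02-15')]"


TOOLS = {"search": search_tool, "sql": sql_tool}
ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")
FINAL_RE = re.compile(r"Final Answer: (.*)")


def run_agent(question, llm, max_steps=5):
    """Alternate model steps and tool observations until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)
        transcript += output + "\n"
        final = FINAL_RE.search(output)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(output)
        if action is None or action.group(1) not in TOOLS:
            # Feed the problem back so the model can correct itself on the next step.
            transcript += "Observation: invalid action\n"
            continue
        tool_name, tool_input = action.groups()
        transcript += f"Observation: {TOOLS[tool_name](tool_input)}\n"
    return "No answer within the step limit"


def scripted_llm(transcript):
    # Fake model for the demo: picks the SQL tool first, then answers from the observation.
    if "Observation:" not in transcript:
        return (
            "Thought: this is an analytical question.\n"
            "Action: sql[SELECT name, expiry_date FROM contracts]"
        )
    return "Thought: I have the rows.\nFinal Answer: Acme supply expires on 2024-02-15."


answer = run_agent("Which contracts expire soon?", scripted_llm)
print(answer)
```

Replacing `scripted_llm` with a call to a hosted model and the stub tools with real search and SQL functions keeps the loop unchanged, which is the main appeal of this design.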

Architecture Overview

The following architecture diagram depicts the design of the solution.

Content

Below is an outline of the main folders included in this asset.

Folder	Description
`backend`	A TypeScript CDK project implementing infrastructure as code (IaC) to set up the backend infrastructure.
`frontend`	A TypeScript CDK project to set up the infrastructure for deploying and hosting the frontend app with AWS Amplify.
`frontend/chat-app`	A Next.js app with Amazon Cognito authentication and secured backend connectivity.
`data-pipelines`	Notebooks implementing SageMaker Jobs and a SageMaker Pipeline to process the data in batch.
`experiments`	Notebooks and code showcasing different modules of the solution as standalone experiments for research and development.

Getting Started

Follow the instructions below to set up the solution in your account.

Prerequisites

An AWS account.
Model access to Anthropic Claude and Amazon Titan models, configured in one of the supported regions of Amazon Bedrock.
The AWS Cloud Development Kit (CDK), set up in one of the following ways:
- We recommend using a Cloud9 environment or CloudShell to install the CDK app.
- Alternatively, you can set up CDK in your local environment by following the documentation instructions.

Installation

To install the solution in your AWS account:

Clone this repository.
Install the backend CDK app, as follows:
1. Go inside the `backend` folder.
2. Run `npm install` to install the dependencies.
3. If you have never used CDK in the current account and region, run bootstrapping with `npx cdk bootstrap`.
4. Run `npx cdk deploy` to deploy the stack.
5. Take note of the SageMaker IAM Policy ARN found in the CDK stack output.
Deploy the Next.js frontend on AWS Amplify:
1. Go inside the `frontend` folder.
2. Run `npm install` to install the dependencies.
3. Run `npx cdk deploy` to deploy a stack that builds an Amplify CI/CD pipeline.
4. Once the CI/CD pipeline is ready, go to the Amplify console and trigger a build.
5. Once the app is built, click the hosting link to view it. You can now create a new account and interact with the agentic assistant.
To update the underlying data, run the SageMaker Pipeline notebooks under the `data-pipelines` folder. These process the input PDF documents, prepare the SQL table, and create the semantic search index used by the LLM assistant.
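As a rough illustration of what a semantic search index does, the sketch below indexes a few text chunks and returns the one closest to a query. It is a toy under loud assumptions: bag-of-words counts stand in for a real embedding model, an in-memory list stands in for a vector store, and the function names and sample chunks are hypothetical rather than taken from the pipeline notebooks.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    # The "index" is a list of (chunk, vector) pairs computed once, in batch.
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index, query, k=1):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


index = build_index(
    [
        "the lease ends in march",
        "payment terms are net thirty days",
        "the supplier provides hardware maintenance",
    ]
)
top = search(index, "when does the lease end")
print(top)
```

A real pipeline would split the PDF text into chunks, embed them with a hosted model, and store the vectors in a database that supports nearest-neighbor search; the ranking step works the same way.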

Clean up

To remove the resources of the solution:

Remove the stack inside the `backend` folder by running `npx cdk destroy`.
Remove the stack inside the `frontend` folder by running `npx cdk destroy`.

Authors

The authors of this asset are:

Mohamed Ali Jamaoui: Solution designer/Core maintainer.
Giuseppe Hannen: Extensive contribution to the data extraction modules.
Laurens ten Cate: Contributed to extending the agent with the SQL tool and to early Streamlit UI deployments.

Security

See CONTRIBUTING for more information.

References

Future improvements

Improve the overall inference speed.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
