Simple document analysis, summarization, and Q&A using Retrieval-Augmented Generation with LLMs.

Hongda-OSU/DocoGPT

DocoGPT

What is DocoGPT?

  • DocoGPT is a platform for document analysis, summarization, and Q&A using Retrieval-Augmented Generation (RAG) with LLMs.
  • We also built a simple frontend for DocoGPT. Below is an example of what DocoGPT can produce.

Design for DocoGPT

  • DocoGPT aims to find answers to the user's question in the vector database more accurately by employing a hierarchical information architecture.
  • Instead of embedding the document content as a whole, DocoGPT uses GPT-4 to split documents into sections and create summaries. This builds a document tree in which document sections form the leaf nodes and each parent node is a summary of its children.
  • This design makes the LLM agent's decisions easier, because each choice is binary. The summary layers also address a weakness of traditional RAG, which is better at retrieval than at summarizing a document or section.
  • The Design Diagram is here: Link
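
The tree described above can be sketched roughly as follows. This is a minimal illustration, not the DocoGPT implementation: the names `Node`, `build_tree`, `find_section`, `toy_summarize`, and `toy_choose_left` are hypothetical, and the two toy functions stand in for the GPT-4 summarization and the agent's left-or-right decision so that the example runs without any API calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    text: str                       # section text (leaf) or summary (parent)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build_tree(sections: List[str], summarize: Callable[[str, str], str]) -> Node:
    """Pair nodes level by level until a single root remains."""
    if not sections:
        raise ValueError("need at least one section")
    level = [Node(text=s) for s in sections]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            parents.append(Node(text=summarize(a.text, b.text), left=a, right=b))
        if len(level) % 2 == 1:
            parents.append(level[-1])  # assumption: an unpaired node is promoted unchanged
        level = parents
    return level[0]


def find_section(root: Node, question: str,
                 choose_left: Callable[[str, str, str], bool]) -> str:
    """Walk from the root to a leaf, making one binary choice per level."""
    node = root
    while not node.is_leaf:
        go_left = choose_left(question, node.left.text, node.right.text)
        node = node.left if go_left else node.right
    return node.text


# Toy stand-ins (assumptions): a real system would call an LLM here.
def toy_summarize(a: str, b: str) -> str:
    return a + " " + b


def toy_choose_left(question: str, left: str, right: str) -> bool:
    words = set(question.lower().split())
    left_hits = len(words & set(left.lower().split()))
    right_hits = len(words & set(right.lower().split()))
    return left_hits >= right_hits


sections = [
    "install with pip",
    "configure the api key",
    "run the server",
    "known parsing issues",
]
root = build_tree(sections, toy_summarize)
answer = find_section(root, "how do I configure the api key", toy_choose_left)
print(answer)
```

With four sections the tree has two summary nodes plus a root, so the lookup takes two binary decisions before reaching a leaf.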

Known Issues

  • Tree construction is time-consuming for large documents. We recommend documents of 1 to 8 pages.
  • Token consumption is high. This is related to the previous issue, and also results from using a binary tree instead of a ternary one.
  • PDF parsing sometimes fails.
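
To give a rough sense of why the branching factor affects token consumption, the sketch below counts how many summary nodes a bottom-up tree needs for a given number of sections. It assumes that each parent node costs one summarization call and that a leftover single node is promoted without a new summary. The function name `count_summary_nodes` is hypothetical, and actual token usage also depends on section and summary lengths.

```python
def count_summary_nodes(num_leaves: int, branching: int) -> int:
    """Count parent (summary) nodes when grouping `branching` nodes per parent."""
    if num_leaves < 1 or branching < 2:
        raise ValueError("need num_leaves >= 1 and branching >= 2")
    count = 0
    level = num_leaves
    while level > 1:
        full, rest = divmod(level, branching)
        # A leftover group of two or more still needs a summary;
        # a leftover single node is promoted unchanged (assumption).
        count += full + (1 if rest > 1 else 0)
        level = full + (1 if rest else 0)
    return count


binary_nodes = count_summary_nodes(8, 2)
ternary_nodes = count_summary_nodes(8, 3)
print(binary_nodes, ternary_nodes)
```

Under these assumptions, a binary tree over n sections always needs n - 1 summary nodes, while a wider tree needs fewer.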
