DocoGPT

DocoGPT is a platform for document analysis, summarization, and Q&A using Retrieval-Augmented Generation (RAG) with LLMs.
We also built a simple frontend for DocoGPT; below is an example of what DocoGPT achieves.

DocoGPT aims to find answers relevant to the user's question in the vector database with higher accuracy by employing a hierarchical information architecture.
Instead of embedding the document content as a whole, DocoGPT uses GPT-4 to split documents into sections and write summaries, building a document tree in which the sections form the leaf nodes and each parent node is a summary of its children.
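The tree construction described above can be sketched as follows. This is a minimal illustration, not DocoGPT's actual code: it assumes the document has already been split into a list of section strings, pairs adjacent nodes bottom-up, and uses a hypothetical `summarize` callback where DocoGPT would call GPT-4. The names `Node`, `build_tree`, and `toy_summarize` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Tree node: leaves hold section text, parents hold a summary of their children."""
    text: str
    children: List["Node"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def build_tree(sections: List[str], summarize: Callable[[List[str]], str]) -> Node:
    """Pair nodes level by level until a single root remains.

    `summarize` is a hypothetical stand-in for the LLM call that writes a parent summary.
    """
    if not sections:
        raise ValueError("need at least one section")
    level = [Node(text=s) for s in sections]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:
                # Odd node out: promote it unchanged to the next level.
                next_level.append(pair[0])
            else:
                summary = summarize([child.text for child in pair])
                next_level.append(Node(text=summary, children=pair))
        level = next_level
    return level[0]


def toy_summarize(texts: List[str]) -> str:
    # Hypothetical offline stand-in for GPT-4: keep the first sentence of each child.
    return " ".join(t.split(".")[0] + "." for t in texts)
```

With four sections, the first level produces two parent summaries and the second level produces the root, so every parent has exactly two children.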
This tree design simplifies the LLM agent's decisions, since at each node it only has to choose between two children. The summary layers also address a weakness of traditional RAG in document- and section-level summarization, as RAG is better suited to retrieval than to summarization.
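One way to picture the binary decision is a root-to-leaf descent in which, at each parent, the agent only picks the left or the right child based on their summaries. The sketch below assumes that traversal strategy; `retrieve` and `choose` are hypothetical names, and the keyword-overlap heuristic in `keyword_choose` is only a placeholder for the LLM agent so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Tree node: leaves hold section text, parents hold a summary of their children."""
    text: str
    children: List["Node"] = field(default_factory=list)


def retrieve(root: Node, question: str, choose: Callable[[str, str, str], int]) -> Node:
    """Descend from the root, asking `choose` to return 0 (left) or 1 (right) at each parent."""
    node = root
    while node.children:
        if len(node.children) == 1:
            node = node.children[0]
            continue
        left, right = node.children[0], node.children[1]
        node = node.children[choose(question, left.text, right.text)]
    return node


def keyword_choose(question: str, left: str, right: str) -> int:
    # Hypothetical stand-in for the LLM agent: prefer the summary sharing more words with the question.
    q = set(question.lower().split())
    overlap_left = len(q & set(left.lower().split()))
    overlap_right = len(q & set(right.lower().split()))
    return 0 if overlap_left >= overlap_right else 1
```

Because each step is a two-way choice, a tree over n sections is resolved in roughly log2(n) agent decisions rather than by ranking all n sections at once.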
The Design Diagram is here: Link

Limitations:
Tree construction is time-consuming for large documents; we recommend documents of 1 to 8 pages.
Token consumption is high, partly for the same reason and partly because we use a binary tree rather than a ternary one.
PDF parsing sometimes fails.
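A rough way to see why the branching factor affects cost is to count the summarization calls needed to build the tree and its depth (the number of agent decisions per query). This is a back-of-the-envelope sketch under stated assumptions: each parent summarizes up to `branching` children, a lone leftover node is promoted without a call, and `tree_cost` is a hypothetical helper. It counts calls, not tokens; each ternary call has a longer prompt, so the real token savings will be smaller than the call counts suggest.

```python
from typing import Tuple


def tree_cost(num_sections: int, branching: int) -> Tuple[int, int]:
    """Return (summary_calls, depth) for a bottom-up tree with the given branching factor."""
    if num_sections < 1 or branching < 2:
        raise ValueError("need at least one section and a branching factor of 2 or more")
    calls = 0
    depth = 0
    level = num_sections
    while level > 1:
        full, rest = divmod(level, branching)
        # A leftover group of two or more nodes still needs one summary call;
        # a single leftover node is promoted as-is.
        calls += full + (1 if rest >= 2 else 0)
        level = full + (1 if rest else 0)
        depth += 1
    return calls, depth
```

For example, `tree_cost(16, 2)` returns `(15, 4)` while `tree_cost(16, 3)` returns `(8, 3)`: a binary tree always needs n - 1 summary calls, whereas a ternary tree needs roughly half as many and one fewer level to traverse.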
