- DocoGPT is a platform for document analysis, summarization, and Q&A using Retrieval-Augmented Generation with LLMs.
- We also built a simple frontend for DocoGPT; below is an example of what DocoGPT can do.
- DocoGPT aims to find answers to a user's question in the vector database more accurately by employing a hierarchical information architecture.
- Instead of embedding the document content as a whole, DocoGPT uses GPT-4 to split documents into sections and create summaries, building a document tree in which document sections form the leaf nodes and parent nodes are summaries of their children.
- This design makes each decision easier for the LLM agent because the choice at every node is binary. The summary layers also address a weakness of traditional RAG in document or section summarization (RAG is better suited to retrieval).
- The Design Diagram is here: Link
- Tree construction is time-consuming for large documents. We recommend documents of 1 to 8 pages.
- Token consumption is high. This is related to the previous issue and also stems from using a binary tree instead of a ternary one.
- PDF parsing sometimes fails.
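
To make the document tree described above more concrete, here is a minimal sketch of building a binary summary tree and walking down it with one binary choice per level. It is not DocoGPT's actual code: `Node`, `build_tree`, `find_section`, `toy_summarize`, and `toy_choose` are hypothetical names, the pairing of adjacent sections is an assumption, and the two `toy_*` functions stand in for the GPT-4 calls that would produce summaries and pick a branch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    text: str                       # section text (leaf) or summary (parent)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build_tree(sections: List[str], summarize: Callable[[str, str], str]) -> Node:
    """Merge adjacent nodes pairwise, level by level, until one root remains."""
    if not sections:
        raise ValueError("need at least one section")
    level = [Node(s) for s in sections]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            merged.append(Node(summarize(a.text, b.text), a, b))
        if len(level) % 2:          # an unpaired node is carried up unchanged
            merged.append(level[-1])
        level = merged
    return level[0]


def find_section(root: Node, question: str,
                 choose: Callable[[str, str, str], int]) -> str:
    """Descend from the root, making one left/right decision per level."""
    node = root
    while not node.is_leaf:
        pick = choose(question, node.left.text, node.right.text)
        node = node.left if pick == 0 else node.right
    return node.text


def toy_summarize(a: str, b: str) -> str:
    # Assumption: stand-in for an LLM summary; it just concatenates the children.
    return a + " " + b


def toy_choose(question: str, left: str, right: str) -> int:
    # Assumption: stand-in for the LLM agent; picks the side sharing more words.
    q = set(question.lower().split())

    def score(text: str) -> int:
        return len(q & set(text.lower().split()))

    return 0 if score(left) >= score(right) else 1


sections = [
    "install with pip",
    "configure the api key",
    "upload a pdf file",
    "ask questions about the document",
]
root = build_tree(sections, toy_summarize)
print(find_section(root, "how do I upload a pdf", toy_choose))
# prints "upload a pdf file"
```

With pairwise merging, n leaf sections need n - 1 summary calls, which is one way to see the token cost noted in the limitations; a tree that merges three children at a time would need roughly half as many parent summaries.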