The Document Summarizer tool helps users quickly understand the main points of lengthy documents. - I am trying to build a system using Open source LLMs

Natarajan-R/Document-Summarizer

Document-Summarizer

The Document Summarizer tool helps users quickly understand the main points of lengthy documents. It is an attempt to build such a system using open-source LLMs.

The tool works on any single PDF document. Text is first extracted with PyMuPDF. The path of the input PDF and the path of the output summary file are both configurable. Once the text has been extracted, unwanted content can be filtered out so that it does not end up in the summary.

An LLM then summarizes the PDF text. Because the text can be too long to process in one pass, the content is first split into small chunks and each chunk is summarized separately. The chunk summaries are then consolidated into the final summary. Initial testing used the facebook/bart-large-cnn model, but it can be swapped for any other LLM.

The tool works now, but its output is raw text. Formatting the generated summary so that it is easier to read is still in progress.
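The extract, filter, chunk, summarize, and consolidate steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the regex-based filter, the 400-word chunk size, and the `max_length`/`min_length` values are all assumptions chosen for the example. It assumes PyMuPDF is importable as `fitz` and that the installed `transformers` version provides the `"summarization"` pipeline task. Both libraries are imported lazily so that the chunking and consolidation helpers can be tried without them.

```python
import re
from typing import Callable, List


def extract_text(pdf_path: str) -> str:
    """Return the plain text of every page in the PDF, using PyMuPDF."""
    import fitz  # PyMuPDF; imported here so the other helpers work without it

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def filter_text(text: str, drop_patterns: List[str]) -> str:
    """Drop blank lines and lines matching any pattern (hypothetical filter rule)."""
    compiled = [re.compile(p) for p in drop_patterns]
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not any(rx.search(stripped) for rx in compiled):
            kept.append(stripped)
    return "\n".join(kept)


def chunk_text(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    400 words is an assumed size meant to stay under the model's input limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_document(
    text: str, summarize_chunk: Callable[[str], str], max_words: int = 400
) -> str:
    """Summarize each chunk separately, then join the chunk summaries."""
    summaries = [summarize_chunk(chunk) for chunk in chunk_text(text, max_words)]
    return "\n".join(summaries)


def make_bart_summarizer(model_name: str = "facebook/bart-large-cnn") -> Callable[[str], str]:
    """Build a chunk summarizer backed by a Hugging Face summarization pipeline."""
    from transformers import pipeline  # heavy dependency, imported lazily

    summarizer = pipeline("summarization", model=model_name)

    def summarize_chunk(chunk: str) -> str:
        # Length limits are illustrative values, not tuned settings.
        result = summarizer(
            chunk, max_length=130, min_length=30, do_sample=False, truncation=True
        )
        return result[0]["summary_text"]

    return summarize_chunk


def run(pdf_path: str, summary_path: str, drop_patterns: List[str]) -> None:
    """Full pipeline: PDF in, raw-text summary file out."""
    text = filter_text(extract_text(pdf_path), drop_patterns)
    summary = summarize_document(text, make_bart_summarizer())
    with open(summary_path, "w", encoding="utf-8") as fh:
        fh.write(summary)
```

A call such as `run("input.pdf", "summary.txt", [r"^Page \d+$"])` would exercise the whole flow; the two paths and the pattern are placeholders. Because `summarize_document` accepts any callable, switching to a different LLM only requires passing another function in place of the one built by `make_bart_summarizer`.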
