The Document Summarizer tool helps users quickly understand the main points of lengthy documents. I am trying to build this system using open-source LLMs.

Any PDF document can be used as input. The text content is first extracted with PyMuPDF. Both the input PDF path and the output summary file path are configurable. Once the text has been extracted, we can filter out any content that we don't want to include in the summary.

We use an LLM to summarize the full text of the PDF. Because the text can be large, we first split it into small chunks and summarize each chunk, then consolidate all of the chunk summaries into one. We used the facebook/bart-large-cnn model for our initial testing, but it can be swapped for any other LLM.

The tool is working now, but the output is raw text. To make the summary more readable, we need to add formatting to it. We are working on this.
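The extract, chunk, summarize, and consolidate steps described above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`extract_text`, `chunk_words`, `summarize_chunks`, `make_bart_summarizer`, `summarize_pdf`), the word-based chunking, the 400-word chunk size, and the `max_length`/`min_length` values are all assumptions chosen for the example. The filtering step is left out because the rules for what to remove are not specified.

```python
from typing import Callable, List


def extract_text(pdf_path: str) -> str:
    """Extract plain text from every page of a PDF using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the helpers below work without it

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words.

    Assumption: a word count is used as a rough stand-in for the model's
    token limit; a tokenizer-based split would be more precise.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_chunks(chunks: List[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk and join the results, one summary per line."""
    return "\n".join(summarize(chunk) for chunk in chunks)


def make_bart_summarizer(model_name: str = "facebook/bart-large-cnn") -> Callable[[str], str]:
    """Build a summarize(chunk) function backed by a Hugging Face pipeline."""
    from transformers import pipeline  # imported lazily; downloads the model on first use

    pipe = pipeline("summarization", model=model_name)

    def summarize(chunk: str) -> str:
        # Length limits here are illustrative values, not tuned settings.
        result = pipe(chunk, max_length=130, min_length=30,
                      do_sample=False, truncation=True)
        return result[0]["summary_text"]

    return summarize


def summarize_pdf(pdf_path: str, summary_path: str,
                  summarize: Callable[[str], str]) -> str:
    """Run the full pipeline and write the consolidated summary to a file."""
    text = extract_text(pdf_path)
    summary = summarize_chunks(chunk_words(text), summarize)
    with open(summary_path, "w", encoding="utf-8") as f:
        f.write(summary)
    return summary
```

With PyMuPDF and transformers installed, a call such as `summarize_pdf("input.pdf", "summary.txt", make_bart_summarizer())` would run the whole flow; the two file names are placeholders. Passing the summarizer in as a function keeps the model swappable, which matches the goal of being able to change to any desired LLM.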
Natarajan-R/Document-Summarizer