Document-Summarizer

The Document Summarizer tool helps users quickly understand the main points of lengthy documents. It is built on open-source LLMs and works on any single PDF document. The path of the input PDF and the path of the output summary file are both configurable.

Text is first extracted from the PDF using PyMuPDF. Once the text is extracted, any unnecessary content that should not appear in the summary can be filtered out. An LLM then summarizes the remaining text. Because the text can be long, the content is first split into small chunks, each chunk is summarized, and the chunk summaries are then consolidated into a final summary.

We used the facebook/bart-large-cnn model for initial testing, but it can be swapped for any other LLM. The tool works now, but its output is raw text. Making the summary look good requires adding formatting to the generated summary, and we are working on that.
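The extract, filter, chunk, summarize, and consolidate steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in pdf_summarizer.py: the function names, the 400-word chunk size, the regex-based line filter, and the generation lengths are all assumptions chosen for the example. It also assumes a transformers release that provides the "summarization" pipeline. PyMuPDF and transformers are imported lazily so the pure-Python helpers can be tried without them.

```python
import re
from typing import Callable, List


def extract_text(pdf_path: str) -> str:
    """Read every page of a PDF with PyMuPDF and return the joined text."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def filter_lines(text: str, drop_patterns: List[str]) -> str:
    """Drop blank lines and lines matching any regex in drop_patterns.

    The regex filter is a hypothetical stand-in for the README's
    "remove unnecessary text" step.
    """
    compiled = [re.compile(p) for p in drop_patterns]
    kept = [
        line for line in text.splitlines()
        if line.strip() and not any(rx.search(line.strip()) for rx in compiled)
    ]
    return "\n".join(kept)


def chunk_text(text: str, max_words: int = 400) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def make_summarizer(model_name: str = "facebook/bart-large-cnn") -> Callable[[str], str]:
    """Build a chunk summarizer backed by a Hugging Face summarization pipeline."""
    from transformers import pipeline  # imported lazily, downloads the model on first use

    summarizer = pipeline("summarization", model=model_name)

    def summarize(chunk: str) -> str:
        # Length limits are illustrative; truncation guards against overlong chunks.
        result = summarizer(chunk, max_length=130, min_length=30,
                            do_sample=False, truncation=True)
        return result[0]["summary_text"]

    return summarize


def summarize_document(text: str, summarize_chunk: Callable[[str], str],
                       max_words: int = 400) -> str:
    """Summarize each chunk, then consolidate the partial summaries in order."""
    partial = [summarize_chunk(chunk) for chunk in chunk_text(text, max_words)]
    return "\n".join(partial)


def run(pdf_path: str, summary_path: str, drop_patterns: List[str]) -> None:
    """End-to-end run: PDF in, plain-text summary file out."""
    text = filter_lines(extract_text(pdf_path), drop_patterns)
    summary = summarize_document(text, make_summarizer())
    with open(summary_path, "w", encoding="utf-8") as fh:
        fh.write(summary)
```

A call such as run("input.pdf", "summary.txt", [r"^Page \d+$"]) would write the consolidated summary to summary.txt; both paths and the pattern are placeholders. Passing the summarizer in as a plain callable keeps the chunking and consolidation logic independent of the model, which makes it easy to change the LLM later.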

Files:
Custom_ViT_and_Deploy_in_HuggingFace_HUB.ipynb
PDF_Summarizer.ipynb
README.md
pdf_summarizer.py

Natarajan-R/Document-Summarizer
