Docling PDF Processor w/ Streamlit

A simple UI wrapper around Docling for document processing. I built this to make document analysis more accessible and thought others might find it useful.

Inspired by Docling and its integration with LlamaIndex.

What This Does

- Processes PDFs with Docling's document analysis
- Extracts text and tables, and runs OCR
- Presents results in a clean Streamlit interface
- Handles multi-page documents and complex tables
- Makes document processing accessible to non-technical users

Setup

git clone https://github.com/lesteroliver911/docling-pdf-processor.git
cd docling-pdf-processor
pip install -r requirements.txt
streamlit run main.py

How It Works

The app combines three libraries:

- Docling: document conversion and analysis (text, tables, OCR)
- LlamaIndex: structuring and indexing of the extracted document data
- Streamlit: the web interface

Key functions:

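from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter

# Setting up the document processor
def initialize_converter():
    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = True              # enable OCR
    pipeline_options.do_table_structure = True  # enable table structure recovery
    return DocumentConverter(...)  # arguments elided in this excerpt; see main.py

# Processing PDFs
def process_pdf(uploaded_file, doc_converter):
    # Handles conversion and extraction
    # Returns markdown and multimodal content
    ...  # body elided in this excerpt; see main.py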
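
A Streamlit upload arrives as an in-memory object, while a converter is often easier to drive from a file on disk. The following is a minimal sketch of one way to bridge the two, not the code in main.py. The helper name `save_upload_to_temp` is hypothetical, and it assumes the upload exposes a `name` attribute and a `getvalue()` method returning bytes.

```python
import tempfile
from pathlib import Path


def save_upload_to_temp(uploaded_file) -> Path:
    """Write an in-memory upload to a temporary file and return its path.

    Hypothetical helper: assumes `uploaded_file` has a `name` attribute and a
    `getvalue()` method that returns the file contents as bytes.
    """
    # Keep the original extension so downstream tools can detect the format.
    suffix = Path(uploaded_file.name).suffix or ".pdf"
    # delete=False leaves the file on disk after the handle closes, so the
    # caller is responsible for removing it once conversion is finished.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(uploaded_file.getvalue())
    return Path(tmp.name)
```

The returned path could then be passed to the converter inside `process_pdf` and deleted with `path.unlink()` afterwards.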

Configuration

You can adjust a few settings in the code:

OMP_NUM_THREADS: CPU threads (default: 4)
IMAGE_RESOLUTION_SCALE: Image quality (default: 2.0)
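
The two settings above are described as values edited in the code. As an alternative, here is a minimal sketch that reads them from environment variables and falls back to the documented defaults. The function name `load_settings` is hypothetical, and reading `IMAGE_RESOLUTION_SCALE` from the environment is an assumption rather than something main.py is known to do.

```python
import os


def load_settings(env=None):
    """Return the two tunable settings, falling back to the documented defaults.

    Hypothetical helper: `env` is any mapping of strings, defaulting to
    os.environ, which makes the function easy to exercise without touching
    the real environment.
    """
    env = os.environ if env is None else env
    return {
        # CPU threads (documented default: 4)
        "omp_num_threads": int(env.get("OMP_NUM_THREADS", "4")),
        # Image scale (documented default: 2.0)
        "image_resolution_scale": float(env.get("IMAGE_RESOLUTION_SCALE", "2.0")),
    }
```

Calling `load_settings({"OMP_NUM_THREADS": "8"})` overrides the thread count and keeps the default image scale. Note that `OMP_NUM_THREADS` generally has to be set before the numerical libraries that read it are imported.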

Requirements

docling
llama-index
streamlit
pandas
python-dotenv

Using the App

Upload a PDF
Check out the three tabs:
- AI Preview: Quick look at the content
- Extracted Content: Full text and structure
- Document Analysis: Page-by-page breakdown

Notes

- Works best with clearly formatted PDFs.
- Table extraction may need tweaking for complex layouts.
- OCR can be slow on large documents.
- Docling offers more features than this app exposes; see its documentation.
- The LlamaIndex integration handles document structuring; see the LlamaIndex Docling reader docs.

Feel free to use this code, modify it, or suggest improvements. You can find me on LinkedIn if you want to discuss Python, AI, or document processing.
