A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images. Available both as a Python package and a Streamlit web application.
- Multiple Vision Models Support
  - LLaVA 7B: Efficient vision-language model for real-time processing (note: LLaVA can sometimes produce incorrect output)
  - Llama 3.2 Vision: Advanced model with high accuracy for complex documents
- Multiple Output Formats
  - Markdown: Preserves text formatting with headers and lists
  - Plain Text: Clean, simple text extraction
  - JSON: Structured data format
  - Structured: Tables and organized data
  - Key-Value Pairs: Extracts labeled information
- Batch Processing
  - Process multiple images in parallel
  - Progress tracking for each image
  - Image preprocessing (resize, normalize, etc.)
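To make the batch-processing idea concrete, here is a minimal standard-library sketch of running a worker over several images in parallel while reporting progress per image. It is not the package's implementation: `fake_ocr`, `run_batch`, and the `errors` key are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_ocr(image_path: str) -> str:
    """Hypothetical stand-in for a real OCR call; rejects '.txt' paths."""
    if image_path.endswith(".txt"):
        raise ValueError(f"not an image: {image_path}")
    return f"text from {image_path}"


def run_batch(image_paths, worker=fake_ocr, max_workers=4):
    """Run `worker` over all paths in parallel and print progress per image."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the path it was submitted for.
        futures = {pool.submit(worker, p): p for p in image_paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad image should not stop the batch
                errors[path] = str(exc)
            print(f"[{done}/{len(futures)}] finished {path}")
    return {
        "results": results,
        "errors": errors,  # assumption: not necessarily present in the package's output
        "statistics": {
            "total": len(futures),
            "successful": len(results),
            "failed": len(errors),
        },
    }


batch = run_batch(["a.png", "b.jpg", "notes.txt"])
```

Threads suit this kind of workload because each task mostly waits on a model call rather than using the CPU.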
pip install ollama-ocr
- Install Ollama
- Pull the required model:
ollama pull llama3.2-vision:11b
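Before running OCR, it can help to confirm that the model was actually pulled. The sketch below assumes a local Ollama server at its default address (`http://localhost:11434`) and uses its `/api/tags` endpoint, which lists locally installed models. The helper names `model_is_available` and `fetch_tags` are hypothetical and not part of this package.

```python
import json
import urllib.request


def model_is_available(tags_payload: dict, model_name: str) -> bool:
    """Return True if `model_name` appears in an Ollama /api/tags payload."""
    names = {m.get("name") for m in tags_payload.get("models", [])}
    return model_name in names


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask a running Ollama server which models are installed locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Offline example using a payload shaped like the /api/tags response
# (assumed shape: {"models": [{"name": ...}, ...]}).
sample = {"models": [{"name": "llama3.2-vision:11b"}, {"name": "llava:7b"}]}
has_vision = model_is_available(sample, "llama3.2-vision:11b")
has_other = model_is_available(sample, "mistral:7b")
```

With the server running, `model_is_available(fetch_tags(), "llama3.2-vision:11b")` performs the same check against the live list.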
from ollama_ocr import OCRProcessor
# Initialize OCR processor
ocr = OCRProcessor(model_name='llama3.2-vision:11b') # You can use any vision model available on Ollama
# Process an image
result = ocr.process_image(
    image_path="path/to/your/image.png",
    format_type="markdown"  # Options: markdown, text, json, structured, key_value
)
print(result)
from ollama_ocr import OCRProcessor
# Initialize OCR processor
ocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4) # max workers for parallel processing
# Process multiple images with progress tracking
batch_results = ocr.process_batch(
    input_path="path/to/images/folder",  # Directory or list of image paths
    format_type="markdown",
    recursive=True,  # Search subdirectories
    preprocess=True  # Enable image preprocessing
)
# Access results
for file_path, text in batch_results['results'].items():
    print(f"\nFile: {file_path}")
    print(f"Extracted Text: {text}")
# View statistics
print("\nProcessing Statistics:")
print(f"Total images: {batch_results['statistics']['total']}")
print(f"Successfully processed: {batch_results['statistics']['successful']}")
print(f"Failed: {batch_results['statistics']['failed']}")
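A common next step is to write each extracted text to its own file. The sketch below assumes only the dictionary shape used in the batch example (a `results` mapping from file path to text). `save_results` is a hypothetical helper, not a function provided by the package.

```python
import tempfile
from pathlib import Path


def save_results(batch_results: dict, output_dir: str) -> list:
    """Write each extracted text to `<output_dir>/<image stem>.md`."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for file_path, text in batch_results["results"].items():
        # Note: images with the same stem in different folders would collide here.
        target = out / (Path(file_path).stem + ".md")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


# Example with made-up data shaped like the batch output.
sample = {"results": {"scans/invoice.png": "# Invoice\nTotal: 42"}}
with tempfile.TemporaryDirectory() as tmp:
    paths = save_results(sample, tmp)
    saved_name = paths[0].name
    saved_text = paths[0].read_text(encoding="utf-8")
```

When searching subdirectories, it may be safer to derive the output name from the full relative path so that files with the same name do not overwrite each other.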
- Markdown Format: The output is a Markdown string containing the extracted text, with formatting such as headers and lists preserved.
- Text Format: The output is a plain text string containing the extracted text.
- JSON Format: The output is a JSON object containing the extracted text.
- Structured Format: The output is a structured object containing the extracted text, organized around tables and other structured data.
- Key-Value Format: The output is a dictionary of the labeled information extracted from the image.
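Vision models sometimes wrap JSON in a Markdown code fence. If the JSON output reaches you as a string (an assumption; the package may already return a parsed object), a tolerant parser along these lines can help. `parse_json_output` is a hypothetical helper, not part of the package.

```python
import json
import re

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example
FENCED_JSON = re.compile(
    r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL
)


def parse_json_output(raw: str):
    """Parse model output as JSON, tolerating a surrounding code fence.

    Returns the parsed value, or None if the text is not valid JSON.
    """
    text = raw.strip()
    fenced = FENCED_JSON.match(text)
    if fenced:
        text = fenced.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


wrapped = FENCE + 'json\n{"total": "42"}\n' + FENCE
parsed_wrapped = parse_json_output(wrapped)
parsed_plain = parse_json_output('{"name": "Acme"}')
parsed_bad = parse_json_output("no JSON here")
```

Returning `None` on failure keeps the caller in control of what to do with unparseable output, for example falling back to the plain text format.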
- User-Friendly Interface
  - Drag-and-drop image upload
  - Real-time processing
  - Download extracted text
  - Image preview with details
  - Responsive design
- Clone the repository:
git clone https://github.com/imanoop7/Ollama-OCR.git
cd Ollama-OCR
- Install dependencies:
pip install -r requirements.txt
- Go to the directory where app.py is located:
cd src/ollama_ocr
- Run the Streamlit app:
streamlit run app.py
This project is licensed under the MIT License - see the LICENSE file for details.
Built with Ollama. Powered by LLaMA Vision Models.