A Python application that processes images of documents, automatically detects their boundaries, applies perspective correction, and converts them to enhanced PDF files.
- Automatic document boundary detection
- Perspective correction
- Text skew correction
- Image enhancement and binarization
- Batch processing of multiple images
- Automatic conversion to PDF
- opencv-python
- numpy
- Pillow
- Clone this repository or download the source code
- Install the required dependencies:
pip install opencv-python numpy Pillow
- Create an `input` folder in the project directory
- Place your document images (PNG, JPG, JPEG) in the `input` folder
- Run the script:
python app.py
- Processed PDFs will be saved in the `output` folder
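As a rough illustration of the folder layout described above, the following sketch gathers the supported images from an input folder and makes sure the output folder exists. The function name `collect_input_images` and the constant `SUPPORTED_EXTENSIONS` are hypothetical and are not taken from `app.py`; the actual script may organize this differently.

```python
from pathlib import Path

# Formats listed in this README; the set name is an assumption for illustration.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def collect_input_images(input_dir="input", output_dir="output"):
    """Return the supported image paths in input_dir, sorted by path.

    Creates output_dir if it does not exist. Returns an empty list when
    input_dir is missing, so the caller can print a helpful message.
    """
    input_path = Path(input_dir)
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    if not input_path.is_dir():
        return []
    return sorted(
        p for p in input_path.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

The extension check is case-insensitive, so files such as `scan.JPG` are picked up as well.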
The application performs the following steps:
- Loads and preprocesses the image (grayscale conversion and blur)
- Detects document boundaries using contour detection
- Applies perspective transformation to get a top-down view
- Corrects text skew using the Hough transform
- Enhances the image using adaptive thresholding
- Saves the result as a PDF
The script includes error handling for:
- Invalid or unreadable images
- Cases where document boundaries cannot be detected
- File system operations
- Input images should show the whole document clearly, with good contrast against the background
- Supported input formats: PNG, JPG, JPEG
- Each output PDF keeps the base name of its input file (for example, an input named scan.jpg produces scan.pdf)
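The naming rule can be illustrated with a short Pillow sketch. The helpers `pdf_path_for` and `save_as_pdf` are hypothetical names chosen for this example, and the 300 DPI resolution is an assumed value, not a setting confirmed by the script.

```python
from pathlib import Path

from PIL import Image


def pdf_path_for(image_path, output_dir="output"):
    """Map an input image path to its PDF path: same base name, .pdf extension."""
    return Path(output_dir) / (Path(image_path).stem + ".pdf")


def save_as_pdf(image_path, output_dir="output"):
    """Write the image at image_path as a single-page PDF and return the PDF path."""
    target = pdf_path_for(image_path, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(image_path) as img:
        # Convert to RGB so palette or alpha images are accepted by the PDF writer.
        img.convert("RGB").save(target, "PDF", resolution=300.0)
    return target
```

With this mapping, two inputs that differ only by extension, such as `scan.png` and `scan.jpg`, would write to the same `scan.pdf`, so giving input files distinct base names avoids overwriting.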