Reduce printing costs by automatically detecting and removing blank spaces in PDF documents.
pip install PyMuPDF Pillow python-docx tkinterdnd2
- Run `gui.pyw` for graphical interface operations (CLI version currently unavailable)
- After processing, use the provided Office macro (`图片溢出缩小.bas`) to resize overflowing images
- The macro automatically detects first column width in Word and scales oversized images
| Parameter | Description |
|---|---|
| threshold | Brightness threshold for blank line detection (0-255). RGB pixels are converted to grayscale (average value); rows with average grayscale ≥ threshold are considered blank. |
| dpi | Image resolution when converting PDF to images. |
| min_height | Content validity filter (in pixels). Only content blocks with height ≥ the specified value are preserved. |
| blank_height | Paragraph separation baseline. Content is considered separate paragraphs when blank lines ≥ this value. |
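As a rough illustration of how `threshold`, `min_height`, and `blank_height` could interact, here is a minimal pure-Python sketch. It is not the project's actual code: the function names (`row_brightness`, `find_content_blocks`, `group_paragraphs`) are hypothetical, and it assumes the page is already available as a list of rows of `(R, G, B)` tuples, with every row non-empty.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Block = Tuple[int, int]  # (start_row, end_row), end exclusive


def row_brightness(row: Sequence[Pixel]) -> float:
    """Average grayscale of one row; grayscale is the mean of R, G, B."""
    return sum((r + g + b) / 3 for r, g, b in row) / len(row)


def find_content_blocks(
    image: Sequence[Sequence[Pixel]], threshold: int, min_height: int
) -> List[Block]:
    """Return runs of non-blank rows that are at least min_height tall.

    A row counts as blank when its average grayscale is >= threshold.
    """
    blocks: List[Block] = []
    start = None  # first row of the content run currently being scanned
    for y, row in enumerate(image):
        blank = row_brightness(row) >= threshold
        if not blank and start is None:
            start = y
        elif blank and start is not None:
            if y - start >= min_height:
                blocks.append((start, y))
            start = None
    # Close a run that reaches the bottom edge of the page.
    if start is not None and len(image) - start >= min_height:
        blocks.append((start, len(image)))
    return blocks


def group_paragraphs(blocks: Sequence[Block], blank_height: int) -> List[Block]:
    """Merge neighbouring blocks separated by fewer than blank_height rows."""
    paragraphs: List[Block] = []
    for start, end in blocks:
        if paragraphs and start - paragraphs[-1][1] < blank_height:
            paragraphs[-1] = (paragraphs[-1][0], end)
        else:
            paragraphs.append((start, end))
    return paragraphs
```

In this sketch, raising `min_height` drops more small content runs (which matches the kind of accidental removal of small shapes a height filter can cause), while raising `blank_height` merges more neighbouring blocks into a single paragraph.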
- Small images/geometric shapes may be accidentally removed or cropped
- Narrow images/color blocks might be misidentified (adjust config parameters)
- Scanned documents must have horizontal text alignment (pre-process tilted pages)
- Manual margin trimming required to remove headers/footers
MIT License
This third-generation version features:
- GUI implementation for user-friendly operation
- Partial utilization of AI-assisted development tools
- Continuous optimization through multiple iterations