This project is a FastAPI application that provides OCR (Optical Character Recognition) capabilities for both images and PDFs. It uses Tesseract OCR to extract text and can handle tables in images and PDFs.
This application features a user interface that allows you to upload images or PDF files and receive the extracted text in JSON format.
This file contains the FastAPI application and defines the routes for handling image and PDF uploads.
-
/ocr-image/
- Method: POST
- Description: Accepts an image file and returns the extracted text and tables.
- Request Body: Image file (form-data).
- Response: JSON containing the extracted text and tables.
-
/ocr-pdf/
- Method: POST
- Description: Accepts a PDF file and returns the extracted text and tables.
- Request Body: PDF file (form-data).
- Response: JSON containing the extracted text and tables.
This module provides functions for OCR and table extraction from images.
-
ocr_image(image_data: bytes) -> str
- Takes image data as input and returns the extracted text as a string.
-
extract_table_from_img(image_data: bytes) -> list
- Takes image data as input and returns a list of extracted table data.
This module provides functions for OCR and table extraction from PDF files.
-
ocr_pdf(pdf_data: bytes) -> str
- Takes PDF data as input and returns the extracted text as a string.
-
extract_table_from_pdf(pdf_data: bytes) -> list
- Takes PDF data as input and returns a list of extracted table data.
- OCR for Images: Extract text from standard images and images containing tables.
- OCR for PDFs: Extract text and tables from PDF files.
- UI Design: Separate files for handling image and PDF processing.
OCR/
├── main.py
├── image_processing.py
├── pdf_processing.py
└── index.html
└── asset/
├── img.png
├── lorem.pdf
├── table_img.jpg
├── table_pdf.pdf
└── static/
└── style.css
└── README. md
Follow these steps to set up the FastAPI OCR application:
Ensure you have the following installed on your system:
- Python 3.7 or higher
- Tesseract OCR (Make sure to note the installation path)
-
Clone the Repository: Open your terminal and run the following command to clone the repository
git clone https://github.com/dagimgetaw/OCR.git cd OCR
-
Set Up a Virtual Environment (Optional but Recommended): Create a virtual environment to manage dependencies
python -m venv venv
Activate the virtual environment:
- For Windows:
venv\Scripts\activate
- For macOS/Linux:
source venv/bin/activate
-
Install Requirements: Install the necessary dependencies using pip
pip install -r requirements.txt
-
Tesseract Installation: Make sure Tesseract OCR is installed on your system. You can download it from: Tesseract OCR
After installation, set the tesseract_cmd
path in your code:
python
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
pytesseract.pytesseract.tesseract_cmd = '/usr/local/bin/tesseract'
pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'
-
Run the Application: Start the FastAPI application using the following command Inside the directory:
uvicorn main:app --host 0.0.0.0 --port 8000
-
Access the Application: Access the application at http://localhost:8000.
Make sure you have the following installed:
- Docker
- Tesseract OCR (included in the Dockerfile)
- FastAPI
- OpenCV
- pdfplumber
- pdf2image
This project is licensed under the MIT License. See the LICENSE file for details.