This is a Ruby on Rails application that demonstrates how to implement file upload using Active Storage and perform Optical Character Recognition (OCR) using the rtesseract gem.
- Upload files (e.g., images, PDFs) using Active Storage
- Perform OCR on uploaded files to extract text
- Display the extracted text on the document show page
- Ruby 3.0.0 or later
- Rails 6.0 or later
- Tesseract OCR
- Tesseract language pack (optional, for additional languages)
Ensure you have the following installed:
- Ruby
- Rails
- Tesseract OCR
macOS:
brew install tesseract
brew install tesseract-lang
Ubuntu:
sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-<lang-code>
Windows:
Download the installer from the Tesseract at UB Mannheim page and follow the installation instructions. For additional languages, download the appropriate language pack files.
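Because rtesseract shells out to the Tesseract binary, it can help to confirm from Ruby that the binary is reachable before starting the app. The snippet below is a minimal sketch and not part of this repository; the helper name tesseract_version is hypothetical, and it assumes the executable is called tesseract and is on your PATH.

```ruby
require "open3"

# Hypothetical helper (not part of this repository).
# Returns the first line printed by `tesseract --version`, or nil when
# the binary cannot be found or exits with an error.
def tesseract_version(binary = "tesseract")
  # capture2e merges stdout and stderr, since some Tesseract builds
  # print version information to stderr.
  output, status = Open3.capture2e(binary, "--version")
  status.success? ? output.lines.first&.strip : nil
rescue Errno::ENOENT
  # Raised when the executable does not exist on PATH.
  nil
end

version = tesseract_version
puts(version ? "Found #{version}" : "Tesseract not found on PATH")
```

If this prints that Tesseract was not found, revisit the installation steps for your platform before running the app.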
- Clone the repository:
git clone https://github.com/your-username/rails-file-upload-ocr.git
cd rails-file-upload-ocr
- Install dependencies:
bundle install
- Set up the database:
rails db:create
rails db:migrate
- Start the Rails server:
./bin/dev
- Navigate to http://localhost:3000 in your web browser.
- Upload a new document by visiting http://localhost:3000/documents/new.
- After uploading, the OCR text will be displayed on the document show page.
- app/models/document.rb: Model representing the Document with file attachment and OCR method.
- app/controllers/documents_controller.rb: Controller handling document upload and OCR.
- app/views/documents/new.html.erb: Form for uploading a new document.
- app/views/documents/show.html.erb: View displaying the uploaded document and OCR text.
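The model's OCR method is not reproduced here, so the following is only a sketch of how such a method could be structured, not the code in app/models/document.rb. The class name OcrExtractor, its engine and lang parameters, and the attachment name file are all assumptions made for illustration. The OCR engine is injected as a callable so the logic can be exercised without Tesseract installed; the default engine loads rtesseract lazily and calls RTesseract.new(path, lang: ...).to_s.

```ruby
# Hypothetical helper, not taken from this repository.
class OcrExtractor
  # Default engine: loads rtesseract only when OCR actually runs, so the
  # class can be loaded on machines without the gem or Tesseract.
  DEFAULT_ENGINE = lambda do |path, lang|
    require "rtesseract"
    RTesseract.new(path, lang: lang).to_s
  end

  # engine: any callable taking (path, lang) and returning recognized text.
  # lang:   Tesseract language code; the matching language pack must be installed.
  def initialize(engine: DEFAULT_ENGINE, lang: "eng")
    @engine = engine
    @lang = lang
  end

  # Runs OCR on the file at `path` and returns the text with surrounding
  # whitespace removed.
  def extract(path)
    raise ArgumentError, "file not found: #{path}" unless File.file?(path)

    @engine.call(path, @lang).strip
  end
end

# Possible use inside a model with `has_one_attached :file` (assumed name).
# Active Storage's `open` downloads the blob to a tempfile for the block:
#
#   def ocr_text
#     file.open { |tmp| OcrExtractor.new.extract(tmp.path) }
#   end
```

Keeping the engine swappable is a design choice for testability: a test can pass a lambda that returns canned text instead of invoking Tesseract.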
- Fork the repository
- Create your feature branch (git checkout -b feature/new-feature)
- Commit your changes (git commit -m 'Add some feature')
- Push to the branch (git push origin feature/new-feature)
- Open a pull request
This project is licensed under the MIT License.