Rails File Upload with OCR

This Ruby on Rails application demonstrates file uploads with Active Storage and Optical Character Recognition (OCR) with the rtesseract gem.

Features

Upload files (e.g., images, PDFs) using Active Storage
Perform OCR on uploaded files to extract text
Display the extracted text on the document show page

Requirements

Ruby 3.0.0 or later
Rails 6.0 or later
Tesseract OCR
Tesseract language pack (optional, for additional languages)

Getting Started

Prerequisites

Ensure you have the following installed:

Ruby
Rails
Tesseract OCR

Installing Tesseract OCR

macOS:

brew install tesseract
brew install tesseract-lang

Ubuntu:

sudo apt-get install tesseract-ocr
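sudo apt-get install tesseract-ocr-<lang-code>

Replace <lang-code> with the code of the language you need, for example tesseract-ocr-deu for German. The second command is only required for additional languages.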

Windows:

Download the installer from the Tesseract at UB Mannheim page and follow the installation instructions. For additional languages, download the appropriate language pack files.
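
After installing, it can help to confirm that the tesseract executable is reachable on your PATH, because the rtesseract gem invokes that command to do the actual recognition. The following is a minimal sketch using only the Ruby standard library. The helper name tesseract_available? is hypothetical and not part of this project.

```ruby
# Hypothetical helper (not part of this project): reports whether an
# executable with the given name can be found in a directory on PATH.
def tesseract_available?(command = "tesseract")
  # On Windows, executables carry extensions listed in PATHEXT (e.g. ".EXE").
  extensions = ENV.fetch("PATHEXT", "").split(";")
  extensions = [""] if extensions.empty?

  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    extensions.any? do |ext|
      candidate = File.join(dir, "#{command}#{ext}")
      File.file?(candidate) && File.executable?(candidate)
    end
  end
end

puts tesseract_available? ? "tesseract found" : "tesseract not found on PATH"
```

If the check reports that tesseract was not found, revisit the installation step for your platform before running the app.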

Installation

Clone the repository:

git clone https://github.com/your-username/rails-file-upload-ocr.git
cd rails-file-upload-ocr

Install dependencies:

bundle install

Set up the database:

rails db:create
rails db:migrate

Usage

Start the Rails server:

./bin/dev

Navigate to http://localhost:3000 in your web browser.
Upload a new document by visiting http://localhost:3000/documents/new.
After uploading, the OCR text will be displayed on the document show page.

Project Structure

app/models/document.rb: Model representing the Document with file attachment and OCR method.
app/controllers/documents_controller.rb: Controller handling document upload and OCR.
app/views/documents/new.html.erb: Form for uploading a new document.
app/views/documents/show.html.erb: View displaying the uploaded document and OCR text.
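
The OCR step attributed to app/models/document.rb above can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the class name OcrExtractor, its engine: parameter, and the default lambda are all hypothetical. The only rtesseract call used is RTesseract.new(path).to_s, which runs Tesseract on an image file and returns the recognized text.

```ruby
# Hypothetical sketch (names are assumptions, not taken from this project).
# Wraps the OCR call so the engine can be swapped out, e.g. in tests.
class OcrExtractor
  # Default engine: run Tesseract through the rtesseract gem.
  # The gem is required lazily so this file loads even without it installed.
  DEFAULT_ENGINE = lambda do |path|
    require "rtesseract"
    RTesseract.new(path).to_s
  end

  def initialize(engine: DEFAULT_ENGINE)
    @engine = engine
  end

  # Returns the text recognized in the file at +path+, with surrounding
  # whitespace removed. Raises ArgumentError if the file does not exist.
  def extract(path)
    raise ArgumentError, "no such file: #{path}" unless File.file?(path)

    @engine.call(path).to_s.strip
  end
end
```

In a model with an Active Storage attachment (assumed here to be named file), the extractor could be called as file.blob.open { |tmp| OcrExtractor.new.extract(tmp.path) }, since the blob's open method downloads the upload to a temporary file and yields it. Injecting the engine keeps the model testable on machines where Tesseract is not installed.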

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/new-feature)
Commit your changes (git commit -m 'Add some feature')
Push to the branch (git push origin feature/new-feature)
Open a pull request

License

This project is licensed under the MIT License.
