This project aimed to develop and implement an Optical Character Recognition (OCR) system by leveraging the capabilities of the NVIDIA Jetson Orin Nano Development Platform. The original objective was to create a functional OCR application capable of accurately recognizing and identifying single characters from a live video feed. The project's scope was then expanded to handle multiple characters and eventually full words and sentences. The vision is to harness the power of advanced neural networks and embedded GPU technology to deliver a robust, real-time OCR solution suitable for various real-world applications.
The educational goal of this project was to learn and understand the low-level implementation and functionality of neural networks. Therefore, all major functionalities for performing character inference were written from scratch, avoiding the use of common Python and C/C++ libraries.
To view the full project proposal and a greater breakdown of the project timeline, refer to the documentation here: Project Proposal
To view a program demonstration, see the `assets` folder or click here: Program Demo
This project was done in collaboration with A2e Technologies under their Summer Internship Program.
The following software and libraries should be installed before attempting to run the program or install any drivers:
- C++ Compiler: Version 9.4 or later (e.g., GCC, Clang)
- CUDA: Version 11.4 or later
- OpenCV: Version 4.8 or later
- Python: Version 3.8 or later
- NumPy: Version 1.17 or later
- Git: Version 2.25 or later
To install the Python dependencies required for this project, follow these steps:
1. Ensure Python is Installed: Verify that Python 3.8 or later is installed by running `python --version` or `python3 --version` in your terminal.
2. Create and Activate a Virtual Environment (Recommended):
   - Linux:

     ```
     python3 -m venv venv
     source venv/bin/activate
     ```
3. Install the Required Python Packages:
   - Ensure you are in the project directory where the `requirements.txt` file is located.
   - Run the following command to install all required packages:

     ```
     pip install -r requirements.txt
     ```
4. Verify Installation:
   - Check that the packages were installed correctly by running:

     ```
     pip list
     ```
   - This will list all installed packages and their versions.
For detailed instructions on how to use the `requirements.txt` file, see Python's documentation on virtual environments and pip's user guide.
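The steps above can be combined into a single shell sketch. This assumes a Linux shell run from the project root; the guard around `pip install` skips dependency installation if `requirements.txt` is not present.

```shell
# Combined sketch of the Python setup steps above (Linux).
python3 --version                 # confirm Python 3.8+
python3 -m venv venv              # create the virtual environment
. venv/bin/activate               # activate it
# Install dependencies only if requirements.txt exists in the current directory
if [ -f requirements.txt ]; then
    pip install -r requirements.txt
fi
pip list                          # verify installed packages
```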
This system is intended to be run on the NVIDIA Jetson Development Kits connected with a CSI Camera. The following hardware was used in the development of this system:
- NVIDIA Jetson Orin Nano Development Kit
- Running JetPack 5.1.1/L4T 35.3.1 or JetPack 5.1.2/L4T 35.4.1
- ArduCam 12MP IMX477 CSI Camera
- microSD card (64GB UHS-1 or larger)
- USB keyboard and mouse
- Computer display
- DisplayPort or HDMI cable
- Internet connection (wired or wireless)
- USB LED Ring Light
If flashing via SDK Manager:
- Native Linux PC
- USB 3 to USB C or USB C to USB C cable
- Depending on the available ports on your Linux PC
If it is your first time setting up your NVIDIA Jetson, it is recommended to flash the Jetson using SDK Manager.
- Refer to this YouTube tutorial from JetsonHacks for detailed instructions: NVIDIA SDK Manager Tutorial: Installing Jetson Software Explained
- NVIDIA SDK Manager Documentation
If SDK Manager is not an option or you prefer a simpler method, you may also flash your Jetson via microSD card.
- Refer to this tutorial for detailed instructions for Windows, Mac, and Linux: Jetson Orin Nano Developer Kit Getting Started Guide
Flashing the Jetson via SDK Manager will require a native Linux PC running Ubuntu 18.04 or 20.04. Using a VM to run Ubuntu and attempting to use SDK Manager will not work.
Use of the ArduCam 12MP IMX477 CSI Camera will require the installation of the appropriate drivers for your JetPack Version. Refer to the following guide for how to connect your ArduCam to your Jetson via the CSI connection ports: ArduCam Quick-Start-Guide
- Download the bash scripts:

  ```
  cd ~
  wget https://github.com/ArduCAM/MIPI_Camera/releases/download/v0.0.3/install_full.sh
  ```
- Install the driver and reboot the system:

  ```
  chmod +x install_full.sh
  ./install_full.sh -m imx477
  ```
- Verify your driver installation:

  ```
  dmesg | grep -E "imx477|imx219|arducam"
  ```
Output like the following means the driver was installed successfully:

```
[    0.000000] Linux version 5.10.120-tegra (arducam_006@arducam-server-006) (aarch64-linux-gcc.br_real (Buildroot 2020.08) 9.3.0, GNU ld (GNU Binutils) 2.33.1) #1 SMP PREEMPT Wed Aug 9 15:29:32 CST 2023
[    0.002899] DTS File Name: /home/arducam_006/jenkins/workspace/n_nano_kernel_l4t-35.4.1-arducam/kernel/kernel-5.10/arch/arm64/boot/dts/../../../../../../hardware/nvidia/platform/t23x/p3768/kernel-dts/tegra234-p3767-0003-p3768-0000-a0.dts
[   17.415135] nv_imx477: no symbol version for module_layout
[   17.423536] nv_imx477: loading out-of-tree module taints kernel.
[   17.423704] nv_imx477: loading out-of-tree module taints kernel.
[   17.442589] imx477 9-001a: tegracam sensor driver:imx477_v2.0.6
[   17.746012] imx477 9-001a: imx477_board_setup: error during i2c read probe (-121)
[   17.758240] imx477 9-001a: board setup failed
[   17.770853] imx477: probe of 9-001a failed with error -121
[   17.778801] imx477 10-001a: tegracam sensor driver:imx477_v2.0.6
[   18.080380] tegra-camrtc-capture-vi tegra-capture-vi: subdev imx477 10-001a bound
```
- Check the cable connection:

  ```
  ls /dev/video*
  ```

  Output of `/dev/video0` or `/dev/video1` means the cable connection is correct.
- Test the video feed:

  ```
  gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)60/1' ! nvvidconv ! xvimagesink -e
  ```
If you can see the live video feed from your camera, you are ready to start using the program.
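For use inside an OpenCV program, the same pipeline can be expressed as a string passed to `cv2.VideoCapture` with the GStreamer backend. A minimal sketch follows; the helper name and the BGR conversion tail are assumptions for illustration, not the project's actual code:

```python
def gst_pipeline(width=1920, height=1080, fps=60, sensor_id=0):
    """Build an nvarguscamerasrc pipeline string mirroring the
    gst-launch-1.0 test command above, converted to BGR for OpenCV."""
    return (
        f"nvarguscamerasrc sensor-id={sensor_id} ! "
        f"video/x-raw(memory:NVMM), width=(int){width}, height=(int){height}, "
        f"format=(string)NV12, framerate=(fraction){fps}/1 ! "
        "nvvidconv ! video/x-raw, format=(string)BGRx ! "
        "videoconvert ! video/x-raw, format=(string)BGR ! appsink"
    )

# Usage (on the Jetson, with OpenCV built with GStreamer support):
# cap = cv2.VideoCapture(gst_pipeline(), cv2.CAP_GSTREAMER)
```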
JetPack 5.1.1/L4T 35.3.1 or JetPack 5.1.2/L4T 35.4.1 is required on the Jetson device in order for the driver to install successfully.
Locate and download the 3D-modeled camera stand's `.stl` file from the `camera_stand` directory. Use slicing software of your choice to generate G-code for the print. Prepare the filament of your choice and start the print; PLA was used for this project.
- Connect the ArduCam 12MP IMX477 CSI Camera to the Jetson
- Locate the camera connector (CSI). It’s on the side of the carrier board, opposite to the GPIO pins
- Pull up on the plastic edges of the camera port. Do it gently to avoid pulling it off
- Push in the camera ribbon. Make sure the contacts are facing the heatsinks. Do not bend the flex cable, and make sure it’s firmly inserted into the bottom of the connector
- First connect the mouse, keyboard, display, and Ethernet (optional) to the Jetson; then connect the Jetson's DC barrel jack power supply
- Insert the camera into the 3D printed camera stand in the orientation that matches the stand's keyed pattern
- The camera should sit flush on the top of the stand with only the backside of the PCB exposed
- Plug in the USB LED Ring Light and place on top of the camera stand
The assembled system should look similar to the image below:
At the highest level, the folders are split between the source code (`code`) and the training data (`data`). Typically, only source code would be checked into the repo, but for the scope of this project, we decided to stray from standard convention and include the training data images for those who may want to train the model on their own and recreate our results. A pre-trained model is also included in this repo.
- `assets`: Contains additional content such as images and videos
- `code`: All source code for the project
  - Old code that was used in the development of the final product but is no longer in use
  - `inference`: The main application. This includes the CUDA inference file, the OpenCV Python script, the default config file, and the Makefile
  - Files relating to the model
    - TensorFlow file used for training the model
    - Pre-trained weight and bias CSV files for the neural network
  - Additional utility Python scripts
- `data`: Images used to train the model (40 px x 60 px). The training data contains alphanumeric English characters (62 classes), rendered in 3475 font styles available in Google Fonts
  - The images are compressed in a zip folder and must be extracted before use. See README - Data
  - Dataset link: OCR-Dataset
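The pre-trained weight and bias CSVs drive a plain feed-forward pass of the kind the project implements from scratch. A minimal NumPy sketch with placeholder weights follows; the layer count, hidden size, and file layout are assumptions, and the real model is defined by the project's training script:

```python
import numpy as np

def dense(x, W, b):
    """Fully connected layer: y = W @ x + b."""
    return W @ x + b

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# With the real CSVs this would be, e.g.:
#   W1 = np.loadtxt("weights1.csv", delimiter=",")   # hypothetical file name
# Here we use random placeholders shaped for 40x60 inputs and 62 classes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(128, 2400)), np.zeros(128)
W2, b2 = rng.normal(size=(62, 128)), np.zeros(62)

x = rng.random(2400)  # one flattened 40x60 grayscale image
probs = softmax(dense(relu(dense(x, W1, b1)), W2, b2))
```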
Use your tool of choice to clone the repository. For example, using Git Bash:

```
git clone https://jira.a2etechnologies.com:8444/scm/aeocr/jetson-ocr.git
```
The main control flow of this system is managed by a Python program that uses OpenCV to handle the live video stream, detect character objects, produce overlays, and interface with the CUDA Inference API. This program uses the CTypes library to facilitate communication and efficient data transfer between the Python controller and the GPU-accelerated neural network.
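As a sketch of how such a ctypes bridge works, a caller might look like the following. The entry-point name `run_inference` and its signature are illustrative assumptions, not the project's actual API:

```python
import ctypes
import numpy as np

def as_c_float_buffer(image: np.ndarray):
    """Flatten an image to contiguous float32 and expose it as a C float*."""
    flat = np.ascontiguousarray(image, dtype=np.float32).ravel()
    return flat, flat.ctypes.data_as(ctypes.POINTER(ctypes.c_float))

def load_inference_lib(path="./libcuda_inference.so"):
    """Load the CUDA shared library and declare a hypothetical entry point."""
    lib = ctypes.CDLL(path)
    lib.run_inference.argtypes = [
        ctypes.POINTER(ctypes.c_float),  # input pixels
        ctypes.c_int,                    # pixel count
        ctypes.POINTER(ctypes.c_float),  # output class scores
    ]
    lib.run_inference.restype = None
    return lib

# On the Jetson, after `make` produces libcuda_inference.so:
# lib = load_inference_lib()
# flat, ptr = as_c_float_buffer(frame)
# scores = np.zeros(62, dtype=np.float32)
# lib.run_inference(ptr, flat.size,
#                   scores.ctypes.data_as(ctypes.POINTER(ctypes.c_float)))
```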
Next, navigate into the inference directory:

```
cd jetson-ocr/code/inference
```

Finally, build the project:

```
make
```
If built successfully, a file called `libcuda_inference.so` should be created, at which point you are ready to run the program.
A test page with 11 pt Arial font should be used. Individual characters and/or separate words should have at least a tab's worth of space between them to avoid being read as a single word. Characters within a word should have exactly one space between them, so that the program can detect each character individually while grouping them into the same word. Characters/words should be at least double-spaced apart vertically to be detected on separate lines.
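The horizontal spacing rules above can be sketched as a gap-threshold grouping over detected bounding boxes. The threshold value and box format here are assumptions for illustration; the program's actual logic may differ:

```python
def group_into_words(boxes, gap_threshold=30):
    """Group character boxes (x, y, w, h) on one text line into words:
    a horizontal gap larger than gap_threshold starts a new word."""
    words, current = [], []
    for box in sorted(boxes, key=lambda b: b[0]):
        if current:
            prev = current[-1]
            # gap between the previous box's right edge and this box's left edge
            if box[0] - (prev[0] + prev[2]) > gap_threshold:
                words.append(current)
                current = []
        current.append(box)
    if current:
        words.append(current)
    return words
```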
An example test page can be found in the `assets` folder, or click Here
Navigate to the inference folder.
cd jetson-ocr/code/inference
The system is designed to be run from the command line with no additional arguments required, although command-line arguments are available for further customization and testing. For the full list of arguments, use the -h or --help argument:
```
Character recognition using CUDA

Usage: perform_inference.py [-h] [-n NUMBER] [-v] [-c CONFIG]

Press the "Esc" key to gracefully exit the program.

Options:
  -h | --help     Print this message
  -n | --number   Set maximum number of characters to process. Number must be
                  greater than 0 and no more than 30. Default number is 30
  -v | --verbose  Display FPS, inference time, number of images detected, and other info
  -c | --config   Path to the config JSON file
```
Examples:

```
python3 perform_inference.py -n 25 -v
```

Runs inference on a maximum of 25 characters and displays info such as FPS, inference time, and the number of images detected.

```
python3 perform_inference.py -c new_config.json
```

Runs inference on a maximum of 30 characters (the default) using the user-created "new_config.json", without displaying additional inference information.
Note: Press the "Esc" key to gracefully exit the program.
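A hedged sketch of how these options map onto Python's standard argparse module follows; the actual perform_inference.py may define them differently, and the default config file name used here is an assumption:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Character recognition using CUDA")
    parser.add_argument("-n", "--number", type=int, default=30,
                        help="Maximum number of characters to process (1-30)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Display FPS, inference time, and other info")
    parser.add_argument("-c", "--config", default="config.json",  # assumed default name
                        help="Path to the config JSON file")
    return parser

# Parse the first example's arguments
args = build_parser().parse_args(["-n", "25", "-v"])
```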
The GUI should appear with the following elements:
- The viewing window showing the live video stream
- Detected objects encapsulated by a green bounding box
- Detected character and percent certainty overlaid above the detected object
If the "verbose" option was chosen, the following info appears in the top left of the window:
- Frames Per Second
- Max # of objects
- Detected # of objects
- Inference time
An output log, `program.log`, should be created once the program concludes. It includes a report of key system operations with timestamps, for use in debugging or when unexpected errors are encountered.
Lastly, I would like to recognize the resources that helped in the development of this system:
- A2e Technologies
- Jetson Orin Nano Developer Kit Getting Started Guide
- NVIDIA SDK Manager Tutorial: Installing Jetson Software Explained
- NVIDIA SDK Manager Documentation
- ArduCam Quick-Start-Guide
- CUDA C++ Programming Guide
- PNG Encoder and Decoder in C and C++
- Contrast Stretching Background Info
- YouTube NVIDIA Jetson Tutorials
- CUDA Basics Course
- Neural Network Basics Course