OCR + transliteration of scanned Arabic images
Built specifically for scanned images of Lebanese government ID cards
- clone this repository
- install jq, curl, direnv
- copy .envrc.dist to .envrc and set your Google Vision API key in the env var GOOGLE_VISION_API_KEY
  - get the key from the Google Cloud console
- run direnv allow .
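
The setup steps above can be sketched as a short shell session. This is only an illustration: missing_tools is a hypothetical helper, not part of this repository, and the commands in the usage comments assume you are in the repository root.

```shell
# Hypothetical helper: print every named tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage (from the repository root):
#   missing_tools jq curl direnv || echo "install the tools listed above first"
#   cp .envrc.dist .envrc     # then edit .envrc and set GOOGLE_VISION_API_KEY
#   direnv allow .
```

Checking the tools first avoids a confusing failure later, when the OCR script would otherwise stop halfway through on a missing command.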
Download example scanned ID
wget https://www.tradearabia.com/source/2014/08/06/id.jpg -O images/id.jpg
Run OCR and transliteration
./ocr-arabic.sh images/id.jpg
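
This README does not show what ocr-arabic.sh does internally. As a rough sketch only, assuming it sends the image to the Google Cloud Vision images:annotate REST endpoint with a TEXT_DETECTION feature, the OCR step could look like the following. build_request and ocr_image are hypothetical names, not functions from this repository, and the Arabic language hint is also an assumption.

```shell
# Hypothetical helper: emit an images:annotate request body for one image file.
# base64 output contains only JSON-safe characters, so it is embedded directly.
build_request() {
    printf '{"requests":[{"image":{"content":"'
    base64 < "$1" | tr -d '\n'
    printf '"},"features":[{"type":"TEXT_DETECTION"}],'
    printf '"imageContext":{"languageHints":["ar"]}}]}\n'
}

# Hypothetical helper: POST the request and print the recognized text.
# Assumes GOOGLE_VISION_API_KEY is exported (for example by direnv).
ocr_image() {
    build_request "$1" |
        curl -sS -X POST -H 'Content-Type: application/json' \
            --data-binary @- \
            "https://vision.googleapis.com/v1/images:annotate?key=${GOOGLE_VISION_API_KEY}" |
        jq -r '.responses[0].fullTextAnnotation.text'
}

# Usage:
#   ocr_image images/id.jpg
```

Streaming the base64 data through a pipe rather than a shell variable keeps large scans from hitting command-line length limits.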
Example input
images/id.jpg (the scanned ID downloaded above)
Example output
Transliterated | OCR
---------------------------------------------------------------------------------
United Arab Emirates o | o setarimE barA detinU
. ldentity card | drac ytitnedl .
dwlp Al<mArAt AlErbyp AlmtHdp | ةدحتملا ةيبرعلا تارامإلا ةلود
bTAqp hwyp | ةيوه ةقاطب
Number | rebmuN
rqm Alhwyp / ID | DI / ةيوهلا مقر
784-1977-1234566-1 | 1-6654321-7791-487
mn Al<sm: AHmd mHmd Ebd Allh | هللا دبع دمحم دمحا :مسإلا نم
Name: Ahmed Mohamed Abdulla | alludbA demahoM demhA :emaN
Aljnsyp: Al<mArAt AlErbyp AlmtHdp | ةدحتملا ةيبرعلا تارامإلا :ةيسنجلا
Nationality: United Arab Emirates | setarimE barA detinU :ytilanoitaN
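
The Arabic rows in the Transliterated column (for example "bTAqp hwyp") are consistent with the Buckwalter romanization scheme, although this README does not name the scheme. Assuming that is what the script applies, a minimal sketch of the mapping with sed follows. buckwalter is a hypothetical function name, and the table covers base letters and hamza forms only, not diacritics.

```shell
# Hypothetical helper: map Arabic letters on stdin to Buckwalter ASCII symbols.
# Every other character (Latin text, digits, punctuation) passes through unchanged.
buckwalter() {
    sed -e "s,ء,',g" -e 's,آ,|,g' -e 's,أ,>,g' -e 's,ؤ,\&,g' -e 's,إ,<,g' \
        -e 's,ئ,},g' -e 's,ا,A,g' -e 's,ب,b,g' -e 's,ة,p,g' -e 's,ت,t,g' \
        -e 's,ث,v,g' -e 's,ج,j,g' -e 's,ح,H,g' -e 's,خ,x,g' -e 's,د,d,g' \
        -e 's,ذ,*,g' -e 's,ر,r,g' -e 's,ز,z,g' -e 's,س,s,g' -e 's,ش,$,g' \
        -e 's,ص,S,g' -e 's,ض,D,g' -e 's,ط,T,g' -e 's,ظ,Z,g' -e 's,ع,E,g' \
        -e 's,غ,g,g' -e 's,ف,f,g' -e 's,ق,q,g' -e 's,ك,k,g' -e 's,ل,l,g' \
        -e 's,م,m,g' -e 's,ن,n,g' -e 's,ه,h,g' -e 's,و,w,g' -e 's,ى,Y,g' \
        -e 's,ي,y,g'
}

# Usage:
#   printf '%s\n' 'بطاقة هوية' | buckwalter
```

One substitution per letter is used instead of the sed y command so that the mapping does not depend on the locale being UTF-8.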