Link to Web App: https://bit.ly/Arabic-image-to-text
Upload a PDF of scanned Arabic text images and the app outputs two files containing the digitized Arabic text, which can then be copied, searched, or analyzed. One output file is a PDF and the other is a Word file.
This was done using the Tesseract Python library. Its accuracy is far from 100%, but an example image is shown below.
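A minimal sketch of how such a pipeline could look is given here. It is not the app's actual code: it assumes the `pytesseract`, `pdf2image`, and `python-docx` packages (plus the Tesseract binary with Arabic language data, and Poppler), and the function names and the `scan.pdf` file name are hypothetical. Only the Word output is sketched; the PDF output is left out.

```python
from pathlib import Path


def join_pages(page_texts):
    """Join per-page OCR text, trimming whitespace and dropping blank pages."""
    cleaned = [text.strip() for text in page_texts]
    return "\n\n".join(text for text in cleaned if text)


def ocr_pdf(pdf_path, lang="ara", dpi=300):
    """Return the recognized text of each page of a scanned PDF."""
    # Imported lazily so join_pages works without the OCR stack installed.
    # Assumed dependencies: pdf2image (needs Poppler) and pytesseract
    # (needs the Tesseract binary and its Arabic trained data).
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(str(pdf_path), dpi=dpi)  # one PIL image per page
    return [pytesseract.image_to_string(page, lang=lang) for page in pages]


def save_docx(text, out_path):
    """Write the text to a Word file, one paragraph per blank-line block."""
    from docx import Document  # assumed dependency: python-docx

    doc = Document()
    for block in text.split("\n\n"):
        doc.add_paragraph(block)
    doc.save(str(out_path))


if __name__ == "__main__":
    source = Path("scan.pdf")  # hypothetical input file
    if source.exists():
        text = join_pages(ocr_pdf(source))
        save_docx(text, source.with_suffix(".docx"))
```

Rendering pages at a higher `dpi` generally gives Tesseract more detail to work with, at the cost of slower processing.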
To run the app locally:
- Install Python and Streamlit
- Install all required libraries and packages
- Run "streamlit run SL_ArabicOCR.py"
Hosted on Streamlit Cloud.