Medicine OCR Mobile is a React Native app built with Expo that leverages Optical Character Recognition (OCR) to detect and extract text from images. The app integrates with a Flask backend to process the images and return the OCR results. Additionally, the app features Text-to-Speech (TTS), enabling users to listen to the detected text.
- Capture Image: Use your device's camera to capture images for OCR processing.
- OCR Detection: Automatically detects and extracts text from captured images.
- Text-to-Speech (TTS): Allows users to listen to the detected text aloud.
- IP Configuration: Configure the backend IP address for OCR processing.
- Camera Toggle: Switch between front and back cameras to capture images.
- Image Gallery Access: Access and select images from the device's media library for OCR.
Before you begin, ensure you have the following installed:
- Node.js (v14 or later)
- Expo CLI: Install it globally using
npm install -g expo-cli
- React Native Development Environment: Required for building the app on iOS/Android
- Flask Backend: The app relies on a Flask API for OCR processing; the backend repository is linked in the Backend Integration section below.
You will also need the following dependencies:
- Expo Camera (expo-camera)
- Expo Speech (expo-speech)
- Expo Media Library (expo-media-library)
- Ionicons (@expo/vector-icons)
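The camera and media library modules each gate access behind a runtime permission. For reference, a minimal sketch of requesting both up front (the helper name requestAppPermissions is just for illustration):

import { Camera } from 'expo-camera';
import * as MediaLibrary from 'expo-media-library';

// Ask for the runtime permissions the app relies on
export const requestAppPermissions = async () => {
  const camera = await Camera.requestCameraPermissionsAsync();
  const media = await MediaLibrary.requestPermissionsAsync();
  return camera.granted && media.granted;
};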
To set up the project:
- Clone the repository:
git clone https://github.com/lancedalanon/medicine-ocr-mobile.git
- Navigate to the project directory:
cd medicine-ocr-mobile
- Install the dependencies:
npm install
- Install necessary Expo packages:
expo install expo-camera expo-media-library expo-speech @expo/vector-icons
- Start the app:
expo start
This will display a QR code in the terminal; scan it with the Expo Go app on your phone.
- Grant Permissions: When first launched, the app requests camera and media library permissions. Grant both so the app can use the camera and save images.
- Capture an Image: Press the "Capture" button to open the camera. Toggle between the front and back cameras with the "Flip" button.
- Process Image: After you capture an image, the text detected in it appears in a text input box.
- Speak Text: Press the "Speak" button to listen to the detected text via Text-to-Speech.
- Settings: Tap the settings icon to enter the IP address of the backend server.

Backend Integration

The app sends captured images to a Flask backend for OCR processing. The backend must be running and accessible at the configured IP address.
Flask Backend Repository: Medicine OCR Flask API
Backend Endpoint: /process-image
Make sure the Flask API is running and its .env file has been set up with the API_KEY before using the app.
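For reference, a minimal .env for the backend might look like this (assuming the backend reads API_KEY from it; the value is illustrative):

API_KEY=<your-secret-key>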
The app sends a POST request to the backend with the captured image data:
POST http://<server-ip>:5000/process-image
Headers:
X-API-KEY: Your API key for authentication.
Body:
image: The captured image, passed as a FormData object or as a JSON payload of the form:
{
  "image": "<captured_image>"
}
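Here is a sketch of the FormData variant in React Native (not the app's exact code; the placeholders, file name, and MIME type are assumptions):

// Send the captured image as a multipart FormData field
const sendImageAsFormData = async (imageUri) => {
  const formData = new FormData();
  // React Native's fetch accepts { uri, name, type } objects as file parts
  formData.append('image', {
    uri: imageUri,
    name: 'capture.jpg',
    type: 'image/jpeg',
  });

  const response = await fetch('http://<server-ip>:5000/process-image', {
    method: 'POST',
    // No Content-Type here: fetch sets the multipart boundary itself
    headers: { 'X-API-KEY': '<your-api-key>' },
    body: formData,
  });
  return response.json();
};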
The app allows you to configure the IP address of the Flask server that processes the OCR images. You can set the backend server's IP address from the app’s settings modal. This IP address is saved and used for all subsequent API requests.
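How the address is persisted is an implementation detail; as a minimal sketch, assuming @react-native-async-storage/async-storage and a hypothetical serverIp storage key:

import AsyncStorage from '@react-native-async-storage/async-storage';

const SERVER_IP_KEY = 'serverIp'; // hypothetical key, for illustration only

// Save the backend IP entered in the settings modal
export const saveServerIp = async (ip) => {
  await AsyncStorage.setItem(SERVER_IP_KEY, ip);
};

// Load the saved IP before building request URLs
export const getServerIp = async () =>
  (await AsyncStorage.getItem(SERVER_IP_KEY)) ?? '';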
To contribute to this project:
- Fork the repository.
- Create a new branch for your changes.
- Make your changes.
- Commit your changes.
- Push to your forked repository.
- Submit a pull request.

Sample Code Changes

Here's a small code snippet that demonstrates how to integrate the image capture and OCR processing features:
import { Camera } from 'expo-camera';
import * as Speech from 'expo-speech';
import { useRef, useState } from 'react';
import { Alert, Button, TextInput, View } from 'react-native';

const CaptureAndProcess = () => {
  const cameraRef = useRef(null);
  const [image, setImage] = useState(null);
  const [ocrText, setOcrText] = useState('');

  // Capture an image and process it using the backend
  const captureImage = async () => {
    const permission = await Camera.requestCameraPermissionsAsync();
    if (!permission.granted) {
      Alert.alert('Permission Denied', 'Camera access is required.');
      return;
    }
    if (!cameraRef.current) return;

    // takePictureAsync is called on the mounted Camera ref, not on the module;
    // base64: true so the image bytes can travel in a JSON body
    const result = await cameraRef.current.takePictureAsync({ base64: true });
    setImage(result.uri);

    // Send the image to the backend for OCR
    const response = await fetch('http://<server-ip>:5000/process-image', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-API-KEY': '<your-api-key>', // required by the backend
      },
      body: JSON.stringify({ image: result.base64 }),
    });
    const data = await response.json();
    setOcrText(data.text); // Set the OCR text result
  };

  // Speak the OCR text aloud
  const speakText = () => {
    Speech.speak(ocrText);
  };

  return (
    <View style={{ flex: 1 }}>
      <Camera ref={cameraRef} style={{ flex: 1 }} />
      <Button title="Capture Image" onPress={captureImage} />
      {image && <TextInput value={ocrText} multiline />}
      <Button title="Speak" onPress={speakText} />
    </View>
  );
};

export default CaptureAndProcess;
Camera Not Working: Ensure the app has the necessary permissions to access the camera. If the camera is still not working, try restarting the app or resetting the device's camera permissions.
OCR Result Not Displayed: Check if the backend is running and accessible. Ensure that the backend server’s IP address is correctly configured in the app settings.
TTS Not Working: Ensure the device's audio is on and the volume is turned up. If TTS still doesn't work, try restarting the app.
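If speech plays but is hard to follow, expo-speech's speak accepts options such as language, pitch, and rate; a small sketch (values are illustrative):

import * as Speech from 'expo-speech';

// Speak the detected text with explicit voice options
const speakDetectedText = (text) => {
  Speech.speak(text, {
    language: 'en-US', // BCP-47 language tag
    pitch: 1.0,        // 1.0 is the default pitch
    rate: 1.0,         // 1.0 is the default speaking rate
  });
};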
This project is licensed under the MIT License. See the LICENSE file for more details.