REST API service in Rust that takes in any file and returns its parsed content.
The service is multithreaded for performance and handles multiple requests concurrently.
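As a rough illustration of handling several inputs concurrently, here is a minimal sketch using only the Rust standard library. It is not the service's actual implementation: `parse_stub` and `parse_all_concurrently` are hypothetical names, and the word count merely stands in for real parsing work.

```rust
use std::thread;

/// Hypothetical stand-in for real parsing work: counts whitespace-separated words.
fn parse_stub(content: &str) -> usize {
    content.split_whitespace().count()
}

/// Runs `parse_stub` on every input in its own scoped thread and
/// returns the results in the same order as the inputs.
fn parse_all_concurrently(inputs: &[&str]) -> Vec<usize> {
    thread::scope(|scope| {
        // Spawn every worker before joining any of them, so they run in parallel.
        let handles: Vec<_> = inputs
            .iter()
            .map(|input| scope.spawn(move || parse_stub(input)))
            .collect();

        // Joining in spawn order keeps results aligned with the inputs.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Calling `parse_all_concurrently(&["a b c", "hello world"])` processes both inputs on separate threads. A real web service would more likely rely on its framework's worker pool than spawn one thread per item.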
Demonstration URL: https://parser.excoffierleonard.com
Demonstration Endpoint: https://parser.excoffierleonard.com/parse
The API supports the following file formats:
- PDF (.pdf)
- Word Documents (.docx)
- Excel Spreadsheets (.xlsx)
- PowerPoint Presentations (.pptx)
- All text-based files including but not limited to:
  - Plain text (.txt)
  - Source code files (.rs, .py, .js, etc.)
  - Configuration files (.json, .yaml, .toml, etc.)
  - Markup files (.html, .md, .xml)
  - Data files (.csv, .tsv)
  - Log files (.log)
- All image-based files (OCR) including but not limited to:
  - Raster images (.png, .jpg, .jpeg, .gif, .bmp, .webp, etc.)
  - Icon files (.ico)
  - Animated images (.gif)
The OCR functionality supports English and French languages.
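To make the format list above concrete, the sketch below maps a file name to the kind of parser it would need. This is only an assumption about how such routing could look: `ParserKind` and `parser_kind_for` are hypothetical names, the real service may inspect file contents rather than extensions, and treating every unknown extension as text is a simplification.

```rust
use std::path::Path;

/// Hypothetical grouping of the supported formats by parsing strategy.
#[derive(Debug, PartialEq)]
enum ParserKind {
    Pdf,
    Docx,
    Xlsx,
    Pptx,
    Image, // handled through OCR
    Text,
}

/// Picks a parser kind from the file extension, ignoring ASCII case.
/// Files with an unknown or missing extension are assumed to be text.
fn parser_kind_for(file_name: &str) -> ParserKind {
    let extension = Path::new(file_name)
        .extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.to_ascii_lowercase());

    match extension.as_deref() {
        Some("pdf") => ParserKind::Pdf,
        Some("docx") => ParserKind::Docx,
        Some("xlsx") => ParserKind::Xlsx,
        Some("pptx") => ParserKind::Pptx,
        Some("png" | "jpg" | "jpeg" | "gif" | "bmp" | "webp" | "ico") => ParserKind::Image,
        _ => ParserKind::Text,
    }
}
```

For example, `parser_kind_for("report.PDF")` yields `ParserKind::Pdf`, while `parser_kind_for("Makefile")` falls back to `ParserKind::Text`.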
For local build:
- Rust
- Libraries (for Tesseract OCR):
  - Tesseract development libraries
  - Leptonica development libraries
  - Clang development libraries
  - English Language Data
  - French Language Data
For deployment:
- Docker with Docker Compose (required by the docker compose command used for deployment)
The service can be configured using the following environment variables.
- PARSER_APP_PORT: INT, the port on which the program listens. (default: 8080)
- ENABLE_FILE_SERVING: BOOL, enables serving files for the frontend. (default: false, only the API is enabled)
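The following sketch shows one way the two variables above could be read with their documented defaults, using only the standard library. `Config`, `config_from`, and `config_from_env` are hypothetical names, not the service's code. Two behaviors are assumptions: invalid values silently fall back to the defaults, and only the string "true" (in any case) enables file serving.

```rust
use std::env;

/// Hypothetical settings struct mirroring the two documented variables.
#[derive(Debug, PartialEq)]
struct Config {
    port: u16,
    enable_file_serving: bool,
}

/// Builds a `Config` from raw optional values, applying the documented defaults
/// (port 8080, file serving disabled) when a value is missing or unparsable.
fn config_from(port: Option<&str>, file_serving: Option<&str>) -> Config {
    Config {
        port: port
            .and_then(|value| value.trim().parse::<u16>().ok())
            .unwrap_or(8080),
        // Assumption: only "true" (case-insensitive) turns file serving on.
        enable_file_serving: file_serving
            .map(|value| value.trim().eq_ignore_ascii_case("true"))
            .unwrap_or(false),
    }
}

/// Reads PARSER_APP_PORT and ENABLE_FILE_SERVING from the process environment.
fn config_from_env() -> Config {
    let port = env::var("PARSER_APP_PORT").ok();
    let file_serving = env::var("ENABLE_FILE_SERVING").ok();
    config_from(port.as_deref(), file_serving.as_deref())
}
```

Keeping the parsing in `config_from` separate from the environment lookup makes the default handling easy to check without touching process-wide state.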
curl -o compose.yaml https://raw.githubusercontent.com/excoffierleonard/parser/refs/heads/main/compose.yaml && \
docker compose up -d
API documentation and examples are available in docs/api.md.
Useful commands for development:
- Full build:
chmod +x ./scripts/build.sh && \
./scripts/build.sh
- Deployment tests:
chmod +x ./scripts/deploy-tests.sh && \
./scripts/deploy-tests.sh
This project is licensed under the MIT License - see the LICENSE file for details.