OmniParse

Important

OmniParse is a comprehensive parsing tool designed to convert any unstructured document, media file, or website into structured markdown. Whether you're dealing with documents, tables, images, videos, audio files, or web pages, OmniParse ensures your data is parsed and cleaned to a high standard before it is passed to any downstream LLM use case, such as advanced RAG.

Features

✅ Supports 15+ file types
✅ Convert Documents, Multimedia, Web pages to high-quality structured markdown
✅ Table Extraction, Image Extraction/Captioning, Audio/Video Transcription, Web page Crawling
✅ Easily Deployable using Docker and Skypilot
✅ CPU/GPU compatible
✅ Batch processing for handling multiple files at once
✅ Comprehensive logging and error handling for robust performance

Supported Types

Type	Supported Extensions
Plaintext	.eml, .html, .md, .msg, .rst, .rtf, .txt, .xml
Documents	.doc, .docx, .epub, .odt, .pdf, .ppt, .pptx
Table	.csv, .xlsx
Images	.png, .jpg, .jpeg, .tiff, .bmp, .heic
Video	.mp4, .mkv, .avi, .mov
Audio	.mp3, .wav, .aac
Web	dynamic webpages, http://.com
Crawl	dynamic webpages, http://.com
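
The table above can be read as a simple extension lookup. The following is a minimal sketch and not part of OmniParse itself: the names `SUPPORTED_EXTENSIONS` and `classify_file` are hypothetical, and the mapping only mirrors the file extensions listed in the table (the Web and Crawl rows take URLs, so they are left out).

```python
from pathlib import Path
from typing import Optional

# Mirrors the "Supported Types" table; this dict is illustrative and is
# not imported from OmniParse.
SUPPORTED_EXTENSIONS = {
    "Plaintext": {".eml", ".html", ".md", ".msg", ".rst", ".rtf", ".txt", ".xml"},
    "Documents": {".doc", ".docx", ".epub", ".odt", ".pdf", ".ppt", ".pptx"},
    "Table": {".csv", ".xlsx"},
    "Images": {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic"},
    "Video": {".mp4", ".mkv", ".avi", ".mov"},
    "Audio": {".mp3", ".wav", ".aac"},
}


def classify_file(path: str) -> Optional[str]:
    """Return the table's type name for a path, or None if unsupported."""
    suffix = Path(path).suffix.lower()  # compare case-insensitively
    for type_name, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return type_name
    return None
```

A check like this can be used to skip unsupported files before sending anything to the server.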

Installation

To install OmniParse, clone the repository:

git clone https://github.com/adithya-s-k/omniparse
cd omniparse

Create a Virtual Environment:

conda create -n omniparse-venv python=3.10
conda activate omniparse-venv

Install Dependencies:

poetry install
# or
pip install -e .

Usage

Run the Server:

python server.py
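
Before pointing a client at the server, it can help to confirm that it is accepting connections. The sketch below uses only the Python standard library; `wait_for_server` is a hypothetical helper, and the default host and port are assumptions taken from the `base_url` in the client example (`http://localhost:8000`), not confirmed server defaults.

```python
import socket
import time


def wait_for_server(host: str = "localhost", port: int = 8000,
                    timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)  # brief pause before retrying
```

This only checks that the port is open; it does not verify that the process behind it is OmniParse or that its models have finished loading.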

Install the client:

pip install omniparse_client

Example usage:

from omniparse_client import OmniParse

# Initialize the parser
parser = OmniParse(
    base_url="http://localhost:8000",
    api_key="op-...",  # get the API key from dev.omniparse.com
    verbose=True,
    language="en",
)

# Parse a document
document = parser.load_data('path/to/document.pdf')

# Convert to markdown
parser.save_to_markdown(document)
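
To apply the same call to a folder of files, a small wrapper can loop over the directory. This is a sketch under stated assumptions: `parse_directory` is a hypothetical helper that is not part of `omniparse_client`, and it takes the parsing function as an argument so it makes no claims about the client's API beyond the `load_data` call shown above.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def parse_directory(
    directory: str,
    parse_fn: Callable[[str], object],
    extensions: Iterable[str] = (".pdf", ".docx", ".pptx"),
) -> Dict[str, object]:
    """Apply parse_fn to each file in directory whose suffix is in extensions."""
    wanted = {ext.lower() for ext in extensions}
    results: Dict[str, object] = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in wanted:
            try:
                results[path.name] = parse_fn(str(path))
            except Exception as exc:
                # Record the failure and continue with the remaining files.
                results[path.name] = exc
    return results
```

With the parser from the example above, usage might look like `documents = parse_directory("path/to/folder", parser.load_data)`. Files that fail are stored as exception objects so one bad file does not stop the run.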

License

OmniParse is licensed under the Apache License. See LICENSE for more information.

Acknowledgement

Surya-OCR, Texify - Big thanks to VikParuchuri for creating awesome open-source OCR models, which have been used extensively in this project.

Contact

For any inquiries, please contact us at [email protected]
