✨Current version: v1.1.2
🦄E2M is an API tool that converts everything to Markdown or JSON (both LLM-friendly formats).
🔥For the best results, set `USE_LLM=True` and use an LLM API.
Why did I create this API? Because I believe data is the most important thing in the AI era, yet many resources are not in the right format: they are information, not data. So I wanted to build a tool that converts everything to Markdown or JSON, the most common formats in the AI field. I hope E2M can be used in any AI application that needs format conversion, such as AI knowledge bases and AI datasets, so that developers can focus on the core functions of their applications rather than on data format conversion.
Status | Document | Image | Data | Audio | Video |
---|---|---|---|---|---|
Done | doc, docx, ppt, pptx, pdf, html, htm | | | | |
Todo | | jpg, jpeg, png, gif, svg | csv, xlsx, xls | mp3, wav, flac | mp4, avi, mkv |
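Because only the Document formats are marked Done, a client may want to reject other files before uploading them. A minimal sketch, assuming the Done extensions in the table are exactly the ones the API accepts; `is_supported` is a hypothetical helper, not part of E2M:

```python
from pathlib import Path

# Extensions marked "Done" in the table (assumption: the API accepts only these).
DONE_EXTENSIONS = {"doc", "docx", "ppt", "pptx", "pdf", "html", "htm"}


def is_supported(filename: str) -> bool:
    """Hypothetical client-side check: True if the file's extension is marked Done."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in DONE_EXTENSIONS


print(is_supported("report.PDF"))  # True
print(is_supported("photo.jpg"))   # False (images are still Todo)
```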
- ParseMode: `auto`, `ocr-low` (Tesseract), `ocr-high` (Surya), `fast`
- Update the API structure
- Support long text parsing
- Add a new table to store raw data
- Add streaming mode to the API and frontend
- Add async support to the API
- Develop an SDK for the E2M API
- Add support for more LLM APIs
- Launch an online demo
Please check your platform before you start:
$ arch
- If the output is `x86_64`, you can use `docker-compose.amd64.yml` or `docker-compose.gpu.amd64.yml`.
- If the output is `arm64`, you can use `docker-compose.arm64.yml` or `docker-compose.gpu.arm64.yml`.
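The platform check can also be scripted. A minimal sketch, assuming the four compose file names listed above; `pick_compose_file` is a hypothetical helper. Note that Python's `platform.machine()` may report `aarch64` instead of `arm64` on ARM Linux, and `AMD64` on Windows, so those spellings are treated as equivalent here:

```python
import platform


def pick_compose_file(machine: str, gpu: bool = False) -> str:
    """Map an architecture string to one of the compose files listed above."""
    arch = machine.lower()
    if arch in ("x86_64", "amd64"):
        suffix = "amd64"
    elif arch in ("arm64", "aarch64"):
        suffix = "arm64"
    else:
        raise ValueError(f"unsupported architecture: {machine}")
    prefix = "docker-compose.gpu" if gpu else "docker-compose"
    return f"{prefix}.{suffix}.yml"


if __name__ == "__main__":
    # platform.machine() returns values such as "x86_64" or "arm64"
    print(pick_compose_file(platform.machine()))
```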
You should have `docker` and `docker-compose` installed on your machine in advance.
git clone https://github.com/Jing-yilin/E2M
cd E2M/docker
# edit the docker-compose file for your platform (e.g. docker-compose.amd64.yml): set `USE_LLM` to `True` and add your API key
# deploy the app with the correct docker-compose file
docker-compose -f docker-compose.amd64.yml up --build -d
# check the logs with
docker-compose -f docker-compose.amd64.yml logs -f
# remove the container with
docker-compose -f docker-compose.amd64.yml down
- 🚀Web: http://127.0.0.1:3000
- 🚀API: http://127.0.0.1:8765/api/v1/
- 🚀API doc: http://127.0.0.1:8765/swagger/
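Because `up -d` returns before the containers have finished starting, a script may want to wait until the API port accepts connections. A minimal standard-library sketch, assuming the API listens on `127.0.0.1:8765` as listed above; `wait_for_port` is a hypothetical helper and only checks that the TCP port is open, not that the API is healthy:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: the container is probably still starting.
            time.sleep(0.5)
    return False
```

For example, `wait_for_port("127.0.0.1", 8765)` should return `True` once the API container is listening, or `False` after 60 seconds.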
You should have `docker` and `docker-compose` installed on your machine in advance.
git clone https://github.com/Jing-yilin/E2M
cd E2M
# edit the docker-compose.yml file, set `USE_LLM` to `True`, and add your API key
# deploy the app with Docker in detached mode
docker-compose -f docker-compose.yml up --build -d
# check the logs with
docker-compose -f docker-compose.yml logs -f
# remove the container with
docker-compose -f docker-compose.yml down
- 🚀Web: http://127.0.0.1:3000
- 🚀API: http://127.0.0.1:8765/api/v1/
- 🚀API doc: http://127.0.0.1:8765/swagger/
To utilize the local GPU, follow these steps:
- Install NVIDIA Driver: Ensure the NVIDIA driver is installed on your host machine.
- Install NVIDIA Container Toolkit:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
If you run into issues, you may need to update Docker.
- Run Docker Container with GPU Support:
# edit docker-compose.gpu.yml: set `USE_LLM` to `True` and add your API key
docker-compose -f docker-compose.gpu.yml up --build -d
# check the logs with
docker-compose -f docker-compose.gpu.yml logs -f
# remove the container with
docker-compose -f docker-compose.gpu.yml down
- 🚀Web: http://127.0.0.1:3000
- 🚀API: http://127.0.0.1:8765/api/v1/
- 🚀API doc: http://127.0.0.1:8765/swagger/
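If the GPU containers fail to start, a first check is whether the host itself can see the GPU. A minimal sketch, assuming `nvidia-smi` (installed with the NVIDIA driver) is on `PATH`; `nvidia_gpu_available` is a hypothetical helper that only tests the host, not the Container Toolkit:

```python
import shutil
import subprocess


def nvidia_gpu_available() -> bool:
    """Return True if nvidia-smi exists and can list at least one GPU without error."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` lists the GPUs visible to the driver.
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


print(nvidia_gpu_available())
```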
If you are using Windows, you can use Docker Desktop with GPU support.
You can refer to: https://docs.docker.com/desktop/gpu/
Then you can run docker-compose as usual:
git clone https://github.com/Jing-yilin/E2M
cd E2M
docker-compose -f docker-compose.gpu.yml up --build -d
# check the logs with
docker-compose -f docker-compose.gpu.yml logs -f
# remove the container with
docker-compose -f docker-compose.gpu.yml down
Install:
git clone https://github.com/Jing-yilin/E2M
cd E2M/app
conda create -n e2m python=3.10 -y
conda activate e2m
python -m pip install -r requirements-dev.txt
First, you should install `postgresql@15` and `libreoffice`:
- Install PostgreSQL 15 and LibreOffice (Ubuntu):
Reference: How to Install PostgreSQL On Ubuntu
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget -qO- https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo tee /etc/apt/trusted.gpg.d/pgdg.asc &>/dev/null
sudo apt update
sudo apt install postgresql-15 postgresql-client-15 -y
sudo apt install libreoffice -y
- Start PostgreSQL:
sudo systemctl start postgresql
sudo systemctl status postgresql  # check that it is running
- Install PostgreSQL 15 and LibreOffice (macOS):
brew install postgresql@15
brew install --cask libreoffice
- Start PostgreSQL:
brew services start postgresql@15
- Install PostgreSQL 15 and LibreOffice (Windows):
choco install postgresql15 --version=15.0.1 -y
choco install libreoffice -y
You may need to run the command prompt as an administrator. Alternatively, you can download LibreOffice from its official website.
- Start PostgreSQL:
pg_ctl -D "C:\Program Files\PostgreSQL\15\data" start
Then, you need to migrate the database. First, change `DB_ADMIN` and `DB_PASSWORD` in the `setup_db.sh` file.
# make sure you are in E2M/app
# Please change DB_ADMIN and DB_PASSWORD to your own settings
chmod +x ./setup_db.sh
./setup_db.sh
Then you can start the API with the following command:
flask run --host 0.0.0.0 --port=8765 # --debug
If you also want the web frontend, start it with the following commands:
cd ../web  # the web app lives in E2M/web, next to E2M/app
npm install
npm run start
# development mode
export FLASK_ENV=development
export FLASK_DEBUG=1
# production mode
export FLASK_ENV=production
export FLASK_DEBUG=0
bash script:
curl -X POST "http://127.0.0.1:8765/api/v1/convert" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data; charset=utf-8" \
-H "Accept-Charset: utf-8" \
-F "file=@/path/to/file.docx" \
-F "parse_mode=auto"
Response:
{
"message": "This is your markdown content"
}
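The same request can be sent from Python without third-party packages. A minimal sketch, assuming the endpoint, the `file` and `parse_mode` form fields, and the JSON response shape shown above; `build_multipart` and `convert` are hypothetical helpers, not part of E2M:

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict, file_field: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode("utf-8"))
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'.encode("utf-8")
    )
    parts.append(f"Content-Type: {mime}\r\n\r\n".encode())
    parts.append(content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def convert(path: str, parse_mode: str = "auto",
            url: str = "http://127.0.0.1:8765/api/v1/convert") -> dict:
    """POST a file to the convert endpoint and return the decoded JSON response."""
    p = Path(path)
    body, content_type = build_multipart({"parse_mode": parse_mode}, "file", p.name, p.read_bytes())
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running, `convert("/path/to/file.docx")["message"]` should then hold the Markdown content.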
Currently, only English and Chinese are supported.
{
"af": "Afrikaans",
"am": "Amharic",
"ar": "Arabic",
"as": "Assamese",
"az": "Azerbaijani",
"be": "Belarusian",
"bg": "Bulgarian",
"bn": "Bengali",
"br": "Breton",
"bs": "Bosnian",
"ca": "Catalan",
"cs": "Czech",
"cy": "Welsh",
"da": "Danish",
"de": "German",
"el": "Greek",
"en": "English",
"eo": "Esperanto",
"es": "Spanish",
"et": "Estonian",
"eu": "Basque",
"fa": "Persian",
"fi": "Finnish",
"fr": "French",
"fy": "Western Frisian",
"ga": "Irish",
"gd": "Scottish Gaelic",
"gl": "Galician",
"gu": "Gujarati",
"ha": "Hausa",
"he": "Hebrew",
"hi": "Hindi",
"hr": "Croatian",
"hu": "Hungarian",
"hy": "Armenian",
"id": "Indonesian",
"is": "Icelandic",
"it": "Italian",
"ja": "Japanese",
"jv": "Javanese",
"ka": "Georgian",
"kk": "Kazakh",
"km": "Khmer",
"kn": "Kannada",
"ko": "Korean",
"ku": "Kurdish",
"ky": "Kyrgyz",
"la": "Latin",
"lo": "Lao",
"lt": "Lithuanian",
"lv": "Latvian",
"mg": "Malagasy",
"mk": "Macedonian",
"ml": "Malayalam",
"mn": "Mongolian",
"mr": "Marathi",
"ms": "Malay",
"my": "Burmese",
"ne": "Nepali",
"nl": "Dutch",
"no": "Norwegian",
"om": "Oromo",
"or": "Oriya",
"pa": "Punjabi",
"pl": "Polish",
"ps": "Pashto",
"pt": "Portuguese",
"ro": "Romanian",
"ru": "Russian",
"sa": "Sanskrit",
"sd": "Sindhi",
"si": "Sinhala",
"sk": "Slovak",
"sl": "Slovenian",
"so": "Somali",
"sq": "Albanian",
"sr": "Serbian",
"su": "Sundanese",
"sv": "Swedish",
"sw": "Swahili",
"ta": "Tamil",
"te": "Telugu",
"th": "Thai",
"tl": "Tagalog",
"tr": "Turkish",
"ug": "Uyghur",
"uk": "Ukrainian",
"ur": "Urdu",
"uz": "Uzbek",
"vi": "Vietnamese",
"xh": "Xhosa",
"yi": "Yiddish",
"zh": "Chinese"
}
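Since the list above is much longer than what is currently supported, a client can validate a language code before using it. A minimal sketch, assuming only `en` and `zh` are accepted; `SUPPORTED_LANGS` and `check_lang` are hypothetical names:

```python
# Assumption: only English and Chinese are accepted for now.
SUPPORTED_LANGS = {"en": "English", "zh": "Chinese"}


def check_lang(code: str) -> str:
    """Normalize a language code and raise ValueError if it is not supported yet."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGS:
        supported = ", ".join(sorted(SUPPORTED_LANGS))
        raise ValueError(f"unsupported language {code!r}; supported: {supported}")
    return normalized
```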
Before you commit your code, please create a new branch:
- `feature/xxx` for new features
- `bugfix/xxx` for bug fixes
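The naming convention can be checked locally, for example from a pre-commit hook. A minimal sketch; the convention only fixes the `feature/` and `bugfix/` prefixes, so the allowed characters after the prefix (lowercase letters, digits, dots, hyphens, underscores) are an assumption, and `is_valid_branch` is a hypothetical helper:

```python
import re

# Prefix from the convention; the character set after it is an assumption.
BRANCH_PATTERN = re.compile(r"(feature|bugfix)/[a-z0-9._-]+")


def is_valid_branch(name: str) -> bool:
    """True if the branch follows the feature/xxx or bugfix/xxx convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```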
You can create a new branch with the following command:
# fetch the latest code
git checkout main
git pull
# create a new branch
git checkout -b feature/xxx
Then, run the following commands to check and format your code:
# all contributions should follow PEP8 style
flake8 . # to check the style
black . # to format the code
pymarkdownlnt fix . # to format the markdown
cd app
poetry export -f requirements.txt --without-hashes > requirements.txt
poetry export -f requirements.txt --without-hashes --with dev -o requirements-dev.txt
# add the changes
git add .
# commit the changes
git commit -m "your commit message"
# push the changes
git push origin feature/xxx # or simply `git push`
To release a new version:
cd app
docker build -t jingyilin/e2m-api:<version> .
docker push jingyilin/e2m-api:<version>
cd ../web
docker build -t jingyilin/e2m-web:<version> .
docker push jingyilin/e2m-web:<version>
For example, for version `v1.0.0`:
cd app
docker build -t jingyilin/e2m-api:v1.0.0 .
docker push jingyilin/e2m-api:v1.0.0
cd ../web
docker build -t jingyilin/e2m-web:v1.0.0 .
docker push jingyilin/e2m-web:v1.0.0
# create a pull request to the develop branch on GitHub
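Since the same four commands are repeated for every release, they can be generated from the version string. A minimal sketch, assuming versions always look like `v1.0.0` and the commands are run from the repository root (passing `app` or `web` as the build context is equivalent to `cd` into it and building `.`); `release_commands` is a hypothetical helper that only returns the commands and does not run them:

```python
import re


def release_commands(version: str) -> list[str]:
    """Return the docker build/push commands for both images at the given version tag."""
    if not re.fullmatch(r"v\d+\.\d+\.\d+", version):
        raise ValueError(f"expected a version like v1.0.0, got {version!r}")
    commands = []
    for directory, image in (("app", "e2m-api"), ("web", "e2m-web")):
        tag = f"jingyilin/{image}:{version}"
        commands.append(f"docker build -t {tag} {directory}")
        commands.append(f"docker push {tag}")
    return commands


for cmd in release_commands("v1.0.0"):
    print(cmd)
```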