Docling parses documents and exports them to the desired format with ease and speed.
- 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to Markdown and JSON
- 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
- 🧩 Unified, expressive DoclingDocument representation format
- 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
- 🔍 OCR support for scanned PDFs
- 💻 Simple and convenient CLI
- ♾️ Equation & code extraction
- 📝 Metadata extraction, including title, authors, references & language
- 🦜🔗 Native LangChain extension
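
As a rough illustration of the read-and-export flow described in the list above, here is a minimal sketch that converts one document and writes both Markdown and JSON. It assumes the Docling v2 Python API (`DocumentConverter`, `export_to_markdown`, `export_to_dict`). The `output_paths` and `convert` helpers and the `out` directory are hypothetical names chosen for this example, not part of Docling.

```python
import json
from pathlib import Path


def output_paths(source: str, out_dir: str = "out") -> tuple[Path, Path]:
    """Hypothetical helper: derive the .md and .json output paths for a source file."""
    stem = Path(source).stem
    base = Path(out_dir)
    return base / f"{stem}.md", base / f"{stem}.json"


def convert(source: str, out_dir: str = "out") -> tuple[Path, Path]:
    """Hypothetical helper: convert one document and write Markdown and JSON."""
    # Imported lazily so this module still loads where docling is not installed.
    # Assumes the Docling v2 API.
    from docling.document_converter import DocumentConverter

    md_path, json_path = output_paths(source, out_dir)
    md_path.parent.mkdir(parents=True, exist_ok=True)

    result = DocumentConverter().convert(source)
    doc = result.document  # the unified DoclingDocument representation

    md_path.write_text(doc.export_to_markdown(), encoding="utf-8")
    json_path.write_text(json.dumps(doc.export_to_dict(), indent=2), encoding="utf-8")
    return md_path, json_path
```

With Docling installed (for example via `pip install docling`), calling `convert("report.pdf")` on a local file should write `out/report.md` and `out/report.json`. The file name here is only a placeholder.
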

Docling is brought to you by IBM.