Releases · DS4SD/docling

14 Feb 08:53

deep-search-ops

v2.22.0

ffbde1d

v2.22.0 Latest

Feature

Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) (00d9405)
Introduce the enable_remote_services option to allow remote connections while processing (#941) (2716c7d)
Allow artifacts_path to be defined as ENV (#940) (5101e25)

Fix

Update Pillow constraints (#958) (af19c03)
Fix the initialization of the TesseractOcrModel (#935) (c47ae70)

Documentation

Update example Dockerfile with download CLI (#929) (7493d5b)
Examples for picture descriptions (#951) (2d66e99)

10 Feb 11:43

deep-search-ops

v2.21.0

de46209

Feature

Add content_layer property to items to address body, furniture and other roles (#735) (cf78d5b)

07 Feb 17:46

deep-search-ops

v2.20.0

3e26597

Feature

Describe pictures using vision models (#259) (4cc6e3e)

Fix

Remove unused httpx (#919) (c18f47c)

07 Feb 13:36

deep-search-ops

v2.19.0

fba3cf9

Feature

New artifacts path and CLI utility (#876) (ed74fe2)

Fix

markdown: Handle nested lists (#910) (90b766e)
Test cases for RTL programmatic PDFs and fixes for the formula model (#903) (9114ada)
msword_backend: Handle conversion error in label parsing (#896) (722a6eb)
Enrichment models batch size and expose picture classifier (#878) (5ad6de0)

Documentation

Introduce example with custom models for RapidOCR (#874) (6d3fea0)

03 Feb 14:58

deep-search-ops

v2.18.0

b5da408

Feature

Expose equation exports (#869) (6a76b49)
Add option to define page range (#852) (70d68b6)
docx: Support of SDTs in docx backend (#853) (d727b04)
Python 3.13 support (#841) (4df085a)

Fix

markdown: Fix parsing if doc ending with table (#873) (5ac2887)
markdown: Add support for HTML content (#855) (94751a7)
docx: Merged table cells not properly converted (#857) (0cd81a8)
Processing of placeholder shapes in pptx that have text but no bbox (#868) (eff16b6)
KeyError in tableformer prediction (#854) (b1cf796)
Fixed docx import with headers that are also lists (#842) (2c037ae)
Use new add_code in html backend and add more typing hints (#850) (2a1f8af)
markdown: Fix empty block handling (#843) (bccb022)
Fix for the crash when encountering WMF images in pptx and docx (#837) (fea0a99)

Documentation

Updated the readme with upcoming features (#831) (d7c0828)
Add example for inspection of picture content (#624) (f9144f2)

28 Jan 18:37

deep-search-ops

v2.17.0

4d11d87

Feature

CLI: Expose code and formula models in the CLI (#820) (6882e6c)
Add platform info to CLI version printout (#816) (95b293a)
ocr: Expose rec_keys_path in RapidOcrOptions to support custom dictionaries (#786) (5332755)
Introduce automatic language detection in TesseractOcrCliModel (#800) (3be2fb5)

Fix

Fix single newline handling in MD backend (#824) (5aed9f8)
Use file extension if filetype fails with PDF (#827) (adf6353)
Parse html with omitted body tag (#818) (a112d7a)

Documentation

Document Docling JSON parsing (#819) (6875913)
Add SSL verification error mitigation (#821) (5139b48)
backend XML: Do not delete temp file in notebook (#817) (4d41db3)
Typo (#814) (8a4ec77)
Added markdown headings to enable TOC in github pages (#808) (b885b2f)
Description of supported formats and backends (#788) (c2ae1cc)

24 Jan 18:21

deep-search-ops

v2.16.0

9e4ca90

Feature

New document picture classifier (#805) (16a218d)
Add Docling JSON ingestion (#783) (88a0e66)
Code and equation model for PDF and code blocks in markdown (#752) (3213b24)
Add "auto" language for TesseractOcr (#759) (8543c22)

Fix

Added extraction of byte-images in excel (#804) (a458e29)
Update docling-parse-v2 backend version with new parsing fixes (#769) (670a08b)

Documentation

Fix minor typos (#801) (c58f75d)
Add Azure RAG example (#675) (9020a93)
Fix links between docs pages (#697) (c49b352)
Correct the Accelerator pipeline options in docs/examples/custom_convert.py (#733) (7686083)
Example to translate documents (#739) (f7e1cbf)

10 Jan 10:29

deep-search-ops

v2.15.1

1976584

Fix

Improve OCR results, tighten criteria before dropping bitmap areas (#719) (5a060f2)
Allow earlier requests versions (#716) (e64b5a2)

Documentation

Add pointers to LangChain-side docs (#718) (9a6b5c8)
Add LangChain docs (#717) (4fa8028)

08 Jan 12:06

deep-search-ops

v2.15.0

9a94b54

Feature

Added http header support for document converter and cli (#642) (0ee849e)

Fix

Correct scaling of debug visualizations, tune OCR (#700) (5cb4cf6)
Let BeautifulSoup detect the HTML encoding (#695) (42856fd)
mspowerpoint: Handle invalid images in PowerPoint slides (#650) (d49650c)

Documentation

Specify docstring types (#702) (ead396a)
Add link to rag with granite (#698) (6701f34)
Add integrations, revamp docs (#693) (2d24fae)
Add OpenContracts as an integration (#679) (569038d)
Add Weaviate RAG recipe notebook (#451) (2b591f9)
Document Haystack & Vectara support (#628) (fc645ea)

18 Dec 07:04

deep-search-ops

v2.14.0

1418fa1

Feature

Create a backend to transform PubMed XML files to DoclingDocument (#557) (fd03480)
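
The entries in these release notes appear to share one shape: an optional scope prefix (such as "markdown:" or "docx:"), a description, a pull request number, and a short commit hash. For readers who want to process the notes programmatically, the following is a minimal sketch of a parser for that shape. The pattern is inferred from the entries above and is an assumption, not a format documented by the project; the names ENTRY_RE, ChangelogEntry, and parse_entry are hypothetical and are not part of docling.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed entry shape, inferred from the release notes (not an official format):
#   [scope: ]description (#PR) (commit-hash)
ENTRY_RE = re.compile(
    r"^(?:(?P<scope>[A-Za-z_ ]+): )?"   # optional scope, e.g. "markdown" or "backend XML"
    r"(?P<description>.+?) "            # free-text description (lazy match)
    r"\(#(?P<pr>\d+)\) "                # pull request number
    r"\((?P<commit>[0-9a-f]{7,40})\)$"  # abbreviated or full commit hash
)


@dataclass
class ChangelogEntry:
    scope: Optional[str]
    description: str
    pr: int
    commit: str


def parse_entry(line: str) -> Optional[ChangelogEntry]:
    """Parse one changelog line; return None for headings and other lines."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return ChangelogEntry(m["scope"], m["description"], int(m["pr"]), m["commit"])


if __name__ == "__main__":
    print(parse_entry("markdown: Handle nested lists (#910) (90b766e)"))
    print(parse_entry("Python 3.13 support (#841) (4df085a)"))
```

Section headings such as "Feature" or "Fix" do not match the pattern and yield None, so a caller can use them to track the current category while iterating over the lines of a release.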
