Releases: DS4SD/docling

v2.22.0

14 Feb 08:53

Feature

  • Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) (00d9405)
  • Introduce the enable_remote_services option to allow remote connections while processing (#941) (2716c7d)
  • Allow artifacts_path to be defined as ENV (#940) (5101e25)

Documentation

  • Update example Dockerfile with download CLI (#929) (7493d5b)
  • Examples for picture descriptions (#951) (2d66e99)

v2.21.0

10 Feb 11:43

Feature

  • Add content_layer property to items to address body, furniture and other roles (#735) (cf78d5b)

v2.20.0

07 Feb 17:46

Feature

  • Describe pictures using vision models (#259) (4cc6e3e)

v2.19.0

07 Feb 13:36

Fix

  • markdown: Handle nested lists (#910) (90b766e)
  • Test cases for RTL programmatic PDFs and fixes for the formula model (#903) (9114ada)
  • msword_backend: Handle conversion error in label parsing (#896) (722a6eb)
  • Enrichment models batch size and expose picture classifier (#878) (5ad6de0)

Documentation

  • Introduce example with custom models for RapidOCR (#874) (6d3fea0)

v2.18.0

03 Feb 14:58

Fix

  • markdown: Fix parsing when a document ends with a table (#873) (5ac2887)
  • markdown: Add support for HTML content (#855) (94751a7)
  • docx: Merged table cells not properly converted (#857) (0cd81a8)
  • Processing of placeholder shapes in pptx that have text but no bbox (#868) (eff16b6)
  • KeyError in tableformer prediction (#854) (b1cf796)
  • Fixed docx import with headers that are also lists (#842) (2c037ae)
  • Use new add_code in HTML backend and add more typing hints (#850) (2a1f8af)
  • markdown: Fix empty block handling (#843) (bccb022)
  • Fix for the crash when encountering WMF images in pptx and docx (#837) (fea0a99)

Documentation

  • Updated the readme with upcoming features (#831) (d7c0828)
  • Add example for inspection of picture content (#624) (f9144f2)

v2.17.0

28 Jan 18:37

Feature

  • CLI: Expose code and formula models in the CLI (#820) (6882e6c)
  • Add platform info to CLI version printout (#816) (95b293a)
  • ocr: Expose rec_keys_path in RapidOcrOptions to support custom dictionaries (#786) (5332755)
  • Introduce automatic language detection in TesseractOcrCliModel (#800) (3be2fb5)

Fix

  • Fix single newline handling in Markdown backend (#824) (5aed9f8)
  • Use file extension if filetype fails with PDF (#827) (adf6353)
  • Parse HTML with omitted body tag (#818) (a112d7a)

v2.16.0

24 Jan 18:21

Feature

  • New document picture classifier (#805) (16a218d)
  • Add Docling JSON ingestion (#783) (88a0e66)
  • Code and equation model for PDF and code blocks in markdown (#752) (3213b24)
  • Add "auto" language for TesseractOcr (#759) (8543c22)

Fix

  • Add extraction of byte-images in Excel (#804) (a458e29)
  • Update docling-parse-v2 backend version with new parsing fixes (#769) (670a08b)

v2.15.1

10 Jan 10:29

Fix

  • Improve OCR results; tighten criteria before dropping bitmap areas (#719) (5a060f2)
  • Allow earlier requests versions (#716) (e64b5a2)

v2.15.0

08 Jan 12:06

Feature

  • Add HTTP header support for document converter and CLI (#642) (0ee849e)

Fix

  • Correct scaling of debug visualizations, tune OCR (#700) (5cb4cf6)
  • Let BeautifulSoup detect the HTML encoding (#695) (42856fd)
  • mspowerpoint: Handle invalid images in PowerPoint slides (#650) (d49650c)

v2.14.0

18 Dec 07:04

Feature

  • Create a backend to transform PubMed XML files to DoclingDocument (#557) (fd03480)