Insights: DS4SD/docling
Overview
3 Releases published by 1 person
17 Pull requests merged by 6 people
- fix: remove unused httpx (#919, merged Feb 7, 2025)
- feat: Describe pictures using vision models (#259, merged Feb 7, 2025)
- refactor: use org--name in artifacts-path (#912, merged Feb 7, 2025)
- fix(markdown): handle nested lists (#910, merged Feb 7, 2025)
- fix: Test cases for RTL programmatic PDFs and fixes for the formula model (#903, merged Feb 7, 2025)
- feat: new artifacts path for old and new models and CLI utility (#876, merged Feb 6, 2025)
- Artifacts dl improv (#901, merged Feb 6, 2025)
- fix(msword_backend): handle conversion error in label parsing (#896, merged Feb 6, 2025)
- fix: enrichment models batch size and expose picture classifier (#878, merged Feb 5, 2025)
- chore: fix docs search (#880, merged Feb 4, 2025)
- docs: Introduce example with custom models for RapidOCR (#874, merged Feb 4, 2025)
- fix(markdown): fix parsing if doc ending with table (#873, merged Feb 3, 2025)
- chore: cleanup top-level file (#872, merged Feb 3, 2025)
- fix(markdown): add support for HTML content (#855, merged Feb 3, 2025)
- feat: Expose equation exports (#869, merged Feb 3, 2025)
- fix(docx): merged table cells not properly converted (#857, merged Feb 3, 2025)
- fix: Processing of placeholder shapes in pptx that have text but no bbox (#868, merged Feb 3, 2025)
4 Pull requests opened by 3 people
- feat(actor): Docling Actor on Apify infrastructure (#875, opened Feb 3, 2025)
- feat: add update and tests for right-to-left documents [probably needs to be closed, not merged] (#883, opened Feb 4, 2025)
- feat: Add DoclingParseV3 backend using high-level docling-parse API (#905, opened Feb 6, 2025)
- feat: [WIP] Implement new reading-order model (#916, opened Feb 7, 2025)
19 Issues closed by 8 people
- Error: No module named 'httpx' (#918, closed Feb 7, 2025)
- Integrate image understanding pipeline (#192, closed Feb 7, 2025)
- convert is keeping a subprocess or thread running after returning results (#915, closed Feb 7, 2025)
- SSL Error (#909, closed Feb 7, 2025)
- Convert Markdown document incorrect (#623, closed Feb 7, 2025)
- Behind a Firewall: how to download models? (#904, closed Feb 6, 2025)
- GOT-OCR 2.0 support (#898, closed Feb 6, 2025)
- Set artifacts_path for picture classifier (#870, closed Feb 6, 2025)
- [Errno 30] Read-only file system: '/home/sbx_user1051' (#900, closed Feb 6, 2025)
- PDF to MD Conversion with Docling v2.18 is Incomprehensible (#888, closed Feb 6, 2025)
- how to export each page to markdown for docx/pdf ? (#892, closed Feb 5, 2025)
- converter.convert extremely slow (#879, closed Feb 5, 2025)
- Restore docs search (#836, closed Feb 4, 2025)
- Support for Mixed Document Types (#734, closed Feb 3, 2025)
- Incorrect Table Formatting in Converting Word Documents To HTML (#791, closed Feb 3, 2025)
- Crash with DoclingDocument.add_code() got an unexpected keyword argument 'label' (#863, closed Feb 3, 2025)
- RuntimeError: Invalid code point (#860, closed Feb 3, 2025)
- Placeholder elements in Powerpoint files have no size (#584, closed Feb 3, 2025)
- how to refine OCR result and choose custom model? (#806, closed Feb 2, 2025)
19 Issues opened by 18 people
- Some page numbers are excluded from document index (#920, opened Feb 7, 2025)
- Numbered list is identified as a table (#917, opened Feb 7, 2025)
- Incorrect table columns (#914, opened Feb 7, 2025)
- Markdown parser only considers first child of ListItem (#913, opened Feb 7, 2025)
- Update/retrain layout model to identify correctly single column reference pages (#908, opened Feb 7, 2025)
- Feature request: Parsing citations and references in research papers (#906, opened Feb 6, 2025)
- Release beautifulsoup version constrain (#902, opened Feb 6, 2025)
- Docling crashes on the attached docx (#895, opened Feb 5, 2025)
- Support conversion of JATS format into DoclingDocument (#893, opened Feb 5, 2025)
- Docling on n8n nodes (#890, opened Feb 4, 2025)
- Not All Headers Are Identified during PDF to MD conversion (#887, opened Feb 4, 2025)
- No text extracted from DOCX image (#886, opened Feb 4, 2025)
- Deploy a docling app on Azure App Service (#884, opened Feb 4, 2025)
- Newer versions fail to include pdf table cells that are successfully handled in older versions (#882, opened Feb 4, 2025)
- GPU RAM Requirements for Formula Detection (do_formula_enrichment) (#871, opened Feb 3, 2025)
- OSError: could not find or load spatialindex_c-64.dll (#867, opened Feb 3, 2025)
- refine quality of OCR for tables (#866, opened Feb 2, 2025)
18 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- feat: Add content_layer property to items to address body, furniture and other roles (#735, commented on Feb 7, 2025 • 1 new comment)
- feat: use `w:lastRenderedPageBreak` to get approximate pagination from docx (#832, commented on Feb 3, 2025 • 0 new comments)
- feat: translate equations to latex when running MSWord backend (#825, commented on Feb 7, 2025 • 0 new comments)
- feat: Adding cuda:n device allocation (#694, commented on Feb 3, 2025 • 0 new comments)
- feat: Enable markdown text formatting for docx (#630, commented on Feb 7, 2025 • 0 new comments)
- Error building extension 'MultiScaleDeformableAttention' when running sample from web site. (#603, commented on Feb 8, 2025 • 0 new comments)
- Landscape pages are not read (#683, commented on Feb 7, 2025 • 0 new comments)
- Support pagination in MSWord documents (#833, commented on Feb 7, 2025 • 0 new comments)
- Identify table of contents for better chunking Hierarchy Identification (#287, commented on Feb 6, 2025 • 0 new comments)
- Hyperlinks not identified in PDFs (#828, commented on Feb 6, 2025 • 0 new comments)
- supporting footnotes (#433, commented on Feb 5, 2025 • 0 new comments)
- Downloading detection and recognition models takes a lot of time and space on my pod (#746, commented on Feb 5, 2025 • 0 new comments)
- Control HTML document Unicode decoding (#682, commented on Feb 5, 2025 • 0 new comments)
- Overlapping layout clusters (#747, commented on Feb 5, 2025 • 0 new comments)
- Markdown header levels are all the same (#652, commented on Feb 4, 2025 • 0 new comments)
- Issue with Heading Extraction (#529, commented on Feb 4, 2025 • 0 new comments)
- Improve backend resolution logic (#802, commented on Feb 3, 2025 • 0 new comments)
- No translation of option buttons when converting docx to MarkDown (#858, commented on Feb 3, 2025 • 0 new comments)