build(deps): bump unstructured.paddleocr 2.8.0.1 (Unstructured-IO#3388)
### Summary
- Bump unstructured.paddleocr to `2.8.0.1`, which removes the `lmdb`
dependency due to a license issue.

---------

Co-authored-by: Matt Robinson <[email protected]>
christinestraub and MthwRobinson authored Jul 14, 2024
1 parent 69cddf5 commit 3e1a30d
Showing 4 changed files with 5 additions and 7 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,8 @@
-## 0.15.0-dev8
+## 0.15.0-dev9

### Enhancements

-* **Bump unstructured.paddleocr to 2.8.0.**
+* **Bump unstructured.paddleocr to 2.8.0.1.**
* **Refine HTML parser to accommodate block element nested in phrasing.** HTML parser no longer raises on a block element (e.g. `<p>`, `<div>`) nested inside a phrasing element (e.g. `<strong>` or `<cite>`). Instead it breaks the phrasing run (and therefore element) at the block-item start and begins a new phrasing run after the block-item. This is consistent with how the browser determines element boundaries in this situation.
* **Install rewritten HTML parser to fix 12 existing bugs and provide headroom for refinement and growth.** A rewritten HTML parser resolves a collection of outstanding bugs with HTML partitioning and provides a firm foundation for further elaborating that important partitioner.
* **CI check for dependency licenses** Adds a CI check to ensure dependencies are appropriately licensed.
2 changes: 1 addition & 1 deletion requirements/extra-paddleocr.in
@@ -1,4 +1,4 @@
-c ./deps/constraints.txt
-c base.txt

-unstructured.paddleocr==2.8.0
+unstructured.paddleocr==2.8.0.1
4 changes: 1 addition & 3 deletions requirements/extra-paddleocr.txt
@@ -49,8 +49,6 @@ lanms-neo==1.0.2
# via unstructured-paddleocr
lazy-loader==0.4
# via scikit-image
-lmdb==1.5.1
-# via unstructured-paddleocr
lxml==5.2.2
# via
# -c ./base.txt
@@ -154,7 +152,7 @@ tqdm==4.66.4
# via
# -c ./base.txt
# unstructured-paddleocr
-unstructured-paddleocr==2.8.0
+unstructured-paddleocr==2.8.0.1
# via -r ./extra-paddleocr.in
urllib3==1.26.19
# via
2 changes: 1 addition & 1 deletion unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.15.0-dev8" # pragma: no cover
+__version__ = "0.15.0-dev9" # pragma: no cover
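
One way to confirm the effect of this bump on a compiled requirements file is to parse its `name==version` pins and check that `lmdb` is gone and that the paddleocr pin is `2.8.0.1`. The following is a minimal sketch, not part of this commit: `parse_pins` and `SAMPLE` are hypothetical names, and `SAMPLE` is an abbreviated excerpt modeled on the diff above rather than the full `requirements/extra-paddleocr.txt`. Names are normalized in the PEP 503 style so that `unstructured.paddleocr` (as written in the `.in` file) and `unstructured-paddleocr` (as written in the `.txt` file) compare equal.

```python
import re


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Map normalized package name -> pinned version for 'name==version' lines.

    Hypothetical helper for illustration. It ignores comments, blank lines,
    and option lines such as '-c base.txt', and it does not handle extras
    or environment markers.
    """
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing or whole-line comments (e.g. '# via scikit-image').
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.fullmatch(r"([A-Za-z0-9._-]+)==(\S+)", line)
        if match:
            # PEP 503-style normalization: runs of '-', '_', '.' become '-'.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


# Abbreviated sample modeled on the post-bump requirements file (assumption).
SAMPLE = """\
lanms-neo==1.0.2
    # via unstructured-paddleocr
lazy-loader==0.4
    # via scikit-image
lxml==5.2.2
unstructured-paddleocr==2.8.0.1
    # via -r ./extra-paddleocr.in
"""

pins = parse_pins(SAMPLE)
assert "lmdb" not in pins
assert pins["unstructured-paddleocr"] == "2.8.0.1"
```

To run the same check against a real checkout, the file contents could be read with `pathlib.Path(...).read_text()` and passed to `parse_pins` in place of `SAMPLE`.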
