Specific HTML can cause from_read to hang indefinitely #54
Comments
Thanks for the report and test case!
I've found the problem: it is to do with how it tries to allocate column widths when laying out deeply nested and wide tables. The column width estimate is ridiculously large, and the strategy for shrinking it to fit into the allowed width doesn't work well. I think the full fix will tie in with changes needed for #48 and #53, but this is important enough to fix ASAP.
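To make the shrink-to-fit step concrete, here is a minimal sketch of one way oversized column estimates could be reduced to an allowed width. It is not the crate's actual layout code: the function name `shrink_to_fit`, its parameters, and the proportional-scaling strategy are all assumptions made for illustration.

```rust
/// Shrink estimated column widths so they fit within `available` cells.
/// Hypothetical helper for illustration; not taken from the html2text source.
pub fn shrink_to_fit(estimates: &[usize], available: usize, min_width: usize) -> Vec<usize> {
    // Sum in u128 so absurdly large estimates cannot overflow.
    let total: u128 = estimates.iter().map(|&w| w as u128).sum();
    if total <= available as u128 {
        return estimates.to_vec();
    }
    // Scale each column proportionally, but never below `min_width`.
    let mut widths: Vec<usize> = estimates
        .iter()
        .map(|&w| {
            let scaled = (w as u128 * available as u128 / total) as usize;
            scaled.max(min_width)
        })
        .collect();
    // The floor can push the sum over budget, so trim the widest column one
    // cell at a time. Each pass removes one cell or exits, so the loop ends.
    let mut sum: usize = widths.iter().sum();
    while sum > available {
        let widest = widths
            .iter()
            .enumerate()
            .filter(|&(_, &w)| w > min_width)
            .max_by_key(|&(_, &w)| w)
            .map(|(i, _)| i);
        match widest {
            Some(i) => {
                widths[i] -= 1;
                sum -= 1;
            }
            // Every column is already at the minimum; the caller has to
            // fall back to some other layout.
            None => break,
        }
    }
    widths
}
```

The point of the sketch is that the trimming loop is guaranteed to terminate, and that the caller can detect the case where the table still does not fit.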
Here is an alternate sample which shows the same behaviour. I assume the root cause is the same, but thought it might be of help.
Thanks! I'll check the fix with this test case as well.
I believe I've just about fixed this in branch bugfix_issue_54. Small tables look a bit tidier, but more importantly I've improved handling of larger tables and very nested tables-in-tables. One intended side effect is that for very large (wide) tables, cells will be on top of each other rather than next to each other.
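The side effect described above can be pictured with a small sketch: lay a row out horizontally when it fits, and otherwise stack its cells vertically. The `render_row` function and the `" | "` separator are hypothetical and chosen only for illustration; the real renderer is more involved.

```rust
/// Render one table row into output lines.
/// Hypothetical helper for illustration; not taken from the html2text source.
pub fn render_row(cells: &[&str], width: usize) -> Vec<String> {
    let joined = cells.join(" | ");
    if joined.chars().count() <= width {
        // The whole row fits: keep the cells next to each other.
        vec![joined]
    } else {
        // Too wide: put each cell on its own line, one above the other.
        cells.iter().map(|c| c.to_string()).collect()
    }
}
```

For example, `render_row(&["hello", "world"], 8)` falls back to two lines because the joined row needs 13 characters.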
I got a panic on an index out of bounds with the attached. I have a pretty huge test set but this triggered pretty early. I think this is the relevant part of the stacktrace:
Thanks for another test case! I've fixed the out of bounds panic.
I've opened #55. The only thing I think would be nice to do here is to do something to show the difference between a vertical column of table cells and a horizontal row of cells which has been rendered vertically because of space issues (e.g. a border of |
I admire your commitment to the tables :). I know the README says "rendering reasonable HTML in a terminal" among your goals, but I'd like to also humbly submit a suggestion for my use case (indexing textual content): it would be very cool to have a mode that ignores the whole issue of layout and produces mere text.
I believe this is now fixed, and included in the 0.3.0 release. Thanks again!
While testing against a large batch of HTML samples I found that one appeared to cause an infinite loop when calling from_read. Sample attached. I have tested this both with the library call and with the command line.
infinite.zip
I think it would be nice to be able to have some kind of limiting factor because HTML in the wild can be very weird and malformed.
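Until the library offers such a limit itself, a caller can approximate one from the outside. The sketch below is only an assumption about how that might look: `run_with_timeout` is a hypothetical helper built on the standard library, and in real use the closure would wrap the call to from_read.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `work` on a separate thread and wait at most `timeout` for its result.
/// Hypothetical helper for illustration; not part of html2text.
pub fn run_with_timeout<T, F>(work: F, timeout: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the receiver is gone if we already timed out.
        let _ = tx.send(work());
    });
    // Returns None on timeout, and also if the worker thread panicked.
    rx.recv_timeout(timeout).ok()
}
```

Note that the worker thread is not killed on timeout and keeps running in the background, so this only bounds how long the caller waits. For a hard limit on untrusted input, running the conversion in a separate process that can be terminated is the safer option.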