trying to extract text, not all strings are present - looks like those with non-latin characters are gone #102
Comments
@Niedzwiedzw which approach are you using?
I've switched to the master branch to be able to use the named enum-style Ops, but now it doesn't load the document at all.
Oh wow, an inline image. Will look into that as well.
I'm not creating the documents, and I can imagine standards compliance for PDF is a MESS. For some context, I'm trying to salvage what I can from some government-generated documents :D
@s3bk https://github.com/sbeckeriv/lopdf/blob/master/src/nom_parser.rs would this be useful to you at all?
I don't think we are going to switch to nom. It is great, but PDF is a mess and we already have a handwritten parser.
The PDF Reference lists
@Niedzwiedzw you are in luck. The
So cool, thank you so much @s3bk
hmm
I'm unable to provide an example PDF because it contains sensitive data, though :(