Error determining page count - fixable with change to grep command? #6

Open
abali96 opened this issue Nov 7, 2023 · 1 comment

abali96 commented Nov 7, 2023

On rare occasions, I've encountered "PDFToImage::PDFError: Error determining page count."

Looking at the code at this line, https://github.com/robflynn/pdftoimage/blob/master/lib/pdftoimage.rb#L95, I ran the pdfinfo command by hand. For this particular document, pdfinfo does in fact return a page count:

pdfinfo /tmp/document.pdf
Title:           Document
Author:          https://imagemagick.org
Creator:         https://imagemagick.org
Producer:        https://imagemagick.org
CreationDate:    Fri Sep 15 11:17:09 2023 PDT
ModDate:         Fri Sep 15 11:17:09 2023 PDT
Custom Metadata: no
Metadata Stream: no
Tagged:          no
UserProperties:  no
Suspects:        no
Form:            none
JavaScript:      no
Pages:           1
Encrypted:       no
Page size:       611.242 x 791.252 pts (letter)
Page rot:        0
File size:       117406 bytes
Optimized:       no
PDF version:     1.4
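Because pdfinfo prints plain "Key: value" lines, the page count can also be read without grep at all. A minimal Ruby sketch, assuming exactly the layout shown above (one "Key: value" pair per line) and valid text output; parse_pdfinfo is a hypothetical helper, not part of pdftoimage:

```ruby
# Hypothetical helper (not part of pdftoimage): turn pdfinfo's
# "Key:   value" lines into a Hash of strings.
# Assumes the output is valid text in the default encoding.
def parse_pdfinfo(output)
  output.each_line.with_object({}) do |line, info|
    # Split on the first colon only, so values such as
    # "Fri Sep 15 11:17:09 2023 PDT" keep their own colons.
    key, value = line.split(":", 2)
    next if value.nil? # skip lines without a colon
    info[key.strip] = value.strip
  end
end

sample = <<~OUT
  Title:           Document
  Pages:           1
  Page size:       611.242 x 791.252 pts (letter)
OUT

info = parse_pdfinfo(sample)
puts Integer(info.fetch("Pages")) # prints 1
```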

However, once the output is piped through | grep Pages, the command returns only:

pdfinfo /tmp/document.pdf | grep Pages
Binary file (standard input) matches
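This is why the exception appears: page_count matches grep's output against a "Pages:" regex, and in binary mode grep prints only its summary line, so there is nothing to match. A small Ruby illustration, using the regex from the page_count method quoted later in this thread, the Pages line from the pdfinfo output above, and the grep output just shown:

```ruby
# Regex that page_count uses to pull the number out of grep's output.
PAGES_RE = /^Pages:.*?(\d+)$/

text_mode   = "Pages:           1\n"                    # what grep -a Pages passes through
binary_mode = "Binary file (standard input) matches\n" # what plain grep Pages printed

p PAGES_RE.match(text_mode)[1] # prints "1"
p PAGES_RE.match(binary_mode)  # prints nil, so page_count raises
```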

Changing grep Pages to grep --text Pages (or the equivalent short form, grep -a Pages) fixes the problem, but I'm not sure whether there's a reason not to use it by default.
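On the question of a default: both GNU grep and the BSD grep shipped with macOS document -a/--text, so the flag itself should be portable across those two. Another option is to drop the pipe to grep entirely and match the Pages line in Ruby. A minimal sketch, assuming pdfinfo is on the PATH; pdf_page_count and parse_page_count are hypothetical names, and the plain RuntimeError raised here stands in for the gem's PDFToImage::PDFError:

```ruby
require "open3"

# Hypothetical helper: extract the page count from raw pdfinfo output.
# String#b makes a binary (ASCII-8BIT) copy, so metadata fields holding
# invalid UTF-8 bytes cannot make the regex match raise ArgumentError.
def parse_page_count(output)
  match = output.b.match(/^Pages:[ \t]*(\d+)[ \t]*$/)
  raise "Error determining page count." if match.nil?
  Integer(match[1])
end

# Hypothetical replacement for the shell pipeline: run pdfinfo directly.
# Passing the filename as a separate argument bypasses the shell, so no
# Shellwords escaping is needed.
def pdf_page_count(filename)
  output, status = Open3.capture2("pdfinfo", filename)
  raise "pdfinfo failed for #{filename}" unless status.success?
  parse_page_count(output)
end
```

Given the pdfinfo output above, pdf_page_count("/tmp/document.pdf") should return 1. Matching in Ruby sidesteps grep's binary detection altogether instead of overriding it with a flag.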

For reference:

pdfinfo version 23.10.0
Copyright 2005-2023 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2011, 2022 Glyph & Cog, LLC

I'm running macOS 14.0.

Thanks!

robflynn self-assigned this Feb 13, 2024

abramovks commented Sep 20, 2024

My solution: redefine both methods and add the -a flag to each grep call.

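    require "pdftoimage"
    require "shellwords"

    # Redefine the gem's two pdfinfo helpers so that grep runs with -a (--text)
    # and never collapses its output to "Binary file (standard input) matches".
    pdf2image = PDFToImage

    def pdf2image.page_size(filename, page)
      cmd = "pdfinfo -f #{page} -l #{page} #{Shellwords.escape(filename)} | grep -a Page"
      # exec is assumed to be the gem's own helper that runs cmd and returns its
      # output (it takes precedence over Kernel#exec here); it is not defined in this snippet.
      output = exec(cmd)

      # Capture the full decimal values (e.g. "611.242 x 791.252"); a pattern
      # like (\d+).*?(\d+) would read the "242" fraction as the height.
      matches = /^Page.*?size:\s*([\d.]+)\s*x\s*([\d.]+)/.match(output)
      raise PDFToImage::PDFError, "Unable to determine page size." if matches.nil?

      # 2.0833... = 150 / 72: PDF points (1/72 inch) to pixels at 150 DPI.
      scale = 2.08333333333333333
      {
        width: (matches[1].to_f * scale).to_i,
        height: (matches[2].to_f * scale).to_i
      }
    end

    def pdf2image.page_count(filename)
      cmd = "pdfinfo #{Shellwords.escape(filename)} | grep -a Pages"
      output = exec(cmd)

      matches = /^Pages:.*?(\d+)$/.match(output)
      raise PDFToImage::PDFError, "Error determining page count." if matches.nil?

      matches[1].to_i
    end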
