pyPDF: Unable to resolve IndirectObject, getting PDF with empty pages #17
Description
I am trying to write a PDF file. To do that, I am using the following code:
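```python
import logging
import tempfile

import boto3
from PyPDF3 import PdfFileReader, PdfFileWriter

logger = logging.getLogger(__name__)

s3 = boto3.resource("s3")
bucket = s3.Bucket(my_s3Bucket_on_AWS)
s3_object = bucket.Object(my_s3_file_on_AWS)

with tempfile.NamedTemporaryFile() as tmp, tempfile.NamedTemporaryFile() as tmp2:
    # Download the S3 object into the first temp file
    s3_object.download_fileobj(tmp)
    tmp.flush()

    # Keep the input stream open until write() finishes; the writer may still read from it
    with open(tmp.name, "rb") as input_stream:
        inputpdf = PdfFileReader(input_stream, strict=False)
        num_pages = inputpdf.getNumPages()
        output = PdfFileWriter()
        for i in range(num_pages):
            logger.info(f"Adding page --> {i}")
            output.addPage(inputpdf.getPage(i))
        logger.info("Here getting UserWarning")
        # The with block closes output_stream, so no explicit close() is needed
        with open(tmp2.name, "wb") as output_stream:
            output.write(output_stream)
```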
This works perfectly for at least 10K PDFs, except for one PDF that produces the following warnings:
```
UserWarning: Unable to resolve [IndirectObject: IndirectObject(7, 0)], returning NullObject instead [pdf.py:644]
UserWarning: Unable to resolve [IndirectObject: IndirectObject(9, 0)], returning NullObject instead [pdf.py:644]
UserWarning: Unable to resolve [IndirectObject: IndirectObject(10, 0)], returning NullObject instead [pdf.py:644]
UserWarning: Unable to resolve [IndirectObject: IndirectObject(13, 0)], returning NullObject instead [pdf.py:644]
UserWarning: Unable to resolve [IndirectObject: IndirectObject(16, 0)], returning NullObject instead [pdf.py:644]
UserWarning: Unable to resolve [IndirectObject: IndirectObject(20, 0)], returning NullObject instead [pdf.py:644]
UserWarning: Unable to resolve [IndirectObject: IndirectObject(24, 0)], returning NullObject instead [pdf.py:644]
UserWarning: Unable to resolve [IndirectObject: IndirectObject(29, 0)], returning NullObject instead [pdf.py:644]
```
Any suggestions on how to fix this?
Note: the PDF I am trying to read is not empty; it contains data.
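Because the library only warns and silently substitutes NullObject, a bad file can slip through a batch of thousands unnoticed. Below is a minimal sketch, using only the standard `warnings` module, that records warnings raised during one conversion and returns any "Unable to resolve" messages so the file can be flagged. It assumes PyPDF3 reports these cases through `warnings.warn()`, as the output above suggests; the helper name `run_and_collect_unresolved` is hypothetical, and `fake_conversion` is only a stand-in for the reader/writer code.

```python
import warnings
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

# Substring of the warning text shown in this issue (assumed stable across files)
UNRESOLVED_MARKER = "Unable to resolve"


def run_and_collect_unresolved(func: Callable[[], T]) -> Tuple[T, List[str]]:
    """Run func() and return its result plus any 'Unable to resolve' warning messages."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones that would normally be shown only once
        warnings.simplefilter("always")
        result = func()
    unresolved = [str(w.message) for w in caught if UNRESOLVED_MARKER in str(w.message)]
    return result, unresolved


if __name__ == "__main__":
    def fake_conversion() -> str:
        # Hypothetical stand-in for the PyPDF3 read/write code; emits a similar warning
        warnings.warn(
            "Unable to resolve [IndirectObject: IndirectObject(7, 0)], "
            "returning NullObject instead"
        )
        return "done"

    result, problems = run_and_collect_unresolved(fake_conversion)
    if problems:
        print(f"{len(problems)} unresolved object(s): {problems}")
```

In the batch loop, the per-file conversion could be wrapped in a `lambda` and the S3 key logged whenever `problems` is non-empty. Note that `warnings.catch_warnings` modifies global state, so this pattern is not thread-safe.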