This repository has been archived by the owner on Jul 7, 2020. It is now read-only.

panic on some PDFs + suspect memory leak #9

Open
mark-summerfield opened this issue Mar 21, 2017 · 7 comments


@mark-summerfield

I have the following Go program that uses this library:

package main

import (
	"fmt"
	"os"
	"strconv"
	"rsc.io/pdf"
)

func main() {
	if len(os.Args) < 2 || os.Args[1] == "-h" || os.Args[1] == "--help" {
		fmt.Println("usage: pdfpage file.pdf [pnum]")
		os.Exit(1)
	}
	reader, err := pdf.Open(os.Args[1])
	if err != nil {
		fmt.Println(err)
		os.Exit(2)
	}
	if len(os.Args) == 3 {
		var pnum int
		var err error
		if pnum, err = strconv.Atoi(os.Args[2]); err != nil {
			pnum = 1
		}
		fmt.Printf("PAGE %d\n", pnum)
		printPage(reader, pnum)
	} else {
		for pnum := 1; pnum <= reader.NumPage(); pnum++ {
			fmt.Printf("PAGE %d\n", pnum)
			printPage(reader, pnum)
			fmt.Println("")
		}
	}
}

func printPage(reader *pdf.Reader, pnum int) {
	page := reader.Page(pnum)
	if page.V.IsNull() {
		fmt.Printf("failed to read page %d\n", pnum)
		os.Exit(3)
	}
	for _, chunk := range page.Content().Text {
		fmt.Printf("x=%06.2f y=%06.2f w=%06.2f %q %s %.1fpt\n",
			chunk.X, chunk.Y, chunk.W, chunk.S, chunk.Font,
			chunk.FontSize)
	}
}

This builds and runs fine, and for many PDFs it gives the expected output (although it is rather slow).
However, I have a few PDFs that produce a panic:

PAGE 1
panic: malformed PDF: reading at offset 0: stream not present

goroutine 1 [running]:
rsc.io/pdf.(*buffer).errorf(0xc4200d3948, 0x507f70, 0x27, 0xc4200d36d0, 0x2, 0x2)
	/home/mark/app/go/src/rsc.io/pdf/lex.go:82 +0x74
rsc.io/pdf.(*buffer).reload(0xc4200d3948, 0x8)
	/home/mark/app/go/src/rsc.io/pdf/lex.go:95 +0x193
rsc.io/pdf.(*buffer).readByte(0xc4200d3948, 0x599da0)
	/home/mark/app/go/src/rsc.io/pdf/lex.go:71 +0x69
rsc.io/pdf.(*buffer).readToken(0xc4200d3948, 0xc42000aca0, 0x1000)
	/home/mark/app/go/src/rsc.io/pdf/lex.go:135 +0x4a
rsc.io/pdf.Interpret(0xc42006e060, 0x37, 0x4d78a0, 0xc42000ab60, 0xc4200d3b08)
	/home/mark/app/go/src/rsc.io/pdf/ps.go:64 +0x1c6
rsc.io/pdf.Page.Content(0xc42006e060, 0x37, 0x4db2e0, 0xc420014810, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/mark/app/go/src/rsc.io/pdf/page.go:613 +0x326
main.printPage(0xc42006e060, 0x1)
	/home/mark/app/go/src/pdfpage2/main.go:47 +0xa8
main.main()
	/home/mark/app/go/src/pdfpage2/main.go:35 +0x25d

I also have a 647-page PDF for which the program outputs the first 22 pages, then outputs PAGE 23, and then just sits there eating memory and using ~25% CPU. That particular page has some Japanese characters, but I don't know whether they are Unicode text or paths.
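
For anyone who needs to keep going past a bad page: the trace shows the panic being raised on the calling goroutine, so it can be turned into an ordinary error with `defer`/`recover`. The following is only a minimal sketch; `safely` is a hypothetical helper name, not part of rsc.io/pdf, and the panic in `main` is a stand-in for the `page.Content()` call in the program above.

```go
package main

import "fmt"

// safely runs fn and converts any panic it raises into an ordinary error.
// (Hypothetical helper for illustration; not part of rsc.io/pdf.)
func safely(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	// Stand-in for a library call that panics on some files.
	err := safely(func() {
		panic("malformed PDF: reading at offset 0: stream not present")
	})
	fmt.Println(err)
	// prints "recovered: malformed PDF: reading at offset 0: stream not present"
}
```

In the program above, the loop over `page.Content().Text` could be placed inside the function passed to `safely`, so that one unreadable page is reported and skipped instead of ending the run. This does nothing for the page that hangs, since `recover` only intercepts panics.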

@anacrolix

I get the same error on a PDF that's been "optimized", and on page 1.

@anacrolix

This is the file that causes it: Species Present Report_Apr May Jun 2016.pdf

@wayi1

wayi1 commented Oct 13, 2017

I got the same problem.

@frontmill

I am getting this with every single PDF file.

@asticode

Hey guys,

I was having the same problem, but after trying out another library I realized that my PDF file had a protection that prevented this library from extracting data.

Converting it with PDFCreator removed the protection, and I could then read pages.

Hope it helps someone.

Cheers
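
A rough way to check whether a file might be protected before handing it to the library: an encrypted PDF carries an `/Encrypt` entry in its trailer dictionary, so scanning the raw bytes for that name is a cheap heuristic. This is a sketch under that assumption only; `looksEncrypted` is a hypothetical helper, it does not parse the file, and it can report false positives if the same bytes happen to appear elsewhere.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// looksEncrypted reports whether the raw file bytes mention an /Encrypt
// entry. Heuristic only: it does not parse the PDF structure.
func looksEncrypted(data []byte) bool {
	return bytes.Contains(data, []byte("/Encrypt"))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: encheck file.pdf")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	if looksEncrypted(data) {
		fmt.Println("file mentions /Encrypt; it may be protected")
	} else {
		fmt.Println("no /Encrypt entry found")
	}
}
```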

@bigzhu

bigzhu commented Jul 22, 2019

I also have a lot of PDFs that throw the "stream not present" error. If I open such a PDF in Preview on the Mac and choose "Export as PDF...", the newly exported file can be opened and read without problems.

Maybe opening the PDF in some PDF software and re-saving it is enough to fix this error?

Using Automator to batch-convert the PDFs works perfectly.

@florin0x01

Is the library still maintained? I've come to the conclusion that it does not support PDF versions greater than 1.2 (so things like LZW compression, linearization, and so on). Please reply. If it is not maintained, what is a great alternative?
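
To test the version theory against the failing files, the declared version can be read from the `%PDF-x.y` header at the start of each file. The sketch below assumes only that header convention; `pdfVersion` is a hypothetical helper, and newer files may override the header version in the document catalog, which this does not look at.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// pdfVersion extracts the version string (for example "1.4") from a
// header such as "%PDF-1.4". It reports false if no header is found.
func pdfVersion(header []byte) (string, bool) {
	const prefix = "%PDF-"
	i := bytes.Index(header, []byte(prefix))
	if i < 0 {
		return "", false
	}
	rest := header[i+len(prefix):]
	end := 0
	for end < len(rest) && (rest[end] == '.' || (rest[end] >= '0' && rest[end] <= '9')) {
		end++
	}
	if end == 0 {
		return "", false
	}
	return string(rest[:end]), true
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pdfver file.pdf")
		return
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()
	buf := make([]byte, 1024) // the header sits at the very start of the file
	n, err := f.Read(buf)
	if n == 0 && err != nil {
		fmt.Println(err)
		return
	}
	if v, ok := pdfVersion(buf[:n]); ok {
		fmt.Println("declared PDF version:", v)
	} else {
		fmt.Println("no %PDF- header found")
	}
}
```

Comparing the declared versions of files that parse with those that panic would show whether the failures actually track the version.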
