How can I get a page and save to another file? #99

Open
zemelLeong opened this issue Aug 5, 2021 · 10 comments

Comments

@zemelLeong

I want to do something like this PyPDF2 example:

from PyPDF2 import PdfFileReader, PdfFileWriter

# Copy one page (index 2, i.e. the third page) into a new file.
with open("test.pdf", "rb") as src, open("./splitted.pdf", "wb") as dst:
    pdf_input = PdfFileReader(src)

    pdf_output = PdfFileWriter()
    pdf_output.addPage(pdf_input.getPage(2))

    pdf_output.write(dst)
@s3bk
Contributor

s3bk commented Aug 5, 2021

Constructing PDFs is still very much a work in progress.
Take a look at https://github.com/pdf-rs/pdf/blob/master/examples/content/src/main.rs and use page.content instead.

Note that so far no cleanup is done. It just writes another trailer to the existing data.
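
To illustrate the point about "another trailer": PDF allows a file to be changed by appending new objects and a new trailer after the existing bytes, leaving everything already written in place. The sketch below is a toy model of that append-only behavior in Python, not pdf-rs code. `append_update` is a hypothetical helper, and the byte strings are placeholders rather than valid PDF structures.

```python
def append_update(original: bytes, new_objects: bytes, new_trailer: bytes) -> bytes:
    """Toy model of an append-only save: nothing in `original` is rewritten."""
    return original + new_objects + new_trailer


# Placeholder bytes standing in for real PDF sections (hypothetical content).
original = b"%PDF-1.5\n...objects for 10 pages...\ntrailer A\n%%EOF\n"
updated = append_update(
    original,
    b"...new catalog that lists 1 page...\n",
    b"trailer B\n%%EOF\n",
)

# The old data is still there, byte for byte, at the start of the result.
starts_with_original = updated.startswith(original)
# So the saved file can only grow, even though it now shows fewer pages.
grew = len(updated) > len(original)
```

Under this model, shrinking the output would need a separate pass that drops objects no longer reachable from the newest trailer, which is the cleanup step described as not done yet.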

@zemelLeong
Author

I hope a page can be read and written out as a stream.

import asyncio
from PyPDF2 import PdfFileReader, PdfFileWriter


async def sender():
    _, writer = await asyncio.open_connection('127.0.0.1', 8888)

    old_write = writer.write
    writer.length = 0

    def write(data):
        writer.length += len(data)
        old_write(data)

    def tell():
        return writer.length

    writer.tell = tell
    writer.write = write

    pdf_input = PdfFileReader(open("original.pdf", 'rb'))

    pdf_output = PdfFileWriter()
    page = pdf_input.getPage(5)

    pdf_output.addPage(page)

    pdf_output.write(writer)


asyncio.run(sender())
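
The snippet above patches write() and adds tell() directly onto the stream object, because the network stream has no tell() of its own. A small wrapper class is one way to get the same effect without modifying the stream. This is only a sketch: `CountingWriter` is a hypothetical name, and it assumes the producer needs nothing beyond write() and tell().

```python
import io


class CountingWriter:
    """Wrap a write-only sink and track how many bytes have passed through."""

    def __init__(self, sink):
        self._sink = sink      # anything with a write(bytes) method
        self._written = 0

    def write(self, data: bytes) -> int:
        self._sink.write(data)
        self._written += len(data)
        return len(data)

    def tell(self) -> int:
        # Position as seen by the producer: total bytes written so far.
        return self._written


# Stand-in for a socket-like sink; io.BytesIO is used only to keep this runnable.
sink = io.BytesIO()
out = CountingWriter(sink)
out.write(b"%PDF-1.5\n")
out.write(b"1 0 obj\n")
position = out.tell()
```

A simpler alternative, when the page fits in memory, is to write into an `io.BytesIO` buffer first and send the result of `getvalue()` in one call.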

@s3bk
Contributor

s3bk commented Aug 5, 2021

No. The whole PDF needs to be available before it can be processed. Technically, an extension exists that allows processing partial PDFs, but supporting it would require a much more complex architecture.
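
One reason the whole file is needed: a PDF reader normally starts from the end of the file, where a `startxref` entry gives the byte offset of the cross-reference data, and only then jumps back to the objects. Below is a minimal sketch of that first step, written in Python and independent of pdf-rs. `find_startxref` is a hypothetical helper, and the sample bytes are a made-up file tail, not a full PDF.

```python
def find_startxref(data: bytes) -> int:
    """Return the offset that follows the last `startxref` keyword."""
    marker = b"startxref"
    pos = data.rfind(marker)
    if pos == -1:
        raise ValueError("no startxref found: the end of the file is missing")
    # The offset is the first whitespace-separated token after the keyword.
    tokens = data[pos + len(marker):].split()
    if not tokens:
        raise ValueError("startxref has no offset")
    return int(tokens[0])


# Made-up file tail: the last lines of a PDF look roughly like this.
tail = b"...earlier objects...\nstartxref\n1234\n%%EOF\n"
offset = find_startxref(tail)

# With only the beginning of the file, this first step already fails.
try:
    find_startxref(b"%PDF-1.5\n1 0 obj\n")
    truncated_ok = True
except ValueError:
    truncated_ok = False
```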

@zemelLeong
Author

I may not have expressed this accurately. I want the page that was read to be converted into a byte array so that it can be sent over the network. I found similar code in pdf-rs and PyPDF2.

@s3bk
Contributor

s3bk commented Aug 5, 2021

For now you can add a save_to_vec here:
https://github.com/pdf-rs/pdf/blob/master/pdf/src/file.rs#L261

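    pub fn save_to_vec(&mut self) -> Result<Vec<u8>> {
        // Return the saved bytes to the caller instead of writing them to a path.
        // to_vec() copies them into an owned Vec<u8>.
        Ok(self.storage.save(&mut self.trailer)?.to_vec())
    }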

Note that the output still contains all original data, so it will not be smaller.

@zemelLeong
Author

I used this file to test the example. The generated file displays as a blank page, and other input files have the same issue.

#[cfg(test)]
mod pdf_test {
    use pdf::content::{Op, Point};
    use pdf::{build::PageBuilder, content::Content, file::File};
    use pdf::build::CatalogBuilder;

    macro_rules! file_path {
        ( $sub_dir:expr ) => { concat!("./src/test/common/", $sub_dir) }
    }

    macro_rules! run {
        ($e:expr) => (
            match $e {
                Ok(v) => v,
                Err(e) => {
                    e.trace();
                    panic!("{}", e);
                }
            }
        )
    }

    #[test]
    pub fn write_pages() {
        let mut file = run!(File::<Vec<u8>>::open(file_path!("xelatex.pdf")));
        let mut pages = Vec::new();
        for page in file.pages().take(1) {
            let page = page.unwrap();
            if let Some(ref c) = page.contents {
                println!("{:?}", c);
            }
    
            let content = Content::from_ops(vec![
                Op::MoveTo { p: Point { x: 100., y: 100. } },
                Op::LineTo { p: Point { x: 100., y: 200. } },
                Op::LineTo { p: Point { x: 200., y: 100. } },
                Op::LineTo { p: Point { x: 200., y: 200. } },
                Op::Close,
                Op::Stroke,
            ]);
            pages.push(PageBuilder::from_content(content));
        }
        let catalog = CatalogBuilder::from_pages(pages)
            .build(&mut file).unwrap();
        
        file.update_catalog(catalog).unwrap();
    
        file.save_to(file_path!("modify.pdf")).unwrap();
    }
}

@zemelLeong
Author

Opening modify.pdf produced an error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Try { file: "pdf\\src\\file.rs", line: 277, column: 23, source: FromPrimitive { typ: "RcRef < Catalog >", field: "root", source: Try { file: "pdf\\src\\file.rs", line: 94, column: 19, source: FromPrimitive { typ: "PagesRc", field: "pages", source: Try { file: "pdf\\src\\object\\types.rs", line: 90, column: 20, source: UnexpectedPrimitive { expected: "Reference", found: "Dictionary" } } } } } }', examples\content\src\main.rs:12:49

@s3bk
Contributor

s3bk commented Aug 8, 2021

Yea, I ran into the same problem. This should be fixed now. Try running cargo update (or git pull if you have a local repo).

@zemelLeong
Author

The rewritten content seems to be missing some information.

#[cfg(test)]
mod pdf_test {
    use pdf::content::{Op, Point};
    use pdf::{build::PageBuilder, content::Content, file::File};
    use pdf::build::CatalogBuilder;

    macro_rules! file_path {
        ( $sub_dir:expr ) => { concat!("./src/test/common/", $sub_dir) }
    }

    macro_rules! run {
        ($e:expr) => (
            match $e {
                Ok(v) => v,
                Err(e) => {
                    e.trace();
                    panic!("{}", e);
                }
            }
        )
    }

    #[test]
    pub fn write_pages() {
        let mut file = run!(File::<Vec<u8>>::open(file_path!("xelatex.pdf")));

        let mut pages = Vec::new();
        // for page in file.pages().take(1) {
        //     let page = page.unwrap();
        //     if let Some(ref c) = page.contents {
        //         println!("{:?}", c);
        //     }
    
        //     let content = Content::from_ops(vec![
        //         Op::MoveTo { p: Point { x: 100., y: 100. } },
        //         Op::LineTo { p: Point { x: 100., y: 200. } },
        //         Op::LineTo { p: Point { x: 200., y: 100. } },
        //         Op::LineTo { p: Point { x: 200., y: 200. } },
        //         Op::Close,
        //         Op::Stroke,
        //     ]);
        //     pages.push(PageBuilder::from_content(content));
        // }

        // for page in file.pages() {
        //     if let Some(ref contents) = page.unwrap().contents {
        //         let content = Content::from_ops(contents.operations.to_vec());
        //         pages.push(PageBuilder::from_content(content));
        //     }
        // }

        for page in file.pages().take(2) {
            let content = page.unwrap().contents.clone().unwrap();
            pages.push(PageBuilder::from_content(content));
        }

        let catalog = CatalogBuilder::from_pages(pages)
            .build(&mut file).unwrap();
        
        file.update_catalog(catalog).unwrap();
    
        file.save_to(file_path!("modify.pdf")).unwrap();
    }
}

@zemelLeong
Author

This method worked.

#[cfg(test)]
mod pdf_test {
    use pdf::content::{Op, Point};
    use pdf::{build::PageBuilder, content::Content, file::File};
    use pdf::build::CatalogBuilder;

    macro_rules! file_path {
        ( $sub_dir:expr ) => { concat!("./src/test/common/", $sub_dir) }
    }

    macro_rules! run {
        ($e:expr) => (
            match $e {
                Ok(v) => v,
                Err(e) => {
                    e.trace();
                    panic!("{}", e);
                }
            }
        )
    }

    #[test]
    pub fn write_pages() {
        let mut file = run!(File::<Vec<u8>>::open(file_path!("xelatex.pdf")));

        let mut pages = Vec::new();

        for page in file.pages().take(2) {
            if let Ok(ref page) = page {
                pages.push(PageBuilder::from_page(page).unwrap());
            }
        }

        let catalog = CatalogBuilder::from_pages(pages)
            .build(&mut file).unwrap();
        
        file.update_catalog(catalog).unwrap();
    
        file.save_to(file_path!("modify.pdf")).unwrap();
    }
}
