
Commit c8d94f4

Final version of script
1 parent 91bd772 commit c8d94f4


2 files changed (+34, -4 lines)

AUTOMATION/converted_pdf.txt

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+Adobe Acrobat PDF Files
+
+Adobe® Portable Document Format (PDF) is a universal file format that preserves all
+of the fonts, formatting, colours and graphics of any source document, regardless of
+the application and platform used to create it.
+
+Adobe PDF is an ideal format for electronic document distribution as it overcomes the
+problems commonly encountered with electronic file sharing.
+
+• Anyone, anywhere can open a PDF file. All you need is the free Adobe Acrobat
+Reader. Recipients of other file formats sometimes can't open files because they
+don't have the applications used to create the documents.
+
+• PDF files always print correctly on any printing device.
+
+• PDF files always display exactly as created, regardless of fonts, software, and
+operating systems. Fonts, and graphics are not lost due to platform, software, and
+version incompatibilities.
+
+• The free Acrobat Reader is easy to download and can be freely distributed by
+
+anyone.
+
+• Compact PDF files are smaller than their source files and download a
+
+page at a time for fast display on the Web.
+
+
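The committed output contains a layout artifact of the extraction: two sentences are split by a blank line ("distributed by" / "anyone." and "download a" / "page at a time"). One possible cleanup, not part of this commit, is a heuristic post-processing pass. `rejoin_split_lines` is a hypothetical helper, and its rule (rejoin only when the fragment lacks terminal punctuation and the text after the blank line starts lowercase) is an assumption that may misfire on other documents.

```python
TERMINAL = (".", "!", "?", ":", ";")


def rejoin_split_lines(text: str) -> str:
    """Hypothetical helper: merge a line with the next non-blank line when a
    blank gap interrupts a sentence. Heuristic: the fragment must not end in
    terminal punctuation and the continuation must start with a lowercase letter.
    Note: str.splitlines() also splits on form feeds, so page separators are dropped."""
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        current = lines[i]
        i += 1
        while True:
            stripped = current.rstrip()
            # Skip the blank lines that follow the current line.
            j = i
            while j < len(lines) and not lines[j].strip():
                j += 1
            gap = j > i
            if (gap and j < len(lines) and stripped
                    and not stripped.endswith(TERMINAL)
                    and lines[j].lstrip()[:1].islower()):
                # Join the fragment and its continuation, dropping the gap.
                current = stripped + " " + lines[j].strip()
                i = j + 1
            else:
                break
        out.append(current)
    return "\n".join(out)
```

Headings such as "Adobe Acrobat PDF Files" are left alone because the following paragraph starts with an uppercase letter.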

AUTOMATION/pdfToText.py

Lines changed: 6 additions & 4 deletions
@@ -3,13 +3,15 @@
 
 # Extract text with Pdfminer.six Module
 def With_PdfMiner(pdf):
-    with open(pdf,'rb') as file_handle:
-        doc = pdfminer.high_level.extract_text(file_handle)
-        print(doc)
+    with open(pdf,'rb') as file_handle_1:
+        doc = pdfminer.high_level.extract_text(file_handle_1)
+
+    with open('converted_pdf.txt','w') as file_handle_2 :
+        file_handle_2.write(doc)
+
 
 if __name__ == '__main__':
     parser = argparse.ArgumentParser()
     parser.add_argument("file", help = "PDF file from which we extract text")
     args = parser.parse_args()
-    # print()
     With_PdfMiner(args.file)
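The new version writes the extracted text to a fixed `converted_pdf.txt` in the current working directory, using the platform's default encoding. A minimal sketch of a variant, assuming pdfminer.six is installed: it names the output after the input PDF and writes UTF-8 explicitly. The helpers `default_output_path`, `save_text`, `build_parser`, `main` and the `--output` flag are hypothetical and not part of this commit.

```python
import argparse
from pathlib import Path
from typing import Optional


def default_output_path(pdf_path: str) -> Path:
    """Hypothetical helper: put the .txt next to the PDF (docs/a.pdf -> docs/a.txt)."""
    return Path(pdf_path).with_suffix(".txt")


def save_text(text: str, out_path: Path) -> None:
    # Explicit UTF-8 so characters such as "®" and "•" are written the same way on every platform.
    out_path.write_text(text, encoding="utf-8")


def with_pdfminer(pdf_path: str, out_path: Optional[Path] = None) -> Path:
    # Imported lazily so the helpers above can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    text = extract_text(pdf_path)  # accepts a file path or a binary file object
    target = out_path if out_path is not None else default_output_path(pdf_path)
    save_text(text, target)
    return target


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Extract text from a PDF with pdfminer.six")
    parser.add_argument("file", help="PDF file from which we extract text")
    # Hypothetical flag: not part of the committed script.
    parser.add_argument("-o", "--output", help="output .txt path (default: next to the PDF)")
    return parser


def main(argv: Optional[list] = None) -> None:
    args = build_parser().parse_args(argv)
    out = Path(args.output) if args.output else None
    print(f"Wrote {with_pdfminer(args.file, out)}")
```

To use it as a script, call `main()` under the usual `if __name__ == '__main__':` guard; `python pdfToText.py sample.pdf` would then write `sample.txt` beside the PDF instead of overwriting a shared `converted_pdf.txt`.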
