Commit 67c4c1d

Merge pull request larymak#125 from Sdccoding/main
Created a PDF_Downloader using Python Web Scraping
2 parents 405bc8c + b50178b commit 67c4c1d

4 files changed: 33 additions and 0 deletions

PDF_Downloader/Readme.MD

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+This is the readme file of this project.
+It is a basic script that downloads every PDF linked from a given web page.
+
+
+Install required dependencies
+
+python -m pip install -r requirements.txt
+
+How to run:
+
+python pdf.py
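
As committed, `pdf.py` reads the page address from a `url` variable that is empty by default, so `python pdf.py` only works after that variable has been edited. One possible alternative, shown here purely as a sketch, is to take the address from the command line. The `parse_args` helper and the `--folder` option are hypothetical and are not part of this commit.

```python
import argparse


def parse_args(argv=None):
    """Parse the page URL and output folder (hypothetical command-line interface)."""
    parser = argparse.ArgumentParser(
        description="Download every PDF linked from a web page."
    )
    # Positional argument: the page to scan for links ending in .pdf.
    parser.add_argument("url", help="page to scan for PDF links")
    parser.add_argument(
        "--folder",
        default="./NewFolder",  # same default folder as the committed script
        help="directory in which to save the PDFs",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

With something like this in place, the script could be invoked as `python pdf.py https://example.com/papers --folder out`, and `args.url` and `args.folder` would replace the hard-coded `url` and `folder_location` values.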

PDF_Downloader/pdf.py

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+import os
+import requests
+from urllib.parse import urljoin
+from bs4 import BeautifulSoup
+
+# Put the link to the page from which you need to download all the PDFs
+url = ""
+
+# If there is no such folder, the script will create one automatically
+folder_location = r'./NewFolder'
+os.makedirs(folder_location, exist_ok=True)
+
+response = requests.get(url)
+soup = BeautifulSoup(response.text, "html.parser")
+for link in soup.select("a[href$='.pdf']"):
+    # Name each PDF file after the last portion of its link, which is unique in this case
+    filename = os.path.join(folder_location, link['href'].split('/')[-1])
+    with open(filename, 'wb') as f:
+        f.write(requests.get(urljoin(url, link['href'])).content)
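
The selector `a[href$='.pdf']` matches only links whose `href` ends in lowercase `.pdf`, and the file name is taken from the raw `href`, so a link such as `report.PDF?download=1` would be skipped. The sketch below shows one way the link-discovery and naming steps might be made more tolerant. It is not the author's method: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra dependencies, and `PdfLinkParser`, `find_pdf_links`, and `filename_for` are hypothetical names.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urljoin, urlsplit


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href=...> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        # Check only the URL path, ignoring case, so "b.PDF?x=1" still matches.
        if urlsplit(absolute).path.lower().endswith(".pdf"):
            self.links.append(absolute)


def find_pdf_links(html, base_url):
    """Return the unique PDF URLs found in html, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(parser.links))


def filename_for(pdf_url):
    """Return the last path segment, percent-decoded and without the query string."""
    return os.path.basename(unquote(urlsplit(pdf_url).path))
```

In the committed loop, `find_pdf_links(response.text, url)` could stand in for the `soup.select(...)` call and `filename_for(...)` for the `split('/')[-1]` expression. Two links that share a final path segment would still overwrite each other, as in the original script.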

PDF_Downloader/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+beautifulsoup4==4.10.0
+requests==2.18.4

README.md

Lines changed: 1 addition & 0 deletions
@@ -92,3 +92,4 @@ The contribution guidelines are as per the guide [HERE](https://github.com/larym
 | 49 | [Pomodoro App](https://github.com/HarshitRV/Python-project-Scripts/tree/main/Pomodoro-App) | [HarshitRV](https://github.com/HarshitRV)
 | 49 | [BullsAndCows](https://github.com/HarshitRV/Python-project-Scripts/tree/main/BullsAndCows) | [JerryChen](https://github.com/jerrychen1990)
 | 50 | [Minesweeper AI](https://github.com/nrp114/Minsweeper_AI) | [Nisarg Patel](https://github.com/nrp114)
+| 51 | [PDF Downloader](https://github.com/Sdccoding/Python-project-Scripts/tree/main/PDF_Downloader) | [Souhardya Das Chowdhury](https://github.com/Sdccoding)
