Scraperdesu

Description

This project is a simple web scraping application built with playwright-python and plain old Python. Given a list of page URLs, it extracts the content of each page and saves it to plain text files.
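To make the "URL in, text file out" idea concrete, here is a minimal sketch of how extracted text might be written to a file named after its URL. It uses only the standard library; the helper names `filename_for_url` and `save_text` and the naming scheme are assumptions for illustration, not the project's actual code.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def filename_for_url(url: str) -> str:
    """Derive a filesystem-safe .txt file name from a URL (hypothetical scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", raw).strip("_")
    return f"{slug or 'page'}.txt"


def save_text(out_dir: Path, url: str, text: str) -> Path:
    """Write the extracted text to out_dir and return the path of the new file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename_for_url(url)
    path.write_text(text, encoding="utf-8")
    return path
```

For example, `save_text(Path("output"), "https://example.com/docs/intro", "hello")` would create `output/example_com_docs_intro.txt`. The query string is ignored in this sketch, so two URLs that differ only in their query would map to the same file.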

Features

Extract data from multiple webpages
Loop over all links in batches, scraping each batch in parallel
Save the extracted content in plain text files with custom file names
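The batched, parallel loop described above can be sketched with asyncio. This is an illustration under stated assumptions rather than the project's implementation: `batched`, `scrape_all`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real Playwright-based coroutine that would open a page and return its text.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def scrape_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    batch_size: int = 3,
) -> dict[str, str]:
    """Process URLs batch by batch; URLs within a batch run concurrently."""
    results: dict[str, str] = {}
    for batch in batched(urls, batch_size):
        # gather() returns results in the same order as the inputs.
        texts = await asyncio.gather(*(fetch(url) for url in batch))
        results.update(zip(batch, texts))
    return results


async def fake_fetch(url: str) -> str:
    """Stand-in for a real page fetch (assumption: the real one uses Playwright)."""
    await asyncio.sleep(0)
    return f"content of {url}"


if __name__ == "__main__":
    demo_urls = [f"https://example.com/{i}" for i in range(5)]
    scraped = asyncio.run(scrape_all(demo_urls, fake_fetch, batch_size=2))
    for url, text in scraped.items():
        print(url, "->", text)
```

Waiting for each batch to finish before starting the next caps the number of pages open at once at `batch_size`, which keeps memory use predictable at the cost of idling while the slowest page in a batch loads.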

Built with

Setup Guide

To get started, make sure you have a Python 3.12.x virtual environment set up on your system.
Then, follow these steps:

Clone this repository to your local machine using the following command:
git clone https://github.com/schartz/scraperdesu.git
Navigate to the project directory: cd scraperdesu
Install required dependencies by running: pip install -r requirements.txt
Adjust your environment settings: create a .env file by copying env.sample from the root of the project directory, then edit its values.
Run the script by executing python main.py from the root of the project directory.
View the output to see the scraped data and saved files' paths.


