Scraperdesu

Description

This project is a simple web-scraping tool built with playwright-python and plain old Python. Give it a list of website URLs and it extracts data from each page and saves the content to plain text files.
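The README doesn't show the saving step itself, but the "plain text files with custom file names" idea can be sketched like this. The filename scheme below (host plus path, slashes replaced) is an illustrative assumption, not the project's actual convention:

```python
from pathlib import Path
from urllib.parse import urlparse

def save_page_text(url: str, text: str, out_dir: str = "output") -> Path:
    """Save scraped text under a filename derived from the URL.

    The naming scheme here is illustrative, not necessarily the one
    scraperdesu actually uses.
    """
    parsed = urlparse(url)
    # Build a filesystem-safe stem from host + path,
    # e.g. "https://example.com/about/" -> "example.com_about"
    stem = (parsed.netloc + parsed.path).strip("/").replace("/", "_") or "index"
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{stem}.txt"
    path.write_text(text, encoding="utf-8")
    return path
```

Any scheme works as long as two different URLs can't collide on the same filename.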

Features

  • Extract data from multiple webpages
  • Loop over all links in batches, scraping each batch in parallel
  • Save the extracted content in plain text files with custom file names
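The batched-parallel loop from the second bullet can be sketched with asyncio. The `scrape` coroutine here is a stub standing in for the real Playwright-backed page fetch, and `batch_size` is an assumed knob:

```python
import asyncio

async def scrape(url: str) -> str:
    """Stub for the real playwright-python scrape of one page."""
    await asyncio.sleep(0)  # stand-in for network I/O
    return f"content of {url}"

async def scrape_all(urls: list[str], batch_size: int = 3) -> list[str]:
    """Walk the URL list batch by batch, scraping each batch in parallel."""
    results: list[str] = []
    for i in range(0, len(urls), batch_size):
        batch = urls[i:i + batch_size]
        # asyncio.gather runs the whole batch concurrently,
        # then we move on to the next batch.
        results.extend(await asyncio.gather(*(scrape(u) for u in batch)))
    return results

pages = asyncio.run(
    scrape_all([f"https://example.com/{n}" for n in range(5)], batch_size=2)
)
```

Batching caps how many pages are in flight at once, which keeps memory and connection counts bounded compared with launching every URL simultaneously.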

Built with

Python
Neovim

Setup Guide

To get started, make sure you have Python 3.12.x installed on your system, ideally inside a virtual environment.
Then, follow these steps:

  1. Clone this repository to your local machine using the following command:
    git clone https://github.com/schartz/scraperdesu.git
  2. Navigate to the project directory: cd scraperdesu
  3. Install required dependencies by running: pip install -r requirements.txt
  4. Set up your environment variables: copy the env.sample file in the root of the project directory to .env and adjust the values.
  5. Run the script from the root of the project directory: python main.py
  6. View the output to see the scraped data and saved files' paths.