BookWorm

A webscraper for books.toscrape.com

Installation

  1. Clone the repository: `git clone https://github.com/0ddbird/OC_Python_P2.git`
  2. Navigate to the local directory: `cd OC_Python_P2`
  3. Create a virtual environment: `python -m venv <venv_name>`
  4. Activate the virtual environment: `source <venv_name>/bin/activate`
  5. Install the requirements: `pip install -r requirements.txt`
  6. Run the script as a module (dotted name, not a path): `python -m src.main`

Requirements

Python version and cchardet

For the best performance, run this script with Python 3.9.x.
That setup uses cchardet rather than charset-normalizer as the aiohttp dependency.

If you use Python 3.10 or later, do not install the cchardet package; aiohttp will automatically fall back to charset-normalizer.
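To see which of the two packages is present in the active virtual environment, a small check like the one below can help. It is only a sketch and not part of this project: the function name `available_detector` is hypothetical, and the preference order (cchardet first, then charset-normalizer) is assumed from the description above.

```python
import sys
from importlib.util import find_spec
from typing import Optional, Sequence

# Detectors in order of preference, as described in this README.
# Note that charset-normalizer is imported as "charset_normalizer".
PREFERRED_DETECTORS = ("cchardet", "charset_normalizer")


def available_detector(candidates: Sequence[str] = PREFERRED_DETECTORS) -> Optional[str]:
    """Return the first importable package name, or None if none is installed."""
    for name in candidates:
        # find_spec returns None when a top-level module cannot be found.
        if find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print("Charset detector found:", available_detector() or "none")
```

If this reports `charset_normalizer` on Python 3.9.x, cchardet is probably not installed in the environment.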

Parser options

Two parsers are available in this project.
By default, the script will run using Selectolax.

If you prefer to use Beautiful Soup 4 instead, run `python -m src.main -bs4`.

This project uses the lxml parser over 'html.parser' for Beautiful Soup 4.
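The parser switch described above could be wired up roughly as follows. This is a minimal sketch, not the project's actual code: the function names `build_arg_parser`, `choose_backend`, and `extract_title` are hypothetical, and only the `-bs4` flag, the Selectolax default, and the lxml parser choice come from this README. The third-party imports are done lazily, so only the selected backend needs to be installed.

```python
import argparse
from typing import Optional, Sequence


def build_arg_parser() -> argparse.ArgumentParser:
    """Command-line parser with a single -bs4 switch (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Scrape books.toscrape.com")
    parser.add_argument(
        "-bs4",
        action="store_true",
        help="parse with Beautiful Soup 4 instead of Selectolax",
    )
    return parser


def choose_backend(argv: Optional[Sequence[str]] = None) -> str:
    """Return 'bs4' when -bs4 is passed, otherwise the default 'selectolax'."""
    args = build_arg_parser().parse_args(argv)
    return "bs4" if args.bs4 else "selectolax"


def extract_title(html: str, backend: str) -> str:
    """Extract the <title> text with the selected backend."""
    if backend == "bs4":
        from bs4 import BeautifulSoup  # requires beautifulsoup4 and lxml

        title = BeautifulSoup(html, "lxml").title
        return title.get_text(strip=True) if title is not None else ""

    from selectolax.parser import HTMLParser  # requires selectolax

    node = HTMLParser(html).css_first("title")
    return node.text(strip=True) if node is not None else ""


if __name__ == "__main__":
    print("Selected parser:", choose_backend())
```

Keeping the backend choice in one function means the rest of the scraper only has to pass a single string around, whichever parser is in use.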
