Locussta/scraping_books_from_internet

Scraping Books from Internet

This project contains a collection of scripts designed to scrape books, primarily from the Internet Archive. The results are stored in CSV files for easy access and processing.

Features

  • Book Scraping: Extract metadata and details of books from the Internet Archive.
  • CSV Output: The scraped data is saved in well-structured CSV files.
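
The two features above can be pictured with a short sketch. It is not taken from the repository's scripts: the column list `FIELDS` and the helper names `to_row` and `write_books_csv` are hypothetical, and the sample records are made up for illustration. It assumes each scraped book arrives as a dictionary in which some values may be lists (for example, several creators), and flattens those into one CSV cell.

```python
import csv
import io

# Assumed column set; the real scripts may export different fields.
FIELDS = ["identifier", "title", "creator", "year"]


def to_row(doc, fields=FIELDS):
    """Flatten one metadata dictionary into a row of plain strings."""
    row = {}
    for name in fields:
        value = doc.get(name, "")  # missing fields become empty cells
        if isinstance(value, (list, tuple)):
            # Join multi-valued fields so they fit in a single cell.
            value = "; ".join(str(item) for item in value)
        row[name] = str(value)
    return row


def write_books_csv(docs, handle, fields=FIELDS):
    """Write a header plus one row per document; return the row count."""
    writer = csv.DictWriter(handle, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    count = 0
    for doc in docs:
        writer.writerow(to_row(doc, fields))
        count += 1
    return count


if __name__ == "__main__":
    # Made-up records, used only to show the output shape.
    sample = [
        {"identifier": "example-0001", "title": "First Example",
         "creator": ["Author A", "Author B"], "year": 1901},
        {"identifier": "example-0002", "title": "Second Example"},
    ]
    buffer = io.StringIO()
    write_books_csv(sample, buffer)
    print(buffer.getvalue())
```

Writing to a file instead of an in-memory buffer only requires passing a handle opened with `open(path, "w", newline="", encoding="utf-8")`.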

Requirements

Ensure you have Python installed. Then, install the necessary dependencies:

pip install -r requirements.txt

Usage

  1. Clone the repository:
    git clone https://github.com/your_username/scraping_books_from_internet.git
  2. Navigate to the project directory:
    cd scraping_books_from_internet
  3. Run one of the scraping scripts:
    python api7.py
    or:
    python scraper.py
  4. The resulting CSV files will be located in the output/ directory.
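
To check what a run produced, a small helper along these lines could list the CSV files in output/ and count their rows. This is only a sketch: the function names `count_rows` and `summarize_output` are hypothetical, and it assumes the files are UTF-8 encoded and start with a header row, which the README does not state.

```python
import csv
from pathlib import Path


def count_rows(csv_path):
    """Return the number of data rows (header excluded) in one CSV file."""
    # UTF-8 is an assumption about how the scripts write their output.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.DictReader(handle))


def summarize_output(directory="output"):
    """Map each CSV file name found in `directory` to its row count."""
    folder = Path(directory)
    if not folder.is_dir():
        return {}  # nothing has been scraped yet, or the path is wrong
    return {path.name: count_rows(path) for path in sorted(folder.glob("*.csv"))}


if __name__ == "__main__":
    for name, rows in summarize_output().items():
        print(f"{name}: {rows} rows")
```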

Current Status

The project is functional but contains some known bugs. If you encounter issues, feel free to report them or contribute a fix.

Contributing

Contributions are welcome! If you'd like to enhance the project or fix bugs:

  • Fork the repository
  • Make your changes
  • Submit a pull request

Let’s improve this project together!

Notes

  • The scraping scripts are optimized for the Internet Archive, but with some modifications they may be adaptable to other sources or to other kinds of data, such as videos and images.
  • Ensure compliance with the terms of service of the websites you scrape.
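
As a rough illustration of adapting a query to other kinds of data, the sketch below builds a search URL for the Internet Archive's public advancedsearch.php endpoint, where the media type is part of the query string (for example `texts`, `movies`, or `image`). Whether api7.py or scraper.py use this endpoint is an assumption, and the helper name `build_search_url` and its defaults are hypothetical. The function only builds the URL; it sends no request, so any rate limiting or terms-of-service checks remain the caller's responsibility.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(mediatype="texts", fields=("identifier", "title"),
                     rows=50, page=1):
    """Return a search URL for one page of items of the given media type."""
    params = [("q", f"mediatype:{mediatype}")]
    # Each requested metadata field is passed as a repeated fl[] parameter.
    params += [("fl[]", name) for name in fields]
    params += [("rows", rows), ("page", page), ("output", "json")]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url())                    # books and other texts
    print(build_search_url(mediatype="movies"))  # videos
    print(build_search_url(mediatype="image"))   # images
```

Keeping the media type as a parameter means the rest of a scraper (paging, parsing, CSV export) can stay the same when the target changes from books to another collection.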

Happy scraping! 🕵️‍♂️📚
