This project contains a collection of scripts that scrape book metadata and details, primarily from the Internet Archive. The results are stored in CSV files for easy access and further processing.
- Book Scraping: Extract metadata and details of books from the Internet Archive.
- CSV Output: The scraped data is saved in well-structured CSV files.
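To make the two features above concrete, here is a minimal sketch of how book metadata could be pulled from the Internet Archive's public search endpoint and written to CSV. It is not taken from `scraper.py` or `api7.py`. The function names, the `FIELDS` column selection, and the file name in the usage note are assumptions chosen for illustration; only the standard library is used.

```python
import csv
import json
import urllib.parse
import urllib.request

# Public search endpoint of the Internet Archive.
SEARCH_URL = "https://archive.org/advancedsearch.php"
# Hypothetical column selection; the project's scripts may export other fields.
FIELDS = ["identifier", "title", "creator", "year"]


def build_search_url(query, rows=50):
    """Build a search URL that requests JSON results restricted to text items."""
    params = [
        ("q", f"({query}) AND mediatype:texts"),
        ("rows", rows),
        ("output", "json"),
    ]
    # Each requested field is passed as a repeated "fl[]" parameter.
    params += [("fl[]", field) for field in FIELDS]
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_docs(query, rows=50):
    """Query the search endpoint and return the list of result documents."""
    with urllib.request.urlopen(build_search_url(query, rows), timeout=30) as resp:
        return json.load(resp)["response"]["docs"]


def write_csv(docs, path):
    """Write one CSV row per document, keeping only the columns in FIELDS."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for doc in docs:
            row = {}
            for field in FIELDS:
                value = doc.get(field, "")
                # Multi-valued fields (e.g. several creators) arrive as lists.
                if isinstance(value, list):
                    value = "; ".join(str(v) for v in value)
                row[field] = value
            writer.writerow(row)
```

Under these assumptions, `write_csv(fetch_docs("subject:astronomy", rows=10), "books.csv")` would save up to ten matching records to `books.csv`. Missing fields are written as empty cells so that every row has the same columns.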
Ensure you have Python installed. Then, install the necessary dependencies:
pip install -r requirements.txt
- Clone the repository:
git clone https://github.com/your_username/scraping_books_from_internet.git
- Navigate to the project directory:
cd scraping_books_from_internet
- Run the scraping script:
python api7.py
Or:
python scraper.py
- The resulting CSV files will be located in the output/ directory.
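Once a run has finished, a quick way to check what was produced is to list each CSV file with its columns and row count. The sketch below assumes the files are UTF-8 encoded and have a header row; the helper name `summarize_csv_dir` is hypothetical and not part of the project.

```python
import csv
from pathlib import Path


def summarize_csv_dir(directory="output"):
    """Return {file name: (column names, row count)} for each CSV in directory."""
    summary = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Iterating consumes the data rows; the header is read first.
            row_count = sum(1 for _ in reader)
            summary[path.name] = (reader.fieldnames or [], row_count)
    return summary


for name, (columns, rows) in summarize_csv_dir().items():
    print(f"{name}: {rows} rows, columns = {columns}")
```

If the output/ directory does not exist yet or holds no CSV files, the loop simply prints nothing.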
The project is functional but contains some known bugs. If you encounter issues, feel free to report them or contribute a fix.
Contributions are welcome! If you'd like to enhance the project or fix bugs:
- Fork the repository
- Make your changes
- Submit a pull request
Let’s improve this project together!
- The scraping scripts are optimized for the Internet Archive, but with some modifications they might be adapted to other sources or to other kinds of data, such as videos or images.
- Ensure compliance with the terms of service of the websites you scrape.
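One partial, automatable check related to the last note is to consult a site's robots.txt before requesting a URL. This does not replace reading the terms of service, and the project's scripts are not known to do this; the user agent string and helper name below are hypothetical. The sketch uses the standard library's `urllib.robotparser` and takes the robots.txt content as text, so the rules can be inspected without a network call.

```python
import urllib.robotparser

# Hypothetical user agent string for illustration.
USER_AGENT = "book-scraper-example"


def allowed_by_robots(robots_txt, url, user_agent=USER_AGENT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules, used only to show the behaviour.
sample_rules = "User-agent: *\nDisallow: /private/\n"
print(allowed_by_robots(sample_rules, "https://example.org/details/some-book"))
print(allowed_by_robots(sample_rules, "https://example.org/private/page"))
```

For a real site, the text could come from downloading its `/robots.txt` file, or `RobotFileParser.set_url()` followed by `read()` can fetch it directly.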
Happy scraping! 🕵️‍♂️📚