A Python application that automates the process of converting PDF books into high-quality audiobooks. It leverages the Anthropic API for text optimization and the ElevenLabs API for text-to-speech conversion.
- 📚 Extracts text from PDF books with intelligent chapter detection
- 🌐 Translates content from English to Czech using multiple translation services
- ✨ Optimizes text for natural-sounding speech synthesis using Anthropic's API
- 🎧 Generates high-quality audio with word-level timing using ElevenLabs API
- 💾 Stores all processing artifacts in MongoDB
- 📊 Provides detailed progress tracking and error recovery
- 🔄 Supports resuming from the last successful stage
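Word-level timing can be derived from character-level alignment data. The sketch below assumes the synthesis step returns per-character start and end times (ElevenLabs' timestamped text-to-speech endpoint returns alignment data of this kind). `WordTiming` and `words_from_char_alignment` are hypothetical names, not part of the `eleven_audiobooks` API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the audio
    end: float  # seconds from the start of the audio


def words_from_char_alignment(
    characters: list[str],
    starts: list[float],
    ends: list[float],
) -> list[WordTiming]:
    """Group per-character timings into per-word timings, splitting on whitespace."""
    words: list[WordTiming] = []
    buf: list[str] = []
    word_start = 0.0
    word_end = 0.0
    for ch, s, e in zip(characters, starts, ends):
        if ch.isspace():
            # Whitespace closes the word being built, if any
            if buf:
                words.append(WordTiming("".join(buf), word_start, word_end))
                buf = []
            continue
        if not buf:
            word_start = s  # first character of a new word
        buf.append(ch)
        word_end = e  # keep extending the word's end time
    if buf:  # flush the final word
        words.append(WordTiming("".join(buf), word_start, word_end))
    return words
```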
- Python 3.8 or higher
- MongoDB 4.4 or higher
- API keys for:
  - Anthropic (Claude)
  - ElevenLabs
  - DeepL (optional, for translation)
```bash
pip install eleven-audiobooks
```
- Clone the repository:
  ```bash
  git clone https://github.com/sparesparrow/eleven-audiobooks.git
  cd eleven-audiobooks
  ```
- Create a virtual environment and install dependencies:
  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  pip install -e ".[dev,test]"
  ```
Set up the following environment variables:

```bash
export ANTHROPIC_API_KEY=your_anthropic_api_key
export ELEVENLABS_API_KEY=your_elevenlabs_api_key
export DEEPL_API_KEY=your_deepl_api_key  # Optional, for translation
export MONGO_URI=mongodb://localhost:27017/  # Optional, defaults to localhost
export VOICE_ID=your_preferred_voice_id  # Optional, defaults to a standard voice
```
Or create a `.env` file in the project root:

```
ANTHROPIC_API_KEY=your_anthropic_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
DEEPL_API_KEY=your_deepl_api_key
MONGO_URI=mongodb://localhost:27017/
VOICE_ID=your_preferred_voice_id
```
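The settings above split into required keys and optional keys with defaults. A minimal sketch of how such configuration might be resolved, using a hypothetical `load_config` helper that is not part of `eleven_audiobooks`. The `None` defaults stand for "let the application choose", since the actual default voice is not documented here. If the `.env` file is read with python-dotenv (an assumption), calling `dotenv.load_dotenv()` first would copy its entries into `os.environ`.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

# Hypothetical split between required and optional settings, based on the list above
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY")
OPTIONAL_DEFAULTS: dict[str, Optional[str]] = {
    "DEEPL_API_KEY": None,  # translation disabled unless provided
    "MONGO_URI": "mongodb://localhost:27017/",
    "VOICE_ID": None,  # placeholder: the application picks its own default voice
}


def load_config(env: Mapping[str, str] | None = None) -> dict[str, Optional[str]]:
    """Resolve settings from env (defaults to os.environ), failing fast on missing keys."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    config: dict[str, Optional[str]] = {key: env[key] for key in REQUIRED_KEYS}
    for key, default in OPTIONAL_DEFAULTS.items():
        config[key] = env.get(key, default)
    return config
```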
Process a PDF book:

```bash
python -m eleven_audiobooks data/book.pdf
```

Process and translate a book:

```bash
python -m eleven_audiobooks data/book.pdf --translate
```
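Use the pipeline from Python:

```python
import asyncio
import os
from pathlib import Path

from pymongo import MongoClient  # assumption: PipelineManager accepts a PyMongo client

from eleven_audiobooks import PipelineManager


async def process_book(pdf_path: str) -> str:
    # Replace with however your deployment creates its MongoDB client
    mongo_client = MongoClient(os.environ.get("MONGO_URI", "mongodb://localhost:27017/"))
    pipeline = PipelineManager(
        pdf_path=Path(pdf_path),
        output_dir=Path("./output"),
        mongo_db=mongo_client,
        config={
            "ANTHROPIC_API_KEY": os.environ["ANTHROPIC_API_KEY"],
            "ELEVENLABS_API_KEY": os.environ["ELEVENLABS_API_KEY"],
            "DEEPL_API_KEY": os.environ.get("DEEPL_API_KEY", ""),  # Optional, for translation
        },
    )
    # Process the book and get the audiobook URL
    audiobook_url = await pipeline.process(translate=False)
    return audiobook_url


# Run the pipeline
url = asyncio.run(process_book("path/to/book.pdf"))
print(f"Audiobook available at: {url}")
```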
The project is currently under active development, with several key components functional and others in progress:
- ✅ Core Pipeline - Complete
- ✅ PDF Processing - Basic functionality working, enhancements in progress
- ✅ Translation - Basic functionality working, service abstraction in progress
- ✅ Text Optimization - Working, with improved handling of large inputs
- ✅ Audio Generation - Working, with chunking and retries
- ✅ Storage Engine - Complete with data validation and indexing
The application follows a modular pipeline architecture with enhanced error handling and recovery mechanisms. Refer to `TODO.txt` for detailed implementation plans and progress tracking.
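Recovery in a staged pipeline usually means recording each completed stage so a rerun can skip it. The sketch below illustrates that idea with a hypothetical `run_resumable` helper that checkpoints to a JSON file; the project itself stores artifacts in MongoDB, so the file is only a stand-in, and the stage names in the usage are illustrative.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable, List, Tuple

# A stage is (name, function); each function takes and returns JSON-serializable data
Stage = Tuple[str, Callable[[dict], dict]]


def run_resumable(stages: List[Stage], state_file: Path) -> dict:
    """Run stages in order, skipping those already recorded as complete in state_file."""
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        state = {"completed": [], "data": {}}
    for name, fn in stages:
        if name in state["completed"]:
            continue  # finished in an earlier run; reuse its saved output
        state["data"] = fn(state["data"])
        state["completed"].append(name)
        # Checkpoint after every stage so a crash resumes from the last successful one
        state_file.write_text(json.dumps(state))
    return state["data"]
```

If the second stage raises, the checkpoint still lists the first stage as complete, so the next call starts directly at the second stage.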
- BatchTextOptimizer Integration - Fixed the implementation of the `optimize_chapter` method so that it handles file operations correctly
- Storage Engine URL Generation - Updated to handle both file IDs and paths correctly
- PDF Processor Text Cleaning - Improved OCR correction to preserve numeric values with context-aware processing
- Translation Chunk Recombination - Enhanced chapter boundary preservation during translation
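Context-aware OCR correction means fixing look-alike characters only where the surrounding context says a number is intended. A minimal sketch of that idea, assuming a hypothetical `fix_ocr_digits` helper and a small illustrative confusion table; it is not the project's actual cleaning logic. Letters are converted to digits only inside whole tokens that already start with a digit, so ordinary words and tokens such as `1990s` or `3lbs` are left untouched.

```python
import re

# Illustrative OCR confusions inside numbers: letter -> digit
_OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# A whole token that starts with a digit and contains only digits,
# confusable letters, and the separators "." and ","
_NUMERIC_TOKEN = re.compile(r"\b\d[\dOolI.,]*\b")


def fix_ocr_digits(text: str) -> str:
    """Replace look-alike letters with digits only inside numeric-looking tokens."""
    return _NUMERIC_TOKEN.sub(lambda m: m.group(0).translate(_OCR_DIGIT_FIXES), text)
```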
- Text Optimization
  - Added rate limiting and concurrency control for API calls
  - Improved text splitting to preserve paragraph and sentence structure
  - Implemented retry mechanism with exponential backoff
- Audio Generation
  - Added text chunking for handling large inputs
  - Implemented audio file concatenation
  - Added retry logic with configurable parameters
- Storage Engine
  - Added data validation before storage operations
  - Implemented versioning for all stored artifacts
  - Added proper MongoDB indexing for efficient queries
  - Enhanced cleanup mechanism with project-specific options
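Both text optimization and audio generation split large inputs into chunks. A sketch of structure-preserving splitting, assuming a hypothetical `split_text` helper and a naive sentence boundary rule (whitespace after `.`, `!` or `?`); the project's real splitter may differ. Whole paragraphs are packed into chunks first, and only a paragraph that is too long on its own is broken at sentence boundaries.

```python
from __future__ import annotations

import re
from typing import List

# Naive sentence boundary: whitespace preceded by ".", "!" or "?"
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_text(text: str, max_chars: int) -> List[str]:
    """Pack whole paragraphs (falling back to whole sentences) into chunks of at most max_chars.

    A single sentence longer than max_chars is emitted as its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""

    def add(piece: str, sep: str) -> None:
        nonlocal current
        if not current:
            current = piece
        elif len(current) + len(sep) + len(piece) <= max_chars:
            current += sep + piece
        else:
            chunks.append(current)
            current = piece

    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if len(para) <= max_chars:
            add(para, "\n\n")
        else:
            # Paragraph too long: start a fresh chunk and split at sentence boundaries
            if current:
                chunks.append(current)
                current = ""
            for sentence in _SENTENCE_END.split(para):
                add(sentence, " ")
    if current:
        chunks.append(current)
    return chunks
```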
Please refer to the `TODO.txt` file for current issues and planned improvements. The main limitations currently are:
- OCR quality can be inconsistent with certain PDF formats
- Translation may not preserve all nuances of the original text
- Large files may require significant processing time and resources
- API rate limits may affect processing speed
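Rate limits are typically handled by capping concurrent requests and retrying failed calls with exponential backoff, which the changelog above lists for text optimization. A generic sketch of that pattern, using a hypothetical `call_with_retry` helper rather than the project's implementation. It retries on any exception for brevity; a real client would retry only transient errors such as HTTP 429 responses.

```python
from __future__ import annotations

import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def call_with_retry(
    fn: Callable[[], Awaitable[T]],
    semaphore: asyncio.Semaphore,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Await fn() under a concurrency limit, retrying failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            async with semaphore:  # at most N calls in flight at once
                return await fn()
        except Exception:  # simplification: retry every error
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Backoff of base, 2*base, 4*base, ... capped at max_delay, plus random jitter.
            # Sleeping outside the semaphore frees the slot for other requests.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Sharing one `asyncio.Semaphore` across all tasks that call the same API keeps the number of concurrent requests bounded, while the jitter prevents retries from firing in lockstep.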
```bash
# Run all tests
pytest

# Run with coverage report
pytest --cov=eleven_audiobooks

# Run specific test file
pytest tests/test_pdf_processor.py
```
```bash
# Format code
black .

# Sort imports
isort .

# Lint code
ruff check .

# Type check
mypy .
```
When you encounter issues, check the log file at `audiobook.log` for detailed information. You can increase verbosity with:

```bash
export LOG_LEVEL=DEBUG
```
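One common way for an application to honor `LOG_LEVEL` is sketched below with the standard `logging` module. `configure_logging` is a hypothetical helper; this is not necessarily how `eleven_audiobooks` sets up its logger. Unknown level names fall back to `INFO`.

```python
import logging
import os


def configure_logging(log_file: str = "audiobook.log") -> int:
    """Send logs to log_file at the level named by LOG_LEVEL (default INFO); return that level."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # unrecognized names fall back to INFO
    # Note: basicConfig does nothing if the root logger already has handlers
    logging.basicConfig(
        filename=log_file,
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return level
```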
Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Anthropic for their Claude API
- ElevenLabs for their text-to-speech API
- DeepL for their translation API
- All contributors who have helped improve this project