A command-line interface for crawling websites and converting their content to Markdown, with built-in token counting.
- Crawl any website and convert content to Markdown
- Automatically copy output to clipboard
- Count tokens using OpenAI's cl100k_base tokenizer (used by text-embedding-ada-002); see the sketch after this list
- Progress indicator with colorful output
- Flexible output options (file or clipboard-only)
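For reference, counting tokens with the cl100k_base encoding can be done with the tiktoken-rs crate. This is a minimal sketch of that idea, not necessarily the crate or code the CLI actually uses:

```rust
// Minimal token-counting sketch, assuming the `tiktoken-rs` crate; the CLI's
// actual implementation may differ.
use tiktoken_rs::cl100k_base;

fn count_tokens(markdown: &str) -> usize {
    // Load the cl100k_base encoding (the one used by text-embedding-ada-002).
    let bpe = cl100k_base().expect("failed to load cl100k_base encoding");
    // Encode the text and count the resulting tokens.
    bpe.encode_with_special_tokens(markdown).len()
}

fn main() {
    let markdown = "# Example\n\nSome crawled Markdown content.";
    println!("Token count: {}", count_tokens(markdown));
}
```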
- Rust and Cargo (install from rust-lang.org)
- Ubuntu/Debian dependencies:
sudo apt-get install libxcb-shape0-dev libxcb-xfixes0-dev
- Clone the repository:
git clone https://github.com/TheLiberal/CrawlCLI.git
cd CrawlCLI
- Build and install globally:
cargo install --path .
The CLI will be installed to `~/.cargo/bin/crawl`. Make sure `~/.cargo/bin` is in your PATH.
# Basic usage
crawl --url https://example.com
# Specify custom output file
crawl -u https://example.com -o custom_output.md
# Copy to clipboard only (no file output)
crawl -u https://example.com --clipboard-only
# Set custom page limit (default: 50)
crawl -u https://example.com -l 100
# Using API key from environment variable
export FIRECRAWL_API_KEY=your_api_key
crawl -u https://example.com
# Or provide API key directly
crawl -u https://example.com -a your_api_key
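The precedence between the flag and the environment variable can be implemented with nothing more than `std::env`. The sketch below shows the assumed behavior (an explicit key wins, the environment variable is the fallback), not necessarily the project's exact code:

```rust
// Assumed key-resolution behavior: an explicit --api-key value takes precedence,
// otherwise fall back to the FIRECRAWL_API_KEY environment variable.
use std::env;

fn resolve_api_key(cli_key: Option<String>) -> Option<String> {
    cli_key.or_else(|| env::var("FIRECRAWL_API_KEY").ok())
}

fn main() {
    match resolve_api_key(None) {
        Some(_) => println!("Using Firecrawl API key"),
        None => eprintln!("Set FIRECRAWL_API_KEY or pass --api-key"),
    }
}
```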
- `-u, --url <URL>`: The URL to crawl (required)
- `-o, --output <FILE>`: Output file path (optional, defaults to domain name)
- `-l, --limit <NUMBER>`: Maximum number of pages to crawl (default: 50)
- `-a, --api-key <KEY>`: Your Firecrawl API key (can also use FIRECRAWL_API_KEY env var)
- `-c, --clipboard-only`: Only copy to clipboard, don't create file
- `-h, --help`: Show help information
- `-V, --version`: Show version information
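The option set above maps naturally onto a clap derive struct. Whether the project actually uses clap is an assumption; the sketch below just shows one idiomatic way to declare these flags in Rust (requires clap with the "derive" and "env" features):

```rust
// Hypothetical clap definition matching the options above; the real CLI's
// argument parsing may differ.
use clap::Parser;

#[derive(Parser)]
#[command(name = "crawl", version, about = "Crawl a website and convert it to Markdown")]
struct Cli {
    /// The URL to crawl (required)
    #[arg(short, long)]
    url: String,

    /// Output file path (defaults to the domain name)
    #[arg(short, long)]
    output: Option<String>,

    /// Maximum number of pages to crawl
    #[arg(short, long, default_value_t = 50)]
    limit: u32,

    /// Firecrawl API key (falls back to FIRECRAWL_API_KEY)
    #[arg(short, long, env = "FIRECRAWL_API_KEY")]
    api_key: Option<String>,

    /// Only copy to clipboard, don't create file
    #[arg(short, long)]
    clipboard_only: bool,
}

fn main() {
    let cli = Cli::parse();
    println!("Crawling {} (limit: {})", cli.url, cli.limit);
}
```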
The program provides colorful output with:
- Progress spinner while crawling
- Success/error messages in color
- Token count for the generated content
- Clipboard confirmation
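This output style is a common pattern in Rust CLIs. The sketch below assumes the `indicatif` (0.17+), `colored`, and `arboard` crates; the actual CLI may use different crates, so treat this as an illustration of the pattern rather than the project's code:

```rust
// Illustrative sketch of the progress/output behavior described above.
use std::time::Duration;

use arboard::Clipboard;
use colored::Colorize;
use indicatif::ProgressBar;

fn report(markdown: &str, token_count: usize) -> Result<(), Box<dyn std::error::Error>> {
    // Spinner shown while the crawl is in progress.
    let spinner = ProgressBar::new_spinner();
    spinner.set_message("Crawling...");
    spinner.enable_steady_tick(Duration::from_millis(100));

    // ... crawl happens here ...
    spinner.finish_and_clear();

    // Colored success message and token count.
    println!("{}", "Crawl complete".green().bold());
    println!("{} {}", "Token count:".cyan(), token_count);

    // Copy the Markdown to the clipboard and confirm.
    let mut clipboard = Clipboard::new()?;
    clipboard.set_text(markdown.to_string())?;
    println!("{}", "Copied to clipboard".green());
    Ok(())
}
```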
MIT License