An example of using Netlify Primitives (Edge Functions, Serverless Functions, and Durable Caching) to implement serverless browser prerendering for web crawlers and bots.
This project showcases how to build a production-ready prerendering solution using modern Netlify features:
- Edge Functions for crawler detection and routing
- Serverless Functions for dynamic HTML generation with Puppeteer
- Durable Caching for optimized performance and reduced compute costs
- Security measures to prevent abuse and open proxy attacks
Browser/Crawler Request
          ↓
Edge Function (crawler-detector.ts)
          ↓ (if crawler)        ↓ (if regular user)
Serverless Function      →      Static Site
(prerender.mts)
          ↓
Puppeteer + Chrome
          ↓
Cached HTML Response
Purpose: Comprehensive bot and crawler detection at the edge with production-grade filtering.
Features:
- 70+ User agents including AI bots (GPTBot, ClaudeBot, PerplexityBot)
- Smart HTML request detection based on file extensions and patterns
- Accept header validation ensures requests want HTML content
- GET method filtering - only processes GET requests
- Anti-abuse measures - user agent length limits and "Prerender" exclusion
- Legacy crawler support via the _escaped_fragment_ parameter
- Font asset filtering - prevents unnecessary .woff2 prerendering
Enhanced Detection Logic:
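const isCrawlerRequest = (req: Request): boolean => {
  // Only GET requests are candidates for prerendering
  if (req.method !== 'GET') return false;
  const url = new URL(req.url);
  // Legacy AJAX crawling scheme always counts as a crawler request
  if (url.searchParams.has('_escaped_fragment_')) return true;
  const userAgent = req.headers.get('user-agent') || '';
  // Reject empty or oversized user agents, and the prerenderer's own requests
  if (!userAgent || userAgent === 'Prerender' || userAgent.length > 4096) {
    return false;
  }
  const ua = userAgent.toLowerCase();
  // CRAWLER_USER_AGENTS holds lowercase substrings such as 'googlebot'
  return CRAWLER_USER_AGENTS.some(bot => ua.includes(bot));
};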
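The feature list also mentions HTML request detection by file extension and Accept header, which the excerpt above does not show. A minimal sketch of how such a check might look follows; the helper name `wantsHtml` and the extension list are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical helper: the extension list is illustrative, not the project's own.
const NON_HTML_EXTENSIONS = [
  '.js', '.css', '.png', '.jpg', '.svg', '.ico', '.json', '.woff', '.woff2',
];

function wantsHtml(pathname: string, acceptHeader: string | null): boolean {
  const path = pathname.toLowerCase();
  // Static assets (including fonts) never need prerendering.
  if (NON_HTML_EXTENSIONS.some(ext => path.endsWith(ext))) return false;
  // A missing Accept header is treated like "*/*" (assumption).
  if (!acceptHeader) return true;
  const accept = acceptHeader.toLowerCase();
  return accept.includes('text/html') || accept.includes('*/*');
}
```

In an edge function this could be combined with the user-agent check, for example `isCrawlerRequest(req) && wantsHtml(url.pathname, req.headers.get('accept'))`.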
Purpose: Generate SEO-optimized HTML using headless Chrome.
Core Technologies:
- Puppeteer for browser automation
- @sparticuz/chromium for AWS Lambda/Netlify compatibility
- TypeScript for type safety
Security Features:
- Same-host validation prevents open proxy abuse
- Protocol restrictions (HTTP/HTTPS only)
- Private network blocking in production
- Request monitoring with IP logging
Performance Optimizations:
- Request interception blocks tracking scripts and ads
- DOM cleanup removes cookie banners and modals
- Network request monitoring waits for content to fully load
- Intelligent timing balances completeness with performance
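Request interception needs a rule for which requests to drop. The sketch below shows one possible rule based on hostname matching; the function name `shouldBlockRequest` and the blocklist entries are assumptions chosen for illustration, and the project's real list may differ.

```typescript
// Illustrative blocklist (assumption): common analytics and ad hosts.
const BLOCKED_HOSTS = [
  'google-analytics.com',
  'googletagmanager.com',
  'doubleclick.net',
];

function shouldBlockRequest(url: string): boolean {
  let hostname: string;
  try {
    hostname = new URL(url).hostname;
  } catch {
    // Unparsable URLs are left for the browser to handle.
    return false;
  }
  // Match the host itself or any of its subdomains.
  return BLOCKED_HOSTS.some(
    blocked => hostname === blocked || hostname.endsWith('.' + blocked),
  );
}
```

With Puppeteer, a rule like this would typically be wired up by calling `page.setRequestInterception(true)` and then, in a `request` event handler, calling `request.abort()` when `shouldBlockRequest(request.url())` is true and `request.continue()` otherwise.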
SPA Readiness Detection:
- window.prerenderReady - Apps can signal when content is ready
- Request tracking fallback - Monitors network activity when prerenderReady is not used
- Smart timeouts - 9-second maximum wait prevents indefinite hanging
- Dual strategy - Handles both modern SPAs and static content
Prerender.io Compatibility:
- Status code handling via <meta name="prerender-status-code" content="404">
- Redirect support via <meta name="prerender-header" content="Location: /new-url">
- Legacy crawler support via the _escaped_fragment_ parameter
- Content-based caching strategies
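To make the meta-tag conventions concrete, here is a rough sketch of extracting them from rendered HTML. The function name `extractPrerenderMeta` is hypothetical, and the regular expressions assume the `name` attribute comes before `content`; a real implementation running inside Puppeteer would more likely query the DOM.

```typescript
interface PrerenderMeta {
  statusCode: number | null;
  headers: Record<string, string>;
}

function extractPrerenderMeta(html: string): PrerenderMeta {
  const result: PrerenderMeta = { statusCode: null, headers: {} };

  // <meta name="prerender-status-code" content="404">
  const statusMatch = html.match(
    /<meta\s+name=["']prerender-status-code["']\s+content=["'](\d{3})["']/i,
  );
  if (statusMatch) result.statusCode = Number(statusMatch[1]);

  // <meta name="prerender-header" content="Location: /new-url">
  const headerPattern =
    /<meta\s+name=["']prerender-header["']\s+content=["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = headerPattern.exec(html)) !== null) {
    const separator = match[1].indexOf(':');
    if (separator === -1) continue; // Not a "Name: value" pair
    const name = match[1].slice(0, separator).trim();
    result.headers[name] = match[1].slice(separator + 1).trim();
  }
  return result;
}
```

The extracted status code and headers can then be applied to the `Response` the function returns, so a crawler sees a real 404 or redirect instead of a 200 page.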
Cache Invalidation:
- Automatic via stale-while-revalidate
- Manual via Netlify API or redeploys
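As an illustration of the automatic path, the sketch below builds response headers that opt into Netlify's durable cache with `stale-while-revalidate`. The function name `buildCacheHeaders`, the TTL values, and the choice not to cache 5xx responses are assumptions, not the project's actual settings.

```typescript
// TTLs below are illustrative assumptions.
const CDN_MAX_AGE_SECONDS = 3600;
const STALE_WHILE_REVALIDATE_SECONDS = 86400;

function buildCacheHeaders(statusCode: number): Record<string, string> {
  // Assumption: server errors are not worth caching.
  if (statusCode >= 500) {
    return { 'Cache-Control': 'no-store' };
  }
  return {
    // Clients revalidate with the CDN on every request.
    'Cache-Control': 'public, max-age=0, must-revalidate',
    // The CDN keeps the response in the durable cache and serves stale
    // content while a fresh copy is generated in the background.
    'Netlify-CDN-Cache-Control':
      `public, durable, s-maxage=${CDN_MAX_AGE_SECONDS}, ` +
      `stale-while-revalidate=${STALE_WHILE_REVALIDATE_SECONDS}`,
  };
}
```

Longer `stale-while-revalidate` windows reduce how often a crawler waits on a full Chrome render, at the cost of serving older HTML.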
The system uses a two-tier approach:
- Edge Function (Fast): Basic user-agent detection at CDN edge
- Prerender Function (Thorough): Full HTML generation with Chrome
Local Development:
// Uses bundled Chromium from Puppeteer
browser = await puppeteer.launch({
  headless: true,
  args: ['--no-sandbox', '--disable-setuid-sandbox']
});
Production (Netlify):
// Uses optimized Chrome binary
browser = await puppeteer.launch({
  executablePath: await chromium.executablePath(),
  args: [...chromium.args, '--hide-scrollbars'],
  headless: chromium.headless
});
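Choosing between the two launch configurations requires some form of environment detection. A minimal sketch, assuming the decision is based on the NETLIFY and AWS_LAMBDA_FUNCTION_NAME variables; the helper name `isServerlessEnvironment` is hypothetical.

```typescript
type EnvVars = Record<string, string | undefined>;

// Hypothetical helper: returns true when the code appears to run on
// Netlify or AWS Lambda rather than on a developer machine.
function isServerlessEnvironment(env: EnvVars): boolean {
  return env.NETLIFY === 'true' || Boolean(env.AWS_LAMBDA_FUNCTION_NAME);
}
```

Called as `isServerlessEnvironment(process.env)`, the result can select the @sparticuz/chromium launch options in production and Puppeteer's bundled Chromium locally. Taking the environment as a parameter keeps the check easy to test.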
Same-Host Validation:
const requestHost = new URL(req.url).host;
const targetHost = new URL(targetUrl).host;
if (targetHost !== requestHost) {
  return new Response('Invalid target URL: must be same host', { status: 403 });
}
Private Network Protection:
const isPrivateNetwork =
hostname === 'localhost' ||
hostname.startsWith('192.168.') ||
hostname.startsWith('10.') ||
// ... other private ranges
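The excerpt above elides the remaining ranges. As a standalone illustration, the sketch below checks a hostname against the loopback, RFC 1918 private, and link-local IPv4 ranges. The function name `isPrivateHostname` and the exact set of ranges are assumptions and may not match what the project checks.

```typescript
function isPrivateHostname(hostname: string): boolean {
  const host = hostname.toLowerCase();
  // URL.hostname wraps IPv6 literals in brackets, so accept both forms.
  if (host === 'localhost' || host === '::1' || host === '[::1]') return true;

  const octets = host.split('.').map(Number);
  const isIPv4 =
    octets.length === 4 &&
    octets.every(o => Number.isInteger(o) && o >= 0 && o <= 255);
  if (!isIPv4) return false; // Ordinary domain names are not treated as private

  const [a, b] = octets;
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // 127.0.0.0/8 loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    (a === 169 && b === 254)             // 169.254.0.0/16 link-local
  );
}
```

Parsing octets numerically avoids false positives that prefix checks can produce, such as a public domain whose name happens to start with `10.`. A hostname check alone does not stop a public domain that resolves to a private address.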
- Deploy to Netlify:
npm run build
netlify deploy --prod
- Test crawler detection:
# Regular user - gets React app
curl https://your-site.netlify.app/
# Crawler - gets prerendered HTML
curl -H "User-Agent: Googlebot" https://your-site.netlify.app/
- ?prerender=true - Force prerendering for testing
- ?_escaped_fragment_= - Legacy crawler parameter (also forces prerendering)
# Force prerender
curl "https://your-site.netlify.app/?prerender=true"
# Legacy crawler format
curl "https://your-site.netlify.app/?_escaped_fragment_="
# Test specific page
curl -H "User-Agent: Googlebot" "https://your-site.netlify.app/about"
PRERENDER SUCCESS: https://site.com/page | 1234ms total (567ms nav, 890ms wait) | 200 status | 45678B HTML | prerenderReady=true | requests=15/18 (0 pending) | domCleanup=3 | IP=1.2.3.4
PRERENDER ERROR: https://site.com/page | 1234ms | Navigation timeout | IP=1.2.3.4
PRERENDER ERROR STACK: [full stack trace]
- Response times (target: <3s)
- Success rates (target: >95%)
- Cache hit rates (target: >80%)
- Chrome memory usage
- Request blocking effectiveness
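Some of these metrics can be derived directly from the log lines shown earlier. The sketch below computes a success rate and average total render time from lines in that format; the function name `summarizePrerenderLogs` is hypothetical, and the parsing assumes the `PRERENDER SUCCESS:` and `PRERENDER ERROR:` prefixes exactly as shown.

```typescript
interface PrerenderStats {
  total: number;
  successRate: number;      // 0..1, or 0 when there are no entries
  averageMs: number | null; // Mean "ms total" across successful renders
}

function summarizePrerenderLogs(lines: string[]): PrerenderStats {
  let successes = 0;
  let errors = 0;
  const durations: number[] = [];

  for (const line of lines) {
    if (line.startsWith('PRERENDER SUCCESS:')) {
      successes++;
      const match = line.match(/\|\s*(\d+)ms total/);
      if (match) durations.push(Number(match[1]));
    } else if (line.startsWith('PRERENDER ERROR:')) {
      // "PRERENDER ERROR STACK:" lines do not match this prefix,
      // so each failure is counted once.
      errors++;
    }
  }

  const total = successes + errors;
  return {
    total,
    successRate: total === 0 ? 0 : successes / total,
    averageMs:
      durations.length === 0
        ? null
        : durations.reduce((sum, ms) => sum + ms, 0) / durations.length,
  };
}
```

Comparing `successRate` against the 95% target and `averageMs` against the 3s target gives a quick health check from exported function logs.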
- NODE_ENV=production - Enables production optimizations
- NETLIFY=true - Auto-set by Netlify, enables Lambda Chrome
- AWS_LAMBDA_FUNCTION_NAME - Auto-set, helps detect serverless environment
[build]
command = "npm run build"
publish = "dist"
[[redirects]]
from = "/*"
to = "/index.html"
status = 200
- Browser instance reuse across requests
- Minimal dependencies in function bundle
- Fast environment detection
- Automatic page cleanup after rendering
- Browser connection pooling
- Efficient DOM manipulation
- Intelligent caching reduces function invocations
- Request blocking reduces network usage
- Smart timeouts prevent runaway functions
This demo includes a comprehensive testing suite to demonstrate different prerendering scenarios:
- /test-crawler-detection.html - Tests enhanced crawler detection with 70+ user agents and edge cases
- /test-prerender-ready-fast.html - Tests window.prerenderReady with quick (1s) completion
- /test-prerender-ready-timeout.html - Tests timeout behavior when prerenderReady never triggers
- /test-request-tracking.html - Tests fallback request monitoring without prerenderReady
- /test-status-code-404.html - Tests custom HTTP status codes via meta tags
- /test-redirect.html - Tests HTTP redirects via meta tags
Visit /test-index.html for an interactive test dashboard with:
- Visual test cards for each scenario
- Browser testing links - Click "🤖 Test Prerender" to trigger prerendering in your browser
- Copy-paste curl commands for command-line testing
- Expected timing and behavior for each test
Browser Testing:
- Normal view: See the regular browser experience
- Prerender view: Add ?_escaped_fragment_= to trigger prerendering
Command Line Testing:
# Test enhanced crawler detection
curl -H "User-Agent: GPTBot/1.0" "https://your-site.netlify.app/test-crawler-detection.html"
curl -H "User-Agent: ClaudeBot/1.0" "https://your-site.netlify.app/test-crawler-detection.html"
# Test as Googlebot (triggers prerendering)
curl -H "User-Agent: Googlebot" "https://your-site.netlify.app/test-prerender-ready-fast.html"
# Test timing scenarios
time curl -H "User-Agent: Googlebot" "https://your-site.netlify.app/test-prerender-ready-timeout.html"
# Test status codes
curl -I -H "User-Agent: Googlebot" "https://your-site.netlify.app/test-status-code-404.html"
# Test redirects
curl -I -H "User-Agent: Googlebot" "https://your-site.netlify.app/test-redirect.html"
# Test edge cases (should NOT prerender)
curl -X POST -H "User-Agent: Googlebot" "https://your-site.netlify.app/" # POST method
curl -H "User-Agent: Prerender" "https://your-site.netlify.app/" # Excluded UA
curl -H "User-Agent: Googlebot" "https://your-site.netlify.app/font.woff2" # Font file
How it works:
// In your SPA, signal when content is ready
window.prerenderReady = false; // Initial state
// After your app finishes loading/rendering
setTimeout(() => {
  window.prerenderReady = true; // Signals prerender completion
}, 1000);
Behavior:
- When used: Prerender function waits for prerenderReady = true (up to 9s timeout)
- When not used: Falls back to monitoring network requests (waits 500ms after last request)
- Best practice: Set to true when your SPA content is fully rendered and ready for crawlers
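The behavior described above can be expressed as a small decision function that a polling loop evaluates on each tick. This is a sketch under stated assumptions: the names `PageState` and `decideReadiness` are hypothetical, and only the 9s and 500ms values come from the description above.

```typescript
interface PageState {
  // undefined when the app never defines window.prerenderReady
  prerenderReady: boolean | undefined;
  pendingRequests: number;
  msSinceLastRequest: number;
  elapsedMs: number;
}

const MAX_WAIT_MS = 9000;     // Overall timeout
const NETWORK_QUIET_MS = 500; // Quiet period for the fallback strategy

type ReadyDecision = 'ready' | 'wait' | 'timeout';

function decideReadiness(state: PageState): ReadyDecision {
  // Strategy 1: the app explicitly signalled completion.
  const signalled = state.prerenderReady === true;
  // Strategy 2: no flag in use, so wait for the network to go quiet.
  const networkQuiet =
    state.prerenderReady === undefined &&
    state.pendingRequests === 0 &&
    state.msSinceLastRequest >= NETWORK_QUIET_MS;

  if (signalled || networkQuiet) return 'ready';
  return state.elapsedMs >= MAX_WAIT_MS ? 'timeout' : 'wait';
}
```

Note that an app which sets `prerenderReady = false` and never flips it to `true` always runs into the timeout in this sketch, which is the scenario /test-prerender-ready-timeout.html exercises.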
See /TESTING.md for comprehensive testing instructions, including:
- Detailed test scenarios and expected results
- Performance benchmarks and timing expectations
- Troubleshooting guides for common issues
- Advanced testing techniques and automation scripts
Chrome not found:
- Ensure @sparticuz/chromium is installed
- Check environment detection logs
- Verify production environment variables
Timeout errors:
- Check if window.prerenderReady is being set properly
- Verify network requests are completing
- Monitor function logs for timing details
Abuse prevention:
- Host validation will block external URLs
- Monitor IP addresses in logs
- Add rate limiting if needed
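If rate limiting turns out to be necessary, one simple option is a fixed-window counter keyed by client IP. The sketch below is illustrative only: the class name `FixedWindowRateLimiter` is hypothetical, and because it keeps state in memory, counts are not shared across function instances or guaranteed to survive between invocations. A shared store would be needed for strict enforcement.

```typescript
interface WindowEntry {
  windowStart: number;
  count: number;
}

class FixedWindowRateLimiter {
  private readonly hits = new Map<string, WindowEntry>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(ip: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // First request from this IP, or the previous window has expired.
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    entry.count++;
    return entry.count <= this.limit;
  }
}
```

A prerender function could call `allow(clientIp)` before launching Chrome and return a 429 response when it yields false.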
This is an example project showcasing Netlify's prerendering capabilities. Feel free to:
- Fork and adapt for your use case
- Submit issues for bugs or improvements
- Share your own prerendering strategies
MIT License - feel free to use this code in your own projects.