GitHub - justwordsai/llm-scraper-worker

llm-scraper-worker

A port of llm-scraper to Cloudflare Workers, using the browser rendering api and ai sdk.

Usage

Setup your wrangler.toml

# ...

browser = { binding = "MYBROWSER" }

import { z } from "zod";
import LLMScraper from "llm-scraper-worker";
import puppeteer from "@cloudflare/puppeteer";
import { createOpenAI } from "@ai-sdk/openai";

// ...later, in your worker...

// Launch a browser instance
const browser = await puppeteer.launch(env.MYBROWSER);

// Initialize LLM provider
const openai = createOpenAI({
  apiKey: env.OPENAI_API_KEY, // set this up in .dev.vars / secrets
});
const llm = openai.chat("gpt-4o");

// Create a new LLMScraper
const scraper = new LLMScraper(llm);

// Open new page
const page = await browser.newPage();
await page.goto("https://news.ycombinator.com");

// Define schema to extract contents into
const schema = z.object({
  top: z
    .array(
      z.object({
        title: z.string(),
        points: z.number(),
        by: z.string(),
        commentsURL: z.string(),
      })
    )
    .length(5)
    .describe("Top 5 stories on Hacker News"),
});

// Run the scraper
const { data } = await scraper.run(page, schema, {
  format: "html",
});

await page.close();
await browser.close();

// Show the result from LLM
console.log(data);

This will output:

{
  "top": [
    {
      "title": "A 2-ply minimax chess engine in 84,688 regular expressions",
      "points": 245,
      "by": "ilya_m",
      "commentsURL": "https://news.ycombinator.com/item?id=42619652"
    },
    {
      "title": "Stimulation Clicker",
      "points": 2365,
      "by": "meetpateltech",
      "commentsURL": "https://news.ycombinator.com/item?id=42611536"
    },
    {
      "title": "AI and Startup Moats",
      "points": 37,
      "by": "vismit2000",
      "commentsURL": "https://news.ycombinator.com/item?id=42620994"
    },
    {
      "title": "How I program with LLMs",
      "points": 370,
      "by": "stpn",
      "commentsURL": "https://news.ycombinator.com/item?id=42617645"
    },
    {
      "title": "First time a Blender-made production has won the Golden Globe",
      "points": 155,
      "by": "jgilias",
      "commentsURL": "https://news.ycombinator.com/item?id=42620656"
    }
  ]
}

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.tshy		.tshy
examples		examples
src		src
.gitignore		.gitignore
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
wrangler.toml		wrangler.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Usage

About

Releases

Packages

Languages

justwordsai/llm-scraper-worker

Folders and files

Latest commit

History

Repository files navigation

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages