Scrapway lets you build data scraping workflows and run them either manually or via cron jobs. Its visual editor and built-in tasks for user interactions, data extraction, storage, and result delivery let you automate scraping operations with minimal effort.
- Workflow Builder: Use React Flow to create and design scraping tasks visually.
- User Interactions: Automate tasks like navigating to URLs, filling inputs, clicking elements, and scrolling to specific elements.
- Data Extraction: Extract raw HTML, specific text, or use AI-based extraction techniques.
- Data Storage: Read properties from JSON files, and add or update properties to store your scraping results.
- Timing Controls: Add flexibility by waiting for elements before proceeding with actions.
- Result Delivery: Deliver scraping results via Webhook.
- Scheduling: Set up cron jobs to automate running workflows at specific intervals.
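
As an illustration of the Result Delivery feature above, here is a minimal sketch of sending results to a webhook as a JSON POST. The payload shape (`workflowId`, `finishedAt`, `results`) and the function names are assumptions made for this example; they are not taken from Scrapway's source, and the actual request body may differ.

```typescript
// Hypothetical payload shape; Scrapway's actual webhook body may differ.
interface WebhookPayload {
  workflowId: string;
  finishedAt: string; // ISO 8601 timestamp
  results: Record<string, unknown>;
}

// Request options for a JSON POST.
interface WebhookRequest {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Anything that can send the request, for example the global fetch.
type Sender = (
  url: string,
  init: WebhookRequest
) => Promise<{ ok: boolean; status: number }>;

// Builds the request options without sending anything.
function buildWebhookRequest(payload: WebhookPayload): WebhookRequest {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Sends the payload and rejects if the receiver answers with a non-2xx status.
async function deliverViaWebhook(
  url: string,
  payload: WebhookPayload,
  send: Sender
): Promise<void> {
  const response = await send(url, buildWebhookRequest(payload));
  if (!response.ok) {
    throw new Error(`webhook responded with status ${response.status}`);
  }
}
```

Passing the sender as a parameter keeps the sketch easy to try without a network: on Node.js 18 or later you could pass the built-in `fetch`, and in a test you could pass a stub that records the request.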
Here is an overview of the tasks available to be added and run within the workflow:
- NAVIGATE_URL: Navigate to a specific URL.
- FILL_INPUT: Fill an input field.
- CLICK_ELEMENT: Simulate a click event on an element.
- SCROLL_TO_ELEMENT: Scroll to a particular element on the page.
- PAGE_TO_HTML: Extract the entire page as HTML.
- EXTRACT_TEXT_FROM_ELEMENT: Extract text from a specific element.
- EXTRACT_WITH_AI: Use AI to extract and analyze content from the page.
- READ_PROPERTY_FROM_JSON: Read data from a JSON file.
- ADD_PROPERTY_TO_JSON: Add or update properties in a JSON file.
- WAIT_FOR_ELEMENT: Wait for an element to appear before proceeding to the next task.
- DELIVER_VIA_WEBHOOK: Send the results to a Webhook URL.
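
To make the task list more concrete, here is a rough sketch of how a workflow could be described as an ordered list of these tasks. The task names come from the list above; the `WorkflowStep` shape, the input keys (`url`, `selector`, `targetUrl`), and the category mapping are assumptions made for this example and may not match how Scrapway stores workflows internally.

```typescript
// Task names copied from the task overview.
type TaskType =
  | "NAVIGATE_URL"
  | "FILL_INPUT"
  | "CLICK_ELEMENT"
  | "SCROLL_TO_ELEMENT"
  | "PAGE_TO_HTML"
  | "EXTRACT_TEXT_FROM_ELEMENT"
  | "EXTRACT_WITH_AI"
  | "READ_PROPERTY_FROM_JSON"
  | "ADD_PROPERTY_TO_JSON"
  | "WAIT_FOR_ELEMENT"
  | "DELIVER_VIA_WEBHOOK";

// Hypothetical step shape: a task type plus its string inputs.
interface WorkflowStep {
  type: TaskType;
  inputs: Record<string, string>;
}

// Assumed mapping of each task to a feature category.
const TASK_CATEGORY: Record<TaskType, string> = {
  NAVIGATE_URL: "User Interactions",
  FILL_INPUT: "User Interactions",
  CLICK_ELEMENT: "User Interactions",
  SCROLL_TO_ELEMENT: "User Interactions",
  PAGE_TO_HTML: "Data Extraction",
  EXTRACT_TEXT_FROM_ELEMENT: "Data Extraction",
  EXTRACT_WITH_AI: "Data Extraction",
  READ_PROPERTY_FROM_JSON: "Data Storage",
  ADD_PROPERTY_TO_JSON: "Data Storage",
  WAIT_FOR_ELEMENT: "Timing Controls",
  DELIVER_VIA_WEBHOOK: "Result Delivery",
};

// Example workflow: open a page, wait for its heading, extract it, deliver it.
const exampleWorkflow: WorkflowStep[] = [
  { type: "NAVIGATE_URL", inputs: { url: "https://example.com" } },
  { type: "WAIT_FOR_ELEMENT", inputs: { selector: "h1" } },
  { type: "EXTRACT_TEXT_FROM_ELEMENT", inputs: { selector: "h1" } },
  { type: "DELIVER_VIA_WEBHOOK", inputs: { targetUrl: "https://example.com/hook" } },
];

// Returns one readable line per step, in execution order.
function describeWorkflow(steps: WorkflowStep[]): string[] {
  return steps.map(
    (step, index) => `${index + 1}. ${step.type} (${TASK_CATEGORY[step.type]})`
  );
}
```

Typing the task names as a union means a misspelled task name fails at compile time rather than at run time.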
Make sure you have the following installed before running the project:
- Node.js
- npm or yarn
Clone the repository and install dependencies:
git clone https://github.com/baxsm/scrapway.git
cd scrapway
npm install --legacy-peer-deps
To start the project locally:
npm run dev
Visit http://localhost:3000 to start building workflows!
To create a production build:
npm run build
- Open the Workflow Editor: Use the visual interface based on React Flow to drag and drop tasks into your workflow.
- Define Tasks: Choose tasks from categories like User Interactions, Data Extraction, etc.
- Run or Schedule: Once the workflow is complete, either run it manually or schedule the execution using a cron job.
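
When scheduling a workflow, it helps to know what a cron expression means. The sketch below checks whether a given time matches a standard five-field expression (minute, hour, day of month, month, day of week). It is a simplified illustration, not Scrapway's scheduler: the function names are hypothetical, it evaluates times in UTC, it supports only `*`, `*/n`, single numbers, and comma-separated lists, and it requires every field to match, whereas traditional cron treats the two day fields as alternatives when both are restricted.

```typescript
// Checks one cron field against a value.
// Supports "*", "*/n", single numbers, and comma-separated lists of those.
function fieldMatches(field: string, value: number): boolean {
  return field.split(",").some((part) => {
    if (part === "*") return true;
    if (part.startsWith("*/")) {
      const step = Number(part.slice(2));
      return Number.isInteger(step) && step > 0 && value % step === 0;
    }
    return part !== "" && Number(part) === value;
  });
}

// Returns true when the date (read in UTC) matches the five-field expression.
function cronMatches(expression: string, date: Date): boolean {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${fields.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields as [
    string,
    string,
    string,
    string,
    string
  ];
  return (
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(dayOfMonth, date.getUTCDate()) &&
    fieldMatches(month, date.getUTCMonth() + 1) && // cron months are 1-12
    fieldMatches(dayOfWeek, date.getUTCDay()) // 0 = Sunday
  );
}
```

For example, `*/15 * * * *` matches at minutes 0, 15, 30, and 45 of every hour, and `0 9 * * 1` matches at 09:00 on Mondays.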