A Puppeteer bridge for PHP, supporting the full API. Based on Rialto, a package to manage Node resources from PHP.
Here are some examples borrowed from Puppeteer's documentation and adapted to PHP's syntax:
Example - navigating to https://example.com and saving a screenshot as example.png:
use Nesk\Puphpeteer\Puppeteer;
$puppeteer = new Puppeteer;
$browser = $puppeteer->launch();
$page = $browser->newPage();
$page->goto('https://example.com');
$page->screenshot(['path' => 'example.png']);
$browser->close();
Example - evaluate a script in the context of the page:
use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;
$puppeteer = new Puppeteer;
$browser = $puppeteer->launch();
$page = $browser->newPage();
$page->goto('https://example.com');
// Get the "viewport" of the page, as reported by the page.
$dimensions = $page->evaluate(JsFunction::create("
    return {
        width: document.documentElement.clientWidth,
        height: document.documentElement.clientHeight,
        deviceScaleFactor: window.devicePixelRatio
    };
"));
printf('Dimensions: %s', print_r($dimensions, true));
$browser->close();
This package requires PHP >= 7.1 and Node >= 8.
Install it with these two commands:
composer require nesk/puphpeteer
npm install @nesk/puphpeteer
Instead of requiring Puppeteer:
const puppeteer = require('puppeteer');
You have to instantiate the Puppeteer class:
$puppeteer = new Puppeteer;
This will create a new Node process controlled by PHP.
You can also pass options to the constructor; see Rialto's documentation.
Note: If you use timeouts higher than 30 seconds in Puppeteer's API, you will have to set a higher value for the read_timeout option (default: 35):
$puppeteer = new Puppeteer([
    'read_timeout' => 65, // In seconds
]);
$puppeteer->launch()->newPage()->goto($url, [
    'timeout' => 60000, // In milliseconds
]);
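The relationship between the two timeouts can be sketched with a small helper. The function name readTimeoutFor and the 5-second margin are assumptions made for illustration, not part of PuPHPeteer; the margin is only inferred from the values above (30 seconds paired with the default of 35, and 60 seconds paired with 65).

```php
/**
 * Hypothetical helper (not part of PuPHPeteer): convert a Puppeteer timeout
 * in milliseconds into a read_timeout in seconds, with a safety margin.
 * The 5-second default margin is an assumption based on the values above.
 */
function readTimeoutFor(int $timeoutMs, int $marginSeconds = 5): int
{
    // Round up to whole seconds, then add the margin.
    return (int) ceil($timeoutMs / 1000) + $marginSeconds;
}

$timeoutMs = 60000; // Timeout handed to Puppeteer, in milliseconds
$options = ['read_timeout' => readTimeoutFor($timeoutMs)]; // In seconds
```

The resulting $options array could then be passed to new Puppeteer($options), with $timeoutMs reused as the 'timeout' value given to goto().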
With PuPHPeteer, every method call or property getting/setting is synchronous.
The following methods have been aliased because PHP doesn't support the $ character in method names:
$ => querySelector
$$ => querySelectorAll
$x => querySelectorXPath
$eval => querySelectorEval
$$eval => querySelectorAllEval
Use these aliases just like you would have used the original methods:
$divs = $page->querySelectorAll('div');
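When porting JavaScript snippets by hand, it may help to keep the alias table in code. The sketch below is purely illustrative: the constant PUPHPETEER_ALIASES and the function phpMethodName are hypothetical names, not something the package provides, and the table only contains the five aliases listed above.

```php
// Hypothetical lookup table mirroring the aliases listed above.
// Single quotes keep PHP from treating "$" as the start of a variable.
const PUPHPETEER_ALIASES = [
    '$'      => 'querySelector',
    '$$'     => 'querySelectorAll',
    '$x'     => 'querySelectorXPath',
    '$eval'  => 'querySelectorEval',
    '$$eval' => 'querySelectorAllEval',
];

/**
 * Return the PHP method name for a Puppeteer method name.
 * Names without an alias are returned unchanged.
 */
function phpMethodName(string $jsMethod): string
{
    return PUPHPETEER_ALIASES[$jsMethod] ?? $jsMethod;
}

echo phpMethodName('$$eval'), PHP_EOL;
```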
Functions evaluated in the context of the page must be written with the JsFunction class. The body of these functions must be written in JavaScript instead of PHP.
use Nesk\Rialto\Data\JsFunction;
$pageFunction = JsFunction::create(['element'], "
    return element.textContent;
");
If an error occurs in Node, a Node\FatalException will be thrown and the process closed. You will then have to create a new instance of Puppeteer.
To avoid that, you can ask Node to catch these errors by prepending your instruction with ->tryCatch:
use Nesk\Rialto\Exceptions\Node;
try {
    $page->tryCatch->goto('invalid_url');
} catch (Node\Exception $exception) {
    // Handle the exception...
}
Instead, a Node\Exception will be thrown, and the Node process will stay alive and usable.
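Because the process stays usable, the ->tryCatch pattern can be wrapped in a small function that reports failure instead of throwing. This is only a sketch: tryNavigate is a hypothetical helper name, and $page is assumed to be a page object obtained from $browser->newPage().

```php
use Nesk\Rialto\Exceptions\Node;

/**
 * Hypothetical helper (not part of PuPHPeteer): navigate to a URL and
 * return false on a recoverable Node error instead of throwing.
 */
function tryNavigate($page, string $url): bool
{
    try {
        // ->tryCatch asks Node to catch the error, so the process survives.
        $page->tryCatch->goto($url);
        return true;
    } catch (Node\Exception $exception) {
        // Recoverable: the same $page can still be used afterwards.
        error_log('Navigation failed: ' . $exception->getMessage());
        return false;
    }
}
```

A caller could then fall back to another URL with the same page, for example by calling tryNavigate($page, $fallbackUrl) when the first call returns false. A Node\FatalException is deliberately not caught here, since the source states that it closes the process.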
The MIT License (MIT). Please see License File for more information.