
Commit

Merge pull request yujiosaka#67 from yujiosaka/add_max_depth_to_constructor_options

maxDepth is a constructor option
yujiosaka authored Jan 12, 2018
2 parents 86b3a02 + ff11d53 commit 9897273
Showing 3 changed files with 5 additions and 2 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -146,6 +146,7 @@ HCCrawler.launch({
* `options` <[Object]>
* `maxConcurrency` <[number]> Maximum number of pages to open concurrently, defaults to `10`.
* `maxRequest` <[number]> Maximum number of requests, defaults to `0`. Pass `0` to disable the limit.
* `maxDepth` <[number]> Maximum depth for the crawler to follow links automatically, defaults to `1`. Leave it at the default to disable following links.
* `exporter` <[Exporter]> An exporter object which extends [BaseExporter](#class-baseexporter)'s interfaces to export results, defaults to `null`.
* `cache` <[Cache]> A cache object which extends [BaseCache](#class-basecache)'s interfaces to remember and skip duplicate requests, defaults to a [SessionCache](#class-sessioncache) object.
* `persistCache` <[boolean]> Whether to clear cache on closing or disconnecting from the browser, defaults to `false`.
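With this commit, `maxDepth` is set once on the crawler rather than per request. A minimal usage sketch based on the options above (the URL and option values are illustrative):

```js
const HCCrawler = require('headless-chrome-crawler');

HCCrawler.launch({
  maxDepth: 3,        // follow links up to three levels deep (illustrative value)
  maxConcurrency: 10, // the documented default, shown explicitly
  onSuccess: (result => {
    console.log(`Requested ${result.options.url}.`);
  }),
})
  .then(crawler => {
    crawler.queue('https://example.com/'); // illustrative URL
    return crawler.onIdle().then(() => crawler.close());
  });
```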
@@ -185,6 +186,7 @@ url, allowedDomains, timeout, priority, delay, retryCount, retryDelay, jQuery, d
* `options` <[Object]>
* `maxConcurrency` <[number]> Maximum number of pages to open concurrently, defaults to `10`.
* `maxRequest` <[number]> Maximum number of requests, defaults to `0`. Pass `0` to disable the limit.
* `maxDepth` <[number]> Maximum depth for the crawler to follow links automatically, defaults to `1`. Leave it at the default to disable following links.
* `exporter` <[Exporter]> An exporter object which extends [BaseExporter](#class-baseexporter)'s interfaces to export results, defaults to `null`.
* `cache` <[Cache]> A cache object which extends [BaseCache](#class-basecache)'s interfaces to remember and skip duplicate requests, defaults to a [SessionCache](#class-sessioncache) object.
* `persistCache` <[boolean]> Whether to clear cache on closing or disconnecting from the browser, defaults to `false`.
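The same constructor options, including `maxDepth`, apply when attaching to an existing Chromium instance via `connect`. A sketch, assuming a placeholder WebSocket endpoint:

```js
HCCrawler.connect({
  browserWSEndpoint: 'ws://localhost:9222/devtools/browser/<id>', // placeholder endpoint
  maxDepth: 2, // illustrative value
  onSuccess: (result => {
    console.log(`Requested ${result.options.url}.`);
  }),
})
  .then(crawler => {
    crawler.queue('https://example.com/');
    return crawler.onIdle().then(() => crawler.close());
  });
```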
@@ -233,7 +235,6 @@ See [puppeteer.executablePath()](https://github.com/GoogleChrome/puppeteer/blob/

* `options` <[Object]>
* `url` <[string]> Url to navigate to. The url should include a scheme, e.g. `https://`.
* `maxDepth` <[number]> Maximum depth for the crawler to follow links automatically, default to 1. Leave default to disable following links.
* `priority` <[number]> Basic priority of queues, defaults to `1`. Priority with larger number is preferred.
* `skipDuplicates` <[boolean]> Whether to skip duplicate requests, defaults to `null`. The request is considered to be the same if `url`, `userAgent`, `device` and `extraHeaders` are strictly the same.
* `obeyRobotsTxt` <[boolean]> Whether to obey [robots.txt](https://developers.google.com/search/reference/robots_txt), defaults to `true`.
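Note that `maxDepth` is removed from the per-request options above; `queue()` keeps request-level settings such as `priority` and `obeyRobotsTxt`. A hedged sketch of a per-request call, given a `crawler` obtained from `launch` or `connect` as shown earlier:

```js
crawler.queue({
  url: 'https://example.com/', // illustrative URL
  priority: 2,                 // larger numbers are preferred in the queue
  obeyRobotsTxt: true,         // the documented default, shown explicitly
});
```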
1 change: 1 addition & 0 deletions examples/priority-queue.js
@@ -1,6 +1,7 @@
const HCCrawler = require('headless-chrome-crawler');

HCCrawler.launch({
  maxDepth: 3,
  maxConcurrency: 1,
  onSuccess: (result => {
    console.log(`Requested ${result.options.url}.`);
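For context, the updated example roughly reads as follows once the hidden tail of the file is included; the lines after `onSuccess` are truncated in the diff above, so the remainder is an approximation based on the surrounding API:

```js
const HCCrawler = require('headless-chrome-crawler');

HCCrawler.launch({
  maxDepth: 3,
  maxConcurrency: 1,
  onSuccess: (result => {
    console.log(`Requested ${result.options.url}.`);
  }),
})
  .then(crawler => {
    // Priority values are illustrative; larger numbers are preferred.
    crawler.queue({ url: 'https://example.com/', priority: 1 });
    crawler.queue({ url: 'https://example.net/', priority: 2 });
    return crawler.onIdle().then(() => crawler.close());
  });
```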
3 changes: 2 additions & 1 deletion lib/hccrawler.js
@@ -47,6 +47,7 @@ const LAUNCH_OPTIONS = [
const CONSTRUCTOR_OPTIONS = CONNECT_OPTIONS.concat(LAUNCH_OPTIONS).concat([
  'maxConcurrency',
  'maxRequest',
  'maxDepth',
  'cache',
  'exporter',
  'persistCache',
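`CONSTRUCTOR_OPTIONS` acts as a whitelist of keys handled at construction time, which is why `maxDepth` is added here. An illustrative helper (not the library's actual code) showing how such a whitelist can split a merged options object:

```js
// Illustrative only: pick the constructor-level keys out of a merged options object.
function pickConstructorOptions(options, whitelist) {
  const picked = {};
  Object.keys(options).forEach(key => {
    if (whitelist.includes(key)) picked[key] = options[key];
  });
  return picked;
}
```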
@@ -458,7 +459,7 @@ class HCCrawler extends EventEmitter {
   * @private
   */
  _followLinks(links, options, depth) {
    if (depth >= options.maxDepth) {
    if (depth >= this._options.maxDepth) {
      this.emit(HCCrawler.Events.MaxDepthReached);
      return;
    }
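The functional change is the single line above: the depth limit is now read from the crawler-wide `this._options` instead of the per-request `options`. When the limit is reached the crawler emits `MaxDepthReached` with no payload, so it can be observed like any other event. A sketch, assuming the `Events` map is exposed on the exported class as it is referenced inside `lib/hccrawler.js`:

```js
const HCCrawler = require('headless-chrome-crawler');

HCCrawler.launch({ maxDepth: 2 })
  .then(crawler => {
    // HCCrawler extends EventEmitter, so a standard listener works.
    crawler.on(HCCrawler.Events.MaxDepthReached, () => {
      console.log('maxDepth reached; further links will not be followed.');
    });
    crawler.queue('https://example.com/'); // illustrative URL
    return crawler.onIdle().then(() => crawler.close());
  });
```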
