This repository was archived by the owner on Mar 22, 2022. It is now read-only.
/ cairn Public archive
forked from wabarc/cairn

NPM package and CLI tool for saving webpages

delphai/cairn

 
 


Cairn


    //   ) )                              
   //         ___     ( )  __       __    
  //        //   ) ) / / //  ) ) //   ) ) 
 //        //   / / / / //      //   / /  
((____/ / ((___( ( / / //      //   / /   

Cairn is an npm package and CLI tool for saving a web page as a single HTML file. It is a TypeScript implementation of Obelisk.

Features

Usage

As CLI tool

$ npm install -g @wabarc/cairn
$ cairn -h

Usage: cairn [options] url1 [url2]...[urlN]

CLI tool for saving web page as single HTML file

Options:
  -v, --version              output the current version
  -o, --output <string>      path to save archival result
  -u, --user-agent <string>  set custom user agent
  -t, --timeout <number>     maximum time (in second) request timeout
  --no-js                    disable JavaScript
  --no-css                   disable CSS styling
  --no-embeds                remove embedded elements (e.g iframe)
  --no-medias                remove media elements (e.g img, audio)
  -h, --help                 display help for command

As npm package

npm install @wabarc/cairn

import { Cairn } from '@wabarc/cairn';
// CommonJS: const { Cairn } = require('@wabarc/cairn');

// Page to archive.
const url = 'https://www.github.com';

const cairn = new Cairn();

cairn
  .request({ url: url })
  .options({ userAgent: 'Cairn/2.0.0' })
  .archive()
  .then((archived) => {
    // `archived.webpage` is a cheerio root; html() serializes it to a string.
    console.log(archived.url, archived.webpage.html());
  })
  .catch((err) => console.warn(`${url} => ${JSON.stringify(err)}`));

Instance methods

cairn#request({ url: string }): this
cairn#options({}): this
  • userAgent?: string;
  • disableJS?: boolean;
  • disableCSS?: boolean;
  • disableEmbeds?: boolean;
  • disableMedias?: boolean;
  • timeout?: number;
cairn#archive(): Promise<Archived>
cairn#Archived
  • url: string;
  • webpage: cheerio.Root;
  • status: 200 | 400 | 401 | 403 | 404 | 500 | 502 | 503 | 504;
  • contentType: 'text/html' | 'text/plain' | 'text/*';
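
The field lists above can be written down as TypeScript types. This is only a sketch for reference: the names CairnOptions, ArchiveStatus, ArchivedShape and isOk are hypothetical and are not exported by the package, and the webpage field uses a minimal stand-in for cheerio.Root so that the snippet has no dependencies.

```typescript
// Hypothetical type names; only the field names and types come from the list above.
interface CairnOptions {
  userAgent?: string;
  disableJS?: boolean;
  disableCSS?: boolean;
  disableEmbeds?: boolean;
  disableMedias?: boolean;
  timeout?: number;
}

type ArchiveStatus = 200 | 400 | 401 | 403 | 404 | 500 | 502 | 503 | 504;

interface ArchivedShape {
  url: string;
  // Stand-in for cheerio.Root: only the html() call used in the usage example.
  webpage: { html(): string };
  status: ArchiveStatus;
  contentType: 'text/html' | 'text/plain' | 'text/*';
}

// Hypothetical helper: true when the archived page was fetched with status 200.
function isOk(archived: ArchivedShape): boolean {
  return archived.status === 200;
}

// Example values shaped like the documented fields.
const exampleOptions: CairnOptions = { userAgent: 'Cairn/2.0.0', timeout: 30 };
const exampleArchived: ArchivedShape = {
  url: 'https://github.com/',
  webpage: { html: () => '<html></html>' },
  status: 200,
  contentType: 'text/html',
};
```

Writing the status as a union of literals lets the compiler reject values outside the documented set.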

Request Params

request
{
  // `url` is archival target.
  url: 'https://www.github.com'
}
options
{
  userAgent: 'Cairn/2.0.0',

  disableJS: true,
  disableCSS: false,
  disableEmbeds: false,
  disableMedias: true,

  timeout: 30
}
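
The four disable* options appear to mirror the --no-* CLI flags listed earlier. The sketch below makes that correspondence explicit. It rests on an assumption: the pairing of flags to option names is inferred from their names, not confirmed by the source, and togglesFromFlags is a hypothetical helper that is not part of the package.

```typescript
// Assumed pairing of CLI flags to option names, inferred from naming only.
const FLAG_TO_OPTION = {
  '--no-js': 'disableJS',
  '--no-css': 'disableCSS',
  '--no-embeds': 'disableEmbeds',
  '--no-medias': 'disableMedias',
} as const;

type Flag = keyof typeof FLAG_TO_OPTION;
type ToggleName = (typeof FLAG_TO_OPTION)[Flag];
type ToggleOptions = Partial<Record<ToggleName, boolean>>;

// Hypothetical helper: collect the boolean toggles implied by a list of flags.
// Arguments that are not one of the four known flags are ignored.
function togglesFromFlags(flags: string[]): ToggleOptions {
  const options: ToggleOptions = {};
  for (const flag of flags) {
    if (Object.prototype.hasOwnProperty.call(FLAG_TO_OPTION, flag)) {
      options[FLAG_TO_OPTION[flag as Flag]] = true;
    }
  }
  return options;
}

const toggles = togglesFromFlags(['--no-js', '--no-medias']);
```

The resulting object could be spread into the argument of options() alongside userAgent and timeout.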

Response Schema

For v1.x:

The archive method returns the webpage body as a string.

For v2.x:

{
  url: 'https://github.com/',
  webpage: cheerio.Root,
  status: 200,
  contentType: 'text/html'
}
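
Code that must work with both major versions can normalize the two response shapes into a single HTML string. The following is a minimal sketch, not part of the package: ArchivedLike and toHtml are hypothetical names, the webpage field is a stand-in for cheerio.Root, and treating any status other than 200 as an error is a design choice made here for illustration.

```typescript
// Minimal stand-in for the v2.x response; `webpage` mimics cheerio.Root's html().
interface ArchivedLike {
  url: string;
  webpage: { html(): string };
  status: number;
  contentType: string;
}

// Hypothetical helper: return the page HTML for either response shape.
function toHtml(result: string | ArchivedLike): string {
  // v1.x: the result is already the webpage body.
  if (typeof result === 'string') {
    return result;
  }
  // v2.x: reject non-200 results (an assumption of this sketch).
  if (result.status !== 200) {
    throw new Error(`archive of ${result.url} returned status ${result.status}`);
  }
  return result.webpage.html();
}

const fromV1 = toHtml('<html>v1</html>');
const fromV2 = toHtml({
  url: 'https://github.com/',
  webpage: { html: () => '<html>v2</html>' },
  status: 200,
  contentType: 'text/html',
});
```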

License

This software is released under the terms of the GNU General Public License v3.0. See the LICENSE file for details.
