forked from wabarc/cairn

NPM package and CLI tool for saving a web page as a single HTML file


lehvitus/cairn


Cairn


    //   ) )                              
   //         ___     ( )  __       __    
  //        //   ) ) / / //  ) ) //   ) ) 
 //        //   / / / / //      //   / /  
((____/ / ((___( ( / / //      //   / /   

Cairn is an npm package and CLI tool for saving a web page as a single HTML file. It is a TypeScript implementation of Obelisk.

Features

Usage

As CLI tool

npm install -g @wabarc/cairn
$ cairn -h

Usage: cairn [options] url1 [url2]...[urlN]

CLI tool for saving web page as single HTML file

Options:
  -v, --version                         output the current version
  -o, --output <string>                 path to save archival result
  -u, --user-agent <string>             set custom user agent
  -p, --proxy [protocol://]host[:port]  use this proxy
  -t, --timeout <number>                maximum time (in second) request timeout
  --no-js                               disable JavaScript
  --no-css                              disable CSS styling
  --no-embeds                           remove embedded elements (e.g iframe)
  --no-medias                           remove media elements (e.g img, audio)
  -h, --help                            display help for command
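The negatable CLI flags correspond to the boolean options of the npm API (for example, --no-js corresponds to disableJS: true). The sketch below shows one way to translate parsed flags into that options object. It assumes the CLI parses arguments commander-style (the help text above matches commander's defaults), where --no-js yields js: false and omitting it leaves js: true. ParsedFlags and cliToOptions are hypothetical names used only for illustration; they are not part of Cairn.

```typescript
// Shape of the object accepted by cairn#options(), as listed under
// "Instance methods" in this README.
interface CairnOptions {
  proxy?: string;
  userAgent?: string;
  disableJS?: boolean;
  disableCSS?: boolean;
  disableEmbeds?: boolean;
  disableMedias?: boolean;
  timeout?: number;
}

// Hypothetical parsed-flags shape (assumption: commander-style parsing,
// where `--no-js` sets `js: false` and `--user-agent` becomes `userAgent`).
interface ParsedFlags {
  userAgent?: string;
  proxy?: string;
  timeout?: number;
  js: boolean;
  css: boolean;
  embeds: boolean;
  medias: boolean;
}

// Hypothetical helper: translate parsed CLI flags into the options object.
function cliToOptions(flags: ParsedFlags): CairnOptions {
  const options: CairnOptions = {
    disableJS: !flags.js,
    disableCSS: !flags.css,
    disableEmbeds: !flags.embeds,
    disableMedias: !flags.medias,
  };
  // Copy the value-carrying flags only when they were actually given.
  if (flags.userAgent !== undefined) options.userAgent = flags.userAgent;
  if (flags.proxy !== undefined) options.proxy = flags.proxy;
  if (flags.timeout !== undefined) options.timeout = flags.timeout;
  return options;
}

// Roughly what `cairn --no-js -t 30 <url>` would parse into.
const opts = cliToOptions({ js: false, css: true, embeds: true, medias: true, timeout: 30 });
console.log(opts);
```

Only flags that carry a value are copied conditionally, so an unset proxy or user agent stays absent rather than being passed as undefined.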

As npm package

npm install @wabarc/cairn
import { Cairn } from '@wabarc/cairn';
// CommonJS: const { Cairn } = require('@wabarc/cairn');

const url = 'https://www.github.com';
const cairn = new Cairn();

cairn
  .request({ url: url })
  .options({ userAgent: 'Cairn/2.0.0', proxy: 'socks5://127.0.0.1:1080' })
  .archive()
  .then((archived) => {
    console.log(archived.url, archived.webpage.html());
  })
  .catch((err) => console.warn(`${url} => ${JSON.stringify(err)}`));
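The example above archives a single URL, while the CLI accepts several. A minimal sketch of doing the same from code is shown below: it archives a list of URLs concurrently and separates successes from failures, so one bad URL does not reject the whole batch. The archive function is injected (assumption: in real use it would wrap new Cairn().request({ url }).options(...).archive()), which keeps the sketch runnable without network access. ArchivedLike, ArchiveOne, and archiveAll are hypothetical names, and fake is a stand-in used only for the demo.

```typescript
// Reduced stand-in for the v2.x result of cairn#archive(); only the fields
// this sketch uses are modelled.
interface ArchivedLike {
  url: string;
  status: number;
}

// Hypothetical: a function that archives one URL.
type ArchiveOne = (url: string) => Promise<ArchivedLike>;

type Outcome =
  | { url: string; ok: true; value: ArchivedLike }
  | { url: string; ok: false; reason: unknown };

interface BatchResult {
  ok: ArchivedLike[];
  failed: { url: string; reason: unknown }[];
}

// Archive all URLs concurrently; each rejection is caught and recorded
// instead of failing the whole batch. Promise.all keeps input order.
async function archiveAll(urls: string[], archiveOne: ArchiveOne): Promise<BatchResult> {
  const outcomes: Outcome[] = await Promise.all(
    urls.map((url) =>
      archiveOne(url).then(
        (value): Outcome => ({ url, ok: true, value }),
        (reason: unknown): Outcome => ({ url, ok: false, reason }),
      ),
    ),
  );
  const result: BatchResult = { ok: [], failed: [] };
  for (const o of outcomes) {
    if (o.ok) result.ok.push(o.value);
    else result.failed.push({ url: o.url, reason: o.reason });
  }
  return result;
}

// Demo stand-in: succeeds for https URLs, rejects everything else.
const fake: ArchiveOne = (url) =>
  url.startsWith("https://")
    ? Promise.resolve({ url, status: 200 })
    : Promise.reject(new Error(`unsupported URL: ${url}`));

archiveAll(["https://example.com", "ftp://example.com"], fake).then((r) => {
  console.log(`archived: ${r.ok.length}, failed: ${r.failed.length}`);
});
```

Catching each rejection inside the map, rather than relying on Promise.allSettled, keeps the sketch usable with older TypeScript lib settings.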

Instance methods

cairn#request({ url: string }): this
cairn#options({}): this
  • proxy?: string;
  • userAgent?: string;
  • disableJS?: boolean;
  • disableCSS?: boolean;
  • disableEmbeds?: boolean;
  • disableMedias?: boolean;
  • timeout?: number;
cairn#archive(): Promise<Archived>
cairn#Archived
  • url: string;
  • webpage: cheerio.Root;
  • status: 200 | 400 | 401 | 403 | 404 | 500 | 502 | 503 | 504;
  • contentType: 'text/html' | 'text/plain' | 'text/*';
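The Archived members listed above can be written out as TypeScript types, which makes it easy to check a result before using its HTML. The sketch below is an illustration, not Cairn's own type declarations: webpage is reduced to an object with an html() method (assumption: only that part of cheerio.Root is needed, so the sketch compiles without cheerio), and htmlOrError is a hypothetical helper.

```typescript
// Status codes and content types as listed for cairn#Archived.
type ArchivedStatus = 200 | 400 | 401 | 403 | 404 | 500 | 502 | 503 | 504;
type ArchivedContentType = "text/html" | "text/plain" | "text/*";

// Assumption: only cheerio.Root's html() method is modelled here.
interface WebpageLike {
  html(): string;
}

interface Archived {
  url: string;
  webpage: WebpageLike;
  status: ArchivedStatus;
  contentType: ArchivedContentType;
}

// Hypothetical helper: return the serialized HTML only for a successful
// HTML archive; otherwise return a message explaining why not.
function htmlOrError(a: Archived): { html: string } | { error: string } {
  if (a.status !== 200) return { error: `${a.url} responded with HTTP ${a.status}` };
  if (a.contentType !== "text/html") return { error: `${a.url} is ${a.contentType}, not HTML` };
  return { html: a.webpage.html() };
}

const sample: Archived = {
  url: "https://github.com/",
  webpage: { html: () => "<html><body>hi</body></html>" },
  status: 404,
  contentType: "text/html",
};
console.log(htmlOrError(sample));
```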

Request Params

request
{
  // `url` is the archival target.
  url: 'https://www.github.com'
}
options
{
  proxy: 'socks5://127.0.0.1:1080',
  userAgent: 'Cairn/2.0.0',

  disableJS: true,
  disableCSS: false,
  disableEmbeds: false,
  disableMedias: true,

  timeout: 30
}
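The proxy value follows the [protocol://]host[:port] form given in the CLI help. If you want to validate it before passing it to options(), a minimal parser might look like the sketch below. ProxySpec and parseProxy are hypothetical names; this is not how Cairn itself parses the value, and bracketed IPv6 hosts are deliberately not handled.

```typescript
interface ProxySpec {
  protocol?: string;
  host: string;
  port?: number;
}

// Hypothetical parser for "[protocol://]host[:port]".
// Assumption: hosts contain no ":" or "/" (so no bracketed IPv6 literals).
function parseProxy(value: string): ProxySpec {
  const match = /^(?:([a-z][a-z0-9+.-]*):\/\/)?([^:/]+)(?::(\d+))?$/i.exec(value);
  if (match === null) throw new Error(`invalid proxy: ${value}`);
  // Unmatched optional groups are undefined at runtime.
  const [, protocol, host, port] = match;
  const spec: ProxySpec = { host };
  if (protocol !== undefined) spec.protocol = protocol.toLowerCase();
  if (port !== undefined) {
    const n = Number(port);
    if (n < 1 || n > 65535) throw new Error(`port out of range: ${port}`);
    spec.port = n;
  }
  return spec;
}

console.log(parseProxy("socks5://127.0.0.1:1080"));
console.log(parseProxy("proxy.local:8080"));
```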

Response Schema

For v1.x:

The archive() method returns the web page body as a string.

For v2.x:

{
  url: 'https://github.com/',
  webpage: cheerio.Root,
  status: 200,
  contentType: 'text/html'
}
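Code written against v1.x expects a string, while v2.x resolves to an object. A small adapter like the sketch below can accept either shape during a migration. toHtml and ArchivedV2 are hypothetical names, and webpage is reduced to cheerio's html() method (an assumption that keeps the sketch free of dependencies).

```typescript
// v2.x result, reduced to the fields this sketch needs.
interface ArchivedV2 {
  url: string;
  webpage: { html(): string };
  status: number;
  contentType: string;
}

// Hypothetical adapter: v1.x resolves to the HTML string itself, v2.x to an
// object whose webpage can be serialized with html().
function toHtml(result: string | ArchivedV2): string {
  return typeof result === "string" ? result : result.webpage.html();
}

console.log(toHtml("<p>from v1</p>"));
console.log(
  toHtml({
    url: "https://github.com/",
    webpage: { html: () => "<p>from v2</p>" },
    status: 200,
    contentType: "text/html",
  }),
);
```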

License

Cairn has been relicensed under the MIT License since version 3.0.0. Versions 1.x and 2.x remain licensed under GPL 3.0.

This software is released under the terms of the MIT License. See the LICENSE file for details.
