danburzo / percollate

  • Saturday, 13 October 2018, 00:20:27
https://github.com/danburzo/percollate

JavaScript
🌐 → 📖 A command-line tool to grab web pages as beautifully formatted PDFs



percollate

Percollate is a command-line tool that turns web pages into beautifully formatted PDFs.

Installation

💡 percollate needs Node.js version 8 or later, as it uses new(ish) JavaScript syntax.
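
To confirm which Node.js version is on your PATH before installing, something like the following may help. It is a minimal sketch, not part of percollate: node_major is a hypothetical helper that extracts the major version from the output of node --version, which is then compared against the minimum of 8 stated above.

```shell
# Hypothetical helper (not part of percollate): extract the major version
# from a string such as "v10.16.3", as printed by `node --version`.
node_major() {
  printf '%s\n' "$1" | sed -e 's/^v//' -e 's/\..*$//'
}

# Only run the check when node is actually on the PATH.
if command -v node >/dev/null 2>&1; then
  major=$(node_major "$(node --version)")
  if [ "$major" -ge 8 ]; then
    echo "Node.js $major is new enough for percollate"
  else
    echo "Node.js $major is too old; percollate needs version 8 or later"
  fi
else
  echo "node was not found on the PATH"
fi
```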

You can install percollate globally:

# using npm
npm install -g percollate

# using yarn
yarn global add percollate

To keep the package up to date, you can run:

# using npm, upgrading is the same command as installing
npm install -g percollate

# yarn has a separate command
yarn global upgrade --latest percollate

Usage

💡 Run percollate --help for a list of available commands. For a particular command, percollate <command> --help lists all available options.

Available commands

Command          What it does
percollate pdf   Bundles one or more web pages into a PDF
percollate epub  Not implemented yet
percollate html  Not implemented yet

Available options

The pdf, epub, and html commands have these options:

Option        What it does
-o, --output  (Required) The path of the resulting bundle
--template    Path to a custom HTML template
--style       Path to a custom CSS stylesheet

Examples

Generating a PDF

To transform a single web page to PDF:

percollate pdf --output some.pdf https://example.com

To bundle several web pages into a single PDF, specify them as separate arguments to the command:

percollate pdf --output some.pdf https://example.com/page1 https://example.com/page2

You can also keep the list of URLs in a newline-delimited text file and pass it to percollate with common Unix commands:

cat urls.txt | xargs percollate pdf --output some.pdf
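
If the URL file also contains blank lines or comments, xargs would pass the comment text along to percollate as extra arguments. One way around that is to filter the file first. The sketch below uses a hypothetical clean_urls helper and assumes comment lines start with #; neither is a percollate convention.

```shell
# Hypothetical helper: read a URL list on stdin, drop blank lines and
# lines whose first non-space character is "#", print the rest unchanged.
clean_urls() {
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#'
}

# Usage, assuming urls.txt exists in the current directory:
#   clean_urls < urls.txt | xargs percollate pdf --output some.pdf
```

Only a # at the start of a line is treated as a comment, so URLs that contain a fragment (https://example.com/page#section) pass through untouched.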

Using a custom HTML template

⚠️ TODO add example here

Using a custom CSS stylesheet

⚠️ TODO add example here
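
A minimal sketch, based only on the --style option listed under Available options. The file name custom.css, the rules inside it, and the pdf_with_style wrapper are all illustrative assumptions. The selectors are plain HTML elements rather than class names from percollate's default template, and how these rules interact with the default print stylesheet is not covered here.

```shell
# Write an illustrative stylesheet; the file name and rules are assumptions.
cat > custom.css <<'EOF'
body {
  font-family: Georgia, serif;
  font-size: 11pt;
  line-height: 1.5;
}
a {
  color: inherit;
}
EOF

# Hypothetical wrapper: pdf_with_style OUTPUT STYLESHEET URL...
# It forwards its arguments to the documented --output and --style options.
pdf_with_style() {
  output=$1
  stylesheet=$2
  shift 2
  percollate pdf --output "$output" --style "$stylesheet" "$@"
}

# Usage:
#   pdf_with_style some.pdf custom.css https://example.com
```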

Customizing the page header / footer

⚠️ TODO add example here

How it works

  1. Fetch the page(s) using got
  2. Enhance the DOM using jsdom
  3. Pass the DOM through mozilla/readability to strip unnecessary elements
  4. Apply the HTML template and the print stylesheet to the resulting HTML
  5. Use puppeteer to generate a PDF from the page

Troubleshooting

On some Linux machines you'll need to install a few more Chrome dependencies before percollate works correctly. (Thanks to @ptica for sorting it out.)

The percollate pdf command supports the --no-sandbox Puppeteer flag, but make sure you're aware of the implications before disabling the sandbox.

See also

Here are some other projects to check out if you're interested in building books using the browser: