andrewdotn/2md

2md: convert formatted text to markdown

There are lots of packages for turning markdown into html, but this one goes the other way, turning formatted html into markdown.

Say you’re reading an Ars Technica article and want to copy something into some markdown notes. Just select some content in your browser, copy it, then run 2md. The heading, list formatting, bold text, and hyperlinks are all preserved.

Installation

The easiest way to try 2md is with npx, a tool that automatically downloads, caches, and runs programs, and that has been included with Node.js since 2017:

npx 2md [--no-quote] [FILE]

You can also install the 2md command with yarn:

yarn [global] add 2md

Usage

Run

npx 2md [--no-quote] [FILE]

to get markdown.

By default, 2md reads from the clipboard, using osascript, xclip, or powershell. To convert a file instead, pass the name of an html file as a command-line argument.
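
As a rough illustration of the clipboard fallback, the sketch below maps a Node.js platform name to one of the three tools named above. The function name `clipboardTool` and the platform-to-tool mapping are assumptions made for this example; 2md's actual selection logic and the arguments it passes to each tool are not shown here.

```javascript
// Hypothetical helper, not part of 2md's public API.
// Picks a clipboard tool based on the Node.js platform string.
function clipboardTool(platform = process.platform) {
  switch (platform) {
    case 'darwin':
      return 'osascript'; // macOS: AppleScript runner
    case 'win32':
      return 'powershell'; // Windows
    default:
      return 'xclip'; // assumption: any other platform runs X11
  }
}

console.log(clipboardTool());
```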

To make the output easy to paste into other documents, --quote is on by default and wraps the markdown in a blockquote; pass --no-quote to turn it off:

> # Foo
>
> bar ...
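
The quoting step can be pictured as a line-by-line prefix. This is a minimal sketch under the assumption that blank lines become a bare `>`, as in the output above; the function name `blockquote` is hypothetical and 2md's real implementation may differ.

```javascript
// Hypothetical sketch of blockquote wrapping; not 2md's actual code.
function blockquote(markdown) {
  return markdown
    .split('\n')
    // Blank lines become a bare ">" so no trailing space is emitted.
    .map((line) => (line === '' ? '>' : `> ${line}`))
    .join('\n');
}

console.log(blockquote('# Foo\n\nbar ...'));
```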

API

Only a single function is exposed: toMd.

const { toMd } = require('2md');

console.log(toMd('foo <b>bar</b>'));

prints

foo **bar**

Only exported files with public in the path are supported. Everything else is subject to change without notice. But if there’s some interesting code here you’d like to reuse, it should be possible to publish it.

Contributing

Contributions are welcome! There are fairly comprehensive end-to-end and round-trip tests, and TypeScript’s type-checking makes refactoring safer, so don’t be afraid to move code around.

Architecture

A quick sketch of how this works:

  • Some AppleScript reads html off the clipboard

  • jsdom parses input html into dom elements

  • A parser iterates over the dom, emitting a tree of custom nodes that correspond to markdown elements; for example, <b> and <strong> tags get mapped to Bold nodes

  • Some simplifying transformations are applied to the tree of markdown elements, such as removing <a> tags with no text, because links like [][1] in the markdown output aren’t useful

  • The markdown nodes get render() called on them to generate a series of OutputBlock objects which are, roughly, paragraphs in the markdown output

  • The OutputBlocks are wrapped to 80 columns, and separated with blank lines where appropriate
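
The middle steps of the list above (a tree of markdown nodes, a simplifying pass, then rendering) can be sketched in a few lines. Everything here is a toy: the class shapes, `isEmpty`, `renderAll`, and `simplify` are hypothetical names, links are rendered inline rather than in reference style, and rendering returns plain strings instead of OutputBlock objects.

```javascript
// Toy node types; names echo the description above but are hypothetical.
class Text {
  constructor(value) { this.value = value; }
  isEmpty() { return this.value.trim() === ''; }
  render() { return this.value; }
}

class Bold {
  constructor(children) { this.children = children; }
  isEmpty() { return this.children.every((c) => c.isEmpty()); }
  render() { return `**${renderAll(this.children)}**`; }
}

class Link {
  constructor(href, children) { this.href = href; this.children = children; }
  isEmpty() { return this.children.every((c) => c.isEmpty()); }
  render() { return `[${renderAll(this.children)}](${this.href})`; }
}

// Render a list of sibling nodes and concatenate the results.
function renderAll(nodes) {
  return nodes.map((n) => n.render()).join('');
}

// Simplifying pass: drop links with no visible text, at any depth.
// Note that this updates the children arrays of the nodes it keeps.
function simplify(nodes) {
  return nodes
    .filter((n) => !(n instanceof Link && n.isEmpty()))
    .map((n) => {
      if (n.children) n.children = simplify(n.children);
      return n;
    });
}

const tree = [
  new Text('foo '),
  new Bold([new Text('bar')]),
  new Link('https://example.com/', []), // no text, so it is removed
];
console.log(renderAll(simplify(tree))); // foo **bar**
```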

You can see the results of individual steps with the --output-format option to the cli. These are subject to change without notice, and not exposed through the public api.
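
The final step in the list above, wrapping blocks and separating them with blank lines, might look like the following greedy word wrap. It is a simplified sketch: `wrap` and `joinBlocks` are hypothetical names, and a real wrapper would also need to leave code blocks and other unbreakable content alone.

```javascript
// Hypothetical greedy word wrap; not 2md's actual algorithm.
function wrap(text, width = 80) {
  const lines = [];
  let current = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (current === '') {
      current = word;
    } else if (current.length + 1 + word.length <= width) {
      current += ' ' + word;
    } else {
      // The word does not fit, so start a new line with it.
      lines.push(current);
      current = word;
    }
  }
  if (current !== '') lines.push(current);
  return lines.join('\n');
}

// Wrap each block, then separate blocks with one blank line.
function joinBlocks(blocks, width = 80) {
  return blocks.map((b) => wrap(b, width)).join('\n\n');
}

console.log(joinBlocks(['# Foo', 'bar baz qux'], 7));
```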

License

All the original code here is licensed under the Apache License, version 2.0, included in LICENSE.code; except for the contents of the “how it works” article how-it-works/post.mdx, which is not redistributable.

Releasing

The current release process, to be automated later, is:

  1. Remove the -pre tag from the core/package.json version field

  2. Update CHANGELOG.md

  3. Commit to git, and git tag vA.B.C

  4. git push --tags $REMOTE master.

    Optional: figure out automation to put CHANGELOG.md excerpt into auto-created GitHub releases.

  5. yarn run package and inspect tarball

  6. npm publish 2md-vA.B.C.tgz.

    If publishing a pre-release, use npm publish --tag next to set the correct npm tag.

  7. Bump version and add -pre version suffix in core/package.json; update the 2md dependency version in website/package.json as well.

    Otherwise yarn won’t use the local version.

    The yarn workspaces documentation says,

    if workspace-b depends on a different version than the one referenced in workspace-a’s package.json, the dependency will be installed from npm rather than linked from your local filesystem. This is because some packages actually need to use the previous versions in order to build the new ones (Babel is one of them).
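
Steps 1 and 7 of the release process are small edits to the version string, which could be automated along these lines. This is only a sketch: `stripPre` and `nextPre` are hypothetical helpers, and bumping the patch component is an assumption, since the process above does not say which part of the version to increment.

```javascript
// Hypothetical helpers for steps 1 and 7; the process above is manual.

// Step 1: remove the -pre suffix, e.g. "1.2.3-pre" becomes "1.2.3".
function stripPre(version) {
  return version.replace(/-pre$/, '');
}

// Step 7: bump and re-add the suffix.
// Assumption: the patch component is the one that gets bumped.
function nextPre(version) {
  const [major, minor, patch] = stripPre(version).split('.').map(Number);
  return `${major}.${minor}.${patch + 1}-pre`;
}

console.log(stripPre('1.2.3-pre')); // 1.2.3
console.log(nextPre('1.2.3')); // 1.2.4-pre
```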