There are lots of packages for turning markdown into html, but this one goes the other way, turning formatted html into markdown.
Say you’re reading an Ars Technica article and want to copy something
into some markdown notes. Just select some content in your browser, copy
it, then run 2md. Headings, list formatting, bold text, and hyperlinks
are all preserved.
The easiest way to try 2md is with npx, a tool that automatically
downloads, caches, and runs programs, and that has been included with
Node.js since 2017:
npx 2md [--no-quote] [FILE]
You can also install the 2md command with yarn:
yarn [global] add 2md
Run npx 2md [--no-quote] [FILE] to get markdown.
By default, 2md reads from the clipboard, using osascript, xclip, or
powershell. Alternatively, pass it the name of an html file as a
command-line argument.
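Reading html from the clipboard mostly comes down to choosing the right command for each platform. The following is a minimal Node.js sketch, not 2md’s implementation. The functions clipboardCommand, decodeOsascriptHtml, and readClipboardHtml are hypothetical, and the exact command-line arguments are assumptions. The sketch uses osascript on macOS, powershell on Windows, and xclip everywhere else:

```javascript
const { execFileSync } = require('child_process');

// Hypothetical: map process.platform to a [command, args] pair that prints clipboard html.
function clipboardCommand(platform) {
  switch (platform) {
    case 'darwin':
      // AppleScript returns the HTML flavor hex-encoded, as «data HTML3C62…»
      return ['osascript', ['-e', 'the clipboard as «class HTML»']];
    case 'win32':
      // Windows PowerShell 5.1; the result still carries the CF_HTML header
      // (Version:, StartHTML:, ...) that a real tool would strip
      return ['powershell', ['-NoProfile', '-Command',
        'Get-Clipboard -Format Text -TextFormatType Html -Raw']];
    default:
      // X11: ask xclip for the text/html target of the CLIPBOARD selection
      return ['xclip', ['-selection', 'clipboard', '-o', '-t', 'text/html']];
  }
}

// Turn osascript's «data HTML…» output back into an html string.
function decodeOsascriptHtml(output) {
  const match = /^«data HTML([0-9A-Fa-f]*)»\s*$/.exec(output);
  if (!match) throw new Error('clipboard does not contain html');
  return Buffer.from(match[1], 'hex').toString('utf8');
}

// Hypothetical: run the platform command and return the clipboard html.
function readClipboardHtml() {
  const [cmd, args] = clipboardCommand(process.platform);
  const out = execFileSync(cmd, args, { encoding: 'utf8' });
  return process.platform === 'darwin' ? decodeOsascriptHtml(out) : out;
}
```

The AppleScript route is the awkward one. osascript prints the html as hex inside «data HTML…», so it needs an extra decoding step that the other two commands do not.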
To make the output easy to insert into other documents, --quote is on by
default and wraps the markdown in a blockquote; pass --no-quote to turn it
off:
> # Foo
>
> bar ...
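You can get the same blockquote wrapping for markdown from other sources with a short helper. This is a minimal sketch that assumes --quote is a plain per-line prefix. quoteMarkdown is a hypothetical helper and is not part of the 2md API:

```javascript
// Hypothetical helper: prefix each line with "> ", and use a bare ">" on blank
// lines so the quote has no trailing whitespace, as in the example above.
function quoteMarkdown(markdown) {
  return markdown
    .replace(/\n+$/, '') // drop trailing newlines so no empty quoted lines are added at the end
    .split('\n')
    .map((line) => (line === '' ? '>' : `> ${line}`))
    .join('\n');
}

console.log(quoteMarkdown('# Foo\n\nbar ...'));
```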
Only a single function is exposed: toMd.
const { toMd } = require('2md');
console.log(toMd('foo <b>bar</b>'));
prints
foo **bar**
Only exports from files with public in their path are supported. Everything
else is subject to change without notice. But if there’s some interesting
code here you’d like to reuse, it should be possible to make it public.
Contributions are welcome! There are fairly comprehensive end-to-end and round-trip tests, and TypeScript’s type-checking makes refactoring safer, so don’t be afraid to move code around.
A quick sketch of how this works:
- Some AppleScript reads html off the clipboard
- jsdom parses input html into dom elements
- A parser iterates over the dom, emitting a tree of custom nodes that correspond to markdown elements; for example, <b> and <strong> tags get mapped to Bold nodes
- Some simplifying transformations are applied to the tree of markdown elements, such as removing <a> tags with no text, because links like [][1] in the markdown output aren’t useful
- The markdown nodes get render() called on them to generate a series of OutputBlock objects, which are, roughly, paragraphs in the markdown output
- The OutputBlocks are wrapped to 80 columns and separated with blank lines where appropriate
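The steps after parsing can be illustrated with a toy version. This sketch begins with a hand-built node tree and skips the jsdom step. It drops links that have no text, renders each paragraph to a string (standing in for an OutputBlock), wraps to 80 columns, and joins the blocks with blank lines. Text, Bold, Link, Paragraph, removeEmptyLinks, wrap, and toMarkdown are all hypothetical names. 2md’s real classes and signatures are different.

```javascript
// Hypothetical markdown node types, standing in for 2md's own classes.
class Text {
  constructor(text) { this.text = text; }
  render() { return this.text; }
}
class Bold {
  constructor(children) { this.children = children; }
  render() { return `**${renderAll(this.children)}**`; }
}
class Link {
  constructor(href, children) { this.href = href; this.children = children; }
  // Inline links for brevity; the [][1] example suggests 2md emits reference-style links.
  render() { return `[${renderAll(this.children)}](${this.href})`; }
}
class Paragraph {
  constructor(children) { this.children = children; }
  // Block nodes return a list of output blocks (plain strings here, not OutputBlock objects).
  render() { return [renderAll(this.children)]; }
}

function renderAll(nodes) {
  return nodes.map((node) => node.render()).join('');
}

// Plain text content of a node, used to detect links with no visible text.
function textOf(node) {
  if (node instanceof Text) return node.text;
  return (node.children || []).map(textOf).join('');
}

// Simplifying pass: drop text-less links (mutates child arrays in place for brevity).
function removeEmptyLinks(nodes) {
  return nodes
    .filter((node) => !(node instanceof Link && textOf(node).trim() === ''))
    .map((node) => {
      if (node.children) node.children = removeEmptyLinks(node.children);
      return node;
    });
}

// Greedy word wrap; collapses runs of whitespace, and over-long words get their own line.
function wrap(text, width = 80) {
  const lines = [];
  let line = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (line && line.length + 1 + word.length > width) {
      lines.push(line);
      line = word;
    } else {
      line = line ? `${line} ${word}` : word;
    }
  }
  if (line) lines.push(line);
  return lines.join('\n');
}

function toMarkdown(blocks, width = 80) {
  return removeEmptyLinks(blocks)
    .flatMap((block) => block.render())
    .map((out) => wrap(out, width))
    .join('\n\n'); // blank line between output blocks
}

const doc = [
  new Paragraph([
    new Text('foo '),
    new Bold([new Text('bar')]),
    new Link('https://example.com/', []), // empty link, removed by the simplifying pass
  ]),
  new Paragraph([new Text('baz '), new Link('https://example.com/', [new Text('qux')])]),
];
console.log(toMarkdown(doc));
```

Keeping simplification as a separate pass over the markdown tree means the parser can map tags to nodes naively, and cleanup rules such as dropping empty links stay in one place.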
You can see the results of individual steps with the --output-format
option to the cli. These intermediate outputs are subject to change without
notice and are not exposed through the public api.
All the original code here is licensed under the Apache License, version
2.0, included in LICENSE.code, except for the contents of the “how it
works” article, how-it-works/post.mdx, which is not redistributable.
The current release process, to be automated later, is:
- Remove the -pre tag from the core/package.json version field
- Update CHANGELOG.md
- Commit to git, and git tag vA.B.C
- git push --tags $REMOTE master. Optional: figure out automation to put the CHANGELOG.md excerpt into auto-created GitHub releases.
- yarn run package and inspect the tarball
- npm publish 2md-vA.B.C.tgz. If publishing a pre-release, use npm publish --tag next to set the correct npm tag.
- Bump the version and add the -pre version suffix in core/package.json; update the 2md dependency version in website/package.json as well. Otherwise yarn won’t use the local version.
The yarn workspaces documentation says:
> if workspace-b depends on a different version than the one referenced in workspace-a’s package.json, the dependency will be installed from npm rather than linked from your local filesystem. This is because some packages actually need to use the previous versions in order to build the new ones (Babel is one of them).
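The two version edits in the release process are easy to get wrong by hand. Here is a hedged Node.js sketch of helpers that could perform them. releaseVersion, nextPreVersion, and setVersion are hypothetical and do not exist in the repo. Using a patch bump for the next -pre version is also an assumption; bump whichever component the next release needs.

```javascript
const fs = require('fs');

// Hypothetical: "1.2.3-pre" -> "1.2.3" (first step of the release process).
function releaseVersion(version) {
  if (!version.endsWith('-pre')) throw new Error(`expected a -pre version, got ${version}`);
  return version.slice(0, -'-pre'.length);
}

// Hypothetical: "1.2.3" -> "1.2.4-pre" (last step; assumes a patch bump).
function nextPreVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`expected A.B.C, got ${version}`);
  const [, major, minor, patch] = match;
  return `${major}.${minor}.${Number(patch) + 1}-pre`;
}

// Rewrite the version field of a package.json file in place and return the new version.
// Note: re-serializing with 2-space indentation may reformat the file.
function setVersion(path, update) {
  const pkg = JSON.parse(fs.readFileSync(path, 'utf8'));
  pkg.version = update(pkg.version);
  fs.writeFileSync(path, JSON.stringify(pkg, null, 2) + '\n');
  return pkg.version;
}
```

For example, setVersion('core/package.json', releaseVersion) would cover the first step and setVersion('core/package.json', nextPreVersion) the last. You would still need to update the 2md dependency in website/package.json by hand.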