A stateless API for converting Markdown files, HTML files and Office documents to PDF
At TheCodingMachine, we build a lot of web applications (intranets, extranets and so on) that need to generate PDFs from various sources. Each time, we ended up using well-known libraries like wkhtmltopdf or unoconv, and lost time reimplementing the same solution from one project to the next. Meh.
Let's say you're starting the API using this simple command:
```shell
$ docker run --rm -p 3000:3000 thecodingmachine/gotenberg:2.0.0
```
The API is now available on your host at `http://127.0.0.1:3000`. It accepts `POST` requests with a `multipart/form-data` Content-Type. Your form data should provide one or more files to convert.
The default image accepts the following:
- Markdown files
- HTML files
- Office documents (.docx, .doc, .odt, .pptx, .ppt, .odp and so on)
- PDF files
Heads up: the API relies on the file extension to determine which library to use for conversion.
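To make the extension-based routing concrete, here is a minimal Go sketch that mirrors the default `conversions` section of `gotenberg.yml` shown further down. The `converterFor` function and its lookup table are hypothetical illustrations, not Gotenberg's actual code; matching is assumed to be exact and case-sensitive, as written in the configuration's `extensions` lists.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// defaultConverters mirrors the extensions listed in the default
// gotenberg.yml. Hypothetical table for illustration only.
var defaultConverters = map[string]string{
	".md":   "markdown-pdf",
	".html": "wkhtmltopdf",
	".htm":  "wkhtmltopdf",
	".doc":  "unoconv",
	".docx": "unoconv",
	".odt":  "unoconv",
	".xls":  "unoconv",
	".xlsx": "unoconv",
	".ods":  "unoconv",
	".ppt":  "unoconv",
	".pptx": "unoconv",
	".odp":  "unoconv",
}

// converterFor returns the library assumed to handle filename, based only
// on its extension. The second result is false when no command matches.
func converterFor(filename string) (string, bool) {
	lib, ok := defaultConverters[filepath.Ext(filename)]
	return lib, ok
}

func main() {
	for _, name := range []string{"report.docx", "index.html", "README.md", "photo.png"} {
		lib, ok := converterFor(name)
		fmt.Printf("%s -> %q (matched: %v)\n", name, lib, ok)
	}
}
```

A file whose extension is not listed (here `photo.png`) has no matching command, which is why renaming a file can change how, or whether, it gets converted.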
There are two use cases:
- If you send one file, it will convert it and return the resulting PDF
- If you send many files, it will convert each of them to PDF, merge the resulting PDFs into a single PDF and return it
- One file
```shell
$ curl --request POST \
    --url http://127.0.0.1:3000 \
    --header 'Content-Type: multipart/form-data' \
    --form [email protected] \
    > result.pdf
```
- Many files
```shell
$ curl --request POST \
    --url http://127.0.0.1:3000 \
    --header 'Content-Type: multipart/form-data' \
    --form [email protected] \
    --form [email protected] \
    --form [email protected] \
    --form [email protected] \
    > result.pdf
```
The API does not provide any authentication mechanism. Make sure not to expose it on a public-facing port, and make sure your client(s) always control what is sent to the API.
Some libraries like unoconv cannot perform concurrent conversions. That's why the API performs only one conversion at a time: under heavy load, each request waits for the ones before it and may take a while to be processed.
Fortunately, you can work around this limitation by scaling the API.
In the following example, I'll demonstrate how to do some vertical scaling (i.e. running several instances on the same machine) with Docker Compose, but of course horizontal scaling works too!
```yaml
version: '3'

services:
  # your other services

  gotenberg:
    image: thecodingmachine/gotenberg:2.0.0
```
You may now launch your services using:
```shell
docker-compose up --scale gotenberg=your_number_of_instances
```
When your client(s) send requests to the Gotenberg service, Docker automatically routes each request to one of the Gotenberg containers following a round-robin strategy.
The API relies on a simple YAML configuration file called `gotenberg.yml`. It allows you to tweak some values and even lets you change the commands called for each kind of conversion.
Below is the default configuration file:
```yaml
# The port the application will listen to.
port: 3000

logs:
  # Accepted values, in order of severity: DEBUG, INFO, WARN, ERROR, FATAL, PANIC.
  # Messages at and above the selected level will be logged.
  level: "INFO"
  # Accepted values: text, json.
  # When a TTY is not attached, the output will be in the defined format.
  formatter: "text"

# You don't like a library which is used for a conversion? You want to handle a new file type?
# You may provide your own implementation here!
commands:
  # Some libraries like unoconv cannot perform concurrent conversions. That's why the API does only one conversion at a time.
  # If your current implementation uses libraries which are able to perform concurrent conversions, you may
  # change this value to false.
  lock: true
  # Unlike other commands' templates, you have access to FilesPaths instead of FilePath: it gathers all PDF files which should be merged.
  merge:
    template: "pdftk {{ range $filePath := .FilesPaths }} {{ $filePath }} {{ end }} cat output {{ .ResultFilePath }}"
    interpreter: "/bin/sh -c"
    timeout: 30
  # You may add more commands (or fewer, or even none).
  conversions:
    # The command template: you have access to FilePath and ResultFilePath variables.
    - template: "markdown-pdf {{ .FilePath }} -o {{ .ResultFilePath }}"
      # The binary which will call the command.
      interpreter: "/bin/sh -c"
      # Duration in seconds after which the command will be killed if it has not finished.
      timeout: 30
      # Files with the following extensions will be converted by the current command.
      extensions:
        - ".md"
    - template: "xvfb-run -e /dev/stdout wkhtmltopdf {{ .FilePath }} {{ .ResultFilePath }}"
      interpreter: "/bin/sh -c"
      timeout: 30
      extensions:
        - ".html"
        - ".htm"
    - template: "unoconv --format pdf --output \"{{ .ResultFilePath }}\" \"{{ .FilePath }}\""
      interpreter: "/bin/sh -c"
      timeout: 30
      extensions:
        - ".doc"
        - ".docx"
        - ".odt"
        - ".xls"
        - ".xlsx"
        - ".ods"
        - ".ppt"
        - ".pptx"
        - ".odp"
```
We provide binaries for a wide range of operating systems and architectures on the releases page, so feel free to build your own Docker image for your implementation of the Gotenberg API 🤘
- https://github.com/thecodingmachine/gotenberg-php-client (PHP client)
- Add your own client by submitting a pull request!
Would you like to update this documentation? Feel free to open an issue.