pandadoc: lightweight pandoc wrapper

An extremely lightweight pandoc wrapper for Python 3.8+.

Its features:

  • Supports conversion between all formats that pandoc supports: Markdown, HTML, LaTeX, Word, EPUB, PDF (output only), and more.

  • Output to raw bytes (binary formats, e.g. PDF), to str objects (text formats, e.g. markdown), or to a file (any format).

  • pandoc errors are raised as (informative) exceptions.

  • Full flexibility of the pandoc command-line tool, and the same syntax. (See the pandoc manual for more information.)

Getting Started Guide

Installation

First, install pandoc and make sure it is on your PATH.

Then install pandadoc from PyPI:

$ python -m pip install pandadoc

That’s it.
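To confirm the first step from Python, you can look pandoc up on the PATH. The sketch below uses only the standard library; the helper name pandoc_available is made up for this example and is not part of pandadoc.

```python
import shutil


def pandoc_available(executable="pandoc"):
    """Return True if `executable` can be found on the current PATH."""
    # shutil.which returns the full path to the executable, or None.
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if pandoc_available():
        print("pandoc found at", shutil.which("pandoc"))
    else:
        print("pandoc is not on your PATH; install it before using pandadoc")
```

If the check fails, install pandoc (or fix your PATH) before calling pandadoc.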

Usage

Convert a webpage to markdown, and store it as a python str:

>>> import pandadoc
>>> input_url = "https://example.com/"
>>> example_md = pandadoc.call_pandoc(
...     options=["-t", "markdown"], files=[input_url]
... )
>>> print(example_md)
<div>

# Example Domain

This domain is for use in illustrative examples in documents.
...

Now convert the markdown to RTF, and write it to a file:

>>> rtf_output_file = "example.rtf"
>>> pandadoc.call_pandoc(
...     options=["-f", "markdown", "-t", "rtf", "-o", rtf_output_file],
...     input_text=example_md,
... )
''

Notice that call_pandoc returns an empty string '' when a file output is used. Looking at the output file:

{\pard \ql \f0 \sa180 \li0 \fi0 \outlinelevel0 \b \fs36 Example Domain\par}
{\pard \ql \f0 \sa180 \li0 \fi0 This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\par}
{\pard \ql \f0 \sa180 \li0 \fi0 {\field{\*\fldinst{HYPERLINK "https://www.iana.org/domains/example"}}{\fldrslt{\ul
More information...
}}}
\par}

Convert this RTF document to PDF, using xelatex with a custom main font, and store the result as raw bytes:

>>> raw_pdf = pandadoc.call_pandoc(
...     options=[
...         "-f", "rtf", "-t", "pdf",
...         "--pdf-engine", "xelatex",
...         "--variable", "mainfont=Palatino",
...     ],
...     files=[rtf_output_file],
...     decode=False,
... )

Note that PDF conversion requires a PDF engine to be installed: xelatex for the call above, or another engine such as pdflatex or latexmk.

Now you can send those raw bytes over a network, or write them to a file:

>>> with open("example.pdf", "wb") as f:
...     f.write(raw_pdf)
...
>>> # Finished

You can find more pandoc examples on the pandoc website.

Exceptions

If pandoc exits with an error, an appropriate exception is raised (based on the exit code):

>>> pandadoc.call_pandoc(
...     options=["-f", "markdown", "-t", "zzz"], # non-existent format
...     input_text=example_md,
... )
Traceback (most recent call last):
...
pandadoc.exceptions.PandocUnknownWriterError: Unknown output format zzz
>>> isinstance(pandadoc.exceptions.PandocUnknownWriterError(), pandadoc.PandocError)
True

You can find a full list of exceptions in the pandadoc.exceptions module.
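To illustrate the idea of exit-code based exceptions, here is a minimal, self-contained sketch. It is not pandadoc's actual code: the classes below are local stand-ins, the function error_for_exit_code is hypothetical, and only PandocError and PandocUnknownWriterError are names confirmed above. The other class names and the exit codes follow the list in the pandoc manual.

```python
class PandocError(Exception):
    """Base class for pandoc failures (local stand-in, not pandadoc's class)."""


class PandocOptionError(PandocError):
    """Stand-in for pandoc exit code 6."""


class PandocUnknownReaderError(PandocError):
    """Stand-in for pandoc exit code 21."""


class PandocUnknownWriterError(PandocError):
    """Stand-in for pandoc exit code 22."""


class PandocPDFError(PandocError):
    """Stand-in for pandoc exit code 43."""


# A small subset of the exit codes listed in the pandoc manual.
EXIT_CODE_TO_ERROR = {
    6: PandocOptionError,
    21: PandocUnknownReaderError,
    22: PandocUnknownWriterError,
    43: PandocPDFError,
}


def error_for_exit_code(exit_code, message):
    """Build the exception that matches a non-zero pandoc exit code."""
    # Codes missing from the table fall back to the base class.
    error_class = EXIT_CODE_TO_ERROR.get(exit_code, PandocError)
    return error_class(message)


if __name__ == "__main__":
    try:
        raise error_for_exit_code(22, "Unknown output format zzz")
    except PandocError as err:
        # Catching the base class handles every specific error above.
        print(type(err).__name__, "-", err)
```

Because every specific error derives from one base class, calling code can catch PandocError for all pandoc failures, or a subclass when it wants to handle one case differently.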

Explanation

The pandoc command-line tool works like this:

pandoc [OPTIONS] [FILES]

In addition to the OPTIONS (documented in the pandoc manual), you can provide either some FILES, or some input text (via stdin).

The call_pandoc function of pandadoc works in a similar way:

  • The options argument contains a list of pandoc options. E.g. ["-f", "markdown", "-t", "html"].

  • The files argument is a list of file paths (or absolute URIs). E.g. ["path/to/file.md", "https://www.fsf.org"]

  • The input_text argument is used as text input to pandoc. E.g. "# Simple Doc\n\nA simple markdown document\n".

The timeout argument sets a time limit for the pandoc process, and the decode argument controls whether the result is decoded to a str (True by default) or returned as raw bytes.
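The mapping from these arguments to a command line can be sketched with the standard subprocess module. This is an illustration under assumptions, not pandadoc's implementation: build_command and run_command are hypothetical helpers, and UTF-8 is assumed for encoding and decoding.

```python
import subprocess
import sys


def build_command(options, files=None, executable="pandoc"):
    """Assemble the argument list: executable, then OPTIONS, then FILES."""
    return [executable, *options, *(files or [])]


def run_command(command, input_text=None, timeout=None, decode=True):
    """Run `command`, feed `input_text` to its stdin, and return its stdout."""
    stdin_bytes = input_text.encode("utf-8") if input_text is not None else None
    completed = subprocess.run(
        command,
        input=stdin_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
        check=True,  # raises subprocess.CalledProcessError on non-zero exit
    )
    # decode=True returns str (text formats); decode=False returns bytes.
    return completed.stdout.decode("utf-8") if decode else completed.stdout


if __name__ == "__main__":
    # Show the command line that would be run for a typical conversion.
    print(build_command(["-f", "markdown", "-t", "html"], ["path/to/file.md"]))

    # Stand-in for pandoc so the sketch runs without it installed:
    # a Python one-liner that copies stdin to stdout.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    print(run_command(echo, input_text="# Simple Doc"))
```

Keeping the command as a list of arguments, rather than a single shell string, avoids quoting problems with file names and option values.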

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request features.

Feedback is always appreciated.

License

Distributed under the MIT license.
