Utilities for writing pandoc filters in python

A Python module for writing pandoc filters.

What are pandoc filters?

Pandoc filters are pipes that read a JSON serialization of the Pandoc AST from stdin, transform it in some way, and write it to stdout. They can be used with pandoc (>= 1.12) either using pipes:

pandoc -t json -s | ./caps.py | pandoc -f json

or using the --filter (or -F) command-line option:

pandoc --filter ./caps.py -s

For more on pandoc filters, see the pandoc documentation under --filter and the tutorial on writing filters.

For an alternative library for writing pandoc filters, with a more “Pythonic” design, see panflute.
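To make the pipe model concrete, the sketch below does by hand what a filter does, using only the standard library. It assumes the newer JSON layout (a top-level object with a blocks list). The sample document and the helper names upper_strs and run_filter are illustrative and not part of pandocfilters.

```python
import json

def upper_strs(node):
    """Return a copy of node with the text of every Str element uppercased."""
    if isinstance(node, list):
        return [upper_strs(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return {"t": "Str", "c": node["c"].upper()}
        return {key: upper_strs(child) for key, child in node.items()}
    return node  # strings, numbers, None: leave untouched

def run_filter(source):
    """JSON text in, transformed JSON text out."""
    return json.dumps(upper_strs(json.loads(source)))

# Hand-written sample, assumed to match the shape pandoc emits with `-t json`.
sample = json.dumps({
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [{"t": "Para", "c": [{"t": "Str", "c": "hello"}]}],
})
result = json.loads(run_filter(sample))
print(result["blocks"][0]["c"][0]["c"])  # HELLO

# A real filter would connect the same function to the pipe:
#   sys.stdout.write(run_filter(sys.stdin.read()))
```

The functions described under "Available functions" take care of this traversal, so a filter written with pandocfilters only has to supply the per-element action.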

Compatibility

Pandoc 1.16 added attributes to links and images, alongside the existing caption and target arguments. This required a change in pandocfilters that breaks backwards compatibility. Consequently, you should use:

  • pandocfilters version <= 1.2.4 for pandoc versions 1.12–1.15, and

  • pandocfilters version >= 1.3.0 for pandoc versions >= 1.16.

Pandoc 1.17.3 (pandoc-types 1.17.*) introduced a new JSON format. pandocfilters 1.4.0 should work with both the old and the new format.
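As a rough illustration of the two formats, the sketch below extracts metadata and blocks from either one. It assumes the old format is a two-element list whose first element wraps the metadata under an unMeta key, and that the new format is an object with meta and blocks keys. split_document is a hypothetical helper, not a pandocfilters function.

```python
import json

def split_document(doc):
    """Return (meta, blocks) for either JSON layout."""
    if isinstance(doc, dict):
        # Newer layout: {"pandoc-api-version": ..., "meta": ..., "blocks": ...}
        return doc.get("meta", {}), doc["blocks"]
    # Older layout: [{"unMeta": {...}}, [block, block, ...]]
    header, blocks = doc
    return header.get("unMeta", {}), blocks

old_doc = json.loads('[{"unMeta": {}}, [{"t": "Para", "c": []}]]')
new_doc = json.loads(
    '{"pandoc-api-version": [1, 17, 0, 4], "meta": {},'
    ' "blocks": [{"t": "Para", "c": []}]}'
)

# Both layouts yield the same metadata and the same list of blocks.
print(split_document(old_doc) == split_document(new_doc))
```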

Installing

To install from source, run this inside the source directory:

python setup.py install

Or install from PyPI:

pip install pandocfilters

Available functions

The main functions pandocfilters exports are

  • walk(x, action, format, meta)

    Walk a tree, applying an action to every object. Returns a modified tree. An action is a function of the form action(key, value, format, meta), where:

    • key is the type of the pandoc object (e.g. ‘Str’, ‘Para’)

    • value is the contents of the object (e.g. a string for ‘Str’, a list of inline elements for ‘Para’)

    • format is the target output format (as supplied by the format argument of walk)

    • meta is the document’s metadata

    The return of an action is either:

    • None: this means that the object should remain unchanged

    • a pandoc object: this will replace the original object

    • a list of pandoc objects: these will replace the original object; the list is merged with the neighbors of the original object (spliced into the list the original object belongs to); returning an empty list deletes the object

  • toJSONFilter(action)

    Like toJSONFilters, but takes a single action as argument.

  • toJSONFilters(actions)

    Generates a JSON-to-JSON filter from stdin to stdout.

    The filter:

    • reads a JSON-formatted pandoc document from stdin

    • transforms it by walking the tree and performing the actions

    • writes the new JSON-formatted pandoc document to stdout

    The argument actions is a list of functions of the form action(key, value, format, meta), as described in more detail under walk.

    This function calls applyJSONFilters, with the format argument provided by the first command-line argument, if present. (Pandoc sets this by default when calling filters.)

  • applyJSONFilters(actions, source, format="")

    Walks through a JSON structure and applies filters.

    This:

    • reads a JSON-formatted pandoc document from a source string

    • transforms it by walking the tree and performing the actions

    • returns a new JSON-formatted pandoc document as a string

    The actions argument is a list of functions (see walk for a full description).

    The argument source is a string-encoded JSON object.

    The argument format is a string describing the output format.

    Returns a new JSON-formatted pandoc document.

  • stringify(x)

    Walks the tree x and returns concatenated string content, leaving out all formatting.

  • attributes(attrs)

    Returns an attribute list, constructed from the dictionary attrs.
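The three kinds of return value described under walk, together with stringify and attributes, can be tried out on a small hand-built list of inlines. This is a minimal sketch: the action and the sample data are made up for illustration, and the element constructors (Str, Space, Emph) are the ones pandocfilters exports.

```python
from pandocfilters import Emph, Space, Str, attributes, stringify, walk

def action(key, value, format, meta):
    if key == 'Space':
        return []                        # empty list: delete the element
    if key == 'Str' and value == 'b':
        return [Str('b1'), Str('b2')]    # list: spliced in place of the element
    if key == 'Str' and value == 'c':
        return Str('C')                  # single object: replaces the element
    return None                          # None: leave the element unchanged

inlines = [Str('a'), Space(), Str('b'), Space(), Emph([Str('c')])]
walked = walk(inlines, action, 'html', {})
# walked now holds Str 'a', Str 'b1', Str 'b2' and an Emph containing Str 'C'.

# stringify drops the formatting and keeps the text; Space becomes a blank.
text = stringify([Emph([Str('hello')]), Space(), Str('world')])
print(text)  # hello world

# attributes turns a dictionary into [identifier, classes, key-value pairs].
attr = attributes({'id': 'fig1', 'classes': ['wide'], 'width': '50%'})
print(attr)  # ['fig1', ['wide'], [['width', '50%']]]
```

Note that the action is applied to the nested Str inside Emph as well, because walk descends into the contents of elements the action leaves unchanged.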

How to use

Most users will only need toJSONFilter. Here is a simple example of its use:

#!/usr/bin/env python

"""
Pandoc filter to convert all regular text to uppercase.
Code, link URLs, etc. are not affected.
"""

from pandocfilters import toJSONFilter, Str

def caps(key, value, format, meta):
  if key == 'Str':
    return Str(value.upper())

if __name__ == "__main__":
  toJSONFilter(caps)
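While developing a filter it can be convenient to exercise the action without running pandoc at all. One way to do that, sketched here, is to pass a JSON string to applyJSONFilters directly. The sample document is hand-written and assumes the newer JSON format.

```python
import json
from pandocfilters import Str, applyJSONFilters

def caps(key, value, format, meta):
    if key == 'Str':
        return Str(value.upper())

# Hand-written stand-in for the output of `pandoc -t json`.
source = json.dumps({
    "pandoc-api-version": [1, 22],
    "meta": {},
    "blocks": [{"t": "Para", "c": [
        {"t": "Str", "c": "hello"},
        {"t": "Space"},
        {"t": "Code", "c": [["", [], []], "x = 1"]},
    ]}],
})

# applyJSONFilters takes a list of actions and returns a JSON string.
output = json.loads(applyJSONFilters([caps], source, format="html"))
inlines = output["blocks"][0]["c"]
print(inlines[0]["c"])     # HELLO
print(inlines[2]["c"][1])  # x = 1
```

The Code element keeps its original text because the action only reacts to elements whose key is 'Str'.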

Examples

The examples subdirectory in the source repository contains the following filters. These filters should provide a useful starting point for developing your own pandoc filters.

abc.py

Pandoc filter to process code blocks with class abc containing ABC notation into images. Assumes that abcm2ps and ImageMagick’s convert are in the path. Images are put in the abc-images directory.

caps.py

Pandoc filter to convert all regular text to uppercase. Code, link URLs, etc. are not affected.

blockdiag.py

Pandoc filter to process code blocks with class “blockdiag” into generated images. Needs utils from http://blockdiag.com.

comments.py

Pandoc filter that causes everything between <!-- BEGIN COMMENT --> and <!-- END COMMENT --> to be ignored. The comment lines must appear on lines by themselves, with blank lines surrounding them.

deemph.py

Pandoc filter that causes emphasized text to be displayed in ALL CAPS.

deflists.py

Pandoc filter to convert definition lists to bullet lists with the defined terms in strong emphasis (for compatibility with standard markdown).

gabc.py

Pandoc filter to convert code blocks with class “gabc” to LaTeX \gabcsnippet commands in LaTeX output, and to images in HTML output.

graphviz.py

Pandoc filter to process code blocks with class graphviz into graphviz-generated images.

lilypond.py

Pandoc filter to process code blocks with class “ly” containing Lilypond notation.

metavars.py

Pandoc filter to allow interpolation of metadata fields into a document. %{fields} will be replaced by the field’s value, assuming it is of the type MetaInlines or MetaString.

myemph.py

Pandoc filter that causes emphasis to be rendered using the custom macro \myemph{...} rather than \emph{...} in latex. Other output formats are unaffected.

plantuml.py

Pandoc filter to process code blocks with class plantuml to images. Needs plantuml.jar from http://plantuml.com/.

ditaa.py

Pandoc filter to process code blocks with class ditaa to images. Needs ditaa.jar from http://ditaa.sourceforge.net/.

theorem.py

Pandoc filter to convert divs with class="theorem" to LaTeX theorem environments in LaTeX output, and to numbered theorems in HTML output.

tikz.py

Pandoc filter to process raw latex tikz environments into images. Assumes that pdflatex is in the path, and that the standalone package is available. Also assumes that ImageMagick’s convert is in the path. Images are put in the tikz-images directory.
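Several of these filters behave differently depending on the output format. The sketch below shows the general pattern, written in the spirit of the myemph.py description above rather than copied from the repository: the action only rewrites emphasis when format is 'latex' and returns None otherwise.

```python
from pandocfilters import Emph, RawInline, Str, walk

def myemph(key, value, format, meta):
    # Only touch emphasis, and only when producing LaTeX.
    if key == 'Emph' and format == 'latex':
        opening = RawInline('latex', '\\myemph{')
        closing = RawInline('latex', '}')
        return [opening] + value + [closing]
    # Implicit None: every other element and format is left unchanged.

as_latex = walk([Emph([Str('hi')])], myemph, 'latex', {})
as_html = walk([Emph([Str('hi')])], myemph, 'html', {})

print(len(as_latex))  # 3 (raw opening, the original Str, raw closing)
print(len(as_html))   # 1 (the Emph element, untouched)
```

Returning a list lets the raw LaTeX wrappers be spliced around the original inline content without nesting it in a new element.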

API documentation

By default most filters use get_filename4code to create a directory ...-images to save temporary files. This directory doesn’t get removed as it can be used as a cache so that later pandoc runs don’t have to recreate files if they already exist. The directory is generated in the current directory.

If you prefer to have a clean directory after running pandoc filters, you can set an environment variable PANDOCFILTER_CLEANUP to any non-empty value such as 1 which forces the code to create a temporary directory that will be removed by the end of execution.
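The caching behavior described above can be sketched with the standard library alone. cache_path below is a hypothetical helper that imitates the idea, not the library's get_filename4code: the file name is derived from a hash of the content, so identical input maps to the same file, and the directory is either a persistent one in the current directory or a temporary one that is removed at exit.

```python
import atexit
import hashlib
import os
import shutil
import tempfile

def cache_path(name, content, ext):
    """Hypothetical helper: derive a stable file path from the content."""
    if os.environ.get("PANDOCFILTER_CLEANUP"):
        # Throwaway directory, removed when the interpreter exits.
        directory = tempfile.mkdtemp(prefix=name + "-")
        atexit.register(shutil.rmtree, directory, ignore_errors=True)
    else:
        # Persistent cache directory inside the current directory.
        directory = name + "-images"
        os.makedirs(directory, exist_ok=True)
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
    return os.path.join(directory, digest + "." + ext)

# Set the variable so this demo does not leave a directory behind.
os.environ["PANDOCFILTER_CLEANUP"] = "1"
path = cache_path("demo", "digraph { a -> b }", "png")

if not os.path.isfile(path):
    # A real filter would render the image to `path` here.
    print("would generate", os.path.basename(path))
```

With the persistent directory, the os.path.isfile check is what lets a later run skip regeneration. With the temporary directory, every run starts from an empty cache.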
