Toolset to perform various operations on PAGE XML datasets

PAGETools - WIP

A small collection of Python scripts for working with PAGE XML files.

Installing

Installation using pip

The suggested method is to install pagetools into a virtual environment using pip:

python -m venv VENV_NAME
source VENV_NAME/bin/activate
pip install pagetools

To install the package from its source, clone this repository and run the following from the repository root:

pip install .

Install from source

python setup.py install
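
After either installation method, you can check whether the package is visible to the interpreter of the active virtual environment with the standard library's importlib.metadata (Python 3.8+). This is a minimal sketch. installed_version is a hypothetical helper, not part of pagetools. The distribution name defaults to pagetools, the name used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "pagetools") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed.

    Hypothetical helper for checking an installation; not part of pagetools.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pagetools {found}" if found else "pagetools not found in this environment")
```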

Usage

Line extraction

Usage: pagetools-extract-lines [OPTIONS] [XMLS]...

Options:
  -ie, --image-extension TEXT     Extension of image files (must be in the
                                  same directory as XML files to be
                                  considered).

  -o, --output TEXT               Path where generated files will be stored.
  -e, --enumerate-output          Enumerates output file names instead of
                                  using original names.

  -z, --zip-output                Add output to a zip archive.
  -bg, --background-color INTEGER...
                                  RGB color code used to fill the background.
                                  Used when padding and/or deskewing.

  --background-mode [median|mean|dominant]
                                  Mode for calculating the background fill
                                  color (overrides -bg / --background-color).

  -p, --padding INTEGER...        Padding in pixels around the line image
                                  cutout (top, bottom, left, right).

  -ad, --auto-deskew              Automatically deskew extracted line images
                                  (Experimental!).

  -d, --deskew FLOAT              Angle for manual clockwise rotation of the
                                  line images.

  -gt, --gt-index INTEGER         Index of the TextEquiv elements containing
                                  ground truth.

  -pred, --pred-index INTEGER     Index of the TextEquiv elements containing
                                  predicted text.

  --help                          Show this message and exit.
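
The help text above does not spell out how a line image is located. The sketch below shows one plausible approach, and it is not the package's implementation. For each TextLine it reads the Coords polygon, takes the polygon's bounding box, and pads it in the same order as -p / --padding (top, bottom, left, right). It then picks the TextEquiv whose index attribute matches, which is the role --gt-index and --pred-index appear to play. The sketch uses only xml.etree.ElementTree and reads the PAGE namespace from the root element. The names line_boxes and LineBox are hypothetical. Cropping, deskewing and background filling are left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LineBox:
    line_id: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels, padding included
    text: Optional[str]  # Unicode content of the selected TextEquiv, if any


def _parse_points(points: str) -> List[Tuple[int, int]]:
    """Parse a PAGE Coords 'points' string such as '10,10 60,10 60,20 10,20'."""
    pairs = []
    for pair in points.split():
        x, y = pair.split(",")
        pairs.append((int(x), int(y)))
    return pairs


def line_boxes(
    xml_text: str,
    padding: Tuple[int, int, int, int] = (0, 0, 0, 0),
    gt_index: Optional[int] = None,
) -> List[LineBox]:
    """Return the padded bounding box and selected text of every TextLine (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # PAGE schema versions differ in their namespace URI, so read it off the root tag.
    ns = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""

    def q(name: str) -> str:
        return f"{{{ns}}}{name}" if ns else name

    top, bottom, left, right = padding  # same order as -p / --padding
    result = []
    for line in root.iter(q("TextLine")):
        coords = line.find(q("Coords"))  # direct child only, so Word/Glyph Coords are ignored
        if coords is None or not coords.get("points"):
            continue
        pts = _parse_points(coords.get("points"))
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        # The padded box may extend past the image edges; that area would need a background fill.
        box = (min(xs) - left, min(ys) - top, max(xs) + right, max(ys) + bottom)
        text = None
        for equiv in line.findall(q("TextEquiv")):
            # Pick the TextEquiv whose 'index' attribute matches, or the first one if no index is given.
            if gt_index is None or equiv.get("index") == str(gt_index):
                uni = equiv.find(q("Unicode"))
                text = uni.text if uni is not None else None
                break
        result.append(LineBox(line.get("id", ""), box, text))
    return result


SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page.png" imageWidth="200" imageHeight="100">
    <TextRegion id="r1">
      <Coords points="5,5 195,5 195,40 5,40"/>
      <TextLine id="l1">
        <Coords points="10,10 60,10 60,20 10,20"/>
        <TextEquiv index="0"><Unicode>Hello</Unicode></TextEquiv>
        <TextEquiv index="1"><Unicode>Hallo</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for lb in line_boxes(SAMPLE, padding=(2, 2, 3, 3), gt_index=0):
        print(lb)  # LineBox(line_id='l1', box=(7, 8, 63, 22), text='Hello')
```

Reading the namespace from the root tag keeps the sketch independent of which PAGE schema version a file declares. A real extractor would then crop this box from the image whose extension is given by -ie / --image-extension.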
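
The three --background-mode choices can be read as different ways of reducing a set of pixel colors to a single RGB fill value. The sketch below assumes what each mode means: median is the per-channel median, mean is the per-channel mean rounded to an integer, and dominant is the most frequent exact color. The help text does not say which pixels the tool samples, so the function accepts any iterable of RGB tuples. background_color is a hypothetical name, not the pagetools API.

```python
from collections import Counter
from statistics import mean, median
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def background_color(pixels: Iterable[RGB], mode: str = "median") -> RGB:
    """Reduce RGB pixels to one fill color (hypothetical helper, not the pagetools API)."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("at least one pixel is required")
    if mode == "dominant":
        # Most frequent exact color; ties go to the color encountered first.
        return Counter(pixels).most_common(1)[0][0]
    if mode == "median":
        summarise = median
    elif mode == "mean":
        summarise = mean
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # zip(*pixels) regroups the pixels into one sequence per channel: R, G, B.
    r, g, b = (int(round(summarise(channel))) for channel in zip(*pixels))
    return (r, g, b)


if __name__ == "__main__":
    # Light paper pixels plus one dark ink pixel.
    sample = [(250, 250, 250), (250, 250, 250), (240, 240, 240), (10, 10, 10)]
    for m in ("median", "mean", "dominant"):
        print(m, background_color(sample, m))
    # median (245, 245, 245)
    # mean (188, 188, 188)
    # dominant (250, 250, 250)
```

In this example the single dark pixel pulls the mean well below the paper color. That illustrates why a median or dominant color may be the safer choice for filling padded areas on scanned pages.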

Download files

Source distribution: PAGETools-0.1.tar.gz (8.7 kB)

Built distribution: PAGETools-0.1-py3-none-any.whl (12.5 kB, Python 3)
