
This script checks whether two PDFs are visually the same. So:

  • White text on a white background will be ignored.

  • Subtle changes in position, size, or color of text will be detected.

  • This program will ignore changes caused by a different version of the PDF generator, or by invisible changes in the source document.

This is in contrast to most other tools, which tend to extract the text stream out of a PDF, and then diff those texts. Such tools include:

There seem to be some tools similar to the one you’re looking at now, although I have experience with none of these:

The strength of this script is that it’s simple to use on the command line, and it’s easy to reuse in scripts:

from diff_pdf_visually import pdfdiff

# Returns True or False
pdfdiff("a.pdf", "b.pdf")

Or use it from the command line:

$ pip3 install --user diff-pdf-visually
$ diff-pdf-visually a.pdf b.pdf

How to install this

You can install this tool with pip3, but we need the ImageMagick and Poppler programs.

On Ubuntu Linux

  1. sudo apt update

  2. sudo apt install python3-pip imagemagick poppler-utils

  3. pip3 install --user diff-pdf-visually

  4. If this is the first time that you run pip3 install --user for anything, then log out of Linux completely and log in again. (This is to refresh the PATH.)

  5. Run with diff-pdf-visually.

On Mac with Homebrew (untested)

  1. Run brew install poppler imagemagick.

  2. pip3 install --user diff-pdf-visually

  3. If this is the first time that you pip3 install --user something, then close your terminal and open a new one. (This is to refresh the PATH.)

  4. Run with diff-pdf-visually.

On Windows Subsystem for Linux

I’ve never tried this, but I think it will work. Unfortunately, it takes quite a while to get everything installed.

  1. Install Windows Subsystem for Linux (WSL) and Ubuntu 18.04, for instance with this tutorial

  2. Initialize Ubuntu 18.04 (tutorial)

  3. Now proceed with the Ubuntu Linux instructions.

Let me know (at bram at bram dot xyz) if this worked!

On Windows native

Lars Olafsson suggested that the following might work:

How it works

We use pdftocairo to convert both PDFs to a series of PNG images in a temporary directory. The number of pages and the dimensions of each page must be exactly the same in both PDFs. Then we call compare from ImageMagick to check how similar each pair of pages is; if any page differs by more than a certain threshold, the PDFs are reported as different, otherwise they are reported as the same.
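The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the function names (rasterise, parse_psnr, page_psnr, is_significant, visually_same) are hypothetical, and the rule "a page is significantly different when its PSNR is below the threshold" is an assumption based on the help text ("higher is more sensitive"). The sketch only checks the page count itself; if the page dimensions differ, it relies on compare failing to produce a parseable metric.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterise(pdf, out_dir, dpi=50):
    """Render every page of `pdf` to a PNG in `out_dir` using pdftocairo."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["pdftocairo", "-png", "-r", str(dpi), str(pdf), str(out_dir / "page")],
        check=True,
    )
    return sorted(out_dir.glob("page*.png"))


def parse_psnr(compare_stderr):
    """Parse the metric that `compare -metric PSNR` writes to stderr.

    Only the first whitespace-separated token is used; "inf" (identical
    images) is accepted by float(). Anything else raises ValueError.
    """
    return float(compare_stderr.split()[0])


def page_psnr(png_a, png_b, diff_png):
    """Compare two page images and return their PSNR."""
    proc = subprocess.run(
        ["compare", "-metric", "PSNR", str(png_a), str(png_b), str(diff_png)],
        capture_output=True,
        text=True,
    )
    return parse_psnr(proc.stderr)


def is_significant(psnrs, threshold=100):
    """Assumption: a lower PSNR means a bigger difference."""
    return any(p < threshold for p in psnrs)


def visually_same(pdf_a, pdf_b, threshold=100, dpi=50):
    """Hypothetical end-to-end check mirroring the description above."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        pages_a = rasterise(pdf_a, tmp / "a", dpi)
        pages_b = rasterise(pdf_b, tmp / "b", dpi)
        if len(pages_a) != len(pages_b):
            return False
        psnrs = [
            page_psnr(pa, pb, tmp / f"diff-{i}.png")
            for i, (pa, pb) in enumerate(zip(pages_a, pages_b), start=1)
        ]
        return not is_significant(psnrs, threshold)
```

Splitting the parsing and the threshold decision into their own functions keeps the parts that need no external programs easy to test on their own.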

You must have ImageMagick and poppler already installed.

Call diff-pdf-visually without parameters (or run python3 -m diff_pdf_visually) to see its command line arguments. Import it as diff_pdf_visually to use its functions from Python.

There are some options that you can use either from the command line or from Python:

$ diff-pdf-visually  -h
usage: diff-pdf-visually [-h] [--silent] [--verbose] [--threshold THRESHOLD]
                         [--dpi DPI] [--time TIME]
                         a.pdf b.pdf

Compare two PDFs visually. The exit code is 0 if they are the same, and 2 if
there are significant differences.

positional arguments:
  a.pdf
  b.pdf

optional arguments:
  -h, --help            show this help message and exit
  --silent, -q          silence output (can be used only once)
  --verbose, -v         show more information (can be used 2 times)
  --threshold THRESHOLD
                        PSNR threshold to consider a change significant,
                        higher is more sensitive (default: 100)
  --dpi DPI             resolution for the rasterised files (default: 50)
  --time TIME           number of seconds to wait before discarding temporary
                        files, or 0 to immediately discard

These “temporary files” include a PNG image of where any differences are, per page, as well as the log output of ImageMagick. If you want to get a feeling for thresholds, there are some example PDFs in the tests/ directory.
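When calling the command line tool from another script, the exit codes documented in the help text (0 for same, 2 for significant differences) can be turned into a boolean. The following is a minimal sketch that only uses the flags shown above; the helper names (build_command, interpret_exit_code, pdfs_look_same) are made up for illustration, and treating any other exit code as an error is an assumption.

```python
import subprocess

# Exit codes as documented in the help text above.
EXIT_SAME = 0
EXIT_DIFFERENT = 2


def build_command(a, b, threshold=None, dpi=None):
    """Build a diff-pdf-visually command line from the documented flags."""
    cmd = ["diff-pdf-visually"]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if dpi is not None:
        cmd += ["--dpi", str(dpi)]
    return cmd + [str(a), str(b)]


def interpret_exit_code(code):
    """Return True for 'same', False for 'different'; raise on anything else."""
    if code == EXIT_SAME:
        return True
    if code == EXIT_DIFFERENT:
        return False
    raise RuntimeError(f"diff-pdf-visually failed with exit code {code}")


def pdfs_look_same(a, b, **options):
    """Run the CLI (it must be on the PATH) and interpret its exit code."""
    result = subprocess.run(build_command(a, b, **options))
    return interpret_exit_code(result.returncode)
```

For example, pdfs_look_same("a.pdf", "b.pdf", dpi=100) would run the comparison at a higher resolution than the default of 50.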

There is also an environment variable:

  • COMPARE: override the path of ImageMagick compare. By default, we first try compare and then magick compare (for Windows).
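The lookup order described for COMPARE could look roughly like this. It is a sketch of the idea only, not the package's code: the function name compare_command is hypothetical, and using shutil.which to decide whether compare is available is an assumption.

```python
import os
import shutil


def compare_command(environ=None, which=shutil.which):
    """Pick the ImageMagick compare command.

    Order: the COMPARE environment variable, then `compare` if it is on
    the PATH, then `magick compare` as the fallback.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("COMPARE")
    if override:
        # COMPARE is treated as the path of a single executable.
        return [override]
    if which("compare"):
        return ["compare"]
    return ["magick", "compare"]
```

The environ and which parameters exist only so the lookup can be exercised without touching the real environment.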

So what do you use this for?

Personally, I’ve used this a couple of times to refactor my LaTeX documents: I just simplify or remove some macro definitions, and if nothing changes, apparently it’s safe to make that change.

Status

At the moment, this program/module works best for finding whether two PDFs are visually different.

This project will not work on Python 2.

The code is dual-licenced under both

at your option.

Supported Python versions

The versions that are regularly tested can be found here; that’s probably Python 3.8 and Python 3.9.

For your convenience we declare more Python versions acceptable in pyproject.toml, but the non-tested versions could potentially break from time to time. My goal is to support basically Python 3.x; please let me know if something doesn’t work on an older version.
