
Pythonic API for parsing PDF files

Project description

Info:

See the tutorials & documentation for more information.

Author & Maintainer:

Maksym Polshcha <maxp@sterch.net>

See GitHub for the latest source.

About

pdfreader is a Pythonic API for:
  • extracting texts, images and other data from PDF documents (plain or protected)

  • accessing different objects within PDF documents

pdfreader is NOT a tool (though maybe one day it will become one!):
  • to create or update PDF files

  • to split PDF files into pages or other pieces

  • to convert PDFs to any other format

Nevertheless it can be used as a part of such tools.
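A minimal sketch of the second point above, browsing objects inside a document, using the low-level `PDFDocument` class from pdfreader's tutorials. It assumes that the `PDFDocument(fd)` / `PDFDocument(fd, password=...)` constructor, `doc.header.version`, `doc.metadata` and the lazy `doc.pages()` generator behave as those tutorials describe; `describe_document` itself is a hypothetical helper, not part of pdfreader.

```python
from typing import Any, Dict, Optional


def describe_document(path: str, password: Optional[str] = None) -> Dict[str, Any]:
    """Return the PDF version, metadata and first-page keys of a PDF file.

    Hypothetical helper built on pdfreader's PDFDocument; a sketch only.
    """
    # Imported inside the function so this module still loads
    # where pdfreader is not installed.
    from pdfreader import PDFDocument

    with open(path, "rb") as fd:
        # Pass a password only for encrypted / password-protected documents.
        doc = PDFDocument(fd, password=password) if password else PDFDocument(fd)
        # pages() is a generator, so only the first page object is read here.
        first_page = next(doc.pages())
        # Everything is read while the file is still open.
        return {
            "version": doc.header.version,
            "metadata": doc.metadata,
            "first_page_keys": list(first_page.keys()),
        }
```

For example, `describe_document("example.pdf")` (the file name is only a placeholder) returns a small dict you can print or log.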

See Tutorials & Documentation.

Features

  • Extracts texts (plain text and formatted text objects)

  • Extracts PDF form data (pure strings and formatted text objects)

  • Supports all PDF encodings, CMaps and predefined CMaps

  • Extracts images and image masks as Pillow/PIL Images

  • Supports encrypted and password-protected PDF documents

  • Allows browsing any document objects and resources and extracting any data you need (fonts, annotations, metadata, multimedia, etc.)

  • Follows the PDF 1.7 specification

  • Lazy object access makes it possible to process huge PDF documents quite fast
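The text, image and password features above can be combined in a short sketch built on pdfreader's `SimplePDFViewer`. It assumes, as the tutorials show, that iterating a viewer yields one rendered canvas per page, that `canvas.strings` is a list of text fragments, that `canvas.images` maps image names to objects with a `to_Pillow()` method (Pillow must be installed for that part), and that a `password` argument opens protected files. `join_strings` and `extract_pages` are hypothetical helper names.

```python
from typing import Any, Iterable, List, Optional, Tuple


def join_strings(strings: Iterable[str]) -> str:
    """Concatenate the text fragments collected on one rendered page."""
    return "".join(strings)


def extract_pages(
    path: str,
    password: Optional[str] = None,
    with_images: bool = False,
) -> List[Tuple[str, List[Any]]]:
    """Return (plain_text, pil_images) for every page of a PDF file.

    Hypothetical helper built on pdfreader's SimplePDFViewer; a sketch only.
    """
    # Imported here so the module loads even where pdfreader is absent.
    from pdfreader import SimplePDFViewer

    pages: List[Tuple[str, List[Any]]] = []
    with open(path, "rb") as fd:
        # Only pass a password for encrypted / password-protected documents.
        viewer = SimplePDFViewer(fd, password=password) if password else SimplePDFViewer(fd)
        # Assumed (per the tutorials): iterating the viewer renders each page in turn.
        for canvas in viewer:
            text = join_strings(canvas.strings)
            images: List[Any] = []
            if with_images:
                # canvas.images is assumed to map names to image objects;
                # to_Pillow() converts each one to a PIL Image (requires Pillow).
                images = [img.to_Pillow() for img in canvas.images.values()]
            pages.append((text, images))
    return pages
```

For instance, `for text, _ in extract_pages("report.pdf"): print(text)` would print each page's text (the file name is a placeholder). Images are converted only on request, since decoding them is much more expensive than collecting strings.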

Installation

pdfreader can be installed with pip:

$ python -m pip install pdfreader

Or with easy_install from setuptools (deprecated in recent setuptools releases):

$ python -m easy_install pdfreader

You can also download the project source and, from its top-level directory, run:

$ python setup.py install
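Whichever installation route you take, one way to confirm that the package is visible to your interpreter is the standard library's `importlib.metadata` (Python 3.8+); `python -m pip show pdfreader` gives similar information from the command line. `installed_version` is a hypothetical helper name used only for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "pdfreader") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"pdfreader {found} is installed" if found else "pdfreader is not installed")
```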

Tutorial and Documentation

Tutorial, real-life examples and documentation

Support, Bugs & Feature Requests

pdfreader uses GitHub issues to keep track of bugs, feature requests, etc.


Donation

If this project is helpful, you can treat me to coffee :-)


Download files

Source Distribution

pdfreader-0.1.8.tar.gz (2.9 MB)

File details

Details for the file pdfreader-0.1.8.tar.gz.

File metadata

  • Download URL: pdfreader-0.1.8.tar.gz
  • Size: 2.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for pdfreader-0.1.8.tar.gz

Algorithm      Hash digest
SHA256         b00eee00b519a058877fc75bd82fa03dfa1fbd0e822d9fb9c8ecffc627005b38
MD5            9baf17654445a4ac903517f744412c1f
BLAKE2b-256    3415423f8d3d6435e86a7e1abc813ea1ca3fc3e948567469ac5004f5099742ac

See pip's documentation on hash-checking mode for more details on using hashes.
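A downloaded archive can be checked against the published SHA256 digest for pdfreader-0.1.8.tar.gz using only Python's `hashlib`. In this sketch `sha256_hexdigest`, `sha256_of_file` and `matches_published_hash` are hypothetical helper names; the expected value is copied from the hash table for this file.

```python
import hashlib
from typing import Iterable

# SHA256 digest published for pdfreader-0.1.8.tar.gz.
PUBLISHED_SHA256 = "b00eee00b519a058877fc75bd82fa03dfa1fbd0e822d9fb9c8ecffc627005b38"


def sha256_hexdigest(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks into one SHA256 object and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large archives need little memory."""
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        return sha256_hexdigest(iter(lambda: fh.read(chunk_size), b""))


def matches_published_hash(path: str, expected: str = PUBLISHED_SHA256) -> bool:
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

Usage: `matches_published_hash("pdfreader-0.1.8.tar.gz")` should return `True` for an untampered download. pip can enforce the same check at install time through `--hash=sha256:...` entries in a requirements file.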
