A package to add metadata tags to objects saved in S3

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
License
- OSI Approved :: BSD License
- OSI Approved :: MIT License
Operating System
- OS Independent
- POSIX :: Linux
Programming Language
- Python :: 3
Topic
- Multimedia

Project description

AWS S3 Metadata Tagger

The S3 Metadata tagger adds information in the form of metadata to files saved in S3.

To do this, the central handler takes a file location and a metadata extraction function. It first checks, via a HEAD request, whether the object already carries the requested information. If it does not, the handler downloads the file, invokes the extraction function, and adds the resulting metadata to the S3 object with an in-place COPY operation using MetadataDirective="REPLACE".
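The flow described above can be sketched roughly as follows. This is not the package's own code: `tag_if_missing`, `is_tagged`, and `extract` are hypothetical names, and `client` is assumed to behave like a boto3 S3 client (`head_object`, `download_file`, `copy_object`).

```python
import os
import tempfile
from typing import Callable, Dict


def tag_if_missing(
    client,
    bucket: str,
    key: str,
    is_tagged: Callable[[Dict[str, str]], bool],
    extract: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Add extracted metadata to an S3 object unless it is already present."""
    # HEAD request: returns the user-defined metadata without the object body.
    head = client.head_object(Bucket=bucket, Key=key)
    existing = head.get("Metadata", {})
    if is_tagged(existing):
        return existing

    # Download to a temporary file and run the extraction function on it.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "object")
        client.download_file(bucket, key, local_path)
        extracted = extract(local_path)

    # Copy the object onto itself. With REPLACE, S3 stores the metadata
    # supplied here rather than the old set, so the old entries are merged in.
    merged = {**existing, **extracted}
    client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        Metadata=merged,
        MetadataDirective="REPLACE",
    )
    return merged
```

Because the sketch only relies on three client methods, it can be exercised with a small stand-in object instead of a real S3 connection. A production version may also need to pass through other object headers, such as the content type, when replacing metadata.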

This package comes with two optional variants for metadata extraction:

pdf: for determining the number of pages in a PDF
picture: for determining the dimensions of an image

Usage

The entrypoint into the tagger is the object_tagger.tag_file function.

It expects an object_tagger.S3ObjectPath(key, bucket) and an object_tagger.MetadataHandler(already_tagged, extraction_function, versioning_tag) object as its parameters. The parameters of the MetadataHandler are as follows:

already_tagged: a function which receives the metadata tags already present on the object, and returns a boolean indicating whether the object should be tagged.
extraction_function: a function which receives the path to the downloaded object, and returns a string -> string dictionary containing the metadata to add to the object.
versioning_tags: a string -> string dictionary containing further tags to add to the S3 object, which can, for example, be used for tag versioning.
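To illustrate how versioning tags might interact with the check on existing metadata, here is a library-independent sketch. The key `tagger-version`, the helper names `is_up_to_date` and `build_metadata`, and the example keys are all made up for illustration. Whether the library expects this boolean or its negation from `already_tagged` is not clear from the description above, so treat the polarity as an assumption.

```python
from typing import Dict, Iterable

# Hypothetical versioning tag: bump the value to force re-tagging of old objects.
VERSIONING_TAGS = {"tagger-version": "2"}


def is_up_to_date(
    existing: Dict[str, str],
    required_keys: Iterable[str],
    versioning_tags: Dict[str, str],
) -> bool:
    """True if all required keys exist and every versioning tag matches."""
    has_keys = all(key in existing for key in required_keys)
    same_version = all(
        existing.get(key) == value for key, value in versioning_tags.items()
    )
    return has_keys and same_version


def build_metadata(
    extracted: Dict[str, str], versioning_tags: Dict[str, str]
) -> Dict[str, str]:
    """Combine freshly extracted metadata with the versioning tags."""
    return {**extracted, **versioning_tags}
```

With this arrangement, an object tagged by an earlier version of the extraction logic no longer counts as up to date once the version value changes.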

The function tries to extract the metadata and add it to the object up to three times. On success, the added metadata is returned; on failure, an exception is raised.
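The retry behaviour can be pictured with a small generic helper. This is only a sketch: `with_retries` is a hypothetical name, and the exception type, any delay between attempts, and which errors the library actually retries on are not stated in this description.

```python
from typing import Callable, Dict, Optional


def with_retries(
    operation: Callable[[], Dict[str, str]], attempts: int = 3
) -> Dict[str, str]:
    """Call `operation` until it succeeds or `attempts` tries are used up."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            # On success, hand back the metadata the operation produced.
            return operation()
        except Exception as error:
            # Remember the failure and try again.
            last_error = error
    # All attempts failed: surface the last error as the cause.
    raise RuntimeError(f"tagging failed after {attempts} attempts") from last_error
```

A caller would wrap the extract-and-tag step in a zero-argument function and pass it in.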

For an example, see the service in the examples directory, which uses this library to automatically tag PDFs uploaded to S3 via AWS Lambda.

Structure

`object_tagger`

contains the higher-level orchestration:

object_tagger.py contains all the logic for checking whether the file has already been tagged, downloading it, invoking the metadata script, creating the tag object, and adding it to the S3 resource.

The metadata scripts are stored in their respective folders.

`pdf_tagger`

The pdf tagger uses PyPDF2 to determine the number of pages in a PDF. Install with the [pdf] extra option.
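A page-count extraction function could look roughly like this. It is a sketch rather than the package's actual script: the function name and the `pages` key are assumptions, and it relies on the `PdfReader` class available in recent PyPDF2 releases.

```python
from typing import Dict


def pdf_page_metadata(path: str) -> Dict[str, str]:
    """Return the page count of the PDF at `path` as string metadata."""
    # Imported lazily so the module loads even without the optional dependency.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # S3 user metadata values are strings, so the count is converted.
    return {"pages": str(len(reader.pages))}
```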

`picture_tagger`

Using Pillow, the script gets the width and height of the passed image. Install with the [picture] extra option.
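As a rough illustration of the idea, and not the package's own script, reading the dimensions with Pillow might look like this. The function name and the `width` and `height` keys are assumptions.

```python
from typing import Dict


def picture_dimension_metadata(path: str) -> Dict[str, str]:
    """Return the width and height of the image at `path` as string metadata."""
    # Imported lazily so the module loads even without the optional dependency.
    from PIL import Image

    # Image.open reads the header lazily; `size` is a (width, height) tuple.
    with Image.open(path) as image:
        width, height = image.size
    return {"width": str(width), "height": str(height)}
```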

Testing

Both pdf_tagger and picture_tagger come with unit tests. There is also an integration test in tests/test_object_tagger.py, which expects a LocalStack instance to be running in the background. Furthermore, the following environment variables need to be set:

LOCALSTACK_S3_ENDPOINT_URL=http://localhost:4566
AWS_ACCESS_KEY_ID=test
AWS_SECRET_ACCESS_KEY=test
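How the test suite consumes these variables is not shown here. One plausible way, sketched below with a hypothetical helper name, is to turn them into keyword arguments for a boto3 S3 client pointed at LocalStack.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARIABLES = (
    "LOCALSTACK_S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def localstack_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build S3 client keyword arguments from the environment variables above."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "endpoint_url": env["LOCALSTACK_S3_ENDPOINT_URL"],
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
```

With boto3 installed, a client could then be created via `boto3.client("s3", **localstack_client_kwargs())`.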

Release history

- 1.0.1 (this version): Oct 14, 2022
- 1.0.0: Sep 16, 2022
- 0.2.1: Sep 5, 2022
- 0.2.0: Sep 5, 2022
- 0.0.3: Aug 29, 2022

Download files

Source Distribution

s3-metadata-tagger-1.0.1.tar.gz (7.8 kB)

Uploaded Oct 14, 2022 Source

Built Distribution

s3_metadata_tagger-1.0.1-py3-none-any.whl (7.6 kB)

Uploaded Oct 14, 2022 Python 3

Hashes for s3-metadata-tagger-1.0.1.tar.gz

Algorithm	Hash digest
SHA256	`83e7a386e8496fcf7e9e6aea5877a5ebff9273ecf0fea2ac79e264124cf94638`
MD5	`c13d2bff19024e80c6c3643be1eadd06`
BLAKE2b-256	`39ec012fae0aa194f973e1ad6a10ae0122dbb9e1e50737930da2df718787baf0`

Hashes for s3_metadata_tagger-1.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`79040c2f3d422616a4c4c2a4191cb875fa8bc257b2b5b68a9423b637bb31d0e4`
MD5	`84e243963f0b7f8018623b43795bac52`
BLAKE2b-256	`f070850713daba633cf4ef8970c47ec8c73a7172318bff4fb90dca3cb0bbf678`