Creates manifests for syncd works.
Project description
bdrc-volume-manifest-builder
New in Release 1.1
- Ability to use either file system or S3 for image repository
Changelog
Version | Commit | Comment |
---|---|---|
1.2.6 | 6d600c53 | Expose standalone creation of one volume's manifest |
Intent
This project originated as a script to extract image dimensions from a work, and:
- write the dimensions to a json file
- report on images which broke certain rules.
Implementation
Archival Operations determined that this would be most useful to BUDA to implement as a service which could be injected into the current sync process. To do this, the system needed to:
- be more modular
- be distributable onto an instance which could be cloned in AWS.
This branch expands the original tool by:
- Adding the ability to use the eXist db as a source for the image dimensions.
- Use a pre-built BOM Bill of Materials) to derive the files which should be included in the dimesnsions file
- Read input from either S3 or local file system repositories
- Create and save log files.
- Manage input files.
- Run as a service on a Linux platform
Standalone tool
Internal tool to create json manifests of image format data for volumes present in S3 to support the BUDA IIIF presentation server.
Language
Python 3.7 or newer. It is highly recommended to use pip
to install, to manage dependencies. If you must do it
yourself, you can refer to setup.py
for the dependency list.
Environment
- Write access to
/var/log/VolumeManifestBuilder
which must exist. systemctl
service management, if you want to use the existing materials to install as a service.
Usage
Command line usage
The command line mode allows running one batch or one work at a time. Arguments specify the parameters, options.
You also must choose a repository mode which determines if the images
are on a local file system (the fs
mode), or on an AWS S3 system (the s3
)
mode.
Common parameters
This section describes the parameters which are independent of the repository mode.
$ manifestforwork -h
usage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]}
Prepares an inventory of image dimensions
optional arguments:
-h, --help show this help message and exit
-d {info,warning,error,debug,critical}, --debugLevel {info,warning,error,debug,critical}
choice values are from python logging module
-l LOG_PARENT, --logDir LOG_PARENT
Path to log file directory
-f WORK_LIST_FILE, --workListFile WORK_LIST_FILE
File containing one RID per line.
-w WORK_RID, --work-Rid WORK_RID
name or partially qualified path to one work
-p POLL_INTERVAL, --poll-interval POLL_INTERVAL
Seconds between alerts for file.
Repository Parser:
Handles repository alternatives
{s3,fs}
Common usage Notes:
-f/--workListFile
is a file which contains a list of RIDS, or a list of paths
to work RIDs, in the fs
mode (see below.)
-w/--workRID
is a single work.
-
The
--workListFile
and--workRid
arguments are mutually exclusive -
-p
is disregarded in this mode. It is an argument to themanifestFromS3
-
The system logs its activity into a file named yyyy-MM-DD_HH_MM_PID.local_v_m_b.log
in the folder given in the
-l/--logDirargument (default
/var/log`) mode.
fs Mode Usage
❯ manifestforwork fs -h
usage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]} fs
[-h] [-c CONTAINER] [-i IMAGE_FOLDER_NAME]
optional arguments:
-h, --help show this help message and exit
-c CONTAINER, --container CONTAINER
container for all work_Rid archives. Prefixes entries
in --source_rid or --workList
-i IMAGE_FOLDER_NAME, --image-folder-name IMAGE_FOLDER_NAME
name of parent folder of image files
Notes:
- the
-c/--container
defines a path to the RIDS (or the RID subpaths) given. It is optional. It prepends its value to the WorkRID paths or individual workRIDs in the input file (-f
) or to the individual work (-w
)
In the -w
or -f
options above. The system supports user expansion
(~[uid]/path...
in Linux) and environment variable expansion in both the -c
and the -f
options. That is, the file given in the -f
option can contain
- Environment variables
- User alias pathnames (
~[user]/...
) - Fully qualified pathnames
e.g.
> pwd
/data
>ls
Works
>ls ~/tmp
/home/me/tmp/Works
> export THISWORK="Works/FromThom"
> cat workList
$WORKS/W12345
~/tmp/$WORKS/W12345
/home/me/tmp/Works/W89012
using this list in
> manifestforwork -f worklist fs
will process files from
- /data/Works/FromThom
- /home/me/tmp/Works/FromThom
- /home/me/tmp/Works/W89012
if the
--container
argument is not given. (-c
defaults to the current working directory)
s3 mode usage
❯ manifestforwork s3 --help
usage: manifestforwork [common options] { fs [fs options] | s3 [s3 options]} s3
[-h] [-b BUCKET]
optional arguments:
-h, --help show this help message and exit
-b BUCKET, --bucket BUCKET
Bucket - source and destination
The S3 mode uses a bucket named with the optional -b/--bucket
argument. The default bucket
is closely held. note that the --container
argument is not applicable in this mode, and
that if a worklist is given, it must contain only RIDs, not paths.
manifestFromS3 input
manifestFromS3
is a mode which waits for a list of RIDs or paths to appear in a well known location
and then processes what it finds there as if it were given in the --workFile
argument.
All the other parameters are the same - manifestFromS3
can work on local file system (fs
)
or on s3
targets.
- Upload an input list (file name does not matter) to s3://manifest.bdrc.org/processing/todo/
- run
manifestFromS3 -p n [ -l {info,debug,error} {fs [ fs arguments ] | s3 [ -b alternative.bucket]}
from the command line.
manifestFromS3
does the following:
- Moves the input list from
s3://manifest.bdrc.org/processing/input
to.../processing/inprocess
and changes the name from to - Runs the processing, uploading a dimensions.json file for each volume in each RID in the input list.
- When complete, it moves the file from
.../processing/inprocess
to../processing/done
Installation
PIP
PyPI contains bdrc-volume-manifest-builder
Global installation
Install is simply
sudo python3 -m pip install --upgrade bdrc-volume-manifest-builder
to install system-wide (which is needed to run as a service)
Local installation
To install and run locally, python3 -m pip install --upgrade bdrc-volume-manifest-builder
will do. Best to do this in
a virtual python environment, see venv
When you install volume-manifest-builder
three entry points are defined in /usr/local/bin
(or your local environment):
manifestforlist
the command mode, which operates on a list of RIDsmanifestforwork
alternate command line mode, which works on one pathmanifestFromS3
the mode which runs continuously, polling an S3 resource for a file, and processing all the files it finds. This is the mode which runs on a service.
Service
See Service Readme for details on installing manifestFromS3 as a service on systemctl
supporting platforms.
Development
volume-manifest-builder
is hosted on BUDA Github volume-manifest-builder
- Credentials: you must have the input credentials for a specific AWS user installed to deposit into the archives on s3.
Usage
volume-manifest-builder
has two use cases:
- command line, which allows using a list of workRIDS on a local system
- service, which continually polls a well-known location,
s3://manifest.bdrc.org/processing/todo/
for a file.
Building a distribution
Be sure to check PyPI for current release, and update accordingly. Use PEP440 for naming releases.
Prerequisites
pip3 install wheel
pip3 install twine
python3 setup.py bdist_wheel
twine upload dist/<thing you built
Project changelog
v 1.2.0:
- Sort all output by filename
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Hashes for bdrc_volume_manifest_builder-1.2.6-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 945d0a63f3fd238941ea6204647952a126b1094386b4f9c62c1741682b740124 |
|
MD5 | fd4e59f4a9e3cf84f003390824be4cf9 |
|
BLAKE2b-256 | 63e0e7235ecfbc4762eb6831d022d41c143a4c79a4594aa184e9f66b98071a08 |