
datacatalog-object-storage-processor

A package for performing Data Catalog operations on object storage solutions.


1. Environment setup

1.1. Get the code

git clone https://github.com/mesmacosta/datacatalog-object-storage-processor
cd datacatalog-object-storage-processor

1.2. Auth credentials

1.2.1. Create a service account and grant it the roles below:
  • Data Catalog Admin
  • Storage Admin, or a custom role with the storage.buckets.list permission
1.2.2. Download a JSON key and save it as:
  • ./credentials/datacatalog-object-storage-processor-sa.json
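
Before running anything, it can help to confirm that the downloaded file really is a service account key. The sketch below is not part of this package: validate_key_data, check_service_account_key and REQUIRED_KEY_FIELDS are hypothetical names, and the check only looks at a few of the fields that Google Cloud service account JSON keys contain.

```python
import json

# A few of the fields present in a Google Cloud service account JSON key
# (hypothetical helper constant, not part of this package).
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def validate_key_data(data):
    """Return the service account e-mail, or raise ValueError if *data*
    does not look like a service account key."""
    missing = [field for field in REQUIRED_KEY_FIELDS if field not in data]
    if missing:
        raise ValueError("key is missing fields: " + ", ".join(missing))
    if data["type"] != "service_account":
        raise ValueError(
            "expected type 'service_account', got " + repr(data["type"]))
    return data["client_email"]


def check_service_account_key(path):
    """Load the JSON key stored at *path* and validate its contents."""
    with open(path) as key_file:
        return validate_key_data(json.load(key_file))
```

For example, check_service_account_key("./credentials/datacatalog-object-storage-processor-sa.json") returns the e-mail of the account that should hold the roles listed in 1.2.1.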

1.3. Virtualenv

Using virtualenv is optional, but strongly recommended unless you use Docker.

1.3.1. Install Python 3.6+
1.3.2. Create and activate an isolated Python environment
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
1.3.3. Install the dependencies
pip install --upgrade --editable .
1.3.4. Set environment variables
export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-object-storage-processor-sa.json
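
The steps above can be double-checked with a small preflight script. This is a hedged sketch, not something the package ships: preflight and in_virtualenv are hypothetical helpers that test the Python 3.6+ requirement, whether GOOGLE_APPLICATION_CREDENTIALS is set and points to an existing file, and whether a venv/virtualenv is active.

```python
import os
import sys


def in_virtualenv():
    """Return True when running inside a venv/virtualenv environment."""
    # venv (and recent virtualenv releases) make sys.prefix differ from
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return base_prefix != sys.prefix or hasattr(sys, "real_prefix")


def preflight(environ=None):
    """Return a list of setup problems; an empty list means all checks passed."""
    environ = os.environ if environ is None else environ
    problems = []
    # This README asks for Python 3.6 or newer.
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required")
    # Google client libraries read the key file path from this variable.
    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
    elif not os.path.isfile(key_path):
        problems.append("credentials file not found: " + key_path)
    return problems
```

Because the export above uses a relative path, the file check (like the client libraries) resolves it against the current working directory, so run it from the repository root.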

1.4. Docker

Docker may be used as an alternative way to run all the scripts. In that case, skip the Virtualenv instructions.

2. Create Data Catalog entries based on object storage files

2.1. Run the processor

  • python
datacatalog-object-storage-processor \
  object-storage create-entries --type cloud-storage \
  --project-id my_project \
  --entry-group-name my_entry_group_name \
  --bucket-prefix my_bucket
  • docker
docker build --rm --tag datacatalog-object-storage-processor .
docker run --rm --tty -v your_credentials_folder:/data datacatalog-object-storage-processor \
  --type cloud-storage \
  --project-id my_project \
  --entry-group-name my_entry_group_name \
  --bucket-prefix my_bucket
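
To script the Python invocation (for example, one call per bucket prefix), the documented flags can be assembled into an argument list and handed to subprocess. This is a sketch under stated assumptions: create_entries_command and create_entries are hypothetical helpers, they assume --entry-group-name takes a bare entry group ID (as in the example), and they apply the Data Catalog entry group ID rule (start with a letter or underscore; only English letters, digits and underscores; at most 64 characters).

```python
import re
import subprocess

# Data Catalog entry group ID rule: a letter or underscore first, then only
# letters, digits or underscores, 64 characters at most.
ENTRY_GROUP_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def create_entries_command(project_id, entry_group_name, bucket_prefix):
    """Build the argv for the create-entries call documented in this README."""
    if not ENTRY_GROUP_ID.fullmatch(entry_group_name):
        raise ValueError("invalid entry group ID: " + repr(entry_group_name))
    return [
        "datacatalog-object-storage-processor",
        "object-storage", "create-entries",
        "--type", "cloud-storage",
        "--project-id", project_id,
        "--entry-group-name", entry_group_name,
        "--bucket-prefix", bucket_prefix,
    ]


def create_entries(project_id, entry_group_name, bucket_prefix):
    """Run the CLI; raises CalledProcessError if it exits with non-zero status."""
    subprocess.run(
        create_entries_command(project_id, entry_group_name, bucket_prefix),
        check=True,
    )
```

Passing a list instead of a shell string avoids quoting problems, and check=True makes a failed run raise instead of passing silently.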

3. Delete object storage entries from an entry group

Delete the entries of a given entry group:

datacatalog-object-storage-processor \
  object-storage delete-entries --type cloud-storage \
  --project-id my_project \
  --entry-group-name my_entry_group_name
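
The delete command takes a project ID and an entry group name but no location, while the Data Catalog API addresses an entry group by a full resource name of the form projects/{project}/locations/{location}/entryGroups/{entry_group}. The sketch below builds that name alongside the delete command's argument list. entry_group_resource_name and delete_entries_command are hypothetical helpers, and the location used in the example (us-central1) is an assumption, since this README does not say which location the processor uses.

```python
def entry_group_resource_name(project_id, location, entry_group_id):
    """Format a Data Catalog entry group resource name."""
    return "projects/{}/locations/{}/entryGroups/{}".format(
        project_id, location, entry_group_id)


def delete_entries_command(project_id, entry_group_name):
    """Build the argv for the delete-entries call documented in this README."""
    return [
        "datacatalog-object-storage-processor",
        "object-storage", "delete-entries",
        "--type", "cloud-storage",
        "--project-id", project_id,
        "--entry-group-name", entry_group_name,
    ]


# Example, assuming (hypothetically) that the group lives in us-central1.
print(entry_group_resource_name("my_project", "us-central1", "my_entry_group_name"))
# prints: projects/my_project/locations/us-central1/entryGroups/my_entry_group_name
```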

Disclaimers

This is not an officially supported Google product.

History

0.1.0 (2020-05-01)

  • First release on PyPI.
