
Project description

dictionaryutils

Python wrapper and metaschema for datadictionary. It can be used to:

  • load a local dictionary into a Python object.
  • dump schemas to a file that can be uploaded to S3 as an artifact.
  • load a schema file from a URL into a Python object that can be used by services.

Test for dictionary validity with Docker

Say you have a dictionary you are building locally and you want to see if it will pass the tests.

You can add a simple alias to your .bash_profile to enable a quick test command:

testdict() { docker run --rm -v $(pwd):/dictionary quay.io/cdis/dictionaryutils:master; }

Then, from the directory containing the gdcdictionary directory, run testdict.
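If you want a quick sanity check without Docker, you can also try loading the dictionary directly with dictionaryutils. A minimal sketch, assuming your schema YAML files live under gdcdictionary/schemas/ and that DataDictionary exposes the loaded schemas as a schema dict:

from dictionaryutils import DataDictionary

# load the local schema YAML files; this raises if a schema fails to parse
dd = DataDictionary(root_dir="gdcdictionary/schemas")

# list the node names that loaded successfully
print(sorted(dd.schema))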

Generate simulated data with Docker

If you wish to generate fake simulated data, you can also do that with dictionaryutils and the data-simulator.

simdata() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/bash -c "cd /dictionaryutils; bash dockerrun.bash; cd /dictionary/dictionaryutils; poetry run python bin/simulate_data.py simulate --path /dictionary/simdata $*; export SUCCESS=$?; cd /dictionary; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit $SUCCESS"; }

Then, from the directory containing the gdcdictionary directory, run simdata; a folder called simdata will be created with the results of the simulator run. You can also pass additional arguments to the data-simulator script, such as simdata --max_samples 10.

The --max_samples argument sets the default number of instances to simulate for each node, but you can override it per node using the --node_num_instances_file argument. For example, if you create the following instances.json:

{
        "case": 100,
        "demographic": 100
}

Then run the following:

docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/bash -c "cd /dictionaryutils; bash dockerrun.bash; cd /dictionary/dictionaryutils; poetry run python bin/simulate_data.py simulate --path /simdata/ --program workshop --project project1 --max_samples 10 --node_num_instances_file /dictionary/instances.json; export SUCCESS=$?; cd /dictionary; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit $SUCCESS"

Then you'll get 100 case nodes, 100 demographic nodes, and 10 of every other node type. Note that the example above also sets the program and project names.
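To confirm the per-node counts, you can inspect the generated files. A minimal sketch, assuming the simulator writes one JSON array per node type into the simdata folder:

import glob
import json
import os

# count the simulated records in each generated node file
for path in sorted(glob.glob("simdata/*.json")):
    with open(path) as f:
        records = json.load(f)
    print(os.path.basename(path), len(records))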

You can also run the simulator against an arbitrary JSON URL with the --url parameter. The alias can be simplified to skip the setup of the parent directory virtual env (i.e., skip dockerrun.bash):

simdataurl() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/bash -c "python /dictionaryutils/bin/simulate_data.py simulate --path /simdata/ $*; chmod -R a+rwX /simdata"; }

Then run simdataurl --url https://datacommons.example.com/schema.json.
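Before running the alias, you can check that the URL actually serves a dumped schema bundle. A short sketch using only the standard library, reusing the placeholder URL from above:

import json
import urllib.request

# fetch the schema bundle and report how many node schemas it contains
with urllib.request.urlopen("https://datacommons.example.com/schema.json") as resp:
    schemas = json.load(resp)
print(len(schemas), "schemas found")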

Using a local build of the Docker image

It is possible to use a local build of the dictionaryutils Docker image instead of the master branch image hosted on Quay.

From a local copy of the dictionaryutils repo, build and tag a Docker image, for example:

docker build -t dictionaryutils-mytag .

Then use this image in any of the aliases and commands mentioned above by replacing quay.io/cdis/dictionaryutils:master with dictionaryutils-mytag.

Use dictionaryutils to load a dictionary

from dictionaryutils import DataDictionary

# load a dumped schema file from a remote URL
dict_fetch_from_remote = DataDictionary(url=URL_FOR_THE_JSON)

# load schema YAML files from a local directory
dict_loaded_locally = DataDictionary(root_dir=PATH_TO_SCHEMA_DIR)
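Once loaded, the schemas are available on the object. A minimal sketch, assuming the schema attribute maps node names to their JSON schemas:

# list all loaded node names
print(sorted(dict_loaded_locally.schema))

# look up one node's schema ("case" is just an example node name)
case_schema = dict_loaded_locally.schema.get("case")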

Use dictionaryutils to dump a dictionary

import json
from dictionaryutils import dump_schemas_from_dir

with open('dump.json', 'w') as f:
    json.dump(dump_schemas_from_dir('../datadictionary/gdcdictionary/schemas/'), f)
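The dumped file is the artifact you would upload to S3. A minimal sketch using boto3, where the bucket and key names are hypothetical:

import boto3

# upload the dumped schema bundle to S3 (hypothetical bucket and key)
s3 = boto3.client("s3")
s3.upload_file("dump.json", "my-artifact-bucket", "schemas/dump.json")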

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dictionaryutils-4.0.0.tar.gz (14.4 kB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dictionaryutils-4.0.0-py3-none-any.whl (16.8 kB)

File details

Details for the file dictionaryutils-4.0.0.tar.gz.

File metadata

  • Download URL: dictionaryutils-4.0.0.tar.gz
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.13.11 Linux/6.11.0-1018-azure

File hashes

Hashes for dictionaryutils-4.0.0.tar.gz

  • SHA256: c846c48d534c7ae4a47d65b29cb7bb4324e998366e81ff5cc2175c94b1ea0c77
  • MD5: e52dcd0485f3f646e90eb2c701efe6b4
  • BLAKE2b-256: 929f721aa12d8730d09eaac9589f2f900f04ab9b83ee6a07fc0a5c831fc26241

See the PyPI documentation for more details on using hashes.

File details

Details for the file dictionaryutils-4.0.0-py3-none-any.whl.

File metadata

  • Download URL: dictionaryutils-4.0.0-py3-none-any.whl
  • Size: 16.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.13.11 Linux/6.11.0-1018-azure

File hashes

Hashes for dictionaryutils-4.0.0-py3-none-any.whl

  • SHA256: a6d1b2af957ffd2c66df42a1398fee1bd30b0268df82d9630236ed23286500a0
  • MD5: e91a6bef092eb78408c9da0ca1e0ebee
  • BLAKE2b-256: 33aa25563d55b055708c92b49a495f73cc3fc00529525cee77d7dc8e50e83265

See the PyPI documentation for more details on using hashes.
