
This project has been archived. The maintainers have marked it as archived and no new releases are expected.


roocs-utils


A package containing common components for the roocs project

Features

1. Data Catalog

The module roocs_utils.catalog_maker provides tools for writing data catalogs of the known data holdings in CSV format, described by a YAML file.

For each project in roocs_utils/etc/roocs.ini there are options to set the file paths for the inputs and outputs of the catalog maker. A list of datasets to include must be provided; the path to this list for each project is also set in roocs_utils/etc/roocs.ini. The entries in this list must match what you want in the ds_id column of the CSV file.
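The dataset list is one ds_id per line. As a rough illustration (the ds_id values below are hypothetical examples, not real holdings), a small Python check like the following can catch obviously malformed lines before a scan:

```python
# Sketch: sanity-check a dataset list before running the catalog maker.
# The example ds_ids are hypothetical; the pattern is our own rough guess
# at "dot-separated identifier", not a roocs_utils rule.
import re

DS_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)+$")

def check_dataset_list(lines):
    """Return the lines that do not look like ds_ids (ignoring blanks)."""
    bad = []
    for line in lines:
        line = line.strip()
        if line and not DS_ID_PATTERN.match(line):
            bad.append(line)
    return bad

example = [
    "c3s-cmip6.ScenarioMIP.INM.INM-CM5-0.ssp245.r1i1p1f1.Amon.rsds.gr1.v20190619",
    "not a ds_id",
]
print(check_dataset_list(example))  # -> ['not a ds_id']
```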

The data catalog is created using a database backend to store the results of the scans, from which the CSV and YAML files are then generated. A PostgreSQL database is required for this. Once you have a database, export an environment variable called ABCUNIT_DB_SETTINGS:

$ export ABCUNIT_DB_SETTINGS="dbname=<name> user=<user> host=<host> password=<pwd>"

The table created will be named after the project you are creating a catalog for, in the format <project_name>_catalog_results, e.g. c3s_cmip6_catalog_results.
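Both conventions above can be sketched in a few lines of Python. The connection string follows the libpq keyword/value format; the helper names here are our own, not part of roocs_utils:

```python
# Sketch: parse the ABCUNIT_DB_SETTINGS connection string and derive the
# results table name for a project. Helper names are illustrative only.
import os

def parse_db_settings(settings):
    """Split 'dbname=x user=y ...' into a dict (values must not contain spaces)."""
    return dict(item.split("=", 1) for item in settings.split())

def results_table(project):
    """Table name format: <project_name>_catalog_results, with '-' mapped to '_'
    (inferred from the c3s-cmip6 -> c3s_cmip6_catalog_results example)."""
    return f"{project.replace('-', '_')}_catalog_results"

settings = os.environ.get(
    "ABCUNIT_DB_SETTINGS", "dbname=catalog user=me host=localhost password=secret"
)
print(sorted(parse_db_settings(settings)))
print(results_table("c3s-cmip6"))  # -> c3s_cmip6_catalog_results
```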

Note that when using the catalog maker, the dependency abcunit-backend is required. If you are using a conda environment, it must be pip-installed manually:

$ pip install abcunit-backend

Creating batches

Once the list of datasets has been collated, the datasets must be split into batches:

$ python roocs_utils/catalog_maker/cli.py create-batches -p c3s-cmip6

The option -p is required to specify the project.

Creating catalog entries

Once the batches are created, the catalog maker can be run, either locally or on lotus. The number of datasets per batch and the maximum duration of each lotus job can also be changed in roocs_utils/etc/roocs.ini.

Each batch can be run independently, e.g. running batch 1 locally:

$ python roocs_utils/catalog_maker/cli.py run -p c3s-cmip6 -b 1 -r local

or running all batches on lotus:

$ python roocs_utils/catalog_maker/cli.py run -p c3s-cmip6 -r lotus

If successful, this creates a row in the database table containing an ordered dictionary of the entry for each file in each dataset; if an exception is raised, the error traceback is stored instead.

Viewing entries and errors

To view the records:

$ python roocs_utils/catalog_maker/cli.py list -p c3s-cmip6

With many entries, this may take a while.

To just get a count of how many files have been scanned:

$ python roocs_utils/catalog_maker/cli.py list -p c3s-cmip6 -c

To see any errors:

$ python roocs_utils/catalog_maker/cli.py show-errors -p c3s-cmip6

To see just a count of errors:

$ python roocs_utils/catalog_maker/cli.py show-errors -p c3s-cmip6 -c

Each count will show how many files and how many datasets have succeeded or failed.

The list count will also show the total number of datasets/files in the database, including errors. The error count will show whether any datasets have both succeeded and failed files, i.e. are only partially scanned.
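The "partially scanned" case can be made concrete by grouping per-file outcomes by dataset. This is an illustrative sketch of the bookkeeping, not the tool's actual implementation:

```python
# Sketch: classify datasets from per-file scan outcomes. Field names and
# the three-way classification are our reading of the counts described
# above, not roocs_utils internals.
from collections import defaultdict

def classify_datasets(file_results):
    """file_results: iterable of (ds_id, succeeded: bool) pairs.

    Returns a dict mapping ds_id -> 'success' | 'failed' | 'partial'.
    """
    outcomes = defaultdict(set)
    for ds_id, ok in file_results:
        outcomes[ds_id].add(ok)
    status = {}
    for ds_id, seen in outcomes.items():
        if seen == {True}:
            status[ds_id] = "success"
        elif seen == {False}:
            status[ds_id] = "failed"
        else:
            status[ds_id] = "partial"  # some files succeeded, some failed
    return status

results = [("ds.a", True), ("ds.a", False), ("ds.b", True)]
print(classify_datasets(results))  # -> {'ds.a': 'partial', 'ds.b': 'success'}
```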

Writing to CSV

The final command writes the entries to a CSV file.

$ python roocs_utils/catalog_maker/cli.py write -p c3s-cmip6

The CSV file will be generated in the csv_dir specified in roocs_utils/etc/roocs.ini and will be named "{project}_{version_stamp}.csv", e.g. c3s-cmip6_v20210414.csv.
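The versioned name can be sketched as below; note that the vYYYYMMDD stamp format is an inference from the single c3s-cmip6_v20210414.csv example, not documented behaviour:

```python
# Sketch: build the "{project}_{version_stamp}.csv" name.
# The vYYYYMMDD stamp format is inferred from the example above.
from datetime import date

def csv_filename(project, on=None):
    stamp = (on or date.today()).strftime("v%Y%m%d")
    return f"{project}_{stamp}.csv"

print(csv_filename("c3s-cmip6", date(2021, 4, 14)))  # -> c3s-cmip6_v20210414.csv
```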

A YAML file will be created in the catalog_dir specified in roocs_utils/etc/roocs.ini. It will be named c3s.yml and will contain an entry like the one below for each project scanned that uses the same catalog_dir:

sources:
  c3s-cmip6:
    args:
      urlpath:
    cache:
    - argkey: urlpath
      type: file
    description: c3s-cmip6 datasets
    driver: intake.source.csv.CSVSource
    metadata:
      last_updated:

The urlpath and last_updated for a project will be updated every time the CSV file is written for that project.
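The per-project entry and the two refreshed fields can be sketched as a plain dict mirroring the YAML above. The timestamp format is an assumption; the structure and driver string come from the example entry:

```python
# Sketch: a per-project catalog entry mirroring the YAML above, with the
# two fields the write step refreshes filled in. The ISO timestamp format
# is an assumption; keys and driver are taken from the example entry.
from datetime import datetime, timezone

def catalog_entry(project, csv_path):
    return {
        "args": {"urlpath": csv_path},
        "cache": [{"argkey": "urlpath", "type": "file"}],
        "description": f"{project} datasets",
        "driver": "intake.source.csv.CSVSource",
        "metadata": {"last_updated": datetime.now(timezone.utc).isoformat()},
    }

entry = catalog_entry("c3s-cmip6", "/path/to/c3s-cmip6_v20210414.csv")
print(entry["args"]["urlpath"])  # -> /path/to/c3s-cmip6_v20210414.csv
```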

Deleting the table of results

To delete all entries in the table of results:

$ python roocs_utils/catalog_maker/cli.py clean -p c3s-cmip6

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
