This package offers both synchronous and asynchronous implementations of a standardized Python API to communicate with the dtool lookup server.

Python API for interacting with dtool lookup server.

This package offers a class-based asynchronous lookup API within dtool_lookup_api.core.LookupClient, a simple class-less wrapper around it at dtool_lookup_api.asynchronous, and a synchronous interface on top at dtool_lookup_api.synchronous.

Direct imports of utility functions from dtool_lookup_api in the examples below forward to the synchronous API variant.

Installation

To install the dtool_lookup_api package, run

pip install dtool_lookup_api

This package requires a running dtool-lookup-server instance to talk to.

Configuration

The API needs to know the URL of the lookup server

export DTOOL_LOOKUP_SERVER_URL=https://localhost:5000

You may also need to specify an access token generated on the server

export DTOOL_LOOKUP_SERVER_TOKEN=$(flask user token testuser)

Instead of specifying the access token directly, it is also possible to provide a token generator URL, a username, and a password

export DTOOL_LOOKUP_SERVER_TOKEN_GENERATOR_URL=https://localhost:5001
export DTOOL_LOOKUP_SERVER_USERNAME=my-username
export DTOOL_LOOKUP_SERVER_PASSWORD=my-password

for the API to request a token. This, however, is intended only for testing purposes and strongly discouraged in a production environment, as your password would reside within environment variables or the dtool config file as clear text.
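
Before talking to the server, it can help to see which of the two options above the current environment provides. The following is a minimal sketch and not part of dtool_lookup_api: the helper name describe_auth and its return labels are made up for illustration, and only the variable names are taken from the text above.

```python
import os

# Variable names as listed above. Grouping them into "modes" is an assumption
# made for this illustration, not the package's own resolution logic.
TOKEN_VAR = "DTOOL_LOOKUP_SERVER_TOKEN"
CREDENTIAL_VARS = (
    "DTOOL_LOOKUP_SERVER_TOKEN_GENERATOR_URL",
    "DTOOL_LOOKUP_SERVER_USERNAME",
    "DTOOL_LOOKUP_SERVER_PASSWORD",
)


def describe_auth(env=None):
    """Return 'token', 'credentials' or 'incomplete' for a mapping of settings."""
    env = os.environ if env is None else env
    if env.get(TOKEN_VAR):
        return "token"
    if all(env.get(name) for name in CREDENTIAL_VARS):
        return "credentials"
    return "incomplete"


# Only the mode is reported, never the secret values themselves.
print(describe_auth({"DTOOL_LOOKUP_SERVER_TOKEN": "abc"}))  # token
```

Called without arguments, the helper inspects os.environ instead of an explicit mapping.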

Our recommended setup is a combination of

export DTOOL_LOOKUP_SERVER_URL=https://localhost:5000
export DTOOL_LOOKUP_SERVER_TOKEN_GENERATOR_URL=https://localhost:5001

in the config. If used interactively, the API will then ask for your credentials at the first interaction and cache the provided values for this session, i.e.

In [1]: from dtool_lookup_api import query
   ...: res = query(
   ...:     {
   ...:         'readme.owners.name': {'$regex': '^Testing User$'},
   ...:     }
   ...: )
Authentication URL https://localhost:5001/token username:my-username
Authentication URL https://localhost:5001/token password:

In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 'Wed, 11 Nov 2020 17:20:30 GMT',
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

In [3]: from dtool_lookup_api import all
   ...: all()
Out[3]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736269,
  'creator_username': 'jotelha',
  'frozen_at': 1604921621.719575,
  'name': 'simple_test_dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

Credentials caching and interactive prompting are turned off with

In [1]: import dtool_lookup_api.core.config
   ...: dtool_lookup_api.core.config.Config.interactive = False
   ...: dtool_lookup_api.core.config.Config.cache = False

In [2]: from dtool_lookup_api import all
   ...: all()
...
RuntimeError: Authentication failed
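
With prompting disabled, a failed authentication surfaces as the RuntimeError shown above, so non-interactive scripts may want to catch it. A minimal sketch, where call_or_none is a hypothetical helper and failing_all merely stands in for dtool_lookup_api.all so that the snippet runs without a server:

```python
def call_or_none(func, *args, **kwargs):
    """Call func and return its result, or None if it raises RuntimeError."""
    try:
        return func(*args, **kwargs)
    except RuntimeError as exc:
        print(f"lookup failed: {exc}")
        return None


def failing_all():
    # Stand-in for dtool_lookup_api.all when authentication fails.
    raise RuntimeError("Authentication failed")


res = call_or_none(failing_all)  # res is None, the message is printed
```

Other exception types are deliberately not caught here, so unrelated errors still propagate.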

For testing purposes, it is possible to disable SSL certificate validation with

export DTOOL_LOOKUP_SERVER_VERIFY_SSL=false

As usual, these settings may be specified within the default dtool configuration file as well, i.e. at ~/.config/dtool/dtool.json

{
    "DTOOL_LOOKUP_SERVER_TOKEN_GENERATOR_URL": "https://localhost:5001/token",
    "DTOOL_LOOKUP_SERVER_URL": "https://localhost:5000"
}
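
The configuration file can also be written programmatically. The sketch below merges settings into a JSON file of the shape shown above without discarding existing keys. The helper update_dtool_config is hypothetical; the default path is the one named above, and the demonstration writes to a temporary directory so that no real configuration is touched.

```python
import json
import tempfile
from pathlib import Path


def update_dtool_config(settings, path=None):
    """Merge settings into a dtool JSON config file and return the result."""
    if path is None:
        path = Path.home() / ".config" / "dtool" / "dtool.json"
    path = Path(path)
    # Keep whatever is already configured; start empty if the file is missing.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.update(settings)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "dtool.json"
    update_dtool_config(
        {"DTOOL_LOOKUP_SERVER_URL": "https://localhost:5000"}, cfg_path)
    merged = update_dtool_config(
        {"DTOOL_LOOKUP_SERVER_TOKEN_GENERATOR_URL": "https://localhost:5001/token"},
        cfg_path)
```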

List all datasets

To list all registered datasets

In [1]: from dtool_lookup_api import all
   ...: res = all()

In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736269,
  'creator_username': 'jotelha',
  'frozen_at': 1604921621.719575,
  'name': 'simple_test_dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
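
The created_at and frozen_at fields above are plain floats. Assuming they are seconds since the Unix epoch in UTC, which is consistent with the GMT date strings shown elsewhere in this document for the same dataset, they can be converted to datetime objects as sketched below. The helper with_datetimes is hypothetical and the record is an abbreviated copy of the output above.

```python
from datetime import datetime, timezone

record = {
    'name': 'simple_test_dataset',
    'created_at': 1604860720.736269,
    'frozen_at': 1604921621.719575,
}


def with_datetimes(dataset, keys=('created_at', 'frozen_at')):
    """Return a copy of dataset with float timestamps as UTC datetimes."""
    converted = dict(dataset)
    for key in keys:
        value = converted.get(key)
        if isinstance(value, (int, float)):
            # Assumption: the value is seconds since the Unix epoch (UTC).
            converted[key] = datetime.fromtimestamp(value, tz=timezone.utc)
    return converted


readable = with_datetimes(record)
print(readable['created_at'].isoformat())
```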

Looking up datasets by UUID

To look up URIs by dataset UUID within Python

In [1]: from dtool_lookup_api import lookup
   ...: uuid = "1a1f9fad-8589-413e-9602-5bbd66bfe675"
   ...: res = lookup(uuid)

In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736269,
  'creator_username': 'jotelha',
  'frozen_at': 1604921621.719575,
  'name': 'simple_test_dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]
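
Since lookup returns a list of dataset entries, the URIs have to be picked out of the result. The sketch below also checks that a string is a canonically formatted UUID before it is sent to the server. Both helpers are hypothetical conveniences built on the standard library, and sample_res is an abbreviated copy of the output above.

```python
import uuid


def is_valid_uuid(value):
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def uris_from(datasets):
    """Collect the 'uri' field of every dataset entry in a lookup result."""
    return [dataset['uri'] for dataset in datasets]


sample_res = [{
    'base_uri': 'smb://test-share',
    'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
    'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675',
}]

if is_valid_uuid(sample_res[0]['uuid']):
    print(uris_from(sample_res))
```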

Full text searching

Full text search for the word “test”

In [1]: from dtool_lookup_api import search
   ...: res = search("test")

In [2]: res
Out[2]:
[{'base_uri': 'smb://test-share',
  'created_at': 1604860720.736,
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 1605027357.308,
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

Manifest

Request the manifest of a particular dataset by URI

In [1]: from dtool_lookup_api import manifest
   ...: uri = 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675'
   ...: res = manifest(uri)

In [2]: res
Out[2]:
{'dtoolcore_version': '3.17.0',
 'hash_function': 'md5sum_hexdigest',
 'items': {'eb58eb70ebcddf630feeea28834f5256c207edfd': {'hash': '2f7d9c3e0cfd47e8fcab0c12447b2bf0',
   'relpath': 'simple_text_file.txt',
   'size_in_bytes': 17,
   'utc_timestamp': 1605027357.284966}}}
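
The manifest is a nested dictionary keyed by item identifier, so simple summaries can be computed with plain Python. A minimal sketch, where summarize_manifest is a hypothetical helper and sample_manifest is a copy of the output above:

```python
sample_manifest = {
    'dtoolcore_version': '3.17.0',
    'hash_function': 'md5sum_hexdigest',
    'items': {
        'eb58eb70ebcddf630feeea28834f5256c207edfd': {
            'hash': '2f7d9c3e0cfd47e8fcab0c12447b2bf0',
            'relpath': 'simple_text_file.txt',
            'size_in_bytes': 17,
            'utc_timestamp': 1605027357.284966,
        },
    },
}


def summarize_manifest(manifest):
    """Return item count, total size in bytes and sorted relative paths."""
    items = manifest['items'].values()
    return {
        'n_items': len(items),
        'total_bytes': sum(item['size_in_bytes'] for item in items),
        'relpaths': sorted(item['relpath'] for item in items),
    }


summary = summarize_manifest(sample_manifest)
print(summary)
```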

Readme

Request the readme content of a particular dataset by URI

In [1]: from dtool_lookup_api import readme
   ...: res = readme('smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675')

In [2]: res
Out[2]:
{'creation_date': '2020-11-08',
 'description': 'testing description',
 'expiration_date': '2022-11-08',
 'funders': [{'code': 'testing_code',
   'organization': 'testing_organization',
   'program': 'testing_program'}],
 'owners': [{'email': 'testing@test.edu',
   'name': 'Testing User',
   'orcid': 'testing_orcid',
   'username': 'testing_user'}],
 'project': 'testing project'}
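
The readme comes back as an ordinary dictionary. The sketch below extracts owner contacts and checks the expiration date. It assumes the field names and the ISO date format of the example above, which other datasets' readmes need not follow. Both helper names are hypothetical.

```python
from datetime import date

sample_readme = {
    'creation_date': '2020-11-08',
    'expiration_date': '2022-11-08',
    'owners': [{'email': 'testing@test.edu',
                'name': 'Testing User',
                'orcid': 'testing_orcid',
                'username': 'testing_user'}],
    'project': 'testing project',
}


def owner_contacts(readme):
    """Return (name, email) pairs for all owners listed in the readme."""
    return [(owner['name'], owner['email']) for owner in readme.get('owners', [])]


def is_expired(readme, today=None):
    """Return True if the readme's expiration_date lies before today."""
    today = date.today() if today is None else today
    # Assumption: expiration_date is an ISO date string, as in the example.
    return date.fromisoformat(readme['expiration_date']) < today


print(owner_contacts(sample_readme))
print(is_expired(sample_readme, today=date(2021, 1, 1)))
```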

Direct mongo language queries

To list all datasets at a certain base URI whose names match a regular expression pattern, send a direct mongo language query to the server with

In [15]: from dtool_lookup_api import query
    ...: res = query(
    ...:     {
    ...:         'base_uri': 'smb://test-share',
    ...:         'name': {'$regex': 'test'},
    ...:     }
    ...: )

In [16]: res
Out[16]:
[{'base_uri': 'smb://test-share',
  'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 'Tue, 10 Nov 2020 16:55:57 GMT',
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

It is possible to search readme content via

In [21]: from dtool_lookup_api import query
    ...: res = query(
    ...:     {
    ...:         'readme.owners.name': {'$regex': '^Testing User$'},
    ...:     }
    ...: )

In [22]: res
Out[22]:
[{'base_uri': 'smb://test-share',
  'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
  'creator_username': 'jotelha',
  'dtoolcore_version': '3.17.0',
  'frozen_at': 'Tue, 10 Nov 2020 16:55:57 GMT',
  'name': 'simple_test_dataset',
  'tags': [],
  'type': 'dataset',
  'uri': 'smb://test-share/1a1f9fad-8589-413e-9602-5bbd66bfe675',
  'uuid': '1a1f9fad-8589-413e-9602-5bbd66bfe675'}]

This requires the server-side dtool-lookup-server-direct-mongo-plugin.

TODO: Response from server-side direct mongo plugin still yields dates as strings. Fix within https://github.com/IMTEK-Simulation/dtool-lookup-server-direct-mongo-plugin.
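
Until the dates are fixed on the server side, the date strings can be normalized on the client. A minimal sketch, assuming the strings always have the RFC 2822 style shown in the outputs above: normalize_dates is a hypothetical helper that uses the standard library's email.utils.parsedate_to_datetime, and sample_record is an abbreviated copy of the query output. Sub-second precision is not present in the strings and therefore cannot be recovered.

```python
from email.utils import parsedate_to_datetime

sample_record = {
    'name': 'simple_test_dataset',
    'created_at': 'Sun, 08 Nov 2020 18:38:40 GMT',
    'frozen_at': 'Tue, 10 Nov 2020 16:55:57 GMT',
}


def normalize_dates(dataset, keys=('created_at', 'frozen_at')):
    """Return a copy of dataset with date strings as Unix timestamps."""
    normalized = dict(dataset)
    for key in keys:
        value = normalized.get(key)
        if isinstance(value, str):
            # 'GMT' yields a timezone-aware datetime, so timestamp() is
            # independent of the local timezone.
            normalized[key] = parsedate_to_datetime(value).timestamp()
    return normalized


print(normalize_dates(sample_record))
```

Values that are already numeric are left untouched, so the helper can be applied to results of the other API calls as well.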

Usage in Jupyter notebooks

The current implementation via asgiref.async_to_sync (https://github.com/django/asgiref) hinders the use of the synchronous interface within Jupyter notebooks. Use the asynchronous API directly instead

import dtool_lookup_api.asynchronous as dl
res = await dl.query({
    'base_uri': 'smb://test-share',
    'name': {'$regex': 'test'},
})

The drawback of the above approach is that the same code doesn't work in both Python and Jupyter (await outside of a function is a syntax error in a non-interactive Python context). The code below can be executed in both contexts:

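import asyncio
import dtool_lookup_api.asynchronous as dl

try:
    # get_running_loop() raises RuntimeError when no event loop is running
    asyncio.get_running_loop()
except RuntimeError:
    pass  # plain Python: asyncio.run works as is
else:
    # a loop is already running, so we are in a Jupyter notebook
    # nest_asyncio allows nested event loops, i.e. calls to asyncio.run inside the notebook as well
    # This way, the same code works in notebook and Python
    import nest_asyncio
    nest_asyncio.apply()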

def query(query_dict):
    return asyncio.run(dl.query(query_dict))

query({
    'base_uri': 'smb://test-share',
    'name': {'$regex': 'test'},
})

See https://github.com/jupyter/notebook/issues/3397#issuecomment-419386811, https://ipython.readthedocs.io/en/stable/interactive/autoawait.html
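
Because the asynchronous interface is awaitable, several independent queries can be issued concurrently with asyncio.gather. A minimal sketch under stated assumptions: query_many is a hypothetical helper, and fake_query stands in for dl.query so that the snippet runs without a server. Whether the lookup client copes well with concurrent requests has not been verified here.

```python
import asyncio


async def fake_query(query_dict):
    # Stand-in for dl.query so that the sketch runs without a server.
    await asyncio.sleep(0)
    return [{'query': query_dict}]


async def query_many(query_func, query_dicts):
    """Run several queries concurrently and return results in input order."""
    return await asyncio.gather(*(query_func(q) for q in query_dicts))


queries = [
    {'name': {'$regex': 'test'}},
    {'base_uri': 'smb://test-share'},
]
results = asyncio.run(query_many(fake_query, queries))
```

In a notebook, await query_many(dl.query, queries) would take the place of the asyncio.run call.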

Testing

Tests require the presence of a working dtool lookup server ecosystem. The testing workflow within .github/workflows/test.yml uses the dtool-lookup-server-container-composition to provide a mock ecosystem. It is possible to run the workflow locally with the help of docker and act.

After installing and configuring act, run

act -P ubuntu-latest=catthehacker/ubuntu:full-latest -s GITHUB_TOKEN=$GITHUB_TOKEN -W .github/workflows/test.yml --bind

from within this repository. $GITHUB_TOKEN must hold a valid access token, and the user must be a member of the docker group. The --bind option avoids quirky permission errors by running the test in the current directory. This will, however, create two subdirectories, dtool-lookup-server-container-composition and workflow, in the current directory during testing. They may be removed afterwards with

rm -rf dtool-lookup-server-container-composition
sudo rm -rf workflow

All tests have been confirmed to work with the catthehacker/ubuntu:full-20.04 runner.
