Synchronization interface for the SCRC FAIR Data Pipeline registry

FAIR Data Pipeline Command Line Interface

FAIR-CLI forms the main interface for synchronising changes between local and shared remote FAIR Data Pipeline registries. It is also used to instantiate model runs and data submissions to the pipeline. Full documentation of the FAIR Data Pipeline can be found on the project website.

Installation

The package is installed using pip:

pip install fair-cli

To enable tab completion you need to modify your shell:

Bash

_FAIR_COMPLETE=bash_source fair > ~/.config/.fair-complete.bash
echo '. ~/.config/.fair-complete.bash' >> ~/.bashrc

zsh

_FAIR_COMPLETE=zsh_source fair > ~/.fair-complete.zsh
echo '. ~/.fair-complete.zsh' >> ~/.zshrc

Fish

_FAIR_COMPLETE=fish_source fair > ~/.config/fish/.fair-complete.fish
echo 'source ~/.config/fish/.fair-complete.fish' >> ~/.config/fish/config.fish

Uninstallation

To uninstall the CLI run:

fair purge --all
pip uninstall fair-cli

The User Configuration File

Job runs are configured via config.yaml files. Upon initialisation of a project, FAIR-CLI automatically generates a starter configuration file with all requirements in place. To execute a process (e.g. perform a model run from a compiled binary/script), an additional key of either script or script_path must be provided. Alternatively, the command fair run --script can be used to append the key and run a command directly.
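
As a rough illustration of the requirement above, the following Python sketch checks that a set of run settings defines at least one of the two keys. The function name has_executable_key is hypothetical, and the settings are assumed to have already been parsed into a plain dictionary; where exactly the key sits inside config.yaml is not covered here.

```python
def has_executable_key(settings: dict) -> bool:
    """Return True if the mapping defines a non-empty 'script' or 'script_path'.

    Hypothetical helper: 'settings' stands for whichever part of the parsed
    config.yaml holds the run settings.
    """
    return any(settings.get(key) for key in ("script", "script_path"))
```

A mapping such as {"script": "echo hi"} passes the check, whereas one holding only descriptive metadata does not.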

By default, the shell used to execute a process is sh on UNIX systems and batch on Windows. This can be overridden by assigning the optional shell key one of the following values (where {0} is the script file):

Shell       Command
bash        bash -eo pipefail {0}
java        java {0}
julia       julia {0}
powershell  powershell -command ". '{0}'"
pwsh        pwsh -command ". '{0}'"
python2     python2 {0}
python3     python3 {0}
python      python {0}
R           R -f {0}
sh          sh -e {0}
batch       {0}
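
To show how the {0} placeholder is meant to be read, here is a minimal Python sketch that fills a template from the table with a script file name. The dictionary is copied from the table; the function build_command is a hypothetical name used for illustration and is not taken from the FAIR-CLI source.

```python
# Command templates copied from the table above; '{0}' is the script file.
SHELL_COMMANDS = {
    "bash": "bash -eo pipefail {0}",
    "java": "java {0}",
    "julia": "julia {0}",
    "powershell": "powershell -command \". '{0}'\"",
    "pwsh": "pwsh -command \". '{0}'\"",
    "python2": "python2 {0}",
    "python3": "python3 {0}",
    "python": "python {0}",
    "R": "R -f {0}",
    "sh": "sh -e {0}",
    "batch": "{0}",
}


def build_command(shell: str, script_file: str) -> str:
    """Substitute the script file into the template for the chosen shell."""
    try:
        template = SHELL_COMMANDS[shell]
    except KeyError:
        raise ValueError(f"Unsupported shell: {shell!r}") from None
    return template.format(script_file)
```

For example, build_command("bash", "run.sh") gives the string bash -eo pipefail run.sh.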

A full description of config.yaml files can be found in the project documentation.

Available Commands

init

Initialises a new FAIR repository within the given directory. This should ideally be the same location as the .git folder for the current project; however, during setup an option is given to specify an alternative. The command asks the user a series of questions to gather metadata for tracking run authors, and also allows for the creation of a starter config.yaml file. Initialisation also configures the CLI itself.

Custom CLI Configuration

After setup is complete, the current CLI configuration can also be saved using the command:

fair init --export

The created file can then be re-read at a later point during setup. Alternatively, if creating a configuration from scratch, the YAML file should contain the following information:

namespaces:
  input: testing
  output: testing
registries:
  local:
    data_store: /path/to/local/data_store/
    directory: /local/registry/install/directory
    uri: http://127.0.0.1:8000/api/
  origin:
    data_store: /remote/registry/data/store/path/
    token: /path/to/remote/token
    uri: https://data.fairdatapipeline.org/api/
user:
  email: 'test@noreply'
  family_name: 'Test'
  given_names: 'Interface'
  orcid: null
  uuid: '2ddb2358-84bf-43ff-b2aa-3ac7dc3b49f1'
git:
  local_repo: /local/repo/path
  remote: origin
description: Testing Project

this file is then read during the initialisation:

fair init --using <cli-config.yaml file>
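
Before passing a hand-written file to fair init --using, it can help to confirm that nothing from the example above is missing. The Python sketch below does this for a configuration already loaded into a dictionary. It assumes every key shown in the example is required, which the text does not confirm, and the names REQUIRED_SECTIONS and missing_keys are hypothetical.

```python
# Sections and keys taken from the example CLI configuration above.
# Treating all of them as mandatory is an assumption made for illustration.
REQUIRED_SECTIONS = {
    "namespaces": ("input", "output"),
    "registries": ("local", "origin"),
    "user": ("email", "family_name", "given_names", "orcid", "uuid"),
    "git": ("local_repo", "remote"),
}


def missing_keys(config: dict) -> list:
    """List the sections or 'section.key' entries absent from the config."""
    missing = []
    for section, keys in REQUIRED_SECTIONS.items():
        block = config.get(section)
        if not isinstance(block, dict):
            # The whole section is absent or is not a mapping.
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in block)
    if "description" not in config:
        missing.append("description")
    return missing
```

An empty list means the dictionary has the same shape as the example; any other result names the entries to add.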

For integration into a CI workflow, the setup can be skipped by running:

fair init --ci

which will create temporary directories for some of the required location paths.

run

The purpose of run is to execute a model/submission run and submit the results to the local registry. Outputs of a run are stored within the coderun folder in the directory specified under the data_store tag in the config.yaml; by default this is $HOME/.fair/data/coderun.

fair run

If you wish to use an alternative config.yaml then specify it as an additional argument:

fair run /path/to/config.yaml

You can also launch a bash command directly; this will be automatically written into the config.yaml:

fair run --script 'echo "Hello World"'

Note that the command itself must be quoted, as it is a single argument.

By default, the CLI will not allow the user to perform a run if the analysis repository is behind the git remote or contains uncommitted changes. To override this behaviour use the --dirty flag.
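
The rule above can be sketched in Python as follows. This is an illustration of the stated behaviour and not the FAIR-CLI implementation: run_allowed and git_state are hypothetical names, and the git calls assume that an upstream branch is configured and has been fetched recently.

```python
import subprocess


def run_allowed(status_output: str, commits_behind: int, dirty: bool = False) -> bool:
    """Allow a run only for a clean, up-to-date repository unless dirty is set."""
    if dirty:
        return True
    # 'git status --porcelain' prints nothing when there are no changes.
    has_uncommitted = bool(status_output.strip())
    return not has_uncommitted and commits_behind == 0


def git_state(repo_dir: str) -> tuple:
    """Collect the two inputs for run_allowed using standard git commands."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # Number of commits on the upstream branch that are not yet in HEAD.
    behind = subprocess.run(
        ["git", "rev-list", "--count", "HEAD..@{upstream}"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return status, int(behind.strip())
```

Keeping the decision in run_allowed separate from the git calls makes the rule easy to check without a repository.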

pull

The pull command updates any entries within the config.yaml under the register heading, creating external_object and data_product objects on the registry and downloading the data to the local data storage. Any data required for a run is downloaded and stored within the local registry. In addition, any requested data products that are available on the remote registry are pulled locally.

fair pull /path/to/config.yaml

status

This command displays objects which are awaiting staging or have been staged, behaving in a manner similar to git status:

fair status

Staged changes are displayed in green, and unstaged changes in red.

add

Before changes can be pushed to the remote registry they must be staged. This command allows you to stage objects displayed when running fair status so that they can be sent to the remote registry. Data products are displayed and staged in the form namespace:data_product_name@version:

fair add my_namespace:data_object@v0.1.0
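
The namespace:data_product_name@version form can be split into its parts with a few lines of Python. The sketch below is illustrative only; DataProductId and parse_data_product are hypothetical names, and the function assumes the version follows the last @ in the identifier.

```python
from typing import NamedTuple


class DataProductId(NamedTuple):
    namespace: str
    name: str
    version: str


def parse_data_product(identifier: str) -> DataProductId:
    """Split 'namespace:data_product_name@version' into its three parts."""
    namespace, colon, remainder = identifier.partition(":")
    name, at, version = remainder.rpartition("@")
    if not (colon and at and namespace and name and version):
        raise ValueError(
            f"Expected namespace:data_product_name@version, got {identifier!r}"
        )
    return DataProductId(namespace, name, version)
```

Applied to the identifier used above, parse_data_product("my_namespace:data_object@v0.1.0") yields the namespace my_namespace, the name data_object and the version v0.1.0.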

push

The push command will push any staged data products to the remote registry:

fair push

purge

The purge command removes the setup of the current project so it can be reinitialised:

fair purge

To remove all configurations entirely (including those global to all projects) run:

fair purge --global

To remove the data directory itself run:

fair purge --data

WARNING: This is not recommended as the registry may still have entries pointing to this location!

Finally to remove everything run:

fair purge --all

This will remove the current repository's .fair folder and the global FAIR directory, which also contains the local registry.

You can skip any confirmation messages by running:

fair purge --yes

registry

By default the CLI will launch the registry whenever a synchronisation or run is called. The server will only be halted once all ongoing CLI processes (in the case of multiple parallel calls) have been completed.

However, the user may also specify a manual launch that overrides this behaviour, leaving the server running constantly so that the registry can be viewed in the browser.
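
One way to picture this behaviour is as a counter of active CLI calls combined with a manual flag. The Python sketch below models that idea only; it is not how FAIR-CLI tracks its processes, and the class RegistryServer with all of its methods is hypothetical. In particular, it assumes that a manual stop waits for any calls still in progress.

```python
class RegistryServer:
    """Toy model: the server stays up while calls are active or a manual start is in effect."""

    def __init__(self) -> None:
        self.active_calls = 0
        self.manual = False
        self.running = False

    def begin_call(self) -> None:
        # A synchronisation or run starts and needs the server.
        self.active_calls += 1
        self.running = True

    def end_call(self) -> None:
        self.active_calls = max(0, self.active_calls - 1)
        self._stop_if_idle()

    def manual_start(self) -> None:
        # Models 'fair registry start': keep the server up regardless of calls.
        self.manual = True
        self.running = True

    def manual_stop(self) -> None:
        # Models 'fair registry stop' (assumed to wait for active calls).
        self.manual = False
        self._stop_if_idle()

    def _stop_if_idle(self) -> None:
        if self.active_calls == 0 and not self.manual:
            self.running = False
```

With two parallel calls the model keeps running after the first one ends and stops after the second; after a manual start it keeps running until manual_stop is called.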

The commands:

fair registry start

and

fair registry stop

will launch and halt the server respectively.

The registry can be installed using the CLI as well by running:

fair registry install

with the additional options to specify the installation location, and the data registry repository tag to install from:

fair registry install --directory ~/.fair/my_registry --version v1.0-rc5

log

Runs are logged locally within the local FAIR repository. A full list of runs is shown by running:

fair log

This will present a list of runs in a summary analogous to a git log call:

run 0db35c20946a1ebeaafdc3b30103cd74a57eb6b6
Author: Joe Bloggs <jbloggs@noreply.uk>
Date:   Wed Jun 30 09:09:30 2021

NOTE: The SHA for a job is not related to a registry code run identifier, as multiple code runs can be executed within a single job.

view

To view the stdout of a run, given its SHA as shown by fair log, use the command:

fair view <sha>

You do not need to specify the full SHA; the first few unique characters are sufficient.
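
Matching a run by the start of its SHA can be sketched in Python as below. The function resolve_sha is a hypothetical name and the list of known SHAs is supplied by the caller; how FAIR-CLI stores and looks up its run log is not described here.

```python
def resolve_sha(prefix: str, known_shas: list) -> str:
    """Return the single SHA starting with 'prefix', or raise if none or several match."""
    matches = [sha for sha in known_shas if sha.startswith(prefix)]
    if not matches:
        raise LookupError(f"No run matches {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"Prefix {prefix!r} is ambiguous ({len(matches)} runs)")
    return matches[0]
```

A prefix is usable only while it identifies exactly one run, which is why an ambiguous prefix raises an error here instead of picking the first match.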

Template Variables

Within the config.yaml file, template variables can be specified using the notation ${{ VAR }}. The following variables are currently recognised:

Variable         Description
DATE             Date in the form %Y%m%d
DATETIME         Date and time in the form %Y-%m-%dT%H:%M:%S
DATETIME-%Y%H%M  Date and time in custom format (where %Y%H%M can be any valid form)
USER             The current user as defined in the CLI
USER_ID          The unique identifier for the current user
REPO_DIR         The FAIR repository root directory
CONFIG_DIR       The directory containing the config.yaml after template substitution
LOCAL_TOKEN      The token for access to the local registry
SOURCE_CONFIG    Path of the user defined config.yaml
GIT_BRANCH       Current branch of the git repository
GIT_REMOTE       The URI of the git repository specified during setup
GIT_TAG          The latest tag on git
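
To make the ${{ VAR }} notation concrete, the following Python sketch performs this kind of substitution on a string. It is an approximation written for illustration and not the FAIR-CLI implementation: the substitute function and its regular expression are hypothetical, DATETIME is assumed to mean the ISO-like form %Y-%m-%dT%H:%M:%S, and all other variables are looked up in a dictionary supplied by the caller.

```python
import re
from datetime import datetime

# Matches ${{ NAME }} with optional whitespace inside the braces.
TEMPLATE = re.compile(r"\$\{\{\s*([^}]+?)\s*\}\}")


def substitute(text: str, values: dict, now: datetime) -> str:
    """Replace template variables in 'text' using date rules and 'values'."""

    def replace(match):
        name = match.group(1)
        if name == "DATE":
            return now.strftime("%Y%m%d")
        if name == "DATETIME":
            return now.strftime("%Y-%m-%dT%H:%M:%S")
        if name.startswith("DATETIME-"):
            # Everything after the hyphen is used as the strftime format.
            return now.strftime(name[len("DATETIME-"):])
        if name in values:
            return str(values[name])
        # Leave unrecognised variables untouched.
        return match.group(0)

    return TEMPLATE.sub(replace, text)
```

Passing the current time in as an argument keeps the function deterministic, so a fixed datetime can be used when checking the output.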
