fairscape-cli
A utility for packaging objects and validating metadata for FAIRSCAPE.
Documentation: https://fairscape.github.io/fairscape-cli/
Features
fairscape-cli provides a Command Line Interface (CLI) for creating, managing, and publishing scientific data packages:
- RO-Crate Management: Create and manipulate RO-Crate packages locally.
  - Initialize RO-Crates in new or existing directories.
  - Add data, software, and computation metadata.
  - Copy files into the crate structure alongside metadata registration.
- Schema Handling: Define, infer, and validate data schemas (Tabular, HDF5).
  - Create schema definition files.
  - Add properties with constraints.
  - Infer schemas directly from data files.
  - Validate data files against specified schemas.
  - Register schemas within RO-Crates.
- Data Import: Fetch data from external sources and convert them into RO-Crates.
  - Import NCBI BioProjects.
  - Convert Portable Encapsulated Projects (PEPs) to RO-Crates.
- Build Artifacts: Generate derived outputs from RO-Crates.
  - Create detailed HTML datasheets summarizing crate contents.
  - Generate provenance evidence graphs (JSON and HTML).
- Release Management: Organize multiple related RO-Crates into a cohesive release package.
  - Initialize a release structure.
  - Automatically link sub-crates and propagate metadata.
  - Build a top-level datasheet for the release.
- Publishing: Publish RO-Crate metadata to external repositories.
  - Upload RO-Crate directories or zip files to Fairscape.
  - Create datasets on Dataverse instances.
  - Mint or update DOIs on DataCite.
Requirements
Python 3.8+
Installation
$ pip install fairscape-cli
Command Overview
The CLI is organized into several top-level commands:
- rocrate: Core local RO-Crate manipulation (create, add files/metadata).
- schema: Operations on data schemas (create, infer, add properties, add to crate).
- validate: Validate data against schemas.
- import: Fetch external data into RO-Crate format (e.g., bioproject, pep).
- build: Generate outputs from RO-Crates (e.g., datasheet, evidence-graph).
- release: Manage multi-part RO-Crate releases (e.g., create, build).
- publish: Publish RO-Crates to repositories (e.g., fairscape, dataverse, doi).
Use --help for details on any command or subcommand:
$ fairscape-cli --help
$ fairscape-cli rocrate --help
$ fairscape-cli rocrate add --help
$ fairscape-cli schema create --help
Examples
Creating an RO-Crate
Create an RO-Crate in a specified directory:
$ fairscape-cli rocrate create \
--name "My Analysis Crate" \
--description "RO-Crate containing analysis scripts and results" \
--organization-name "My Org" \
--project-name "My Project" \
--keywords "analysis" \
--keywords "python" \
--author "Jane Doe" \
--version "1.1.0" \
./my_analysis_crate
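When crates are created from a script, the repeated --keywords option is easy to get wrong by hand. The sketch below assembles the argument list for the command above from a Python list. The helper name rocrate_create_argv is hypothetical and not part of fairscape-cli; only options shown in this README are used, and nothing is executed here.

```python
import shlex
from typing import List, Sequence


def rocrate_create_argv(
    crate_dir: str,
    name: str,
    description: str,
    organization: str,
    project: str,
    keywords: Sequence[str],
) -> List[str]:
    """Build an argument list for `fairscape-cli rocrate create` (hypothetical helper)."""
    argv = [
        "fairscape-cli", "rocrate", "create",
        "--name", name,
        "--description", description,
        "--organization-name", organization,
        "--project-name", project,
    ]
    # --keywords is passed once per keyword, as in the example above.
    for keyword in keywords:
        argv += ["--keywords", keyword]
    argv.append(crate_dir)
    return argv


argv = rocrate_create_argv(
    "./my_analysis_crate",
    name="My Analysis Crate",
    description="RO-Crate containing analysis scripts and results",
    organization="My Org",
    project="My Project",
    keywords=["analysis", "python"],
)
# Print a shell-quoted form for review before running anything.
print(shlex.join(argv))
```

To actually run the command, the list could be handed to subprocess.run(argv, check=True); passing a list avoids shell quoting problems with names that contain spaces.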
Initialize an RO-Crate in the current working directory:
# Navigate to an empty directory first if desired
# mkdir my_analysis_crate && cd my_analysis_crate
$ fairscape-cli rocrate init \
--name "My Analysis Crate" \
--description "RO-Crate containing analysis scripts and results" \
--organization-name "My Org" \
--project-name "My Project" \
--keywords "analysis" \
--keywords "python"
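After creating or initializing a crate, it can be useful to look at what was written. The sketch below assumes the crate directory contains an ro-crate-metadata.json file with a top-level "@graph" list, as the RO-Crate specification describes. The demo metadata is a hand-written stand-in, not actual fairscape-cli output, and list_entities is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def list_entities(crate_dir: Path) -> List[Dict[str, str]]:
    """Summarize each entity in the crate's @graph as id/type/name strings."""
    metadata_path = crate_dir / "ro-crate-metadata.json"
    metadata = json.loads(metadata_path.read_text(encoding="utf-8"))
    summary = []
    for entity in metadata.get("@graph", []):
        summary.append({
            "id": str(entity.get("@id", "")),
            "type": str(entity.get("@type", "")),
            "name": str(entity.get("name", "")),
        })
    return summary


# Stand-in metadata for demonstration only; real crates will contain more fields.
stand_in = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "name": "My Analysis Crate"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    crate = Path(tmp)
    (crate / "ro-crate-metadata.json").write_text(json.dumps(stand_in), encoding="utf-8")
    entities = list_entities(crate)

for entity in entities:
    print(entity["id"], entity["type"], entity["name"])
```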
Adding Content and Metadata to an RO-Crate
Use add to copy a file into the crate and register its metadata, or register to record the metadata only.
Add a dataset file and its metadata:
$ fairscape-cli rocrate add dataset \
--name "Raw Measurements" \
--author "John Smith" \
--version "1.0" \
--date-published "2023-10-27" \
--description "Raw sensor measurements from Experiment A." \
--keywords "raw-data" \
--keywords "sensors" \
--data-format "csv" \
--source-filepath "./source_data/measurements.csv" \
--destination-filepath "data/measurements.csv" \
./my_analysis_crate
Add a software script file and its metadata:
$ fairscape-cli rocrate add software \
--name "Analysis Script" \
--author "Jane Doe" \
--version "1.1.0" \
--description "Python script for processing raw measurements." \
--keywords "analysis" \
--keywords "python" \
--file-format "py" \
--source-filepath "./scripts/process_data.py" \
--destination-filepath "scripts/process_data.py" \
./my_analysis_crate
Register computation metadata (metadata only):
# Assuming the script and dataset were added previously and have GUIDs:
# Dataset GUID: ark:59852/dataset-raw-measurements-xxxx
# Software GUID: ark:59852/software-analysis-script-yyyy
$ fairscape-cli rocrate register computation \
--name "Data Processing Run" \
--run-by "Jane Doe" \
--date-created "2023-10-27T14:30:00Z" \
--description "Execution of the analysis script on the raw measurements." \
--keywords "processing" \
--used-dataset "ark:59852/dataset-raw-measurements-xxxx" \
--used-software "ark:59852/software-analysis-script-yyyy" \
--generated "ark:59852/dataset-processed-results-zzzz" \
./my_analysis_crate
# Note: You would typically register the generated dataset ('processed-results') separately.
Register dataset metadata (metadata only, file assumed present or external):
$ fairscape-cli rocrate register dataset \
--name "Processed Results" \
--guid "ark:59852/dataset-processed-results-zzzz" \
--author "Jane Doe" \
--version "1.0" \
--description "Processed results from the analysis script." \
--keywords "results" \
--data-format "csv" \
--filepath "results/processed.csv" \
--generated-by "ark:59852/computation-data-processing-run-wwww" \
./my_analysis_crate
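Because the computation above references a generated dataset before that dataset is registered, a quick consistency check can catch GUIDs that point at nothing. The sketch below is a minimal illustration, assuming entities are dictionaries keyed by "@id" and that the property names usedDataset, usedSoftware, generated, and generatedBy mirror the CLI options; the actual keys in fairscape metadata may differ.

```python
from typing import Any, Dict, Iterable, List

# Assumed property names, modeled on the CLI options used above.
REFERENCE_KEYS = ("usedDataset", "usedSoftware", "generated", "generatedBy")


def _ids(value: Any) -> Iterable[str]:
    """Yield identifiers from a string, an {"@id": ...} object, or a list of either."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict) and "@id" in value:
        yield value["@id"]
    elif isinstance(value, list):
        for item in value:
            yield from _ids(item)


def unresolved_references(graph: List[Dict[str, Any]]) -> List[str]:
    """Return referenced identifiers that no entity in the graph declares."""
    known = {entity["@id"] for entity in graph if "@id" in entity}
    missing = set()
    for entity in graph:
        for key in REFERENCE_KEYS:
            for ref in _ids(entity.get(key)):
                if ref not in known:
                    missing.add(ref)
    return sorted(missing)


graph = [
    {"@id": "ark:59852/dataset-raw-measurements-xxxx"},
    {"@id": "ark:59852/software-analysis-script-yyyy"},
    {
        "@id": "ark:59852/computation-data-processing-run-wwww",
        "usedDataset": ["ark:59852/dataset-raw-measurements-xxxx"],
        "usedSoftware": ["ark:59852/software-analysis-script-yyyy"],
        "generated": ["ark:59852/dataset-processed-results-zzzz"],
    },
]
before = unresolved_references(graph)

# Registering the generated dataset resolves the dangling reference.
graph.append({
    "@id": "ark:59852/dataset-processed-results-zzzz",
    "generatedBy": "ark:59852/computation-data-processing-run-wwww",
})
after = unresolved_references(graph)
print(before, after)
```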
Schema Management
Create a tabular schema definition file:
$ fairscape-cli schema create \
--name 'Measurement Schema' \
--description 'Schema for raw sensor measurements' \
--schema-type tabular \
--separator ',' \
--header true \
./measurement_schema.json
Add properties to the tabular schema file:
# Add a string property (column 0)
$ fairscape-cli schema add-property string \
--name 'Timestamp' \
--index 0 \
--description 'Measurement time (ISO8601)' \
./measurement_schema.json
# Add a number property (column 1)
$ fairscape-cli schema add-property number \
--name 'Value' \
--index 1 \
--description 'Sensor reading' \
--minimum 0 \
./measurement_schema.json
Infer a schema from an existing data file:
$ fairscape-cli schema infer \
--name "Inferred Results Schema" \
--description "Schema inferred from processed results" \
./my_analysis_crate/results/processed.csv \
./processed_schema.json
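To give a feel for what schema inference involves, here is a deliberately naive sketch that guesses a type per column from CSV text. It is not the algorithm fairscape-cli uses, which this README does not describe; the type labels and the helper names infer_type and infer_columns are assumptions made for illustration.

```python
import csv
import io
from typing import Callable, Dict, List


def infer_type(values: List[str]) -> str:
    """Guess 'integer', 'number', or 'string' for a column of raw cell values."""

    def all_parse(cast: Callable[[str], object]) -> bool:
        try:
            for value in values:
                cast(value)
        except ValueError:
            return False
        return True

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "number"
    return "string"


def infer_columns(csv_text: str) -> Dict[str, str]:
    """Map each header name to an inferred type, assuming the first row is a header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return {
        name: infer_type([row[index] for row in body])
        for index, name in enumerate(header)
    }


sample = (
    "Timestamp,Value,Count\n"
    "2023-10-27T14:30:00Z,0.5,3\n"
    "2023-10-27T14:31:00Z,1.25,4\n"
)
inferred = infer_columns(sample)
print(inferred)
```

An inferred schema is a starting point; constraints such as a minimum value still need to be reviewed and added by hand.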
Add an existing schema file to an RO-Crate:
$ fairscape-cli schema add-to-crate \
./measurement_schema.json \
./my_analysis_crate
Validation
Validate a data file against a schema file:
# Successful validation
$ fairscape-cli validate schema \
--schema-path ./measurement_schema.json \
--data-path ./my_analysis_crate/data/measurements.csv
# Example failure
$ fairscape-cli validate schema \
--schema-path ./measurement_schema.json \
--data-path ./source_data/measurements_invalid.csv
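Conceptually, validating against the schema built earlier means checking each row's cells by column index and constraint. The following sketch shows that idea with an in-memory schema. The Column class and the error messages are hypothetical and do not reflect the schema file format or the output of fairscape-cli.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """Hypothetical in-memory stand-in for one schema property."""
    name: str
    index: int
    kind: str  # "string" or "number"
    minimum: Optional[float] = None


def validate_rows(csv_text: str, columns: List[Column], header: bool = True) -> List[str]:
    """Return one message per violated constraint; an empty list means the data is valid."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    first_line = 1
    if header:
        rows, first_line = rows[1:], 2
    errors: List[str] = []
    for line_no, row in enumerate(rows, start=first_line):
        for column in columns:
            if column.index >= len(row):
                errors.append(f"line {line_no}: {column.name} is missing")
                continue
            if column.kind != "number":
                continue
            try:
                value = float(row[column.index])
            except ValueError:
                errors.append(f"line {line_no}: {column.name} is not a number")
                continue
            if column.minimum is not None and value < column.minimum:
                errors.append(f"line {line_no}: {column.name} is below the minimum {column.minimum}")
    return errors


schema = [
    Column("Timestamp", 0, "string"),
    Column("Value", 1, "number", minimum=0.0),
]
good = "Timestamp,Value\n2023-10-27T14:30:00Z,0.5\n"
bad = "Timestamp,Value\n2023-10-27T14:30:00Z,-1\n2023-10-27T14:31:00Z,abc\n"

good_errors = validate_rows(good, schema)
bad_errors = validate_rows(bad, schema)
print(good_errors)
print(bad_errors)
```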
Importing Data
Import an NCBI BioProject into a new RO-Crate:
$ fairscape-cli import bioproject \
--accession PRJNA123456 \
--author "Importer Name" \
--output-dir ./bioproject_prjna123456_crate \
--crate-name "Imported BioProject PRJNA123456"
Convert a PEP project to an RO-Crate:
$ fairscape-cli import pep \
./path/to/my_pep_project \
--output-path ./my_pep_rocrate \
--crate-name "My PEP Project Crate"
Building Outputs
Generate an HTML datasheet for an RO-Crate:
$ fairscape-cli build datasheet ./my_analysis_crate
# Output will be ./my_analysis_crate/ro-crate-datasheet.html by default
Generate a provenance graph for a specific item within the crate:
# Assuming 'ark:59852/dataset-processed-results-zzzz' is the item of interest
$ fairscape-cli build evidence-graph \
./my_analysis_crate \
ark:59852/dataset-processed-results-zzzz \
--output-json ./my_analysis_crate/prov/results_prov.json \
--output-html ./my_analysis_crate/prov/results_prov.html
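A provenance graph for an item is, in essence, everything reachable by following "where did this come from" links. The sketch below walks such links breadth-first over a small hand-written graph. The property names generatedBy, usedDataset, and usedSoftware are assumptions modeled on the CLI options, and the traversal is only an illustration, not the evidence-graph implementation.

```python
from collections import deque
from typing import Any, Dict, List

# Assumed property names that point "upstream" in the provenance chain.
UPSTREAM_KEYS = ("generatedBy", "usedDataset", "usedSoftware")


def upstream(graph: List[Dict[str, Any]], start: str) -> List[str]:
    """Return identifiers reachable from `start` via upstream links, in visit order."""
    by_id = {entity["@id"]: entity for entity in graph}
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        entity = by_id.get(current, {})
        for key in UPSTREAM_KEYS:
            value = entity.get(key, [])
            refs = [value] if isinstance(value, str) else value
            for ref in refs:
                if ref not in seen:
                    seen.append(ref)
                    queue.append(ref)
    return seen[1:]


graph = [
    {"@id": "ark:59852/dataset-raw-measurements-xxxx"},
    {"@id": "ark:59852/software-analysis-script-yyyy"},
    {
        "@id": "ark:59852/computation-data-processing-run-wwww",
        "usedDataset": ["ark:59852/dataset-raw-measurements-xxxx"],
        "usedSoftware": ["ark:59852/software-analysis-script-yyyy"],
    },
    {
        "@id": "ark:59852/dataset-processed-results-zzzz",
        "generatedBy": "ark:59852/computation-data-processing-run-wwww",
    },
]

lineage = upstream(graph, "ark:59852/dataset-processed-results-zzzz")
for guid in lineage:
    print(guid)
```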
Release Management
Create the structure for a multi-part release:
$ fairscape-cli release create \
--name "My Big Release Q4 2023" \
--description "Combined release of Experiment A and Experiment B crates" \
--organization-name "My Org" \
--project-name "Overall Project" \
--keywords "release" \
--keywords "experiment-a" \
--keywords "experiment-b" \
--version "2.0" \
--author "Release Manager" \
--publisher "My Org Publishing" \
./my_big_release
# Manually copy or move your individual RO-Crate directories (e.g., experiment_a_crate, experiment_b_crate)
# into the ./my_big_release directory now.
Build the release (link sub-crates, update metadata, generate datasheet):
$ fairscape-cli release build ./my_big_release
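Before building a release, it may help to confirm that the sub-crates were copied into place. The sketch below lists immediate subdirectories that contain an ro-crate-metadata.json file. How release build actually discovers sub-crates is not stated in this README, so the one-level search and the helper name find_subcrates are assumptions.

```python
import tempfile
from pathlib import Path
from typing import List


def find_subcrates(release_dir: Path) -> List[Path]:
    """Return immediate subdirectories that look like RO-Crates, sorted by name."""
    return sorted(
        child
        for child in release_dir.iterdir()
        if child.is_dir() and (child / "ro-crate-metadata.json").is_file()
    )


# Demo layout: two crate-like directories and one unrelated directory.
with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp)
    for crate_name in ("experiment_b_crate", "experiment_a_crate"):
        crate_dir = release / crate_name
        crate_dir.mkdir()
        (crate_dir / "ro-crate-metadata.json").write_text("{}", encoding="utf-8")
    (release / "notes").mkdir()
    found = [path.name for path in find_subcrates(release)]

print(found)
```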
Publishing
Upload an RO-Crate to Fairscape:
# Ensure FAIRSCAPE_USERNAME and FAIRSCAPE_PASSWORD are set as environment variables or use options
$ fairscape-cli publish fairscape \
--rocrate ./my_analysis_crate \
--username <your_username> \
--password <your_password>
# Works with either directories or zip files
$ fairscape-cli publish fairscape \
--rocrate ./my_analysis_crate.zip \
--username <your_username> \
--password <your_password> \
--api-url https://fairscape.example.edu/api
Publish RO-Crate metadata to Dataverse:
# Ensure DATAVERSE_API_TOKEN is set as an environment variable or use --token
$ fairscape-cli publish dataverse \
--rocrate ./my_analysis_crate/ro-crate-metadata.json \
--url https://my.dataverse.instance.edu \
--collection my_collection_alias \
--token <your_api_token>
Mint a DOI using DataCite:
# Ensure DATACITE_USERNAME and DATACITE_PASSWORD are set or use options
$ fairscape-cli publish doi \
--rocrate ./my_analysis_crate/ro-crate-metadata.json \
--prefix 10.1234 \
--username MYORG.MYREPO \
--password <your_api_password> \
--event publish # or 'register' for draft
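Since each publish target can read its credentials from environment variables, a small pre-flight check can report what is missing before any command runs, which also keeps secrets out of shell history. The variable names below come from the comments in the examples above; the helper missing_credentials and the demo values are hypothetical.

```python
import os
from typing import Dict, List, Mapping, Tuple

# Environment variables named in the publishing examples above.
REQUIRED_ENV: Dict[str, Tuple[str, ...]] = {
    "fairscape": ("FAIRSCAPE_USERNAME", "FAIRSCAPE_PASSWORD"),
    "dataverse": ("DATAVERSE_API_TOKEN",),
    "doi": ("DATACITE_USERNAME", "DATACITE_PASSWORD"),
}


def missing_credentials(target: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the variables for `target` that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV[target] if not env.get(name)]


# Demo with an explicit mapping so the example does not depend on the real environment.
demo_env = {"FAIRSCAPE_USERNAME": "jane", "DATACITE_USERNAME": "MYORG.MYREPO"}
report = {target: missing_credentials(target, demo_env) for target in REQUIRED_ENV}
for target, missing in report.items():
    status = "ready" if not missing else "missing " + ", ".join(missing)
    print(f"{target}: {status}")
```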
Contribution
If you'd like to request a feature or report a bug, please create a GitHub Issue using one of the templates provided.
License
This project is licensed under the terms of the MIT license.
File details
Details for the file fairscape_cli-1.1.12.tar.gz.
File metadata
- Download URL: fairscape_cli-1.1.12.tar.gz
- Upload date:
- Size: 152.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | add12010a8a729dfc40f2b8e1647ebd9ea129175215c5a338d90a70fc4307001 |
| MD5 | 25fcda5455b5daf730a4abf4f2d0bc00 |
| BLAKE2b-256 | 44a93668a510b91a57d2a3e4ffef7b0c829102f4a8aa798dc7c2d5f37b1afad5 |
File details
Details for the file fairscape_cli-1.1.12-py3-none-any.whl.
File metadata
- Download URL: fairscape_cli-1.1.12-py3-none-any.whl
- Upload date:
- Size: 183.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a8d247ce72599e4f652a5d2114792fc58751c919487147da3f77006d96aa2fff |
| MD5 | 84b98d1e601f1fa23353b136025b064a |
| BLAKE2b-256 | c180f31353ee7553634fc18e0ac8e6194b4cf75327b3ca5f137dffef970f16cc |