sapporo-service
The sapporo-service is a standard implementation conforming to the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Service (WES) API specification.
We have also extended the API specification. For details, see SwaggerHub - sapporo-wes.
One of sapporo-service's features is the abstraction of workflow engines, making it easy to convert various workflow engines into WES. Currently, the following workflow engines have been confirmed to work.
- cwltool
- nextflow
- Toil (experimental)
- cromwell
- snakemake
- ep3 (experimental)
- StreamFlow (experimental)
Another feature of the sapporo-service is a mode in which only workflows registered by the system administrator can be executed. This mode is useful when building a WES in a shared HPC environment.
Install and Run
The sapporo-service supports Python 3.7 or newer.
$ pip3 install sapporo
$ sapporo
Docker
You can also launch the sapporo-service with Docker.
To use Docker-in-Docker (DinD), you must mount docker.sock, /tmp, etc.
# Launch
$ docker compose up -d
# Launch confirmation
$ docker compose logs
Usage
The help for the sapporo-service startup command is as follows.
$ sapporo --help
usage: sapporo [-h] [--host] [-p] [--debug] [-r] [--disable-get-runs]
               [--disable-workflow-attachment]
               [--run-only-registered-workflows] [--service-info]
               [--executable-workflows] [--run-sh] [--url-prefix]

Implementation of a GA4GH workflow execution service that can easily support
various workflow runners.

optional arguments:
  -h, --help            show this help message and exit
  --host                Host address of Flask. (default: 127.0.0.1)
  -p , --port           Port of Flask. (default: 1122)
  --debug               Enable debug mode of Flask.
  -r , --run-dir        Specify the run dir. (default: ./run)
  --disable-get-runs    Disable endpoint of `GET /runs`.
  --disable-workflow-attachment
                        Disable `workflow_attachment` on endpoint `Post
                        /runs`.
  --run-only-registered-workflows
                        Run only registered workflows. Check the registered
                        workflows using `GET /service-info`, and specify
                        `workflow_name` in the `POST /run`.
  --service-info        Specify `service-info.json`. The
                        supported_wes_versions, system_state_counts and
                        workflows are overwritten in the application.
  --executable-workflows
                        Specify `executable-workflows.json`.
  --run-sh              Specify `run.sh`.
  --url-prefix          Specify the prefix of the url (e.g. --url-prefix /foo
                        -> /foo/service-info).
Operating Mode
There are two startup modes in the sapporo-service.
- Standard WES mode (Default)
- Execute only registered workflows mode
These are switched with the startup argument --run-only-registered-workflows.
It can also be switched by giving True or False to the environment variable SAPPORO_ONLY_REGISTERED_WORKFLOWS.
Startup arguments take priority over environment variables.
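The precedence rule above can be sketched as follows. This is a minimal illustration, not the sapporo-service's own code: the function name resolve_only_registered is hypothetical, passing None to mean "the startup argument was not given" is a convention of this sketch, and the case-insensitive comparison of the environment value is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_only_registered(cli_flag: Optional[bool],
                            env: Mapping[str, str] = os.environ) -> bool:
    """Return True when only registered workflows may be executed.

    cli_flag is None when the startup argument was not given
    (a convention of this sketch, not of the sapporo-service).
    """
    if cli_flag is not None:
        # A startup argument takes priority over the environment variable.
        return cli_flag
    raw = env.get("SAPPORO_ONLY_REGISTERED_WORKFLOWS", "False")
    # Assumption: "True"/"False" are compared case-insensitively.
    return raw.strip().lower() == "true"
```

With this sketch, resolve_only_registered(True, {"SAPPORO_ONLY_REGISTERED_WORKFLOWS": "False"}) still selects the registered-only mode, because the argument wins.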
Standard WES mode
For the API specification, see SwaggerHub - sapporo-wes - RunWorkflow.
The sapporo-service differs from the standard WES API specification in one respect: you must specify workflow_engine_name in the request parameters of POST /runs.
We consider this omission a mistake in the standard WES API specification and are requesting that it be fixed.
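As a rough sketch of what such a request carries, the snippet below collects the form fields for POST /runs. The field names other than workflow_engine_name follow the GA4GH WES RunWorkflow request; the helper name build_run_fields, the URL, and the parameter values are placeholders invented for illustration. Sending the fields as multipart/form-data is left to the HTTP client of your choice.

```python
import json
from typing import Dict


def build_run_fields(workflow_url: str,
                     workflow_type: str,
                     workflow_type_version: str,
                     workflow_engine_name: str,
                     workflow_params: dict) -> Dict[str, str]:
    """Collect the form fields for POST /runs as plain strings."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # Required by the sapporo-service in addition to the standard fields.
        "workflow_engine_name": workflow_engine_name,
        # WES expects workflow_params as a JSON-encoded string.
        "workflow_params": json.dumps(workflow_params),
    }


fields = build_run_fields(
    workflow_url="https://example.com/workflow.cwl",  # placeholder URL
    workflow_type="CWL",
    workflow_type_version="v1.0",
    workflow_engine_name="cwltool",
    workflow_params={"input": "value"},  # placeholder parameters
)
```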
Execute only registered workflows mode
For the API specification of the execute only registered workflows mode, see SwaggerHub - sapporo-wes.
This mode conforms to the standard WES API, with the following changes.
- Executable workflows are returned by GET /executable_workflows.
- Specify workflow_name instead of workflow_url in POST /runs.
The executable workflows are managed at executable_workflows.json.
Also, the schema for this definition is executable_workflows.schema.json. The default location of these files is under the application directory of the sapporo-service. You can override them using the startup argument --executable-workflows or the environment variable SAPPORO_EXECUTABLE_WORKFLOWS.
For more information, see SwaggerUI - sapporo-wes - GetExecutableWorkflows.
Run Dir
The sapporo-service manages the submitted workflows, workflow parameters, output files, etc., on the file system.
You can override the location of run dir by using the startup argument --run-dir or the environment variable SAPPORO_RUN_DIR.
The run dir structure is as follows. You can reset or delete a run by removing its directory with rm.
$ tree run
.
├── 29
│   └── 29109b85-7935-4e13-8773-9def402c7775
│       ├── cmd.txt
│       ├── end_time.txt
│       ├── exe
│       │   └── workflow_params.json
│       ├── exit_code.txt
│       ├── outputs
│       │   ├── ERR034597_1.small.fq.trimmed.1P.fq
│       │   ├── ERR034597_1.small.fq.trimmed.1U.fq
│       │   ├── ERR034597_1.small.fq.trimmed.2P.fq
│       │   ├── ERR034597_1.small.fq.trimmed.2U.fq
│       │   ├── ERR034597_1.small_fastqc.html
│       │   └── ERR034597_2.small_fastqc.html
│       ├── outputs.json
│       ├── run.pid
│       ├── run_request.json
│       ├── start_time.txt
│       ├── state.txt
│       ├── stderr.log
│       ├── stdout.log
│       └── workflow_engine_params.txt
├── 2d
│   └── ...
└── 6b
    └── ...
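Judging from the tree above, each run lives in a directory named after its run_id, nested under a directory named after the first two characters of that run_id. The sketch below locates a run and reads its state under that assumption. The function names run_path and read_state are hypothetical, and the layout rule is inferred from the example tree rather than taken from the sapporo-service source.

```python
from pathlib import Path
from typing import Optional


def run_path(run_dir: Path, run_id: str) -> Path:
    """Locate a run: <run_dir>/<first two characters of run_id>/<run_id>."""
    return run_dir / run_id[:2] / run_id


def read_state(run_dir: Path, run_id: str) -> Optional[str]:
    """Return the content of state.txt, or None if the file does not exist."""
    state_file = run_path(run_dir, run_id) / "state.txt"
    if not state_file.is_file():
        return None
    return state_file.read_text().strip()
```

For example, run_path(Path("run"), "29109b85-7935-4e13-8773-9def402c7775") yields run/29/29109b85-7935-4e13-8773-9def402c7775.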
The execution of POST /runs is very complex.
Examples using curl are provided in GitHub - sapporo/tests/curl.
Please use these as references.
run.sh
We use run.sh to abstract the workflow engine.
When POST /runs is called, the sapporo-service dumps the necessary files to the run dir and then forks the execution of run.sh. Therefore, you can apply various workflow engines to WES by editing run.sh.
The default location of run.sh is under the application directory of the sapporo-service. You can override it using the startup argument --run-sh or the environment variable SAPPORO_RUN_SH.
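The "dump files, then fork" flow described above might look roughly like the following. This is a sketch under assumptions, not the sapporo-service implementation: the function name fork_run is hypothetical, the file names are borrowed from the run dir tree, and the command to launch (run.sh and whatever arguments it takes) is passed in by the caller because it is not specified here.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def fork_run(run_dir: Path, run_request: dict,
             command: List[str]) -> subprocess.Popen:
    """Dump the request into the run dir, then start `command` without waiting."""
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "run_request.json").write_text(json.dumps(run_request, indent=2))
    with open(run_dir / "stdout.log", "w") as out, \
            open(run_dir / "stderr.log", "w") as err:
        # The child process gets its own copies of the log file handles,
        # so closing ours right after the launch is safe.
        proc = subprocess.Popen(command, cwd=run_dir, stdout=out, stderr=err)
    # Record the process ID so the run can be inspected or cancelled later.
    (run_dir / "run.pid").write_text(str(proc.pid))
    return proc
```

Because the caller returns immediately after the launch, the HTTP request does not block while the workflow runs; the run's progress is then read back from the files in the run dir.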
Other Startup Arguments
You can change the host and port used by the application by using the startup arguments (--host and --port) or the environment variables SAPPORO_HOST and SAPPORO_PORT.
The following three startup arguments and environment variables restrict the WES.
- --disable-get-runs
  - SAPPORO_GET_RUNS: True or False.
  - Disable GET /runs.
    - When a WES is shared by an unspecified number of users, anyone who knows a run_id can view that run's contents and cancel it, even if the run belongs to someone else. GET /runs lists run_ids, so disabling it keeps them from being discovered.
    - Guessing a run_id by brute force is difficult because the run_id is generated automatically with uuid4.
- --disable-workflow-attachment
  - SAPPORO_WORKFLOW_ATTACHMENT: True or False.
  - Disable workflow_attachment in POST /runs.
    - The workflow_attachment field is used to attach files for executing workflows.
    - There is a security concern because anything can be attached.
- --url-prefix
  - SAPPORO_URL_PREFIX.
  - Set the URL prefix.
    - If --url-prefix /foo/bar is set, GET /service-info becomes GET /foo/bar/service-info.
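Two of the points above lend themselves to a short illustration: how a URL prefix changes an endpoint path, and how a run_id can be generated with uuid4. The helper names with_prefix and new_run_id are hypothetical, and the slash handling is an assumption of this sketch rather than documented behavior of the sapporo-service.

```python
import uuid


def with_prefix(url_prefix: str, endpoint: str) -> str:
    """Prepend url_prefix to an endpoint path, avoiding doubled slashes."""
    prefix = url_prefix.strip("/")
    path = endpoint.lstrip("/")
    return f"/{prefix}/{path}" if prefix else f"/{path}"


def new_run_id() -> str:
    """Generate a random run_id; a version 4 UUID carries 122 random bits."""
    return str(uuid.uuid4())
```

Here with_prefix("/foo/bar", "/service-info") gives /foo/bar/service-info, and an empty prefix leaves the endpoint unchanged.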
The contents of the GET /service-info response are managed in service-info.json. The default location of service-info.json is under the application directory of the sapporo-service. You can override it using the startup argument --service-info or the environment variable SAPPORO_SERVICE_INFO.
Generate download link
The sapporo-service provides download links for the files and directories under the run dir.
For more information, see SwaggerUI - sapporo-wes - GetData.
Parse workflow
The sapporo-service provides the feature to check the workflow document's type, version, and inputs.
For more information, see SwaggerUI - sapporo-wes - GetData.
Development
You can start the development environment as follows.
$ docker compose -f docker-compose.dev.yml up -d --build
$ docker compose -f docker-compose.dev.yml exec app bash
We use flake8, isort, and mypy as linters.
$ bash ./tests/lint_and_style_check/flake8.sh
$ bash ./tests/lint_and_style_check/isort.sh
$ bash ./tests/lint_and_style_check/mypy.sh
$ bash ./tests/lint_and_style_check/run_all.sh
We use pytest for testing.
$ pytest .
Add new Workflow Engines to Sapporo Service
Have a look at the run.sh script, which is called from Python.
This shell script receives the name of the requested workflow engine, such as cwltool, and invokes the corresponding Bash function, such as run_cwltool.
That function runs a shell command that starts a Docker container for the workflow engine and monitors its exit status. For a complete example, please refer to this pull request: https://github.com/sapporo-wes/sapporo-service/pull/29
License
Apache-2.0. See the LICENSE.
Notice
Please note that this repository is participating in a study into sustainability of open source projects. Data will be gathered about this repository for approximately the next 12 months, starting from 2021-06-16.
Data collected will include number of contributors, number of PRs, time taken to close/merge these PRs, and issues closed.
For more information, please visit our informational page or download our participant information sheet.