
Implementation of a GA4GH workflow execution service that can easily support various workflow runners.


SAPPORO-service


SAPPORO is a standard implementation conforming to the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Service (WES) API specification.

One of SAPPORO's features is the abstraction of workflow engines, which makes it easy to expose various workflow engines as a WES. The following workflow engines have been confirmed to work at present.

Another feature of SAPPORO is a mode that executes only workflows registered by the system administrator. This mode is useful when building a WES in a shared HPC environment.

Install and Run

SAPPORO supports Python 3.6 or newer.

$ pip3 install sapporo
$ sapporo

Docker

You can also launch it with Docker. To use Docker-in-Docker (DinD), you must mount docker.sock, /tmp, etc. into the container.

# Launch
$ docker-compose up -d

# Launch confirmation
$ docker-compose logs

Usage

The help for the SAPPORO startup command is as follows.

$ sapporo --help
usage: sapporo [-h] [--host] [-p] [--debug] [-r] [--disable-get-runs]
               [--disable-workflow-attachment]
               [--run-only-registered-workflows] [--service-info]
               [--executable-workflows] [--run-sh] [--url-prefix]

Implementation of a GA4GH workflow execution service that can easily support
various workflow runners.

optional arguments:
  -h, --help            show this help message and exit
  --host                Host address of Flask. (default: 127.0.0.1)
  -p , --port           Port of Flask. (default: 8080)
  --debug               Enable debug mode of Flask.
  -r , --run-dir        Specify the run dir. (default: ./run)
  --disable-get-runs    Disable endpoint of `GET /runs`.
  --disable-workflow-attachment
                        Disable `workflow_attachment` on endpoint `Post
                        /runs`.
  --run-only-registered-workflows
                        Run only registered workflows. Check the registered
                        workflows using `GET /service-info`, and specify
                        `workflow_name` in the `POST /run`.
  --service-info        Specify `service-info.json`. The
                        supported_wes_versions, system_state_counts and
                        workflows are overwritten in the application.
  --executable-workflows
                        Specify `executable-workflows.json`.
  --run-sh              Specify `run.sh`.
  --url-prefix          Specify the prefix of the url (e.g. --url-prefix /foo
                        -> /foo/service-info).

Operating Mode

There are two startup modes in SAPPORO.

  • Standard WES mode (Default)
  • Execute only registered workflows mode

These modes are switched with the startup argument --run-only-registered-workflows. You can also switch them by setting the environment variable SAPPORO_ONLY_REGISTERED_WORKFLOWS to True or False. Startup arguments take priority over environment variables.
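The precedence rule (startup argument first, then environment variable, then the default standard WES mode) can be sketched with argparse. This is a hypothetical illustration, not SAPPORO's own code: the function names are invented, and only the flag and the variable name come from this README.

```python
import argparse
import os
from typing import List, Mapping, Optional


def str_to_bool(value: str) -> bool:
    # Accept "True"/"False" (case-insensitive), as described for the env var.
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"expected True or False, got {value!r}")


def resolve_registered_only(
    argv: List[str], env: Optional[Mapping[str, str]] = None
) -> bool:
    # Hypothetical resolver: returns True for "execute only registered workflows" mode.
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-only-registered-workflows", action="store_true")
    args = parser.parse_args(argv)
    if args.run_only_registered_workflows:
        return True  # the startup argument takes priority
    if "SAPPORO_ONLY_REGISTERED_WORKFLOWS" in env:
        return str_to_bool(env["SAPPORO_ONLY_REGISTERED_WORKFLOWS"])
    return False  # default: standard WES mode
```

Because the flag is a plain on/off switch, its absence means "not specified", which is what lets the environment variable take effect only when the flag is missing.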

Standard WES mode

For the API specification, see GitHub - GA4GH WES and SwaggerUI - GA4GH WES.

Unlike the standard WES API specification, SAPPORO requires you to specify workflow_engine_name in the request parameters of POST /runs. I personally think this is a mistake in the standard WES API specification, so I am requesting that it be fixed.
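A minimal sketch of the form fields for POST /runs in standard WES mode, assuming the field names of the GA4GH WES RunRequest plus SAPPORO's workflow_engine_name. The helper name build_run_request is hypothetical, and the JSON encoding of object-valued fields follows the WES convention of sending them as strings.

```python
import json
from typing import Any, Dict, Optional


def build_run_request(
    workflow_url: str,
    workflow_type: str,
    workflow_type_version: str,
    workflow_engine_name: str,
    workflow_params: Optional[Dict[str, Any]] = None,
    tags: Optional[Dict[str, str]] = None,
    workflow_engine_parameters: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    # Hypothetical helper: returns the form fields as a flat str -> str dict.
    if not workflow_engine_name:
        # SAPPORO needs this field even though standard WES does not define it.
        raise ValueError("workflow_engine_name is required by SAPPORO")
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        "workflow_engine_name": workflow_engine_name,
        # Object-valued WES fields are sent as JSON-encoded strings.
        "workflow_params": json.dumps(workflow_params or {}),
        "tags": json.dumps(tags or {}),
        "workflow_engine_parameters": json.dumps(workflow_engine_parameters or {}),
    }
```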

Execute only registered workflows mode

For the API specification of the execute only registered workflows mode, see SwaggerUI - SAPPORO WES.

Basically, it conforms to the standard WES API. The changes are as follows.

  • Executable workflows are returned by GET /service-info as executable_workflows.
  • Specify workflow_name instead of workflow_url in POST /runs.

The following is an example response to GET /service-info in the execute only registered workflows mode.

GET /service-info
{
  "auth_instructions_url": "https://github.com/ddbj/SAPPORO-service",
  "contact_info_url": "https://github.com/ddbj/SAPPORO-service",
  "default_workflow_engine_parameters": [],
  "executable_workflows": [
    {
      "workflow_attachment": [],
      "workflow_name": "CWL_trimming_and_qc_remote",
      "workflow_type": "CWL",
      "workflow_type_version": "v1.0",
      "workflow_url": "https://raw.githubusercontent.com/ddbj/SAPPORO-service/master/tests/resources/trimming_and_qc_remote.cwl"
    },
    {
      "workflow_attachment": [
        {
          "file_name": "fastqc.cwl",
          "file_url": "https://raw.githubusercontent.com/ddbj/SAPPORO-service/master/tests/resources/fastqc.cwl"
        },
        {
          "file_name": "trimming_pe.cwl",
          "file_url": "https://raw.githubusercontent.com/ddbj/SAPPORO-service/master/tests/resources/trimming_pe.cwl"
        }
      ],
      "workflow_name": "CWL_trimming_and_qc_local",
      "workflow_type": "CWL",
      "workflow_type_version": "v1.0",
      "workflow_url": "https://raw.githubusercontent.com/ddbj/SAPPORO-service/master/tests/resources/trimming_and_qc.cwl"
    }
  ],
  "supported_filesystem_protocols": [
    "http",
    "https",
    "file"
  ],
  "supported_wes_versions": [
    "sapporo-wes-1.1"
  ],
  "system_state_counts": {},
  "tags": {
    "debug": true,
    "get_runs": true,
    "registered_only_mode": true,
    "run_dir": "/home/ubuntu/git/github.com/ddbj/SAPPORO-service/run",
    "wes_name": "sapporo",
    "workflow_attachment": true
  },
  "workflow_engine_versions": {
    "cromwell": "50",
    "cwltool": "1.0.20191225192155",
    "nextflow": "20.04.1",
    "snakemake": "v5.17.0",
    "toil": "4.1.0"
  },
  "workflow_type_versions": {
    "CWL": {
      "workflow_type_version": [
        "v1.0",
        "v1.1",
        "v1.1.0-dev1"
      ]
    }
  }
}
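A client in registered-only mode can read executable_workflows from this response to check that a workflow_name is registered before calling POST /runs. A minimal sketch; the function names are hypothetical, and only the keys workflow_name, workflow_attachment, and file_name are taken from the response above.

```python
from typing import Any, Dict, List


def executable_workflow_names(service_info: Dict[str, Any]) -> List[str]:
    # Each registered workflow entry carries a "workflow_name".
    return [wf["workflow_name"] for wf in service_info.get("executable_workflows", [])]


def find_executable_workflow(service_info: Dict[str, Any], workflow_name: str) -> Dict[str, Any]:
    # Look up a registered workflow by name; unknown names raise KeyError.
    for wf in service_info.get("executable_workflows", []):
        if wf.get("workflow_name") == workflow_name:
            return wf
    raise KeyError(f"{workflow_name!r} is not a registered workflow")


def attachment_file_names(workflow: Dict[str, Any]) -> List[str]:
    # File names that the registered workflow ships as workflow_attachment.
    return [a["file_name"] for a in workflow.get("workflow_attachment", [])]
```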

The executable workflows are managed in executable_workflows.json, and the schema for this definition is executable_workflows.schema.json. By default, both files are located under the SAPPORO application directory. You can override them by using the startup argument --executable-workflows or the environment variable SAPPORO_EXECUTABLE_WORKFLOWS.

Run Dir

SAPPORO manages the submitted workflows, workflow parameters, output files, etc. on the file system. You can override the location of the run dir by using the startup argument --run-dir or the environment variable SAPPORO_RUN_DIR.

The run dir structure is as follows. You can initialize or delete a run by physically removing its directory with rm.

$ tree run
.
├── 29
│   └── 29109b85-7935-4e13-8773-9def402c7775
│       ├── cmd.txt
│       ├── end_time.txt
│       ├── exe
│       │   └── workflow_params.json
│       ├── exit_code.txt
│       ├── outputs
│       │   ├── ERR034597_1.small.fq.trimmed.1P.fq
│       │   ├── ERR034597_1.small.fq.trimmed.1U.fq
│       │   ├── ERR034597_1.small.fq.trimmed.2P.fq
│       │   ├── ERR034597_1.small.fq.trimmed.2U.fq
│       │   ├── ERR034597_1.small_fastqc.html
│       │   └── ERR034597_2.small_fastqc.html
│       ├── outputs.json
│       ├── run.pid
│       ├── run_request.json
│       ├── start_time.txt
│       ├── state.txt
│       ├── stderr.log
│       ├── stdout.log
│       └── workflow_engine_params.txt
├── 2d
│   └── ...
└── 6b
    └── ...
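From the tree above, each run appears to live in <run dir>/<first two characters of run_id>/<run_id>/. The sketch below builds that path, reads state.txt, and deletes a run the same way as rm -rf. The layout rule is inferred from the example tree, and the helper names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def run_path(run_dir: Path, run_id: str) -> Path:
    # Inferred layout: run/29/29109b85-... for run_id "29109b85-...".
    return run_dir / run_id[:2] / run_id


def read_state(run_dir: Path, run_id: str) -> Optional[str]:
    # Returns the contents of state.txt, or None if the run has no state yet.
    state_file = run_path(run_dir, run_id) / "state.txt"
    if not state_file.exists():
        return None
    return state_file.read_text().strip()


def delete_run(run_dir: Path, run_id: str) -> None:
    # Equivalent to `rm -rf <run dir>/<xx>/<run_id>`: physically removes the run.
    shutil.rmtree(str(run_path(run_dir, run_id)))
```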

Requests to POST /runs are fairly complex. Examples using Python's requests library are provided in GitHub - sapporo/tests/post_runs_examples; please use them as a reference.
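As a dependency-free complement to those examples, the sketch below encodes POST /runs fields and attached files as multipart/form-data, the request body type the GA4GH WES specification uses for this endpoint, and sends them with urllib. The helper names, the workflow_attachment field name for files, and the base URL argument are assumptions for illustration; the repository examples remain the reference.

```python
import json
import urllib.request
import uuid
from typing import Any, Dict, List, Tuple


def encode_multipart(
    fields: Dict[str, str], files: List[Tuple[str, str, bytes]]
) -> Tuple[bytes, str]:
    # Returns (body, Content-Type header value) for a multipart/form-data request.
    boundary = uuid.uuid4().hex
    lines: List[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    for field_name, file_name, content in files:
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{field_name}"; filename="{file_name}"'.encode(),
            b"Content-Type: application/octet-stream",
            b"",
            content,
        ]
    lines += [f"--{boundary}--".encode(), b""]  # closing delimiter plus final CRLF
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def post_runs(base_url: str, fields: Dict[str, str], files: List[Tuple[str, str, bytes]]) -> Any:
    # Hypothetical client call; a WES server answers with JSON such as {"run_id": ...}.
    body, content_type = encode_multipart(fields, files)
    request = urllib.request.Request(
        f"{base_url}/runs", data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode())
```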

run.sh

We use run.sh to abstract the workflow engine. When POST /runs is called, SAPPORO dumps the necessary files into the run dir and then forks a process that executes run.sh. Therefore, you can support various workflow engines as a WES by editing run.sh.
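The "dump files, then fork run.sh" pattern can be sketched with subprocess.Popen. This is not SAPPORO's implementation: passing the run dir as run.sh's only argument, and the parent redirecting output into stdout.log/stderr.log, are assumptions; only the file names run_request.json, run.pid, stdout.log, and stderr.log come from the run dir tree.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, Dict


def fork_run(run_sh: Path, run_dir: Path, run_request: Dict[str, Any]) -> subprocess.Popen:
    # 1. Dump the request into the run dir before starting anything.
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "run_request.json").write_text(json.dumps(run_request, indent=2))
    # 2. Start run.sh in the background; the API call does not wait for it.
    #    Assumption: run.sh receives the run dir as its first argument.
    with open(str(run_dir / "stdout.log"), "w") as stdout, open(str(run_dir / "stderr.log"), "w") as stderr:
        process = subprocess.Popen(["bash", str(run_sh), str(run_dir)], stdout=stdout, stderr=stderr)
    # 3. Record the PID so the run can be inspected or cancelled later.
    (run_dir / "run.pid").write_text(str(process.pid))
    return process
```

Since the engine-specific logic lives entirely in the script, swapping cwltool for another engine changes run.sh, not this launcher.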

The default location of run.sh is under the SAPPORO application directory. You can override it by using the startup argument --run-sh or the environment variable SAPPORO_RUN_SH.

Other Startup Arguments

You can change the host and port used by the application by using the startup arguments (--host and --port) or the environment variables SAPPORO_HOST and SAPPORO_PORT.
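A client needs the same host, port, and (if set) URL prefix to reach the service. A minimal sketch using the defaults shown by sapporo --help (127.0.0.1 and 8080) and the --url-prefix behavior described in its help text; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any


def base_url(host: str = "127.0.0.1", port: int = 8080, url_prefix: str = "") -> str:
    # url_prefix mirrors --url-prefix: "/foo" turns /service-info into /foo/service-info.
    cleaned = url_prefix.strip("/")
    prefix = "/" + cleaned if cleaned else ""
    return f"http://{host}:{port}{prefix}"


def get_service_info(url: str) -> Any:
    # Fetch and decode GET /service-info from a running SAPPORO instance.
    with urllib.request.urlopen(f"{url}/service-info") as response:
        return json.loads(response.read().decode())
```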

The following startup arguments and environment variables are provided to restrict or adjust the WES.

  • --disable-get-runs
    • SAPPORO_GET_RUNS: True or False.
    • Disable GET /runs.
      • When the WES is shared by an unspecified number of users, anyone who knows a run_id can view that run's contents and cancel it, even if it belongs to someone else. GET /runs lists run_ids, so disabling it keeps them from being exposed.
      • Because run_id is generated automatically with uuid4, it is difficult to guess by brute force.
  • --disable-workflow-attachment
    • SAPPORO_WORKFLOW_ATTACHMENT: True or False.
    • Disable workflow_attachment in POST /runs.
      • The workflow_attachment field is used to attach files for executing workflows.
      • There is a security concern because anything can be attached.
  • --url-prefix
    • SAPPORO_URL_PREFIX.
    • Set the URL prefix.
      • If --url-prefix /foo/bar is set, GET /service-info becomes GET /foo/bar/service-info.
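The brute-force argument for run_id can be made concrete. A UUID has 128 bits, and version 4 fixes 4 version bits and 2 variant bits, leaving 122 random bits. The sketch uses Python's uuid module as a stand-in for SAPPORO's run_id generation; the helper names are hypothetical.

```python
import uuid


def new_run_id() -> str:
    # Stand-in for run_id generation with uuid4.
    return str(uuid.uuid4())


def random_bits_in_uuid4() -> int:
    # 128 bits total, minus 4 version bits and 2 variant bits.
    return 128 - 4 - 2


def expected_guesses(bits: int) -> int:
    # On average, finding one specific id means trying half of the 2**bits space.
    return 2 ** (bits - 1)
```

With 2**121 expected guesses per run, the practical risk comes from run_ids being disclosed (for example, via GET /runs), not from guessing them.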

The contents of the response of GET /service-info are managed in service-info.json. The default location of service-info.json is under the SAPPORO application directory. You can override it by using the startup argument --service-info or the environment variable SAPPORO_SERVICE_INFO.

Development

You can start the development environment as follows.

$ docker-compose -f docker-compose.dev.yml up -d --build
$ docker-compose -f docker-compose.dev.yml exec app bash

We use flake8, isort, and mypy as linters.

$ bash ./tests/lint_and_style_check/flake8.sh
$ bash ./tests/lint_and_style_check/isort.sh
$ bash ./tests/lint_and_style_check/mypy.sh

We use pytest for testing.

$ pytest .

License

Apache-2.0. See the LICENSE.

Download files


Source Distribution

sapporo-1.0.4.tar.gz (21.6 kB)

Built Distribution

sapporo-1.0.4-py3-none-any.whl (26.0 kB)

File details

Details for the file sapporo-1.0.4.tar.gz.

File metadata

  • Download URL: sapporo-1.0.4.tar.gz
  • Upload date:
  • Size: 21.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.9

File hashes

Hashes for sapporo-1.0.4.tar.gz
Algorithm Hash digest
SHA256 5aa182d6b52fd335d15a019f1e819c797999a419ddc68ba81ccc31de5815d8f7
MD5 569813425affd707f514a3bb4e8f0a62
BLAKE2b-256 63ff8f1986a8abad114bc5508a307681d9af374593fc569d22c310e1190f169b


File details

Details for the file sapporo-1.0.4-py3-none-any.whl.

File metadata

  • Download URL: sapporo-1.0.4-py3-none-any.whl
  • Upload date:
  • Size: 26.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.9

File hashes

Hashes for sapporo-1.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 cd61cb37b8df88429f7a54a2ce8efae4d50400cd631ecd663ea9ef90019b1dfb
MD5 7793c6cc380f2915571c82c5aa3c78ce
BLAKE2b-256 6b447229719b6fb9a1054631f087efeac16646824c8ae75c6da2c45dcae03f59
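A downloaded file can be checked against the SHA256 digests listed above with hashlib. A minimal sketch; the helper names are hypothetical, and the expected digests are copied from the two hash tables.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "sapporo-1.0.4.tar.gz": "5aa182d6b52fd335d15a019f1e819c797999a419ddc68ba81ccc31de5815d8f7",
    "sapporo-1.0.4-py3-none-any.whl": "cd61cb37b8df88429f7a54a2ce8efae4d50400cd631ecd663ea9ef90019b1dfb",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    # Stream the file in chunks so large files need not fit in memory.
    digest = hashlib.sha256()
    with open(str(path), "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_sha256.lower()
```

Usage: verify(Path("sapporo-1.0.4.tar.gz"), EXPECTED_SHA256["sapporo-1.0.4.tar.gz"]) returns True only for an unmodified download.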

