Python library for building ETL pipelines involving Synapse and data processing workflows
Sage Prefect Tasks
⚠️ Warning: This repository is a work in progress. ⚠️
Python package of useful Prefect tasks for common use cases at Sage Bionetworks.
Some thoughts are included below the Demo Flow and Usage.
Inspired by Pocket/data-flows.
Demo Flow
Demo Usage
Getting access
To run this demo, you'll need the following access:
- You need to ask Bruno for edit-access on the INCLUDE Sandbox Synapse project.
- You need to ask Bruno for edit-access on the include-sandbox Cavatica project.
Getting set up
# Create a virtual environment with the Python dependencies
pipenv install
# Copy the example `.env` file and update the auth tokens
cp .env.example .env
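Before running anything, it can help to confirm that the tokens from `.env` actually reached the environment. The following is a minimal sketch, not part of this package: `missing_env_vars` is a hypothetical helper, `SB_AUTH_TOKEN` is the variable read by the SevenBridges snippet further down, and `SYNAPSE_AUTH_TOKEN` is only an assumed name for the Synapse token (check `.env.example` for the real ones).

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    required: Iterable[str],
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names in `required` that are unset or blank in `environ`."""
    return [name for name in required if not environ.get(name, "").strip()]


# Demonstration with an explicit mapping instead of the real environment.
# SYNAPSE_AUTH_TOKEN is an assumed variable name, used here for illustration.
example_env = {"SB_AUTH_TOKEN": "example-token", "SYNAPSE_AUTH_TOKEN": ""}
print(missing_env_vars(["SB_AUTH_TOKEN", "SYNAPSE_AUTH_TOKEN"], example_env))
```

Called with only the list of names, the helper reads `os.environ`, so running it through `pipenv run` would check the values loaded from `.env`.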
Run the flow at the command line
Complete the steps under "Getting set up" first.
# Run the demo (pipenv will automatically load the `.env` file)
pipenv run python demo.py
Inspect the flow using the Prefect Server UI
Complete the steps under "Getting set up" first.
# Deploy Prefect Server (Orion)
prefect orion start
# Explore the flow runs in Prefect Server
# Usually hosted at http://127.0.0.1:4200/
# Stop the running server with Ctrl-C
Thoughts
- The `CavaticaBaseTask` demonstrates a use case for classes (i.e. extending `Task`) as opposed to functions (i.e. decorated by `@task`). On the other hand, `SynapseBaseTask` doesn't really benefit from the class structure.
- The SevenBridges Python client embeds the client instance into every resource object, which prevents `cloudpickle` from serializing these objects due to `TypeError: cannot pickle '_thread.lock' object`. The snippet below clears the embedded client references before calling `cloudpickle.dumps`:

    import os

    import cloudpickle
    import sevenbridges as sbg

    # Authenticate against the Cavatica API
    api = sbg.Api(
        url="https://cavatica-api.sbgenomics.com/v2",
        token=os.environ["SB_AUTH_TOKEN"],
    )
    proj = api.projects.query(name="include-sandbox")[0]

    # Clear every reference to the client held by the resource object
    proj._API = None
    proj._api = None
    proj._data.api = None

    # Serialize the resource object
    pickle = cloudpickle.dumps(proj)
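The same failure can be reproduced without SevenBridges credentials. The sketch below is an illustration under stated assumptions: `FakeClient`, `FakeResource`, and `detach_client` are hypothetical stand-ins, not part of the SevenBridges client, and it uses the standard-library `pickle` module rather than `cloudpickle`. It shows that an object holding a reference to something with a thread lock cannot be pickled, and that a shallow copy with the reference cleared can.

```python
import copy
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for an API client that owns a thread lock."""

    def __init__(self):
        self._lock = threading.Lock()


class FakeResource:
    """Hypothetical stand-in for a resource object that embeds its client."""

    def __init__(self, name, api):
        self.name = name
        self._api = api


def detach_client(resource):
    """Return a shallow copy of `resource` with the client reference cleared."""
    clone = copy.copy(resource)
    clone._api = None
    return clone


resource = FakeResource("include-sandbox", FakeClient())

# Pickling the resource as-is fails because of the lock inside the client.
try:
    pickle.dumps(resource)
    picklable_as_is = True
except TypeError:
    picklable_as_is = False

# Pickling a detached copy works, and the original object is left untouched.
restored = pickle.loads(pickle.dumps(detach_client(resource)))
```

Working on a copy keeps the original resource usable for further API calls, at the cost of the restored object having no client attached.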
Note
This project has been set up using PyScaffold 4.3. For details and usage information on PyScaffold see https://pyscaffold.org/.
Hashes for sagetasks-0.4.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6bf0f9fa7aa158160c43727c4c1f177331d5ce5cfd9081c97bf2b979e8cf83c7
MD5 | 4d36f5b8bba6a643fd813101c4bdf019
BLAKE2b-256 | cc3fcdbd66267433dc8be5b6dbc4c032742df9358eac4fd0941e34cc774630f8