Airflow provider for Versatile Data Kit.
Versatile Data Kit Airflow provider
A set of Airflow operators, sensors and a connection hook intended to help schedule Versatile Data Kit jobs using Apache Airflow.
Usage
To install the provider, run:
pip install airflow-provider-vdk
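To confirm the install in the environment Airflow runs in, a small check like the one below may help. It is only a sketch: it uses the standard library's importlib.metadata and the distribution name from the pip command above, and the helper name installed_version is made up for illustration (it is not part of the provider).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "airflow-provider-vdk") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of the provider package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "airflow-provider-vdk is not installed")
```

Run it with the same Python interpreter that the Airflow scheduler and workers use, since the package has to be importable there.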
You can then define a workflow of data jobs (already deployed by the VDK Control Service) like this:
from datetime import datetime

from airflow import DAG
from vdk_provider.operators.vdk import VDKOperator

with DAG(
    "airflow_example_vdk",
    schedule_interval=None,  # no schedule; the DAG runs only when triggered
    start_date=datetime(2022, 1, 1),
    catchup=False,
    tags=["example", "vdk"],
) as dag:
    # Each operator refers to one data job by job name and team name.
    trino_job1 = VDKOperator(
        conn_id="vdk-default",
        job_name="airflow-trino-job1",
        team_name="taurus",
        task_id="trino-job1",
    )

    trino_job2 = VDKOperator(
        conn_id="vdk-default",
        job_name="airflow-trino-job2",
        team_name="taurus",
        task_id="trino-job2",
    )

    transform_job = VDKOperator(
        conn_id="vdk-default",
        job_name="airflow-transform-job",
        team_name="taurus",
        task_id="transform-job",
    )

    # Both Trino jobs are upstream of the transform job.
    [trino_job1, trino_job2] >> transform_job
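
The last line of the DAG makes both Trino jobs upstream of the transform job, so with Airflow's default trigger rule the transform job starts only after both have succeeded. The ordering can be illustrated without Airflow at all. The sketch below is not how Airflow schedules tasks internally; it only uses the standard library's graphlib (Python 3.9+) to show which task IDs become runnable together, and the run_waves helper is a made-up name for illustration.

```python
from graphlib import TopologicalSorter

# Map each task ID from the DAG above to the set of tasks it waits for.
upstream = {
    "trino-job1": set(),
    "trino-job2": set(),
    "transform-job": {"trino-job1", "trino-job2"},
}


def run_waves(graph):
    """Group tasks into waves; every task in a wave has all upstreams done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


waves = run_waves(upstream)

if __name__ == "__main__":
    for number, wave in enumerate(waves, start=1):
        print(number, wave)
    # prints:
    # 1 ['trino-job1', 'trino-job2']
    # 2 ['transform-job']
```

The two Trino jobs land in the same wave because neither depends on the other, which is why Airflow is free to run them in parallel.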
Example
Demo
You can watch a demo from one of the community meetings here: https://www.youtube.com/watch?v=c3j1aOALjVU&t=690s
Architecture
Hashes for airflow-provider-vdk-0.0.872432630.tar.gz
Algorithm | Hash digest
---|---
SHA256 | f10ffb572154e7760fa48600534e33f92922ed1b3a2ad308f58ee2853f1fb6a8
MD5 | a4e83c9ee943f7f6d4ff54190e238258
BLAKE2b-256 | 4f0ac3b161f77ac76ade8baa9808b68d59ab4080dd13ce5e3ccb28231a1359ef