A sample Apache Airflow provider package built by Astronomer.

Project description

Apache Airflow Provider for SkyPilot

A provider that lets you use multiple clouds on Apache Airflow through SkyPilot.


Installation

The SkyPilot provider for Apache Airflow was developed and tested in an environment with the following dependencies installed:

Installation of the SkyPilot provider can start from an Airflow environment configured with Docker, as described in "Running Airflow in Docker". Based on that Docker configuration, add a pip install command to the Dockerfile and build your own Docker image.

RUN pip install --user airflow-provider-skypilot
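
A minimal Dockerfile sketch illustrating this step (the base image tag is an assumption; match it to the Airflow version in your compose setup):

```dockerfile
# Assumed base image tag; use the Airflow version your docker-compose.yaml expects
FROM apache/airflow:2.7.3
RUN pip install --user airflow-provider-skypilot
```

Build the image (for example, `docker build -t airflow-skypilot .`) and point the `image` key in your docker-compose.yaml at the resulting tag.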

Then, make sure that SkyPilot is properly installed and initialized in the same environment. Initialization includes cloud account setup and access verification. Please refer to SkyPilot Installation for more information.
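
For example, installation and access verification with the SkyPilot CLI might look like the following (the cloud extras shown are examples; pick the ones you actually use):

```shell
# Install SkyPilot with the cloud backends you plan to use
pip install "skypilot[aws,gcp,azure]"

# Verify which clouds are correctly set up and accessible
sky check
```

`sky check` reports each cloud as enabled or disabled, so you can confirm the mounted credentials are visible before running any Sky operators.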

Configuration

A SkyPilot provider process runs on an Airflow worker, but it stores its metadata on the Airflow master node. This scheme allows a series of consecutive Sky tasks to run across multiple workers by sharing that metadata.

The following settings in docker-compose.yaml define the data sharing, including cloud credentials, metadata, and the workspace.

x-airflow-common:
  environment:
    # (environment settings omitted)
  volumes:
      - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
      - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
      - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
      - ${AIRFLOW_PROJ_DIR:-.}/plugins:/opt/airflow/plugins
      # mount cloud credentials
      - ${HOME}/.aws:/opt/airflow/sky_home_dir/.aws
      - ${HOME}/.azure:/opt/airflow/sky_home_dir/.azure
      - ${HOME}/.config/gcloud:/opt/airflow/sky_home_dir/.config/gcloud
      - ${HOME}/.scp:/opt/airflow/sky_home_dir/.scp
      # mount sky metadata 
      - ${HOME}/.sky:/opt/airflow/sky_home_dir/.sky
      - ${HOME}/.ssh:/opt/airflow/sky_home_dir/.ssh
      # mount sky working dir
      - ${HOME}/sky_workdir:/opt/airflow/sky_home_dir/sky_workdir

This example mounts the cloud credentials for AWS, Azure, GCP, and SCP, which were created during SkyPilot cloud account setup. For the SkyPilot metadata, check that .sky/ and .ssh/ exist in your ${HOME} directory and mount them. Additionally, you can mount your own directory, such as sky_workdir/, for user resources, including user code and YAML task definition files for SkyPilot execution.

Note that all Sky directories are mounted under sky_home_dir/. They will be symlinked to ${HOME}/ on the workers where a SkyPilot provider process actually runs.

Usage

The SkyPilot provider includes the following operators:

  • SkyLaunchOperator
  • SkyExecOperator
  • SkyDownOperator
  • SkySSHOperator
  • SkyRsyncUpOperator
  • SkyRsyncDownOperator

SkyLaunchOperator creates a cloud cluster and executes a Sky task, as shown below:

sky_launch_task = SkyLaunchOperator(
    task_id="sky_launch_task",
    sky_task_yaml="~/sky_workdir/my_task.yaml",
    cloud="cheapest", # aws|azure|gcp|scp|ibm ...
    gpus="A100:1",
    minimum_cpus=16,
    minimum_memory=32,
    auto_down=False,
    sky_home_dir='/opt/airflow/sky_home_dir',  # set by default
    dag=dag
)

Once SkyLaunchOperator creates a Sky cluster with auto_down=False, the created cluster can be reused by the other Sky operators. Please refer to the example DAG for multiple Sky operators running on a single Sky cluster.
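
A sketch of such a DAG is shown below. The import path and the parameters of SkyExecOperator and SkyDownOperator are assumptions based on the operator names and the SkyLaunchOperator example above, not a verified API; check the provider package for the actual module layout and signatures.

```python
import pendulum
from airflow import DAG

# Hypothetical import path; adjust to the provider's actual module layout.
from airflow_provider_skypilot.operators import (
    SkyLaunchOperator, SkyExecOperator, SkyDownOperator,
)

with DAG(
    dag_id="sky_cluster_reuse",
    start_date=pendulum.datetime(2024, 1, 1, tz="UTC"),
    schedule=None,
) as dag:
    # Launch a cluster and keep it alive for the downstream tasks
    launch = SkyLaunchOperator(
        task_id="sky_launch",
        sky_task_yaml="~/sky_workdir/my_task.yaml",
        auto_down=False,
    )
    # Run a further task on the same cluster (parameter names are assumptions)
    exec_task = SkyExecOperator(
        task_id="sky_exec",
        sky_task_yaml="~/sky_workdir/my_task.yaml",
    )
    # Tear the cluster down once all Sky tasks have finished
    down = SkyDownOperator(task_id="sky_down")

    launch >> exec_task >> down
```

Keeping auto_down=False in the launch task is what makes this pattern work: the cluster outlives the first operator, so subsequent operators can target it until SkyDownOperator removes it.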
