Airless

Airless is a package for building a lightweight, serverless orchestration platform. It creates workflows of multiple tasks that are executed on Google Cloud Functions.

Why not just use Apache Airflow?

Airflow is the industry standard for job orchestration and workflow management. However, in some cases we believe it may not be the best solution. We would like to highlight three main cases we face that Airflow struggles to handle.

  • Serverless

At the beginning of a project we want to avoid dealing with infrastructure, since it demands time and reserving an instance to run Airflow has a fixed cost. Since we didn't have that many jobs, it didn't make sense to keep an Airflow instance up 24/7.

As the project grows, if we use the Airflow instance itself to run the tasks, we start facing performance issues in the workflow.

To avoid these problems, we decided to build a 100% serverless platform.

  • Parallel processing

The main use case we designed Airless for is data scrapers. Scrapers normally need to process a lot of tasks in parallel. For instance, a first task fetches a website, collects all the links on that page and sends them forward to another task; that task does the same, and so on.

A workflow that does not know beforehand how many tasks are going to be executed is hard to build on Airflow.

  • Data sharing between tasks

To build the massively parallel workflow described in the previous topic, we need to be able to dynamically create data and send it to the next task. The data from the first task serves as both the trigger and the input for the next tasks.
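As an illustration of data acting as both trigger and input, here is a minimal sketch that assumes the payload is JSON. The message format Airless actually uses is not described here, and the helper names `encode_message`, `make_event` and `decode_event` are hypothetical. The sketch relies on one known detail: Pub/Sub message bodies are bytes, and a Pub/Sub-triggered Cloud Function written in the background-function style receives them base64-encoded under the event's `data` key.

```python
import base64
import json


def encode_message(payload: dict) -> bytes:
    """Serialize a task's output so it can be published as a message body."""
    return json.dumps(payload).encode("utf-8")


def make_event(data: bytes) -> dict:
    """Mimic the shape of the event a triggered function receives (assumption:
    background-function style, where the body is base64-encoded under 'data')."""
    return {"data": base64.b64encode(data).decode("ascii")}


def decode_event(event: dict) -> dict:
    """Recover the previous task's output inside the next task."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


# The first task's output becomes the next task's input.
outgoing = encode_message({"url": "https://example.com/page-1"})
incoming = decode_event(make_event(outgoing))
```

Because the payload travels inside the message that triggers the function, the next task needs no shared storage to find out what to work on.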

How it works

Airless builds its workflows on Google Cloud Functions, Google Pub/Sub and Google Cloud Scheduler.

  1. Everything starts with Cloud Scheduler, a serverless Google Cloud product that can publish a message to a Pub/Sub topic on a cron schedule.
  2. When a message is published to a Pub/Sub topic, it can trigger a Cloud Function, which is executed with that message as its input.
  3. This Cloud Function can publish as many messages as it wants to as many Pub/Sub topics as it wants.
  4. Repeat from step 2.
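The loop above can be pictured with a small in-memory simulation. This is only a sketch: `FakePubSub`, the topic names and the link data are all made up for illustration, and nothing here is the Airless API or a real Pub/Sub client. It runs sequentially, whereas on Google Cloud each message would trigger its own function invocation.

```python
from collections import deque


class FakePubSub:
    """Hypothetical in-memory stand-in for Pub/Sub (not a real client)."""

    def __init__(self):
        self.handlers = {}    # topic name -> function triggered by that topic
        self.queue = deque()  # pending (topic, message) pairs

    def subscribe(self, topic, handler):
        self.handlers[topic] = handler

    def publish(self, topic, message):
        self.queue.append((topic, message))

    def run(self):
        """Deliver messages until none are left; return what was delivered."""
        delivered = []
        while self.queue:
            topic, message = self.queue.popleft()
            delivered.append((topic, message))
            handler = self.handlers.get(topic)
            if handler is not None:
                handler(self, message)  # step 2: the message triggers a function
        return delivered


# Made-up link data standing in for real pages.
LINKS = {"https://example.com": ["https://example.com/a", "https://example.com/b"]}
fetched = []


def collect_links(bus, message):
    # Step 3: publish one message per discovered link.
    for link in LINKS.get(message["url"], []):
        bus.publish("fetch-page", {"url": link})


def fetch_page(bus, message):
    fetched.append(message["url"])


bus = FakePubSub()
bus.subscribe("collect-links", collect_links)
bus.subscribe("fetch-page", fetch_page)
# Step 1: this initial publish plays the role of the scheduler.
bus.publish("collect-links", {"url": "https://example.com"})
delivered = bus.run()
```

Note that `collect_links` does not need to know in advance how many links it will find; the number of downstream tasks is decided by the data at run time.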

Preparation

Environment variables

  • ENV
  • GCP_PROJECT
  • PUBSUB_TOPIC_ERROR
  • LOG_LEVEL
  • PUBSUB_TOPIC_EMAIL_SEND
  • PUBSUB_TOPIC_SLACK_SEND
  • BIGQUERY_DATASET_ERROR
  • BIGQUERY_TABLE_ERROR
  • EMAIL_SENDER_ERROR
  • EMAIL_RECIPIENTS_ERROR
  • SLACK_CHANNELS_ERROR
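Before deploying, it can help to check that these variables are set. The sketch below assumes every variable in the list is required, which the list itself does not state, and it makes no claim about what values each one expects. `missing_variables` is a hypothetical helper, not part of Airless.

```python
import os

# Names copied from the list above.
VARIABLE_NAMES = [
    "ENV",
    "GCP_PROJECT",
    "PUBSUB_TOPIC_ERROR",
    "LOG_LEVEL",
    "PUBSUB_TOPIC_EMAIL_SEND",
    "PUBSUB_TOPIC_SLACK_SEND",
    "BIGQUERY_DATASET_ERROR",
    "BIGQUERY_TABLE_ERROR",
    "EMAIL_SENDER_ERROR",
    "EMAIL_RECIPIENTS_ERROR",
    "SLACK_CHANNELS_ERROR",
]


def missing_variables(environ=os.environ):
    """Return the listed variable names that are unset or empty.

    Assumption: all listed variables are required.
    """
    return [name for name in VARIABLE_NAMES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```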
