
Airbyte made easy (no UI, no database, no cluster)




Why airbyte_serverless?

At Unytics, we ❤️ Airbyte, which provides a catalog of open-source connectors to move your data from any source to your data warehouse.

Airbyte Open-Source Platform is "batteries included" 🔋.
You get a server, workers, a database, a UI, an orchestrator, connectors, a secret manager, a logs manager, etc. All of this is very well packaged and deployable on Kubernetes. While we believe this is great for most people, we strive for lightweight and simple assets that are easy to deploy and maintain. What's more, we ❤️ serverless.

👉 We wanted a simple tool to manage Airbyte connectors, run them locally or deploy them in serverless mode.


Airbyte Open-Source Platform vs airbyte_serverless

💡 Airbyte Serverless deliberately does less than the Airbyte Open-Source Platform.

Deployment
  • Airbyte Open-Source Platform: deployed on a VM or Kubernetes cluster.
  • Airbyte Serverless: deployed with serverless compute.
    - Each Airbyte source docker image is upgraded with a destination connector from airbyte_serverless.
    - Each upgraded docker image can then be deployed as an isolated Cloud Run Job (or Cloud Run Service).
    - Cloud Run is natively monitored with metrics, dashboards, logs, error reporting, alerting, etc.
    - Jobs can be scheduled or triggered by cloud events.

Database
  • Airbyte Open-Source Platform: has a database.
  • Airbyte Serverless: has NO database.
    - The destination stores the state (the record of where the sync stopped).
    - The destination stores the logs, which can then be visualized with your preferred BI tool.
    - Connector configurations can be stored in config files and versioned in git.

UI
  • Airbyte Open-Source Platform: has a UI to edit configuration.
  • Airbyte Serverless: has NO UI. Configurations are generated as documented YAML files that you can edit and version.

Scalability
  • Airbyte Open-Source Platform: is scalable if deployed on an autoscaled Kubernetes cluster.
  • Airbyte Serverless: is scalable. Each connector is deployed independently of the others; you can have as many as you want.

Transform layer
  • Airbyte Open-Source Platform: has a transform layer. Airbyte loads your data in raw format but then lets you perform basic transformations such as replace, upsert, and schema normalization.
  • Airbyte Serverless: has NO transform layer. Data is appended to your destination in raw format. We believe less is more: airbyte_serverless is dedicated to doing one thing well, Extract-Load, which makes it easier to maintain and evolve.

Features

  1. ⚡ A lightweight Python wrapper around any Airbyte Source executable.
  2. ⚡ Destination connectors (only BigQuery for now - contributions are welcome 🤗) which store logs and states in addition to data. Thus, there is no need for a database anymore!
  3. ⚡ Examples to deploy to serverless compute (only Google Cloud Run for now - contributions are welcome 🤗).

Getting Started

0. Install

pip install airbyte-serverless

1. Create an Airbyte Source from an Airbyte Source Executable

If you have Docker installed on your laptop, the easiest way is to write the following code in a file getting_started.py (replace surveymonkey with the source you want). It should then work directly when you run python getting_started.py. If it does not, please raise an issue.

from airbyte_serverless.sources import AirbyteSource

airbyte_source_executable = 'docker run --rm -i airbyte/source-surveymonkey:latest'
source = AirbyteSource(airbyte_source_executable)
If you don't have Docker (or don't want to use it)

It is also possible to clone the Airbyte repo and install a Python source connector:

  1. Clone the repo.
  2. Go to the directory of the connector: cd airbyte-integrations/connectors/source-surveymonkey
  3. Install the Python connector: pip install -r requirements.txt
  4. Create the file getting_started.py there and set airbyte_source_executable = 'python main.py' (see the sketch after this list).
  5. You can now run python getting_started.py; it should also work. If it does not, please raise an issue.
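
For illustration, a minimal sketch of this non-Docker variant, assuming getting_started.py sits inside the connector directory:

from airbyte_serverless.sources import AirbyteSource

# Run the connector's entrypoint directly with Python instead of Docker
airbyte_source_executable = 'python main.py'
source = AirbyteSource(airbyte_source_executable)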

2. Update config for your Airbyte Source

Your Airbyte Source needs some config to be able to connect. Show a pre-filled config for your connector with:

print(source.config)

Copy the content, edit it and update the variable:

source.config = '''
YOUR UPDATED CONFIG
'''
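
For illustration only: the printed config is a documented YAML template whose fields depend entirely on the connector. The field names below are hypothetical placeholders, not the real surveymonkey schema:

source.config = '''
# REQUIRED | string | hypothetical credential field, for illustration only
access_token: YOUR_ACCESS_TOKEN
# OPTIONAL | string | hypothetical replication start date, for illustration only
start_date: "2021-01-01T00:00:00Z"
'''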

3. Check your config

print(source.connection_status)

4. Update configured_catalog for your Airbyte Source

The source catalog lists the available streams (think entities) that the source is able to retrieve. The configured_catalog specifies which streams to extract and how. Show the default configured_catalog with:

print(source.configured_catalog)

If needed, copy the content, edit it and update the variable:

source.configured_catalog = {
   ...YOUR UPDATED CONFIG
}
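
As an illustration, a configured catalog follows the Airbyte protocol shape: a list of streams, each with a sync_mode and a destination_sync_mode. The stream name surveys below is an assumption made for the example:

source.configured_catalog = {
    "streams": [
        {
            "stream": {
                "name": "surveys",  # hypothetical stream name, for illustration
                "json_schema": {},
                "supported_sync_modes": ["full_refresh", "incremental"],
            },
            "sync_mode": "incremental",
            "destination_sync_mode": "append",
        },
    ],
}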

5. Test the retrieval of one data record

print(source.first_record)

6. Create a destination and run Extract-Load

from airbyte_serverless.destinations import BigQueryDestination

destination = BigQueryDestination(dataset='YOUR-PROJECT.YOUR_DATASET')
data = source.extract()
destination.load(data)

7. Run Extract-Load from where you stopped

The state keeps track of where the latest extract-load stopped (for incremental extract-loads). To resume from this state, run:

state = destination.get_state()
data = source.extract(state=state)
destination.load(data)
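
Putting steps 6 and 7 together, a recurring job can always resume from the stored state. A minimal sketch, assuming get_state returns an empty state before the first run:

def run_extract_load(source, destination):
    # Resume from where the previous run stopped (assumption: get_state
    # returns an empty state before the first run, so this covers both cases)
    state = destination.get_state()
    data = source.extract(state=state)
    destination.load(data)

run_extract_load(source, destination)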

End to End Example

from airbyte_serverless.sources import AirbyteSource
from airbyte_serverless.destinations import BigQueryDestination

airbyte_source_executable = 'docker run --rm -i airbyte/source-surveymonkey:latest'
config = 'YOUR CONFIG'
configured_catalog = {YOUR CONFIGURED CATALOG}
source = AirbyteSource(airbyte_source_executable, config=config, configured_catalog=configured_catalog)

destination = BigQueryDestination(dataset='YOUR-PROJECT.YOUR_DATASET')

state = destination.get_state()
data = source.extract(state=state)
destination.load(data)

Deploy

To deploy as a Cloud Run job, edit the Dockerfile to pick the Airbyte source you want, then build the image and deploy it as a Cloud Run job.

Limitations

  • The BigQuery destination connector only works in append mode
  • Data at the destination is in raw format; no parsing is done

We believe, like Airbyte, that decoupling data movement from data transformation is a good thing. To shape your data, you may want to use a tool such as dbt; thus, we follow the EL-T philosophy.

Credits

The generation of the sample connector configuration in YAML is heavily inspired by the code of the octavia CLI developed by Airbyte.

Contribute

Any contribution is more than welcome 🤗!

  • Add a ⭐ on the repo to show your support
  • Open an issue to report a bug or suggest improvements
  • Open a PR! Below are some suggestions of work to be done:
    • improve secrets management
    • implement a CLI
    • manage configurations as yaml files
    • implement the get_logs method of BigQueryDestination
    • add a new destination connector (Cloud Storage?)
    • add more serverless deployment examples
    • implement optional post-processing (replace, upsert data at destination instead of append?)
