Versatile Data Kit SDK plugin that provides support for the Trino database and Trino SQL transformation templates.


Project description

This plugin allows vdk-core to interface with and execute queries against a Trino database. Additionally, it can collect lineage data, assuming a lineage logger has been provided through the vdk-core configuration.

Usage

To install the plugin, run:

    pip install vdk-trino

After installation, data jobs have access to a Trino database connection managed by the Versatile Data Kit SDK.

If vdk-trino is the only database plugin installed, vdk uses it automatically. Otherwise, set the environment variable VDK_DB_DEFAULT_TYPE=TRINO, or set the db_default_type option in the data job's config file (config.ini).
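
The pairing of the db_default_type option with the VDK_DB_DEFAULT_TYPE variable (and of VDK_INGEST_METHOD_DEFAULT further down) suggests that a config.ini option and its environment variable differ only by a VDK_ prefix and upper case. The sketch below assumes that convention; config_key_to_env_var is a hypothetical helper, not part of vdk, so confirm exact names with vdk config-help. It also shows a minimal config.ini that selects Trino, parsed with the standard library.

```python
import configparser


def config_key_to_env_var(key: str) -> str:
    # Hypothetical helper: assumes the "VDK_" + upper-case naming seen in this README
    # (db_default_type <-> VDK_DB_DEFAULT_TYPE).
    return "VDK_" + key.upper()


# A minimal config.ini that selects Trino as the default database type.
CONFIG_INI = """
[vdk]
db_default_type = TRINO
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_INI)
db_type = parser["vdk"]["db_default_type"]

print(config_key_to_env_var("db_default_type"), "=", db_type)
```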

For example:

    from vdk.api.job_input import IJobInput

    def run(job_input: IJobInput):
        job_input.execute_query("select 'Hi Trino!'")  # runs against the default Trino connection

Templates

vdk-trino comes with predefined templates for SQL transformations:

SCD1 - Slowly Changing Dimension Type 1:
- See usage documentation here
SCD2 - Slowly Changing Dimension Type 2:
- See usage documentation here
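
To make the difference between the two template types concrete, here is a conceptual sketch of the slowly-changing-dimension technique in plain Python. It is not the SQL the vdk-trino templates generate; the row layout (id, attrs, valid_from, valid_to) and the 9999-12-31 "still current" end date are assumptions chosen for illustration.

```python
from datetime import date
from typing import Any, Dict, List

Attrs = Dict[str, Any]
OPEN_END = date(9999, 12, 31)  # assumed marker for "current version"


def scd1_merge(target: Dict[int, Attrs], source: Dict[int, Attrs]) -> Dict[int, Attrs]:
    """Type 1: overwrite attributes by key; no history is kept."""
    merged = dict(target)
    merged.update(source)
    return merged


def scd2_merge(history: List[Dict[str, Any]], source: Dict[int, Attrs], as_of: date) -> List[Dict[str, Any]]:
    """Type 2: expire the current version of a changed key and append a new version."""
    result = [dict(row) for row in history]
    current = {row["id"]: row for row in result if row["valid_to"] == OPEN_END}
    for key, attrs in source.items():
        active = current.get(key)
        if active is not None and active["attrs"] == attrs:
            continue  # unchanged: keep the current version as-is
        if active is not None:
            active["valid_to"] = as_of  # close the previous version
        result.append({"id": key, "attrs": attrs, "valid_from": as_of, "valid_to": OPEN_END})
    return result


print(scd1_merge({1: {"city": "Sofia"}}, {1: {"city": "Plovdiv"}}))
history = [{"id": 1, "attrs": {"city": "Sofia"}, "valid_from": date(2024, 1, 1), "valid_to": OPEN_END}]
for row in scd2_merge(history, {1: {"city": "Plovdiv"}}, as_of=date(2024, 5, 1)):
    print(row)
```

Type 1 keeps one row per key and loses the old value; Type 2 keeps every version and records when each one was valid.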

Lineage

The package gathers lineage data for all Trino SQL queries executed in a data job.

Other plugins can read that lineage data and log it. To do so, they need to provide an ILineageLogger implementation and register it through a hook like this:

    @hookimpl
    def vdk_initialize(context: CoreContext) -> None:
        # MyLogger is your ILineageLogger implementation, stored under the "trino-lineage-logger" key
        context.state.set(StoreKey[ILineageLogger]("trino-lineage-logger"), MyLogger())
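
The hook above relies on a shared, typed state store: your plugin puts a logger under a well-known key, and the Trino plugin looks it up when it has lineage to report. The sketch below models that handshake with self-contained stand-ins. StateKey, State, LineageLogger, InMemoryLineageLogger, the send method and the {"query": ...} payload are all hypothetical and do not mirror vdk-core's StoreKey or ILineageLogger signatures; only the key name comes from this README.

```python
from typing import Any, Dict, Generic, Optional, Protocol, TypeVar

T = TypeVar("T")


class StateKey(Generic[T]):
    """Hypothetical typed key, standing in for vdk-core's StoreKey."""

    def __init__(self, name: str) -> None:
        self.name = name


class State:
    """Hypothetical shared state store, standing in for context.state."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def set(self, key: StateKey[T], value: T) -> None:
        self._data[key.name] = value

    def get(self, key: StateKey[T]) -> Optional[T]:
        return self._data.get(key.name)


class LineageLogger(Protocol):
    """Hypothetical logger interface; the real ILineageLogger methods may differ."""

    def send(self, lineage: Dict[str, Any]) -> None: ...


class InMemoryLineageLogger:
    """Example logger that collects lineage records in a list."""

    def __init__(self) -> None:
        self.records: list = []

    def send(self, lineage: Dict[str, Any]) -> None:
        self.records.append(lineage)


LINEAGE_KEY: StateKey[LineageLogger] = StateKey("trino-lineage-logger")


def my_plugin_initialize(state: State) -> None:
    # Role of your plugin's vdk_initialize hook: register the logger once.
    state.set(LINEAGE_KEY, InMemoryLineageLogger())


def on_trino_query(state: State, query: str) -> None:
    # Role of the Trino plugin after a query: forward lineage only if a logger is registered.
    logger = state.get(LINEAGE_KEY)
    if logger is not None:
        logger.send({"query": query})


state = State()
my_plugin_initialize(state)
on_trino_query(state, "select 'Hi Trino!'")
print(state.get(LINEAGE_KEY).records)
```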

Ingestion

This plugin allows users to ingest data into a Trino database. This can be preferable to inserting data manually, because ingestion automatically handles serializing, packaging and sending the data asynchronously, with configurable batching and throughput. To use it, configure the Trino connection settings as usual and also set the following environment variable:

    export VDK_INGEST_METHOD_DEFAULT=TRINO

Then, from inside the run function in a Python step, you can use the send_object_for_ingestion or send_tabular_data_for_ingestion methods to ingest your data.
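
A minimal sketch of such a step follows. The payload, destination_table and method parameter names appear later in this README; rows and column_names for send_tabular_data_for_ingestion are taken from vdk-core's ingestion interface as I understand it, so verify them against your vdk-core version. The "users" table and the RecordingJobInput class are hypothetical; the recorder only lets the sketch run without a VDK runtime.

```python
from typing import Any


def run(job_input) -> None:
    # One dict becomes one row in the destination table.
    # No method= is passed, so the default (VDK_INGEST_METHOD_DEFAULT=TRINO) applies.
    job_input.send_object_for_ingestion(
        payload={"id": 1, "name": "alice"},
        destination_table="users",
    )

    # Tabular data: an iterable of rows plus the column names, in matching order.
    job_input.send_tabular_data_for_ingestion(
        rows=[(2, "bob"), (3, "carol")],
        column_names=["id", "name"],
        destination_table="users",
    )


class RecordingJobInput:
    """Hypothetical stand-in for IJobInput that records calls instead of ingesting."""

    def __init__(self) -> None:
        self.calls: list = []

    def send_object_for_ingestion(self, **kwargs: Any) -> None:
        self.calls.append(("object", kwargs))

    def send_tabular_data_for_ingestion(self, **kwargs: Any) -> None:
        self.calls.append(("tabular", kwargs))


# Exercise run() locally without a VDK runtime.
fake = RecordingJobInput()
run(fake)
for kind, kwargs in fake.calls:
    print(kind, "->", kwargs["destination_table"])
```

send_object_for_ingestion suits one record at a time; send_tabular_data_for_ingestion avoids building a dict per row when the data is already row-shaped.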

Multiple Trino Database Connections

Configuring Multiple Trino Databases

To effectively manage multiple Trino database connections within a data job, configure the default database in the [vdk] section of the config.ini file. This section should contain the primary connection details that the application will use by default.

For each additional Trino database, add a new section following the pattern vdk_<name>, where <name> is a unique identifier for each database connection. These additional sections must also include all necessary Trino connection details.

Example `config.ini` with Multiple Trino Databases

    [vdk]
    trino_user=user
    trino_password=password
    trino_host=localhost
    trino_port=28080
    trino_schema=default
    trino_catalog=memory
    trino_use_ssl=True

    [vdk_trino_reports]
    trino_user=reports_user
    trino_password=reports_password
    trino_host=reports_host
    trino_port=28081
    trino_schema=reports
    trino_catalog=memory
    trino_use_ssl=False

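To choose a database in your data job, pass its name as the database argument: "trino" for the default [vdk] section, or the section name without the vdk_ prefix for an additional section (for example, "trino_reports" for [vdk_trino_reports]).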

    def run(job_input):
        # Query the default Trino database
        default_query = "SELECT * FROM default_table"
        job_input.execute_query(sql=default_query, database="trino")  # database can be omitted for the default

        # Query the reports Trino database
        reports_query = "SELECT * FROM reports_table"
        job_input.execute_query(sql=reports_query, database="trino_reports")  # required; if omitted, the query runs against the default database
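
The naming rule implied by the example above can be sketched with the standard library's configparser. trino_connections is a hypothetical helper that only illustrates how [vdk] maps to "trino" and [vdk_<name>] maps to "<name>"; it is not how vdk itself reads config.ini.

```python
import configparser
from typing import Dict

CONFIG_INI = """
[vdk]
trino_host=localhost
trino_port=28080

[vdk_trino_reports]
trino_host=reports_host
trino_port=28081
"""


def trino_connections(config_text: str) -> Dict[str, Dict[str, str]]:
    """Map each database name to its Trino settings (hypothetical helper)."""
    # interpolation=None keeps values such as passwords containing "%" literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    connections = {}
    for section in parser.sections():
        if section == "vdk":
            name = "trino"  # the default connection
        elif section.startswith("vdk_"):
            name = section[len("vdk_"):]
        else:
            continue  # other sections (e.g. [owner]) are not database connections
        connections[name] = dict(parser[section])
    return connections


for name, settings in trino_connections(CONFIG_INI).items():
    print(name, settings["trino_host"], settings["trino_port"])
```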

Ingestion into Multiple Trino Databases

For data ingestion, you can also specify the target database, by passing its name as the method argument, to ensure the data is sent to the correct Trino instance.

    def run(job_input):
        # Ingest data into the default database
        payload_default = {"col1": "value1", "col2": "value2"}
        job_input.send_object_for_ingestion(
            payload=payload_default,
            destination_table="default_table",
            method="trino"
        )

        # Ingest data into the reports database
        payload_reports = {"col1": "value3", "col2": "value4"}
        job_input.send_object_for_ingestion(
            payload=payload_reports,
            destination_table="reports_table",
            method="trino_reports"
        )
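
When one batch of records is split across databases, it can be cheaper to group them and make one tabular call per database. The sketch below assumes each record carries its method name in a hypothetical "target_db" field and that records for the same database share the same keys; group_by_database and ingest_routed are illustrative helpers, and the rows/column_names parameters of send_tabular_data_for_ingestion should be checked against your vdk-core version.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def group_by_database(
    records: Iterable[Dict[str, Any]], route_field: str = "target_db"
) -> Dict[str, Tuple[List[str], List[List[Any]]]]:
    """Split records into (column_names, rows) per ingestion method name."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        payload = dict(record)  # copy so the caller's dict is not mutated
        method = payload.pop(route_field)
        grouped[method].append(payload)

    result = {}
    for method, payloads in grouped.items():
        column_names = list(payloads[0])  # column order taken from the first record
        rows = [[payload[name] for name in column_names] for payload in payloads]
        result[method] = (column_names, rows)
    return result


def ingest_routed(job_input, records: Iterable[Dict[str, Any]], destination_tables: Dict[str, str]) -> None:
    # One tabular ingestion call per database; method selects the Trino connection.
    for method, (column_names, rows) in group_by_database(records).items():
        job_input.send_tabular_data_for_ingestion(
            rows=rows,
            column_names=column_names,
            destination_table=destination_tables[method],
            method=method,
        )
```

For example, ingest_routed(job_input, records, {"trino": "default_table", "trino_reports": "reports_table"}) would mirror the two tables used above.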

Configuration

Run vdk config-help and look for options prefixed with "TRINO_" to see the available configuration options.

Testing

Testing this plugin locally requires installing the dependencies listed in vdk-plugins/vdk-trino/requirements.txt. From that directory, run:

    pip install -r requirements.txt


Release history

- 0.4.1273335489 (Apr 30, 2024, this version)
- 0.4.1271786026 (Apr 29, 2024)
- 0.4.1245476944 (Apr 9, 2024)
- 0.4.1190994517 (Feb 26, 2024)
- 0.4.1184833162 (Feb 21, 2024)
- 0.4.1156222304 (Jan 29, 2024)
- 0.4.1073094274 (Nov 15, 2023)
- 0.4.948436673 (Jul 28, 2023)
- 0.4.944393829 (Jul 25, 2023)
- 0.4.824443273 (Mar 31, 2023)
- 0.4.802490643 (Mar 10, 2023)
- 0.4.792866594 (Mar 1, 2023)
- 0.4.703555598 (Nov 23, 2022)
- 0.4.664990419 (Oct 12, 2022)
- 0.4.605101952 (Aug 4, 2022)
- 0.4.604201902 (Aug 4, 2022)
- 0.4.582131318 (Jul 7, 2022)
- 0.3.582107133 (Jul 7, 2022)
- 0.3.520417292 (Apr 20, 2022)
- 0.2.510593172 (Apr 6, 2022)
- 0.2.492790625 (Mar 15, 2022)
- 0.2.487135623 (Mar 8, 2022)
- 0.2.477708478 (Feb 23, 2022)
- 0.2.476585195 (Feb 22, 2022)
- 0.1.461192871 (Feb 1, 2022)
- 0.1.460149153 (Jan 31, 2022)
- 0.1.433653387 (Dec 20, 2021)
- 0.1.415648530 (Nov 24, 2021)
- 0.1.415625538 (Nov 24, 2021)
- 0.1.414800992 (Nov 23, 2021)
- 0.1.414725588 (Nov 23, 2021)
- 0.1.385075289 (Oct 8, 2021)
- 0.1.384822581 (Oct 8, 2021)
- 0.1.379170167 (Sep 29, 2021)
- 0.1.377908503 (Sep 27, 2021)
- 0.1.369062590 (Sep 11, 2021)
- 0.1.367818405 (Sep 9, 2021)
- 0.1.364174863 (Sep 2, 2021)
- 0.1.363986988 (Sep 2, 2021)
- 0.1.359047592 (Aug 25, 2021)
- 0.1.358698086 (Aug 24, 2021)
- 0.1.357627944 (Aug 23, 2021)
- 0.1.355305639 (Aug 18, 2021)
- 0.1.354434383 (Aug 17, 2021)
- 0.1.353684692 (Aug 16, 2021)
- 0.1.352665786 (Aug 13, 2021)
- 0.1.352155979 (Aug 12, 2021)
- 0.1.351079614 (Aug 10, 2021)

Download files

Source distribution: vdk_trino-0.4.1273335489.tar.gz (27.9 kB), uploaded Apr 30, 2024.

Hashes for vdk_trino-0.4.1273335489.tar.gz:

    Algorithm     Hash digest
    SHA256        651cc870f14c831f491f2891b3c67a504b1020b9859a1702fba7c534aa48499b
    MD5           b646e8440499ebcd61f5b42b4e3e4ad3
    BLAKE2b-256   28222847a9e82888d4d796899a8c3364891d053527e61fe6a136712866bc5e85
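
To check a downloaded archive against the published SHA256 digest, a small standard-library sketch is enough. The helper names are illustrative; the expected digest is the one listed for vdk_trino-0.4.1273335489.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for vdk_trino-0.4.1273335489.tar.gz
EXPECTED_SHA256 = "651cc870f14c831f491f2891b3c67a504b1020b9859a1702fba7c534aa48499b"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lower-case, so normalise the expected value before comparing.
    return sha256_of(path) == expected.lower()
```

Usage: matches_published_hash(Path("vdk_trino-0.4.1273335489.tar.gz")). pip can also enforce this itself in hash-checking mode, with a requirements line such as vdk-trino==0.4.1273335489 --hash=sha256:<digest> and pip install --require-hashes; in that mode every dependency needs a pinned hash as well.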
