Skip to main content

A package to test our databases

Project description

database-testing-tools

Database Testing Tools comprises a suite of generalised tests to run on your database, and produce reports for a notebook. It is designed for pre-production checks for curated databases, but can also be useful for monitoring changes over time. It can be particularly useful when combined with papermill, which can be used to parameterize, execute and store notebooks as part of your pipeline.

Installation

pip install database_testing_tools

Usage

The package implements a Tester class, from which you can invoke a number of built-in tests and optional plotting features.

from database_testing_tools import Tester
test = Tester()
database = "my_database_name"
out = test.check_all_tables_return_data(db_name=database)

Tests return two outputs: the result of the test; and some nice graphical representation of the test - either a markdown table, or a plot using the Altair plotting library.

The class also collects the results of the tests for an overall report, which you can access with the Tester.get_results_table() function.

You can find a notebook which demonstrates the functions available in the package.

Curated database assumptions - SCD2 vs Snapshotting

The tests in this package assume the database being tested is versioned in one of two ways:

  1. Slowly Changing Dimensions type 2 (SCD2): where the database includes records of inserts and updates to records. In curated databases a start time and end time will be added to each record. The start time is when the record is either created or last updated; the end time is either when the record was next updated, or a date arbitrarily far in the future to signify that it is the current version of the record. By filtering on these dates, it is possible to view the state of the database at a given point in time.
  2. Snapshots: when SCD2 isn't possible or is impractical, versions of the database are recorded with an associated snapshot date based on date of extraction.

These methods require different methods to compare versions of the database over time, so there are some separate functions to handle them in their own way.

Query engines

By default, the package uses pydbtools to run the queries in the package on a given Amazon Athena database. However, it is possible to connect to your own database by making an extension of the BaseQueryEngine class defined in utils.py.

An engine for DuckDB is included, and is used for the demo and unit tests in this package. A basic use of the DuckDB engine looks like this:

import duckdb
from database_testing_tools.utils import DuckDbEngine
from database_testing_tools import Tester

conn = duckdb.connect()
query_engine = DuckDbEngine(conn)
tester = Tester(query_engine=query_engine)
database = "some_database"

tester.check_all_tables_return_data(db_name=database)

Alternative database engines should follow a similar pattern to the DuckDB engine - an initialiser that includes the database connection object, and an implementation of the run_query function that runs a given SQL query and returns the result as a pandas dataframe. As the built-in tests use the Trino SQL dialect, it is recommended to transpile the SQL with sqlglot to ensure the tests are executed correctly.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

database_testing_tools-1.0.5.tar.gz (14.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

database_testing_tools-1.0.5-py3-none-any.whl (13.6 kB view details)

Uploaded Python 3

File details

Details for the file database_testing_tools-1.0.5.tar.gz.

File metadata

  • Download URL: database_testing_tools-1.0.5.tar.gz
  • Upload date:
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for database_testing_tools-1.0.5.tar.gz
Algorithm Hash digest
SHA256 14abd039803757e4ea00e152681aeb6be0d0e9cabb612e0915536fc850c6dc76
MD5 58428738d7611768747f174caf8f42ed
BLAKE2b-256 e20a1af59eb7e15fe811d3842aae06b45714635c0a81da8b7a50ace28fa59331

See more details on using hashes here.

Provenance

The following attestation bundles were made for database_testing_tools-1.0.5.tar.gz:

Publisher: poetry-pypi-release.yml on moj-analytical-services/database-testing-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file database_testing_tools-1.0.5-py3-none-any.whl.

File metadata

File hashes

Hashes for database_testing_tools-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 0519d1f02ffa137ceae789d392fc072339ec5f114aaf15f929ffd6ae85f3941d
MD5 d6fbd9f34f66b45ff6c245a0b24fda35
BLAKE2b-256 b73f1435d145a7c8cb69edc56fc1244addb3b51b775fccb1a9a8cd780314f67d

See more details on using hashes here.

Provenance

The following attestation bundles were made for database_testing_tools-1.0.5-py3-none-any.whl:

Publisher: poetry-pypi-release.yml on moj-analytical-services/database-testing-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page