Skip to main content

A scalable, fast, ACID-compliant Data Catalog powered by Ray.

Project description

DeltaCAT

DeltaCAT is a Pythonic Data Catalog powered by Ray.

Its data storage model allows you to define and manage fast, scalable, ACID-compliant data catalogs through git-like stage/commit APIs, and has been used to successfully host exabyte-scale enterprise data lakes.

DeltaCAT uses the Ray distributed compute framework together with Apache Arrow for common table management tasks, including petabyte-scale change-data-capture, data consistency checks, and table repair.

Getting Started

Install

pip install deltacat

Running Tests

pip3 install virtualenv
virtualenv test_env
source test_env/bin/activate
pip3 install -r requirements.txt

pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deltacat-1.1.4.tar.gz (189.0 kB view hashes)

Uploaded Source

Built Distribution

deltacat-1.1.4-py3-none-any.whl (279.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page