
Create sequential synthetic data of mixed types using a GAN.



This repository is part of The Synthetic Data Vault Project, a project from DataCebo.


Overview

DeepEcho is a Synthetic Data Generation Python library for mixed-type, multivariate time series. It provides:

  1. Multiple models based both on classical statistical modeling of time series and the latest in Deep Learning techniques.
  2. A robust benchmarking framework for evaluating these methods on multiple datasets and with multiple metrics.
  3. The ability for Machine Learning researchers to submit new methods that follow our model and sample API and have them evaluated.

Important Links

  • Website: Check out the SDV Website for more information about the project.
  • SDV Blog: Regular publishing of useful content about Synthetic Data Generation.
  • Documentation: Quickstarts, User and Development Guides, and API Reference.
  • Repository: The link to the GitHub Repository of this library.
  • Development Status: This software is in its Pre-Alpha stage.
  • Community: Join our Slack Workspace for announcements and discussions.
  • Tutorials: Run the SDV Tutorials in a Binder environment.

Install

DeepEcho is part of the SDV project and is automatically installed alongside it. For details about this process, please visit the SDV Installation Guide.

Optionally, DeepEcho can also be installed as a standalone library using the following commands:

Using pip:

pip install deepecho

Using conda:

conda install -c pytorch -c conda-forge deepecho

For more installation options, please visit the DeepEcho Installation Guide.
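
After running either install command, a quick way to confirm that the package is visible to the current Python environment is to query its installed version. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is our own and is not part of DeepEcho.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("deepecho")
    if version is None:
        print("deepecho is not installed in this environment")
    else:
        print(f"deepecho {version} is installed")
```

If the helper returns None, the install most likely went into a different environment than the one running the script.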

Quickstart

DeepEcho is included as part of SDV to model and sample synthetic time series. In most cases, usage through SDV is recommended, since it provides additional functionality that is not available here. For more details about how to use DeepEcho within SDV, please visit the corresponding User Guide.

Standalone usage

DeepEcho can also be used as a standalone library.

In this short quickstart, we show how to learn a mixed-type multivariate time series dataset and then generate synthetic data that resembles it.

We will start by loading the data and preparing the instance of our model.

from deepecho import PARModel
from deepecho.demo import load_demo

# Load demo data
data = load_demo()

# Define data types for all the columns
data_types = {
    'region': 'categorical',
    'day_of_week': 'categorical',
    'total_sales': 'continuous',
    'nb_customers': 'count',
}

model = PARModel(cuda=False)
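
A typo in the data_types dictionary is easy to make and may only surface later, during fitting. The sketch below checks the mapping against the column names before any model is involved. It is plain Python and does not call DeepEcho. The helper `check_data_types` is hypothetical, the allowed type names are only the three that appear in this quickstart (DeepEcho may accept others), and the column list is assumed from the names used in this quickstart rather than read from the demo data.

```python
# Type names taken from this quickstart only; DeepEcho may accept others.
KNOWN_TYPES = {"categorical", "continuous", "count"}


def check_data_types(columns, data_types, entity_columns=(), sequence_index=None):
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    # Assumption: entity and sequence-index columns need no type entry,
    # mirroring the quickstart, where 'store_id' and 'date' are not listed.
    skipped = set(entity_columns)
    if sequence_index is not None:
        skipped.add(sequence_index)

    for column in columns:
        if column in skipped:
            continue
        if column not in data_types:
            problems.append(f"no data type declared for column '{column}'")
        elif data_types[column] not in KNOWN_TYPES:
            problems.append(
                f"unknown data type '{data_types[column]}' for column '{column}'"
            )

    for column in data_types:
        if column not in columns:
            problems.append(f"data type declared for missing column '{column}'")
    return problems


# Column names assumed from the quickstart, not loaded from the demo data.
columns = ["store_id", "date", "region", "day_of_week", "total_sales", "nb_customers"]
data_types = {
    "region": "categorical",
    "day_of_week": "categorical",
    "total_sales": "continuous",
    "nb_customers": "count",
}
print(check_data_types(columns, data_types, ["store_id"], "date"))
```

With real data, the column list could be taken from the loaded table instead of being written by hand.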

If we want to use different settings for our model, like increasing the number of epochs or enabling CUDA, we can pass the arguments when creating the model:

model = PARModel(epochs=1024, cuda=True)

Note that for small datasets like the one used in this demo, the overhead that CUDA introduces outweighs the gains from parallelization, so training is faster without CUDA, even if it is available.
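
The trade-off described here can be captured in a small decision helper whose result is then passed as the cuda argument. This is only a sketch: the function `should_use_cuda` and its `min_rows` threshold are hypothetical, and the default value is a placeholder rather than a measured break-even point, so it is worth timing both settings on your own data.

```python
def should_use_cuda(num_rows, cuda_available, min_rows=100_000):
    """Decide whether enabling CUDA is likely to pay off.

    `min_rows` is a placeholder threshold, not a benchmarked value:
    below it we assume the GPU overhead outweighs the parallel speed-up.
    """
    return bool(cuda_available and num_rows >= min_rows)


# A small dataset stays on the CPU even when a GPU is present.
print(should_use_cuda(num_rows=1_000, cuda_available=True))
# A large dataset uses the GPU only if one is available.
print(should_use_cuda(num_rows=1_000_000, cuda_available=True))
```

The `cuda_available` flag is left to the caller so that the helper has no dependency on any particular deep learning library.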

Once we have created our instance, we are ready to learn the data and generate new synthetic data that resembles it:

# Learn a model from the data
model.fit(
    data=data,
    entity_columns=['store_id'],
    context_columns=['region'],
    data_types=data_types,
    sequence_index='date'
)

# Sample new data
model.sample(num_entities=5)

The output will be a table of synthetic time series data with properties similar to those of the demo data that we used as input.
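
To make the roles of the fit arguments more concrete, the sketch below shows one way to read them, using plain Python and no DeepEcho calls. It assumes that entity_columns identify which sequence a row belongs to, that context_columns hold values that stay constant within a sequence, and that sequence_index orders the rows of each sequence. The helper `split_into_sequences` and the toy rows are illustrative only and are not taken from the library or from the demo data.

```python
from collections import defaultdict


def split_into_sequences(rows, entity_columns, context_columns, sequence_index):
    """Group rows by entity, order each group, and extract its context."""
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in entity_columns)
        grouped[key].append(row)

    sequences = {}
    for key, entity_rows in grouped.items():
        # Order the rows of one entity by the sequence index.
        ordered = sorted(entity_rows, key=lambda r: r[sequence_index])
        context = {column: ordered[0][column] for column in context_columns}
        # Context values are expected to be constant within one entity.
        for row in ordered:
            for column in context_columns:
                if row[column] != context[column]:
                    raise ValueError(
                        f"context column '{column}' varies within entity {key}"
                    )
        sequences[key] = {"context": context, "rows": ordered}
    return sequences


# Hand-written toy rows using the column names from the quickstart.
toy_rows = [
    {"store_id": 1, "region": "north", "date": "2020-01-02", "total_sales": 120.5},
    {"store_id": 1, "region": "north", "date": "2020-01-01", "total_sales": 98.0},
    {"store_id": 2, "region": "south", "date": "2020-01-01", "total_sales": 45.2},
]

sequences = split_into_sequences(toy_rows, ["store_id"], ["region"], "date")
for entity, sequence in sequences.items():
    print(entity, sequence["context"], [r["date"] for r in sequence["rows"]])
```

Under this reading, sampling with num_entities=5 would correspond to generating five such sequences, each with its own context values.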

What's next?

For more details about DeepEcho and all its possibilities and features, please check and run the tutorials.

If you want to see how we evaluate the performance and quality of our models, please have a look at the SDGym Benchmarking framework.

Also, please feel welcome to visit our contributing guide to help us develop new features or cool ideas!




The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project. Today, DataCebo is the proud developer of SDV, the largest ecosystem for synthetic data generation & evaluation. It is home to multiple libraries that support synthetic data, including:

  • 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
  • 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular, multi-table and time series data.
  • 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data generation models.

Get started using the SDV package -- a fully integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries for specific needs.

