Data generation

superstore

High-performance synthetic data generation library for testing and development.

Overview

superstore is a Rust-powered Python library for generating realistic synthetic datasets. It provides:

Data Generators

Generator | Description | Use Cases
Retail | Sales transactions, employees | BI dashboards, forecasting
Time Series | Financial-style series with regimes, jumps | Quant research, backtesting
Weather | Sensor data with seasonal/diurnal patterns | IoT analytics, anomaly detection
Logs | Web server & application logs | Observability, alerting
Finance | Stock prices, OHLCV, options chains | Trading systems, risk analysis
Telemetry | Machine metrics, anomalies, failures | DevOps dashboards, ML training

Statistical Tools

Tool | Description | Use Cases
Distributions | Sample from statistical distributions | Simulation, Monte Carlo
Copulas | Correlated multivariate data | Risk modeling, portfolio analysis
Temporal Models | AR, Markov chains, random walks | Time series simulation
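
To make the "Temporal Models" row concrete, here is a minimal pure-Python sketch of an AR(1) process, x_t = phi * x_(t-1) + eps_t. It uses only the standard library and is not superstore's implementation; the function name `ar1_series` and its parameters are hypothetical.

```python
import random


def ar1_series(n, phi=0.8, sigma=1.0, seed=None):
    """Return n samples of x_t = phi * x_(t-1) + eps_t, with eps_t ~ N(0, sigma^2).

    Hypothetical helper for illustration; not part of superstore.
    """
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    values = []
    x = 0.0  # start the recursion at zero
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        values.append(x)
    return values


series = ar1_series(5, phi=0.8, seed=42)
print(series)
```

With |phi| < 1 the series is stationary and reverts toward zero; setting phi = 1 turns the same recursion into a random walk.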

Key Features

  • Rust-powered: High-performance generation, 10-100x faster than pure Python
  • Flexible output: pandas DataFrame, polars DataFrame, or Python dicts
  • Configurable: Pydantic config classes for validated, structured configuration
  • Reproducible: Seed support for deterministic generation
  • Scalable: Streaming and parallel generation for large datasets
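
As a rough illustration of what streaming generation means, the sketch below yields rows in bounded batches instead of building one large list. It is a standard-library toy, not superstore's streaming API: `stream_batches`, its parameters, and the row fields are all hypothetical.

```python
import random


def stream_batches(total, batch_size, seed=0):
    """Yield lists of synthetic rows, at most batch_size rows at a time.

    Hypothetical illustration of chunked generation; not superstore's API.
    """
    produced = 0
    batch_index = 0
    while produced < total:
        size = min(batch_size, total - produced)
        # Derive a per-batch RNG so each batch can be regenerated on its own.
        rng = random.Random(seed * 1_000_003 + batch_index)
        yield [{"row": produced + i, "value": rng.random()} for i in range(size)]
        produced += size
        batch_index += 1


row_count = sum(len(batch) for batch in stream_batches(total=2500, batch_size=1000))
print(row_count)  # 2500
```

Because each batch depends only on the seed and its index, batches could also be produced independently, which is the usual starting point for parallel generation.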

Installation

pip install superstore

For development with polars support:

pip install "superstore[develop]"

Quick Start

from superstore import superstore, employees, timeseries, weather

# Generate 1000 retail records as a pandas DataFrame
df = superstore(count=1000)

# Generate as polars DataFrame
df_polars = superstore(count=1000, output="polars")

# Generate as list of dicts
records = superstore(count=1000, output="dict")
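
When working with the list-of-dicts output, a quick way to see which columns came back is to scan the rows. The helper below is a sketch: `column_types` is a hypothetical name, and the fallback rows use made-up column names so the example still runs without superstore installed.

```python
def column_types(records):
    """Map each column name to the sorted type names seen in a list of dicts."""
    seen = {}
    for row in records:
        for name, value in row.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


try:
    from superstore import superstore

    records = superstore(count=5, output="dict")
except ImportError:
    # Stand-in rows with invented column names; not superstore's schema.
    records = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": None}]

print(column_types(records))
```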

Reproducibility with Seeds

All data generators support an optional seed parameter for reproducible random data generation:

from superstore import superstore, employees, timeseries, weather, machines

# Same seed produces identical data
df1 = superstore(count=100, seed=42)
df2 = superstore(count=100, seed=42)
assert df1.equals(df2)  # True

# Works with all generators
employees_df = employees(count=50, seed=123)
timeseries_df = timeseries(nper=30, seed=456)
weather_df = weather(count=100, seed=789)
machine_list = machines(count=10, seed=321)

# No seed means random data each time
df3 = superstore(count=100)  # Different each call
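
The seed check above can be wrapped in a small reusable helper, for example in a test suite. This is a sketch under assumptions: `same_output` and `toy_generator` are hypothetical names, and the toy generator only stands in for a seeded generator so the example runs without superstore.

```python
import random


def same_output(generate, **kwargs):
    """Call generate twice with identical kwargs and report whether results match."""
    first = generate(**kwargs)
    second = generate(**kwargs)
    # pandas (and recent polars) DataFrames expose .equals(); lists compare with ==.
    if hasattr(first, "equals"):
        return bool(first.equals(second))
    return first == second


def toy_generator(count, seed=None):
    """Stand-in for a seeded generator; hypothetical, not part of superstore."""
    rng = random.Random(seed)
    return [rng.randint(0, 10**9) for _ in range(count)]


print(same_output(toy_generator, count=100, seed=42))  # True
```

With superstore installed, `same_output(superstore, count=100, seed=42)` would mirror the `df1.equals(df2)` assertion shown earlier.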

Development

Setup

# Clone the repository
git clone https://github.com/1kbgz/superstore.git
cd superstore

# Install development dependencies
make develop

Building

# Build Python wheel
make build

Testing

# Run all tests
make test

Linting

# Run linters
make lint

# Fix formatting
make fix

Architecture

superstore uses a hybrid Rust/Python architecture:

  • rust/: Core Rust library with all data generation logic
  • src/: PyO3 bindings exposing Rust functions to Python
  • superstore/: Python package with native module

The core data generation is implemented in Rust for performance, with PyO3 providing seamless Python integration. Output format conversion (pandas/polars/dict) happens in the Rust bindings layer.
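
The conversion step can be pictured as a dispatch on the requested output format. The sketch below is a Python analogue written for illustration only: in superstore this logic lives in the Rust bindings, `to_output` is a hypothetical name, and the `"pandas"` string for the default format is an assumption (the examples above only show `"polars"` and `"dict"` explicitly).

```python
def to_output(records, output="pandas"):
    """Convert a list of row dicts to the requested container.

    Hypothetical Python analogue of the conversion step; not superstore's code.
    """
    if output == "dict":
        return records
    if output == "pandas":
        import pandas as pd  # imported lazily so the "dict" path needs no extras

        return pd.DataFrame(records)
    if output == "polars":
        import polars as pl

        return pl.DataFrame(records)
    raise ValueError(f"unsupported output format: {output!r}")


rows = [{"id": 1, "value": 0.5}, {"id": 2, "value": 1.5}]
print(to_output(rows, output="dict"))
```

Importing pandas and polars lazily keeps each backend optional, so a caller who only asks for dicts never needs either package.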

License

This library is released under the Apache 2.0 license.

Note: This library was generated using copier from the Base Python Project Template repository.
