Data generation
superstore
High-performance synthetic data generation library for testing and development.
Overview
superstore is a Rust-powered Python library for generating realistic synthetic datasets. It provides:
Data Generators
| Generator | Description | Use Cases |
|---|---|---|
| Retail | Sales transactions, employees | BI dashboards, forecasting |
| Time Series | Financial-style series with regimes, jumps | Quant research, backtesting |
| Weather | Sensor data with seasonal/diurnal patterns | IoT analytics, anomaly detection |
| Logs | Web server & application logs | Observability, alerting |
| Finance | Stock prices, OHLCV, options chains | Trading systems, risk analysis |
| Telemetry | Machine metrics, anomalies, failures | DevOps dashboards, ML training |
Statistical Tools
| Tool | Description | Use Cases |
|---|---|---|
| Distributions | Sample from statistical distributions | Simulation, Monte Carlo |
| Copulas | Correlated multivariate data | Risk modeling, portfolio analysis |
| Temporal Models | AR, Markov chains, random walks | Time series simulation |
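To make the temporal models in the table concrete, here is a minimal standard-library sketch of a random walk, an AR(1) process, and a Markov chain. It illustrates the underlying techniques only. The function names `random_walk`, `ar1`, and `markov_chain` and their parameters are hypothetical and are not superstore's API.

```python
import random


def random_walk(n, step_std=1.0, start=0.0, seed=None):
    """Hypothetical helper: Gaussian random walk of n >= 1 points."""
    rng = random.Random(seed)
    values = [start]
    for _ in range(n - 1):
        # Each point is the previous point plus an independent Gaussian step.
        values.append(values[-1] + rng.gauss(0.0, step_std))
    return values


def ar1(n, phi=0.8, noise_std=1.0, seed=None):
    """Hypothetical helper: AR(1) series x[t] = phi * x[t-1] + noise."""
    rng = random.Random(seed)
    x = 0.0
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, noise_std)
        series.append(x)
    return series


def markov_chain(n, transitions, start, seed=None):
    """Hypothetical helper: sample n >= 1 states from a transition table.

    transitions maps each state to a dict of {next_state: probability}.
    """
    rng = random.Random(seed)
    state = start
    path = [state]
    for _ in range(n - 1):
        next_states = list(transitions[state])
        weights = list(transitions[state].values())
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path


# Example: a two-regime chain such as might drive a "calm"/"volatile" series.
regimes = {
    "calm": {"calm": 0.9, "volatile": 0.1},
    "volatile": {"calm": 0.3, "volatile": 0.7},
}
path = markov_chain(10, regimes, start="calm", seed=42)
```

Each helper owns its own `random.Random(seed)` instance, so two calls with the same seed return the same sequence without touching global random state.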
Key Features
- Rust-powered: High-performance generation, 10-100x faster than pure Python
- Flexible output: pandas DataFrame, polars DataFrame, or Python dicts
- Configurable: Pydantic config classes for validated, structured configuration
- Reproducible: Seed support for deterministic generation
- Scalable: Streaming and parallel generation for large datasets
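The streaming idea in the last bullet can be pictured as yielding fixed-size batches instead of building the whole dataset in memory. The sketch below uses only the standard library. `stream_batches` and its parameters are hypothetical names chosen for illustration, and it does not show how superstore implements or exposes streaming.

```python
import random
from typing import Dict, Iterator, List, Optional


def stream_batches(
    total: int, batch_size: int, seed: Optional[int] = None
) -> Iterator[List[Dict[str, float]]]:
    """Hypothetical helper: yield `total` synthetic rows in batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rng = random.Random(seed)
    produced = 0
    while produced < total:
        # The final batch may be smaller than batch_size.
        size = min(batch_size, total - produced)
        yield [
            {"row": produced + i, "value": rng.random()}
            for i in range(size)
        ]
        produced += size


# Only one batch is held in memory at a time.
batch_sizes = [len(batch) for batch in stream_batches(10, 4, seed=0)]
```

Because a single seeded generator feeds every batch, the concatenated stream is the same for a given seed and batch size.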
Installation
pip install superstore
For development with polars support:
pip install "superstore[develop]"
Quick Start
from superstore import superstore, employees, timeseries, weather
# Generate 1000 retail records as a pandas DataFrame
df = superstore(count=1000)
# Generate as polars DataFrame
df_polars = superstore(count=1000, output="polars")
# Generate as list of dicts
records = superstore(count=1000, output="dict")
Reproducibility with Seeds
All data generators support an optional seed parameter for reproducible random data generation:
from superstore import superstore, employees, timeseries, weather, machines
# Same seed produces identical data
df1 = superstore(count=100, seed=42)
df2 = superstore(count=100, seed=42)
assert df1.equals(df2) # True
# Works with all generators
employees_df = employees(count=50, seed=123)
timeseries_df = timeseries(nper=30, seed=456)
weather_df = weather(count=100, seed=789)
machine_list = machines(count=10, seed=321)
# No seed means random data each time
df3 = superstore(count=100) # Different each call
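The seed behaviour shown above can be turned into a small reusable check for a test suite. This is a sketch under stated assumptions: `is_reproducible` is a hypothetical helper, and `fake_generator` is a standard-library stand-in used so the example runs without superstore installed. Neither is part of the library.

```python
import random


def fake_generator(count, seed=None):
    """Hypothetical stand-in for a seeded generator (not superstore)."""
    rng = random.Random(seed)
    return [
        {"order_id": i, "sales": round(rng.uniform(1, 500), 2)}
        for i in range(count)
    ]


def is_reproducible(generator, seed, **kwargs):
    """Call `generator` twice with the same seed and compare the results."""
    first = generator(seed=seed, **kwargs)
    second = generator(seed=seed, **kwargs)
    return first == second


# The same seed gives the same records.
same = is_reproducible(fake_generator, seed=42, count=10)
```

The `==` comparison suits lists of dicts, such as the `output="dict"` form. For pandas DataFrames, compare with `df1.equals(df2)` as in the example above, because `==` on DataFrames is elementwise.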
Development
Setup
# Clone the repository
git clone https://github.com/1kbgz/superstore.git
cd superstore
# Install development dependencies
make develop
Building
# Build Python wheel
make build
Testing
# Run all tests
make test
Linting
# Run linters
make lint
# Fix formatting
make fix
Architecture
superstore uses a hybrid Rust/Python architecture:
- rust/: Core Rust library with all data generation logic
- src/: PyO3 bindings exposing Rust functions to Python
- superstore/: Python package with native module
The core data generation is implemented in Rust for performance, with PyO3 providing seamless Python integration. Output format conversion (pandas/polars/dict) happens in the Rust bindings layer.
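To picture what output format conversion involves, the sketch below converts between a columnar layout (one list per column, the shape a DataFrame holds) and a row layout (a list of dicts). It is a pure-Python illustration of the idea only. In superstore this conversion happens in the Rust bindings layer, and the names `columns_to_records` and `records_to_columns` are hypothetical.

```python
def columns_to_records(columns):
    """Hypothetical helper: {"a": [1, 2]} -> [{"a": 1}, {"a": 2}]."""
    names = list(columns)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    # zip(*...) walks the columns in lockstep, producing one tuple per row.
    return [
        dict(zip(names, row))
        for row in zip(*(columns[name] for name in names))
    ]


def records_to_columns(records):
    """Hypothetical helper: inverse of columns_to_records."""
    if not records:
        return {}
    # Column names are taken from the first record.
    names = list(records[0])
    return {name: [record[name] for record in records] for name in names}


columns = {"region": ["East", "West"], "sales": [120.5, 87.0]}
records = columns_to_records(columns)
```

Doing this step once at the boundary, rather than per row inside the generator, means the generation logic can stay independent of the requested output type.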
License
This library is released under the Apache 2.0 license.
[!NOTE] This library was generated using copier from the Base Python Project Template repository.