
Generic handler for multiple heterogeneous numpy arrays and subclasses


datastock

Provides a generic class for storing multiple heterogeneous numpy arrays with non-uniform shapes, together with built-in interactive visualization routines. It also stores the relationships between arrays (e.g. shared dimensions) and provides an elegant way of storing objects of various categories built on the stored arrays.

The full power of datastock is unveiled when using the DataStock class and sub-classing it for your own needs.

But a simpler and more straightforward use is possible: if you are just looking for a ready-to-use interactive visualization tool for 1d, 2d and 3d numpy arrays, use the plot_as_array shortcut.

Installation:

datastock is available on PyPI and anaconda.org:

```
pip install datastock
```

```
conda install -c conda-forge datastock
```

Examples:

Straightforward array visualization:

```python
import numpy as np
import datastock as ds

# any 1d, 2d or 3d array
aa = np.random.random((100, 100, 100))

# plot an interactive figure using the shortcut to the method
dax = ds.plot_as_array(aa)
```

Now do shift + left click on any axes; the rest of the interactive commands are automatically printed in your python console.

*(figure: direct 3d array visualization)*

The DataStock class:

You will want to instantiate the DataStock class (which is the core of datastock) if:

  • You have many numpy arrays, not just one, especially if they do not have the same shape
  • You want to define a variety of objects from these data arrays (DataStock can be seen as a class storing many sub-classes)

DataStock has 3 main dict attributes:

  • dref: to store the size of each dimension, each under a unique key
  • ddata: to store all numpy arrays, each under a unique key
  • dobj: to store any number of arbitrary sub-dict, each containing a category of object

Thanks to dref, the class knows the relationships between all numpy arrays. In particular, it knows which arrays share the same references / dimensions.

```python
import numpy as np
import datastock as ds

# -----------
# Define data
# Here: time-varying profiles representing velocity measurements across the radius of a tube.
# We assume 5 measurement campaigns were conducted, each yielding a different
# number of measurements, all sampled on 80 radial points.

nc = 5
nx = 80
lnt = [100, 90, 80, 120, 110]

x = np.linspace(1, 2, nx)
lt = [np.linspace(0, 10, nt) for nt in lnt]
lprof = [(1 + np.cos(t)[:, None]) * x[None, :] for t in lt]

# ------------------
# Populate DataStock

# instantiate
coll = ds.DataStock()

# add references (i.e.: store the size of each dimension under a unique key)
coll.add_ref(key='nc', size=nc)
coll.add_ref(key='nx', size=nx)
for ii, nt in enumerate(lnt):
    coll.add_ref(key=f'nt{ii}', size=nt)

# add data depending on these references
# you can optionally specify units, physical dimensionality (ex: distance, time...),
# quantity (ex: radius, height...) and a name (to your liking)
coll.add_data(key='x', data=x, dimension='distance', quant='radius', units='m', ref='nx')
for ii, nt in enumerate(lnt):
    coll.add_data(key=f't{ii}', data=lt[ii], dimension='time', units='s', ref=f'nt{ii}')
    coll.add_data(key=f'prof{ii}', data=lprof[ii], dimension='velocity', units='m/s', ref=(f'nt{ii}', 'x'))

# print the content of coll in the console
coll
```

*(figure: direct 3d array visualization)*

You can see that DataStock stores the relationships between each array and each reference. Specifying the references explicitly is only necessary when there is an ambiguity (i.e.: several references have the same size, like nx and nt2 in our case).
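The idea behind this automatic matching can be sketched with a toy function (an illustration of the principle, not datastock's actual implementation): an array's shape identifies its references whenever each axis size maps to exactly one registered reference.

```python
# Toy illustration of shape-to-reference matching (not datastock's actual code).
# Each axis size is looked up in the table of registered reference sizes; the
# match is unambiguous only if exactly one reference has that size.

def infer_refs(shape, ref_sizes):
    """Map an array shape to reference keys, or raise if ambiguous."""
    refs = []
    for size in shape:
        matches = [k for k, s in ref_sizes.items() if s == size]
        if len(matches) != 1:
            raise ValueError(f"ambiguous or unknown size {size}: {matches}")
        refs.append(matches[0])
    return tuple(refs)

ref_sizes = {'nc': 5, 'nx': 80, 'nt0': 100, 'nt1': 90, 'nt2': 80}

print(infer_refs((100, 90), ref_sizes))   # unambiguous -> ('nt0', 'nt1')
# infer_refs((100, 80), ref_sizes) would raise: size 80 matches both 'nx' and 'nt2'
```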

```python
# plot any array interactively
dax = coll.plot_as_array('x')
dax = coll.plot_as_array('t0')
dax = coll.plot_as_array('prof0')
dax = coll.plot_as_array('prof0', keyX='t0', keyY='x', aspect='auto')
```

You can then decide to store any object category. Let's create a 'campaign' category to store the characteristics of each measurement campaign, and let's add a 'campaign' parameter to each profile data.

```python
# add an arbitrary object category as a sub-dict of self.dobj
for ii in range(nc):
    coll.add_obj(
        which='campaign',
        key=f'c{ii}',
        start_date=f'{ii}.04.2022',
        end_date=f'{ii+5}.05.2022',
        operator='Barnaby' if ii > 2 else 'Jack Sparrow',
        comment='leak on tube' if ii == 1 else 'none',
        index=ii,
    )

# create a new 'campaign' parameter for data arrays
coll.add_param('campaign', which='data')

# tag each data array with its campaign
for ii in range(nc):
    coll.set_param(which='data', key=f't{ii}', param='campaign', value=f'c{ii}')
    coll.set_param(which='data', key=f'prof{ii}', param='campaign', value=f'c{ii}')

# print the content of coll in the console
coll
```

*(figure: direct 3d array visualization)*

DataStock also provides a built-in object selection method to return all objects matching a criterion, as lists of int indices, bool indices or keys.

```python
In [9]: coll.select(which='campaign', index=2, returnas=int)
Out[9]: array([2])

# list of 2 => return all matches inside the interval
In [10]: coll.select(which='campaign', index=[2, 4], returnas=int)
Out[10]: array([2, 3, 4])

# tuple of 2 => return all matches outside the interval
In [11]: coll.select(which='campaign', index=(2, 4), returnas=int)
Out[11]: array([0, 1])

# return as keys
In [12]: coll.select(which='campaign', index=(2, 4), returnas=str)
Out[12]: array(['c0', 'c1'], dtype='<U2')

# return as bool indices
In [13]: coll.select(which='campaign', index=(2, 4), returnas=bool)
Out[13]: array([ True,  True, False, False, False])

# you can combine as many constraints as needed
In [17]: coll.select(which='campaign', index=[2, 4], operator='Barnaby', returnas=str)
Out[17]: array(['c3', 'c4'], dtype='<U2')
```
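The list-vs-tuple interval semantics shown above can be sketched with plain numpy (a toy re-implementation for illustration, not datastock's code):

```python
import numpy as np

# Toy sketch of the list/tuple interval semantics of select()
# (an illustration, not datastock's actual implementation).

def select_index(values, crit):
    """Return a boolean mask over `values` according to `crit`:
    - scalar:      exact match
    - list [a, b]: inside the closed interval
    - tuple (a, b): outside the closed interval
    """
    values = np.asarray(values)
    if isinstance(crit, list):
        return (values >= crit[0]) & (values <= crit[1])
    if isinstance(crit, tuple):
        return (values < crit[0]) | (values > crit[1])
    return values == crit

index = np.arange(5)
print(np.nonzero(select_index(index, [2, 4]))[0])   # inside  -> [2 3 4]
print(np.nonzero(select_index(index, (2, 4)))[0])   # outside -> [0 1]
```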

You can also decide to sub-class DataStock to implement methods and visualizations specific to your needs.

Other useful built-in methods:

DataStock provides built-in methods like:

  • get_nbytes(): returns a tuple (size, dsize) where:
    • size is the total size, in bytes, of all data stored in the instance
    • dsize is a dict with the detail (size of each item in each sub-dict of the instance)
  • save(): saves the instance
  • load(): loads a saved instance
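The (size, dsize) return shape of get_nbytes() can be illustrated with a toy computation over a dict of numpy arrays (an illustration only, not datastock's actual implementation):

```python
import numpy as np

# Toy illustration of the (size, dsize) return shape of get_nbytes()
# (not datastock's actual implementation).

def get_nbytes(ddata):
    """Total bytes and per-array detail for a dict of numpy arrays."""
    dsize = {key: arr.nbytes for key, arr in ddata.items()}
    return sum(dsize.values()), dsize

ddata = {
    'x': np.linspace(1, 2, 80),      # 80 float64  -> 640 bytes
    't0': np.linspace(0, 10, 100),   # 100 float64 -> 800 bytes
}
size, dsize = get_nbytes(ddata)
print(size, dsize)   # 1440 {'x': 640, 't0': 800}
```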
