Skip to main content

Type and shape validation and serialization for numpy arrays in pydantic models

Project description

numpydantic

PyPI - Version Documentation Status Coverage Status Code style: black

Type and shape validation and serialization for numpy arrays in pydantic models

This package was picked out of nwb-linkml, a translation of the NWB schema language and data format to linkML and pydantic models.

It does two primary things:

  • Provide types - Annotations (based on npytyping) for specifying numpy arrays in pydantic models, and
  • Generate models from LinkML - extend the LinkML pydantic generator to create models that that use the linkml-arrays syntax

Parameterized Arrays

Arrays use the npytying syntax:

from typing import Union
from pydantic import BaseModel
from numpydantic import NDArray, Shape, UInt8, Float, Int

class Image(BaseModel):
    """
    Data values. Data can be in 1-D, 2-D, 3-D, or 4-D. The first dimension should always represent time. This can also be used to store binary data (e.g., image frames). This can also be a link to data stored in an external file.
    """
    array: Union[
        NDArray[Shape["* x, * y"], UInt8],
        NDArray[Shape["* x, * y, 3 rgb"], UInt8],
        NDArray[Shape["* x, * y, 4 rgba"], UInt8],
        NDArray[Shape["* t, * x, * y, 3 rgb"], UInt8],
        NDArray[Shape["* t, * x, * y, 4 rgba"], Float]
    ]

Validation:

import numpy as np
# works
frame_gray = Image(array=np.ones((1280, 720), dtype=np.uint8))
frame_rgb  = Image(array=np.ones((1280, 720, 3), dtype=np.uint8))
frame_rgba = Image(array=np.ones((1280, 720, 4), dtype=np.uint8))
video_rgb  = Image(array=np.ones((100, 1280, 720, 3), dtype=np.uint8))

# fails
wrong_n_dimensions = Image(array=np.ones((1280,), dtype=np.uint8))
wrong_shape = Image(array=np.ones((1280,720,10), dtype=np.uint8))
wrong_type = Image(array=np.ones((1280,720,3), dtype=np.float64))

# shapes and types are checked together
float_video = Image(array=np.ones((100, 1280, 720, 4),dtype=float))
wrong_shape_float_video = Image(array=np.ones((100, 1280, 720, 3),dtype=float))

JSON schema generation:

class MyArray(BaseModel):
  array: NDArray[Shape["2 x, * y, 4 z"], Float]
>>> print(json.dumps(MyArray.model_json_schema(), indent=2))
{
  "properties": {
    "array": {
      "items": {
        "items": {
          "items": {
            "type": "number"
          },
          "maxItems": 4,
          "minItems": 4,
          "type": "array"
        },
        "type": "array"
      },
      "maxItems": 2,
      "minItems": 2,
      "title": "Array",
      "type": "array"
    }
  },
  "required": [
    "array"
  ],
  "title": "MyArray",
  "type": "object"
}

Serialization

class SmolArray(BaseModel):
    array: NDArray[Shape["2 x, 2 y"], Int]

class BigArray(BaseModel):
    array: NDArray[Shape["1000 x, 1000 y"], Int]

Serialize small arrays as lists of lists, and big arrays as a b64-encoded blosc compressed string

>>> smol = SmolArray(array=np.array([[1,2],[3,4]], dtype=int))
>>> big = BigArray(array=np.random.randint(0,255,(1000,1000),int))

>>> print(smol.model_dump_json())
{"array":[[1,2],[3,4]]}
>>> print(big.model_dump_json())
{
  "array": "( long b64 encoded string )",
  "shape": [1000, 1000],
  "dtype": "int64",
  "unpack_fns": ["base64.b64decode", "blosc2.unpack_array2"],
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

numpydantic-1.0.0rc1.tar.gz (25.2 kB view hashes)

Uploaded Source

Built Distribution

numpydantic-1.0.0rc1-py3-none-any.whl (21.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page