Distributed Xarray with Apache Beam

Xarray-Beam

Xarray-Beam is a Python library for building Apache Beam pipelines with Xarray datasets.

The project aims to facilitate data transformations and analysis on large-scale multi-dimensional labeled arrays, such as:

  • Ad-hoc computation on Xarray data, by dividing an xarray.Dataset into many smaller pieces ("chunks").
  • Adjusting array chunks, using the Rechunker algorithm.
  • Ingesting large, multi-dimensional array datasets into an analysis-ready, cloud-optimized format, namely Zarr (see also Pangeo Forge).
  • Calculating statistics (e.g., "climatology") across distributed datasets with arbitrary groups.
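
To make the idea of "chunks" concrete, here is a minimal sketch in plain Python of how a labeled array can be divided into pieces identified by their starting offsets. It is a conceptual illustration only and does not use Xarray-Beam's API; the helper name `iter_chunk_slices` and the example dimension names and sizes are hypothetical.

```python
import itertools
from typing import Dict, Iterator, Tuple


def iter_chunk_slices(
    sizes: Dict[str, int], chunks: Dict[str, int]
) -> Iterator[Tuple[Dict[str, int], Dict[str, slice]]]:
    """Yield (offsets, slices) for every chunk of a labeled array.

    sizes: full length of each dimension, e.g. {"time": 10, "lat": 4}.
    chunks: desired chunk length per dimension; dimensions that are
        not listed are kept whole.
    """
    dims = list(sizes)
    # Chunk length along each dimension (at least 1 to keep range() valid).
    lengths = {d: max(1, chunks.get(d, sizes[d])) for d in dims}
    # Starting index of every chunk along each dimension.
    starts = [range(0, sizes[d], lengths[d]) for d in dims]
    for combo in itertools.product(*starts):
        offsets = dict(zip(dims, combo))
        # The last chunk along a dimension may be shorter than the rest.
        slices = {
            d: slice(start, min(start + lengths[d], sizes[d]))
            for d, start in offsets.items()
        }
        yield offsets, slices


# Hypothetical example: 10 time steps split into chunks of 4, "lat" kept whole.
pieces = list(iter_chunk_slices({"time": 10, "lat": 4}, {"time": 4}))
for offsets, slices in pieces:
    print(offsets, slices)
```

With Xarray installed, each `slices` dictionary could be passed to `Dataset.isel` to pull out the corresponding piece, and the `offsets` dictionary records where that piece belongs in the full dataset.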

For more about our approach and how to get started, read the documentation!

Warning: Xarray-Beam is a sharp tool 🔪

Xarray-Beam is relatively new, and focused on expert users:

  • We use it extensively at Google for processing large-scale weather datasets, but there is not yet a vibrant external community.
  • It provides low-level abstractions that facilitate writing very large-scale data pipelines (e.g., 100+ TB), but by design it requires explicitly thinking about how every operation is parallelized.
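
As an illustration of what thinking explicitly about parallelization can mean, the sketch below computes a mean over data split into unequal chunks. It is plain Python rather than Xarray-Beam code, and the function names and sample values are hypothetical. Each chunk is reduced independently to a (sum, count) pair, and only those small pairs are combined; simply averaging the per-chunk means would give a different, incorrect result.

```python
from typing import Iterable, List, Tuple


def partial_sum(chunk: List[float]) -> Tuple[float, int]:
    """Reduce one chunk independently; this step can run in parallel."""
    return sum(chunk), len(chunk)


def combine(partials: Iterable[Tuple[float, int]]) -> float:
    """Merge per-chunk (sum, count) pairs into a single global mean."""
    total, count = 0.0, 0
    for chunk_sum, chunk_count in partials:
        total += chunk_sum
        count += chunk_count
    if count == 0:
        raise ValueError("cannot take the mean of zero elements")
    return total / count


# Two chunks of unequal size (hypothetical data).
chunks = [[1.0, 2.0, 3.0, 4.0], [10.0]]

global_mean = combine(partial_sum(c) for c in chunks)

# Averaging the chunk means weights the one-element chunk too heavily.
mean_of_means = sum(sum(c) / len(c) for c in chunks) / len(chunks)

print(global_mean, mean_of_means)
```

The same pattern (reduce each chunk, then combine the partial results) applies to other statistics, but the quantities that must be carried between the two steps differ for each one.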

Installation

Xarray-Beam requires recent versions of immutabledict, Xarray, Dask, Rechunker, Zarr, and Apache Beam. For best performance when writing Zarr files, use Xarray 0.19.0 or later.
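
One way to check an environment against this list is sketched below, using only the standard library. The distribution names are assumed to match the PyPI names of the projects listed above, and `installed_versions` and `release_tuple` are hypothetical helpers, not part of Xarray-Beam. The version parsing is deliberately simple: it keeps only the leading numeric components and ignores pre-release suffixes.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Assumed PyPI distribution names for the dependencies listed above.
REQUIRED = ("immutabledict", "xarray", "dask", "rechunker", "zarr", "apache-beam")


def installed_versions(names: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def release_tuple(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. "0.19.0" -> (0, 19, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")

xarray_version = installed_versions(["xarray"])["xarray"]
if xarray_version is not None and release_tuple(xarray_version) < (0, 19, 0):
    print("Xarray is older than 0.19.0; writing Zarr files may be slower.")
```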

Disclaimer

Xarray-Beam is an experiment that we are sharing with the outside world in the hope that it will be useful. It is not a supported Google product. We welcome feedback, bug reports, and code contributions, but cannot guarantee they will be addressed.

See the "Contribution guidelines" for more.

Credits

Contributors:

  • Stephan Hoyer
  • Jason Hickey
  • Cenk Gazen
  • Alex Merose
