Distributed Xarray with Apache Beam
Xarray-Beam
Xarray-Beam is a Python library for building Apache Beam pipelines with Xarray datasets.
The project aims to facilitate data transformations and analysis on large-scale multi-dimensional labeled arrays, such as:
- Ad-hoc computation on Xarray data, by dividing an xarray.Dataset into many smaller pieces ("chunks").
- Adjusting array chunks, using the Rechunker algorithm.
- Ingesting large, multi-dimensional array datasets into an analysis-ready, cloud-optimized format, namely Zarr (see also Pangeo Forge).
- Calculating statistics (e.g., "climatology") across distributed datasets with arbitrary groups.
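The chunk-then-combine pattern behind the first and last items above can be sketched in plain NumPy. This is a conceptual illustration only, not Xarray-Beam's API: it splits an array along its first axis into chunks keyed by their starting offset (loosely mirroring how Xarray-Beam labels chunks by their offsets along each dimension), reduces each chunk independently to a partial sum and count, and merges those partial results into a mean. The helpers `split_into_chunks`, `partial_stats` and `combine` are hypothetical names; in a real Beam pipeline the per-chunk step would run in parallel on workers, whereas here everything runs in one process.

```python
import numpy as np

def split_into_chunks(values, chunk_size):
    """Yield (offset, chunk) pairs along axis 0 (hypothetical helper)."""
    for offset in range(0, values.shape[0], chunk_size):
        yield offset, values[offset:offset + chunk_size]

def partial_stats(chunk):
    """Per-chunk step: reduce one chunk to a (sum, count) pair along axis 0."""
    return chunk.sum(axis=0), chunk.shape[0]

def combine(stats):
    """Merge step: add up partial sums and counts, then divide once."""
    stats = list(stats)  # materialize so the pairs can be iterated twice
    total = sum(s for s, _ in stats)
    count = sum(n for _, n in stats)
    return total / count

# Toy data standing in for a (time, station) variable.
values = np.arange(365 * 4, dtype=float).reshape(365, 4)
chunks = dict(split_into_chunks(values, chunk_size=100))
mean = combine(partial_stats(chunk) for chunk in chunks.values())
assert np.allclose(mean, values.mean(axis=0))
```

Averaging the per-chunk means directly would be wrong here, because the last chunk holds 65 rows rather than 100; carrying (sum, count) pairs keeps the result exact regardless of how the data is chunked.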
For more about our approach and how to get started, read the documentation!
Warning: Xarray-Beam is a sharp tool 🔪
Xarray-Beam is relatively new, and focused on expert users:
- We use it extensively at Google for processing large-scale weather datasets, but there is not yet a vibrant external community.
- It provides low-level abstractions that facilitate writing very large scale data pipelines (e.g., 100+ TB), but by design it requires explicitly thinking about how every operation is parallelized.
Installation
Xarray-Beam requires recent versions of immutabledict, Xarray, Dask, Rechunker, Zarr, and Apache Beam. For best performance when writing Zarr files, use Xarray 0.19.0 or later.
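To check the Xarray recommendation above in an existing environment, a small helper can compare the installed version against 0.19.0. This is a sketch using only the standard library's importlib.metadata; `meets_minimum` and `xarray_is_recent_enough` are hypothetical names, and the simple parser looks only at leading numeric components and ignores pre-release suffixes (the packaging library's `Version` class handles the full version grammar if you need it).

```python
import re
from importlib.metadata import PackageNotFoundError, version

def meets_minimum(version_string, minimum=(0, 19, 0)):
    """Compare the leading numeric parts of a version string to a tuple."""
    parts = []
    for piece in version_string.split(".")[: len(minimum)]:
        match = re.match(r"\d+", piece)  # keep leading digits, e.g. "0rc1" -> 0
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that e.g. "0.20" compares as (0, 20, 0).
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= minimum

def xarray_is_recent_enough():
    """Return True if an installed Xarray is 0.19.0 or later."""
    try:
        return meets_minimum(version("xarray"))
    except PackageNotFoundError:
        return False
```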
Disclaimer
Xarray-Beam is an experiment that we are sharing with the outside world in the hope that it will be useful. It is not a supported Google product. We welcome feedback, bug reports and code contributions, but cannot guarantee they will be addressed.
See the "Contribution guidelines" for more.
Credits
Contributors:
- Stephan Hoyer
- Jason Hickey
- Cenk Gazen
- Alex Merose
Download files
File details
Details for the file xarray_beam-0.11.5.tar.gz.
File metadata
- Download URL: xarray_beam-0.11.5.tar.gz
- Upload date:
- Size: 75.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a1cdafbbb4f539f7062a4a0350a3ad28f4c5fd2488b41cddd8e079bd77d9566 |
| MD5 | f24edbdf6e5a446ce3dba975597d1863 |
| BLAKE2b-256 | 807d2e021815215a37830883a8bb29b4afe803e1baecbfa592b08b889f60ae3c |
File details
Details for the file xarray_beam-0.11.5-py3-none-any.whl.
File metadata
- Download URL: xarray_beam-0.11.5-py3-none-any.whl
- Upload date:
- Size: 88.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | af280edc084ebd81b56d10827565fb8d0d5cb021e6f8050eea5e2e5798272af5 |
| MD5 | 07abc353bb293971abab1ae149aa773d |
| BLAKE2b-256 | 37bcbf4de0cfe8ccd90554c3fe4b2f493af1d6f5fdd53f9e652a7116aa134ce2 |
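The SHA256 digests listed above let you check a downloaded file before installing it. A minimal sketch using Python's standard hashlib module, assuming the wheel has been saved in the current directory under its published file name; `sha256_of` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

def sha256_of(path, block_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size blocks until f.read() returns b"" at end of file.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

# SHA256 published for the wheel; assumes the file sits in the current directory.
EXPECTED = "af280edc084ebd81b56d10827565fb8d0d5cb021e6f8050eea5e2e5798272af5"
wheel = Path("xarray_beam-0.11.5-py3-none-any.whl")
if wheel.exists():
    print("OK" if sha256_of(wheel) == EXPECTED else "MISMATCH")
```

pip can also enforce this automatically: in hash-checking mode (`--require-hashes`), each requirement must carry a `--hash=sha256:...` option, and pip refuses any file whose digest does not match.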