Zarr

Zarr is a Python package providing an implementation of chunked, compressed, N-dimensional arrays.

Highlights

  • Create N-dimensional arrays with any NumPy dtype.
  • Chunk arrays along any dimension.
  • Compress chunks using the fast Blosc meta-compressor or alternatively using zlib, BZ2 or LZMA.
  • Store arrays in memory, on disk, inside a Zip file, on S3, ...
  • Read an array concurrently from multiple threads or processes.
  • Write to an array concurrently from multiple threads or processes.

Status

Zarr is still in an early, experimental phase of development. Feedback and bug reports are very welcome, please get in touch via the GitHub issue tracker.

Installation

Install Zarr from PyPI:

$ pip install zarr

Please note that Zarr includes a C extension providing integration with the Blosc library. Pre-compiled binaries are available for Linux and Windows platforms and will be installed automatically via pip if available. However, if you have a newer CPU that supports the AVX2 instruction set (e.g., Intel Haswell, Broadwell or Skylake) then compiling from source is preferable, as the Blosc library includes some optimisations for those architectures:

$ pip install --no-binary=:all: zarr%

To work with Zarr source code in development, install from GitHub:

$ git clone --recursive https://github.com/alimanfoo/zarr.git
$ cd zarr
$ python setup.py install

Acknowledgments

Zarr bundles the c-blosc library and uses it as the default compressor.

Zarr is inspired by HDF5, h5py and bcolz.

Development of this package is supported by the MRC Centre for Genomics and Global Health.

Indices and tables