Zarr

Zarr is a Python package providing an implementation of chunked, compressed, N-dimensional arrays.

Highlights

Create N-dimensional arrays with any NumPy dtype.
Chunk arrays along any dimension.
Compress chunks using the fast Blosc meta-compressor or alternatively using zlib, BZ2 or LZMA.
Store arrays in memory, on disk, inside a Zip file, on S3, ...
Read an array concurrently from multiple threads or processes.
Write to an array concurrently from multiple threads or processes.
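The chunking and compression ideas listed above can be illustrated without Zarr itself. The following is a minimal sketch, not Zarr's implementation or API. It splits a NumPy array into a grid of chunks, compresses each chunk independently with one of the standard-library codecs named above (zlib, BZ2 or LZMA; Blosc is left out because it is not in the standard library), and keeps the results in a plain dict keyed by chunk coordinates. The names `CODECS`, `write_chunked` and `read_chunk`, and the `"i.j"` key format, are hypothetical choices for this illustration.

```python
import bz2
import itertools
import lzma
import zlib

import numpy as np

# Hypothetical codec table: (compress, decompress) pairs from the standard
# library. Blosc is omitted because it is not part of the standard library.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def write_chunked(arr, chunks, codec="zlib"):
    """Split an N-dimensional array into chunks and compress each one.

    Returns a dict mapping chunk keys such as "1.2" to (shape, compressed bytes).
    Edge chunks keep their truncated shape, a simplification for this sketch.
    """
    compress, _ = CODECS[codec]
    # Number of chunks along each dimension (ceiling division).
    grid = [-(-n // c) for n, c in zip(arr.shape, chunks)]
    store = {}
    for idx in itertools.product(*(range(g) for g in grid)):
        region = tuple(slice(k * c, (k + 1) * c) for k, c in zip(idx, chunks))
        block = np.ascontiguousarray(arr[region])
        key = ".".join(str(k) for k in idx)
        store[key] = (block.shape, compress(block.tobytes()))
    return store


def read_chunk(store, key, dtype, codec="zlib"):
    """Decompress a single chunk; no other chunk is touched."""
    _, decompress = CODECS[codec]
    shape, payload = store[key]
    return np.frombuffer(decompress(payload), dtype=dtype).reshape(shape)


if __name__ == "__main__":
    data = np.arange(100 * 80, dtype="i4").reshape(100, 80)
    store = write_chunked(data, chunks=(30, 30), codec="lzma")
    # Chunk "1.2" covers rows 30:60 and columns 60:80 (the last column chunk is short).
    print(np.array_equal(read_chunk(store, "1.2", "i4", codec="lzma"), data[30:60, 60:80]))
```

Because each chunk is compressed on its own, reading one chunk only requires decompressing that chunk, and separate chunks can plausibly be handled by separate threads or processes. In this sketch the caller must pass the same codec and dtype when reading as when writing; a real storage format would need to record these alongside the data.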

Status

Zarr is still in an early phase of development. Feedback and bug reports are very welcome; please get in touch via the GitHub issue tracker.

Installation

Zarr depends on NumPy. It is generally best to install NumPy first, using whatever method is most appropriate for your operating system and Python distribution.

Install Zarr from PyPI:

$ pip install zarr

Alternatively, install Zarr via conda:

$ conda install -c conda-forge zarr

Zarr includes a C extension providing integration with the Blosc library. Installing via conda on any operating system, or via pip on Windows, installs a pre-compiled binary distribution. However, if your CPU supports the AVX2 instruction set (e.g., Intel Haswell, Broadwell or Skylake), compiling from source is preferable, because the Blosc library includes optimisations for AVX2:

$ pip install --no-binary=:all: zarr
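Whether the source build is worthwhile depends on your CPU. On Linux, one way to check is to look for the `avx2` token in the `flags` line of `/proc/cpuinfo`. The following is a minimal sketch under that assumption: it only works on Linux and returns `None` elsewhere, and the function names `has_avx2_flag` and `cpu_has_avx2` are hypothetical, not part of Zarr.

```python
import platform


def has_avx2_flag(cpuinfo_text):
    """Return True if an x86 "flags" line in /proc/cpuinfo text lists avx2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Compare whole tokens so that e.g. "avx512f" is not mistaken for "avx2".
            return "avx2" in line.partition(":")[2].split()
    # No "flags" line (e.g. on non-x86 CPUs): treat as no AVX2.
    return False


def cpu_has_avx2(path="/proc/cpuinfo"):
    """Linux-only check; returns None where /proc/cpuinfo is unavailable."""
    if platform.system() != "Linux":
        return None
    try:
        with open(path) as f:
            return has_avx2_flag(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(cpu_has_avx2())
```

If this prints `True`, the `--no-binary` install above is likely to be the better choice on that machine.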

To work on the Zarr source code, clone the repository from GitHub and install from source:

$ git clone --recursive https://github.com/alimanfoo/zarr.git
$ cd zarr
$ python setup.py install

To verify that Zarr has been fully installed (including the Blosc extension), run the test suite:

$ pip install nose zict heapdict
$ python -m nose -v zarr
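For a quicker sanity check from Python, a small round trip through a chunked in-memory array can confirm that the package imports and basic reads and writes work. This is a minimal sketch that assumes the `zarr.zeros` creation function accepting `chunks` and `dtype` keywords; the helper name `zarr_round_trip_ok` is hypothetical. It is not a substitute for the test suite and does not specifically exercise the Blosc extension.

```python
import numpy as np


def zarr_round_trip_ok():
    """Write to and read back from a small chunked Zarr array held in memory.

    Returns None if Zarr cannot be imported, otherwise whether the data survived.
    """
    try:
        import zarr
    except ImportError:
        return None
    data = np.arange(10000, dtype="i4").reshape(100, 100)
    # A 100 x 100 array split into 10 x 10 chunks (100 chunks in total).
    z = zarr.zeros((100, 100), chunks=(10, 10), dtype="i4")
    z[:] = data
    return bool(np.array_equal(z[:], data))


if __name__ == "__main__":
    print(zarr_round_trip_ok())
```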

Acknowledgments

Zarr bundles the c-blosc library and uses it as the default compressor.

Zarr is inspired by HDF5, h5py and bcolz.

Development of this package is supported by the MRC Centre for Genomics and Global Health.