Zarr

Zarr is a Python package providing an implementation of chunked, compressed, N-dimensional arrays.

Highlights

  • Create N-dimensional arrays with any NumPy dtype.
  • Chunk arrays along any dimension.
  • Compress chunks using the fast Blosc meta-compressor or, alternatively, zlib, BZ2 or LZMA.
  • Store arrays in memory, on disk, inside a Zip file, on S3, ...
  • Read an array concurrently from multiple threads or processes.
  • Write to an array concurrently from multiple threads or processes.
  • Organize arrays into hierarchies via groups.
  • Use filters to preprocess data and improve compression.
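The chunk-and-compress idea behind these highlights can be illustrated without Zarr itself. The sketch below is a conceptual model, not Zarr's implementation or API. It uses only the Python standard library to split a 1-D array into fixed-length chunks and compress each chunk independently with zlib, bz2 or lzma. Blosc is not in the standard library, so it is left out. The sketch can also apply a toy delta filter before compression. All helper names (`chunk_grid`, `delta_encode`, `compress_chunks`, `decompress_chunks`) are hypothetical.

```python
import bz2
import lzma
import math
import zlib
from array import array

# Standard-library codecs only; Blosc (Zarr's default) is not in the stdlib.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension; edge chunks may be partial."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def delta_encode(values):
    """Toy filter: store each value minus its predecessor (first minus 0).

    Assumes every difference fits in the array's element type.
    """
    out = array(values.typecode)
    prev = 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out = array(deltas.typecode)
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def compress_chunks(values, chunk_len, codec="zlib", use_delta=False):
    """Split a 1-D array into fixed-length chunks and compress each one."""
    compress, _ = CODECS[codec]
    stored = []
    for start in range(0, len(values), chunk_len):
        chunk = values[start:start + chunk_len]
        if use_delta:
            chunk = delta_encode(chunk)
        stored.append(compress(chunk.tobytes()))
    return stored


def decompress_chunks(stored, typecode, codec="zlib", use_delta=False):
    """Decompress every chunk and join them back into one array."""
    _, decompress = CODECS[codec]
    out = array(typecode)
    for blob in stored:
        chunk = array(typecode)
        chunk.frombytes(decompress(blob))
        out.extend(delta_decode(chunk) if use_delta else chunk)
    return out


if __name__ == "__main__":
    print(chunk_grid((10000, 10000), (1000, 1000)))  # (10, 10)
    data = array("i", range(200_000))
    for use_delta in (False, True):
        blobs = compress_chunks(data, 50_000, "zlib", use_delta)
        assert decompress_chunks(blobs, "i", "zlib", use_delta) == data
        # Total compressed size, without and with the delta filter.
        print("delta" if use_delta else "raw", sum(len(b) for b in blobs))
```

Each chunk is compressed independently, so only the chunks that overlap a requested region need to be decompressed. On smoothly varying data like this, the delta filter turns most values into small repeated differences. Those differences typically compress far better than the raw values.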

Status

Zarr is still in an early phase of development. Feedback and bug reports are very welcome; please get in touch via the GitHub issue tracker.

Installation

Zarr depends on NumPy. It is generally best to install NumPy first, using whatever method is most appropriate for your operating system and Python distribution.

Install Zarr from PyPI:

$ pip install zarr

Alternatively, install Zarr via conda:

$ conda install -c conda-forge zarr

Zarr includes a C extension providing integration with the Blosc library. Installing via conda will install a pre-compiled binary distribution. However, if you have a newer CPU that supports the AVX2 instruction set (e.g., Intel Haswell, Broadwell or Skylake) then installing via pip is preferable, because this will compile the Blosc library from source with optimisations for AVX2.
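To decide between conda and pip on this basis, you first need to know whether your CPU advertises AVX2. The sketch below checks this on Linux only. It assumes a Linux-style `/proc/cpuinfo` with a `flags` line. On other platforms, or when no such line is found, it returns `None` to mean "unknown". The function name `cpu_has_avx2` is hypothetical.

```python
from pathlib import Path


def cpu_has_avx2(cpuinfo_path="/proc/cpuinfo"):
    """Return True/False from the first 'flags' line, or None if unknown.

    Assumes a Linux-style cpuinfo file; other platforms yield None.
    """
    path = Path(cpuinfo_path)
    if not path.exists():
        return None
    for line in path.read_text().splitlines():
        if line.startswith("flags"):
            # Everything after the colon is a space-separated list of flags.
            return "avx2" in line.partition(":")[2].split()
    return None


if __name__ == "__main__":
    print(cpu_has_avx2())
```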

To work with Zarr source code in development, install from GitHub:

$ git clone --recursive https://github.com/alimanfoo/zarr.git
$ cd zarr
$ python setup.py install

To verify that Zarr has been fully installed, including the Blosc extension, run the test suite:

$ pip install nose
$ python -m nose -v zarr
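For a quicker check that NumPy and Zarr can at least be found by the current interpreter, the sketch below can help. It uses only the standard library: `importlib.util.find_spec` and `importlib.metadata.version`, which require Python 3.8+. It does not exercise the Blosc extension, so it complements rather than replaces the test run above. The function name `installed_versions` is hypothetical.

```python
from importlib import metadata, util


def installed_versions(packages=("numpy", "zarr")):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        if util.find_spec(name) is None:
            found[name] = None  # not importable from this interpreter
            continue
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable (e.g. from a source checkout) but no install metadata.
            found[name] = "unknown"
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT FOUND'}")
```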

Acknowledgments

Zarr bundles the c-blosc library and uses it as the default compressor.

Zarr is inspired by HDF5, h5py and bcolz.

Development of this package is supported by the MRC Centre for Genomics and Global Health.
