Zarr is a Python package providing an implementation of chunked, compressed, N-dimensional arrays.
- Create N-dimensional arrays with any NumPy dtype.
- Chunk arrays along any dimension.
- Compress chunks using the fast Blosc meta-compressor or alternatively using zlib, BZ2 or LZMA.
- Store arrays in memory, on disk, inside a Zip file, on S3, ...
- Read an array concurrently from multiple threads or processes.
- Write to an array concurrently from multiple threads or processes.
- Organize arrays into hierarchies via groups.
- Use filters to preprocess data and improve compression.
Zarr is still in an early phase of development. Feedback and bug reports are very welcome; please get in touch via the GitHub issue tracker.
Zarr depends on NumPy. It is generally best to install NumPy first, using whatever method is most appropriate for your operating system and Python distribution.
Install Zarr from PyPI:
$ pip install zarr
Alternatively, install Zarr via conda:
$ conda install -c conda-forge zarr
Zarr includes a C extension providing integration with the Blosc library. Installing via conda (on any operating system), or via pip on Windows, will install a pre-compiled binary distribution. However, if you have a newer CPU that supports the AVX2 instruction set (e.g., Intel Haswell, Broadwell or Skylake), compiling from source is preferable, as the Blosc library includes optimisations for AVX2:
$ pip install --no-binary=:all: zarr
To work with Zarr source code in development, install from GitHub:
$ git clone --recursive https://github.com/alimanfoo/zarr.git
$ cd zarr
$ python setup.py install
To verify that Zarr has been fully installed (including the Blosc extension) run the test suite:
$ pip install nose zict heapdict
$ python -m nose -v zarr
Zarr bundles the c-blosc library and uses it as the default compressor.
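The chunk-wise compression model is simple to picture: each chunk is encoded and stored independently, so reading one region touches only the chunks it overlaps. The following sketch (using only NumPy and zlib, one of the alternative codecs listed above, rather than Zarr's actual storage code) illustrates the idea:

```python
import zlib
import numpy as np

# A 1-D array split into fixed-size chunks, each compressed independently.
data = np.arange(10_000, dtype='i4')
chunk_size = 1_000

# "Store": maps chunk index -> compressed bytes, much as a Zarr store
# maps string keys to encoded chunk data.
store = {
    i: zlib.compress(data[i * chunk_size:(i + 1) * chunk_size].tobytes())
    for i in range(len(data) // chunk_size)
}

# Reading chunk 3 decompresses only that chunk, not the whole array.
chunk3 = np.frombuffer(zlib.decompress(store[3]), dtype='i4')
print(chunk3[:5])  # [3000 3001 3002 3003 3004]
```

Because chunks are independent, different chunks can also be read or written concurrently, which is what enables the multi-threaded and multi-process access described above.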
Zarr is inspired by HDF5, h5py and bcolz.
Development of this package is supported by the MRC Centre for Genomics and Global Health.