Zarr

Zarr is a Python package providing an implementation of chunked, compressed, N-dimensional arrays.

Highlights

  • Create N-dimensional arrays with any NumPy dtype.
  • Chunk arrays along any dimension.
  • Compress chunks using the fast Blosc meta-compressor or alternatively using zlib, BZ2 or LZMA.
  • Store arrays in memory, on disk, inside a Zip file, on S3, ...
  • Read an array concurrently from multiple threads or processes.
  • Write to an array concurrently from multiple threads or processes.
  • Organize arrays into hierarchies via groups.
  • Use filters to preprocess data and improve compression.

Status

Zarr is still in an early phase of development. Feedback and bug reports are very welcome; please get in touch via the GitHub issue tracker.

Installation

Zarr depends on NumPy. It is generally best to install NumPy first using whatever method is most appropriate for your operating system and Python distribution.

Install Zarr from PyPI:

$ pip install zarr

Alternatively, install Zarr via conda:

$ conda install -c conda-forge zarr

Zarr includes a C extension providing integration with the Blosc library. Installing via conda will install a pre-compiled binary distribution. However, if you have a newer CPU that supports the AVX2 instruction set (e.g., Intel Haswell, Broadwell or Skylake) then installing via pip is preferable, because this will compile the Blosc library from source with optimisations for AVX2.

To work with Zarr source code in development, install from GitHub:

$ git clone --recursive https://github.com/alimanfoo/zarr.git
$ cd zarr
$ python setup.py install

To verify that Zarr has been fully installed (including the Blosc extension), run the test suite:

$ pip install nose
$ python -m nose -v zarr

Contents

Tutorial

Zarr provides classes and functions for working with N-dimensional arrays that behave like NumPy arrays but whose data is divided into chunks and compressed. If you are already familiar with HDF5 then Zarr arrays provide similar functionality, but with some additional flexibility.

Creating an array

Zarr has a number of convenience functions for creating arrays. For example:

>>> import zarr
>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

The code above creates a 2-dimensional array of 32-bit integers with 10000 rows and 10000 columns, divided into chunks where each chunk has 1000 rows and 1000 columns (and so there will be 100 chunks in total).
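
The chunking arithmetic can be checked via the chunks, cdata_shape and nchunks properties, e.g.:

>>> z.chunks
(1000, 1000)
>>> z.cdata_shape
(10, 10)
>>> z.nchunks
100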

For a complete list of array creation routines see the zarr.creation module documentation.

Reading and writing data

Zarr arrays support a similar interface to NumPy arrays for reading and writing data. For example, the entire array can be filled with a scalar value:

>>> z[:] = 42
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 1.8M; ratio: 215.1; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Notice that the values of nbytes_stored, ratio and initialized have changed. This is because when a Zarr array is first created, none of the chunks are initialized. Writing data into the array will cause the necessary chunks to be initialized.
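
The nchunks_initialized property reports how many chunks now hold data, e.g.:

>>> z.nchunks_initialized
100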

Regions of the array can also be written to, e.g.:

>>> import numpy as np
>>> z[0, :] = np.arange(10000)
>>> z[:, 0] = np.arange(10000)

The contents of the array can be retrieved by slicing, which will load the requested region into a NumPy array, e.g.:

>>> z[0, 0]
0
>>> z[-1, -1]
42
>>> z[0, :]
array([   0,    1,    2, ..., 9997, 9998, 9999], dtype=int32)
>>> z[:, 0]
array([   0,    1,    2, ..., 9997, 9998, 9999], dtype=int32)
>>> z[:]
array([[   0,    1,    2, ..., 9997, 9998, 9999],
       [   1,   42,   42, ...,   42,   42,   42],
       [   2,   42,   42, ...,   42,   42,   42],
       ...,
       [9997,   42,   42, ...,   42,   42,   42],
       [9998,   42,   42, ...,   42,   42,   42],
       [9999,   42,   42, ...,   42,   42,   42]], dtype=int32)

Persistent arrays

In the examples above, compressed data for each chunk of the array was stored in memory. Zarr arrays can also be stored on a file system, enabling persistence of data between sessions. For example:

>>> z1 = zarr.open_array('example.zarr', mode='w', shape=(10000, 10000),
...                      chunks=(1000, 1000), dtype='i4', fill_value=0)
>>> z1
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: DirectoryStore

The array above will store its configuration metadata and all compressed chunk data in a directory called ‘example.zarr’ relative to the current working directory. The zarr.creation.open_array() function provides a convenient way to create a new persistent array or continue working with an existing array. Note that there is no need to close an array, and data are automatically flushed to disk whenever an array is modified.

Persistent arrays support the same interface for reading and writing data, e.g.:

>>> z1[:] = 42
>>> z1[0, :] = np.arange(10000)
>>> z1[:, 0] = np.arange(10000)

Check that the data have been written and can be read again:

>>> z2 = zarr.open_array('example.zarr', mode='r')
>>> z2
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 1.9M; ratio: 204.5; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: DirectoryStore
>>> np.all(z1[:] == z2[:])
True

Resizing and appending

A Zarr array can be resized, which means that any of its dimensions can be increased or decreased in length. For example:

>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z[:] = 42
>>> z.resize(20000, 10000)
>>> z
Array((20000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 1.5G; nbytes_stored: 3.6M; ratio: 422.3; initialized: 100/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Note that when an array is resized, the underlying data are not rearranged in any way. If one or more dimensions are shrunk, any chunks falling outside the new array shape will be deleted from the underlying store.
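
Continuing the example above, shrinking the array deletes any chunks that fall outside the new shape, which is reflected in the chunk counts:

>>> z.resize(2000, 2000)
>>> z.shape
(2000, 2000)
>>> z.nchunks
4
>>> z.nchunks_initialized
4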

For convenience, Zarr arrays also provide an append() method, which can be used to append data to any axis. E.g.:

>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z
Array((10000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 38.1M; nbytes_stored: 1.9M; ratio: 20.3; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(a)
(20000, 1000)
>>> z
Array((20000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 76.3M; nbytes_stored: 3.8M; ratio: 20.3; initialized: 200/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(np.vstack([a, a]), axis=1)
(20000, 2000)
>>> z
Array((20000, 2000), int32, chunks=(1000, 100), order=C)
  nbytes: 152.6M; nbytes_stored: 7.5M; ratio: 20.3; initialized: 400/400
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Compressors

By default, Zarr uses the Blosc compression library to compress each chunk of an array. Blosc is extremely fast and can be configured in a variety of ways to improve the compression ratio for different types of data. Blosc is in fact a “meta-compressor”, which means that it can use a number of different compression algorithms internally to compress the data. Blosc also provides highly optimized implementations of byte and bit shuffle filters, which can significantly improve compression ratios for some data.

Different compressors can be provided via the compressor keyword argument accepted by all array creation functions. For example:

>>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
...                chunks=(1000, 1000),
...                compressor=zarr.Blosc(cname='zstd', clevel=3, shuffle=2))
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 4.4M; ratio: 87.6; initialized: 100/100
  compressor: Blosc(cname='zstd', clevel=3, shuffle=2)
  store: dict

The array above will use Blosc as the primary compressor, using the Zstandard algorithm (compression level 3) internally within Blosc, and with the bitshuffle filter applied.

A list of the internal compression libraries available within Blosc can be obtained via:

>>> from zarr import blosc
>>> blosc.list_compressors()
['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd']

In addition to Blosc, other compression libraries can also be used. Zarr comes with support for zlib, BZ2 and LZMA compression, via the Python standard library. For example, here is an array using zlib compression, level 1:

>>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
...                chunks=(1000, 1000),
...                compressor=zarr.Zlib(level=1))
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 132.2M; ratio: 2.9; initialized: 100/100
  compressor: Zlib(level=1)
  store: dict

Here is an example using LZMA with a custom filter pipeline including LZMA’s built-in delta filter:

>>> import lzma
>>> lzma_filters = [dict(id=lzma.FILTER_DELTA, dist=4),
...                 dict(id=lzma.FILTER_LZMA2, preset=1)]
>>> compressor = zarr.LZMA(filters=lzma_filters)
>>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
...                chunks=(1000, 1000), compressor=compressor)
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 248.9K; ratio: 1569.7; initialized: 100/100
  compressor: LZMA(format=1, check=-1, preset=None, filters=[{'dist': 4, 'id': 3}, {'preset': 1, 'id': 33}])
  store: dict

The default compressor can be changed by setting the value of the zarr.storage.default_compressor variable, e.g.:

>>> import zarr.storage
>>> # switch to using Zstandard via Blosc by default
... zarr.storage.default_compressor = zarr.Blosc(cname='zstd', clevel=1, shuffle=1)
>>> z = zarr.zeros(100000000, chunks=1000000)
>>> z
Array((100000000,), float64, chunks=(1000000,), order=C)
  nbytes: 762.9M; nbytes_stored: 302; ratio: 2649006.6; initialized: 0/100
  compressor: Blosc(cname='zstd', clevel=1, shuffle=1)
  store: dict
>>> # switch back to Blosc defaults
... zarr.storage.default_compressor = zarr.Blosc()

To disable compression, set compressor=None when creating an array, e.g.:

>>> z = zarr.zeros(100000000, chunks=1000000, compressor=None)
>>> z
Array((100000000,), float64, chunks=(1000000,), order=C)
  nbytes: 762.9M; nbytes_stored: 209; ratio: 3827751.2; initialized: 0/100
  store: dict

Filters

In some cases, compression can be improved by transforming the data in some way. For example, if nearby values tend to be correlated, then shuffling the bytes within each numerical value or storing the difference between adjacent values may increase compression ratio. Some compressors provide built-in filters that apply transformations to the data prior to compression. For example, the Blosc compressor has highly optimized built-in implementations of byte- and bit-shuffle filters, and the LZMA compressor has a built-in implementation of a delta filter. However, to provide additional flexibility for implementing and using filters in combination with different compressors, Zarr also provides a mechanism for configuring filters outside of the primary compressor.

Here is an example using the Zarr delta filter with the Blosc compressor:

>>> filters = [zarr.Delta(dtype='i4')]
>>> compressor = zarr.Blosc(cname='zstd', clevel=1, shuffle=1)
>>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
...                chunks=(1000, 1000), filters=filters, compressor=compressor)
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 633.4K; ratio: 616.7; initialized: 100/100
  filters: Delta(dtype=int32)
  compressor: Blosc(cname='zstd', clevel=1, shuffle=1)
  store: dict

Zarr comes with implementations of delta, scale-offset, quantize, packbits and categorize filters. It is also relatively straightforward to implement custom filters. For more information see the zarr.codecs API docs.
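
For example, here is a small sketch using the quantize filter to reduce the precision of floating-point data prior to compression (assuming the zarr.Quantize codec accepts digits and dtype arguments):

>>> quantize = zarr.Quantize(digits=1, dtype='f8')
>>> z = zarr.array(np.random.normal(size=1000000), chunks=100000,
...                filters=[quantize])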

Parallel computing and synchronization

Zarr arrays can be used as either the source or sink for data in parallel computations. Both multi-threaded and multi-process parallelism are supported. The Python global interpreter lock (GIL) is released for both compression and decompression operations, so Zarr will not block other Python threads from running.

A Zarr array can be read concurrently by multiple threads or processes. No synchronization (i.e., locking) is required for concurrent reads.

A Zarr array can also be written to concurrently by multiple threads or processes. Some synchronization may be required, depending on the way the data is being written.

If each worker in a parallel computation is writing to a separate region of the array, and if region boundaries are perfectly aligned with chunk boundaries, then no synchronization is required. However, if region and chunk boundaries are not perfectly aligned, then synchronization is required to avoid two workers attempting to modify the same chunk at the same time.

To give a simple example, consider a 1-dimensional array of length 60, z, divided into three chunks of 20 elements each. If three workers are running and each attempts to write to a 20 element region (i.e., z[0:20], z[20:40] and z[40:60]) then each worker will be writing to a separate chunk and no synchronization is required. However, if two workers are running and each attempts to write to a 30 element region (i.e., z[0:30] and z[30:60]) then it is possible both workers will attempt to modify the middle chunk at the same time, and synchronization is required to prevent data loss.
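
To illustrate the aligned case, here is a minimal sketch in which three threads each write to a separate chunk, so no synchronizer is required:

>>> import numpy as np
>>> from concurrent.futures import ThreadPoolExecutor
>>> z = zarr.zeros(60, chunks=20, dtype='i4')
>>> def fill(start, stop):
...     z[start:stop] = np.arange(start, stop)
...
>>> with ThreadPoolExecutor(max_workers=3) as pool:
...     for start in (0, 20, 40):
...         _ = pool.submit(fill, start, start + 20)
...
>>> np.all(z[:] == np.arange(60))
True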

Zarr provides support for chunk-level synchronization. E.g., create an array with thread synchronization:

>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4',
...                 synchronizer=zarr.ThreadSynchronizer())
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict; synchronizer: ThreadSynchronizer

This array is safe to read or write within a multi-threaded program.

Zarr also provides support for process synchronization via file locking, provided that all processes have access to a shared file system. E.g.:

>>> synchronizer = zarr.ProcessSynchronizer('example.sync')
>>> z = zarr.open_array('example', mode='w', shape=(10000, 10000),
...                     chunks=(1000, 1000), dtype='i4',
...                     synchronizer=synchronizer)
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: DirectoryStore; synchronizer: ProcessSynchronizer

This array is safe to read or write from multiple processes.

User attributes

Zarr arrays also support custom key/value attributes, which can be useful for associating an array with application-specific metadata. For example:

>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z.attrs['foo'] = 'bar'
>>> z.attrs['baz'] = 42
>>> sorted(z.attrs)
['baz', 'foo']
>>> 'foo' in z.attrs
True
>>> z.attrs['foo']
'bar'
>>> z.attrs['baz']
42

Internally Zarr uses JSON to store array attributes, so attribute values must be JSON serializable.
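
Any JSON-serializable value can be stored, including lists and dicts, e.g.:

>>> z.attrs['history'] = ['created', 'filled']
>>> z.attrs['history']
['created', 'filled']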

Groups

Zarr supports hierarchical organization of arrays via groups. As with arrays, groups can be stored in memory, on disk, or via other storage systems that support a similar interface.

To create a group, use the zarr.hierarchy.group() function:

>>> root_group = zarr.group()
>>> root_group
Group(/, 0)
  store: DictStore

Groups have a similar API to the Group class from h5py. For example, groups can contain other groups:

>>> foo_group = root_group.create_group('foo')
>>> bar_group = foo_group.create_group('bar')

Groups can also contain arrays, e.g.:

>>> z1 = bar_group.zeros('baz', shape=(10000, 10000), chunks=(1000, 1000), dtype='i4',
...                      compressor=zarr.Blosc(cname='zstd', clevel=1, shuffle=1))
>>> z1
Array(/foo/bar/baz, (10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 324; ratio: 1234567.9; initialized: 0/100
  compressor: Blosc(cname='zstd', clevel=1, shuffle=1)
  store: DictStore

Arrays are known as “datasets” in HDF5 terminology. For compatibility with h5py, Zarr groups also implement the zarr.hierarchy.Group.create_dataset() and zarr.hierarchy.Group.require_dataset() methods, e.g.:

>>> z = bar_group.create_dataset('quux', shape=(10000, 10000),
...                              chunks=(1000, 1000), dtype='i4',
...                              fill_value=0, compression='gzip',
...                              compression_opts=1)
>>> z
Array(/foo/bar/quux, (10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 275; ratio: 1454545.5; initialized: 0/100
  compressor: Zlib(level=1)
  store: DictStore

Members of a group can be accessed via the suffix notation, e.g.:

>>> root_group['foo']
Group(/foo, 1)
  groups: 1; bar
  store: DictStore

The ‘/’ character can be used to access multiple levels of the hierarchy, e.g.:

>>> root_group['foo/bar']
Group(/foo/bar, 2)
  arrays: 2; baz, quux
  store: DictStore
>>> root_group['foo/bar/baz']
Array(/foo/bar/baz, (10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 324; ratio: 1234567.9; initialized: 0/100
  compressor: Blosc(cname='zstd', clevel=1, shuffle=1)
  store: DictStore

The zarr.hierarchy.open_group() function provides a convenient way to create or re-open a group stored in a directory on the file system, with sub-groups stored in sub-directories, e.g.:

>>> persistent_group = zarr.open_group('example', mode='w')
>>> persistent_group
Group(/, 0)
  store: DirectoryStore
>>> z = persistent_group.create_dataset('foo/bar/baz', shape=(10000, 10000),
...                                     chunks=(1000, 1000), dtype='i4',
...                                     fill_value=0)
>>> z
Array(/foo/bar/baz, (10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: DirectoryStore

For more information on groups see the zarr.hierarchy API docs.

Tips and tricks

Copying large arrays

Data can be copied between large arrays without needing much memory, e.g.:

>>> z1 = zarr.empty((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z1[:] = 42
>>> z2 = zarr.empty_like(z1)
>>> z2[:] = z1

Internally the example above works chunk-by-chunk, extracting only the data from z1 required to fill each chunk in z2. The source of the data (z1) could equally be an h5py Dataset.

Changing memory layout

The memory layout of each chunk of an array can be changed via the order keyword argument, to use either C (row-major) or Fortran (column-major) order. For multi-dimensional arrays, these two layouts may provide different compression ratios, depending on the correlation structure within the data. E.g.:

>>> a = np.arange(100000000, dtype='i4').reshape(10000, 10000).T
>>> zarr.array(a, chunks=(1000, 1000))
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 26.3M; ratio: 14.5; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> zarr.array(a, chunks=(1000, 1000), order='F')
Array((10000, 10000), int32, chunks=(1000, 1000), order=F)
  nbytes: 381.5M; nbytes_stored: 9.2M; ratio: 41.6; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

In the above example, Fortran order gives a better compression ratio. This is an artificial example, but it illustrates the general point that changing the memory layout within chunks of an array may improve the compression ratio, depending on the structure of the data, the compression algorithm used, and which compression filters (e.g., byte shuffle) have been applied.

Storage alternatives

Zarr can use any object that implements the MutableMapping interface as the store for a group or an array.
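
For example, a plain Python dict can be used as a store, which also makes it easy to see exactly what Zarr writes:

>>> store = dict()
>>> z = zarr.zeros((1000, 1000), chunks=(100, 100), dtype='i4', store=store)
>>> sorted(store.keys())
['.zarray', '.zattrs']
>>> z[0, 0] = 42
>>> '0.0' in store
True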

Here is an example storing an array directly into a Zip file:

>>> store = zarr.ZipStore('example.zip', mode='w')
>>> z = zarr.zeros((1000, 1000), chunks=(100, 100), dtype='i4', store=store)
>>> z
Array((1000, 1000), int32, chunks=(100, 100), order=C)
  nbytes: 3.8M; nbytes_stored: 319; ratio: 12539.2; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: ZipStore
>>> z[:] = 42
>>> z
Array((1000, 1000), int32, chunks=(100, 100), order=C)
  nbytes: 3.8M; nbytes_stored: 21.8K; ratio: 179.2; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: ZipStore
>>> store.close()
>>> import os
>>> os.path.getsize('example.zip')
30721

Re-open and check that data have been written:

>>> store = zarr.ZipStore('example.zip', mode='r')
>>> z = zarr.Array(store)
>>> z
Array((1000, 1000), int32, chunks=(100, 100), order=C)
  nbytes: 3.8M; nbytes_stored: 21.8K; ratio: 179.2; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: ZipStore
>>> z[:]
array([[42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       ...,
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42]], dtype=int32)
>>> store.close()

Note that there are some restrictions on how Zip files can be used, because items within a Zip file cannot be updated in place. This means that data in the array should only be written once and write operations should be aligned with chunk boundaries.

Note also that the close() method must be called after writing any data to the store, otherwise essential records will not be written to the underlying zip file.

The Dask project has implementations of the MutableMapping interface for distributed storage systems; see the S3Map and HDFSMap classes.

Chunk size and shape

In general, chunks of at least 1 megabyte (1M) seem to provide the best performance, at least when using the Blosc compression library.

The optimal chunk shape will depend on how you want to access the data. E.g., for a 2-dimensional array, if you only ever take slices along the first dimension, then chunk across the second dimension. If you know you want to chunk across an entire dimension you can use None within the chunks argument, e.g.:

>>> z1 = zarr.zeros((10000, 10000), chunks=(100, None), dtype='i4')
>>> z1.chunks
(100, 10000)

Alternatively, if you only ever take slices along the second dimension, then chunk across the first dimension, e.g.:

>>> z2 = zarr.zeros((10000, 10000), chunks=(None, 100), dtype='i4')
>>> z2.chunks
(10000, 100)

If you require reasonable performance for both access patterns then you need to find a compromise, e.g.:

>>> z3 = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z3.chunks
(1000, 1000)

If you are feeling lazy, you can let Zarr guess a chunk shape for your data, although please note that the algorithm for guessing a chunk shape is based on simple heuristics and may be far from optimal. E.g.:

>>> z4 = zarr.zeros((10000, 10000), dtype='i4')
>>> z4.chunks
(313, 313)

Configuring Blosc

The Blosc compressor is able to use multiple threads internally to accelerate compression and decompression. By default, Zarr allows Blosc to use up to 8 internal threads. The number of internal threads can be changed via the blosc.set_nthreads() function, which returns the previous setting, e.g.:

>>> from zarr import blosc
>>> blosc.set_nthreads(2)
8

When a Zarr array is being used within a multi-threaded program, Zarr automatically switches to using Blosc in a single-threaded “contextual” mode. This is generally better as it allows multiple program threads to use Blosc simultaneously and prevents CPU thrashing from too many active threads. If you want to manually override this behaviour, set the value of the blosc.use_threads variable to True (Blosc always uses multiple internal threads) or False (Blosc always runs in single-threaded contextual mode). To re-enable automatic switching, set blosc.use_threads to None.
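
For example:

>>> from zarr import blosc
>>> blosc.use_threads = False  # always use single-threaded contextual mode
>>> blosc.use_threads = True   # always use multiple internal threads
>>> blosc.use_threads = None   # restore automatic switching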

API reference

Array creation (zarr.creation)

zarr.creation.create(shape, chunks=None, dtype=None, compressor='default', fill_value=0, order='C', store=None, synchronizer=None, overwrite=False, path=None, chunk_store=None, filters=None, cache_metadata=True, **kwargs)

Create an array.

Parameters:

shape : int or tuple of ints

Array shape.

chunks : int or tuple of ints, optional

Chunk shape. If not provided, will be guessed from shape and dtype.

dtype : string or dtype, optional

NumPy dtype.

compressor : Codec, optional

Primary compressor.

fill_value : object

Default value to use for uninitialized portions of the array.

order : {‘C’, ‘F’}, optional

Memory layout to be used within each chunk.

store : MutableMapping or string

Store or path to directory in file system.

synchronizer : object, optional

Array synchronizer.

overwrite : bool, optional

If True, delete all pre-existing data in store at path before creating the array.

path : string, optional

Path under which array is stored.

chunk_store : MutableMapping, optional

Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.

filters : sequence of Codecs, optional

Sequence of filters to use to encode chunk data prior to compression.

cache_metadata : bool, optional

If True, array configuration metadata will be cached for the lifetime of the object. If False, array metadata will be reloaded prior to all data access and modification operations (may incur overhead depending on storage and data access pattern).

Returns:

z : zarr.core.Array

Examples

Create an array with default settings:

>>> import zarr
>>> z = zarr.create((10000, 10000), chunks=(1000, 1000))
>>> z
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 323; ratio: 2476780.2; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
zarr.creation.empty(shape, **kwargs)

Create an empty array.

For parameter definitions see zarr.creation.create().

Notes

The contents of an empty Zarr array are not defined. On attempting to retrieve data from an empty Zarr array, any values may be returned, and these are not guaranteed to be stable from one access to the next.

zarr.creation.zeros(shape, **kwargs)

Create an array, with zero being used as the default value for uninitialized portions of the array.

For parameter definitions see zarr.creation.create().

Examples

>>> import zarr
>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000))
>>> z
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 323; ratio: 2476780.2; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z[:2, :2]
array([[ 0.,  0.],
       [ 0.,  0.]])
zarr.creation.ones(shape, **kwargs)

Create an array, with one being used as the default value for uninitialized portions of the array.

For parameter definitions see zarr.creation.create().

Examples

>>> import zarr
>>> z = zarr.ones((10000, 10000), chunks=(1000, 1000))
>>> z
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 323; ratio: 2476780.2; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z[:2, :2]
array([[ 1.,  1.],
       [ 1.,  1.]])
zarr.creation.full(shape, fill_value, **kwargs)

Create an array, with fill_value being used as the default value for uninitialized portions of the array.

For parameter definitions see zarr.creation.create().

Examples

>>> import zarr
>>> z = zarr.full((10000, 10000), chunks=(1000, 1000), fill_value=42)
>>> z
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 324; ratio: 2469135.8; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z[:2, :2]
array([[ 42.,  42.],
       [ 42.,  42.]])
zarr.creation.array(data, **kwargs)

Create an array filled with data.

The data argument should be a NumPy array or array-like object. For other parameter definitions see zarr.creation.create().

Examples

>>> import numpy as np
>>> import zarr
>>> a = np.arange(100000000).reshape(10000, 10000)
>>> z = zarr.array(a, chunks=(1000, 1000))
>>> z
Array((10000, 10000), int64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 15.2M; ratio: 50.2; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
zarr.creation.open_array(store=None, mode='a', shape=None, chunks=None, dtype=None, compressor='default', fill_value=0, order='C', synchronizer=None, filters=None, cache_metadata=True, path=None, **kwargs)

Open an array using mode-like semantics.

Parameters:

store : MutableMapping or string

Store or path to directory in file system.

mode : {‘r’, ‘r+’, ‘a’, ‘w’, ‘w-‘}

Persistence mode: ‘r’ means read only (must exist); ‘r+’ means read/write (must exist); ‘a’ means read/write (create if doesn’t exist); ‘w’ means create (overwrite if exists); ‘w-‘ means create (fail if exists).

shape : int or tuple of ints

Array shape.

chunks : int or tuple of ints, optional

Chunk shape. If not provided, will be guessed from shape and dtype.

dtype : string or dtype, optional

NumPy dtype.

compressor : Codec, optional

Primary compressor.

fill_value : object

Default value to use for uninitialized portions of the array.

order : {‘C’, ‘F’}, optional

Memory layout to be used within each chunk.

synchronizer : object, optional

Array synchronizer.

filters : sequence, optional

Sequence of filters to use to encode chunk data prior to compression.

cache_metadata : bool, optional

If True, array configuration metadata will be cached for the lifetime of the object. If False, array metadata will be reloaded prior to all data access and modification operations (may incur overhead depending on storage and data access pattern).

path : string, optional

Array path.

Returns:

z : zarr.core.Array

Notes

There is no need to close an array. Data are automatically flushed to the file system.

Examples

>>> import numpy as np
>>> import zarr
>>> z1 = zarr.open_array('example.zarr', mode='w', shape=(10000, 10000),
...                      chunks=(1000, 1000), fill_value=0)
>>> z1[:] = np.arange(100000000).reshape(10000, 10000)
>>> z1
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 23.0M; ratio: 33.2; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: DirectoryStore
>>> z2 = zarr.open_array('example.zarr', mode='r')
>>> z2
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 23.0M; ratio: 33.2; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: DirectoryStore
>>> np.all(z1[:] == z2[:])
True
zarr.creation.empty_like(a, **kwargs)

Create an empty array like a.

zarr.creation.zeros_like(a, **kwargs)

Create an array of zeros like a.

zarr.creation.ones_like(a, **kwargs)

Create an array of ones like a.

zarr.creation.full_like(a, **kwargs)

Create a filled array like a.

zarr.creation.open_like(a, path, **kwargs)

Open a persistent array like a.

The Array class (zarr.core)

class zarr.core.Array(store, path=None, read_only=False, chunk_store=None, synchronizer=None, cache_metadata=True)

Instantiate an array from an initialized store.

Parameters:

store : MutableMapping

Array store, already initialized.

path : string, optional

Storage path.

read_only : bool, optional

True if array should be protected against modification.

chunk_store : MutableMapping, optional

Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.

synchronizer : object, optional

Array synchronizer.

cache_metadata : bool, optional

If True, array configuration metadata will be cached for the lifetime of the object. If False, array metadata will be reloaded prior to all data access and modification operations (may incur overhead depending on storage and data access pattern).

Attributes

store A MutableMapping providing the underlying storage for the array.
path Storage path.
name Array name following h5py convention.
read_only A boolean, True if modification operations are not permitted.
chunk_store A MutableMapping providing the underlying storage for array chunks.
shape A tuple of integers describing the length of each dimension of the array.
chunks A tuple of integers describing the length of each dimension of a chunk of the array.
dtype The NumPy data type.
fill_value A value used for uninitialized portions of the array.
order A string indicating the memory layout (‘C’ or ‘F’) used within chunks of the array.
synchronizer Object used to synchronize write access to the array.
filters One or more codecs used to transform data prior to compression.
attrs A MutableMapping containing user-defined attributes.
size The total number of elements in the array.
itemsize The size in bytes of each item in the array.
nbytes The total number of bytes that would be required to store the array without compression.
nbytes_stored The total number of stored bytes of data for the array.
cdata_shape A tuple of integers describing the number of chunks along each dimension of the array.
nchunks Total number of chunks.
nchunks_initialized The number of chunks that have been initialized with some data.
is_view A boolean, True if this array is a view on another array.
compression Primary compressor (property provided for h5py compatibility).
compression_opts Compressor configuration options (property provided for h5py compatibility).

Methods

__getitem__(item) Retrieve data for some portion of the array.
__setitem__(item, value) Modify data for some portion of the array.
resize(*args) Change the shape of the array by growing or shrinking one or more dimensions.
append(data[, axis]) Append data to axis.
view([shape, chunks, dtype, fill_value, ...]) Return an array sharing the same data.
__getitem__(item)

Retrieve data for some portion of the array. Most NumPy-style slicing operations are supported.

Returns:

out : ndarray

A NumPy array containing the data for the requested region.

Examples

Setup a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100000000), chunks=1000000, dtype='i4')
>>> z
Array((100000000,), int32, chunks=(1000000,), order=C)
  nbytes: 381.5M; nbytes_stored: 6.4M; ratio: 59.9; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Take some slices:

>>> z[5]
5
>>> z[:5]
array([0, 1, 2, 3, 4], dtype=int32)
>>> z[-5:]
array([99999995, 99999996, 99999997, 99999998, 99999999], dtype=int32)
>>> z[5:10]
array([5, 6, 7, 8, 9], dtype=int32)
>>> z[:]
array([       0,        1,        2, ..., 99999997, 99999998, 99999999], dtype=int32)

Setup a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100000000).reshape(10000, 10000),
...                chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 9.2M; ratio: 41.6; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Take some slices:

>>> z[2, 2]
20002
>>> z[:2, :2]
array([[    0,     1],
       [10000, 10001]], dtype=int32)
>>> z[:2]
array([[    0,     1,     2, ...,  9997,  9998,  9999],
       [10000, 10001, 10002, ..., 19997, 19998, 19999]], dtype=int32)
>>> z[:, :2]
array([[       0,        1],
       [   10000,    10001],
       [   20000,    20001],
       ...,
       [99970000, 99970001],
       [99980000, 99980001],
       [99990000, 99990001]], dtype=int32)
>>> z[:]
array([[       0,        1,        2, ...,     9997,     9998,     9999],
       [   10000,    10001,    10002, ...,    19997,    19998,    19999],
       [   20000,    20001,    20002, ...,    29997,    29998,    29999],
       ...,
       [99970000, 99970001, 99970002, ..., 99979997, 99979998, 99979999],
       [99980000, 99980001, 99980002, ..., 99989997, 99989998, 99989999],
       [99990000, 99990001, 99990002, ..., 99999997, 99999998, 99999999]], dtype=int32)
__setitem__(item, value)

Modify data for some portion of the array.

Examples

Setup a 1-dimensional array:

>>> import zarr
>>> z = zarr.zeros(100000000, chunks=1000000, dtype='i4')
>>> z
Array((100000000,), int32, chunks=(1000000,), order=C)
  nbytes: 381.5M; nbytes_stored: 301; ratio: 1328903.7; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Set all array elements to the same scalar value:

>>> z[:] = 42
>>> z[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int32)

Set a portion of the array:

>>> z[:100] = np.arange(100)
>>> z[-100:] = np.arange(100)[::-1]
>>> z[:]
array([0, 1, 2, ..., 2, 1, 0], dtype=int32)

Setup a 2-dimensional array:

>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Set all array elements to the same scalar value:

>>> z[:] = 42
>>> z[:]
array([[42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       ...,
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42]], dtype=int32)

Set a portion of the array:

>>> z[0, :] = np.arange(z.shape[1])
>>> z[:, 0] = np.arange(z.shape[0])
>>> z[:]
array([[   0,    1,    2, ..., 9997, 9998, 9999],
       [   1,   42,   42, ...,   42,   42,   42],
       [   2,   42,   42, ...,   42,   42,   42],
       ...,
       [9997,   42,   42, ...,   42,   42,   42],
       [9998,   42,   42, ...,   42,   42,   42],
       [9999,   42,   42, ...,   42,   42,   42]], dtype=int32)
resize(*args)

Change the shape of the array by growing or shrinking one or more dimensions.

Notes

When resizing an array, the data are not rearranged in any way.

If one or more dimensions are shrunk, any chunks falling outside the new array shape will be deleted from the underlying store.

Examples

>>> import zarr
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 323; ratio: 2476780.2; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.resize(20000, 10000)
>>> z
Array((20000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 1.5G; nbytes_stored: 323; ratio: 4953560.4; initialized: 0/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.resize(30000, 1000)
>>> z
Array((30000, 1000), float64, chunks=(1000, 1000), order=C)
  nbytes: 228.9M; nbytes_stored: 322; ratio: 745341.6; initialized: 0/30
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
append(data, axis=0)

Append data to axis.

Parameters:

data : array_like

Data to be appended.

axis : int

Axis along which to append.

Returns:

new_shape : tuple

Notes

The size of all dimensions other than axis must match between this array and data.

Examples

>>> import numpy as np
>>> import zarr
>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z
Array((10000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 38.1M; nbytes_stored: 1.9M; ratio: 20.3; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(a)
(20000, 1000)
>>> z
Array((20000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 76.3M; nbytes_stored: 3.8M; ratio: 20.3; initialized: 200/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(np.vstack([a, a]), axis=1)
(20000, 2000)
>>> z
Array((20000, 2000), int32, chunks=(1000, 100), order=C)
  nbytes: 152.6M; nbytes_stored: 7.5M; ratio: 20.3; initialized: 400/400
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
view(shape=None, chunks=None, dtype=None, fill_value=None, filters=None, read_only=None, synchronizer=None)

Return an array sharing the same data.

Parameters:

shape : int or tuple of ints

Array shape.

chunks : int or tuple of ints, optional

Chunk shape.

dtype : string or dtype, optional

NumPy dtype.

fill_value : object

Default value to use for uninitialized portions of the array.

filters : sequence, optional

Sequence of filters to use to encode chunk data prior to compression.

read_only : bool, optional

True if array should be protected against modification.

synchronizer : object, optional

Array synchronizer.

Notes

WARNING: This is an experimental feature and should be used with care. There are plenty of ways to generate errors and/or cause data corruption.

Examples

Bypass filters:

>>> import zarr
>>> import numpy as np
>>> np.random.seed(42)
>>> labels = [b'female', b'male']
>>> data = np.random.choice(labels, size=10000)
>>> filters = [zarr.Categorize(labels=labels,
...                                  dtype=data.dtype,
...                                  astype='u1')]
>>> a = zarr.array(data, chunks=1000, filters=filters)
>>> a[:]
array([b'female', b'male', b'female', ..., b'male', b'male', b'female'],
      dtype='|S6')
>>> v = a.view(dtype='u1', filters=[])
>>> v.is_view
True
>>> v[:]
array([1, 2, 1, ..., 2, 2, 1], dtype=uint8)

Views can be used to modify data:

>>> x = v[:]
>>> x.sort()
>>> v[:] = x
>>> v[:]
array([1, 1, 1, ..., 2, 2, 2], dtype=uint8)
>>> a[:]
array([b'female', b'female', b'female', ..., b'male', b'male', b'male'],
      dtype='|S6')

View as a different dtype with the same itemsize:

>>> data = np.random.randint(0, 2, size=10000, dtype='u1')
>>> a = zarr.array(data, chunks=1000)
>>> a[:]
array([0, 0, 1, ..., 1, 0, 0], dtype=uint8)
>>> v = a.view(dtype=bool)
>>> v[:]
array([False, False,  True, ...,  True, False, False], dtype=bool)
>>> np.all(a[:].view(dtype=bool) == v[:])
True

An array can be viewed with a dtype with a different itemsize, however some care is needed to adjust the shape and chunk shape so that chunk data is interpreted correctly:

>>> data = np.arange(10000, dtype='u2')
>>> a = zarr.array(data, chunks=1000)
>>> a[:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16)
>>> v = a.view(dtype='u1', shape=20000, chunks=2000)
>>> v[:10]
array([0, 0, 1, 0, 2, 0, 3, 0, 4, 0], dtype=uint8)
>>> np.all(a[:].view('u1') == v[:])
True

Change fill value for uninitialized chunks:

>>> a = zarr.full(10000, chunks=1000, fill_value=-1, dtype='i1')
>>> a[:]
array([-1, -1, -1, ..., -1, -1, -1], dtype=int8)
>>> v = a.view(fill_value=42)
>>> v[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int8)

Note that resizing or appending to views is not permitted:

>>> a = zarr.empty(10000)
>>> v = a.view()
>>> try:
...     v.resize(20000)
... except PermissionError as e:
...     print(e)
not permitted for views

Groups (zarr.hierarchy)

zarr.hierarchy.group(store=None, overwrite=False, chunk_store=None, synchronizer=None, path=None)

Create a group.

Parameters:

store : MutableMapping or string

Store or path to directory in file system.

overwrite : bool, optional

If True, delete any pre-existing data in store at path before creating the group.

chunk_store : MutableMapping, optional

Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.

synchronizer : object, optional

Array synchronizer.

path : string, optional

Group path.

Returns:

g : zarr.hierarchy.Group

Examples

Create a group in memory:

>>> import zarr
>>> g = zarr.group()
>>> g
Group(/, 0)
  store: DictStore

Create a group with a different store:

>>> store = zarr.DirectoryStore('example')
>>> g = zarr.group(store=store, overwrite=True)
>>> g
Group(/, 0)
  store: DirectoryStore
zarr.hierarchy.open_group(store=None, mode='a', synchronizer=None, path=None)

Open a group using mode-like semantics.

Parameters:

store : MutableMapping or string

Store or path to directory in file system.

mode : {‘r’, ‘r+’, ‘a’, ‘w’, ‘w-‘}

Persistence mode: ‘r’ means read only (must exist); ‘r+’ means read/write (must exist); ‘a’ means read/write (create if doesn’t exist); ‘w’ means create (overwrite if exists); ‘w-‘ means create (fail if exists).

synchronizer : object, optional

Array synchronizer.

path : string, optional

Group path.

Returns:

g : zarr.hierarchy.Group

Examples

>>> import zarr
>>> root = zarr.open_group('example', mode='w')
>>> foo = root.create_group('foo')
>>> bar = root.create_group('bar')
>>> root
Group(/, 2)
  groups: 2; bar, foo
  store: DirectoryStore
>>> root2 = zarr.open_group('example', mode='a')
>>> root2
Group(/, 2)
  groups: 2; bar, foo
  store: DirectoryStore
>>> root == root2
True
class zarr.hierarchy.Group(store, path=None, read_only=False, chunk_store=None, synchronizer=None)

Instantiate a group from an initialized store.

Parameters:

store : MutableMapping

Group store, already initialized.

path : string, optional

Group path.

read_only : bool, optional

True if group should be protected against modification.

chunk_store : MutableMapping, optional

Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.

synchronizer : object, optional

Array synchronizer.

Attributes

store A MutableMapping providing the underlying storage for the group.
path Storage path.
name Group name following h5py convention.
read_only A boolean, True if modification operations are not permitted.
chunk_store A MutableMapping providing the underlying storage for array chunks.
synchronizer Object used to synchronize write access to groups and arrays.
attrs A MutableMapping containing user-defined attributes.

Methods

__len__() Number of members.
__iter__() Return an iterator over group member names.
__contains__(item) Test for group membership.
__getitem__(item) Obtain a group member.
group_keys() Return an iterator over member names for groups only.
groups() Return an iterator over (name, value) pairs for groups only.
array_keys() Return an iterator over member names for arrays only.
arrays() Return an iterator over (name, value) pairs for arrays only.
create_group(name[, overwrite]) Create a sub-group.
require_group(name[, overwrite]) Obtain a sub-group, creating one if it doesn’t exist.
create_groups(*names, **kwargs) Convenience method to create multiple groups in a single call.
require_groups(*names) Convenience method to require multiple groups in a single call.
create_dataset(name, **kwargs) Create an array.
require_dataset(name, shape[, dtype, exact]) Obtain an array, creating if it doesn’t exist.
create(name, **kwargs) Create an array.
empty(name, **kwargs) Create an array.
zeros(name, **kwargs) Create an array.
ones(name, **kwargs) Create an array.
full(name, fill_value, **kwargs) Create an array.
array(name, data, **kwargs) Create an array.
empty_like(name, data, **kwargs) Create an array.
zeros_like(name, data, **kwargs) Create an array.
ones_like(name, data, **kwargs) Create an array.
full_like(name, data, **kwargs) Create an array.
__len__()

Number of members.

__iter__()

Return an iterator over group member names.

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> d1 = g1.create_dataset('baz', shape=100, chunks=10)
>>> d2 = g1.create_dataset('quux', shape=200, chunks=20)
>>> for name in g1:
...     print(name)
bar
baz
foo
quux
__contains__(item)

Test for group membership.

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> d1 = g1.create_dataset('bar', shape=100, chunks=10)
>>> 'foo' in g1
True
>>> 'bar' in g1
True
>>> 'baz' in g1
False
__getitem__(item)

Obtain a group member.

Parameters:

item : string

Member name or path.

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> d1 = g1.create_dataset('foo/bar/baz', shape=100, chunks=10)
>>> g1['foo']
Group(/foo, 1)
  groups: 1; bar
  store: DictStore
>>> g1['foo/bar']
Group(/foo/bar, 1)
  arrays: 1; baz
  store: DictStore
>>> g1['foo/bar/baz']
Array(/foo/bar/baz, (100,), float64, chunks=(10,), order=C)
  nbytes: 800; nbytes_stored: 290; ratio: 2.8; initialized: 0/10
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: DictStore
group_keys()

Return an iterator over member names for groups only.

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> d1 = g1.create_dataset('baz', shape=100, chunks=10)
>>> d2 = g1.create_dataset('quux', shape=200, chunks=20)
>>> sorted(g1.group_keys())
['bar', 'foo']
groups()

Return an iterator over (name, value) pairs for groups only.

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> d1 = g1.create_dataset('baz', shape=100, chunks=10)
>>> d2 = g1.create_dataset('quux', shape=200, chunks=20)
>>> for n, v in g1.groups():
...     print(n, type(v))
bar <class 'zarr.hierarchy.Group'>
foo <class 'zarr.hierarchy.Group'>
array_keys()

Return an iterator over member names for arrays only.

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> d1 = g1.create_dataset('baz', shape=100, chunks=10)
>>> d2 = g1.create_dataset('quux', shape=200, chunks=20)
>>> sorted(g1.array_keys())
['baz', 'quux']
arrays()

Return an iterator over (name, value) pairs for arrays only.

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> d1 = g1.create_dataset('baz', shape=100, chunks=10)
>>> d2 = g1.create_dataset('quux', shape=200, chunks=20)
>>> for n, v in g1.arrays():
...     print(n, type(v))
baz <class 'zarr.core.Array'>
quux <class 'zarr.core.Array'>
create_group(name, overwrite=False)

Create a sub-group.

Parameters:

name : string

Group name.

overwrite : bool, optional

If True, overwrite any existing array with the given name.

Returns:

g : zarr.hierarchy.Group

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> g4 = g1.create_group('baz/quux')
require_group(name, overwrite=False)

Obtain a sub-group, creating one if it doesn’t exist.

Parameters:

name : string

Group name.

overwrite : bool, optional

Overwrite any existing array with given name if present.

Returns:

g : zarr.hierarchy.Group

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.require_group('foo')
>>> g3 = g1.require_group('foo')
>>> g2 == g3
True
create_groups(*names, **kwargs)

Convenience method to create multiple groups in a single call.

require_groups(*names)

Convenience method to require multiple groups in a single call.

create_dataset(name, **kwargs)

Create an array.

Parameters:

name : string

Array name.

data : array_like, optional

Initial data.

shape : int or tuple of ints

Array shape.

chunks : int or tuple of ints, optional

Chunk shape. If not provided, will be guessed from shape and dtype.

dtype : string or dtype, optional

NumPy dtype.

compressor : Codec, optional

Primary compressor.

fill_value : object

Default value to use for uninitialized portions of the array.

order : {‘C’, ‘F’}, optional

Memory layout to be used within each chunk.

synchronizer : zarr.sync.ArraySynchronizer, optional

Array synchronizer.

filters : sequence of Codecs, optional

Sequence of filters to use to encode chunk data prior to compression.

overwrite : bool, optional

If True, replace any existing array or group with the given name.

cache_metadata : bool, optional

If True, array configuration metadata will be cached for the lifetime of the object. If False, array metadata will be reloaded prior to all data access and modification operations (may incur overhead depending on storage and data access pattern).

Returns:

a : zarr.core.Array

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> d1 = g1.create_dataset('foo', shape=(10000, 10000),
...                        chunks=(1000, 1000))
>>> d1
Array(/foo, (10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 323; ratio: 2476780.2; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: DictStore
require_dataset(name, shape, dtype=None, exact=False, **kwargs)

Obtain an array, creating if it doesn’t exist. Other kwargs are as per zarr.hierarchy.Group.create_dataset().

Parameters:

name : string

Array name.

shape : int or tuple of ints

Array shape.

dtype : string or dtype, optional

NumPy dtype.

exact : bool, optional

If True, require dtype to match exactly. If False, require that dtype can be cast from the array dtype.

create(name, **kwargs)

Create an array. Keyword arguments as per zarr.creation.create().

empty(name, **kwargs)

Create an array. Keyword arguments as per zarr.creation.empty().

zeros(name, **kwargs)

Create an array. Keyword arguments as per zarr.creation.zeros().

ones(name, **kwargs)

Create an array. Keyword arguments as per zarr.creation.ones().

full(name, fill_value, **kwargs)

Create an array. Keyword arguments as per zarr.creation.full().

array(name, data, **kwargs)

Create an array. Keyword arguments as per zarr.creation.array().

empty_like(name, data, **kwargs)

Create an array. Keyword arguments as per zarr.creation.empty_like().

zeros_like(name, data, **kwargs)

Create an array. Keyword arguments as per zarr.creation.zeros_like().

ones_like(name, data, **kwargs)

Create an array. Keyword arguments as per zarr.creation.ones_like().

full_like(name, data, **kwargs)

Create an array. Keyword arguments as per zarr.creation.full_like().

Storage (zarr.storage)

This module contains storage classes for use with Zarr arrays and groups. However, note that any object implementing the MutableMapping interface can be used as a Zarr array store.

zarr.storage.init_array(store, shape, chunks=None, dtype=None, compressor='default', fill_value=None, order='C', overwrite=False, path=None, chunk_store=None, filters=None)

initialize an array store with the given configuration.

Parameters:

store : MutableMapping

A mapping that supports string keys and bytes-like values.

shape : int or tuple of ints

Array shape.

chunks : int or tuple of ints, optional

Chunk shape. If not provided, will be guessed from shape and dtype.

dtype : string or dtype, optional

NumPy dtype.

compressor : Codec, optional

Primary compressor.

fill_value : object

Default value to use for uninitialized portions of the array.

order : {‘C’, ‘F’}, optional

Memory layout to be used within each chunk.

overwrite : bool, optional

If True, erase all data in store prior to initialisation.

path : string, optional

Path under which array is stored.

chunk_store : MutableMapping, optional

Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.

filters : sequence, optional

Sequence of filters to use to encode chunk data prior to compression.

Notes

The initialisation process involves normalising all array metadata, encoding as JSON and storing under the ‘.zarray’ key. User attributes are also initialized and stored as JSON under the ‘.zattrs’ key.

Examples

Initialize an array store:

>>> from zarr.storage import init_array
>>> store = dict()
>>> init_array(store, shape=(10000, 10000), chunks=(1000, 1000))
>>> sorted(store.keys())
['.zarray', '.zattrs']

Array metadata is stored as JSON:

>>> print(str(store['.zarray'], 'ascii'))
{
    "chunks": [
        1000,
        1000
    ],
    "compressor": {
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "<f8",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [
        10000,
        10000
    ],
    "zarr_format": 2
}

User-defined attributes are also stored as JSON, initially empty:

>>> print(str(store['.zattrs'], 'ascii'))
{}

Initialize an array using a storage path:

>>> store = dict()
>>> init_array(store, shape=100000000, chunks=1000000, dtype='i1',
...            path='foo')
>>> sorted(store.keys())
['.zattrs', '.zgroup', 'foo/.zarray', 'foo/.zattrs']
>>> print(str(store['foo/.zarray'], 'ascii'))
{
    "chunks": [
        1000000
    ],
    "compressor": {
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "|i1",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [
        100000000
    ],
    "zarr_format": 2
}
zarr.storage.init_group(store, overwrite=False, path=None, chunk_store=None)

Initialize a group store.

Parameters:

store : MutableMapping

A mapping that supports string keys and byte sequence values.

overwrite : bool, optional

If True, erase all data in store prior to initialisation.

path : string, optional

Path under which the group is stored.

chunk_store : MutableMapping, optional

Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.
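
For illustration, a group store can be initialized using an in-memory dict, as in the init_array() examples above:

>>> from zarr.storage import init_group
>>> store = dict()
>>> init_group(store)
>>> sorted(store.keys())
['.zattrs', '.zgroup']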

class zarr.storage.DictStore(cls=dict)

Extended mutable mapping interface to a hierarchy of dicts.

Examples

>>> import zarr
>>> store = zarr.DictStore()
>>> store['foo'] = b'bar'
>>> store['foo']
b'bar'
>>> store['a/b/c'] = b'xxx'
>>> store['a/b/c']
b'xxx'
>>> sorted(store.keys())
['a/b/c', 'foo']
>>> store.listdir()
['a', 'foo']
>>> store.listdir('a/b')
['c']
>>> store.rmdir('a')
>>> sorted(store.keys())
['foo']
class zarr.storage.DirectoryStore(path)

Mutable Mapping interface to a directory. Keys must be strings, values must be bytes-like objects.

Parameters:

path : string

Location of directory.

Examples

>>> import zarr
>>> store = zarr.DirectoryStore('example_store')
>>> store['foo'] = b'bar'
>>> store['foo']
b'bar'
>>> open('example_store/foo', 'rb').read()
b'bar'
>>> store['a/b/c'] = b'xxx'
>>> store['a/b/c']
b'xxx'
>>> open('example_store/a/b/c', 'rb').read()
b'xxx'
>>> sorted(store.keys())
['a/b/c', 'foo']
>>> store.listdir()
['a', 'foo']
>>> store.listdir('a/b')
['c']
>>> store.rmdir('a')
>>> sorted(store.keys())
['foo']
>>> import os
>>> os.path.exists('example_store/a')
False
class zarr.storage.TempStore(suffix='', prefix='zarr', dir=None)

Directory store using a temporary directory for storage.
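
A TempStore can be used anywhere an ordinary store is expected; the sketch below assumes the default arguments:

>>> import zarr
>>> from zarr.storage import TempStore
>>> store = TempStore()
>>> root = zarr.group(store)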

class zarr.storage.ZipStore(path, compression=0, allowZip64=True, mode='a')

Mutable Mapping interface to a Zip file. Keys must be strings, values must be bytes-like objects.

Parameters:

path : string

Location of file.

compression : integer, optional

Compression method to use when writing to the archive.

allowZip64 : bool, optional

If True (the default) will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 2 GiB. If False will raise an exception when the ZIP file would require ZIP64 extensions.

mode : string, optional

One of ‘r’ to read an existing file, ‘w’ to truncate and write a new file, ‘a’ to append to an existing file, or ‘x’ to exclusively create and write a new file.

Notes

When modifying a ZipStore the close() method must be called otherwise essential data will not be written to the underlying zip file. The ZipStore class also supports the context manager protocol, which ensures the close() method is called on leaving the with statement.

Examples

>>> import zarr
>>> store = zarr.ZipStore('example.zip', mode='w')
>>> store['foo'] = b'bar'
>>> store['foo']
b'bar'
>>> store['a/b/c'] = b'xxx'
>>> store['a/b/c']
b'xxx'
>>> sorted(store.keys())
['a/b/c', 'foo']
>>> store.close()
>>> import zipfile
>>> zf = zipfile.ZipFile('example.zip', mode='r')
>>> sorted(zf.namelist())
['a/b/c', 'foo']
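
The context manager protocol mentioned in the notes above avoids the need to call close() explicitly; for example (the file name is arbitrary):

>>> with zarr.ZipStore('example_cm.zip', mode='w') as store:
...     store['foo'] = b'bar'
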
close()

Closes the underlying zip file, ensuring all records are written.

flush()

Closes the underlying zip file, ensuring all records are written, then re-opens the file for further modifications.

zarr.storage.migrate_1to2(store)

Migrate array metadata in store from Zarr format version 1 to version 2.

Parameters:

store : MutableMapping

Store to be migrated.

Notes

Version 1 did not support hierarchies, so this migration function will look for a single array in store and migrate the array metadata to version 2.
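
For illustration, migration is a single function call on the store holding the version 1 data; the directory path below is hypothetical and is assumed to contain an array written with Zarr version 1.x:

>>> import zarr
>>> from zarr.storage import migrate_1to2
>>> store = zarr.DirectoryStore('example_v1.zarr')  # hypothetical v1 data
>>> migrate_1to2(store)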

Compressors and filters (zarr.codecs)

This module contains compressor and filter classes for use with Zarr.

Other codecs can be registered dynamically with Zarr. All that is required is to implement a class that provides the same interface as the classes listed below, and then to add the class to the codec_registry. See the source code of this module for details.
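
For illustration, here is a minimal sketch of registering a custom codec, assuming codec_registry is a mapping from codec identifier to codec class; the Identity codec below is hypothetical and, for brevity, ignores the optional out buffer:

>>> from zarr.codecs import Codec, codec_registry
>>> class Identity(Codec):
...     codec_id = 'identity'  # hypothetical codec identifier
...     def encode(self, buf):
...         return buf  # pass data through unchanged
...     def decode(self, buf, out=None):
...         return buf  # ignores the optional out buffer for brevity
...     def get_config(self):
...         return {'id': self.codec_id}
...     @classmethod
...     def from_config(cls, config):
...         return cls()
>>> codec_registry[Identity.codec_id] = Identity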

class zarr.codecs.Codec

Codec abstract base class.

encode(buf)

Encode data in buf.

Parameters:

buf : buffer-like

Data to be encoded. May be any object supporting the new-style buffer protocol or array.array.

Returns:

enc : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol or array.array.

decode(buf, out=None)

Decode data in buf.

Parameters:

buf : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol or array.array.

out : buffer-like, optional

Buffer to store decoded data.

Returns:

out : buffer-like

Decoded data. May be any object supporting the new-style buffer protocol or array.array.

get_config()

Return a dictionary holding configuration parameters for this codec. All values must be compatible with JSON encoding.

classmethod from_config(config)

Instantiate from a configuration object.

class zarr.codecs.Blosc(cname='lz4', clevel=5, shuffle=1)

Provides compression using the blosc meta-compressor.

Parameters:

cname : string, optional

A string naming one of the compression algorithms available within blosc, e.g., ‘blosclz’, ‘lz4’, ‘zlib’ or ‘snappy’.

clevel : integer, optional

An integer between 0 and 9 specifying the compression level.

shuffle : integer, optional

Either 0 (no shuffle), 1 (byte shuffle) or 2 (bit shuffle).
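
For example, a Blosc compressor configured to use the 'zstd' internal compression library with bit shuffle can be passed to an array creation function via the compressor keyword argument:

>>> import zarr
>>> compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2)
>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), compressor=compressor)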

class zarr.codecs.Zlib(level=1)

Provides compression using zlib via the Python standard library.

Parameters:

level : int

Compression level.

class zarr.codecs.BZ2(level=1)

Provides compression using bzip2 via the Python standard library.

Parameters:

level : int

Compression level.

class zarr.codecs.LZMA(format=1, check=-1, preset=None, filters=None)

Provides compression using lzma via the Python standard library (only available under Python 3).

Parameters:

format : integer, optional

One of the lzma format codes, e.g., lzma.FORMAT_XZ.

check : integer, optional

One of the lzma check codes, e.g., lzma.CHECK_NONE.

preset : integer, optional

An integer between 0 and 9 inclusive, specifying the compression level.

filters : list, optional

A list of dictionaries specifying compression filters. If filters are provided, ‘preset’ must be None.
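
For example, a filter pipeline using the delta and LZMA2 filters from the standard library lzma module can be constructed as follows (note that preset is left as None because filters are provided):

>>> import lzma
>>> import zarr
>>> lzma_filters = [dict(id=lzma.FILTER_DELTA, dist=4),
...                 dict(id=lzma.FILTER_LZMA2, preset=1)]
>>> compressor = zarr.LZMA(filters=lzma_filters)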

class zarr.codecs.Delta(dtype, astype=None)

Filter to encode data as the difference between adjacent values.

Parameters:

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Notes

If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small. Note also that the encoded data for each chunk includes the absolute value of the first element in the chunk, and so the encoded data type in general needs to be large enough to store absolute values from the array.

Examples

>>> import zarr
>>> import numpy as np
>>> x = np.arange(100, 120, 2, dtype='i8')
>>> f = zarr.Delta(dtype='i8', astype='i1')
>>> y = f.encode(x)
>>> y
array([100,   2,   2,   2,   2,   2,   2,   2,   2,   2], dtype=int8)
>>> z = f.decode(y)
>>> z
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])
class zarr.codecs.FixedScaleOffset(offset, scale, dtype, astype=None)

Simplified version of the scale-offset filter available in HDF5. Applies the transformation (x - offset) * scale to all chunks. Results are rounded to the nearest integer but are not packed according to the minimum number of bits.

Parameters:

offset : float

Value to subtract from data.

scale : int

Value by which to multiply data.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Notes

If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small.

Examples

>>> import zarr
>>> import numpy as np
>>> x = np.linspace(1000, 1001, 10, dtype='f8')
>>> x
array([ 1000.        ,  1000.11111111,  1000.22222222,  1000.33333333,
        1000.44444444,  1000.55555556,  1000.66666667,  1000.77777778,
        1000.88888889,  1001.        ])
>>> f1 = zarr.FixedScaleOffset(offset=1000, scale=10, dtype='f8', astype='u1')
>>> y1 = f1.encode(x)
>>> y1
array([ 0,  1,  2,  3,  4,  6,  7,  8,  9, 10], dtype=uint8)
>>> z1 = f1.decode(y1)
>>> z1
array([ 1000. ,  1000.1,  1000.2,  1000.3,  1000.4,  1000.6,  1000.7,
        1000.8,  1000.9,  1001. ])
>>> f2 = zarr.FixedScaleOffset(offset=1000, scale=10**2, dtype='f8', astype='u1')
>>> y2 = f2.encode(x)
>>> y2
array([  0,  11,  22,  33,  44,  56,  67,  78,  89, 100], dtype=uint8)
>>> z2 = f2.decode(y2)
>>> z2
array([ 1000.  ,  1000.11,  1000.22,  1000.33,  1000.44,  1000.56,
        1000.67,  1000.78,  1000.89,  1001.  ])
>>> f3 = zarr.FixedScaleOffset(offset=1000, scale=10**3, dtype='f8', astype='u2')
>>> y3 = f3.encode(x)
>>> y3
array([   0,  111,  222,  333,  444,  556,  667,  778,  889, 1000], dtype=uint16)
>>> z3 = f3.decode(y3)
>>> z3
array([ 1000.   ,  1000.111,  1000.222,  1000.333,  1000.444,  1000.556,
        1000.667,  1000.778,  1000.889,  1001.   ])
class zarr.codecs.Quantize(digits, dtype, astype=None)

Lossy filter to reduce the precision of floating point data.

Parameters:

digits : int

Desired precision (number of decimal digits).

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Examples

>>> import zarr
>>> import numpy as np
>>> x = np.linspace(0, 1, 10, dtype='f8')
>>> x
array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
>>> f1 = zarr.Quantize(digits=1, dtype='f8')
>>> y1 = f1.encode(x)
>>> y1
array([ 0.    ,  0.125 ,  0.25  ,  0.3125,  0.4375,  0.5625,  0.6875,
        0.75  ,  0.875 ,  1.    ])
>>> f2 = zarr.Quantize(digits=2, dtype='f8')
>>> y2 = f2.encode(x)
>>> y2
array([ 0.       ,  0.109375 ,  0.21875  ,  0.3359375,  0.4453125,
        0.5546875,  0.6640625,  0.78125  ,  0.890625 ,  1.       ])
>>> f3 = zarr.Quantize(digits=3, dtype='f8')
>>> y3 = f3.encode(x)
>>> y3
array([ 0.        ,  0.11132812,  0.22265625,  0.33300781,  0.44433594,
        0.55566406,  0.66699219,  0.77734375,  0.88867188,  1.        ])
class zarr.codecs.PackBits

Filter to pack elements of a boolean array into bits in a uint8 array.

Notes

The first element of the encoded array stores the number of bits that were padded to complete the final byte.

Examples

>>> import zarr
>>> import numpy as np
>>> f = zarr.PackBits()
>>> x = np.array([True, False, False, True], dtype=bool)
>>> y = f.encode(x)
>>> y
array([  4, 144], dtype=uint8)
>>> z = f.decode(y)
>>> z
array([ True, False, False,  True], dtype=bool)
class zarr.codecs.Categorize(labels, dtype, astype='u1')

Filter encoding categorical string data as integers.

Parameters:

labels : sequence of strings

Category labels.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Examples

>>> import zarr
>>> import numpy as np
>>> x = np.array([b'male', b'female', b'female', b'male', b'unexpected'])
>>> x
array([b'male', b'female', b'female', b'male', b'unexpected'],
      dtype='|S10')
>>> f = zarr.Categorize(labels=[b'female', b'male'], dtype=x.dtype)
>>> y = f.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = f.decode(y)
>>> z
array([b'male', b'female', b'female', b'male', b''],
      dtype='|S10')

Synchronization (zarr.sync)

class zarr.sync.ThreadSynchronizer

Provides synchronization using thread locks.

class zarr.sync.ProcessSynchronizer(path)

Provides synchronization using file locks via the fasteners package.

Parameters:

path : string

Path to a directory on a file system that is shared by all processes. N.B., this should be a different path from the one where you store the array.
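
For illustration, a synchronizer is passed to an array creation function via the synchronizer keyword argument; the paths below are hypothetical:

>>> import zarr
>>> synchronizer = zarr.ProcessSynchronizer('example.sync')
>>> z = zarr.open_array('example', mode='w', shape=(10000, 10000),
...                     chunks=(1000, 1000), dtype='i4',
...                     synchronizer=synchronizer)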

Specifications

Zarr storage specification version 1

This document provides a technical specification of the protocol and format used for storing a Zarr array. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Status

This specification is deprecated. See Specifications for the latest version.

Storage

A Zarr array can be stored in any storage system that provides a key/value interface, where a key is an ASCII string and a value is an arbitrary sequence of bytes, and the supported operations are read (get the sequence of bytes associated with a given key), write (set the sequence of bytes associated with a given key) and delete (remove a key/value pair).

For example, a directory in a file system can provide this interface, where keys are file names, values are file contents, and files can be read, written or deleted via the operating system. Equally, an S3 bucket can provide this interface, where keys are resource names, values are resource contents, and resources can be read, written or deleted via HTTP.

Below, an “array store” refers to any system implementing this interface.

Metadata

Each array requires essential configuration metadata to be stored, enabling correct interpretation of the stored data. This metadata is encoded using JSON and stored as the value of the ‘meta’ key within an array store.

The metadata resource is a JSON object. The following keys MUST be present within the object:

zarr_format
An integer defining the version of the storage specification to which the array store adheres.
shape
A list of integers defining the length of each dimension of the array.
chunks
A list of integers defining the length of each dimension of a chunk of the array. Note that all chunks within a Zarr array have the same shape.
dtype
A string or list defining a valid data type for the array. See also the subsection below on data type encoding.
compression
A string identifying the primary compression library used to compress each chunk of the array.
compression_opts
An integer, string or dictionary providing options to the primary compression library.
fill_value
A scalar value providing the default value to use for uninitialized portions of the array.
order
Either ‘C’ or ‘F’, defining the layout of bytes within each chunk of the array. ‘C’ means row-major order, i.e., the last dimension varies fastest; ‘F’ means column-major order, i.e., the first dimension varies fastest.

Other keys MAY be present within the metadata object however they MUST NOT alter the interpretation of the required fields defined above.

For example, the JSON object below defines a 2-dimensional array of 64-bit little-endian floating point numbers with 10000 rows and 10000 columns, divided into chunks of 1000 rows and 1000 columns (so there will be 100 chunks in total arranged in a 10 by 10 grid). Within each chunk the data are laid out in C contiguous order, and each chunk is compressed using the Blosc compression library:

{
    "chunks": [
        1000,
        1000
    ],
    "compression": "blosc",
    "compression_opts": {
        "clevel": 5,
        "cname": "lz4",
        "shuffle": 1
    },
    "dtype": "<f8",
    "fill_value": null,
    "order": "C",
    "shape": [
        10000,
        10000
    ],
    "zarr_format": 1
}
Data type encoding

Simple data types are encoded within the array metadata resource as a string, following the NumPy array protocol type string (typestr) format. The format consists of 3 parts: a character describing the byteorder of the data (<: little-endian, >: big-endian, |: not-relevant), a character code giving the basic type of the array, and an integer providing the number of bytes the type uses. The byte order MUST be specified. E.g., "<f8", ">i4", "|b1" and "|S12" are valid data types.

Structured data types (i.e., with multiple named fields) are encoded as a list of two-element lists, following NumPy array protocol type descriptions (descr). For example, the JSON list [["r", "|u1"], ["g", "|u1"], ["b", "|u1"]] defines a data type composed of three single-byte unsigned integers labelled ‘r’, ‘g’ and ‘b’.

Chunks

Each chunk of the array is compressed by passing the raw bytes for the chunk through the primary compression library to obtain a new sequence of bytes comprising the compressed chunk data. No header is added to the compressed bytes or any other modification made. The internal structure of the compressed bytes will depend on which primary compressor was used. For example, the Blosc compressor produces a sequence of bytes that begins with a 16-byte header followed by compressed data.

The compressed sequence of bytes for each chunk is stored under a key formed from the index of the chunk within the grid of chunks representing the array. To form a string key for a chunk, the indices are converted to strings and concatenated with the period character (‘.’) separating each index. For example, given an array with shape (10000, 10000) and chunk shape (1000, 1000) there will be 100 chunks laid out in a 10 by 10 grid. The chunk with indices (0, 0) provides data for rows 0-1000 and columns 0-1000 and is stored under the key ‘0.0’; the chunk with indices (2, 4) provides data for rows 2000-3000 and columns 4000-5000 and is stored under the key ‘2.4’; etc.

There is no need for all chunks to be present within an array store. If a chunk is not present then it is considered to be in an uninitialized state. An uninitialized chunk MUST be treated as if it was uniformly filled with the value of the ‘fill_value’ field in the array metadata. If the ‘fill_value’ field is null then the contents of the chunk are undefined.

Note that all chunks in an array have the same shape. If the length of any array dimension is not exactly divisible by the length of the corresponding chunk dimension then some chunks will overhang the edge of the array. The contents of any chunk region falling outside the array are undefined.

Attributes

Each array can also be associated with custom attributes, which are simple key/value items with application-specific meaning. Custom attributes are encoded as a JSON object and stored under the ‘attrs’ key within an array store. Even if the attributes are empty, the ‘attrs’ key MUST be present within an array store.

For example, the JSON object below encodes three attributes named ‘foo’, ‘bar’ and ‘baz’:

{
    "foo": 42,
    "bar": "apples",
    "baz": [1, 2, 3, 4]
}
Example

Below is an example of storing a Zarr array, using a directory on the local file system as storage.

Initialize the store:

>>> import zarr
>>> store = zarr.DirectoryStore('example.zarr')
>>> zarr.init_store(store, shape=(20, 20), chunks=(10, 10),
...                 dtype='i4', fill_value=42, compression='zlib',
...                 compression_opts=1, overwrite=True)

No chunks are initialized yet, so only the ‘meta’ and ‘attrs’ keys have been set:

>>> import os
>>> sorted(os.listdir('example.zarr'))
['attrs', 'meta']

Inspect the array metadata:

>>> print(open('example.zarr/meta').read())
{
    "chunks": [
        10,
        10
    ],
    "compression": "zlib",
    "compression_opts": 1,
    "dtype": "<i4",
    "fill_value": 42,
    "order": "C",
    "shape": [
        20,
        20
    ],
    "zarr_format": 1
}

Inspect the array attributes:

>>> print(open('example.zarr/attrs').read())
{}

Set some data:

>>> z = zarr.Array(store)
>>> z[0:10, 0:10] = 1
>>> sorted(os.listdir('example.zarr'))
['0.0', 'attrs', 'meta']

Set some more data:

>>> z[0:10, 10:20] = 2
>>> z[10:20, :] = 3
>>> sorted(os.listdir('example.zarr'))
['0.0', '0.1', '1.0', '1.1', 'attrs', 'meta']

Manually decompress a single chunk for illustration:

>>> import zlib
>>> b = zlib.decompress(open('example.zarr/0.0', 'rb').read())
>>> import numpy as np
>>> a = np.frombuffer(b, dtype='<i4')
>>> a
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

Modify the array attributes:

>>> z.attrs['foo'] = 42
>>> z.attrs['bar'] = 'apples'
>>> z.attrs['baz'] = [1, 2, 3, 4]
>>> print(open('example.zarr/attrs').read())
{
    "bar": "apples",
    "baz": [
        1,
        2,
        3,
        4
    ],
    "foo": 42
}

Zarr storage specification version 2

This document provides a technical specification of the protocol and format used for storing Zarr arrays. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

Status

This specification is the latest version. See Specifications for previous versions.

Storage

A Zarr array can be stored in any storage system that provides a key/value interface, where a key is an ASCII string and a value is an arbitrary sequence of bytes, and the supported operations are read (get the sequence of bytes associated with a given key), write (set the sequence of bytes associated with a given key) and delete (remove a key/value pair).

For example, a directory in a file system can provide this interface, where keys are file names, values are file contents, and files can be read, written or deleted via the operating system. Equally, an S3 bucket can provide this interface, where keys are resource names, values are resource contents, and resources can be read, written or deleted via HTTP.

Below, an “array store” refers to any system implementing this interface.

Arrays
Metadata

Each array requires essential configuration metadata to be stored, enabling correct interpretation of the stored data. This metadata is encoded using JSON and stored as the value of the “.zarray” key within an array store.

The metadata resource is a JSON object. The following keys MUST be present within the object:

zarr_format
An integer defining the version of the storage specification to which the array store adheres.
shape
A list of integers defining the length of each dimension of the array.
chunks
A list of integers defining the length of each dimension of a chunk of the array. Note that all chunks within a Zarr array have the same shape.
dtype
A string or list defining a valid data type for the array. See also the subsection below on data type encoding.
compressor
A JSON object identifying the primary compression codec and providing configuration parameters, or null if no compressor is to be used. The object MUST contain an "id" key identifying the codec to be used.
fill_value
A scalar value providing the default value to use for uninitialized portions of the array, or null if no fill_value is to be used.
order
Either “C” or “F”, defining the layout of bytes within each chunk of the array. “C” means row-major order, i.e., the last dimension varies fastest; “F” means column-major order, i.e., the first dimension varies fastest.
filters
A list of JSON objects providing codec configurations, or null if no filters are to be applied. Each codec configuration object MUST contain an "id" key identifying the codec to be used.

Other keys MUST NOT be present within the metadata object.

For example, the JSON object below defines a 2-dimensional array of 64-bit little-endian floating point numbers with 10000 rows and 10000 columns, divided into chunks of 1000 rows and 1000 columns (so there will be 100 chunks in total arranged in a 10 by 10 grid). Within each chunk the data are laid out in C contiguous order. Each chunk is encoded using a delta filter and compressed using the Blosc compression library prior to storage:

{
    "chunks": [
        1000,
        1000
    ],
    "compressor": {
        "id": "blosc",
        "cname": "lz4",
        "clevel": 5,
        "shuffle": 1
    },
    "dtype": "<f8",
    "fill_value": "NaN",
    "filters": [
        {"id": "delta", "dtype": "<f8", "astype": "<f4"}
    ],
    "order": "C",
    "shape": [
        10000,
        10000
    ],
    "zarr_format": 2
}
Data type encoding

Simple data types are encoded within the array metadata as a string, following the NumPy array protocol type string (typestr) format. The format consists of 3 parts:

  • One character describing the byteorder of the data ("<": little-endian; ">": big-endian; "|": not-relevant)
  • One character code giving the basic type of the array ("b": Boolean (integer type where all values are only True or False); "i": integer; "u": unsigned integer; "f": floating point; "c": complex floating point; "m": timedelta; "M": datetime; "S": string (fixed-length sequence of char); "U": unicode (fixed-length sequence of Py_UNICODE); "V": other (void * – each item is a fixed-size chunk of memory))
  • An integer specifying the number of bytes the type uses.

The byte order MUST be specified. E.g., "<f8", ">i4", "|b1" and "|S12" are valid data type encodings.

Structured data types (i.e., with multiple named fields) are encoded as a list of two-element lists, following NumPy array protocol type descriptions (descr). For example, the JSON list [["r", "|u1"], ["g", "|u1"], ["b", "|u1"]] defines a data type composed of three single-byte unsigned integers labelled “r”, “g” and “b”.
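
These encodings correspond directly to NumPy's own data type representations, which can be used to check a given encoding (the byte order reported for the simple type reflects what was requested, so the example below is platform-independent):

>>> import numpy as np
>>> np.dtype('<i4').str
'<i4'
>>> np.dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1')]).descr
[('r', '|u1'), ('g', '|u1'), ('b', '|u1')]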

Fill value encoding

For simple floating point data types, the following table MUST be used to encode values of the “fill_value” field:

Value                JSON encoding
-------------------  -------------
Not a Number         "NaN"
Positive Infinity    "Infinity"
Negative Infinity    "-Infinity"
Chunks

Each chunk of the array is compressed by passing the raw bytes for the chunk through the primary compression library to obtain a new sequence of bytes comprising the compressed chunk data. No header is added to the compressed bytes or any other modification made. The internal structure of the compressed bytes will depend on which primary compressor was used. For example, the Blosc compressor produces a sequence of bytes that begins with a 16-byte header followed by compressed data.

The compressed sequence of bytes for each chunk is stored under a key formed from the index of the chunk within the grid of chunks representing the array. To form a string key for a chunk, the indices are converted to strings and concatenated with the period character (“.”) separating each index. For example, given an array with shape (10000, 10000) and chunk shape (1000, 1000) there will be 100 chunks laid out in a 10 by 10 grid. The chunk with indices (0, 0) provides data for rows 0-1000 and columns 0-1000 and is stored under the key “0.0”; the chunk with indices (2, 4) provides data for rows 2000-3000 and columns 4000-5000 and is stored under the key “2.4”; etc.
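
For illustration, forming the string key for a chunk from a tuple of chunk indices is a simple join:

>>> chunk_index = (2, 4)
>>> '.'.join(map(str, chunk_index))
'2.4'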

There is no need for all chunks to be present within an array store. If a chunk is not present then it is considered to be in an uninitialized state. An uninitialized chunk MUST be treated as if it was uniformly filled with the value of the “fill_value” field in the array metadata. If the “fill_value” field is null then the contents of the chunk are undefined.

Note that all chunks in an array have the same shape. If the length of any array dimension is not exactly divisible by the length of the corresponding chunk dimension then some chunks will overhang the edge of the array. The contents of any chunk region falling outside the array are undefined.

Filters

Optionally a sequence of one or more filters can be used to transform chunk data prior to compression. When storing data, filters are applied in the order specified in array metadata to encode data, then the encoded data are passed to the primary compressor. When retrieving data, stored chunk data are decompressed by the primary compressor then decoded using filters in the reverse order.
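
The ordering described above can be sketched as follows; encode_chunk and decode_chunk are hypothetical helper names, not part of any API:

>>> def encode_chunk(buf, filters, compressor):
...     for f in filters:  # apply filters in the order specified
...         buf = f.encode(buf)
...     return compressor.encode(buf)  # then apply the primary compressor
>>> def decode_chunk(buf, filters, compressor):
...     buf = compressor.decode(buf)  # decompress first
...     for f in reversed(filters):  # then decode filters in reverse order
...         buf = f.decode(buf)
...     return buf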

Hierarchies
Logical storage paths

Multiple arrays can be stored in the same array store by associating each array with a different logical path. A logical path is simply an ASCII string. The logical path is used to form a prefix for keys used by the array. For example, if an array is stored at logical path “foo/bar” then the array metadata will be stored under the key “foo/bar/.zarray”, the user-defined attributes will be stored under the key “foo/bar/.zattrs”, and the chunks will be stored under keys like “foo/bar/0.0”, “foo/bar/0.1”, etc.

To ensure consistent behaviour across different storage systems, logical paths MUST be normalized as follows:

  • Replace all backward slash characters (“\”) with forward slash characters (“/”)
  • Strip any leading “/” characters
  • Strip any trailing “/” characters
  • Collapse any sequence of more than one “/” character into a single “/” character

The key prefix is then obtained by appending a single “/” character to the normalized logical path.

After normalization, if splitting a logical path by the “/” character results in any path segment equal to the string “.” or the string “..” then an error MUST be raised.
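
A minimal sketch of this normalization (normalize_path is a hypothetical helper name):

>>> def normalize_path(path):
...     path = path.replace('\\', '/')  # backward slashes become forward slashes
...     path = path.strip('/')  # strip leading and trailing '/' characters
...     while '//' in path:
...         path = path.replace('//', '/')  # collapse repeated '/' characters
...     if any(seg in ('.', '..') for seg in path.split('/')):
...         raise ValueError('invalid path segment')
...     return path
>>> normalize_path('/foo//bar/')
'foo/bar'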

N.B., how the underlying array store processes requests to store values under keys containing the “/” character is entirely up to the store implementation and is not constrained by this specification. E.g., an array store could simply treat all keys as opaque ASCII strings; equally, an array store could map logical paths onto some kind of hierarchical storage (e.g., directories on a file system).

Groups

Arrays can be organized into groups which can also contain other groups. A group is created by storing group metadata under the “.zgroup” key under some logical path. E.g., a group exists at the root of an array store if the “.zgroup” key exists in the store, and a group exists at logical path “foo/bar” if the “foo/bar/.zgroup” key exists in the store.

If the user requests a group to be created under some logical path, then groups MUST also be created at all ancestor paths. E.g., if the user requests group creation at path “foo/bar” then groups MUST be created at path “foo” and the root of the store, if they don’t already exist.

If the user requests an array to be created under some logical path, then groups MUST also be created at all ancestor paths. E.g., if the user requests array creation at path “foo/bar/baz” then groups MUST be created at path “foo/bar”, path “foo”, and the root of the store, if they don’t already exist.

The group metadata resource is a JSON object. The following keys MUST be present within the object:

zarr_format
An integer defining the version of the storage specification to which the array store adheres.

Other keys MUST NOT be present within the metadata object.

The members of a group are arrays and groups stored under logical paths that are direct children of the parent group’s logical path. E.g., if groups exist under the logical paths “foo” and “foo/bar” and an array exists at logical path “foo/baz” then the members of the group at path “foo” are the group at path “foo/bar” and the array at path “foo/baz”.

Attributes

An array or group can be associated with custom attributes, which are simple key/value items with application-specific meaning. Custom attributes are encoded as a JSON object and stored under the “.zattrs” key within an array store.

For example, the JSON object below encodes three attributes named “foo”, “bar” and “baz”:

{
    "foo": 42,
    "bar": "apples",
    "baz": [1, 2, 3, 4]
}
Examples
Storing a single array

Below is an example of storing a Zarr array, using a directory on the local file system as storage.

Create an array:

>>> import zarr
>>> store = zarr.DirectoryStore('example')
>>> a = zarr.create(shape=(20, 20), chunks=(10, 10), dtype='i4',
...                 fill_value=42, compressor=zarr.Zlib(level=1),
...                 store=store, overwrite=True)

No chunks are initialized yet, so only the “.zarray” and “.zattrs” keys have been set in the store:

>>> import os
>>> sorted(os.listdir('example'))
['.zarray', '.zattrs']

Inspect the array metadata:

>>> print(open('example/.zarray').read())
{
    "chunks": [
        10,
        10
    ],
    "compressor": {
        "id": "zlib",
        "level": 1
    },
    "dtype": "<i4",
    "fill_value": 42,
    "filters": null,
    "order": "C",
    "shape": [
        20,
        20
    ],
    "zarr_format": 2
}

Inspect the array attributes:

>>> print(open('example/.zattrs').read())
{}

Chunks are initialized on demand. E.g., set some data:

>>> a[0:10, 0:10] = 1
>>> sorted(os.listdir('example'))
['.zarray', '.zattrs', '0.0']

Set some more data:

>>> a[0:10, 10:20] = 2
>>> a[10:20, :] = 3
>>> sorted(os.listdir('example'))
['.zarray', '.zattrs', '0.0', '0.1', '1.0', '1.1']

Manually decompress a single chunk for illustration:

>>> import zlib
>>> buf = zlib.decompress(open('example/0.0', 'rb').read())
>>> import numpy as np
>>> chunk = np.frombuffer(buf, dtype='<i4')
>>> chunk
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

Modify the array attributes:

>>> a.attrs['foo'] = 42
>>> a.attrs['bar'] = 'apples'
>>> a.attrs['baz'] = [1, 2, 3, 4]
>>> print(open('example/.zattrs').read())
{
    "bar": "apples",
    "baz": [
        1,
        2,
        3,
        4
    ],
    "foo": 42
}
Storing multiple arrays in a hierarchy

Below is an example of storing multiple Zarr arrays organized into a group hierarchy, using a directory on the local file system as storage. This storage implementation maps logical paths onto directory paths on the file system; however, this is an implementation choice and is not required.

Set up the store:

>>> import zarr
>>> store = zarr.DirectoryStore('example_hierarchy')

Create the root group:

>>> root_grp = zarr.group(store, overwrite=True)

The metadata resource for the root group has been created, as well as a custom attributes resource:

>>> import os
>>> sorted(os.listdir('example_hierarchy'))
['.zattrs', '.zgroup']

Inspect the group metadata:

>>> print(open('example_hierarchy/.zgroup').read())
{
    "zarr_format": 2
}

Inspect the group attributes:

>>> print(open('example_hierarchy/.zattrs').read())
{}

Create a sub-group:

>>> sub_grp = root_grp.create_group('foo')

What has been stored:

>>> sorted(os.listdir('example_hierarchy'))
['.zattrs', '.zgroup', 'foo']
>>> sorted(os.listdir('example_hierarchy/foo'))
['.zattrs', '.zgroup']

Create an array within the sub-group:

>>> a = sub_grp.create_dataset('bar', shape=(20, 20), chunks=(10, 10))
>>> a[:] = 42

What has been stored:

>>> sorted(os.listdir('example_hierarchy'))
['.zattrs', '.zgroup', 'foo']
>>> sorted(os.listdir('example_hierarchy/foo'))
['.zattrs', '.zgroup', 'bar']
>>> sorted(os.listdir('example_hierarchy/foo/bar'))
['.zarray', '.zattrs', '0.0', '0.1', '1.0', '1.1']

Here is the same example using a Zip file as storage:

>>> store = zarr.ZipStore('example_hierarchy.zip', mode='w')
>>> root_grp = zarr.group(store)
>>> sub_grp = root_grp.create_group('foo')
>>> a = sub_grp.create_dataset('bar', shape=(20, 20), chunks=(10, 10))
>>> a[:] = 42
>>> store.close()

What has been stored:

>>> import zipfile
>>> zf = zipfile.ZipFile('example_hierarchy.zip', mode='r')
>>> for name in sorted(zf.namelist()):
...     print(name)
.zattrs
.zgroup
foo/.zattrs
foo/.zgroup
foo/bar/.zarray
foo/bar/.zattrs
foo/bar/0.0
foo/bar/0.1
foo/bar/1.0
foo/bar/1.1
Changes
Changes in version 2
  • Added support for storing multiple arrays in the same store and organising arrays into hierarchies using groups.
  • Array metadata is now stored under the “.zarray” key instead of the “meta” key.
  • Custom attributes are now stored under the “.zattrs” key instead of the “attrs” key.
  • Added support for filters.
  • Changed encoding of “fill_value” field within array metadata.
  • Changed encoding of compressor information within array metadata to be consistent with representation of filter information.

Release notes

2.1.2

Resolved an issue when no compression is used and chunks are stored in memory (#79).

2.1.1

Various minor improvements, including:

  • Group objects support member access via dot notation (__getattr__).
  • Fixed metadata caching for the Array.shape property and derivatives.
  • Added the Array.ndim property.
  • Fixed Array.__array__ method arguments.
  • Fixed a bug in pickling Array state.
  • Fixed a bug in pickling ThreadSynchronizer.

2.1.0

  • Group objects now support member deletion via del statement (#65).
  • Added zarr.storage.TempStore class for convenience to provide storage via a temporary directory (#59).
  • Fixed performance issues with zarr.storage.ZipStore class (#66).
  • The Blosc extension has been modified to return bytes instead of array objects from compress and decompress function calls. This should improve compatibility and also provides a small performance increase for compressing high compression ratio data (#55).
  • Added overwrite keyword argument to array and group creation methods on the zarr.hierarchy.Group class (#71).
  • Added cache_metadata keyword argument to array creation methods.
  • The functions zarr.creation.open_array() and zarr.hierarchy.open_group() now accept any store as first argument (#56).

2.0.1

The bundled Blosc library has been upgraded to version 1.11.1.

2.0.0

Hierarchies

Support has been added for organizing arrays into hierarchies via groups. See the tutorial section on Groups and the zarr.hierarchy API docs for more information.

Filters

Support has been added for configuring filters to preprocess chunk data prior to compression. See the tutorial section on Filters and the zarr.codecs API docs for more information.

Other changes

To accommodate support for hierarchies and filters, the Zarr metadata format has been modified. See the Zarr storage specification version 2 for more information. To migrate an array stored using Zarr version 1.x, use the zarr.storage.migrate_1to2() function.

The bundled Blosc library has been upgraded to version 1.11.0.

Acknowledgments

Thanks to Matthew Rocklin (mrocklin), Stephan Hoyer (shoyer) and Francesc Alted (FrancescAlted) for contributions and comments.

1.1.0

  • The bundled Blosc library has been upgraded to version 1.10.0. The ‘zstd’ internal compression library is now available within Blosc. See the tutorial section on Compressors for an example.
  • When using the Blosc compressor, the default internal compression library is now ‘lz4’.
  • The default number of internal threads for the Blosc compressor has been increased to a maximum of 8 (previously 4).
  • Added convenience functions zarr.blosc.list_compressors() and zarr.blosc.get_nthreads().

1.0.0

This release includes a complete re-organization of the code base. The major version number has been bumped to indicate that there have been backwards-incompatible changes to the API and the on-disk storage format. However, Zarr is still in an early stage of development, so please do not take the version number as an indicator of maturity.

Storage

The main motivation for re-organizing the code was to create an abstraction layer between the core array logic and data storage (#21). In this release, any object that implements the MutableMapping interface can be used as an array store. See the tutorial sections on Persistent arrays and Storage alternatives, the Zarr storage specification version 1, and the zarr.storage module documentation for more information.

Please note also that the file organization and file name conventions used when storing a Zarr array in a directory on the file system have changed. Persistent Zarr arrays created using previous versions of the software will not be compatible with this version. See the zarr.storage API docs and the Zarr storage specification version 1 for more information.

Compression

An abstraction layer has also been created between the core array logic and the code for compressing and decompressing array chunks. This release still bundles the c-blosc library and uses Blosc as the default compressor, however other compressors including zlib, BZ2 and LZMA are also now supported via the Python standard library. New compressors can also be dynamically registered for use with Zarr. See the tutorial sections on Compressors and Configuring Blosc, the Zarr storage specification version 1, and the zarr.compressors module documentation for more information.

Synchronization

The synchronization code has also been refactored to create a layer of abstraction, enabling Zarr arrays to be used in parallel computations with a number of alternative synchronization methods. For more information see the tutorial section on Parallel computing and synchronization and the zarr.sync module documentation.

Changes to the Blosc extension

NumPy is no longer a build dependency for the zarr.blosc Cython extension, so setup.py will run even if NumPy is not already installed, and should automatically install NumPy as a runtime dependency. Manual installation of NumPy prior to installing Zarr is still recommended, however, as the automatic installation of NumPy may fail or be sub-optimal on some platforms.

Some optimizations have been made within the zarr.blosc extension to avoid unnecessary memory copies, giving a ~10-20% performance improvement for multi-threaded compression operations.

The zarr.blosc extension now automatically detects whether it is running within a single-threaded or multi-threaded program and adapts its internal behaviour accordingly (#27). There is no need for the user to make any API calls to switch Blosc between contextual and non-contextual (global lock) mode. See also the tutorial section on Configuring Blosc.

Other changes

The internal code for managing chunks has been rewritten to be more efficient. Now no state is maintained for chunks outside of the array store, meaning that chunks do not carry any extra memory overhead not accounted for by the store. This negates the need for the “lazy” option present in the previous release, and this has been removed.

The memory layout within chunks can now be set as either “C” (row-major) or “F” (column-major), which can help to provide better compression for some data (#7). See the tutorial section on Changing memory layout for more information.

A bug has been fixed within the __getitem__ and __setitem__ machinery for slicing arrays, to properly handle getting and setting partial slices.

Acknowledgments

Thanks to Matthew Rocklin (mrocklin), Stephan Hoyer (shoyer), Francesc Alted (FrancescAlted), Anthony Scopatz (scopatz) and Martin Durant (martindurant) for contributions and comments.

Acknowledgments

Zarr bundles the c-blosc library and uses it as the default compressor.

Zarr is inspired by HDF5, h5py and bcolz.

Development of this package is supported by the MRC Centre for Genomics and Global Health.
