Convenience functions (zarr.convenience)#
Convenience functions for storing and loading data.
- zarr.convenience.open(store: Optional[Union[BaseStore, MutableMapping, str]] = None, mode: str = 'a', *, zarr_version=None, path=None, **kwargs)[source]#
Convenience function to open a group or array using file-mode-like semantics.
- Parameters
- store : Store or string, optional
Store or path to directory in file system or name of zip file.
- mode : {‘r’, ‘r+’, ‘a’, ‘w’, ‘w-’}, optional
Persistence mode: ‘r’ means read only (must exist); ‘r+’ means read/write (must exist); ‘a’ means read/write (create if doesn’t exist); ‘w’ means create (overwrite if exists); ‘w-’ means create (fail if exists).
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
The path within the store to open.
- **kwargs
Additional parameters are passed through to zarr.creation.open_array() or zarr.hierarchy.open_group().
- Returns
- z : zarr.core.Array or zarr.hierarchy.Group
Array or group, depending on what exists in the given store.
Examples
Storing data in a directory ‘data/example.zarr’ on the local file system:
>>> import zarr
>>> store = 'data/example.zarr'
>>> zw = zarr.open(store, mode='w', shape=100, dtype='i4')  # open new array
>>> zw
<zarr.core.Array (100,) int32>
>>> za = zarr.open(store, mode='a')  # open existing array for reading and writing
>>> za
<zarr.core.Array (100,) int32>
>>> zr = zarr.open(store, mode='r')  # open existing array read-only
>>> zr
<zarr.core.Array (100,) int32 read-only>
>>> gw = zarr.open(store, mode='w')  # open new group, overwriting previous data
>>> gw
<zarr.hierarchy.Group '/'>
>>> ga = zarr.open(store, mode='a')  # open existing group for reading and writing
>>> ga
<zarr.hierarchy.Group '/'>
>>> gr = zarr.open(store, mode='r')  # open existing group read-only
>>> gr
<zarr.hierarchy.Group '/' read-only>
- zarr.convenience.save(store: Optional[Union[BaseStore, MutableMapping, str]], *args, zarr_version=None, path=None, **kwargs)[source]#
Convenience function to save an array or group of arrays to the local file system.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- args : ndarray
NumPy arrays with data to save.
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use when saving. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
The path within the group where the arrays will be saved.
- kwargs
NumPy arrays with data to save.
Examples
Save an array to a directory on the file system (uses a DirectoryStore):
>>> import zarr
>>> import numpy as np
>>> arr = np.arange(10000)
>>> zarr.save('data/example.zarr', arr)
>>> zarr.load('data/example.zarr')
array([   0,    1,    2, ..., 9997, 9998, 9999])
Save an array to a Zip file (uses a ZipStore):
>>> zarr.save('data/example.zip', arr)
>>> zarr.load('data/example.zip')
array([   0,    1,    2, ..., 9997, 9998, 9999])
Save several arrays to a directory on the file system (uses a DirectoryStore and stores arrays in a group):
>>> import zarr
>>> import numpy as np
>>> a1 = np.arange(10000)
>>> a2 = np.arange(10000, 0, -1)
>>> zarr.save('data/example.zarr', a1, a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: arr_0, arr_1>
>>> loader['arr_0']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['arr_1']
array([10000,  9999,  9998, ...,     3,     2,     1])
Save several arrays using named keyword arguments:
>>> zarr.save('data/example.zarr', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
Store several arrays in a single zip file (uses a ZipStore):
>>> zarr.save('data/example.zip', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zip')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
- zarr.convenience.load(store: Optional[Union[BaseStore, MutableMapping, str]], zarr_version=None, path=None)[source]#
Load data from an array or group into memory.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use when loading. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
The path within the store from which to load.
- Returns
- out
If the store contains an array, out will be a numpy array. If the store contains a group, out will be a dict-like object where keys are array names and values are numpy arrays.
See also: save, save_group
Notes
If loading data from a group of arrays, data will not be immediately loaded into memory. Rather, arrays will be loaded into memory as they are requested.
- zarr.convenience.save_array(store: Optional[Union[BaseStore, MutableMapping, str]], arr, *, zarr_version=None, path=None, **kwargs)[source]#
Convenience function to save a NumPy array to the local file system, following a similar API to the NumPy save() function.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- arr : ndarray
NumPy array with data to save.
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use when saving. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
The path within the store where the array will be saved.
- kwargs
Passed through to create(), e.g., compressor.
Examples
Save an array to a directory on the file system (uses a DirectoryStore):
>>> import zarr
>>> import numpy as np
>>> arr = np.arange(10000)
>>> zarr.save_array('data/example.zarr', arr)
>>> zarr.load('data/example.zarr')
array([   0,    1,    2, ..., 9997, 9998, 9999])
Save an array to a single file (uses a ZipStore):
>>> zarr.save_array('data/example.zip', arr)
>>> zarr.load('data/example.zip')
array([   0,    1,    2, ..., 9997, 9998, 9999])
- zarr.convenience.save_group(store: Optional[Union[BaseStore, MutableMapping, str]], *args, zarr_version=None, path=None, **kwargs)[source]#
Convenience function to save several NumPy arrays to the local file system, following a similar API to the NumPy savez()/savez_compressed() functions.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- args : ndarray
NumPy arrays with data to save.
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use when saving. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
Path within the store where the group will be saved.
- kwargs
NumPy arrays with data to save.
Notes
Default compression options will be used.
Examples
Save several arrays to a directory on the file system (uses a DirectoryStore):
>>> import zarr
>>> import numpy as np
>>> a1 = np.arange(10000)
>>> a2 = np.arange(10000, 0, -1)
>>> zarr.save_group('data/example.zarr', a1, a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: arr_0, arr_1>
>>> loader['arr_0']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['arr_1']
array([10000,  9999,  9998, ...,     3,     2,     1])
Save several arrays using named keyword arguments:
>>> zarr.save_group('data/example.zarr', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
Store several arrays in a single zip file (uses a ZipStore):
>>> zarr.save_group('data/example.zip', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zip')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
- zarr.convenience.copy(source, dest, name=None, shallow=False, without_attrs=False, log=None, if_exists='raise', dry_run=False, **create_kws)[source]#
Copy the source array or group into the dest group.
- Parameters
- source : group or array/dataset
A zarr group or array, or an h5py group or dataset.
- dest : group
A zarr or h5py group.
- name : str, optional
Name to copy the object to.
- shallow : bool, optional
If True, only copy immediate children of source.
- without_attrs : bool, optional
Do not copy user attributes.
- log : callable, file path or file-like object, optional
If provided, will be used to log progress information.
- if_exists : {‘raise’, ‘replace’, ‘skip’, ‘skip_initialized’}, optional
How to handle arrays that already exist in the destination group. If ‘raise’ then a CopyError is raised on the first array already present in the destination group. If ‘replace’ then any array will be replaced in the destination. If ‘skip’ then any existing arrays will not be copied. If ‘skip_initialized’ then any existing arrays with all chunks initialized will not be copied (not available when copying to h5py).
- dry_run : bool, optional
If True, don’t actually copy anything, just log what would have happened.
- **create_kws
Passed through to the create_dataset method when copying an array/dataset.
- Returns
- n_copied : int
Number of items copied.
- n_skipped : int
Number of items skipped.
- n_bytes_copied : int
Number of bytes of data that were actually copied.
Notes
Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.
Examples
Here’s an example of copying a group named ‘foo’ from an HDF5 file to a Zarr group:
>>> import h5py
>>> import zarr
>>> import numpy as np
>>> source = h5py.File('data/example.h5', mode='w')
>>> foo = source.create_group('foo')
>>> baz = foo.create_dataset('bar/baz', data=np.arange(100), chunks=(50,))
>>> spam = source.create_dataset('spam', data=np.arange(100, 200), chunks=(30,))
>>> zarr.tree(source)
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> dest = zarr.group()
>>> from sys import stdout
>>> zarr.copy(source['foo'], dest, log=stdout)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
all done: 3 copied, 0 skipped, 800 bytes copied
(3, 0, 800)
>>> dest.tree()  # N.B., no spam
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> source.close()
The if_exists parameter provides options for how to handle pre-existing data in the destination. Here are some examples of these options, also using dry_run=True to find out what would happen without actually copying anything:
>>> source = zarr.group()
>>> dest = zarr.group()
>>> baz = source.create_dataset('foo/bar/baz', data=np.arange(100))
>>> spam = source.create_dataset('foo/spam', data=np.arange(1000))
>>> existing_spam = dest.create_dataset('foo/spam', data=np.arange(1000))
>>> from sys import stdout
>>> try:
...     zarr.copy(source['foo'], dest, log=stdout, dry_run=True)
... except zarr.CopyError as e:
...     print(e)
...
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
an object 'spam' already exists in destination '/foo'
>>> zarr.copy(source['foo'], dest, log=stdout, if_exists='replace', dry_run=True)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
copy /foo/spam (1000,) int64
dry run: 4 copied, 0 skipped
(4, 0, 0)
>>> zarr.copy(source['foo'], dest, log=stdout, if_exists='skip', dry_run=True)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
skip /foo/spam (1000,) int64
dry run: 3 copied, 1 skipped
(3, 1, 0)
- zarr.convenience.copy_all(source, dest, shallow=False, without_attrs=False, log=None, if_exists='raise', dry_run=False, **create_kws)[source]#
Copy all children of the source group into the dest group.
- Parameters
- source : group or array/dataset
A zarr group or array, or an h5py group or dataset.
- dest : group
A zarr or h5py group.
- shallow : bool, optional
If True, only copy immediate children of source.
- without_attrs : bool, optional
Do not copy user attributes.
- log : callable, file path or file-like object, optional
If provided, will be used to log progress information.
- if_exists : {‘raise’, ‘replace’, ‘skip’, ‘skip_initialized’}, optional
How to handle arrays that already exist in the destination group. If ‘raise’ then a CopyError is raised on the first array already present in the destination group. If ‘replace’ then any array will be replaced in the destination. If ‘skip’ then any existing arrays will not be copied. If ‘skip_initialized’ then any existing arrays with all chunks initialized will not be copied (not available when copying to h5py).
- dry_run : bool, optional
If True, don’t actually copy anything, just log what would have happened.
- **create_kws
Passed through to the create_dataset method when copying an array/dataset.
- Returns
- n_copied : int
Number of items copied.
- n_skipped : int
Number of items skipped.
- n_bytes_copied : int
Number of bytes of data that were actually copied.
Notes
Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.
Examples
>>> import h5py
>>> import zarr
>>> import numpy as np
>>> source = h5py.File('data/example.h5', mode='w')
>>> foo = source.create_group('foo')
>>> baz = foo.create_dataset('bar/baz', data=np.arange(100), chunks=(50,))
>>> spam = source.create_dataset('spam', data=np.arange(100, 200), chunks=(30,))
>>> zarr.tree(source)
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> dest = zarr.group()
>>> import sys
>>> zarr.copy_all(source, dest, log=sys.stdout)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
copy /spam (100,) int64
all done: 4 copied, 0 skipped, 1,600 bytes copied
(4, 0, 1600)
>>> dest.tree()
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> source.close()
- zarr.convenience.copy_store(source, dest, source_path='', dest_path='', excludes=None, includes=None, flags=0, if_exists='raise', dry_run=False, log=None)[source]#
Copy data directly from the source store to the dest store. Use this function when you want to copy a group or array in the most efficient way, preserving all configuration and attributes. This function is more efficient than the copy() or copy_all() functions because it avoids decompressing and recompressing data; instead, the compressed chunk data for each array are copied directly between stores.
- Parameters
- source : Mapping
Store to copy data from.
- dest : MutableMapping
Store to copy data into.
- source_path : str, optional
Only copy data from under this path in the source store.
- dest_path : str, optional
Copy data into this path in the destination store.
- excludes : sequence of str, optional
One or more regular expressions which will be matched against keys in the source store. Any matching key will not be copied.
- includes : sequence of str, optional
One or more regular expressions which will be matched against keys in the source store and will override any excludes also matching.
- flags : int, optional
Regular expression flags used for matching excludes and includes.
- if_exists : {‘raise’, ‘replace’, ‘skip’}, optional
How to handle keys that already exist in the destination store. If ‘raise’ then a CopyError is raised on the first key already present in the destination store. If ‘replace’ then any data will be replaced in the destination. If ‘skip’ then any existing keys will not be copied.
- dry_run : bool, optional
If True, don’t actually copy anything, just log what would have happened.
- log : callable, file path or file-like object, optional
If provided, will be used to log progress information.
- Returns
- n_copied : int
Number of items copied.
- n_skipped : int
Number of items skipped.
- n_bytes_copied : int
Number of bytes of data that were actually copied.
Notes
Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.
Examples
>>> import zarr
>>> store1 = zarr.DirectoryStore('data/example.zarr')
>>> root = zarr.group(store1, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.create_group('bar')
>>> baz = bar.create_dataset('baz', shape=100, chunks=50, dtype='i8')
>>> import numpy as np
>>> baz[:] = np.arange(100)
>>> root.tree()
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> from sys import stdout
>>> store2 = zarr.ZipStore('data/example.zip', mode='w')
>>> zarr.copy_store(store1, store2, log=stdout)
copy .zgroup
copy foo/.zgroup
copy foo/bar/.zgroup
copy foo/bar/baz/.zarray
copy foo/bar/baz/0
copy foo/bar/baz/1
all done: 6 copied, 0 skipped, 566 bytes copied
(6, 0, 566)
>>> new_root = zarr.group(store2)
>>> new_root.tree()
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> new_root['foo/bar/baz'][:]
array([ 0,  1,  2, ..., 97, 98, 99])
>>> store2.close()  # zip stores need to be closed
- zarr.convenience.tree(grp, expand=False, level=None)[source]#
Provide a print-able display of the hierarchy. This function is provided mainly as a convenience for obtaining a tree view of an h5py group - zarr groups have a .tree() method.
- Parameters
- grp : Group
Zarr or h5py group.
- expand : bool, optional
Only relevant for HTML representation. If True, tree will be fully expanded.
- level : int, optional
Maximum depth to descend into hierarchy.
Notes
Please note that this is an experimental feature. The behaviour of this function is still evolving and the default output and/or parameters may change in future versions.
Examples
>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> g4 = g3.create_group('baz')
>>> g5 = g3.create_group('qux')
>>> d1 = g5.create_dataset('baz', shape=100, chunks=10)
>>> g1.tree()
/
 ├── bar
 │   ├── baz
 │   └── qux
 │       └── baz (100,) float64
 └── foo
>>> import h5py
>>> h5f = h5py.File('data/example.h5', mode='w')
>>> zarr.copy_all(g1, h5f)
(5, 0, 800)
>>> zarr.tree(h5f)
/
 ├── bar
 │   ├── baz
 │   └── qux
 │       └── baz (100,) float64
 └── foo
- zarr.convenience.consolidate_metadata(store: BaseStore, metadata_key='.zmetadata', *, path='')[source]#
Consolidate all metadata for groups and arrays within the given store into a single resource and put it under the given key.
This produces a single object in the backend store, containing all the metadata read from all the zarr-related keys that can be found. After metadata have been consolidated, use open_consolidated() to open the root group in optimised, read-only mode, using the consolidated metadata to reduce the number of read operations on the backend store.
Note that if the metadata in the store is changed after this consolidation, then the metadata read by open_consolidated() would be incorrect unless this function is called again.
Note: This is an experimental feature.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- metadata_key : str
Key to put the consolidated metadata under.
- path : str or None
Path corresponding to the group that is being consolidated. Not required for zarr v2 stores.
- Returns
- g : zarr.hierarchy.Group
Group instance, opened with the new consolidated metadata.
- zarr.convenience.open_consolidated(store: Optional[Union[BaseStore, MutableMapping, str]], metadata_key='.zmetadata', mode='r+', **kwargs)[source]#
Open group using metadata previously consolidated into a single key.
This is an optimised method for opening a Zarr group, where instead of traversing the group/array hierarchy by accessing the metadata keys at each level, a single key contains all of the metadata for everything. This is particularly beneficial for remote data sources, where the overhead of accessing a key is large compared to the time to read data.
The group accessed must have already had its metadata consolidated into a single key using the function consolidate_metadata().
This optimised method only works in modes which do not change the metadata, although the data may still be written/updated.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- metadata_key : str
Key to read the consolidated metadata from. The default (.zmetadata) corresponds to the default used by consolidate_metadata().
- mode : {‘r’, ‘r+’}, optional
Persistence mode: ‘r’ means read only (must exist); ‘r+’ means read/write (must exist), although only writes to data are allowed; changes to metadata, including creation of new arrays or groups, are not allowed.
- **kwargs
Additional parameters are passed through to zarr.creation.open_array() or zarr.hierarchy.open_group().
- Returns
- g : zarr.hierarchy.Group
Group instance, opened with the consolidated metadata.