Convenience functions (zarr.convenience)

Convenience functions for storing and loading data.

zarr.convenience.open(store: Optional[Union[zarr._storage.store.BaseStore, collections.abc.MutableMapping, str]] = None, mode: str = 'a', **kwargs)[source]

Convenience function to open a group or array using file-mode-like semantics.

Parameters
store : Store or string, optional

Store or path to directory in file system or name of zip file.

mode : {‘r’, ‘r+’, ‘a’, ‘w’, ‘w-’}, optional

Persistence mode: ‘r’ means read only (must exist); ‘r+’ means read/write (must exist); ‘a’ means read/write (create if doesn’t exist); ‘w’ means create (overwrite if exists); ‘w-’ means create (fail if exists).

**kwargs

Additional parameters are passed through to zarr.creation.open_array() or zarr.hierarchy.open_group().

Returns
z : zarr.core.Array or zarr.hierarchy.Group

Array or group, depending on what exists in the given store.

Examples

Storing data in a directory ‘data/example.zarr’ on the local file system:

>>> import zarr
>>> store = 'data/example.zarr'
>>> zw = zarr.open(store, mode='w', shape=100, dtype='i4')  # open new array
>>> zw
<zarr.core.Array (100,) int32>
>>> za = zarr.open(store, mode='a')  # open existing array for reading and writing
>>> za
<zarr.core.Array (100,) int32>
>>> zr = zarr.open(store, mode='r')  # open existing array read-only
>>> zr
<zarr.core.Array (100,) int32 read-only>
>>> gw = zarr.open(store, mode='w')  # open new group, overwriting previous data
>>> gw
<zarr.hierarchy.Group '/'>
>>> ga = zarr.open(store, mode='a')  # open existing group for reading and writing
>>> ga
<zarr.hierarchy.Group '/'>
>>> gr = zarr.open(store, mode='r')  # open existing group read-only
>>> gr
<zarr.hierarchy.Group '/' read-only>
zarr.convenience.save(store: Optional[Union[zarr._storage.store.BaseStore, collections.abc.MutableMapping, str]], *args, **kwargs)[source]

Convenience function to save an array or group of arrays to the local file system.

Parameters
store : MutableMapping or string

Store or path to directory in file system or name of zip file.

args : ndarray

NumPy arrays with data to save.

kwargs

NumPy arrays with data to save; each keyword name is used as the array name within the group.

Examples

Save an array to a directory on the file system (uses a DirectoryStore):

>>> import zarr
>>> import numpy as np
>>> arr = np.arange(10000)
>>> zarr.save('data/example.zarr', arr)
>>> zarr.load('data/example.zarr')
array([   0,    1,    2, ..., 9997, 9998, 9999])

Save an array to a Zip file (uses a ZipStore):

>>> zarr.save('data/example.zip', arr)
>>> zarr.load('data/example.zip')
array([   0,    1,    2, ..., 9997, 9998, 9999])

Save several arrays to a directory on the file system (uses a DirectoryStore and stores arrays in a group):

>>> import zarr
>>> import numpy as np
>>> a1 = np.arange(10000)
>>> a2 = np.arange(10000, 0, -1)
>>> zarr.save('data/example.zarr', a1, a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: arr_0, arr_1>
>>> loader['arr_0']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['arr_1']
array([10000,  9999,  9998, ...,     3,     2,     1])

Save several arrays using named keyword arguments:

>>> zarr.save('data/example.zarr', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])

Store several arrays in a single zip file (uses a ZipStore):

>>> zarr.save('data/example.zip', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zip')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
zarr.convenience.load(store: Optional[Union[zarr._storage.store.BaseStore, collections.abc.MutableMapping, str]])[source]

Load data from an array or group into memory.

Parameters
store : MutableMapping or string

Store or path to directory in file system or name of zip file.

Returns
out

If the store contains an array, out will be a numpy array. If the store contains a group, out will be a dict-like object where keys are array names and values are numpy arrays.

See also

save, savez

Notes

If loading data from a group of arrays, data will not be immediately loaded into memory. Rather, arrays will be loaded into memory as they are requested.

zarr.convenience.save_array(store: Optional[Union[zarr._storage.store.BaseStore, collections.abc.MutableMapping, str]], arr, **kwargs)[source]

Convenience function to save a NumPy array to the local file system, following a similar API to the NumPy save() function.

Parameters
store : MutableMapping or string

Store or path to directory in file system or name of zip file.

arr : ndarray

NumPy array with data to save.

kwargs

Passed through to create(), e.g., compressor.

Examples

Save an array to a directory on the file system (uses a DirectoryStore):

>>> import zarr
>>> import numpy as np
>>> arr = np.arange(10000)
>>> zarr.save_array('data/example.zarr', arr)
>>> zarr.load('data/example.zarr')
array([   0,    1,    2, ..., 9997, 9998, 9999])

Save an array to a single file (uses a ZipStore):

>>> zarr.save_array('data/example.zip', arr)
>>> zarr.load('data/example.zip')
array([   0,    1,    2, ..., 9997, 9998, 9999])
zarr.convenience.save_group(store: Optional[Union[zarr._storage.store.BaseStore, collections.abc.MutableMapping, str]], *args, **kwargs)[source]

Convenience function to save several NumPy arrays to the local file system, following a similar API to the NumPy savez()/savez_compressed() functions.

Parameters
store : MutableMapping or string

Store or path to directory in file system or name of zip file.

args : ndarray

NumPy arrays with data to save.

kwargs

NumPy arrays with data to save; each keyword name is used as the array name within the group.

Notes

Default compression options will be used.

Examples

Save several arrays to a directory on the file system (uses a DirectoryStore):

>>> import zarr
>>> import numpy as np
>>> a1 = np.arange(10000)
>>> a2 = np.arange(10000, 0, -1)
>>> zarr.save_group('data/example.zarr', a1, a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: arr_0, arr_1>
>>> loader['arr_0']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['arr_1']
array([10000,  9999,  9998, ...,     3,     2,     1])

Save several arrays using named keyword arguments:

>>> zarr.save_group('data/example.zarr', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])

Store several arrays in a single zip file (uses a ZipStore):

>>> zarr.save_group('data/example.zip', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zip')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
zarr.convenience.copy(source, dest, name=None, shallow=False, without_attrs=False, log=None, if_exists='raise', dry_run=False, **create_kws)[source]

Copy the source array or group into the dest group.

Parameters
source : group or array/dataset

A zarr group or array, or an h5py group or dataset.

dest : group

A zarr or h5py group.

name : str, optional

Name to copy the object to.

shallow : bool, optional

If True, only copy immediate children of source.

without_attrs : bool, optional

Do not copy user attributes.

log : callable, file path or file-like object, optional

If provided, will be used to log progress information.

if_exists : {‘raise’, ‘replace’, ‘skip’, ‘skip_initialized’}, optional

How to handle arrays that already exist in the destination group. If ‘raise’ then a CopyError is raised on the first array already present in the destination group. If ‘replace’ then any array will be replaced in the destination. If ‘skip’ then any existing arrays will not be copied. If ‘skip_initialized’ then any existing arrays with all chunks initialized will not be copied (not available when copying to h5py).

dry_run : bool, optional

If True, don’t actually copy anything, just log what would have happened.

**create_kws

Passed through to the create_dataset method when copying an array/dataset.

Returns
n_copied : int

Number of items copied.

n_skipped : int

Number of items skipped.

n_bytes_copied : int

Number of bytes of data that were actually copied.

Notes

Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.

Examples

Here’s an example of copying a group named ‘foo’ from an HDF5 file to a Zarr group:

>>> import h5py
>>> import zarr
>>> import numpy as np
>>> source = h5py.File('data/example.h5', mode='w')
>>> foo = source.create_group('foo')
>>> baz = foo.create_dataset('bar/baz', data=np.arange(100), chunks=(50,))
>>> spam = source.create_dataset('spam', data=np.arange(100, 200), chunks=(30,))
>>> zarr.tree(source)
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> dest = zarr.group()
>>> from sys import stdout
>>> zarr.copy(source['foo'], dest, log=stdout)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
all done: 3 copied, 0 skipped, 800 bytes copied
(3, 0, 800)
>>> dest.tree()  # N.B., no spam
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> source.close()

The if_exists parameter provides options for how to handle pre-existing data in the destination. Here are some examples of these options, also using dry_run=True to find out what would happen without actually copying anything:

>>> source = zarr.group()
>>> dest = zarr.group()
>>> baz = source.create_dataset('foo/bar/baz', data=np.arange(100))
>>> spam = source.create_dataset('foo/spam', data=np.arange(1000))
>>> existing_spam = dest.create_dataset('foo/spam', data=np.arange(1000))
>>> from sys import stdout
>>> try:
...     zarr.copy(source['foo'], dest, log=stdout, dry_run=True)
... except zarr.CopyError as e:
...     print(e)
...
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
an object 'spam' already exists in destination '/foo'
>>> zarr.copy(source['foo'], dest, log=stdout, if_exists='replace', dry_run=True)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
copy /foo/spam (1000,) int64
dry run: 4 copied, 0 skipped
(4, 0, 0)
>>> zarr.copy(source['foo'], dest, log=stdout, if_exists='skip', dry_run=True)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
skip /foo/spam (1000,) int64
dry run: 3 copied, 1 skipped
(3, 1, 0)
zarr.convenience.copy_all(source, dest, shallow=False, without_attrs=False, log=None, if_exists='raise', dry_run=False, **create_kws)[source]

Copy all children of the source group into the dest group.

Parameters
source : group or array/dataset

A zarr group or array, or an h5py group or dataset.

dest : group

A zarr or h5py group.

shallow : bool, optional

If True, only copy immediate children of source.

without_attrs : bool, optional

Do not copy user attributes.

log : callable, file path or file-like object, optional

If provided, will be used to log progress information.

if_exists : {‘raise’, ‘replace’, ‘skip’, ‘skip_initialized’}, optional

How to handle arrays that already exist in the destination group. If ‘raise’ then a CopyError is raised on the first array already present in the destination group. If ‘replace’ then any array will be replaced in the destination. If ‘skip’ then any existing arrays will not be copied. If ‘skip_initialized’ then any existing arrays with all chunks initialized will not be copied (not available when copying to h5py).

dry_run : bool, optional

If True, don’t actually copy anything, just log what would have happened.

**create_kws

Passed through to the create_dataset method when copying an array/dataset.

Returns
n_copied : int

Number of items copied.

n_skipped : int

Number of items skipped.

n_bytes_copied : int

Number of bytes of data that were actually copied.

Notes

Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.

Examples

>>> import h5py
>>> import zarr
>>> import numpy as np
>>> source = h5py.File('data/example.h5', mode='w')
>>> foo = source.create_group('foo')
>>> baz = foo.create_dataset('bar/baz', data=np.arange(100), chunks=(50,))
>>> spam = source.create_dataset('spam', data=np.arange(100, 200), chunks=(30,))
>>> zarr.tree(source)
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> dest = zarr.group()
>>> import sys
>>> zarr.copy_all(source, dest, log=sys.stdout)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
copy /spam (100,) int64
all done: 4 copied, 0 skipped, 1,600 bytes copied
(4, 0, 1600)
>>> dest.tree()
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> source.close()
zarr.convenience.copy_store(source, dest, source_path='', dest_path='', excludes=None, includes=None, flags=0, if_exists='raise', dry_run=False, log=None)[source]

Copy data directly from the source store to the dest store. Use this function when you want to copy a group or array in the most efficient way, preserving all configuration and attributes. This function is more efficient than the copy() or copy_all() functions because it avoids de-compressing and re-compressing data; rather, the compressed chunk data for each array are copied directly between stores.

Parameters
source : Mapping

Store to copy data from.

dest : MutableMapping

Store to copy data into.

source_path : str, optional

Only copy data from under this path in the source store.

dest_path : str, optional

Copy data into this path in the destination store.

excludes : sequence of str, optional

One or more regular expressions which will be matched against keys in the source store. Any matching key will not be copied.

includes : sequence of str, optional

One or more regular expressions which will be matched against keys in the source store and will override any excludes also matching.

flags : int, optional

Regular expression flags used for matching excludes and includes.

if_exists : {‘raise’, ‘replace’, ‘skip’}, optional

How to handle keys that already exist in the destination store. If ‘raise’ then a CopyError is raised on the first key already present in the destination store. If ‘replace’ then any data will be replaced in the destination. If ‘skip’ then any existing keys will not be copied.

dry_run : bool, optional

If True, don’t actually copy anything, just log what would have happened.

log : callable, file path or file-like object, optional

If provided, will be used to log progress information.

Returns
n_copied : int

Number of items copied.

n_skipped : int

Number of items skipped.

n_bytes_copied : int

Number of bytes of data that were actually copied.

Notes

Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.

Examples

>>> import zarr
>>> store1 = zarr.DirectoryStore('data/example.zarr')
>>> root = zarr.group(store1, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.create_group('bar')
>>> baz = bar.create_dataset('baz', shape=100, chunks=50, dtype='i8')
>>> import numpy as np
>>> baz[:] = np.arange(100)
>>> root.tree()
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> from sys import stdout
>>> store2 = zarr.ZipStore('data/example.zip', mode='w')
>>> zarr.copy_store(store1, store2, log=stdout)
copy .zgroup
copy foo/.zgroup
copy foo/bar/.zgroup
copy foo/bar/baz/.zarray
copy foo/bar/baz/0
copy foo/bar/baz/1
all done: 6 copied, 0 skipped, 566 bytes copied
(6, 0, 566)
>>> new_root = zarr.group(store2)
>>> new_root.tree()
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> new_root['foo/bar/baz'][:]
array([ 0,  1,  2,  ..., 97, 98, 99])
>>> store2.close()  # zip stores need to be closed
zarr.convenience.tree(grp, expand=False, level=None)[source]

Provide a printable display of the hierarchy. This function is mainly a convenience for obtaining a tree view of an h5py group; zarr groups have a .tree() method.

Parameters
grp : Group

Zarr or h5py group.

expand : bool, optional

Only relevant for HTML representation. If True, tree will be fully expanded.

level : int, optional

Maximum depth to descend into hierarchy.

Notes

Please note that this is an experimental feature. The behaviour of this function is still evolving and the default output and/or parameters may change in future versions.

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> g4 = g3.create_group('baz')
>>> g5 = g3.create_group('qux')
>>> d1 = g5.create_dataset('baz', shape=100, chunks=10)
>>> g1.tree()
/
 ├── bar
 │   ├── baz
 │   └── qux
 │       └── baz (100,) float64
 └── foo
>>> import h5py
>>> h5f = h5py.File('data/example.h5', mode='w')
>>> zarr.copy_all(g1, h5f)
(5, 0, 800)
>>> zarr.tree(h5f)
/
 ├── bar
 │   ├── baz
 │   └── qux
 │       └── baz (100,) float64
 └── foo
zarr.convenience.consolidate_metadata(store: Optional[Union[zarr._storage.store.BaseStore, collections.abc.MutableMapping, str]], metadata_key='.zmetadata')[source]

Consolidate all metadata for groups and arrays within the given store into a single resource and put it under the given key.

This produces a single object in the backend store, containing all the metadata read from all the zarr-related keys that can be found. After metadata have been consolidated, use open_consolidated() to open the root group in optimised, read-only mode, using the consolidated metadata to reduce the number of read operations on the backend store.

Note that if the metadata in the store is changed after this consolidation, then the metadata read by open_consolidated() will be incorrect unless this function is called again.

Note

This is an experimental feature.

Parameters
store : MutableMapping or string

Store or path to directory in file system or name of zip file.

metadata_key : str

Key to put the consolidated metadata under.

Returns
g : zarr.hierarchy.Group

Group instance, opened with the new consolidated metadata.

zarr.convenience.open_consolidated(store: Optional[Union[zarr._storage.store.BaseStore, collections.abc.MutableMapping, str]], metadata_key='.zmetadata', mode='r+', **kwargs)[source]

Open group using metadata previously consolidated into a single key.

This is an optimised method for opening a Zarr group, where instead of traversing the group/array hierarchy by accessing the metadata keys at each level, a single key contains all of the metadata for everything. For remote data sources where the overhead of accessing a key is large compared to the time to read data, this can substantially speed up opening the group.

The group accessed must have already had its metadata consolidated into a single key using the function consolidate_metadata().

This optimised method only works in modes which do not change the metadata, although the data may still be written/updated.

Parameters
store : MutableMapping or string

Store or path to directory in file system or name of zip file.

metadata_key : str

Key to read the consolidated metadata from. The default (.zmetadata) corresponds to the default used by consolidate_metadata().

mode : {‘r’, ‘r+’}, optional

Persistence mode: ‘r’ means read only (must exist); ‘r+’ means read/write (must exist), although only writes to data are allowed; changes to metadata, including creation of new arrays or groups, are not allowed.

**kwargs

Additional parameters are passed through to zarr.creation.open_array() or zarr.hierarchy.open_group().

Returns
gzarr.hierarchy.Group

Group instance, opened with the consolidated metadata.