Convenience functions (zarr.convenience)

Convenience functions for storing and loading data.

zarr.convenience.open(store, mode='a', **kwargs)

Convenience function to open a group or array using file-mode-like semantics.

Parameters:

store : MutableMapping or string

Store or path to directory in file system or name of zip file.

mode : {‘r’, ‘r+’, ‘a’, ‘w’, ‘w-’}, optional

Persistence mode: ‘r’ means read only (must exist); ‘r+’ means read/write (must exist); ‘a’ means read/write (create if it doesn’t exist); ‘w’ means create (overwrite if it exists); ‘w-’ means create (fail if it exists).

**kwargs

Additional parameters are passed through to zarr.open_array() or zarr.open_group().
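The persistence modes above follow file-mode-like rules. The sketch below models just those rules. It assumes a plain dict stands in for the store and that existing data is detected by a `.zarray` or `.zgroup` key at the root (the Zarr v2 metadata keys). `resolve_open_mode`, its return strings and the built-in exception types are hypothetical and not zarr's own.

```python
from collections.abc import MutableMapping

# Zarr v2 metadata keys; their presence at the root means "something exists".
METADATA_KEYS = (".zarray", ".zgroup")


def resolve_open_mode(store: MutableMapping, mode: str) -> str:
    """Hypothetical helper mirroring the documented file-mode rules."""
    exists = any(key in store for key in METADATA_KEYS)
    if mode in ("r", "r+"):
        if not exists:
            raise FileNotFoundError(f"mode {mode!r} requires existing data")
        return "open read-only" if mode == "r" else "open read/write"
    if mode == "a":
        # Read/write if present, otherwise create.
        return "open read/write" if exists else "create"
    if mode == "w":
        store.clear()  # overwrite: discard whatever was stored before
        return "create"
    if mode == "w-":
        if exists:
            raise FileExistsError("mode 'w-' refuses to overwrite existing data")
        return "create"
    raise ValueError(f"invalid mode: {mode!r}")
```

Only ‘w’ touches existing contents. The other modes differ only in whether missing or existing data is an error.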

See also

zarr.open_array, zarr.open_group

Examples

Storing data in a directory ‘data/example.zarr’ on the local file system:

>>> import zarr
>>> store = 'data/example.zarr'
>>> zw = zarr.open(store, mode='w', shape=100, dtype='i4')  # open new array
>>> zw
<zarr.core.Array (100,) int32>
>>> za = zarr.open(store, mode='a')  # open existing array for reading and writing
>>> za
<zarr.core.Array (100,) int32>
>>> zr = zarr.open(store, mode='r')  # open existing array read-only
>>> zr
<zarr.core.Array (100,) int32 read-only>
>>> gw = zarr.open(store, mode='w')  # open new group, overwriting previous data
>>> gw
<zarr.hierarchy.Group '/'>
>>> ga = zarr.open(store, mode='a')  # open existing group for reading and writing
>>> ga
<zarr.hierarchy.Group '/'>
>>> gr = zarr.open(store, mode='r')  # open existing group read-only
>>> gr
<zarr.hierarchy.Group '/' read-only>

zarr.convenience.save(store, *args, **kwargs)

Convenience function to save an array or group of arrays to the local file system.

Parameters:

store : MutableMapping or string

Store or path to directory in file system or name of zip file.

args : ndarray

NumPy arrays with data to save.

kwargs

NumPy arrays with data to save.
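The examples below suggest a dispatch rule. A single positional array is stored as one array. Several arrays, or any keyword arguments, are stored in a group, with positional arrays named arr_0, arr_1, and so on. The sketch encodes that rule as an assumption inferred from the examples. `plan_save` is hypothetical: it only computes names and writes nothing.

```python
from typing import Any, List, Tuple


def plan_save(*args: Any, **kwargs: Any) -> Tuple[str, List[str]]:
    """Hypothetical: decide array vs group and list the names that would be used."""
    if not args and not kwargs:
        raise ValueError("at least one array must be provided")
    if len(args) == 1 and not kwargs:
        return "array", []  # stored at the root of the store, no name needed
    # Positional arrays get automatic names; keyword arrays keep their keys.
    names = [f"arr_{i}" for i in range(len(args))] + list(kwargs)
    return "group", names
```

The LazyLoader repr in the examples lists names in sorted order, which is why bar is shown before foo.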

Examples

Save an array to a directory on the file system (uses a DirectoryStore):

>>> import zarr
>>> import numpy as np
>>> arr = np.arange(10000)
>>> zarr.save('data/example.zarr', arr)
>>> zarr.load('data/example.zarr')
array([   0,    1,    2, ..., 9997, 9998, 9999])

Save an array to a Zip file (uses a ZipStore):

>>> zarr.save('data/example.zip', arr)
>>> zarr.load('data/example.zip')
array([   0,    1,    2, ..., 9997, 9998, 9999])

Save several arrays to a directory on the file system (uses a DirectoryStore and stores arrays in a group):

>>> import zarr
>>> import numpy as np
>>> a1 = np.arange(10000)
>>> a2 = np.arange(10000, 0, -1)
>>> zarr.save('data/example.zarr', a1, a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: arr_0, arr_1>
>>> loader['arr_0']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['arr_1']
array([10000,  9999,  9998, ...,     3,     2,     1])

Save several arrays using named keyword arguments:

>>> zarr.save('data/example.zarr', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])

Store several arrays in a single zip file (uses a ZipStore):

>>> zarr.save('data/example.zip', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zip')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])

zarr.convenience.load(store)

Load data from an array or group into memory.

Parameters:

store : MutableMapping or string

Store or path to directory in file system or name of zip file.

Returns:

out

If the store contains an array, out will be a NumPy array. If the store contains a group, out will be a dict-like object where keys are array names and values are NumPy arrays.

See also

save, save_array, save_group

Notes

If loading data from a group of arrays, data will not be immediately loaded into memory. Rather, arrays will be loaded into memory as they are requested.
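Load-on-request can be pictured as a read-only mapping that defers each read until a key is first accessed. The sketch below is a minimal illustration of that idea, not zarr's LazyLoader. It assumes a caller-supplied `read` function that loads one named array. The class name `LazyMapping` and the caching behaviour are assumptions.

```python
from collections.abc import Mapping
from typing import Any, Callable, Dict, Iterator, List


class LazyMapping(Mapping):
    """Hypothetical stand-in for the dict-like object returned for a group."""

    def __init__(self, names: List[str], read: Callable[[str], Any]) -> None:
        self._names = sorted(names)
        self._read = read  # reads one array into memory on demand
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, name: str) -> Any:
        if name not in self._names:
            raise KeyError(name)
        if name not in self._cache:
            self._cache[name] = self._read(name)  # first access: load it now
        return self._cache[name]

    def __contains__(self, name: object) -> bool:
        return name in self._names  # membership check without loading

    def __iter__(self) -> Iterator[str]:
        return iter(self._names)

    def __len__(self) -> int:
        return len(self._names)

    def __repr__(self) -> str:
        return "<LazyMapping: " + ", ".join(self._names) + ">"
```

Building the mapping costs only a list of names. Each array's data is read at most once, when it is first indexed.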

zarr.convenience.save_array(store, arr, **kwargs)

Convenience function to save a NumPy array to the local file system, following a similar API to the NumPy save() function.

Parameters:

store : MutableMapping or string

Store or path to directory in file system or name of zip file.

arr : ndarray

NumPy array with data to save.

kwargs

Passed through to create(), e.g., compressor.

Examples

Save an array to a directory on the file system (uses a DirectoryStore):

>>> import zarr
>>> import numpy as np
>>> arr = np.arange(10000)
>>> zarr.save_array('data/example.zarr', arr)
>>> zarr.load('data/example.zarr')
array([   0,    1,    2, ..., 9997, 9998, 9999])

Save an array to a single file (uses a ZipStore):

>>> zarr.save_array('data/example.zip', arr)
>>> zarr.load('data/example.zip')
array([   0,    1,    2, ..., 9997, 9998, 9999])
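The store parameter accepts either a path string or any MutableMapping. A directory store can be thought of as a mapping whose keys are relative file paths and whose values are file contents. The class below is a hypothetical, simplified illustration of that idea using only the standard library. It is not zarr's DirectoryStore and omits its locking, normalization and error handling.

```python
import os
from collections.abc import MutableMapping
from pathlib import Path
from typing import Iterator


class MiniDirectoryStore(MutableMapping):
    """Hypothetical directory-backed store: one file per key."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        # "foo/bar/.zarray" -> <root>/foo/bar/.zarray
        return self.root.joinpath(*key.split("/"))

    def __getitem__(self, key: str) -> bytes:
        path = self._path(key)
        if not path.is_file():
            raise KeyError(key)
        return path.read_bytes()

    def __setitem__(self, key: str, value: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(value)

    def __delitem__(self, key: str) -> None:
        path = self._path(key)
        if not path.is_file():
            raise KeyError(key)
        path.unlink()

    def __iter__(self) -> Iterator[str]:
        # Yield every file below the root as a "/"-separated key.
        for dirpath, _, filenames in os.walk(self.root):
            for name in filenames:
                yield Path(dirpath, name).relative_to(self.root).as_posix()

    def __len__(self) -> int:
        return sum(1 for _ in self)
```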

zarr.convenience.save_group(store, *args, **kwargs)

Convenience function to save several NumPy arrays to the local file system, following a similar API to the NumPy savez()/savez_compressed() functions.

Parameters:

store : MutableMapping or string

Store or path to directory in file system or name of zip file.

args : ndarray

NumPy arrays with data to save.

kwargs

NumPy arrays with data to save.

Notes

Default compression options will be used.
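For comparison, NumPy's own savez() uses the same naming convention: positional arrays become arr_0, arr_1, and so on, and keyword arguments keep their names. Its loader likewise reads each member only when it is accessed. The snippet below uses an in-memory buffer so it writes no files.

```python
import io

import numpy as np

a1 = np.arange(10000)
a2 = np.arange(10000, 0, -1)

# Positional arrays get automatic names, like the zarr examples below.
buf = io.BytesIO()
np.savez(buf, a1, a2)
buf.seek(0)
with np.load(buf) as npz:
    positional_names = sorted(npz.files)  # ['arr_0', 'arr_1']
    first = npz["arr_0"]                  # member is read when accessed

# Keyword arguments become the stored names.
buf = io.BytesIO()
np.savez(buf, foo=a1, bar=a2)
buf.seek(0)
with np.load(buf) as npz:
    keyword_names = sorted(npz.files)     # ['bar', 'foo']
```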

Examples

Save several arrays to a directory on the file system (uses a DirectoryStore):

>>> import zarr
>>> import numpy as np
>>> a1 = np.arange(10000)
>>> a2 = np.arange(10000, 0, -1)
>>> zarr.save_group('data/example.zarr', a1, a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: arr_0, arr_1>
>>> loader['arr_0']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['arr_1']
array([10000,  9999,  9998, ...,     3,     2,     1])

Save several arrays using named keyword arguments:

>>> zarr.save_group('data/example.zarr', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])

Store several arrays in a single zip file (uses a ZipStore):

>>> zarr.save_group('data/example.zip', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zip')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])

zarr.convenience.copy(source, dest, name=None, shallow=False, without_attrs=False, log=None, if_exists='raise', dry_run=False, **create_kws)

Copy the source array or group into the dest group.

Parameters:

source : group or array/dataset

A zarr group or array, or an h5py group or dataset.

dest : group

A zarr or h5py group.

name : str, optional

Name to copy the object to.

shallow : bool, optional

If True, only copy immediate children of source.

without_attrs : bool, optional

Do not copy user attributes.

log : callable, file path or file-like object, optional

If provided, will be used to log progress information.

if_exists : {‘raise’, ‘replace’, ‘skip’, ‘skip_initialized’}, optional

How to handle arrays that already exist in the destination group. If ‘raise’, a CopyError is raised on the first array already present in the destination group. If ‘replace’, any existing array is replaced in the destination. If ‘skip’, existing arrays are not copied. If ‘skip_initialized’, existing arrays with all chunks initialized are not copied (not available when copying to h5py).

dry_run : bool, optional

If True, don’t actually copy anything, just log what would have happened.

**create_kws

Passed through to the create_dataset method when copying an array/dataset.

Returns:

n_copied : int

Number of items copied.

n_skipped : int

Number of items skipped.

n_bytes_copied : int

Number of bytes of data that were actually copied.
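The interplay between if_exists, dry_run and the three return values can be modelled with plain dicts. The sketch below assumes arrays are given as flat {path: nbytes} maps. It ignores groups, which copy() also counts as copied items, and it omits ‘skip_initialized’. `plan_copy` and the local `CopyError` class are hypothetical stand-ins, not zarr objects.

```python
from typing import Dict, Tuple


class CopyError(RuntimeError):
    """Local stand-in for zarr.CopyError; not the real class."""


def plan_copy(source: Dict[str, int], dest: Dict[str, int],
              if_exists: str = "raise",
              dry_run: bool = False) -> Tuple[int, int, int]:
    """Model copy() bookkeeping for arrays given as {path: nbytes} maps."""
    if if_exists not in ("raise", "replace", "skip"):
        raise ValueError(f"unsupported if_exists: {if_exists!r}")
    n_copied = n_skipped = n_bytes_copied = 0
    for path, nbytes in source.items():
        if path in dest:
            if if_exists == "raise":
                raise CopyError(f"an object {path!r} already exists in destination")
            if if_exists == "skip":
                n_skipped += 1
                continue
            # if_exists == "replace": fall through and overwrite
        n_copied += 1
        if not dry_run:
            dest[path] = nbytes
            n_bytes_copied += nbytes
    return n_copied, n_skipped, n_bytes_copied
```

A dry run still counts items but reports zero bytes. This matches the (4, 0, 0) and (3, 1, 0) results in the examples below, although the totals here are smaller because groups are not counted.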

Notes

Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.

Examples

Here’s an example of copying a group named ‘foo’ from an HDF5 file to a Zarr group:

>>> import h5py
>>> import zarr
>>> import numpy as np
>>> source = h5py.File('data/example.h5', mode='w')
>>> foo = source.create_group('foo')
>>> baz = foo.create_dataset('bar/baz', data=np.arange(100), chunks=(50,))
>>> spam = source.create_dataset('spam', data=np.arange(100, 200), chunks=(30,))
>>> zarr.tree(source)
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> dest = zarr.group()
>>> from sys import stdout
>>> zarr.copy(source['foo'], dest, log=stdout)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
all done: 3 copied, 0 skipped, 800 bytes copied
(3, 0, 800)
>>> dest.tree()  # N.B., no spam
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> source.close()

The if_exists parameter provides options for how to handle pre-existing data in the destination. Here are some examples of these options, also using dry_run=True to find out what would happen without actually copying anything:

>>> source = zarr.group()
>>> dest = zarr.group()
>>> baz = source.create_dataset('foo/bar/baz', data=np.arange(100))
>>> spam = source.create_dataset('foo/spam', data=np.arange(1000))
>>> existing_spam = dest.create_dataset('foo/spam', data=np.arange(1000))
>>> from sys import stdout
>>> try:
...     zarr.copy(source['foo'], dest, log=stdout, dry_run=True)
... except zarr.CopyError as e:
...     print(e)
...
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
an object 'spam' already exists in destination '/foo'
>>> zarr.copy(source['foo'], dest, log=stdout, if_exists='replace', dry_run=True)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
copy /foo/spam (1000,) int64
dry run: 4 copied, 0 skipped
(4, 0, 0)
>>> zarr.copy(source['foo'], dest, log=stdout, if_exists='skip', dry_run=True)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
skip /foo/spam (1000,) int64
dry run: 3 copied, 1 skipped
(3, 1, 0)

zarr.convenience.copy_all(source, dest, shallow=False, without_attrs=False, log=None, if_exists='raise', dry_run=False, **create_kws)

Copy all children of the source group into the dest group.

Parameters:

source : group

A zarr group or an h5py group whose children are copied.

dest : group

A zarr or h5py group.

shallow : bool, optional

If True, only copy immediate children of source.

without_attrs : bool, optional

Do not copy user attributes.

log : callable, file path or file-like object, optional

If provided, will be used to log progress information.

if_exists : {‘raise’, ‘replace’, ‘skip’, ‘skip_initialized’}, optional

How to handle arrays that already exist in the destination group. If ‘raise’, a CopyError is raised on the first array already present in the destination group. If ‘replace’, any existing array is replaced in the destination. If ‘skip’, existing arrays are not copied. If ‘skip_initialized’, existing arrays with all chunks initialized are not copied (not available when copying to h5py).

dry_run : bool, optional

If True, don’t actually copy anything, just log what would have happened.

**create_kws

Passed through to the create_dataset method when copying an array/dataset.

Returns:

n_copied : int

Number of items copied.

n_skipped : int

Number of items skipped.

n_bytes_copied : int

Number of bytes of data that were actually copied.

Notes

Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.

Examples

>>> import h5py
>>> import zarr
>>> import numpy as np
>>> source = h5py.File('data/example.h5', mode='w')
>>> foo = source.create_group('foo')
>>> baz = foo.create_dataset('bar/baz', data=np.arange(100), chunks=(50,))
>>> spam = source.create_dataset('spam', data=np.arange(100, 200), chunks=(30,))
>>> zarr.tree(source)
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> dest = zarr.group()
>>> import sys
>>> zarr.copy_all(source, dest, log=sys.stdout)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
copy /spam (100,) int64
all done: 4 copied, 0 skipped, 1,600 bytes copied
(4, 0, 1600)
>>> dest.tree()
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> source.close()
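The totals in this example (4 copied, 1,600 bytes) are consistent with copying each child of the source group in turn and summing the per-child results. The foo subtree gives 3 items and 800 bytes, and spam gives 1 item and 800 bytes (100 int64 values of 8 bytes each). The sketch below shows only that aggregation step. `copy_all_sketch` and the `copy_one` callback are hypothetical.

```python
from typing import Callable, Iterable, Tuple

Counts = Tuple[int, int, int]  # (n_copied, n_skipped, n_bytes_copied)


def copy_all_sketch(child_names: Iterable[str],
                    copy_one: Callable[[str], Counts]) -> Counts:
    """Hypothetical: copy each child of a group and sum the three counters."""
    n_copied = n_skipped = n_bytes = 0
    for name in child_names:
        c, s, b = copy_one(name)
        n_copied += c
        n_skipped += s
        n_bytes += b
    return n_copied, n_skipped, n_bytes


# Per-child results taken from the log above.
per_child = {"foo": (3, 0, 800), "spam": (1, 0, 800)}
totals = copy_all_sketch(per_child, per_child.__getitem__)
```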

zarr.convenience.copy_store(source, dest, source_path='', dest_path='', excludes=None, includes=None, flags=0, if_exists='raise', dry_run=False, log=None)

Copy data directly from the source store to the dest store. Use this function to copy a group or array in the most efficient way, preserving all configuration and attributes. It is more efficient than copy() or copy_all() because it avoids decompressing and recompressing data; instead, the compressed chunk data for each array are copied directly between stores.
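Because stores are mappings from keys to bytes, a store-level copy can move each value verbatim without decoding it. The sketch below illustrates that idea with plain dicts. It is hypothetical: existing keys are always skipped (as with if_exists=‘skip’), and bytes are counted as len(value), which is an assumption about how the byte total is measured.

```python
from collections.abc import Mapping, MutableMapping
from typing import Tuple


def copy_raw(source: Mapping, dest: MutableMapping,
             dry_run: bool = False) -> Tuple[int, int, int]:
    """Hypothetical sketch: move stored bytes between stores without decoding."""
    n_copied = n_skipped = n_bytes_copied = 0
    for key in sorted(source):
        if key in dest:
            n_skipped += 1  # this sketch always behaves like if_exists='skip'
            continue
        n_copied += 1
        if not dry_run:
            value = source[key]  # metadata JSON or compressed chunk, as stored
            dest[key] = value    # written unchanged: no decompress/recompress
            n_bytes_copied += len(value)
    return n_copied, n_skipped, n_bytes_copied
```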

Parameters:

source : Mapping

Store to copy data from.

dest : MutableMapping

Store to copy data into.

source_path : str, optional

Only copy data from under this path in the source store.

dest_path : str, optional

Copy data into this path in the destination store.

excludes : sequence of str, optional

One or more regular expressions which will be matched against keys in the source store. Any matching key will not be copied.

includes : sequence of str, optional

One or more regular expressions which will be matched against keys in the source store. A key matching any of these is copied even if it also matches one of the excludes.

flags : int, optional

Regular expression flags used for matching excludes and includes.

if_exists : {‘raise’, ‘replace’, ‘skip’}, optional

How to handle keys that already exist in the destination store. If ‘raise’ then a CopyError is raised on the first key already present in the destination store. If ‘replace’ then any data will be replaced in the destination. If ‘skip’ then any existing keys will not be copied.

dry_run : bool, optional

If True, don’t actually copy anything, just log what would have happened.

log : callable, file path or file-like object, optional

If provided, will be used to log progress information.

Returns:

n_copied : int

Number of items copied.

n_skipped : int

Number of items skipped.

n_bytes_copied : int

Number of bytes of data that were actually copied.
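Key selection combines source_path, dest_path, excludes, includes and flags. The sketch below shows one way those parameters could interact. It assumes that patterns are applied with re.search against the full source key and that keys under source_path are re-rooted under dest_path. `select_keys` is hypothetical, and zarr's exact matching rules may differ.

```python
import re
from typing import Iterable, List, Optional, Sequence, Tuple


def select_keys(keys: Iterable[str], source_path: str = "", dest_path: str = "",
                excludes: Optional[Sequence[str]] = None,
                includes: Optional[Sequence[str]] = None,
                flags: int = 0) -> List[Tuple[str, str]]:
    """Hypothetical: pick source keys to copy and map each to a destination key."""
    src = source_path.strip("/")
    dst = dest_path.strip("/")
    src_prefix = src + "/" if src else ""
    dst_prefix = dst + "/" if dst else ""
    exclude_res = [re.compile(p, flags) for p in (excludes or [])]
    include_res = [re.compile(p, flags) for p in (includes or [])]
    selected = []
    for key in sorted(keys):
        if not key.startswith(src_prefix):
            continue  # outside source_path
        excluded = any(r.search(key) for r in exclude_res)
        if excluded and any(r.search(key) for r in include_res):
            excluded = False  # an include overrides a matching exclude
        if excluded:
            continue
        selected.append((key, dst_prefix + key[len(src_prefix):]))
    return selected
```

For example, excluding r"baz/\d+$" while including r"/0$" keeps the metadata and only the first chunk of foo/bar/baz.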

Notes

Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.

Examples

>>> import zarr
>>> store1 = zarr.DirectoryStore('data/example.zarr')
>>> root = zarr.group(store1, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.create_group('bar')
>>> baz = bar.create_dataset('baz', shape=100, chunks=50, dtype='i8')
>>> import numpy as np
>>> baz[:] = np.arange(100)
>>> root.tree()
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> from sys import stdout
>>> store2 = zarr.ZipStore('data/example.zip', mode='w')
>>> zarr.copy_store(store1, store2, log=stdout)
copy .zgroup
copy foo/.zgroup
copy foo/bar/.zgroup
copy foo/bar/baz/.zarray
copy foo/bar/baz/0
copy foo/bar/baz/1
all done: 6 copied, 0 skipped, 566 bytes copied
(6, 0, 566)
>>> new_root = zarr.group(store2)
>>> new_root.tree()
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> new_root['foo/bar/baz'][:]
array([ 0,  1,  2,  ..., 97, 98, 99])
>>> store2.close()  # zip stores need to be closed

zarr.convenience.tree(grp, expand=False, level=None)

Provide a printable display of the hierarchy. This function is provided mainly as a convenience for obtaining a tree view of an h5py group; zarr groups have their own .tree() method.

Parameters:

grp : Group

Zarr or h5py group.

expand : bool, optional

Only relevant for HTML representation. If True, tree will be fully expanded.

level : int, optional

Maximum depth to descend into hierarchy.

Notes

Please note that this is an experimental feature. The behaviour of this function is still evolving and the default output and/or parameters may change in future versions.
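The text layout of the tree view can be reproduced from a nested dict. The sketch below assumes that groups are dicts, that arrays are represented by their summary strings, that children are listed in sorted order (as bar appears before foo below), and that level counts how many levels below the root are shown. `render_tree` is hypothetical and does not read zarr or h5py objects.

```python
from typing import Any, Dict, List, Optional

# A group is a dict of children; an array is represented by its summary string.
Node = Dict[str, Any]


def render_tree(root: Node, level: Optional[int] = None) -> str:
    """Hypothetical text renderer in the style of the tree output below."""
    lines: List[str] = ["/"]

    def walk(node: Node, prefix: str, depth: int) -> None:
        if level is not None and depth >= level:
            return  # stop descending past the requested depth
        names = sorted(node)
        for i, name in enumerate(names):
            last = i == len(names) - 1
            child = node[name]
            is_group = isinstance(child, dict)
            label = name if is_group else f"{name} {child}"
            lines.append(prefix + ("└── " if last else "├── ") + label)
            if is_group:
                walk(child, prefix + ("    " if last else "│   "), depth + 1)

    walk(root, " ", 0)
    return "\n".join(lines)
```

Applied to {"foo": {}, "bar": {"baz": {}, "qux": {"baz": "(100,) float64"}}}, it produces the same six lines as g1.tree() in the example below.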

Examples

>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> g4 = g3.create_group('baz')
>>> g5 = g3.create_group('qux')
>>> d1 = g5.create_dataset('baz', shape=100, chunks=10)
>>> g1.tree()
/
 ├── bar
 │   ├── baz
 │   └── qux
 │       └── baz (100,) float64
 └── foo
>>> import h5py
>>> h5f = h5py.File('data/example.h5', mode='w')
>>> zarr.copy_all(g1, h5f)
(5, 0, 800)
>>> zarr.tree(h5f)
/
 ├── bar
 │   ├── baz
 │   └── qux
 │       └── baz (100,) float64
 └── foo