Convenience functions (zarr.convenience)#
Convenience functions for storing and loading data.
- zarr.convenience.open(store: Optional[Union[BaseStore, MutableMapping, str]] = None, mode: str = 'a', *, zarr_version=None, path=None, **kwargs)[source]#
Convenience function to open a group or array using file-mode-like semantics.
- Parameters
- store : Store or string, optional
Store or path to directory in file system or name of zip file.
- mode : {‘r’, ‘r+’, ‘a’, ‘w’, ‘w-’}, optional
Persistence mode: ‘r’ means read only (must exist); ‘r+’ means read/write (must exist); ‘a’ means read/write (create if doesn’t exist); ‘w’ means create (overwrite if exists); ‘w-’ means create (fail if exists).
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
The path within the store to open.
- **kwargs
Additional parameters are passed through to zarr.creation.open_array() or zarr.hierarchy.open_group().
- Returns
- z : zarr.core.Array or zarr.hierarchy.Group
Array or group, depending on what exists in the given store.
Examples
Storing data in a directory ‘data/example.zarr’ on the local file system:
>>> import zarr
>>> store = 'data/example.zarr'
>>> zw = zarr.open(store, mode='w', shape=100, dtype='i4')  # open new array
>>> zw
<zarr.core.Array (100,) int32>
>>> za = zarr.open(store, mode='a')  # open existing array for reading and writing
>>> za
<zarr.core.Array (100,) int32>
>>> zr = zarr.open(store, mode='r')  # open existing array read-only
>>> zr
<zarr.core.Array (100,) int32 read-only>
>>> gw = zarr.open(store, mode='w')  # open new group, overwriting previous data
>>> gw
<zarr.hierarchy.Group '/'>
>>> ga = zarr.open(store, mode='a')  # open existing group for reading and writing
>>> ga
<zarr.hierarchy.Group '/'>
>>> gr = zarr.open(store, mode='r')  # open existing group read-only
>>> gr
<zarr.hierarchy.Group '/' read-only>
- zarr.convenience.save(store: Optional[Union[BaseStore, MutableMapping, str]], *args, zarr_version=None, path=None, **kwargs)[source]#
Convenience function to save an array or group of arrays to the local file system.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- args : ndarray
NumPy arrays with data to save.
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use when saving. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
The path within the group where the arrays will be saved.
- kwargs
NumPy arrays with data to save.
Examples
Save an array to a directory on the file system (uses a DirectoryStore):
>>> import zarr
>>> import numpy as np
>>> arr = np.arange(10000)
>>> zarr.save('data/example.zarr', arr)
>>> zarr.load('data/example.zarr')
array([   0,    1,    2, ..., 9997, 9998, 9999])
Save an array to a Zip file (uses a ZipStore):
>>> zarr.save('data/example.zip', arr)
>>> zarr.load('data/example.zip')
array([   0,    1,    2, ..., 9997, 9998, 9999])
Save several arrays to a directory on the file system (uses a DirectoryStore and stores arrays in a group):
>>> import zarr
>>> import numpy as np
>>> a1 = np.arange(10000)
>>> a2 = np.arange(10000, 0, -1)
>>> zarr.save('data/example.zarr', a1, a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: arr_0, arr_1>
>>> loader['arr_0']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['arr_1']
array([10000,  9999,  9998, ...,     3,     2,     1])
Save several arrays using named keyword arguments:
>>> zarr.save('data/example.zarr', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
Store several arrays in a single zip file (uses a ZipStore):
>>> zarr.save('data/example.zip', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zip')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
- zarr.convenience.load(store: Optional[Union[BaseStore, MutableMapping, str]], zarr_version=None, path=None)[source]#
Load data from an array or group into memory.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use when loading. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
The path within the store from which to load.
- Returns
- out
If the store contains an array, out will be a numpy array. If the store contains a group, out will be a dict-like object where keys are array names and values are numpy arrays.
See also: save, save_group
Notes
If loading data from a group of arrays, data will not be immediately loaded into memory. Rather, arrays will be loaded into memory as they are requested.
- zarr.convenience.save_array(store: Optional[Union[BaseStore, MutableMapping, str]], arr, *, zarr_version=None, path=None, **kwargs)[source]#
Convenience function to save a NumPy array to the local file system, following a similar API to the NumPy save() function.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- arr : ndarray
NumPy array with data to save.
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use when saving. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
The path within the store where the array will be saved.
- kwargs
Passed through to create(), e.g., compressor.
Examples
Save an array to a directory on the file system (uses a DirectoryStore):
>>> import zarr
>>> import numpy as np
>>> arr = np.arange(10000)
>>> zarr.save_array('data/example.zarr', arr)
>>> zarr.load('data/example.zarr')
array([   0,    1,    2, ..., 9997, 9998, 9999])
Save an array to a single file (uses a ZipStore):
>>> zarr.save_array('data/example.zip', arr)
>>> zarr.load('data/example.zip')
array([   0,    1,    2, ..., 9997, 9998, 9999])
- zarr.convenience.save_group(store: Optional[Union[BaseStore, MutableMapping, str]], *args, zarr_version=None, path=None, **kwargs)[source]#
Convenience function to save several NumPy arrays to the local file system, following a similar API to the NumPy savez()/savez_compressed() functions.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- args : ndarray
NumPy arrays with data to save.
- zarr_version : {2, 3, None}, optional
The zarr protocol version to use when saving. The default value of None will attempt to infer the version from store if possible, otherwise it will fall back to 2.
- path : str or None, optional
Path within the store where the group will be saved.
- kwargs
NumPy arrays with data to save.
Notes
Default compression options will be used.
Examples
Save several arrays to a directory on the file system (uses a DirectoryStore):
>>> import zarr
>>> import numpy as np
>>> a1 = np.arange(10000)
>>> a2 = np.arange(10000, 0, -1)
>>> zarr.save_group('data/example.zarr', a1, a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: arr_0, arr_1>
>>> loader['arr_0']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['arr_1']
array([10000,  9999,  9998, ...,     3,     2,     1])
Save several arrays using named keyword arguments:
>>> zarr.save_group('data/example.zarr', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zarr')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
Store several arrays in a single zip file (uses a ZipStore):
>>> zarr.save_group('data/example.zip', foo=a1, bar=a2)
>>> loader = zarr.load('data/example.zip')
>>> loader
<LazyLoader: bar, foo>
>>> loader['foo']
array([   0,    1,    2, ..., 9997, 9998, 9999])
>>> loader['bar']
array([10000,  9999,  9998, ...,     3,     2,     1])
- zarr.convenience.copy(source, dest, name=None, shallow=False, without_attrs=False, log=None, if_exists='raise', dry_run=False, **create_kws)[source]#
Copy the source array or group into the dest group.
- Parameters
- source : group or array/dataset
A zarr group or array, or an h5py group or dataset.
- dest : group
A zarr or h5py group.
- name : str, optional
Name to copy the object to.
- shallow : bool, optional
If True, only copy immediate children of source.
- without_attrs : bool, optional
Do not copy user attributes.
- log : callable, file path or file-like object, optional
If provided, will be used to log progress information.
- if_exists : {‘raise’, ‘replace’, ‘skip’, ‘skip_initialized’}, optional
How to handle arrays that already exist in the destination group. If ‘raise’ then a CopyError is raised on the first array already present in the destination group. If ‘replace’ then any array will be replaced in the destination. If ‘skip’ then any existing arrays will not be copied. If ‘skip_initialized’ then any existing arrays with all chunks initialized will not be copied (not available when copying to h5py).
- dry_run : bool, optional
If True, don’t actually copy anything, just log what would have happened.
- **create_kws
Passed through to the create_dataset method when copying an array/dataset.
- Returns
- n_copied : int
Number of items copied.
- n_skipped : int
Number of items skipped.
- n_bytes_copied : int
Number of bytes of data that were actually copied.
Notes
Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.
Examples
Here’s an example of copying a group named ‘foo’ from an HDF5 file to a Zarr group:
>>> import h5py
>>> import zarr
>>> import numpy as np
>>> source = h5py.File('data/example.h5', mode='w')
>>> foo = source.create_group('foo')
>>> baz = foo.create_dataset('bar/baz', data=np.arange(100), chunks=(50,))
>>> spam = source.create_dataset('spam', data=np.arange(100, 200), chunks=(30,))
>>> zarr.tree(source)
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> dest = zarr.group()
>>> from sys import stdout
>>> zarr.copy(source['foo'], dest, log=stdout)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
all done: 3 copied, 0 skipped, 800 bytes copied
(3, 0, 800)
>>> dest.tree()  # N.B., no spam
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> source.close()
The if_exists parameter provides options for how to handle pre-existing data in the destination. Here are some examples of these options, also using dry_run=True to find out what would happen without actually copying anything:
>>> source = zarr.group()
>>> dest = zarr.group()
>>> baz = source.create_dataset('foo/bar/baz', data=np.arange(100))
>>> spam = source.create_dataset('foo/spam', data=np.arange(1000))
>>> existing_spam = dest.create_dataset('foo/spam', data=np.arange(1000))
>>> from sys import stdout
>>> try:
...     zarr.copy(source['foo'], dest, log=stdout, dry_run=True)
... except zarr.CopyError as e:
...     print(e)
...
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
an object 'spam' already exists in destination '/foo'
>>> zarr.copy(source['foo'], dest, log=stdout, if_exists='replace', dry_run=True)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
copy /foo/spam (1000,) int64
dry run: 4 copied, 0 skipped
(4, 0, 0)
>>> zarr.copy(source['foo'], dest, log=stdout, if_exists='skip', dry_run=True)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
skip /foo/spam (1000,) int64
dry run: 3 copied, 1 skipped
(3, 1, 0)
- zarr.convenience.copy_all(source, dest, shallow=False, without_attrs=False, log=None, if_exists='raise', dry_run=False, **create_kws)[source]#
Copy all children of the source group into the dest group.
- Parameters
- source : group or array/dataset
A zarr group or array, or an h5py group or dataset.
- dest : group
A zarr or h5py group.
- shallow : bool, optional
If True, only copy immediate children of source.
- without_attrs : bool, optional
Do not copy user attributes.
- log : callable, file path or file-like object, optional
If provided, will be used to log progress information.
- if_exists : {‘raise’, ‘replace’, ‘skip’, ‘skip_initialized’}, optional
How to handle arrays that already exist in the destination group. If ‘raise’ then a CopyError is raised on the first array already present in the destination group. If ‘replace’ then any array will be replaced in the destination. If ‘skip’ then any existing arrays will not be copied. If ‘skip_initialized’ then any existing arrays with all chunks initialized will not be copied (not available when copying to h5py).
- dry_run : bool, optional
If True, don’t actually copy anything, just log what would have happened.
- **create_kws
Passed through to the create_dataset method when copying an array/dataset.
- Returns
- n_copied : int
Number of items copied.
- n_skipped : int
Number of items skipped.
- n_bytes_copied : int
Number of bytes of data that were actually copied.
Notes
Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.
Examples
>>> import h5py
>>> import zarr
>>> import numpy as np
>>> source = h5py.File('data/example.h5', mode='w')
>>> foo = source.create_group('foo')
>>> baz = foo.create_dataset('bar/baz', data=np.arange(100), chunks=(50,))
>>> spam = source.create_dataset('spam', data=np.arange(100, 200), chunks=(30,))
>>> zarr.tree(source)
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> dest = zarr.group()
>>> import sys
>>> zarr.copy_all(source, dest, log=sys.stdout)
copy /foo
copy /foo/bar
copy /foo/bar/baz (100,) int64
copy /spam (100,) int64
all done: 4 copied, 0 skipped, 1,600 bytes copied
(4, 0, 1600)
>>> dest.tree()
/
 ├── foo
 │   └── bar
 │       └── baz (100,) int64
 └── spam (100,) int64
>>> source.close()
- zarr.convenience.copy_store(source, dest, source_path='', dest_path='', excludes=None, includes=None, flags=0, if_exists='raise', dry_run=False, log=None)[source]#
Copy data directly from the source store to the dest store. Use this function when you want to copy a group or array in the most efficient way, preserving all configuration and attributes. This function is more efficient than the copy() or copy_all() functions because it avoids decompressing and recompressing data; instead, the compressed chunk data for each array are copied directly between stores.
- Parameters
- source : Mapping
Store to copy data from.
- dest : MutableMapping
Store to copy data into.
- source_path : str, optional
Only copy data from under this path in the source store.
- dest_path : str, optional
Copy data into this path in the destination store.
- excludes : sequence of str, optional
One or more regular expressions which will be matched against keys in the source store. Any matching key will not be copied.
- includes : sequence of str, optional
One or more regular expressions which will be matched against keys in the source store and will override any excludes also matching.
- flags : int, optional
Regular expression flags used for matching excludes and includes.
- if_exists : {‘raise’, ‘replace’, ‘skip’}, optional
How to handle keys that already exist in the destination store. If ‘raise’ then a CopyError is raised on the first key already present in the destination store. If ‘replace’ then any data will be replaced in the destination. If ‘skip’ then any existing keys will not be copied.
- dry_run : bool, optional
If True, don’t actually copy anything, just log what would have happened.
- log : callable, file path or file-like object, optional
If provided, will be used to log progress information.
- Returns
- n_copied : int
Number of items copied.
- n_skipped : int
Number of items skipped.
- n_bytes_copied : int
Number of bytes of data that were actually copied.
Notes
Please note that this is an experimental feature. The behaviour of this function is still evolving and the default behaviour and/or parameters may change in future versions.
Examples
>>> import zarr
>>> store1 = zarr.DirectoryStore('data/example.zarr')
>>> root = zarr.group(store1, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.create_group('bar')
>>> baz = bar.create_dataset('baz', shape=100, chunks=50, dtype='i8')
>>> import numpy as np
>>> baz[:] = np.arange(100)
>>> root.tree()
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> from sys import stdout
>>> store2 = zarr.ZipStore('data/example.zip', mode='w')
>>> zarr.copy_store(store1, store2, log=stdout)
copy .zgroup
copy foo/.zgroup
copy foo/bar/.zgroup
copy foo/bar/baz/.zarray
copy foo/bar/baz/0
copy foo/bar/baz/1
all done: 6 copied, 0 skipped, 566 bytes copied
(6, 0, 566)
>>> new_root = zarr.group(store2)
>>> new_root.tree()
/
 └── foo
     └── bar
         └── baz (100,) int64
>>> new_root['foo/bar/baz'][:]
array([ 0,  1,  2, ..., 97, 98, 99])
>>> store2.close()  # zip stores need to be closed
- zarr.convenience.tree(grp, expand=False, level=None)[source]#
Provide a print-able display of the hierarchy. This function is provided mainly as a convenience for obtaining a tree view of an h5py group - zarr groups have a .tree() method.
- Parameters
- grp : Group
Zarr or h5py group.
- expand : bool, optional
Only relevant for HTML representation. If True, tree will be fully expanded.
- level : int, optional
Maximum depth to descend into hierarchy.
Notes
Please note that this is an experimental feature. The behaviour of this function is still evolving and the default output and/or parameters may change in future versions.
Examples
>>> import zarr
>>> g1 = zarr.group()
>>> g2 = g1.create_group('foo')
>>> g3 = g1.create_group('bar')
>>> g4 = g3.create_group('baz')
>>> g5 = g3.create_group('qux')
>>> d1 = g5.create_dataset('baz', shape=100, chunks=10)
>>> g1.tree()
/
 ├── bar
 │   ├── baz
 │   └── qux
 │       └── baz (100,) float64
 └── foo
>>> import h5py
>>> h5f = h5py.File('data/example.h5', mode='w')
>>> zarr.copy_all(g1, h5f)
(5, 0, 800)
>>> zarr.tree(h5f)
/
 ├── bar
 │   ├── baz
 │   └── qux
 │       └── baz (100,) float64
 └── foo
- zarr.convenience.consolidate_metadata(store: BaseStore, metadata_key='.zmetadata', *, path='')[source]#
Consolidate all metadata for groups and arrays within the given store into a single resource and put it under the given key.
This produces a single object in the backend store, containing all the metadata read from all the zarr-related keys that can be found. After metadata have been consolidated, use open_consolidated() to open the root group in optimised, read-only mode, using the consolidated metadata to reduce the number of read operations on the backend store.
Note that if the metadata in the store is changed after this consolidation, then the metadata read by open_consolidated() would be incorrect unless this function is called again.
Note: This is an experimental feature.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- metadata_key : str
Key to put the consolidated metadata under.
- path : str or None
Path corresponding to the group that is being consolidated. Not required for zarr v2 stores.
- Returns
- g : zarr.hierarchy.Group
Group instance, opened with the new consolidated metadata.
- zarr.convenience.open_consolidated(store: Optional[Union[BaseStore, MutableMapping, str]], metadata_key='.zmetadata', mode='r+', **kwargs)[source]#
Open group using metadata previously consolidated into a single key.
This is an optimised method for opening a Zarr group, where instead of traversing the group/array hierarchy by accessing the metadata keys at each level, a single key contains all of the metadata for everything. This is particularly beneficial for remote data sources, where the overhead of accessing a key is large compared to the time to read data.
The group accessed must have already had its metadata consolidated into a single key using the function consolidate_metadata().
This optimised method only works in modes which do not change the metadata, although the data may still be written/updated.
- Parameters
- store : MutableMapping or string
Store or path to directory in file system or name of zip file.
- metadata_key : str
Key to read the consolidated metadata from. The default (.zmetadata) corresponds to the default used by consolidate_metadata().
- mode : {‘r’, ‘r+’}, optional
Persistence mode: ‘r’ means read only (must exist); ‘r+’ means read/write (must exist), although only writes to data are allowed; changes to metadata, including creation of new arrays or groups, are not allowed.
- **kwargs
Additional parameters are passed through to zarr.creation.open_array() or zarr.hierarchy.open_group().
- Returns
- g : zarr.hierarchy.Group
Group instance, opened with the consolidated metadata.