V3 Specification Implementation (zarr._storage.v3)#

This module contains the implementation of the Zarr V3 Specification.

Warning

Since the Zarr Python 2.12 release, this module provides experimental infrastructure for reading and writing the upcoming V3 spec of the Zarr format. Users wishing to prepare for the migration can set the environment variable ZARR_V3_EXPERIMENTAL_API=1 to begin experimenting; however, data written with this API should be expected to become stale, as the implementation will still change.

The new zarr._storage.v3 package has the necessary classes and functions for evaluating Zarr V3. Since the design is not finalised, the classes and functions are not automatically imported into the regular Zarr namespace.

Code snippet for creating Zarr V3 arrays:

>>> import zarr
>>> z = zarr.create((10000, 10000),
...                 chunks=(100, 100),
...                 dtype='f8',
...                 compressor='default',
...                 path='path-where-you-want-zarr-v3-array',
...                 zarr_version=3)

Further, you can use z.info to see details about the array you just created:

>>> z.info
Name               : path-where-you-want-zarr-v3-array
Type               : zarr.core.Array
Data type          : float64
Shape              : (10000, 10000)
Chunk shape        : (100, 100)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr._storage.v3.KVStoreV3
No. bytes          : 800000000 (762.9M)
No. bytes stored   : 557
Storage ratio      : 1436265.7
Chunks initialized : 0/10000

Note the Store type, which indicates that this is a Zarr V3 store.

class zarr._storage.v3.RmdirV3[source]#

Mixin class that can be used to ensure that any existing v2 rmdir implementation is overridden.

class zarr._storage.v3.KVStoreV3(mutablemapping)[source]#

This provides a default implementation of a store interface around a mutable mapping, to avoid having to test stores for the presence of methods.

For most methods this is just a pass-through to the underlying key-value store, which is likely to expose a MutableMapping interface.
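
The pass-through idea can be sketched with a minimal wrapper around a mutable mapping. Note this is an illustrative class (PassThroughStore is a hypothetical name), not the real KVStoreV3 implementation:

```python
from collections.abc import MutableMapping

class PassThroughStore(MutableMapping):
    """Illustrative sketch: delegate every store operation to an
    underlying mutable mapping, in the spirit of KVStoreV3."""

    def __init__(self, mapping):
        self._mutable_mapping = mapping

    def __getitem__(self, key):
        return self._mutable_mapping[key]

    def __setitem__(self, key, value):
        self._mutable_mapping[key] = value

    def __delitem__(self, key):
        del self._mutable_mapping[key]

    def __iter__(self):
        return iter(self._mutable_mapping)

    def __len__(self):
        return len(self._mutable_mapping)
```

Because the wrapper subclasses MutableMapping, methods such as keys(), items() and update() come for free from the abstract base class.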

class zarr._storage.v3.FSStoreV3(url, normalize_keys=False, key_separator=None, mode='w', exceptions=(<class 'KeyError'>, <class 'PermissionError'>, <class 'OSError'>), dimension_separator: ~typing.Literal['.', '/'] | None = None, fs=None, check=False, create=False, missing_exceptions=None, **storage_options)[source]#
class zarr._storage.v3.MemoryStoreV3(root=None, cls=<class 'dict'>, dimension_separator: ~typing.Literal['.', '/'] | None = None)[source]#

Store class that uses a hierarchy of KVStore objects, so all data will be held in main memory.

Notes

Safe to write in multiple threads.

Examples

This is the default class used when creating a group. E.g.:

>>> import zarr
>>> g = zarr.group()
>>> type(g.store)
<class 'zarr.storage.MemoryStore'>

Note that the default class when creating an array is the built-in KVStore class, i.e.:

>>> z = zarr.zeros(100)
>>> type(z.store)
<class 'zarr.storage.KVStore'>
class zarr._storage.v3.DirectoryStoreV3(path, normalize_keys=False, dimension_separator: Literal['.', '/'] | None = None)[source]#

Storage class using directories and files on a standard file system.

Parameters:
path : string

Location of directory to use as the root of the storage hierarchy.

normalize_keys : bool, optional

If True, all store keys will be normalized to use lower case characters (e.g. ‘foo’ and ‘FOO’ will be treated as equivalent). This can be useful to avoid potential discrepancies between case-sensitive and case-insensitive file systems. Default value is False.

dimension_separator : {‘.’, ‘/’}, optional

Separator placed between the dimensions of a chunk.
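
The effect of dimension_separator can be illustrated with a small helper that builds a chunk key from chunk coordinates (chunk_key is a hypothetical function for illustration; the real stores compute keys internally):

```python
def chunk_key(array_path, chunk_coords, dimension_separator="."):
    # Join the chunk coordinates with the chosen separator to form the
    # storage key under the array's path.
    coords = dimension_separator.join(str(c) for c in chunk_coords)
    return array_path + "/" + coords

# '.' keeps all chunks as sibling keys: 'arr/1.2'
# '/' nests chunks into a directory hierarchy: 'arr/1/2'
```

With '/' as the separator, a file-system-backed store places each chunk dimension in its own subdirectory, which can help with very large numbers of chunks.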

Notes

Atomic writes are used, which means that data are first written to a temporary file, then moved into place when the write is successfully completed. Files are only held open while they are being read or written and are closed immediately afterwards, so there is no need to manually close any files.

Safe to write in multiple threads or processes.
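
The write-to-temporary-then-rename pattern described above can be sketched with the standard library (a minimal sketch, assuming source and destination live on the same filesystem so the rename is atomic; this is not the store's actual code):

```python
import os
import tempfile

def atomic_write(path, data):
    # Write bytes to a temporary file in the destination directory, then
    # atomically move it into place so readers never see a partial file.
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.remove(tmp)  # clean up the temporary file on failure
        raise
```

Creating the temporary file in the destination directory (rather than the system temp dir) matters: os.replace is only atomic within a single filesystem.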

Examples

Store a single array:

>>> import zarr
>>> store = zarr.DirectoryStore('data/array.zarr')
>>> z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
>>> z[...] = 42

Each chunk of the array is stored as a separate file on the file system, i.e.:

>>> import os
>>> sorted(os.listdir('data/array.zarr'))
['.zarray', '0.0', '0.1', '1.0', '1.1']

Store a group:

>>> store = zarr.DirectoryStore('data/group.zarr')
>>> root = zarr.group(store=store, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.zeros('bar', shape=(10, 10), chunks=(5, 5))
>>> bar[...] = 42

When storing a group, levels in the group hierarchy will correspond to directories on the file system, i.e.:

>>> sorted(os.listdir('data/group.zarr'))
['.zgroup', 'foo']
>>> sorted(os.listdir('data/group.zarr/foo'))
['.zgroup', 'bar']
>>> sorted(os.listdir('data/group.zarr/foo/bar'))
['.zarray', '0.0', '0.1', '1.0', '1.1']
class zarr._storage.v3.ZipStoreV3(path, compression=0, allowZip64=True, mode='a', dimension_separator: Literal['.', '/'] | None = None)[source]#

Storage class using a Zip file.

Parameters:
path : string

Location of file.

compression : integer, optional

Compression method to use when writing to the archive.

allowZip64 : bool, optional

If True (the default), ZIP files that use the ZIP64 extensions will be created when the zipfile is larger than 2 GiB. If False, an exception will be raised when the ZIP file would require ZIP64 extensions.

mode : string, optional

One of ‘r’ to read an existing file, ‘w’ to truncate and write a new file, ‘a’ to append to an existing file, or ‘x’ to exclusively create and write a new file.

dimension_separator : {‘.’, ‘/’}, optional

Separator placed between the dimensions of a chunk.

Notes

Each chunk of an array is stored as a separate entry in the Zip file. Note that Zip files do not provide any way to remove or replace existing entries. If an attempt is made to replace an entry, then a warning is generated by the Python standard library about a duplicate Zip file entry. This can be triggered if you attempt to write data to a Zarr array more than once, e.g.:

>>> store = zarr.ZipStore('data/example.zip', mode='w')
>>> z = zarr.zeros(100, chunks=10, store=store)
>>> # first write OK
... z[...] = 42
>>> # second write generates warnings
... z[...] = 42  
>>> store.close()

This can also happen in a more subtle situation, where data are written only once to a Zarr array, but the write operations are not aligned with chunk boundaries, e.g.:

>>> store = zarr.ZipStore('data/example.zip', mode='w')
>>> z = zarr.zeros(100, chunks=10, store=store)
>>> z[5:15] = 42
>>> # write overlaps chunk previously written, generates warnings
... z[15:25] = 42  

To avoid creating duplicate entries, only write data once, and align writes with chunk boundaries. This alignment is done automatically if you call z[...] = ... or create an array from existing data via zarr.array().

Alternatively, use a DirectoryStore when writing the data, then manually Zip the directory and use the Zip file for subsequent reads. Take note that the files in the Zip file must be relative to the root of the Zarr archive. You may find it easier to create such a Zip file with 7z, e.g.:

7z a -tzip archive.zarr.zip archive.zarr/.

Safe to write in multiple threads but not in multiple processes.
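
The duplicate-entry behaviour described above comes straight from the Python standard library: writing the same entry name twice into a zipfile.ZipFile emits a warning, which can be demonstrated without Zarr at all:

```python
import io
import warnings
import zipfile

# Writing the same entry name twice into a ZIP archive triggers the
# standard library's duplicate-name warning.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:
    zf.writestr("0.0", b"first chunk")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        zf.writestr("0.0", b"second chunk")  # duplicate entry
    assert any("Duplicate name" in str(w.message) for w in caught)
```

Both copies of the entry remain in the archive; readers will typically see only the last one, but the file is larger than necessary, which is why aligning writes with chunk boundaries matters.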

Examples

Store a single array:

>>> import zarr
>>> store = zarr.ZipStore('data/array.zip', mode='w')
>>> z = zarr.zeros((10, 10), chunks=(5, 5), store=store)
>>> z[...] = 42
>>> store.close()  # don't forget to call this when you're done

Store a group:

>>> store = zarr.ZipStore('data/group.zip', mode='w')
>>> root = zarr.group(store=store)
>>> foo = root.create_group('foo')
>>> bar = foo.zeros('bar', shape=(10, 10), chunks=(5, 5))
>>> bar[...] = 42
>>> store.close()  # don't forget to call this when you're done

After modifying a ZipStore, the close() method must be called, otherwise essential data will not be written to the underlying Zip file. The ZipStore class also supports the context manager protocol, which ensures the close() method is called on leaving the context, e.g.:

>>> with zarr.ZipStore('data/array.zip', mode='w') as store:
...     z = zarr.zeros((10, 10), chunks=(5, 5), store=store)
...     z[...] = 42
...     # no need to call store.close()
class zarr._storage.v3.RedisStoreV3(prefix='zarr', dimension_separator: Literal['.', '/'] | None = None, **kwargs)[source]#

Storage class using Redis.

Note

This is an experimental feature.

Requires the redis package to be installed.

Parameters:
prefix : string

Name of prefix for Redis keys.

dimension_separator : {‘.’, ‘/’}, optional

Separator placed between the dimensions of a chunk.

**kwargs

Keyword arguments passed through to the redis.Redis function.

class zarr._storage.v3.MongoDBStoreV3(database='mongodb_zarr', collection='zarr_collection', dimension_separator: Literal['.', '/'] | None = None, **kwargs)[source]#

Storage class using MongoDB.

Note

This is an experimental feature.

Requires the pymongo package to be installed.

Parameters:
database : string

Name of database.

collection : string

Name of collection.

dimension_separator : {‘.’, ‘/’}, optional

Separator placed between the dimensions of a chunk.

**kwargs

Keyword arguments passed through to the pymongo.MongoClient function.

Notes

The maximum chunk size in MongoDB documents is 16 MB.
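
A quick back-of-the-envelope check (assuming the worst case of an uncompressed chunk) shows whether a chunk shape fits under MongoDB's 16 MB document limit:

```python
# MongoDB's BSON document limit, in bytes.
MONGODB_DOC_LIMIT = 16 * 1024 * 1024

def chunk_nbytes(chunk_shape, itemsize):
    # Uncompressed size of one chunk: product of dimensions times the
    # size of one element in bytes.
    n = itemsize
    for dim in chunk_shape:
        n *= dim
    return n

# A (100, 100) float64 chunk is 80 kB, comfortably under the limit;
# a (2048, 2048) float64 chunk is ~32 MB uncompressed, which is too big.
```

Compression usually shrinks chunks well below their uncompressed size, but sizing chunks so that even the uncompressed form fits is the safe choice.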

class zarr._storage.v3.DBMStoreV3(path, flag='c', mode=438, open=None, write_lock=True, dimension_separator: Literal['.', '/'] | None = None, **open_kwargs)[source]#

Storage class using a DBM-style database.

Parameters:
path : string

Location of database file.

flag : string, optional

Flags for opening the database file.

mode : int

File mode used if a new file is created.

open : function, optional

Function to open the database file. If not provided, dbm.open() will be used on Python 3, and anydbm.open() will be used on Python 2.

write_lock : bool, optional

Use a lock to prevent concurrent writes from multiple threads (True by default).

dimension_separator : {‘.’, ‘/’}, optional

Separator placed between the dimensions of a chunk.

**open_kwargs

Keyword arguments to pass to the open function.

Notes

Please note that, by default, this class will use the Python standard library dbm.open function to open the database file (or anydbm.open on Python 2). There are up to three different implementations of DBM-style databases available in any Python installation, and which one is used may vary from one system to another. Database file formats are not compatible between these different implementations. Also, some implementations are more efficient than others. In particular, the “dumb” implementation will be the fall-back on many systems, and has very poor performance for some usage scenarios. If you want to ensure a specific implementation is used, pass the corresponding open function, e.g., dbm.gnu.open to use the GNU DBM library.

Safe to write in multiple threads. May be safe to write in multiple processes, depending on which DBM implementation is being used, although this has not been tested.
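
You can check which DBM implementation your system selects with the standard library's dbm.whichdb, which is useful given that the file formats are not portable between implementations (a small sketch, independent of Zarr):

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.db")
    # dbm.open picks whichever implementation is available on this system.
    with dbm.open(path, "c") as db:
        db[b"key"] = b"value"
    # whichdb inspects the files on disk and reports the module that
    # created them, e.g. 'dbm.gnu', 'dbm.ndbm', or the slow fall-back
    # 'dbm.dumb'.
    print(dbm.whichdb(path))
```

If whichdb reports 'dbm.dumb', consider passing a faster open function (e.g. dbm.gnu.open, where available) to avoid the poor performance noted above.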

Examples

Store a single array:

>>> import zarr
>>> store = zarr.DBMStore('data/array.db')
>>> z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
>>> z[...] = 42
>>> store.close()  # don't forget to call this when you're done

Store a group:

>>> store = zarr.DBMStore('data/group.db')
>>> root = zarr.group(store=store, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.zeros('bar', shape=(10, 10), chunks=(5, 5))
>>> bar[...] = 42
>>> store.close()  # don't forget to call this when you're done

After modifying a DBMStore, the close() method must be called, otherwise essential data may not be written to the underlying database file. The DBMStore class also supports the context manager protocol, which ensures the close() method is called on leaving the context, e.g.:

>>> with zarr.DBMStore('data/array.db') as store:
...     z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
...     z[...] = 42
...     # no need to call store.close()

A different database library can be used by passing a different function to the open parameter. For example, if the bsddb3 package is installed, a Berkeley DB database can be used:

>>> import bsddb3
>>> store = zarr.DBMStore('data/array.bdb', open=bsddb3.btopen)
>>> z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
>>> z[...] = 42
>>> store.close()
class zarr._storage.v3.LMDBStoreV3(path, buffers=True, dimension_separator: Literal['.', '/'] | None = None, **kwargs)[source]#

Storage class using LMDB. Requires the lmdb package to be installed.

Parameters:
path : string

Location of database file.

buffers : bool, optional

If True (default) use support for buffers, which should increase performance by reducing memory copies.

dimension_separator : {‘.’, ‘/’}, optional

Separator placed between the dimensions of a chunk.

**kwargs

Keyword arguments passed through to the lmdb.open function.

Notes

By default writes are not immediately flushed to disk to increase performance. You can ensure data are flushed to disk by calling the flush() or close() methods.

Should be safe to write in multiple threads or processes due to the synchronization support within LMDB, although writing from multiple processes has not been tested.

Examples

Store a single array:

>>> import zarr
>>> store = zarr.LMDBStore('data/array.mdb')
>>> z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
>>> z[...] = 42
>>> store.close()  # don't forget to call this when you're done

Store a group:

>>> store = zarr.LMDBStore('data/group.mdb')
>>> root = zarr.group(store=store, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.zeros('bar', shape=(10, 10), chunks=(5, 5))
>>> bar[...] = 42
>>> store.close()  # don't forget to call this when you're done

After modifying an LMDBStore, the close() method must be called, otherwise essential data may not be written to the underlying database file. The LMDBStore class also supports the context manager protocol, which ensures the close() method is called on leaving the context, e.g.:

>>> with zarr.LMDBStore('data/array.mdb') as store:
...     z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
...     z[...] = 42
...     # no need to call store.close()
class zarr._storage.v3.SQLiteStoreV3(path, dimension_separator: Literal['.', '/'] | None = None, **kwargs)[source]#

Storage class using SQLite.

Parameters:
path : string

Location of database file.

dimension_separator : {‘.’, ‘/’}, optional

Separator placed between the dimensions of a chunk.

**kwargs

Keyword arguments passed through to the sqlite3.connect function.
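
The underlying idea, storing chunk and metadata documents as key/value rows in a single SQLite table, can be sketched with the standard library (this is an illustrative schema, not the real SQLiteStore schema):

```python
import sqlite3

# One table mapping store keys to binary values, roughly the shape of
# what a SQLite-backed key-value store does internally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zarr (k TEXT PRIMARY KEY, v BLOB)")

# REPLACE makes writes idempotent: rewriting a chunk overwrites the row.
conn.execute("REPLACE INTO zarr (k, v) VALUES (?, ?)",
             ("meta/root.group.json", b"{}"))

(value,) = conn.execute("SELECT v FROM zarr WHERE k = ?",
                        ("meta/root.group.json",)).fetchone()
conn.close()
```

Keeping everything in one file (or, as here, in memory) is what makes SQLite attractive for small hierarchies that should travel as a single artifact.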

Examples

Store a single array:

>>> import zarr
>>> store = zarr.SQLiteStore('data/array.sqldb')
>>> z = zarr.zeros((10, 10), chunks=(5, 5), store=store, overwrite=True)
>>> z[...] = 42
>>> store.close()  # don't forget to call this when you're done

Store a group:

>>> store = zarr.SQLiteStore('data/group.sqldb')
>>> root = zarr.group(store=store, overwrite=True)
>>> foo = root.create_group('foo')
>>> bar = foo.zeros('bar', shape=(10, 10), chunks=(5, 5))
>>> bar[...] = 42
>>> store.close()  # don't forget to call this when you're done
class zarr._storage.v3.LRUStoreCacheV3(store, max_size: int)[source]#

Storage class that implements a least-recently-used (LRU) cache layer over some other store. Intended primarily for use with stores that can be slow to access, e.g., remote stores that require network communication to store and retrieve data.

Parameters:
store : Store

The store containing the actual data to be cached.

max_size : int

The maximum size that the cache may grow to, in number of bytes. Provide None if you would like the cache to have unlimited size.
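
The byte-bounded LRU policy can be sketched with an OrderedDict (ByteLRU is a toy illustration of the eviction logic, not the LRUStoreCacheV3 implementation):

```python
from collections import OrderedDict

class ByteLRU:
    """Toy least-recently-used cache bounded by total value size in bytes."""

    def __init__(self, max_size):
        self.max_size = max_size  # None means unbounded
        self._data = OrderedDict()
        self._nbytes = 0

    def put(self, key, value):
        if key in self._data:
            self._nbytes -= len(self._data.pop(key))
        self._data[key] = value
        self._nbytes += len(value)
        # Evict least-recently-used entries until under the byte budget.
        while self.max_size is not None and self._nbytes > self.max_size:
            _, evicted = self._data.popitem(last=False)
            self._nbytes -= len(evicted)

    def get(self, key):
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]
```

Bounding by bytes rather than entry count matters for chunk caches, since chunks can vary widely in compressed size.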

Examples

The example below wraps an S3 store with an LRU cache:

>>> import s3fs
>>> import zarr
>>> s3 = s3fs.S3FileSystem(anon=True, client_kwargs=dict(region_name='eu-west-2'))
>>> store = s3fs.S3Map(root='zarr-demo/store', s3=s3, check=False)
>>> cache = zarr.LRUStoreCache(store, max_size=2**28)
>>> root = zarr.group(store=cache)  
>>> z = root['foo/bar/baz']  
>>> from timeit import timeit
>>> # first data access is relatively slow, retrieved from store
... timeit('print(z[:].tobytes())', number=1, globals=globals())  
b'Hello from the cloud!'
0.1081731989979744
>>> # second data access is faster, uses cache
... timeit('print(z[:].tobytes())', number=1, globals=globals())  
b'Hello from the cloud!'
0.0009490990014455747
class zarr._storage.v3.ConsolidatedMetadataStoreV3(store: BaseStore | MutableMapping, metadata_key='meta/root/consolidated/.zmetadata')[source]#

A layer over other storage, where the metadata has been consolidated into a single key.

The purpose of this class is to be able to get all of the metadata for a given array in a single read operation from the underlying storage. See zarr.convenience.consolidate_metadata() for how to create this single metadata key.

This class loads from the one key, and stores the data in a dict, so that accessing the keys no longer requires operations on the backend store.

This class is read-only: attempts to change the array metadata will fail, but changing the data is possible. If the backend storage is changed directly, the metadata stored here may become obsolete, in which case zarr.convenience.consolidate_metadata() should be called again and the class re-instantiated. The intended use case is write once, read many.
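
The consolidation idea can be sketched with a plain dict standing in for a store: gather every metadata document under one JSON key, so a reader needs only a single fetch (an illustrative sketch with made-up keys and content, not the real consolidation code):

```python
import json

# A plain dict standing in for a v3-style store, with metadata under
# 'meta/' and chunk data under 'data/'.
store = {
    "meta/root.group.json": b'{"attributes": {}}',
    "meta/root/arr.array.json": b'{"shape": [10, 10]}',
    "data/root/arr/c0/0": b"\x00" * 16,
}

# Collect all metadata documents into one JSON payload...
metadata = {k: store[k].decode() for k in store if k.startswith("meta/")}
# ...and store it under a single well-known key.
store["meta/root/consolidated/.zmetadata"] = json.dumps(
    {"metadata": metadata}).encode()

# A reader can now load every metadata document with one read.
loaded = json.loads(store["meta/root/consolidated/.zmetadata"])["metadata"]
```

Chunk data is deliberately left out of the consolidated document; only the (small) metadata benefits from being fetched in one operation.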

Note

This is an experimental feature.

Parameters:
store : Store

The store containing the zarr array.

metadata_key : str

The target in the store where all of the metadata are stored. JSON encoding is assumed.

In v3, storage transformers can be set via zarr.create(…, storage_transformers=[…]). The experimental sharding storage transformer can be tested by setting the environment variable ZARR_V3_SHARDING=1. Data written with this flag enabled should be expected to become stale until ZEP 2 is approved and fully implemented.

class zarr._storage.v3_storage_transformers.ShardingStorageTransformer(_type, chunks_per_shard)[source]#

Implements sharding as a storage transformer, as described in the spec: https://zarr-specs.readthedocs.io/en/latest/extensions/storage-transformers/sharding/v1.0.html https://purl.org/zarr/spec/storage_transformers/sharding/1.0

The abstract base class for storage transformers is

class zarr._storage.store.StorageTransformer(_type)[source]#

Base class for storage transformers. The methods simply pass on the data as-is and should be overridden by subclasses.