The Array class (zarr.core)
class zarr.core.Array(store, path=None, read_only=False, chunk_store=None, synchronizer=None, cache_metadata=True)
Instantiate an array from an initialized store.
Parameters:
store : MutableMapping
Array store, already initialized.
path : string, optional
Storage path.
read_only : bool, optional
True if array should be protected against modification.
chunk_store : MutableMapping, optional
Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.
synchronizer : object, optional
Array synchronizer.
cache_metadata : bool, optional
If True, array configuration metadata will be cached for the lifetime of the object. If False, array metadata will be reloaded prior to all data access and modification operations (may incur overhead depending on storage and data access pattern).
Attributes
store
A MutableMapping providing the underlying storage for the array.
path
Storage path.
name
Array name following h5py convention.
read_only
A boolean, True if modification operations are not permitted.
chunk_store
A MutableMapping providing the underlying storage for array chunks.
shape
A tuple of integers describing the length of each dimension of the array.
chunks
A tuple of integers describing the length of each dimension of a chunk of the array.
dtype
The NumPy data type.
fill_value
A value used for uninitialized portions of the array.
order
A string indicating the order in which bytes are arranged within chunks of the array.
synchronizer
Object used to synchronize write access to the array.
filters
One or more codecs used to transform data prior to compression.
attrs
A MutableMapping containing user-defined attributes.
size
The total number of elements in the array.
itemsize
The size in bytes of each item in the array.
nbytes
The total number of bytes that would be required to store the array without compression.
nbytes_stored
The total number of stored bytes of data for the array.
cdata_shape
A tuple of integers describing the number of chunks along each dimension of the array.
nchunks
Total number of chunks.
nchunks_initialized
The number of chunks that have been initialized with some data.
is_view
A boolean, True if this array is a view on another array.
compression
compression_opts
Methods
__getitem__(item)
Retrieve data for some portion of the array.
__setitem__(key, value)
Modify data for some portion of the array.
resize(*args)
Change the shape of the array by growing or shrinking one or more dimensions.
append(data[, axis])
Append data to axis.
view([shape, chunks, dtype, fill_value, ...])
Return an array sharing the same data.
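Several of the attributes listed above (size, nbytes, cdata_shape, nchunks) follow arithmetically from shape, chunks and itemsize. The sketch below shows that arithmetic in plain Python. The helper `derived_attributes` is hypothetical and is not part of zarr; it only mirrors the attribute descriptions given here.

```python
import math


def derived_attributes(shape, chunks, itemsize):
    """Compute size, nbytes, cdata_shape and nchunks from the basic geometry.

    Hypothetical helper for illustration only; not a zarr function.
    """
    size = math.prod(shape)                  # total number of elements
    nbytes = size * itemsize                 # uncompressed size in bytes
    # Number of chunks along each dimension, rounding up for partial chunks.
    cdata_shape = tuple(-(-s // c) for s, c in zip(shape, chunks))
    nchunks = math.prod(cdata_shape)         # total number of chunks
    return {
        "size": size,
        "nbytes": nbytes,
        "cdata_shape": cdata_shape,
        "nchunks": nchunks,
    }


# A 10000 x 10000 array of 4-byte integers stored in 1000 x 1000 chunks.
attrs = derived_attributes((10000, 10000), (1000, 1000), itemsize=4)
print(attrs)
```

With these inputs nbytes is 400000000, which is roughly 381.5 when divided by 2**20, and nchunks is 100.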
__getitem__(item)
Retrieve data for some portion of the array. Most NumPy-style slicing operations are supported.
Returns:
out : ndarray
A NumPy array containing the data for the requested region.
Examples
Set up a 1-dimensional array:
>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100000000), chunks=1000000, dtype='i4')
>>> z
Array((100000000,), int32, chunks=(1000000,), order=C)
  nbytes: 381.5M; nbytes_stored: 6.4M; ratio: 59.9; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
Take some slices:
>>> z[5]
5
>>> z[:5]
array([0, 1, 2, 3, 4], dtype=int32)
>>> z[-5:]
array([99999995, 99999996, 99999997, 99999998, 99999999], dtype=int32)
>>> z[5:10]
array([5, 6, 7, 8, 9], dtype=int32)
>>> z[:]
array([       0,        1,        2, ..., 99999997, 99999998, 99999999], dtype=int32)
Set up a 2-dimensional array:
>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100000000).reshape(10000, 10000),
...                chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 9.2M; ratio: 41.6; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
Take some slices:
>>> z[2, 2]
20002
>>> z[:2, :2]
array([[    0,     1],
       [10000, 10001]], dtype=int32)
>>> z[:2]
array([[    0,     1,     2, ...,  9997,  9998,  9999],
       [10000, 10001, 10002, ..., 19997, 19998, 19999]], dtype=int32)
>>> z[:, :2]
array([[       0,        1],
       [   10000,    10001],
       [   20000,    20001],
       ...,
       [99970000, 99970001],
       [99980000, 99980001],
       [99990000, 99990001]], dtype=int32)
>>> z[:]
array([[       0,        1,        2, ...,     9997,     9998,     9999],
       [   10000,    10001,    10002, ...,    19997,    19998,    19999],
       [   20000,    20001,    20002, ...,    29997,    29998,    29999],
       ...,
       [99970000, 99970001, 99970002, ..., 99979997, 99979998, 99979999],
       [99980000, 99980001, 99980002, ..., 99989997, 99989998, 99989999],
       [99990000, 99990001, 99990002, ..., 99999997, 99999998, 99999999]], dtype=int32)
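Because the array is stored in chunks, a slice request has to be split into the pieces that fall inside each chunk. The following is a minimal one-dimensional sketch of that bookkeeping, not zarr's actual implementation. The function `chunk_selections` is a hypothetical name introduced for illustration, and the sketch assumes a step of 1.

```python
def chunk_selections(start, stop, chunk_len):
    """Split the half-open range [start, stop) into per-chunk pieces.

    Returns a list of (chunk_index, local_slice) pairs, where local_slice
    is relative to the start of that chunk. Hypothetical helper, not zarr API.
    """
    pieces = []
    pos = start
    while pos < stop:
        chunk_index = pos // chunk_len
        chunk_start = chunk_index * chunk_len
        # Stop at the end of this chunk or the end of the request.
        piece_stop = min(stop, chunk_start + chunk_len)
        pieces.append(
            (chunk_index, slice(pos - chunk_start, piece_stop - chunk_start))
        )
        pos = piece_stop
    return pieces


# z[5:10] on an array with chunks of 1000000 items touches only chunk 0.
print(chunk_selections(5, 10, 1000000))

# A request that straddles a chunk boundary touches two chunks.
print(chunk_selections(999998, 1000003, 1000000))

# Negative indices such as z[-5:] can be normalized first with slice.indices.
start, stop, _ = slice(-5, None).indices(100000000)
print(chunk_selections(start, stop, 1000000))
```

Under these assumptions, a small slice such as z[5:10] only needs one chunk to be read and decompressed, whereas z[:] needs all of them.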
__setitem__(key, value)
Modify data for some portion of the array.
Examples
Set up a 1-dimensional array:
>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros(100000000, chunks=1000000, dtype='i4')
>>> z
Array((100000000,), int32, chunks=(1000000,), order=C)
  nbytes: 381.5M; nbytes_stored: 301; ratio: 1328903.7; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
Set all array elements to the same scalar value:
>>> z[:] = 42
>>> z[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int32)
Set a portion of the array:
>>> z[:100] = np.arange(100)
>>> z[-100:] = np.arange(100)[::-1]
>>> z[:]
array([0, 1, 2, ..., 2, 1, 0], dtype=int32)
Set up a 2-dimensional array:
>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
Set all array elements to the same scalar value:
>>> z[:] = 42
>>> z[:]
array([[42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       ...,
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42]], dtype=int32)
Set a portion of the array:
>>> z[0, :] = np.arange(z.shape[1])
>>> z[:, 0] = np.arange(z.shape[0])
>>> z[:]
array([[   0,    1,    2, ..., 9997, 9998, 9999],
       [   1,   42,   42, ...,   42,   42,   42],
       [   2,   42,   42, ...,   42,   42,   42],
       ...,
       [9997,   42,   42, ...,   42,   42,   42],
       [9998,   42,   42, ...,   42,   42,   42],
       [9999,   42,   42, ...,   42,   42,   42]], dtype=int32)
resize(*args)
Change the shape of the array by growing or shrinking one or more dimensions.
Notes
When resizing an array, the data are not rearranged in any way.
If one or more dimensions are shrunk, any chunks falling outside the new array shape will be deleted from the underlying store.
Examples
>>> import zarr
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 323; ratio: 2476780.2; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.resize(20000, 10000)
>>> z
Array((20000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 1.5G; nbytes_stored: 323; ratio: 4953560.4; initialized: 0/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.resize(30000, 1000)
>>> z
Array((30000, 1000), float64, chunks=(1000, 1000), order=C)
  nbytes: 228.9M; nbytes_stored: 322; ratio: 745341.6; initialized: 0/30
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
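The note about chunks falling outside the new shape can be made concrete with a little chunk-grid arithmetic. The sketch below is an illustration under stated assumptions, not zarr's code. Both helper functions are hypothetical, and the sketch only identifies chunk grid positions that no longer overlap the new shape.

```python
from itertools import product


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (rounding up). Hypothetical helper."""
    return tuple(-(-s // c) for s, c in zip(shape, chunks))


def chunks_outside_new_shape(old_shape, new_shape, chunks):
    """List chunk grid positions that exist for old_shape but not for new_shape."""
    old_grid = chunk_grid(old_shape, chunks)
    new_grid = chunk_grid(new_shape, chunks)
    return [
        idx
        for idx in product(*(range(n) for n in old_grid))
        # A chunk is outside if its index exceeds the new grid in any dimension.
        if any(i >= n for i, n in zip(idx, new_grid))
    ]


# Shrinking the second dimension from 10000 to 1000 while growing the first.
outside = chunks_outside_new_shape((20000, 10000), (30000, 1000), (1000, 1000))
print(len(outside))                                # 180
print(chunk_grid((30000, 1000), (1000, 1000)))     # (30, 1)
```

The new grid of 30 x 1 chunks agrees with the "initialized: 0/30" figure in the example above. The chunks that remain inside the new shape keep their positions, which is why the data are not rearranged.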
append(data, axis=0)
Append data to axis.
Parameters:
data : array_like
Data to be appended.
axis : int
Axis along which to append.
Returns:
new_shape : tuple
Notes
The size of all dimensions other than axis must match between this array and data.
Examples
>>> import numpy as np
>>> import zarr
>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z
Array((10000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 38.1M; nbytes_stored: 1.9M; ratio: 20.3; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(a)
(20000, 1000)
>>> z
Array((20000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 76.3M; nbytes_stored: 3.8M; ratio: 20.3; initialized: 200/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(np.vstack([a, a]), axis=1)
(20000, 2000)
>>> z
Array((20000, 2000), int32, chunks=(1000, 100), order=C)
  nbytes: 152.6M; nbytes_stored: 7.5M; ratio: 20.3; initialized: 400/400
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
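The shape rule from the Notes (all dimensions other than axis must match) and the returned new_shape can be expressed as a short check. This is a sketch only. The function `appended_shape` is a hypothetical helper, and the choice of ValueError is an assumption; the source does not say which exception zarr raises on a mismatch.

```python
def appended_shape(array_shape, data_shape, axis=0):
    """Return the shape that results from appending data along axis.

    Hypothetical helper illustrating the shape rule; not a zarr function.
    """
    if len(array_shape) != len(data_shape):
        raise ValueError("array and data must have the same number of dimensions")
    for dim, (a, d) in enumerate(zip(array_shape, data_shape)):
        # Every dimension except the append axis has to agree.
        if dim != axis and a != d:
            raise ValueError(f"dimension {dim} differs: {a} != {d}")
    return tuple(
        a + d if dim == axis else a
        for dim, (a, d) in enumerate(zip(array_shape, data_shape))
    )


# Mirrors the two appends in the example above.
print(appended_shape((10000, 1000), (10000, 1000), axis=0))   # (20000, 1000)
print(appended_shape((20000, 1000), (20000, 1000), axis=1))   # (20000, 2000)
```

In the second call the data shape is (20000, 1000), which is why the example stacks `a` twice with np.vstack before appending along axis 1.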
view(shape=None, chunks=None, dtype=None, fill_value=None, filters=None, read_only=None, synchronizer=None)
Return an array sharing the same data.
Parameters:
shape : int or tuple of ints
Array shape.
chunks : int or tuple of ints, optional
Chunk shape.
dtype : string or dtype, optional
NumPy dtype.
fill_value : object
Default value to use for uninitialized portions of the array.
filters : sequence, optional
Sequence of filters to use to encode chunk data prior to compression.
read_only : bool, optional
True if array should be protected against modification.
synchronizer : object, optional
Array synchronizer.
Notes
WARNING: This is an experimental feature and should be used with care. There are plenty of ways to generate errors and/or cause data corruption.
Examples
Bypass filters:
>>> import zarr
>>> import numpy as np
>>> np.random.seed(42)
>>> labels = [b'female', b'male']
>>> data = np.random.choice(labels, size=10000)
>>> filters = [zarr.Categorize(labels=labels,
...                            dtype=data.dtype,
...                            astype='u1')]
>>> a = zarr.array(data, chunks=1000, filters=filters)
>>> a[:]
array([b'female', b'male', b'female', ..., b'male', b'male', b'female'],
      dtype='|S6')
>>> v = a.view(dtype='u1', filters=[])
>>> v.is_view
True
>>> v[:]
array([1, 2, 1, ..., 2, 2, 1], dtype=uint8)
Views can be used to modify data:
>>> x = v[:]
>>> x.sort()
>>> v[:] = x
>>> v[:]
array([1, 1, 1, ..., 2, 2, 2], dtype=uint8)
>>> a[:]
array([b'female', b'female', b'female', ..., b'male', b'male', b'male'],
      dtype='|S6')
View as a different dtype with the same itemsize:
>>> data = np.random.randint(0, 2, size=10000, dtype='u1')
>>> a = zarr.array(data, chunks=1000)
>>> a[:]
array([0, 0, 1, ..., 1, 0, 0], dtype=uint8)
>>> v = a.view(dtype=bool)
>>> v[:]
array([False, False,  True, ...,  True, False, False], dtype=bool)
>>> np.all(a[:].view(dtype=bool) == v[:])
True
An array can be viewed with a dtype that has a different itemsize; however, the shape and chunk shape must be adjusted so that chunk data is interpreted correctly:
>>> data = np.arange(10000, dtype='u2')
>>> a = zarr.array(data, chunks=1000)
>>> a[:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16)
>>> v = a.view(dtype='u1', shape=20000, chunks=2000)
>>> v[:10]
array([0, 0, 1, 0, 2, 0, 3, 0, 4, 0], dtype=uint8)
>>> np.all(a[:].view('u1') == v[:])
True
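The shape and chunk adjustment in the example above comes down to keeping the byte count of the array and of each chunk unchanged. The sketch below works through that arithmetic for the one-dimensional case using only the standard library. The helper `rescale_length` is hypothetical, and the byte-level demonstration assumes little-endian storage, which is why the format string specifies `<` explicitly.

```python
import struct


def rescale_length(length, old_itemsize, new_itemsize):
    """Length of a 1-D array (or chunk) after reinterpreting its bytes.

    Hypothetical helper; the total number of bytes must stay the same.
    """
    nbytes = length * old_itemsize
    if nbytes % new_itemsize:
        raise ValueError("byte count is not a multiple of the new itemsize")
    return nbytes // new_itemsize


# Viewing 2-byte items as 1-byte items doubles both the shape and the chunk length.
new_shape = rescale_length(10000, old_itemsize=2, new_itemsize=1)
new_chunks = rescale_length(1000, old_itemsize=2, new_itemsize=1)
print(new_shape, new_chunks)

# The same reinterpretation at byte level: ten little-endian uint16 values
# read back as individual bytes.
raw = struct.pack('<10H', *range(10))
as_bytes = list(raw)
print(as_bytes[:10])
```

The first ten bytes interleave each low byte with a zero high byte, the same pattern as `v[:10]` in the example. If the chunk length were left at 1000 while the dtype changed to 'u1', each stored chunk would hold twice as many items as the view expects, which is the kind of mismatch the warning in the Notes refers to.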
Change fill value for uninitialized chunks:
>>> a = zarr.full(10000, chunks=1000, fill_value=-1, dtype='i1')
>>> a[:]
array([-1, -1, -1, ..., -1, -1, -1], dtype=int8)
>>> v = a.view(fill_value=42)
>>> v[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int8)
Note that resizing or appending to views is not permitted:
>>> a = zarr.empty(10000)
>>> v = a.view()
>>> try:
...     v.resize(20000)
... except PermissionError as e:
...     print(e)
not permitted for views