The Array class (zarr.core)
class zarr.core.Array(store, path=None, read_only=False, chunk_store=None, synchronizer=None)
Instantiate an array from an initialized store.
Parameters:
store : MutableMapping
Array store, already initialized.
path : string, optional
Storage path.
read_only : bool, optional
True if array should be protected against modification.
chunk_store : MutableMapping, optional
Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.
synchronizer : object, optional
Array synchronizer.
Attributes
store: A MutableMapping providing the underlying storage for the array.
path: Storage path.
name: Array name following h5py convention.
read_only: A boolean, True if modification operations are not permitted.
chunk_store: A MutableMapping providing the underlying storage for array chunks.
shape: A tuple of integers describing the length of each dimension of the array.
chunks: A tuple of integers describing the length of each dimension of a chunk of the array.
dtype: The NumPy data type.
fill_value: A value used for uninitialized portions of the array.
order: A string indicating the order in which bytes are arranged within chunks of the array.
synchronizer: Object used to synchronize write access to the array.
filters: One or more codecs used to transform data prior to compression.
attrs: A MutableMapping containing user-defined attributes.
size: The total number of elements in the array.
itemsize: The size in bytes of each item in the array.
nbytes: The total number of bytes that would be required to store the array without compression.
nbytes_stored: The total number of stored bytes of data for the array.
initialized: The number of chunks that have been initialized with some data.
cdata_shape: A tuple of integers describing the number of chunks along each dimension of the array.
is_view: A boolean, True if this array is a view on another array.
compression
compression_opts
Methods
__getitem__(item): Retrieve data for some portion of the array.
__setitem__(key, value): Modify data for some portion of the array.
resize(*args): Change the shape of the array by growing or shrinking one or more dimensions.
append(data[, axis]): Append data to axis.
view([shape, chunks, dtype, fill_value, ...]): Return an array sharing the same data.
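Several of the size-related attributes listed above follow from shape, chunks and itemsize by simple arithmetic. The sketch below uses plain Python and does not call zarr; the helper name array_summary is hypothetical and only illustrates the relationships described in the attribute list.

```python
import math


def array_summary(shape, chunks, itemsize):
    """Derive size, nbytes and cdata_shape from shape, chunk shape and item size.

    Hypothetical helper for illustration; not part of zarr.
    """
    size = math.prod(shape)   # total number of elements
    nbytes = size * itemsize  # bytes needed without compression
    # Number of chunks along each dimension. A partial chunk at the edge
    # still counts as one chunk, hence the ceiling division.
    cdata_shape = tuple((s + c - 1) // c for s, c in zip(shape, chunks))
    return size, nbytes, cdata_shape


size, nbytes, cdata_shape = array_summary((10000, 10000), (1000, 1000), itemsize=8)
print(size)         # 100000000
print(nbytes)       # 800000000
print(cdata_shape)  # (10, 10)
```

For a float64 array of shape (10000, 10000), 800000000 bytes is roughly 762.9 MiB, and a (10, 10) chunk grid gives 100 chunks in total.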
__getitem__(item)
Retrieve data for some portion of the array. Most NumPy-style slicing operations are supported.
Returns:
out : ndarray
A NumPy array containing the data for the requested region.
Examples
Set up a 1-dimensional array:
>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100000000), chunks=1000000, dtype='i4')
>>> z
Array((100000000,), int32, chunks=(1000000,), order=C)
  nbytes: 381.5M; nbytes_stored: 6.4M; ratio: 59.9; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
Take some slices:
>>> z[5]
5
>>> z[:5]
array([0, 1, 2, 3, 4], dtype=int32)
>>> z[-5:]
array([99999995, 99999996, 99999997, 99999998, 99999999], dtype=int32)
>>> z[5:10]
array([5, 6, 7, 8, 9], dtype=int32)
>>> z[:]
array([       0,        1,        2, ..., 99999997, 99999998, 99999999], dtype=int32)
Set up a 2-dimensional array:
>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100000000).reshape(10000, 10000),
...                chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 9.2M; ratio: 41.6; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
Take some slices:
>>> z[2, 2]
20002
>>> z[:2, :2]
array([[    0,     1],
       [10000, 10001]], dtype=int32)
>>> z[:2]
array([[    0,     1,     2, ...,  9997,  9998,  9999],
       [10000, 10001, 10002, ..., 19997, 19998, 19999]], dtype=int32)
>>> z[:, :2]
array([[       0,        1],
       [   10000,    10001],
       [   20000,    20001],
       ...,
       [99970000, 99970001],
       [99980000, 99980001],
       [99990000, 99990001]], dtype=int32)
>>> z[:]
array([[       0,        1,        2, ...,     9997,     9998,     9999],
       [   10000,    10001,    10002, ...,    19997,    19998,    19999],
       [   20000,    20001,    20002, ...,    29997,    29998,    29999],
       ...,
       [99970000, 99970001, 99970002, ..., 99979997, 99979998, 99979999],
       [99980000, 99980001, 99980002, ..., 99989997, 99989998, 99989999],
       [99990000, 99990001, 99990002, ..., 99999997, 99999998, 99999999]], dtype=int32)
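The cost of a slice depends on how many chunks it overlaps. The sketch below is plain Python, not zarr's implementation; it assumes a regular chunk grid starting at index 0, a unit step, and indices already resolved to non-negative values. The helper name chunks_touched is hypothetical.

```python
def chunks_touched(start, stop, chunk_len):
    """Return the chunk indices overlapping the half-open interval [start, stop).

    Hypothetical helper for illustration along a single dimension.
    """
    if stop <= start:
        return range(0)                # empty selection touches no chunk
    first = start // chunk_len         # chunk holding the first element
    last = (stop - 1) // chunk_len     # chunk holding the last element
    return range(first, last + 1)


# The 1-dimensional example above: 100000000 items in chunks of 1000000.
print(list(chunks_touched(5, 10, 1000000)))                # [0]
print(list(chunks_touched(99999995, 100000000, 1000000)))  # [99]
print(len(chunks_touched(0, 100000000, 1000000)))          # 100
```

Under these assumptions a small slice such as z[5:10] falls within a single chunk, whereas z[:] spans every chunk of the array.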
__setitem__(key, value)
Modify data for some portion of the array.
Examples
Set up a 1-dimensional array:
>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros(100000000, chunks=1000000, dtype='i4')
>>> z
Array((100000000,), int32, chunks=(1000000,), order=C)
  nbytes: 381.5M; nbytes_stored: 301; ratio: 1328903.7; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
Set all array elements to the same scalar value:
>>> z[:] = 42
>>> z[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int32)
Set a portion of the array:
>>> z[:100] = np.arange(100)
>>> z[-100:] = np.arange(100)[::-1]
>>> z[:]
array([0, 1, 2, ..., 2, 1, 0], dtype=int32)
Set up a 2-dimensional array:
>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
Set all array elements to the same scalar value:
>>> z[:] = 42
>>> z[:]
array([[42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       ...,
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42]], dtype=int32)
Set a portion of the array:
>>> z[0, :] = np.arange(z.shape[1])
>>> z[:, 0] = np.arange(z.shape[0])
>>> z[:]
array([[   0,    1,    2, ..., 9997, 9998, 9999],
       [   1,   42,   42, ...,   42,   42,   42],
       [   2,   42,   42, ...,   42,   42,   42],
       ...,
       [9997,   42,   42, ...,   42,   42,   42],
       [9998,   42,   42, ...,   42,   42,   42],
       [9999,   42,   42, ...,   42,   42,   42]], dtype=int32)
resize(*args)
Change the shape of the array by growing or shrinking one or more dimensions.
Notes
When resizing an array, the data are not rearranged in any way.
If one or more dimensions are shrunk, any chunks falling outside the new array shape will be deleted from the underlying store.
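To make the shrinking rule concrete, the sketch below enumerates the chunk grid positions that lie outside a new shape. It is plain Python, not zarr's implementation, and it assumes that a chunk "falls outside" when its grid index is beyond the new chunk grid in at least one dimension. The helper names are hypothetical.

```python
import itertools


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple((s + c - 1) // c for s, c in zip(shape, chunks))


def chunks_outside(old_shape, new_shape, chunks):
    """Grid indices present under old_shape but outside the grid of new_shape."""
    old_grid = chunk_grid(old_shape, chunks)
    new_grid = chunk_grid(new_shape, chunks)
    return [
        idx
        for idx in itertools.product(*(range(n) for n in old_grid))
        if any(i >= n for i, n in zip(idx, new_grid))
    ]


# Going from (20000, 10000) to (30000, 1000) with (1000, 1000) chunks:
# the first dimension grows, the second shrinks from 10 chunks to 1.
dropped = chunks_outside((20000, 10000), (30000, 1000), (1000, 1000))
print(len(dropped))  # 180
```

Only chunks that were actually initialized exist in the store, so the number of keys really deleted can be smaller than the number of grid positions listed here.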
Examples
>>> import zarr
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 323; ratio: 2476780.2; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.resize(20000, 10000)
>>> z
Array((20000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 1.5G; nbytes_stored: 323; ratio: 4953560.4; initialized: 0/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.resize(30000, 1000)
>>> z
Array((30000, 1000), float64, chunks=(1000, 1000), order=C)
  nbytes: 228.9M; nbytes_stored: 322; ratio: 745341.6; initialized: 0/30
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
append(data, axis=0)
Append data along axis.
Parameters:
data : array_like
Data to be appended.
axis : int, optional
Axis along which to append. Defaults to 0.
Notes
The size of all dimensions other than axis must match between this array and data.
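The shape rule in the note above can be written as a small check. This is a plain-Python sketch, not zarr's code; the helper name appended_shape and the use of ValueError for a mismatch are assumptions made for illustration.

```python
def appended_shape(shape, data_shape, axis=0):
    """Return the array shape after appending data_shape along axis.

    Hypothetical helper; raises ValueError (an assumed error type) when
    any dimension other than axis differs between the two shapes.
    """
    if len(shape) != len(data_shape):
        raise ValueError("array and data must have the same number of dimensions")
    for dim, (n, m) in enumerate(zip(shape, data_shape)):
        if dim != axis and n != m:
            raise ValueError(f"dimension {dim} differs: {n} != {m}")
    # Only the append axis grows; all other dimensions stay as they are.
    return tuple(
        n + m if dim == axis else n
        for dim, (n, m) in enumerate(zip(shape, data_shape))
    )


print(appended_shape((10000, 1000), (10000, 1000)))          # (20000, 1000)
print(appended_shape((20000, 1000), (20000, 1000), axis=1))  # (20000, 2000)
```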
Examples
>>> import numpy as np
>>> import zarr
>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z
Array((10000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 38.1M; nbytes_stored: 1.9M; ratio: 20.3; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(a)
>>> z
Array((20000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 76.3M; nbytes_stored: 3.8M; ratio: 20.3; initialized: 200/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(np.vstack([a, a]), axis=1)
>>> z
Array((20000, 2000), int32, chunks=(1000, 100), order=C)
  nbytes: 152.6M; nbytes_stored: 7.5M; ratio: 20.3; initialized: 400/400
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
view(shape=None, chunks=None, dtype=None, fill_value=None, filters=None, read_only=None, synchronizer=None)
Return an array sharing the same data.
Parameters:
shape : int or tuple of ints, optional
Array shape.
chunks : int or tuple of ints, optional
Chunk shape.
dtype : string or dtype, optional
NumPy dtype.
fill_value : object, optional
Default value to use for uninitialized portions of the array.
filters : sequence, optional
Sequence of filters to use to encode chunk data prior to compression.
read_only : bool, optional
True if array should be protected against modification.
synchronizer : object, optional
Array synchronizer.
Notes
WARNING: This is an experimental feature and should be used with care. There are plenty of ways to generate errors and/or cause data corruption.
Examples
Bypass filters:
>>> import zarr
>>> import numpy as np
>>> np.random.seed(42)
>>> labels = [b'female', b'male']
>>> data = np.random.choice(labels, size=10000)
>>> filters = [zarr.Categorize(labels=labels,
...                            dtype=data.dtype,
...                            astype='u1')]
>>> a = zarr.array(data, chunks=1000, filters=filters)
>>> a[:]
array([b'female', b'male', b'female', ..., b'male', b'male', b'female'],
      dtype='|S6')
>>> v = a.view(dtype='u1', filters=[])
>>> v.is_view
True
>>> v[:]
array([1, 2, 1, ..., 2, 2, 1], dtype=uint8)
Views can be used to modify data:
>>> x = v[:]
>>> x.sort()
>>> v[:] = x
>>> v[:]
array([1, 1, 1, ..., 2, 2, 2], dtype=uint8)
>>> a[:]
array([b'female', b'female', b'female', ..., b'male', b'male', b'male'],
      dtype='|S6')
View as a different dtype with the same itemsize:
>>> data = np.random.randint(0, 2, size=10000, dtype='u1')
>>> a = zarr.array(data, chunks=1000)
>>> a[:]
array([0, 0, 1, ..., 1, 0, 0], dtype=uint8)
>>> v = a.view(dtype=bool)
>>> v[:]
array([False, False,  True, ...,  True, False, False], dtype=bool)
>>> np.all(a[:].view(dtype=bool) == v[:])
True
An array can be viewed with a dtype that has a different itemsize; however, the shape and chunk shape must be adjusted so that chunk data are interpreted correctly:
>>> data = np.arange(10000, dtype='u2')
>>> a = zarr.array(data, chunks=1000)
>>> a[:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16)
>>> v = a.view(dtype='u1', shape=20000, chunks=2000)
>>> v[:10]
array([0, 0, 1, 0, 2, 0, 3, 0, 4, 0], dtype=uint8)
>>> np.all(a[:].view('u1') == v[:])
True
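The adjusted shape and chunk shape in this example follow from keeping the number of bytes constant. The sketch below uses only the standard library; the helper name rescale_for_itemsize is hypothetical, and the byte-level part assumes little-endian storage, which matches the output shown above.

```python
import struct


def rescale_for_itemsize(length, old_itemsize, new_itemsize):
    """Length of a 1-dimensional extent after reinterpreting its bytes.

    Hypothetical helper: the total byte count stays fixed, so the number
    of items scales by old_itemsize / new_itemsize.
    """
    total_bytes = length * old_itemsize
    if total_bytes % new_itemsize:
        raise ValueError("byte length is not a multiple of the new item size")
    return total_bytes // new_itemsize


print(rescale_for_itemsize(10000, 2, 1))  # 20000, the new shape
print(rescale_for_itemsize(1000, 2, 1))   # 2000, the new chunk length

# Byte-level picture: little-endian uint16 values 0..9 read back as bytes.
raw = struct.pack('<10H', *range(10))
print(list(raw[:10]))  # [0, 0, 1, 0, 2, 0, 3, 0, 4, 0]
```

Scaling the chunk length by the same factor as the shape keeps each stored chunk aligned with exactly one chunk of the view.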
Change fill value for uninitialized chunks:
>>> a = zarr.full(10000, chunks=1000, fill_value=-1, dtype='i1')
>>> a[:]
array([-1, -1, -1, ..., -1, -1, -1], dtype=int8)
>>> v = a.view(fill_value=42)
>>> v[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int8)
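One way to picture why a view can report a different fill value is a toy model in which a missing chunk is filled in at read time. The sketch below is not zarr's implementation: the store holds plain lists instead of encoded bytes, and the function name read_chunk and the key '0' are hypothetical.

```python
def read_chunk(store, key, chunk_len, fill_value):
    """Return the stored chunk, or fill_value repeated if the chunk is absent.

    Toy model for illustration only.
    """
    if key in store:
        return list(store[key])
    # Uninitialized chunk: nothing is stored, so the values come from fill_value.
    return [fill_value] * chunk_len


store = {}  # no chunk has been written yet
print(read_chunk(store, '0', 4, fill_value=-1))  # [-1, -1, -1, -1]
print(read_chunk(store, '0', 4, fill_value=42))  # [42, 42, 42, 42]

store['0'] = [7, 7, 7, 7]  # once written, the stored data wins
print(read_chunk(store, '0', 4, fill_value=42))  # [7, 7, 7, 7]
```

In this model, changing the fill value affects only chunks that have never been written, which is consistent with the example above where no chunk of the array is initialized.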
Note that resizing or appending to views is not permitted:
>>> a = zarr.empty(10000)
>>> v = a.view()
>>> try:
...     v.resize(20000)
... except PermissionError as e:
...     print(e)
operation not permitted for views