Array#

class zarr.core.Array(store: Any, path=None, read_only=False, chunk_store=None, synchronizer=None, cache_metadata=True, cache_attrs=True, partial_decompress=False, write_empty_chunks=True, zarr_version=None, meta_array=None)[source]#

Bases: object

Instantiate an array from an initialized store.

Parameters:

store (MutableMapping): Array store, already initialized.
path (string, optional): Storage path.
read_only (bool, optional): True if array should be protected against modification.
chunk_store (MutableMapping, optional): Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.
synchronizer (object, optional): Array synchronizer.
cache_metadata (bool, optional): If True (default), array configuration metadata will be cached for the lifetime of the object. If False, array metadata will be reloaded prior to all data access and modification operations (may incur overhead depending on storage and data access pattern).
cache_attrs (bool, optional): If True (default), user attributes will be cached for attribute read operations. If False, user attributes are reloaded from the store prior to all attribute read operations.
partial_decompress (bool, optional): If True, and the chunk_store is an FSStore and the compressor is Blosc, chunks are partially read and decompressed where possible when getting data from the array.

New in version 2.7.
write_empty_chunks (bool, optional): If True (default), all chunks will be stored regardless of their contents. If False, each chunk is compared to the array’s fill value prior to storing. If a chunk is uniformly equal to the fill value, then that chunk is not stored, and the store entry for that chunk’s key is deleted. Setting this to False enables sparser storage, as only chunks with non-fill-value data are stored, at the expense of overhead associated with checking the data of each chunk.

New in version 2.11.
meta_array (array-like, optional): An array instance to use for determining arrays to create and return to users. Uses numpy.empty(()) by default.

New in version 2.13.
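
The per-chunk check implied by write_empty_chunks=False can be pictured with plain NumPy. This is only a sketch of the idea, not zarr's implementation: is_empty_chunk is a hypothetical helper, and the NaN handling is an assumption about what "uniformly equal to the fill value" should mean for floating-point data.

```python
import numpy as np

def is_empty_chunk(chunk, fill_value):
    """Return True if every element of `chunk` equals `fill_value` (hypothetical helper)."""
    chunk = np.asarray(chunk)
    # Assumption: NaN never compares equal to itself, so a NaN fill value
    # is matched with isnan instead of ==.
    if isinstance(fill_value, float) and np.isnan(fill_value):
        return bool(np.all(np.isnan(chunk)))
    return bool(np.all(chunk == fill_value))

zeros = np.zeros((3, 3), dtype="i4")
mixed = zeros.copy()
mixed[1, 1] = 7

skip_zeros = is_empty_chunk(zeros, 0)  # uniformly fill value: would not be stored
skip_mixed = is_empty_chunk(mixed, 0)  # contains other data: would be stored
```

The comparison has to visit every element of the chunk, which is the overhead the parameter description refers to.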

Attributes Summary

`attrs`	A MutableMapping containing user-defined attributes.
`basename`	Final component of name.
`blocks`	Shortcut for blocked chunked indexing, see `get_block_selection()` and `set_block_selection()` for documentation and examples.
`cdata_shape`	A tuple of integers describing the number of chunks along each dimension of the array.
`chunk_store`	A MutableMapping providing the underlying storage for array chunks.
`chunks`	A tuple of integers describing the length of each dimension of a chunk of the array.
`compressor`	Primary compression codec.
`dtype`	The NumPy data type.
`fill_value`	A value used for uninitialized portions of the array.
`filters`	One or more codecs used to transform data prior to compression.
`info`	Report some diagnostic information about the array.
`initialized`	The number of chunks that have been initialized with some data.
`is_view`	A boolean, True if this array is a view on another array.
`itemsize`	The size in bytes of each item in the array.
`meta_array`	An array-like instance to use for determining arrays to create and return to users.
`name`	Array name following h5py convention.
`nbytes`	The total number of bytes that would be required to store the array without compression.
`nbytes_stored`	The total number of stored bytes of data for the array.
`nchunks`	Total number of chunks.
`nchunks_initialized`	The number of chunks that have been initialized with some data.
`ndim`	Number of dimensions.
`oindex`	Shortcut for orthogonal (outer) indexing, see `get_orthogonal_selection()` and `set_orthogonal_selection()` for documentation and examples.
`order`	A string indicating the order in which bytes are arranged within chunks of the array.
`path`	Storage path.
`read_only`	A boolean, True if modification operations are not permitted.
`shape`	A tuple of integers describing the length of each dimension of the array.
`size`	The total number of elements in the array.
`store`	A MutableMapping providing the underlying storage for the array.
`synchronizer`	Object used to synchronize write access to the array.
`vindex`	Shortcut for vectorized (inner) indexing, see `get_coordinate_selection()`, `set_coordinate_selection()`, `get_mask_selection()` and `set_mask_selection()` for documentation and examples.
`write_empty_chunks`	A Boolean, True if chunks composed of the array's fill value will be stored.

Methods Summary

`append`(data[, axis])	Append data to axis.
`astype`(dtype)	Returns a view that does on-the-fly type conversion of the underlying data.
`digest`([hashname])	Compute a checksum for the data.
`get_basic_selection`([selection, out, fields])	Retrieve data for an item or region of the array.
`get_block_selection`(selection[, out, fields])	Retrieve a selection of individual chunk blocks, by providing the indices (coordinates) for each chunk block.
`get_coordinate_selection`(selection[, out, ...])	Retrieve a selection of individual items, by providing the indices (coordinates) for each selected item.
`get_mask_selection`(selection[, out, fields])	Retrieve a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.
`get_orthogonal_selection`(selection[, out, ...])	Retrieve data by making a selection for each dimension of the array.
`hexdigest`([hashname])	Compute a checksum for the data.
`info_items`()
`islice`([start, end])	Yield a generator for iterating over all or part of the array.
`resize`(*args)	Change the shape of the array by growing or shrinking one or more dimensions.
`set_basic_selection`(selection, value[, fields])	Modify data for an item or region of the array.
`set_block_selection`(selection, value[, fields])	Modify a selection of individual blocks, by providing the chunk indices (coordinates) for each block to be modified.
`set_coordinate_selection`(selection, value[, ...])	Modify a selection of individual items, by providing the indices (coordinates) for each item to be modified.
`set_mask_selection`(selection, value[, fields])	Modify a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.
`set_orthogonal_selection`(selection, value[, ...])	Modify data via a selection for each dimension of the array.
`view`([shape, chunks, dtype, fill_value, ...])	Return an array sharing the same data.

Attributes Documentation

attrs#: A MutableMapping containing user-defined attributes. Note that attribute values must be JSON serializable.

basename#: Final component of name.

blocks#: Shortcut for blocked chunked indexing, see get_block_selection() and set_block_selection() for documentation and examples.

cdata_shape#: A tuple of integers describing the number of chunks along each dimension of the array.

chunk_store#: A MutableMapping providing the underlying storage for array chunks.

chunks#: A tuple of integers describing the length of each dimension of a chunk of the array.

compressor#: Primary compression codec.

dtype#: The NumPy data type.

fill_value#: A value used for uninitialized portions of the array.

filters#: One or more codecs used to transform data prior to compression.

info#

Report some diagnostic information about the array.

Examples

>>> import zarr
>>> z = zarr.zeros(1000000, chunks=100000, dtype='i4')
>>> z.info
Type               : zarr.core.Array
Data type          : int32
Shape              : (1000000,)
Chunk shape        : (100000,)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.KVStore
No. bytes          : 4000000 (3.8M)
No. bytes stored   : 320
Storage ratio      : 12500.0
Chunks initialized : 0/10

initialized#: The number of chunks that have been initialized with some data.

is_view#: A boolean, True if this array is a view on another array.

itemsize#: The size in bytes of each item in the array.

meta_array#: An array-like instance to use for determining arrays to create and return to users.

name#: Array name following h5py convention.

nbytes#: The total number of bytes that would be required to store the array without compression.

nbytes_stored#: The total number of stored bytes of data for the array. This includes storage required for configuration metadata and user attributes.

nchunks#: Total number of chunks.

nchunks_initialized#: The number of chunks that have been initialized with some data.

ndim#: Number of dimensions.

oindex#: Shortcut for orthogonal (outer) indexing, see get_orthogonal_selection() and set_orthogonal_selection() for documentation and examples.

order#: A string indicating the order in which bytes are arranged within chunks of the array.

path#: Storage path.

read_only#: A boolean, True if modification operations are not permitted.

shape#: A tuple of integers describing the length of each dimension of the array.

size#: The total number of elements in the array.

store#: A MutableMapping providing the underlying storage for the array.

synchronizer#: Object used to synchronize write access to the array.

vindex#: Shortcut for vectorized (inner) indexing, see get_coordinate_selection(), set_coordinate_selection(), get_mask_selection() and set_mask_selection() for documentation and examples.

write_empty_chunks#: A Boolean, True if chunks composed of the array’s fill value will be stored. If False, such chunks will not be stored.
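
Several of the attributes above are related by simple arithmetic. The sketch below recomputes them from shape, chunks and itemsize using the figures from the info example; chunk_grid is a hypothetical helper, and reading the reported storage ratio as nbytes divided by nbytes_stored is an assumption that happens to agree with the numbers shown there.

```python
import math

def chunk_grid(shape, chunks, itemsize):
    """Derive chunk-grid and size figures from shape, chunk shape and item size."""
    # Number of chunks along each dimension; edge chunks may be partial.
    cdata_shape = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
    nchunks = math.prod(cdata_shape)
    size = math.prod(shape)
    nbytes = size * itemsize  # uncompressed size in bytes
    return cdata_shape, nchunks, size, nbytes

# Values from the info example: 1,000,000 int32 items in chunks of 100,000.
cdata_shape, nchunks, size, nbytes = chunk_grid((1000000,), (100000,), 4)

# Assumption: ratio of uncompressed bytes to the 320 stored bytes in that report.
storage_ratio = nbytes / 320
```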

Methods Documentation

append(data, axis=0)[source]#

Append data to axis.

Parameters:

data (array-like): Data to be appended.
axis (int): Axis along which to append.

Returns:

new_shape (tuple)

Notes

The size of all dimensions other than axis must match between this array and data.

Examples

>>> import numpy as np
>>> import zarr
>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z.shape
(10000, 1000)
>>> z.append(a)
(20000, 1000)
>>> z.append(np.vstack([a, a]), axis=1)
(20000, 2000)
>>> z.shape
(20000, 2000)
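
The shape rule from the Notes can be written out in a few lines. appended_shape is a hypothetical helper used only for illustration; it mirrors the rule as stated, not zarr's internal code.

```python
def appended_shape(shape, data_shape, axis=0):
    """Shape after appending `data_shape` to `shape` along `axis` (hypothetical helper)."""
    if len(shape) != len(data_shape):
        raise ValueError("number of dimensions must match")
    for i, (n, m) in enumerate(zip(shape, data_shape)):
        # Every dimension except the append axis has to agree.
        if i != axis and n != m:
            raise ValueError(f"dimension {i} differs: {n} != {m}")
    return tuple(n + m if i == axis else n
                 for i, (n, m) in enumerate(zip(shape, data_shape)))

# Same sequence as the example: append(a), then append(np.vstack([a, a]), axis=1).
step1 = appended_shape((10000, 1000), (10000, 1000))
step2 = appended_shape(step1, (20000, 1000), axis=1)
```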

astype(dtype)[source]#

Returns a view that does on-the-fly type conversion of the underlying data.

Parameters:

dtype (string or dtype): NumPy dtype.

See also

Array.view

Notes

This method returns a new Array object which is a view on the same underlying chunk data. Modifying any data via the view is currently not permitted and will result in an error. This is an experimental feature and its behavior is subject to change in the future.

Examples

>>> import zarr
>>> import numpy as np
>>> data = np.arange(100, dtype=np.uint8)
>>> a = zarr.array(data, chunks=10)
>>> a[:]
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
       16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
       32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
       48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
       64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
       80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
       96, 97, 98, 99], dtype=uint8)
>>> v = a.astype(np.float32)
>>> v.is_view
True
>>> v[:]
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,
        10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,
        20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.,
        30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.,
        40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.,
        50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.,
        60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.,
        70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.,
        80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.,
        90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.],
      dtype=float32)

digest(hashname='sha1')[source]#

Compute a checksum for the data. Default uses sha1 for speed.

Examples

>>> import binascii
>>> import zarr
>>> z = zarr.empty(shape=(10000, 10000), chunks=(1000, 1000))
>>> binascii.hexlify(z.digest())
b'041f90bc7a571452af4f850a8ca2c6cddfa8a1ac'
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> binascii.hexlify(z.digest())
b'7162d416d26a68063b66ed1f30e0a866e4abed60'
>>> z = zarr.zeros(shape=(10000, 10000), dtype="u1", chunks=(1000, 1000))
>>> binascii.hexlify(z.digest())
b'cb387af37410ae5a3222e893cf3373e4e4f22816'

get_basic_selection(selection=Ellipsis, out=None, fields=None)[source]#

Retrieve data for an item or region of the array.

Parameters:

selection (tuple): A tuple specifying the requested item or region for each dimension of the array. May be any combination of int and/or slice for multidimensional arrays.
out (ndarray, optional): If given, load the selected data directly into this array.
fields (str or sequence of str, optional): For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:

out (ndarray): A NumPy array containing the data for the requested region.

See also

set_basic_selection, get_mask_selection, set_mask_selection
get_coordinate_selection, set_coordinate_selection, get_orthogonal_selection
set_orthogonal_selection, get_block_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

Slices with step > 1 are supported, but slices with negative step are not.

Currently this method provides the implementation for accessing data via the square bracket notation (__getitem__). See __getitem__() for examples using the alternative notation.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100))

Retrieve a single item:

>>> z.get_basic_selection(5)
5

Retrieve a region via slicing:

>>> z.get_basic_selection(slice(5))
array([0, 1, 2, 3, 4])
>>> z.get_basic_selection(slice(-5, None))
array([95, 96, 97, 98, 99])
>>> z.get_basic_selection(slice(5, 10))
array([5, 6, 7, 8, 9])
>>> z.get_basic_selection(slice(5, 10, 2))
array([5, 7, 9])
>>> z.get_basic_selection(slice(None, None, 2))
array([  0,  2,  4, ..., 94, 96, 98])

Set up a 2-dimensional array:

>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve an item:

>>> z.get_basic_selection((2, 2))
22

Retrieve a region via slicing:

>>> z.get_basic_selection((slice(1, 3), slice(1, 3)))
array([[11, 12],
       [21, 22]])
>>> z.get_basic_selection((slice(1, 3), slice(None)))
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> z.get_basic_selection((slice(None), slice(1, 3)))
array([[ 1,  2],
       [11, 12],
       [21, 22],
       [31, 32],
       [41, 42],
       [51, 52],
       [61, 62],
       [71, 72],
       [81, 82],
       [91, 92]])
>>> z.get_basic_selection((slice(0, 5, 2), slice(0, 5, 2)))
array([[ 0,  2,  4],
       [20, 22, 24],
       [40, 42, 44]])
>>> z.get_basic_selection((slice(None, None, 2), slice(None, None, 2)))
array([[ 0,  2,  4,  6,  8],
       [20, 22, 24, 26, 28],
       [40, 42, 44, 46, 48],
       [60, 62, 64, 66, 68],
       [80, 82, 84, 86, 88]])

For arrays with a structured dtype, specific fields can be retrieved, e.g.:

>>> a = np.array([(b'aaa', 1, 4.2),
...               (b'bbb', 2, 8.4),
...               (b'ccc', 3, 12.6)],
...              dtype=[('foo', 'S3'), ('bar', 'i4'), ('baz', 'f8')])
>>> z = zarr.array(a)
>>> z.get_basic_selection(slice(2), fields='foo')
array([b'aaa', b'bbb'],
      dtype='|S3')

get_block_selection(selection, out=None, fields=None)[source]#

Retrieve a selection of individual chunk blocks, by providing the indices (coordinates) for each chunk block.

Parameters:

selection (tuple): An integer (coordinate) or slice for each dimension of the array.
out (ndarray, optional): If given, load the selected data directly into this array.
fields (str or sequence of str, optional): For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:

out (ndarray): A NumPy array containing the data for the requested selection.

See also

get_basic_selection, set_basic_selection, get_mask_selection, set_mask_selection
get_orthogonal_selection, set_orthogonal_selection, get_coordinate_selection
set_coordinate_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

Block indexing is a convenience indexing method to work on individual chunks with chunk index slicing. It follows the same concept as Dask’s Array.blocks indexing.

Slices are supported, but only with a step size of one.

A block index or slice may be given for each dimension to index multidimensional arrays. For example, with the 10 x 10 array z chunked as (3, 3) that is set up in the Examples below:

>>> z.blocks[0, 1:3]
array([[ 3,  4,  5,  6,  7,  8],
       [13, 14, 15, 16, 17, 18],
       [23, 24, 25, 26, 27, 28]])

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10), chunks=(3, 3))

Retrieve items by specifying their block coordinates:

>>> z.get_block_selection((1, slice(None)))
array([[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])

Which is equivalent to:

>>> z[3:6, :]
array([[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])

For convenience, the block selection functionality is also available via the blocks property, e.g.:

>>> z.blocks[1]
array([[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])
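
The equivalence between a block selection and an ordinary slice can be made explicit. The helper below is an illustrative sketch, not part of zarr: block_to_slices is a hypothetical name, and negative block indices are deliberately left unhandled.

```python
def block_to_slices(selection, chunks, shape):
    """Translate a block selection into ordinary array slices (illustrative helper)."""
    if not isinstance(selection, tuple):
        selection = (selection,)
    # Unspecified trailing dimensions select every block.
    selection = selection + (slice(None),) * (len(shape) - len(selection))
    out = []
    for sel, chunk_len, dim_len in zip(selection, chunks, shape):
        nblocks = -(-dim_len // chunk_len)  # ceiling division
        if isinstance(sel, int):
            start, stop = sel, sel + 1
        else:
            start, stop, step = sel.indices(nblocks)
            if step != 1:
                raise IndexError("block slices must have a step of one")
        # Clip the stop so a partial edge chunk does not overrun the array.
        out.append(slice(start * chunk_len, min(stop * chunk_len, dim_len)))
    return tuple(out)

# The selections used in this section, for a 10 x 10 array with (3, 3) chunks.
rows_3_to_6 = block_to_slices((1, slice(None)), chunks=(3, 3), shape=(10, 10))
first_row_band = block_to_slices((0, slice(1, 3)), chunks=(3, 3), shape=(10, 10))
single_index = block_to_slices(1, chunks=(3, 3), shape=(10, 10))
```

Here rows_3_to_6 corresponds to z[3:6, :] and first_row_band to z[0:3, 3:9].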

get_coordinate_selection(selection, out=None, fields=None)[source]#

Retrieve a selection of individual items, by providing the indices (coordinates) for each selected item.

Parameters:

selection (tuple): An integer (coordinate) array for each dimension of the array.
out (ndarray, optional): If given, load the selected data directly into this array.
fields (str or sequence of str, optional): For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:

out (ndarray): A NumPy array containing the data for the requested selection.

See also

get_basic_selection, set_basic_selection, get_mask_selection, set_mask_selection
get_orthogonal_selection, set_orthogonal_selection, set_coordinate_selection
get_block_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

Coordinate indexing is also known as point selection, and is a form of vectorized or inner indexing.

Slices are not supported. Coordinate arrays must be provided for all dimensions of the array.

Coordinate arrays may be multidimensional, in which case the output array will also be multidimensional. Coordinate arrays are broadcast against each other before being applied. The shape of the output will be the same as the shape of each coordinate array after broadcasting.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve items by specifying their coordinates:

>>> z.get_coordinate_selection(([1, 4], [1, 4]))
array([11, 44])

For convenience, the coordinate selection functionality is also available via the vindex property, e.g.:

>>> z.vindex[[1, 4], [1, 4]]
array([11, 44])
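
The broadcasting rule from the Notes is easiest to see on an in-memory array. The sketch below uses NumPy integer-array indexing as an analogue; it does not call zarr, and it is assumed here that vindex follows the same pairing and broadcasting behaviour described above.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# One coordinate array per dimension; items are picked pairwise.
points = a[[1, 4], [1, 4]]

# Coordinate arrays are broadcast against each other before being applied.
rows = np.array([[1], [4]])    # shape (2, 1)
cols = np.array([[1, 4, 7]])   # shape (1, 3)
grid = a[rows, cols]           # output takes the broadcast shape (2, 3)
```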

get_mask_selection(selection, out=None, fields=None)[source]#

Retrieve a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.

Parameters:

selection (ndarray, bool): A Boolean array of the same shape as the array against which the selection is being made.
out (ndarray, optional): If given, load the selected data directly into this array.
fields (str or sequence of str, optional): For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:

out (ndarray): A NumPy array containing the data for the requested selection.

See also

get_basic_selection, set_basic_selection, set_mask_selection
get_orthogonal_selection, set_orthogonal_selection, get_coordinate_selection
set_coordinate_selection, get_block_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

Mask indexing is a form of vectorized or inner indexing, and is equivalent to coordinate indexing. Internally the mask array is converted to coordinate arrays by calling np.nonzero.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve items by specifying a mask:

>>> sel = np.zeros_like(z, dtype=bool)
>>> sel[1, 1] = True
>>> sel[4, 4] = True
>>> z.get_mask_selection(sel)
array([11, 44])

For convenience, the mask selection functionality is also available via the vindex property, e.g.:

>>> z.vindex[sel]
array([11, 44])
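
The conversion mentioned in the Notes can be reproduced on a plain NumPy array. This is a sketch of the equivalence only; it does not go through zarr.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)
sel = np.zeros(a.shape, dtype=bool)
sel[1, 1] = True
sel[4, 4] = True

# np.nonzero returns one integer coordinate array per dimension.
rows, cols = np.nonzero(sel)

via_mask = a[sel]            # mask selection
via_coords = a[rows, cols]   # the equivalent coordinate selection
```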

get_orthogonal_selection(selection, out=None, fields=None)[source]#

Retrieve data by making a selection for each dimension of the array. For example, if an array has 2 dimensions, allows selecting specific rows and/or columns. The selection for each dimension can be either an integer (indexing a single item), a slice, an array of integers, or a Boolean array where True values indicate a selection.

Parameters:

selection (tuple): A selection for each dimension of the array. May be any combination of int, slice, integer array or Boolean array.
out (ndarray, optional): If given, load the selected data directly into this array.
fields (str or sequence of str, optional): For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:

out (ndarray): A NumPy array containing the data for the requested selection.

See also

get_basic_selection, set_basic_selection, get_mask_selection, set_mask_selection
get_coordinate_selection, set_coordinate_selection, set_orthogonal_selection
get_block_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

Orthogonal indexing is also known as outer indexing.

Slices with step > 1 are supported, but slices with negative step are not.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve rows and columns via any combination of int, slice, integer array and/or Boolean array:

>>> z.get_orthogonal_selection(([1, 4], slice(None)))
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
>>> z.get_orthogonal_selection((slice(None), [1, 4]))
array([[ 1,  4],
       [11, 14],
       [21, 24],
       [31, 34],
       [41, 44],
       [51, 54],
       [61, 64],
       [71, 74],
       [81, 84],
       [91, 94]])
>>> z.get_orthogonal_selection(([1, 4], [1, 4]))
array([[11, 14],
       [41, 44]])
>>> sel = np.zeros(z.shape[0], dtype=bool)
>>> sel[1] = True
>>> sel[4] = True
>>> z.get_orthogonal_selection((sel, sel))
array([[11, 14],
       [41, 44]])

For convenience, the orthogonal selection functionality is also available via the oindex property, e.g.:

>>> z.oindex[[1, 4], :]
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
>>> z.oindex[:, [1, 4]]
array([[ 1,  4],
       [11, 14],
       [21, 24],
       [31, 34],
       [41, 44],
       [51, 54],
       [61, 64],
       [71, 74],
       [81, 84],
       [91, 94]])
>>> z.oindex[[1, 4], [1, 4]]
array([[11, 14],
       [41, 44]])
>>> sel = np.zeros(z.shape[0], dtype=bool)
>>> sel[1] = True
>>> sel[4] = True
>>> z.oindex[sel, sel]
array([[11, 14],
       [41, 44]])
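
For readers coming from NumPy, the difference between outer (orthogonal) and inner (vectorized) indexing can be shown with np.ix_ on an in-memory array. This is an analogue for comparison and does not use zarr.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# Outer (orthogonal) indexing: every selected row crossed with every selected column.
outer = a[np.ix_([1, 4], [1, 4])]

# Inner (vectorized) indexing with the same lists pairs them up instead.
inner = a[[1, 4], [1, 4]]

# np.ix_ also accepts Boolean arrays, one per dimension.
sel = np.zeros(a.shape[0], dtype=bool)
sel[[1, 4]] = True
outer_bool = a[np.ix_(sel, sel)]
```

The outer result has one row per selected row and one column per selected column, whereas the inner result has one item per coordinate pair.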

hexdigest(hashname='sha1')[source]#

Compute a checksum for the data. Default uses sha1 for speed.

Examples

>>> import zarr
>>> z = zarr.empty(shape=(10000, 10000), chunks=(1000, 1000))
>>> z.hexdigest()
'041f90bc7a571452af4f850a8ca2c6cddfa8a1ac'
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z.hexdigest()
'7162d416d26a68063b66ed1f30e0a866e4abed60'
>>> z = zarr.zeros(shape=(10000, 10000), dtype="u1", chunks=(1000, 1000))
>>> z.hexdigest()
'cb387af37410ae5a3222e893cf3373e4e4f22816'
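
The digest() examples hexlify their result and show the same strings as hexdigest(). The relationship between the two forms is the one hashlib provides, sketched below with stand-in bytes; exactly which bytes zarr feeds into the hash is not shown here and is not assumed.

```python
import binascii
import hashlib

h = hashlib.new("sha1")
h.update(b"chunk-0")  # stand-in bytes, not what zarr actually hashes
h.update(b"chunk-1")

raw = h.digest()        # 20 raw bytes for sha1
as_hex = h.hexdigest()  # the same value as 40 hexadecimal characters
```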

info_items()[source]#

islice(start=None, end=None)[source]#

Yield a generator for iterating over all or part of the array. Uses a cache so that each chunk only has to be decompressed once.

Parameters:

start (int, optional): Start index for the generator to start at. Defaults to 0.
end (int, optional): End index for the generator to stop at. Defaults to self.shape[0].

Yields:

out (generator): A generator that can be used to iterate over the requested region of the array.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100))

Iterate over part of the array:

>>> for value in z.islice(25, 30): value;
25
26
27
28
29
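
The benefit of caching during iteration can be sketched without zarr. In the example below, islice_cached and load_chunk are hypothetical names, and a list slice stands in for chunk decompression; zarr's actual caching strategy may differ.

```python
def islice_cached(load_chunk, chunk_len, start, end):
    """Yield items in [start, end), loading each chunk at most once (illustrative sketch)."""
    cached_index, cached_chunk = None, None
    for i in range(start, end):
        chunk_index, offset = divmod(i, chunk_len)
        if chunk_index != cached_index:
            # Only call the (expensive) loader when crossing a chunk boundary.
            cached_index, cached_chunk = chunk_index, load_chunk(chunk_index)
        yield cached_chunk[offset]

data = list(range(100))
loads = []  # records which chunks were "decompressed"

def load_chunk(index, chunk_len=10):
    loads.append(index)
    return data[index * chunk_len:(index + 1) * chunk_len]

values = list(islice_cached(load_chunk, 10, 25, 35))
```

Ten items spanning two chunks are produced with only two loads, recorded in loads.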

resize(*args)[source]#

Change the shape of the array by growing or shrinking one or more dimensions.

Notes

When resizing an array, the data are not rearranged in any way.

If one or more dimensions are shrunk, any chunks falling entirely outside the new array shape will be deleted from the underlying store. Note, however, that chunks falling partially inside the new array (i.e. boundary chunks) remain intact, so data that lies outside the new shape but inside a boundary chunk will be restored by a subsequent resize operation that grows the array again.

Examples

>>> import zarr
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z.shape
(10000, 10000)
>>> z.resize(20000, 10000)
>>> z.shape
(20000, 10000)
>>> z.resize(30000, 1000)
>>> z.shape
(30000, 1000)
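
Which chunk positions fall outside the array after a shrink follows from the chunk grid. The helper below is a hypothetical illustration of the rule in the Notes; it enumerates grid positions only and does not check whether a chunk was ever written to the store.

```python
import itertools
import math

def chunks_removed_by_resize(old_shape, new_shape, chunks):
    """List chunk-grid indices that lie wholly outside `new_shape` (illustrative helper)."""
    old_grid = [math.ceil(s / c) for s, c in zip(old_shape, chunks)]
    new_grid = [math.ceil(s / c) for s, c in zip(new_shape, chunks)]
    return [
        idx
        for idx in itertools.product(*(range(n) for n in old_grid))
        # A chunk is dropped if it is beyond the new grid along any dimension.
        if any(i >= n for i, n in zip(idx, new_grid))
    ]

# Shrinking 10 -> 7 items with chunks of 3: only the last of 4 chunks is dropped.
# Chunk 2 covers items 6..8, straddles the new boundary and is kept whole.
removed = chunks_removed_by_resize((10,), (7,), (3,))

# Second resize from the example: (20000, 10000) -> (30000, 1000).
removed_2d = chunks_removed_by_resize((20000, 10000), (30000, 1000), (1000, 1000))
```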

set_basic_selection(selection, value, fields=None)[source]#

Modify data for an item or region of the array.

Parameters:

selection (tuple): An integer index or slice or tuple of int/slice specifying the requested region for each dimension of the array.
value (scalar or array-like): Value to be stored into the array.
fields (str or sequence of str, optional): For arrays with a structured dtype, one or more fields can be specified to set data for.

See also

get_basic_selection, get_mask_selection, set_mask_selection
get_coordinate_selection, set_coordinate_selection, get_orthogonal_selection
set_orthogonal_selection, get_block_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

This method provides the underlying implementation for modifying data via square bracket notation, see __setitem__() for equivalent examples using the alternative notation.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros(100, dtype=int)

Set all array elements to the same scalar value:

>>> z.set_basic_selection(..., 42)
>>> z[...]
array([42, 42, 42, ..., 42, 42, 42])

Set a portion of the array:

>>> z.set_basic_selection(slice(10), np.arange(10))
>>> z.set_basic_selection(slice(-10, None), np.arange(10)[::-1])
>>> z[...]
array([ 0, 1, 2, ..., 2, 1, 0])

Set up a 2-dimensional array:

>>> z = zarr.zeros((5, 5), dtype=int)

Set all array elements to the same scalar value:

>>> z.set_basic_selection(..., 42)

Set a portion of the array:

>>> z.set_basic_selection((0, slice(None)), np.arange(z.shape[1]))
>>> z.set_basic_selection((slice(None), 0), np.arange(z.shape[0]))
>>> z[...]
array([[ 0,  1,  2,  3,  4],
       [ 1, 42, 42, 42, 42],
       [ 2, 42, 42, 42, 42],
       [ 3, 42, 42, 42, 42],
       [ 4, 42, 42, 42, 42]])

For arrays with a structured dtype, the fields parameter can be used to set data for a specific field, e.g.:

>>> a = np.array([(b'aaa', 1, 4.2),
...               (b'bbb', 2, 8.4),
...               (b'ccc', 3, 12.6)],
...              dtype=[('foo', 'S3'), ('bar', 'i4'), ('baz', 'f8')])
>>> z = zarr.array(a)
>>> z.set_basic_selection(slice(0, 2), b'zzz', fields='foo')
>>> z[:]
array([(b'zzz', 1,   4.2), (b'zzz', 2,   8.4), (b'ccc', 3,  12.6)],
      dtype=[('foo', 'S3'), ('bar', '<i4'), ('baz', '<f8')])

set_block_selection(selection, value, fields=None)[source]#

Modify a selection of individual blocks, by providing the chunk indices (coordinates) for each block to be modified.

Parameters:

selection (tuple): An integer (coordinate) or slice for each dimension of the array.
value (scalar or array-like): Value to be stored into the array.
fields (str or sequence of str, optional): For arrays with a structured dtype, one or more fields can be specified to set data for.

See also

get_basic_selection, set_basic_selection, get_mask_selection, set_mask_selection
get_orthogonal_selection, set_orthogonal_selection, get_coordinate_selection
get_block_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

Block indexing is a convenience indexing method to work on individual chunks with chunk index slicing. It follows the same concept as Dask’s Array.blocks indexing.

Slices are supported, but only with a step size of one.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((6, 6), dtype=int, chunks=2)

Set data for a selection of items:

>>> z.set_block_selection((1, 0), 1)
>>> z[...]
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

For convenience, this functionality is also available via the blocks property. E.g.:

>>> z.blocks[2, 1] = 4
>>> z[...]
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 4, 4, 0, 0],
       [0, 0, 4, 4, 0, 0]])

>>> z.blocks[:, 2] = 7
>>> z[...]
array([[0, 0, 0, 0, 7, 7],
       [0, 0, 0, 0, 7, 7],
       [1, 1, 0, 0, 7, 7],
       [1, 1, 0, 0, 7, 7],
       [0, 0, 4, 4, 7, 7],
       [0, 0, 4, 4, 7, 7]])

set_coordinate_selection(selection, value, fields=None)[source]#

Modify a selection of individual items, by providing the indices (coordinates) for each item to be modified.

Parameters:

selection (tuple): An integer (coordinate) array for each dimension of the array.
value (scalar or array-like): Value to be stored into the array.
fields (str or sequence of str, optional): For arrays with a structured dtype, one or more fields can be specified to set data for.

See also

get_basic_selection, set_basic_selection, get_mask_selection, set_mask_selection
get_orthogonal_selection, set_orthogonal_selection, get_coordinate_selection
get_block_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

Coordinate indexing is also known as point selection, and is a form of vectorized or inner indexing.

Slices are not supported. Coordinate arrays must be provided for all dimensions of the array.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((5, 5), dtype=int)

Set data for a selection of items:

>>> z.set_coordinate_selection(([1, 4], [1, 4]), 1)
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1]])

For convenience, this functionality is also available via the vindex property. E.g.:

>>> z.vindex[[1, 4], [1, 4]] = 2
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 2]])

set_mask_selection(selection, value, fields=None)[source]#

Modify a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.

Parameters:

selection (ndarray, bool): A Boolean array of the same shape as the array against which the selection is being made.
value (scalar or array-like): Value to be stored into the array.
fields (str or sequence of str, optional): For arrays with a structured dtype, one or more fields can be specified to set data for.

See also

get_basic_selection, set_basic_selection, get_mask_selection
get_orthogonal_selection, set_orthogonal_selection, get_coordinate_selection
set_coordinate_selection, get_block_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

Mask indexing is a form of vectorized or inner indexing, and is equivalent to coordinate indexing. Internally, the mask array is converted to coordinate arrays by calling np.nonzero.
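
The equivalence between a mask and coordinate arrays can be sketched with NumPy alone. This is an illustration of the conversion step named above, not Zarr's internal code; the arrays a and b are plain NumPy stand-ins.

```python
import numpy as np

# Two identical NumPy stand-ins, one written via the mask and one via coordinates.
a = np.zeros((5, 5), dtype=int)
b = np.zeros((5, 5), dtype=int)

# Boolean mask with the same shape as the target array.
mask = np.zeros((5, 5), dtype=bool)
mask[1, 1] = True
mask[4, 4] = True

# np.nonzero returns one integer coordinate array per dimension,
# listing the positions of the True entries in row-major order.
coords = np.nonzero(mask)

a[mask] = 1    # mask selection
b[coords] = 1  # coordinate selection using the converted mask
```

Both assignments touch exactly the same items, which is why the two selection styles can share the vindex property.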

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((5, 5), dtype=int)

Set data for a selection of items:

>>> sel = np.zeros_like(z, dtype=bool)
>>> sel[1, 1] = True
>>> sel[4, 4] = True
>>> z.set_mask_selection(sel, 1)
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1]])

For convenience, this functionality is also available via the vindex property. E.g.:

>>> z.vindex[sel] = 2
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 2]])

set_orthogonal_selection(selection, value, fields=None)[source]#

Modify data via a selection for each dimension of the array.

Parameters:

selectiontuple: A selection for each dimension of the array. May be any combination of int, slice, integer array or Boolean array.
valuescalar or array-like: Value to be stored into the array.
fieldsstr or sequence of str, optional: For arrays with a structured dtype, one or more fields can be specified to set data for.

See also

get_basic_selection, set_basic_selection, get_mask_selection, set_mask_selection
get_coordinate_selection, set_coordinate_selection, get_orthogonal_selection
get_block_selection, set_block_selection
vindex, oindex, blocks, __getitem__, __setitem__

Notes

Orthogonal indexing is also known as outer indexing.

Slices with step > 1 are supported, but slices with negative step are not.
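
The difference between outer (orthogonal) and pointwise indexing can be sketched in NumPy, where np.ix_ builds an outer selection from per-dimension index lists. This is a stand-in under the assumption that the semantics match those described above; the array names are illustrative.

```python
import numpy as np

rows = [1, 4]
cols = [1, 4]

# Outer indexing: every combination of the selected rows and columns,
# i.e. the four items (1, 1), (1, 4), (4, 1) and (4, 4).
outer = np.zeros((5, 5), dtype=int)
outer[np.ix_(rows, cols)] = 3

# Pointwise (coordinate) indexing for comparison: only (1, 1) and (4, 4).
pointwise = np.zeros((5, 5), dtype=int)
pointwise[rows, cols] = 3

# A positive-step selection on one axis combined with an index list on the
# other: rows 0, 2, 4 crossed with columns 0 and 2 (six items in total).
stepped = np.zeros((5, 5), dtype=int)
stepped[np.ix_(np.arange(5)[::2], [0, 2])] = 7
```

With the same index lists, the outer selection covers four items where the pointwise selection covers two.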

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((5, 5), dtype=int)

Set data for a selection of rows:

>>> z.set_orthogonal_selection(([1, 4], slice(None)), 1)
>>> z[...]
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1]])

Set data for a selection of columns:

>>> z.set_orthogonal_selection((slice(None), [1, 4]), 2)
>>> z[...]
array([[0, 2, 0, 0, 2],
       [1, 2, 1, 1, 2],
       [0, 2, 0, 0, 2],
       [0, 2, 0, 0, 2],
       [1, 2, 1, 1, 2]])

Set data for a selection of rows and columns:

>>> z.set_orthogonal_selection(([1, 4], [1, 4]), 3)
>>> z[...]
array([[0, 2, 0, 0, 2],
       [1, 3, 1, 1, 3],
       [0, 2, 0, 0, 2],
       [0, 2, 0, 0, 2],
       [1, 3, 1, 1, 3]])

For convenience, this functionality is also available via the oindex property. E.g.:

>>> z.oindex[[1, 4], [1, 4]] = 4
>>> z[...]
array([[0, 2, 0, 0, 2],
       [1, 4, 1, 1, 4],
       [0, 2, 0, 0, 2],
       [0, 2, 0, 0, 2],
       [1, 4, 1, 1, 4]])

view(shape=None, chunks=None, dtype=None, fill_value=None, filters=None, read_only=None, synchronizer=None)[source]#

Return an array sharing the same data.

Parameters:

shapeint or tuple of ints: Array shape.
chunksint or tuple of ints, optional: Chunk shape.
dtypestring or dtype, optional: NumPy dtype.
fill_valueobject: Default value to use for uninitialized portions of the array.
filterssequence, optional: Sequence of filters to use to encode chunk data prior to compression.
read_onlybool, optional: True if array should be protected against modification.
synchronizerobject, optional: Array synchronizer.

Notes

WARNING: This is an experimental feature and should be used with care. There are plenty of ways to generate errors and/or cause data corruption.

Examples

Bypass filters:

>>> import zarr
>>> import numpy as np
>>> np.random.seed(42)
>>> labels = ['female', 'male']
>>> data = np.random.choice(labels, size=10000)
>>> filters = [zarr.Categorize(labels=labels,
...                            dtype=data.dtype,
...                            astype='u1')]
>>> a = zarr.array(data, chunks=1000, filters=filters)
>>> a[:]
array(['female', 'male', 'female', ..., 'male', 'male', 'female'],
      dtype='<U6')
>>> v = a.view(dtype='u1', filters=[])
>>> v.is_view
True
>>> v[:]
array([1, 2, 1, ..., 2, 2, 1], dtype=uint8)

Views can be used to modify data:

>>> x = v[:]
>>> x.sort()
>>> v[:] = x
>>> v[:]
array([1, 1, 1, ..., 2, 2, 2], dtype=uint8)
>>> a[:]
array(['female', 'female', 'female', ..., 'male', 'male', 'male'],
      dtype='<U6')

View as a different dtype with the same item size:

>>> data = np.random.randint(0, 2, size=10000, dtype='u1')
>>> a = zarr.array(data, chunks=1000)
>>> a[:]
array([0, 0, 1, ..., 1, 0, 0], dtype=uint8)
>>> v = a.view(dtype=bool)
>>> v[:]
array([False, False,  True, ...,  True, False, False])
>>> np.all(a[:].view(dtype=bool) == v[:])
True

An array can be viewed with a dtype that has a different item size; however, the shape and chunk shape must be adjusted so that chunk data is interpreted correctly:

>>> data = np.arange(10000, dtype='u2')
>>> a = zarr.array(data, chunks=1000)
>>> a[:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16)
>>> v = a.view(dtype='u1', shape=20000, chunks=2000)
>>> v[:10]
array([0, 0, 1, 0, 2, 0, 3, 0, 4, 0], dtype=uint8)
>>> np.all(a[:].view('u1') == v[:])
True
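
The shape and chunk values used above follow from keeping the byte counts constant. Here is a minimal sketch of that arithmetic for the 1-dimensional case. The helper scaled_view_geometry is hypothetical and not part of Zarr.

```python
import numpy as np


def scaled_view_geometry(shape, chunks, old_dtype, new_dtype):
    """Hypothetical helper (not a Zarr API): rescale a 1-D shape and chunk
    length so that the total and per-chunk byte counts stay unchanged."""
    old_size = np.dtype(old_dtype).itemsize
    new_size = np.dtype(new_dtype).itemsize
    total_bytes = shape * old_size
    chunk_bytes = chunks * old_size
    # The new item size must divide both byte counts exactly.
    if total_bytes % new_size or chunk_bytes % new_size:
        raise ValueError("byte counts are not a multiple of the new item size")
    return total_bytes // new_size, chunk_bytes // new_size


# 10000 uint16 items in chunks of 1000, reinterpreted as uint8.
new_shape, new_chunks = scaled_view_geometry(10000, 1000, 'u2', 'u1')

# Reinterpreting explicitly little-endian uint16 values as bytes shows
# the interleaved low/high byte pattern.
byte_pattern = np.arange(5, dtype='<u2').view('u1').tolist()
```

Halving the item size doubles both the shape and the chunk length, so each chunk in the view still covers exactly one stored chunk.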

Change fill value for uninitialized chunks:

>>> a = zarr.full(10000, chunks=1000, fill_value=-1, dtype='i1')
>>> a[:]
array([-1, -1, -1, ..., -1, -1, -1], dtype=int8)
>>> v = a.view(fill_value=42)
>>> v[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int8)

Note that resizing or appending to views is not permitted:

>>> a = zarr.empty(10000)
>>> v = a.view()
>>> try:
...     v.resize(20000)
... except PermissionError as e:
...     print(e)
operation not permitted for views