Array#

class zarr.core.Array(store: Any, path=None, read_only=False, chunk_store=None, synchronizer=None, cache_metadata=True, cache_attrs=True, partial_decompress=False, write_empty_chunks=True, zarr_version=None, meta_array=None)#

Bases: object

Instantiate an array from an initialized store.

Parameters:
store : MutableMapping

Array store, already initialized.

path : string, optional

Storage path.

read_only : bool, optional

True if array should be protected against modification.

chunk_store : MutableMapping, optional

Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.

synchronizer : object, optional

Array synchronizer.

cache_metadata : bool, optional

If True (default), array configuration metadata will be cached for the lifetime of the object. If False, array metadata will be reloaded prior to all data access and modification operations (may incur overhead depending on storage and data access pattern).

cache_attrs : bool, optional

If True (default), user attributes will be cached for attribute read operations. If False, user attributes are reloaded from the store prior to all attribute read operations.

partial_decompress : bool, optional

If True, and the chunk_store is an FSStore and the compression used is Blosc, chunks will be partially read and decompressed where possible when getting data from the array.

New in version 2.7.

write_empty_chunks : bool, optional

If True (default), all chunks will be stored regardless of their contents. If False, each chunk is compared to the array’s fill value prior to storing. If a chunk is uniformly equal to the fill value, then that chunk is not stored, and the store entry for that chunk’s key is deleted. Setting this to False enables sparser storage, as only chunks with non-fill-value data are stored, at the expense of overhead associated with checking the data of each chunk.

New in version 2.11.

meta_array : array-like, optional

An array instance to use for determining arrays to create and return to users. Uses numpy.empty(()) by default.

New in version 2.13.

Attributes Summary

attrs

A MutableMapping containing user-defined attributes.

basename

Final component of name.

blocks

Shortcut for blocked chunked indexing, see get_block_selection() and set_block_selection() for documentation and examples.

cdata_shape

A tuple of integers describing the number of chunks along each dimension of the array.

chunk_store

A MutableMapping providing the underlying storage for array chunks.

chunks

A tuple of integers describing the length of each dimension of a chunk of the array.

compressor

Primary compression codec.

dtype

The NumPy data type.

fill_value

A value used for uninitialized portions of the array.

filters

One or more codecs used to transform data prior to compression.

info

Report some diagnostic information about the array.

initialized

The number of chunks that have been initialized with some data.

is_view

A boolean, True if this array is a view on another array.

itemsize

The size in bytes of each item in the array.

meta_array

An array-like instance to use for determining arrays to create and return to users.

name

Array name following h5py convention.

nbytes

The total number of bytes that would be required to store the array without compression.

nbytes_stored

The total number of stored bytes of data for the array.

nchunks

Total number of chunks.

nchunks_initialized

The number of chunks that have been initialized with some data.

ndim

Number of dimensions.

oindex

Shortcut for orthogonal (outer) indexing, see get_orthogonal_selection() and set_orthogonal_selection() for documentation and examples.

order

A string indicating the order in which bytes are arranged within chunks of the array.

path

Storage path.

read_only

A boolean, True if modification operations are not permitted.

shape

A tuple of integers describing the length of each dimension of the array.

size

The total number of elements in the array.

store

A MutableMapping providing the underlying storage for the array.

synchronizer

Object used to synchronize write access to the array.

vindex

Shortcut for vectorized (inner) indexing, see get_coordinate_selection(), set_coordinate_selection(), get_mask_selection() and set_mask_selection() for documentation and examples.

write_empty_chunks

A Boolean, True if chunks composed of the array's fill value will be stored.

Methods Summary

append(data[, axis])

Append data to axis.

astype(dtype)

Return a view that performs on-the-fly type conversion of the underlying data.

digest([hashname])

Compute a checksum for the data.

get_basic_selection([selection, out, fields])

Retrieve data for an item or region of the array.

get_block_selection(selection[, out, fields])

Retrieve a selection of individual chunk blocks, by providing the indices (coordinates) for each chunk block.

get_coordinate_selection(selection[, out, ...])

Retrieve a selection of individual items, by providing the indices (coordinates) for each selected item.

get_mask_selection(selection[, out, fields])

Retrieve a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.

get_orthogonal_selection(selection[, out, ...])

Retrieve data by making a selection for each dimension of the array.

hexdigest([hashname])

Compute a checksum for the data.

info_items()

islice([start, end])

Yield a generator for iterating over the entire or parts of the array.

resize(*args)

Change the shape of the array by growing or shrinking one or more dimensions.

set_basic_selection(selection, value[, fields])

Modify data for an item or region of the array.

set_block_selection(selection, value[, fields])

Modify a selection of individual blocks, by providing the chunk indices (coordinates) for each block to be modified.

set_coordinate_selection(selection, value[, ...])

Modify a selection of individual items, by providing the indices (coordinates) for each item to be modified.

set_mask_selection(selection, value[, fields])

Modify a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.

set_orthogonal_selection(selection, value[, ...])

Modify data via a selection for each dimension of the array.

view([shape, chunks, dtype, fill_value, ...])

Return an array sharing the same data.

Attributes Documentation

attrs#

A MutableMapping containing user-defined attributes. Note that attribute values must be JSON serializable.
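
Because attribute values must be JSON serializable, it can help to check a value before assigning it. The following is a minimal sketch using only the standard library; `is_json_serializable` is a hypothetical helper and not part of zarr.

```python
import json


def is_json_serializable(value):
    """Return True if `value` can be encoded as JSON (hypothetical helper)."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


# Strings, numbers, lists and dicts with string keys are JSON types.
ok = is_json_serializable({"units": "metres", "scale": [1.0, 2.5]})

# A Python set has no JSON representation, so this check fails.
bad = is_json_serializable({"created": {1, 2, 3}})
```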

basename#

Final component of name.

blocks#

Shortcut for blocked chunked indexing, see get_block_selection() and set_block_selection() for documentation and examples.

cdata_shape#

A tuple of integers describing the number of chunks along each dimension of the array.
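
As an illustration, the chunk grid can be derived from the array shape and the chunk shape by ceiling division. This is a sketch assuming a regular chunk grid in which a partial chunk at the edge still counts as one chunk; `chunk_grid_shape` is a hypothetical helper name.

```python
import math


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension, counting partial edge chunks."""
    # Integer ceiling division: (s + c - 1) // c
    return tuple((s + c - 1) // c for s, c in zip(shape, chunks))


# A 10 x 10 array with (3, 3) chunks needs 4 chunks per dimension,
# the last of which is only partially filled.
grid = chunk_grid_shape((10, 10), (3, 3))

# The total number of chunks is the product of the grid shape.
total_chunks = math.prod(grid)
```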

chunk_store#

A MutableMapping providing the underlying storage for array chunks.

chunks#

A tuple of integers describing the length of each dimension of a chunk of the array.

compressor#

Primary compression codec.

dtype#

The NumPy data type.

fill_value#

A value used for uninitialized portions of the array.

filters#

One or more codecs used to transform data prior to compression.

info#

Report some diagnostic information about the array.

Examples

>>> import zarr
>>> z = zarr.zeros(1000000, chunks=100000, dtype='i4')
>>> z.info
Type               : zarr.core.Array
Data type          : int32
Shape              : (1000000,)
Chunk shape        : (100000,)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : zarr.storage.KVStore
No. bytes          : 4000000 (3.8M)
No. bytes stored   : 320
Storage ratio      : 12500.0
Chunks initialized : 0/10

initialized#

The number of chunks that have been initialized with some data.

is_view#

A boolean, True if this array is a view on another array.

itemsize#

The size in bytes of each item in the array.

meta_array#

An array-like instance to use for determining arrays to create and return to users.

name#

Array name following h5py convention.

nbytes#

The total number of bytes that would be required to store the array without compression.
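
The figures in the info example above can be reproduced by hand: the uncompressed size is the number of items multiplied by the item size. In this sketch `uncompressed_nbytes` is a hypothetical helper, and the value 320 is simply the "No. bytes stored" figure copied from that example.

```python
import math


def uncompressed_nbytes(shape, itemsize):
    """Bytes needed to hold every item of the array without compression."""
    return math.prod(shape) * itemsize


# One million int32 items at 4 bytes each.
nbytes = uncompressed_nbytes((1000000,), 4)

# Ratio of uncompressed size to the stored size reported in the info example.
storage_ratio = nbytes / 320
```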

nbytes_stored#

The total number of stored bytes of data for the array. This includes storage required for configuration metadata and user attributes.

nchunks#

Total number of chunks.

nchunks_initialized#

The number of chunks that have been initialized with some data.

ndim#

Number of dimensions.

oindex#

Shortcut for orthogonal (outer) indexing, see get_orthogonal_selection() and set_orthogonal_selection() for documentation and examples.

order#

A string indicating the order in which bytes are arranged within chunks of the array.

path#

Storage path.

read_only#

A boolean, True if modification operations are not permitted.

shape#

A tuple of integers describing the length of each dimension of the array.

size#

The total number of elements in the array.

store#

A MutableMapping providing the underlying storage for the array.

synchronizer#

Object used to synchronize write access to the array.

vindex#

Shortcut for vectorized (inner) indexing, see get_coordinate_selection(), set_coordinate_selection(), get_mask_selection() and set_mask_selection() for documentation and examples.

write_empty_chunks#

A Boolean, True if chunks composed of the array’s fill value will be stored. If False, such chunks will not be stored.
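
The decision described here can be sketched in a few lines. This is an illustration of the rule only, not zarr's implementation: `should_store_chunk` is a hypothetical helper, the chunk is modelled as a flat list, and special cases such as NaN fill values are ignored.

```python
def should_store_chunk(chunk, fill_value, write_empty_chunks):
    """Decide whether a chunk would be written to the store (illustrative only)."""
    if write_empty_chunks:
        # Every chunk is stored, whatever it contains.
        return True
    # Otherwise the chunk is stored only if some item differs from the fill value.
    return any(item != fill_value for item in chunk)


always = should_store_chunk([0, 0, 0], fill_value=0, write_empty_chunks=True)
skipped = should_store_chunk([0, 0, 0], fill_value=0, write_empty_chunks=False)
kept = should_store_chunk([0, 1, 0], fill_value=0, write_empty_chunks=False)
```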

Methods Documentation

append(data, axis=0)#

Append data to axis.

Parameters:
data : array-like

Data to be appended.

axis : int

Axis along which to append.

Returns:
new_shape : tuple

Notes

The size of all dimensions other than axis must match between this array and data.
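
The shape rule can be checked ahead of time. The sketch below only models the shape arithmetic; `appended_shape` is a hypothetical helper, and the exception it raises is an assumption for illustration rather than a statement about zarr's own error handling.

```python
def appended_shape(shape, data_shape, axis=0):
    """Shape that results from appending `data_shape` to `shape` along `axis`."""
    if len(shape) != len(data_shape):
        raise ValueError("number of dimensions must match")
    for i, (n, m) in enumerate(zip(shape, data_shape)):
        # Every dimension except the append axis must have the same size.
        if i != axis and n != m:
            raise ValueError(f"size mismatch on axis {i}: {n} != {m}")
    return tuple(
        n + m if i == axis else n
        for i, (n, m) in enumerate(zip(shape, data_shape))
    )


# Mirrors the two append calls in the Examples for this method.
step1 = appended_shape((10000, 1000), (10000, 1000))
step2 = appended_shape(step1, (20000, 1000), axis=1)
```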

Examples

>>> import numpy as np
>>> import zarr
>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z.shape
(10000, 1000)
>>> z.append(a)
(20000, 1000)
>>> z.append(np.vstack([a, a]), axis=1)
(20000, 2000)
>>> z.shape
(20000, 2000)
astype(dtype)#

Return a view that performs on-the-fly type conversion of the underlying data.

Parameters:
dtype : string or dtype

NumPy dtype.

See also

Array.view

Notes

This method returns a new Array object which is a view on the same underlying chunk data. Modifying any data via the view is currently not permitted and will result in an error. This is an experimental feature and its behavior is subject to change in the future.

Examples

>>> import zarr
>>> import numpy as np
>>> data = np.arange(100, dtype=np.uint8)
>>> a = zarr.array(data, chunks=10)
>>> a[:]
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
       16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
       32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
       48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
       64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
       80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
       96, 97, 98, 99], dtype=uint8)
>>> v = a.astype(np.float32)
>>> v.is_view
True
>>> v[:]
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,
        10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,
        20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.,
        30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.,
        40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.,
        50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.,
        60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.,
        70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.,
        80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.,
        90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.],
      dtype=float32)
digest(hashname='sha1')#

Compute a checksum for the data. Default uses sha1 for speed.
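
The relationship between a raw digest and its hexadecimal form can be seen with the standard library alone. This sketch assumes that hashname names a hashlib algorithm; the payload is placeholder bytes and does not reproduce which store contents are fed into the hash.

```python
import binascii
import hashlib

# Placeholder bytes standing in for stored data (an assumption for illustration).
payload = b"example chunk bytes"

# Raw bytes, analogous to what digest() returns.
raw = hashlib.new("sha1", payload).digest()

# Hexadecimal string, analogous to what hexdigest() returns.
text = hashlib.new("sha1", payload).hexdigest()

# Hexlifying the raw digest gives the same characters as the hex digest.
hexlified = binascii.hexlify(raw).decode("ascii")
```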

Examples

>>> import binascii
>>> import zarr
>>> z = zarr.empty(shape=(10000, 10000), chunks=(1000, 1000))
>>> binascii.hexlify(z.digest())
b'041f90bc7a571452af4f850a8ca2c6cddfa8a1ac'
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> binascii.hexlify(z.digest())
b'7162d416d26a68063b66ed1f30e0a866e4abed60'
>>> z = zarr.zeros(shape=(10000, 10000), dtype="u1", chunks=(1000, 1000))
>>> binascii.hexlify(z.digest())
b'cb387af37410ae5a3222e893cf3373e4e4f22816'
get_basic_selection(selection=Ellipsis, out=None, fields=None)#

Retrieve data for an item or region of the array.

Parameters:
selection : tuple

A tuple specifying the requested item or region for each dimension of the array. May be any combination of int and/or slice for multidimensional arrays.

out : ndarray, optional

If given, load the selected data directly into this array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:
out : ndarray

A NumPy array containing the data for the requested region.

Notes

Slices with step > 1 are supported, but slices with negative step are not.
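
To see how a slice resolves against a dimension of a given length, Python's built-in `slice.indices` is enough. In this sketch `normalize_basic_slice` is a hypothetical helper, and the exception type it raises for a negative step is an assumption.

```python
def normalize_basic_slice(sel, length):
    """Resolve a slice against a dimension length, rejecting negative steps."""
    start, stop, step = sel.indices(length)
    if step < 1:
        raise IndexError("slices with a negative step are not supported")
    return start, stop, step


# slice(-5, None) on a dimension of length 100 covers items 95 to 99.
tail = normalize_basic_slice(slice(-5, None), 100)

# A step greater than one is accepted.
strided = normalize_basic_slice(slice(5, 10, 2), 100)
picked = list(range(*strided))
```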

Currently this method provides the implementation for accessing data via the square bracket notation (__getitem__). See __getitem__() for examples using the alternative notation.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100))

Retrieve a single item:

>>> z.get_basic_selection(5)
5

Retrieve a region via slicing:

>>> z.get_basic_selection(slice(5))
array([0, 1, 2, 3, 4])
>>> z.get_basic_selection(slice(-5, None))
array([95, 96, 97, 98, 99])
>>> z.get_basic_selection(slice(5, 10))
array([5, 6, 7, 8, 9])
>>> z.get_basic_selection(slice(5, 10, 2))
array([5, 7, 9])
>>> z.get_basic_selection(slice(None, None, 2))
array([  0,  2,  4, ..., 94, 96, 98])

Set up a 2-dimensional array:

>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve an item:

>>> z.get_basic_selection((2, 2))
22

Retrieve a region via slicing:

>>> z.get_basic_selection((slice(1, 3), slice(1, 3)))
array([[11, 12],
       [21, 22]])
>>> z.get_basic_selection((slice(1, 3), slice(None)))
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> z.get_basic_selection((slice(None), slice(1, 3)))
array([[ 1,  2],
       [11, 12],
       [21, 22],
       [31, 32],
       [41, 42],
       [51, 52],
       [61, 62],
       [71, 72],
       [81, 82],
       [91, 92]])
>>> z.get_basic_selection((slice(0, 5, 2), slice(0, 5, 2)))
array([[ 0,  2,  4],
       [20, 22, 24],
       [40, 42, 44]])
>>> z.get_basic_selection((slice(None, None, 2), slice(None, None, 2)))
array([[ 0,  2,  4,  6,  8],
       [20, 22, 24, 26, 28],
       [40, 42, 44, 46, 48],
       [60, 62, 64, 66, 68],
       [80, 82, 84, 86, 88]])

For arrays with a structured dtype, specific fields can be retrieved, e.g.:

>>> a = np.array([(b'aaa', 1, 4.2),
...               (b'bbb', 2, 8.4),
...               (b'ccc', 3, 12.6)],
...              dtype=[('foo', 'S3'), ('bar', 'i4'), ('baz', 'f8')])
>>> z = zarr.array(a)
>>> z.get_basic_selection(slice(2), fields='foo')
array([b'aaa', b'bbb'],
      dtype='|S3')
get_block_selection(selection, out=None, fields=None)#

Retrieve a selection of individual chunk blocks, by providing the indices (coordinates) for each chunk block.

Parameters:
selection : tuple

An integer (coordinate) or slice for each dimension of the array.

out : ndarray, optional

If given, load the selected data directly into this array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:
out : ndarray

A NumPy array containing the data for the requested selection.

Notes

Block indexing is a convenience indexing method for working on individual chunks by slicing chunk indices. It follows the same concept as Dask’s Array.blocks indexing.

Slices are supported, but only with a step size of one.

A block index or slice may be given for each dimension to index multidimensional arrays. For example, using the 10 by 10 array with (3, 3) chunks from the Examples below:

>>> z.blocks[0, 1:3]
array([[ 3,  4,  5,  6,  7,  8],
       [13, 14, 15, 16, 17, 18],
       [23, 24, 25, 26, 27, 28]])

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10), chunks=(3, 3))

Retrieve items by specifying their block coordinates:

>>> z.get_block_selection((1, slice(None)))
array([[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])

Which is equivalent to:

>>> z[3:6, :]
array([[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])
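
The equivalence between a block selection and an ordinary slice can be worked out from the chunk shape. The sketch below assumes non-negative block indices and a regular chunk grid; `block_to_slices` is a hypothetical helper, not a zarr function.

```python
def block_to_slices(block_selection, shape, chunks):
    """Translate per-dimension block indices or slices into array slices."""
    slices = []
    for sel, n, c in zip(block_selection, shape, chunks):
        nblocks = (n + c - 1) // c  # number of chunks along this dimension
        if isinstance(sel, int):
            first, last = sel, sel + 1
        else:
            first, last, step = sel.indices(nblocks)
            if step != 1:
                raise IndexError("only slices with a step of one are supported")
        # Clip the end so that a partial edge chunk does not overrun the array.
        slices.append(slice(first * c, min(last * c, n)))
    return tuple(slices)


# Block row 1 of a 10 x 10 array with (3, 3) chunks is rows 3 to 5.
row_block = block_to_slices((1, slice(None)), (10, 10), (3, 3))

# Block row 0 and block columns 1 to 2 are rows 0 to 2 and columns 3 to 8.
corner = block_to_slices((0, slice(1, 3)), (10, 10), (3, 3))
```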

For convenience, the block selection functionality is also available via the blocks property, e.g.:

>>> z.blocks[1]
array([[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])
get_coordinate_selection(selection, out=None, fields=None)#

Retrieve a selection of individual items, by providing the indices (coordinates) for each selected item.

Parameters:
selection : tuple

An integer (coordinate) array for each dimension of the array.

out : ndarray, optional

If given, load the selected data directly into this array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:
out : ndarray

A NumPy array containing the data for the requested selection.

Notes

Coordinate indexing is also known as point selection, and is a form of vectorized or inner indexing.

Slices are not supported. Coordinate arrays must be provided for all dimensions of the array.

Coordinate arrays may be multidimensional, in which case the output array will also be multidimensional. Coordinate arrays are broadcast against each other before being applied. The shape of the output will be the same as the shape of each coordinate array after broadcasting.
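
The output shape can be predicted from the shapes of the coordinate arrays alone. The sketch below applies the usual NumPy broadcasting rule in plain Python; `broadcast_shape` is a hypothetical helper written for illustration.

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Shape obtained by broadcasting the given shapes against each other."""
    result = []
    # Align shapes on their trailing dimensions; missing leading ones count as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} cannot be broadcast together")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


# Row coordinates of shape (2, 1) with column coordinates of shape (1, 3)
# select a 2 x 3 block of items.
out_shape = broadcast_shape((2, 1), (1, 3))
```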

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve items by specifying their coordinates:

>>> z.get_coordinate_selection(([1, 4], [1, 4]))
array([11, 44])

For convenience, the coordinate selection functionality is also available via the vindex property, e.g.:

>>> z.vindex[[1, 4], [1, 4]]
array([11, 44])
get_mask_selection(selection, out=None, fields=None)#

Retrieve a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.

Parameters:
selection : ndarray, bool

A Boolean array of the same shape as the array against which the selection is being made.

out : ndarray, optional

If given, load the selected data directly into this array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:
out : ndarray

A NumPy array containing the data for the requested selection.

Notes

Mask indexing is a form of vectorized or inner indexing, and is equivalent to coordinate indexing. Internally the mask array is converted to coordinate arrays by calling np.nonzero.
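
The conversion from a mask to coordinates can be pictured with nested lists. This is a plain-Python sketch of what np.nonzero yields for a 2-dimensional mask (indices of True entries in row-major order); `mask_to_coordinates` is a hypothetical helper.

```python
def mask_to_coordinates(mask):
    """Row and column indices of the True entries of a 2-D nested-list mask."""
    rows, cols = [], []
    for i, row in enumerate(mask):
        for j, selected in enumerate(row):
            if selected:
                rows.append(i)
                cols.append(j)
    return rows, cols


# A 10 x 10 grid holding 0..99, and a mask selecting items (1, 1) and (4, 4).
data = [[10 * i + j for j in range(10)] for i in range(10)]
mask = [[(i, j) in {(1, 1), (4, 4)} for j in range(10)] for i in range(10)]

rows, cols = mask_to_coordinates(mask)

# Applying the coordinates pairwise gives the same items as the mask.
selected = [data[i][j] for i, j in zip(rows, cols)]
```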

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve items by specifying a mask:

>>> sel = np.zeros_like(z, dtype=bool)
>>> sel[1, 1] = True
>>> sel[4, 4] = True
>>> z.get_mask_selection(sel)
array([11, 44])

For convenience, the mask selection functionality is also available via the vindex property, e.g.:

>>> z.vindex[sel]
array([11, 44])
get_orthogonal_selection(selection, out=None, fields=None)#

Retrieve data by making a selection for each dimension of the array. For example, if an array has 2 dimensions, allows selecting specific rows and/or columns. The selection for each dimension can be either an integer (indexing a single item), a slice, an array of integers, or a Boolean array where True values indicate a selection.

Parameters:
selection : tuple

A selection for each dimension of the array. May be any combination of int, slice, integer array or Boolean array.

out : ndarray, optional

If given, load the selected data directly into this array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:
out : ndarray

A NumPy array containing the data for the requested selection.

Notes

Orthogonal indexing is also known as outer indexing.

Slices with step > 1 are supported, but slices with negative step are not.
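
The difference between orthogonal (outer) and coordinate (inner) indexing is easiest to see side by side. The sketch below uses nested lists rather than zarr arrays, purely to illustrate the two semantics with the same index lists.

```python
# A 10 x 10 grid holding the values 0..99, as in the Examples.
data = [[10 * i + j for j in range(10)] for i in range(10)]
rows, cols = [1, 4], [1, 4]

# Orthogonal (outer): every selected row is combined with every selected column.
outer = [[data[i][j] for j in cols] for i in rows]

# Coordinate (inner): the index lists are paired element by element.
inner = [data[i][j] for i, j in zip(rows, cols)]
```

With two indices per dimension, the outer form yields a 2 by 2 block while the inner form yields two items.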

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve rows and columns via any combination of int, slice, integer array and/or Boolean array:

>>> z.get_orthogonal_selection(([1, 4], slice(None)))
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
>>> z.get_orthogonal_selection((slice(None), [1, 4]))
array([[ 1,  4],
       [11, 14],
       [21, 24],
       [31, 34],
       [41, 44],
       [51, 54],
       [61, 64],
       [71, 74],
       [81, 84],
       [91, 94]])
>>> z.get_orthogonal_selection(([1, 4], [1, 4]))
array([[11, 14],
       [41, 44]])
>>> sel = np.zeros(z.shape[0], dtype=bool)
>>> sel[1] = True
>>> sel[4] = True
>>> z.get_orthogonal_selection((sel, sel))
array([[11, 14],
       [41, 44]])

For convenience, the orthogonal selection functionality is also available via the oindex property, e.g.:

>>> z.oindex[[1, 4], :]
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
>>> z.oindex[:, [1, 4]]
array([[ 1,  4],
       [11, 14],
       [21, 24],
       [31, 34],
       [41, 44],
       [51, 54],
       [61, 64],
       [71, 74],
       [81, 84],
       [91, 94]])
>>> z.oindex[[1, 4], [1, 4]]
array([[11, 14],
       [41, 44]])
>>> sel = np.zeros(z.shape[0], dtype=bool)
>>> sel[1] = True
>>> sel[4] = True
>>> z.oindex[sel, sel]
array([[11, 14],
       [41, 44]])
hexdigest(hashname='sha1')#

Compute a checksum for the data. Default uses sha1 for speed.

Examples

>>> import zarr
>>> z = zarr.empty(shape=(10000, 10000), chunks=(1000, 1000))
>>> z.hexdigest()
'041f90bc7a571452af4f850a8ca2c6cddfa8a1ac'
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z.hexdigest()
'7162d416d26a68063b66ed1f30e0a866e4abed60'
>>> z = zarr.zeros(shape=(10000, 10000), dtype="u1", chunks=(1000, 1000))
>>> z.hexdigest()
'cb387af37410ae5a3222e893cf3373e4e4f22816'
info_items()#
islice(start=None, end=None)#

Yield a generator for iterating over the entire or parts of the array. Uses a cache so chunks only have to be decompressed once.
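
The caching idea can be sketched for the 1-dimensional case as follows. This is an illustration of the technique, not zarr's code: `iterate_with_chunk_cache` and `load_chunk` are hypothetical names, and `load_chunk` stands in for reading and decompressing one chunk.

```python
def iterate_with_chunk_cache(load_chunk, chunk_len, start, end):
    """Yield items start..end-1, loading each chunk at most once."""
    cached_index, cached_chunk = None, None
    for i in range(start, end):
        chunk_index, offset = divmod(i, chunk_len)
        if chunk_index != cached_index:
            # Only fetch a chunk when iteration moves into a new one.
            cached_index, cached_chunk = chunk_index, load_chunk(chunk_index)
        yield cached_chunk[offset]


loads = []  # records which chunks were fetched


def load_chunk(k):
    """Stand-in loader: chunk k holds the values k*10 .. k*10 + 9."""
    loads.append(k)
    return list(range(k * 10, (k + 1) * 10))


values = list(iterate_with_chunk_cache(load_chunk, 10, 25, 32))
```

Iterating over items 25 to 31 touches two chunks, so the loader is called twice rather than once per item.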

Parameters:
start : int, optional

Start index for the generator to start at. Defaults to 0.

end : int, optional

End index for the generator to stop at. Defaults to self.shape[0].

Yields:
out : generator

A generator that can be used to iterate over the requested region of the array.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100))

Iterate over part of the array:

>>> for value in z.islice(25, 30): print(value)
25
26
27
28
29
resize(*args)#

Change the shape of the array by growing or shrinking one or more dimensions.

Notes

When resizing an array, the data are not rearranged in any way.

If one or more dimensions are shrunk, any chunks falling entirely outside the new array shape will be deleted from the underlying store. Note, however, that chunks falling partially inside the new array (i.e. boundary chunks) remain intact, so data that lies outside the new array but inside a boundary chunk is restored by a subsequent resize operation that grows the array.
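
For a single dimension, the chunks affected by a shrink can be worked out from the lengths involved. The sketch below assumes a regular chunk grid; `classify_chunks_after_shrink` is a hypothetical helper that only models the bookkeeping and does not touch a store.

```python
def classify_chunks_after_shrink(old_len, new_len, chunk_len):
    """Return (deleted, boundary) chunk indices after shrinking one dimension."""
    old_nchunks = (old_len + chunk_len - 1) // chunk_len
    new_nchunks = (new_len + chunk_len - 1) // chunk_len
    # Chunks that lie entirely beyond the new length are removed.
    deleted = list(range(new_nchunks, old_nchunks))
    # The last retained chunk is a boundary chunk if it extends past new_len.
    boundary = [new_nchunks - 1] if new_len % chunk_len else []
    return deleted, boundary


# Shrinking from 100 to 25 items with chunks of 10: chunks 3..9 are deleted,
# and chunk 2 (items 20..29) is kept whole, including items 25..29.
deleted, boundary = classify_chunks_after_shrink(100, 25, 10)

# Shrinking to a multiple of the chunk length leaves no boundary chunk.
deleted_even, boundary_even = classify_chunks_after_shrink(100, 20, 10)
```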

Examples

>>> import zarr
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z.shape
(10000, 10000)
>>> z.resize(20000, 10000)
>>> z.shape
(20000, 10000)
>>> z.resize(30000, 1000)
>>> z.shape
(30000, 1000)
set_basic_selection(selection, value, fields=None)#

Modify data for an item or region of the array.

Parameters:
selection : tuple

An integer index or slice or tuple of int/slice specifying the requested region for each dimension of the array.

value : scalar or array-like

Value to be stored into the array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to set data for.

Notes

This method provides the underlying implementation for modifying data via square bracket notation, see __setitem__() for equivalent examples using the alternative notation.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros(100, dtype=int)

Set all array elements to the same scalar value:

>>> z.set_basic_selection(..., 42)
>>> z[...]
array([42, 42, 42, ..., 42, 42, 42])

Set a portion of the array:

>>> z.set_basic_selection(slice(10), np.arange(10))
>>> z.set_basic_selection(slice(-10, None), np.arange(10)[::-1])
>>> z[...]
array([ 0, 1, 2, ..., 2, 1, 0])

Set up a 2-dimensional array:

>>> z = zarr.zeros((5, 5), dtype=int)

Set all array elements to the same scalar value:

>>> z.set_basic_selection(..., 42)

Set a portion of the array:

>>> z.set_basic_selection((0, slice(None)), np.arange(z.shape[1]))
>>> z.set_basic_selection((slice(None), 0), np.arange(z.shape[0]))
>>> z[...]
array([[ 0,  1,  2,  3,  4],
       [ 1, 42, 42, 42, 42],
       [ 2, 42, 42, 42, 42],
       [ 3, 42, 42, 42, 42],
       [ 4, 42, 42, 42, 42]])

For arrays with a structured dtype, the fields parameter can be used to set data for a specific field, e.g.:

>>> a = np.array([(b'aaa', 1, 4.2),
...               (b'bbb', 2, 8.4),
...               (b'ccc', 3, 12.6)],
...              dtype=[('foo', 'S3'), ('bar', 'i4'), ('baz', 'f8')])
>>> z = zarr.array(a)
>>> z.set_basic_selection(slice(0, 2), b'zzz', fields='foo')
>>> z[:]
array([(b'zzz', 1,   4.2), (b'zzz', 2,   8.4), (b'ccc', 3,  12.6)],
      dtype=[('foo', 'S3'), ('bar', '<i4'), ('baz', '<f8')])
set_block_selection(selection, value, fields=None)#

Modify a selection of individual blocks, by providing the chunk indices (coordinates) for each block to be modified.

Parameters:
selection : tuple

An integer (coordinate) or slice for each dimension of the array.

value : scalar or array-like

Value to be stored into the array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to set data for.

Notes

Block indexing is a convenience indexing method for working on individual chunks by slicing chunk indices. It follows the same concept as Dask’s Array.blocks indexing.

Slices are supported, but only with a step size of one.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((6, 6), dtype=int, chunks=2)

Set data for a selection of items:

>>> z.set_block_selection((1, 0), 1)
>>> z[...]
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

For convenience, this functionality is also available via the blocks property, e.g.:

>>> z.blocks[2, 1] = 4
>>> z[...]
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 4, 4, 0, 0],
       [0, 0, 4, 4, 0, 0]])

>>> z.blocks[:, 2] = 7
>>> z[...]
array([[0, 0, 0, 0, 7, 7],
       [0, 0, 0, 0, 7, 7],
       [1, 1, 0, 0, 7, 7],
       [1, 1, 0, 0, 7, 7],
       [0, 0, 4, 4, 7, 7],
       [0, 0, 4, 4, 7, 7]])
set_coordinate_selection(selection, value, fields=None)#

Modify a selection of individual items, by providing the indices (coordinates) for each item to be modified.

Parameters:
selection : tuple

An integer (coordinate) array for each dimension of the array.

value : scalar or array-like

Value to be stored into the array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to set data for.

Notes

Coordinate indexing is also known as point selection, and is a form of vectorized or inner indexing.

Slices are not supported. Coordinate arrays must be provided for all dimensions of the array.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((5, 5), dtype=int)

Set data for a selection of items:

>>> z.set_coordinate_selection(([1, 4], [1, 4]), 1)
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1]])

For convenience, this functionality is also available via the vindex property, e.g.:

>>> z.vindex[[1, 4], [1, 4]] = 2
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 2]])
set_mask_selection(selection, value, fields=None)#

Modify a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.

Parameters:
selection : ndarray, bool

A Boolean array of the same shape as the array against which the selection is being made.

value : scalar or array-like

Value to be stored into the array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to set data for.

Notes

Mask indexing is a form of vectorized or inner indexing, and is equivalent to coordinate indexing. Internally the mask array is converted to coordinate arrays by calling np.nonzero.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((5, 5), dtype=int)

Set data for a selection of items:

>>> sel = np.zeros_like(z, dtype=bool)
>>> sel[1, 1] = True
>>> sel[4, 4] = True
>>> z.set_mask_selection(sel, 1)
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1]])

For convenience, this functionality is also available via the vindex property, e.g.:

>>> z.vindex[sel] = 2
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 2]])
set_orthogonal_selection(selection, value, fields=None)#

Modify data via a selection for each dimension of the array.

Parameters:
selection : tuple

A selection for each dimension of the array. May be any combination of int, slice, integer array or Boolean array.

value : scalar or array-like

Value to be stored into the array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to set data for.

Notes

Orthogonal indexing is also known as outer indexing.

Slices with step > 1 are supported, but slices with negative step are not.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((5, 5), dtype=int)

Set data for a selection of rows:

>>> z.set_orthogonal_selection(([1, 4], slice(None)), 1)
>>> z[...]
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1]])

Set data for a selection of columns:

>>> z.set_orthogonal_selection((slice(None), [1, 4]), 2)
>>> z[...]
array([[0, 2, 0, 0, 2],
       [1, 2, 1, 1, 2],
       [0, 2, 0, 0, 2],
       [0, 2, 0, 0, 2],
       [1, 2, 1, 1, 2]])

Set data for a selection of rows and columns:

>>> z.set_orthogonal_selection(([1, 4], [1, 4]), 3)
>>> z[...]
array([[0, 2, 0, 0, 2],
       [1, 3, 1, 1, 3],
       [0, 2, 0, 0, 2],
       [0, 2, 0, 0, 2],
       [1, 3, 1, 1, 3]])

For convenience, this functionality is also available via the oindex property, e.g.:

>>> z.oindex[[1, 4], [1, 4]] = 4
>>> z[...]
array([[0, 2, 0, 0, 2],
       [1, 4, 1, 1, 4],
       [0, 2, 0, 0, 2],
       [0, 2, 0, 0, 2],
       [1, 4, 1, 1, 4]])
view(shape=None, chunks=None, dtype=None, fill_value=None, filters=None, read_only=None, synchronizer=None)#

Return an array sharing the same data.

Parameters:
shape : int or tuple of ints

Array shape.

chunks : int or tuple of ints, optional

Chunk shape.

dtype : string or dtype, optional

NumPy dtype.

fill_value : object

Default value to use for uninitialized portions of the array.

filters : sequence, optional

Sequence of filters to use to encode chunk data prior to compression.

read_only : bool, optional

True if array should be protected against modification.

synchronizer : object, optional

Array synchronizer.

Notes

WARNING: This is an experimental feature and should be used with care. There are plenty of ways to generate errors and/or cause data corruption.

Examples

Bypass filters:

>>> import zarr
>>> import numpy as np
>>> np.random.seed(42)
>>> labels = ['female', 'male']
>>> data = np.random.choice(labels, size=10000)
>>> filters = [zarr.Categorize(labels=labels,
...                            dtype=data.dtype,
...                            astype='u1')]
>>> a = zarr.array(data, chunks=1000, filters=filters)
>>> a[:]
array(['female', 'male', 'female', ..., 'male', 'male', 'female'],
      dtype='<U6')
>>> v = a.view(dtype='u1', filters=[])
>>> v.is_view
True
>>> v[:]
array([1, 2, 1, ..., 2, 2, 1], dtype=uint8)

Views can be used to modify data:

>>> x = v[:]
>>> x.sort()
>>> v[:] = x
>>> v[:]
array([1, 1, 1, ..., 2, 2, 2], dtype=uint8)
>>> a[:]
array(['female', 'female', 'female', ..., 'male', 'male', 'male'],
      dtype='<U6')

View as a different dtype with the same item size:

>>> data = np.random.randint(0, 2, size=10000, dtype='u1')
>>> a = zarr.array(data, chunks=1000)
>>> a[:]
array([0, 0, 1, ..., 1, 0, 0], dtype=uint8)
>>> v = a.view(dtype=bool)
>>> v[:]
array([False, False,  True, ...,  True, False, False])
>>> np.all(a[:].view(dtype=bool) == v[:])
True

An array can be viewed with a dtype that has a different item size; however, some care is needed to adjust the shape and chunk shape so that chunk data is interpreted correctly:

>>> data = np.arange(10000, dtype='u2')
>>> a = zarr.array(data, chunks=1000)
>>> a[:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16)
>>> v = a.view(dtype='u1', shape=20000, chunks=2000)
>>> v[:10]
array([0, 0, 1, 0, 2, 0, 3, 0, 4, 0], dtype=uint8)
>>> np.all(a[:].view('u1') == v[:])
True
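
The adjusted shape and chunk shape in the example above follow from keeping the byte count constant. The sketch below shows that arithmetic for the 1-dimensional case and uses the standard struct module to reproduce the little-endian byte pattern; `reinterpreted_length` is a hypothetical helper.

```python
import struct


def reinterpreted_length(length, old_itemsize, new_itemsize):
    """Item count after viewing the same bytes with a different item size."""
    nbytes = length * old_itemsize
    if nbytes % new_itemsize:
        raise ValueError("byte length is not a multiple of the new item size")
    return nbytes // new_itemsize


# Viewing 2-byte items as 1-byte items doubles both the shape and the chunk length.
new_shape = reinterpreted_length(10000, 2, 1)
new_chunks = reinterpreted_length(1000, 2, 1)

# The little-endian uint16 values 0..4, read back one byte at a time.
as_bytes = list(struct.pack("<5H", 0, 1, 2, 3, 4))
```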

Change fill value for uninitialized chunks:

>>> a = zarr.full(10000, chunks=1000, fill_value=-1, dtype='i1')
>>> a[:]
array([-1, -1, -1, ..., -1, -1, -1], dtype=int8)
>>> v = a.view(fill_value=42)
>>> v[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int8)

Note that resizing or appending to views is not permitted:

>>> a = zarr.empty(10000)
>>> v = a.view()
>>> try:
...     v.resize(20000)
... except PermissionError as e:
...     print(e)
operation not permitted for views