The Array class (zarr.core)

class zarr.core.Array(store, path=None, read_only=False, chunk_store=None, synchronizer=None, cache_metadata=True, cache_attrs=True)

Instantiate an array from an initialized store.

Parameters:
store : MutableMapping

Array store, already initialized.

path : string, optional

Storage path.

read_only : bool, optional

True if array should be protected against modification.

chunk_store : MutableMapping, optional

Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.

synchronizer : object, optional

Array synchronizer.

cache_metadata : bool, optional

If True (default), array configuration metadata will be cached for the lifetime of the object. If False, array metadata will be reloaded prior to all data access and modification operations (may incur overhead depending on storage and data access pattern).

cache_attrs : bool, optional

If True (default), user attributes will be cached for attribute read operations. If False, user attributes are reloaded from the store prior to all attribute read operations.
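The trade-off behind cache_metadata can be illustrated with a small stand-in. The sketch below is not zarr code: MetadataReader is a hypothetical helper, the store is a plain dict, and the metadata document is reduced to a single "shape" entry stored under the ".zarray" key as JSON. It only shows why a cached reader can go stale when something else rewrites the store, while a non-cached reader pays for a reload on every access.

```python
import json


class MetadataReader:
    """Hypothetical helper (not part of zarr) that reads metadata from a dict-like store."""

    def __init__(self, store, cache=True):
        self.store = store
        self.cache = cache
        # With caching on, the metadata is read once, at construction time.
        self._meta = self._load() if cache else None

    def _load(self):
        # Assumption: metadata is a JSON document stored under the ".zarray" key.
        return json.loads(self.store[".zarray"])

    @property
    def shape(self):
        # Without caching, every access goes back to the store.
        meta = self._meta if self.cache else self._load()
        return tuple(meta["shape"])


store = {".zarray": json.dumps({"shape": [100]})}
cached = MetadataReader(store, cache=True)
fresh = MetadataReader(store, cache=False)

# Something else rewrites the stored metadata.
store[".zarray"] = json.dumps({"shape": [200]})

print(cached.shape)  # (100,)  stale, but no extra store access
print(fresh.shape)   # (200,)  current, at the cost of a reload
```

Disabling the cache is mainly of interest when several processes or objects may modify the same array metadata.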

Attributes:
store

A MutableMapping providing the underlying storage for the array.

path

Storage path.

name

Array name following h5py convention.

read_only

A boolean, True if modification operations are not permitted.

chunk_store

A MutableMapping providing the underlying storage for array chunks.

shape

A tuple of integers describing the length of each dimension of the array.

chunks

A tuple of integers describing the length of each dimension of a chunk of the array.

dtype

The NumPy data type.

compression
compression_opts
fill_value

A value used for uninitialized portions of the array.

order

A string indicating the order in which bytes are arranged within chunks of the array.

synchronizer

Object used to synchronize write access to the array.

filters

One or more codecs used to transform data prior to compression.

attrs

A MutableMapping containing user-defined attributes.

size

The total number of elements in the array.

itemsize

The size in bytes of each item in the array.

nbytes

The total number of bytes that would be required to store the array without compression.

nbytes_stored

The total number of stored bytes of data for the array.

cdata_shape

A tuple of integers describing the number of chunks along each dimension of the array.

nchunks

Total number of chunks.

nchunks_initialized

The number of chunks that have been initialized with some data.

is_view

A boolean, True if this array is a view on another array.

info

Report some diagnostic information about the array.

vindex

Shortcut for vectorized (inner) indexing; see get_coordinate_selection(), set_coordinate_selection(), get_mask_selection() and set_mask_selection() for documentation and examples.

oindex

Shortcut for orthogonal (outer) indexing; see get_orthogonal_selection() and set_orthogonal_selection() for documentation and examples.
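Several of the attributes above follow arithmetically from shape, chunks and itemsize. The sketch below recomputes them in plain Python for illustrative values (the shape, chunk shape and 8-byte item size are assumptions, not taken from a real array), which may help when estimating memory use or chunk counts before creating an array.

```python
import math

# Illustrative values only.
shape = (10000, 10000)
chunks = (1000, 1000)
itemsize = 8  # bytes per item, e.g. a 64-bit float

# size: total number of elements in the array.
size = math.prod(shape)

# nbytes: bytes required to store the array without compression.
nbytes = size * itemsize

# cdata_shape: number of chunks along each dimension (ceiling division,
# so a partial chunk at the edge still counts as one chunk).
cdata_shape = tuple((s + c - 1) // c for s, c in zip(shape, chunks))

# nchunks: total number of chunks in the chunk grid.
nchunks = math.prod(cdata_shape)

print(size)         # 100000000
print(nbytes)       # 800000000
print(cdata_shape)  # (10, 10)
print(nchunks)      # 100
```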

Methods

__getitem__(selection) Retrieve data for an item or region of the array.
__setitem__(selection, value) Modify data for an item or region of the array.
get_basic_selection([selection, out, fields]) Retrieve data for an item or region of the array.
set_basic_selection(selection, value[, fields]) Modify data for an item or region of the array.
get_orthogonal_selection(selection[, out, …]) Retrieve data by making a selection for each dimension of the array.
set_orthogonal_selection(selection, value[, …]) Modify data via a selection for each dimension of the array.
get_mask_selection(selection[, out, fields]) Retrieve a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.
set_mask_selection(selection, value[, fields]) Modify a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.
get_coordinate_selection(selection[, out, …]) Retrieve a selection of individual items, by providing the indices (coordinates) for each selected item.
set_coordinate_selection(selection, value[, …]) Modify a selection of individual items, by providing the indices (coordinates) for each item to be modified.
digest([hashname]) Compute a checksum for the data.
hexdigest([hashname]) Compute a checksum for the data.
resize(*args) Change the shape of the array by growing or shrinking one or more dimensions.
append(data[, axis]) Append data along the given axis.
view([shape, chunks, dtype, fill_value, …]) Return an array sharing the same data.
astype(dtype) Return a view that performs on-the-fly type conversion of the underlying data.

__getitem__(selection)

Retrieve data for an item or region of the array.

Parameters:
selection : tuple

An integer index or slice or tuple of int/slice objects specifying the requested item or region for each dimension of the array.

Returns:
out : ndarray

A NumPy array containing the data for the requested region.

Notes

Slices with step > 1 are supported, but slices with negative step are not.

Currently the implementation for __getitem__ is provided by get_basic_selection(). For advanced (“fancy”) indexing, see get_orthogonal_selection(), get_coordinate_selection() and get_mask_selection(), or the oindex and vindex properties.
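Because negative steps are not supported, one possible workaround is to read the region with a forward slice and reverse the result in memory. The sketch below uses a hypothetical helper, read_reversed, and a NumPy array as a stand-in for the zarr array so that it runs without a store; it relies only on the fact that a slice returns a NumPy array.

```python
import numpy as np


def read_reversed(arr, start, stop):
    """Return arr[start:stop] in reverse order using only a forward slice on arr."""
    block = arr[start:stop]  # forward slice, the only kind arr needs to support
    return block[::-1]       # the reversal happens on the in-memory NumPy result


# A NumPy array stands in for the zarr array here.
z = np.arange(100)
result = read_reversed(z, 5, 10)
print(result)  # [9 8 7 6 5]
```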

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100))

Retrieve a single item:

>>> z[5]
5

Retrieve a region via slicing:

>>> z[:5]
array([0, 1, 2, 3, 4])
>>> z[-5:]
array([95, 96, 97, 98, 99])
>>> z[5:10]
array([5, 6, 7, 8, 9])
>>> z[5:10:2]
array([5, 7, 9])
>>> z[::2]
array([ 0,  2,  4, ..., 94, 96, 98])

Load the entire array into memory:

>>> z[...]
array([ 0,  1,  2, ..., 97, 98, 99])

Set up a 2-dimensional array:

>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve an item:

>>> z[2, 2]
22

Retrieve a region via slicing:

>>> z[1:3, 1:3]
array([[11, 12],
       [21, 22]])
>>> z[1:3, :]
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> z[:, 1:3]
array([[ 1,  2],
       [11, 12],
       [21, 22],
       [31, 32],
       [41, 42],
       [51, 52],
       [61, 62],
       [71, 72],
       [81, 82],
       [91, 92]])
>>> z[0:5:2, 0:5:2]
array([[ 0,  2,  4],
       [20, 22, 24],
       [40, 42, 44]])
>>> z[::2, ::2]
array([[ 0,  2,  4,  6,  8],
       [20, 22, 24, 26, 28],
       [40, 42, 44, 46, 48],
       [60, 62, 64, 66, 68],
       [80, 82, 84, 86, 88]])

Load the entire array into memory:

>>> z[...]
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

For arrays with a structured dtype, specific fields can be retrieved, e.g.:

>>> a = np.array([(b'aaa', 1, 4.2),
...               (b'bbb', 2, 8.4),
...               (b'ccc', 3, 12.6)],
...              dtype=[('foo', 'S3'), ('bar', 'i4'), ('baz', 'f8')])
>>> z = zarr.array(a)
>>> z['foo']
array([b'aaa', b'bbb', b'ccc'],
      dtype='|S3')

__setitem__(selection, value)

Modify data for an item or region of the array.

Parameters:
selection : tuple

An integer index or slice or tuple of int/slice specifying the requested region for each dimension of the array.

value : scalar or array-like

Value to be stored into the array.

Notes

Slices with step > 1 are supported, but slices with negative step are not.

Currently the implementation for __setitem__ is provided by set_basic_selection(), which means that only integers and slices are supported within the selection. For advanced (“fancy”) indexing, see set_orthogonal_selection(), set_coordinate_selection() and set_mask_selection(), or the oindex and vindex properties.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> z = zarr.zeros(100, dtype=int)

Set all array elements to the same scalar value:

>>> z[...] = 42
>>> z[...]
array([42, 42, 42, ..., 42, 42, 42])

Set a portion of the array:

>>> z[:10] = np.arange(10)
>>> z[-10:] = np.arange(10)[::-1]
>>> z[...]
array([ 0, 1, 2, ..., 2, 1, 0])

Set up a 2-dimensional array:

>>> z = zarr.zeros((5, 5), dtype=int)

Set all array elements to the same scalar value:

>>> z[...] = 42

Set a portion of the array:

>>> z[0, :] = np.arange(z.shape[1])
>>> z[:, 0] = np.arange(z.shape[0])
>>> z[...]
array([[ 0,  1,  2,  3,  4],
       [ 1, 42, 42, 42, 42],
       [ 2, 42, 42, 42, 42],
       [ 3, 42, 42, 42, 42],
       [ 4, 42, 42, 42, 42]])

For arrays with a structured dtype, specific fields can be modified, e.g.:

>>> a = np.array([(b'aaa', 1, 4.2),
...               (b'bbb', 2, 8.4),
...               (b'ccc', 3, 12.6)],
...              dtype=[('foo', 'S3'), ('bar', 'i4'), ('baz', 'f8')])
>>> z = zarr.array(a)
>>> z['foo'] = b'zzz'
>>> z[...]
array([(b'zzz', 1,   4.2), (b'zzz', 2,   8.4), (b'zzz', 3,  12.6)],
      dtype=[('foo', 'S3'), ('bar', '<i4'), ('baz', '<f8')])

get_basic_selection(selection=Ellipsis, out=None, fields=None)

Retrieve data for an item or region of the array.

Parameters:
selection : tuple

A tuple specifying the requested item or region for each dimension of the array. May be any combination of int and/or slice for multidimensional arrays.

out : ndarray, optional

If given, load the selected data directly into this array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:
out : ndarray

A NumPy array containing the data for the requested region.

Notes

Slices with step > 1 are supported, but slices with negative step are not.

Currently this method provides the implementation for accessing data via the square bracket notation (__getitem__). See __getitem__() for examples using the alternative notation.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100))

Retrieve a single item:

>>> z.get_basic_selection(5)
5

Retrieve a region via slicing:

>>> z.get_basic_selection(slice(5))
array([0, 1, 2, 3, 4])
>>> z.get_basic_selection(slice(-5, None))
array([95, 96, 97, 98, 99])
>>> z.get_basic_selection(slice(5, 10))
array([5, 6, 7, 8, 9])
>>> z.get_basic_selection(slice(5, 10, 2))
array([5, 7, 9])
>>> z.get_basic_selection(slice(None, None, 2))
array([ 0,  2,  4, ..., 94, 96, 98])

Set up a 2-dimensional array:

>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve an item:

>>> z.get_basic_selection((2, 2))
22

Retrieve a region via slicing:

>>> z.get_basic_selection((slice(1, 3), slice(1, 3)))
array([[11, 12],
       [21, 22]])
>>> z.get_basic_selection((slice(1, 3), slice(None)))
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> z.get_basic_selection((slice(None), slice(1, 3)))
array([[ 1,  2],
       [11, 12],
       [21, 22],
       [31, 32],
       [41, 42],
       [51, 52],
       [61, 62],
       [71, 72],
       [81, 82],
       [91, 92]])
>>> z.get_basic_selection((slice(0, 5, 2), slice(0, 5, 2)))
array([[ 0,  2,  4],
       [20, 22, 24],
       [40, 42, 44]])
>>> z.get_basic_selection((slice(None, None, 2), slice(None, None, 2)))
array([[ 0,  2,  4,  6,  8],
       [20, 22, 24, 26, 28],
       [40, 42, 44, 46, 48],
       [60, 62, 64, 66, 68],
       [80, 82, 84, 86, 88]])

For arrays with a structured dtype, specific fields can be retrieved, e.g.:

>>> a = np.array([(b'aaa', 1, 4.2),
...               (b'bbb', 2, 8.4),
...               (b'ccc', 3, 12.6)],
...              dtype=[('foo', 'S3'), ('bar', 'i4'), ('baz', 'f8')])
>>> z = zarr.array(a)
>>> z.get_basic_selection(slice(2), fields='foo')
array([b'aaa', b'bbb'],
      dtype='|S3')

set_basic_selection(selection, value, fields=None)

Modify data for an item or region of the array.

Parameters:
selection : tuple

An integer index or slice or tuple of int/slice specifying the requested region for each dimension of the array.

value : scalar or array-like

Value to be stored into the array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to set data for.

Notes

This method provides the underlying implementation for modifying data via square bracket notation, see __setitem__() for equivalent examples using the alternative notation.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros(100, dtype=int)

Set all array elements to the same scalar value:

>>> z.set_basic_selection(..., 42)
>>> z[...]
array([42, 42, 42, ..., 42, 42, 42])

Set a portion of the array:

>>> z.set_basic_selection(slice(10), np.arange(10))
>>> z.set_basic_selection(slice(-10, None), np.arange(10)[::-1])
>>> z[...]
array([ 0, 1, 2, ..., 2, 1, 0])

Set up a 2-dimensional array:

>>> z = zarr.zeros((5, 5), dtype=int)

Set all array elements to the same scalar value:

>>> z.set_basic_selection(..., 42)

Set a portion of the array:

>>> z.set_basic_selection((0, slice(None)), np.arange(z.shape[1]))
>>> z.set_basic_selection((slice(None), 0), np.arange(z.shape[0]))
>>> z[...]
array([[ 0,  1,  2,  3,  4],
       [ 1, 42, 42, 42, 42],
       [ 2, 42, 42, 42, 42],
       [ 3, 42, 42, 42, 42],
       [ 4, 42, 42, 42, 42]])

For arrays with a structured dtype, the fields parameter can be used to set data for a specific field, e.g.:

>>> a = np.array([(b'aaa', 1, 4.2),
...               (b'bbb', 2, 8.4),
...               (b'ccc', 3, 12.6)],
...              dtype=[('foo', 'S3'), ('bar', 'i4'), ('baz', 'f8')])
>>> z = zarr.array(a)
>>> z.set_basic_selection(slice(0, 2), b'zzz', fields='foo')
>>> z[:]
array([(b'zzz', 1,   4.2), (b'zzz', 2,   8.4), (b'ccc', 3,  12.6)],
      dtype=[('foo', 'S3'), ('bar', '<i4'), ('baz', '<f8')])

get_mask_selection(selection, out=None, fields=None)

Retrieve a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.

Parameters:
selection : ndarray, bool

A Boolean array of the same shape as the array against which the selection is being made.

out : ndarray, optional

If given, load the selected data directly into this array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:
out : ndarray

A NumPy array containing the data for the requested selection.

Notes

Mask indexing is a form of vectorized or inner indexing, and is equivalent to coordinate indexing. Internally the mask array is converted to coordinate arrays by calling np.nonzero.
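The equivalence between a mask and its coordinates can be checked with NumPy alone. The sketch below uses a plain NumPy array as a stand-in for the array data; it illustrates the np.nonzero conversion described above, not zarr's internal code path.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)  # NumPy stand-in for the array data

mask = np.zeros(a.shape, dtype=bool)
mask[1, 1] = True
mask[4, 4] = True

# np.nonzero returns one coordinate array per dimension.
rows, cols = np.nonzero(mask)
print(rows, cols)  # [1 4] [1 4]

# Selecting by mask and selecting by the derived coordinates give the same items.
by_mask = a[mask]
by_coords = a[rows, cols]
print(by_mask)    # [11 44]
print(by_coords)  # [11 44]
```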

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve items by specifying a mask:

>>> sel = np.zeros_like(z, dtype=bool)
>>> sel[1, 1] = True
>>> sel[4, 4] = True
>>> z.get_mask_selection(sel)
array([11, 44])

For convenience, the mask selection functionality is also available via the vindex property, e.g.:

>>> z.vindex[sel]
array([11, 44])

set_mask_selection(selection, value, fields=None)

Modify a selection of individual items, by providing a Boolean array of the same shape as the array against which the selection is being made, where True values indicate a selected item.

Parameters:
selection : ndarray, bool

A Boolean array of the same shape as the array against which the selection is being made.

value : scalar or array-like

Value to be stored into the array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to set data for.

Notes

Mask indexing is a form of vectorized or inner indexing, and is equivalent to coordinate indexing. Internally the mask array is converted to coordinate arrays by calling np.nonzero.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((5, 5), dtype=int)

Set data for a selection of items:

>>> sel = np.zeros_like(z, dtype=bool)
>>> sel[1, 1] = True
>>> sel[4, 4] = True
>>> z.set_mask_selection(sel, 1)
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1]])

For convenience, this functionality is also available via the vindex property, e.g.:

>>> z.vindex[sel] = 2
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 2]])

get_coordinate_selection(selection, out=None, fields=None)

Retrieve a selection of individual items, by providing the indices (coordinates) for each selected item.

Parameters:
selection : tuple

An integer (coordinate) array for each dimension of the array.

out : ndarray, optional

If given, load the selected data directly into this array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:
out : ndarray

A NumPy array containing the data for the requested selection.

Notes

Coordinate indexing is also known as point selection, and is a form of vectorized or inner indexing.

Slices are not supported. Coordinate arrays must be provided for all dimensions of the array.

Coordinate arrays may be multidimensional, in which case the output array will also be multidimensional. Coordinate arrays are broadcast against each other before being applied. The shape of the output will be the same as the shape of each coordinate array after broadcasting.
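The broadcasting rule can be previewed with NumPy, whose integer-array indexing follows the same description. The sketch below uses a NumPy array as a stand-in and has not been run against a zarr array; the coordinate values are illustrative.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)  # NumPy stand-in for the array data

rows = np.array([[0], [9]])   # shape (2, 1)
cols = np.array([0, 5, 9])    # shape (3,)

# The coordinate arrays broadcast against each other to shape (2, 3).
b_rows, b_cols = np.broadcast_arrays(rows, cols)
print(b_rows.shape, b_cols.shape)  # (2, 3) (2, 3)

# The output has the broadcast shape, one item per coordinate pair.
picked = a[rows, cols]
print(picked)
# [[ 0  5  9]
#  [90 95 99]]
```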

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve items by specifying their coordinates:

>>> z.get_coordinate_selection(([1, 4], [1, 4]))
array([11, 44])

For convenience, the coordinate selection functionality is also available via the vindex property, e.g.:

>>> z.vindex[[1, 4], [1, 4]]
array([11, 44])

set_coordinate_selection(selection, value, fields=None)

Modify a selection of individual items, by providing the indices (coordinates) for each item to be modified.

Parameters:
selection : tuple

An integer (coordinate) array for each dimension of the array.

value : scalar or array-like

Value to be stored into the array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to set data for.

Notes

Coordinate indexing is also known as point selection, and is a form of vectorized or inner indexing.

Slices are not supported. Coordinate arrays must be provided for all dimensions of the array.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((5, 5), dtype=int)

Set data for a selection of items:

>>> z.set_coordinate_selection(([1, 4], [1, 4]), 1)
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1]])

For convenience, this functionality is also available via the vindex property, e.g.:

>>> z.vindex[[1, 4], [1, 4]] = 2
>>> z[...]
array([[0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 2]])

get_orthogonal_selection(selection, out=None, fields=None)

Retrieve data by making a selection for each dimension of the array. For example, if an array has 2 dimensions, this allows selecting specific rows and/or columns. The selection for each dimension can be either an integer (indexing a single item), a slice, an array of integers, or a Boolean array where True values indicate a selection.

Parameters:
selection : tuple

A selection for each dimension of the array. May be any combination of int, slice, integer array or Boolean array.

out : ndarray, optional

If given, load the selected data directly into this array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to extract data for.

Returns:
out : ndarray

A NumPy array containing the data for the requested selection.

Notes

Orthogonal indexing is also known as outer indexing.

Slices with step > 1 are supported, but slices with negative step are not.
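The difference between orthogonal (outer) and vectorized (inner) indexing is easiest to see side by side. The sketch below reproduces both behaviours with plain NumPy, using np.ix_ as the outer-indexing analogue; it is a stand-in for illustration and does not call zarr.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)  # NumPy stand-in for the array data
rows, cols = [1, 4], [1, 4]

# Outer (orthogonal) indexing: every selected row is paired with every selected column.
outer = a[np.ix_(rows, cols)]
print(outer)
# [[11 14]
#  [41 44]]

# Inner (vectorized) indexing: rows and cols are paired element by element.
inner = a[rows, cols]
print(inner)  # [11 44]
```

The outer result matches what the oindex examples return for the same selection, while the inner result matches the vindex examples.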

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100).reshape(10, 10))

Retrieve rows and columns via any combination of int, slice, integer array and/or Boolean array:

>>> z.get_orthogonal_selection(([1, 4], slice(None)))
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
>>> z.get_orthogonal_selection((slice(None), [1, 4]))
array([[ 1,  4],
       [11, 14],
       [21, 24],
       [31, 34],
       [41, 44],
       [51, 54],
       [61, 64],
       [71, 74],
       [81, 84],
       [91, 94]])
>>> z.get_orthogonal_selection(([1, 4], [1, 4]))
array([[11, 14],
       [41, 44]])
>>> sel = np.zeros(z.shape[0], dtype=bool)
>>> sel[1] = True
>>> sel[4] = True
>>> z.get_orthogonal_selection((sel, sel))
array([[11, 14],
       [41, 44]])

For convenience, the orthogonal selection functionality is also available via the oindex property, e.g.:

>>> z.oindex[[1, 4], :]
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
>>> z.oindex[:, [1, 4]]
array([[ 1,  4],
       [11, 14],
       [21, 24],
       [31, 34],
       [41, 44],
       [51, 54],
       [61, 64],
       [71, 74],
       [81, 84],
       [91, 94]])
>>> z.oindex[[1, 4], [1, 4]]
array([[11, 14],
       [41, 44]])
>>> sel = np.zeros(z.shape[0], dtype=bool)
>>> sel[1] = True
>>> sel[4] = True
>>> z.oindex[sel, sel]
array([[11, 14],
       [41, 44]])

set_orthogonal_selection(selection, value, fields=None)

Modify data via a selection for each dimension of the array.

Parameters:
selection : tuple

A selection for each dimension of the array. May be any combination of int, slice, integer array or Boolean array.

value : scalar or array-like

Value to be stored into the array.

fields : str or sequence of str, optional

For arrays with a structured dtype, one or more fields can be specified to set data for.

Notes

Orthogonal indexing is also known as outer indexing.

Slices with step > 1 are supported, but slices with negative step are not.

Examples

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.zeros((5, 5), dtype=int)

Set data for a selection of rows:

>>> z.set_orthogonal_selection(([1, 4], slice(None)), 1)
>>> z[...]
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1]])

Set data for a selection of columns:

>>> z.set_orthogonal_selection((slice(None), [1, 4]), 2)
>>> z[...]
array([[0, 2, 0, 0, 2],
       [1, 2, 1, 1, 2],
       [0, 2, 0, 0, 2],
       [0, 2, 0, 0, 2],
       [1, 2, 1, 1, 2]])

Set data for a selection of rows and columns:

>>> z.set_orthogonal_selection(([1, 4], [1, 4]), 3)
>>> z[...]
array([[0, 2, 0, 0, 2],
       [1, 3, 1, 1, 3],
       [0, 2, 0, 0, 2],
       [0, 2, 0, 0, 2],
       [1, 3, 1, 1, 3]])

For convenience, this functionality is also available via the oindex property, e.g.:

>>> z.oindex[[1, 4], [1, 4]] = 4
>>> z[...]
array([[0, 2, 0, 0, 2],
       [1, 4, 1, 1, 4],
       [0, 2, 0, 0, 2],
       [0, 2, 0, 0, 2],
       [1, 4, 1, 1, 4]])

digest(hashname='sha1')

Compute a checksum for the data. Default uses sha1 for speed.

Examples

>>> import binascii
>>> import zarr
>>> z = zarr.empty(shape=(10000, 10000), chunks=(1000, 1000))
>>> binascii.hexlify(z.digest())
b'041f90bc7a571452af4f850a8ca2c6cddfa8a1ac'
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> binascii.hexlify(z.digest())
b'7162d416d26a68063b66ed1f30e0a866e4abed60'
>>> z = zarr.zeros(shape=(10000, 10000), dtype="u1", chunks=(1000, 1000))
>>> binascii.hexlify(z.digest())
b'cb387af37410ae5a3222e893cf3373e4e4f22816'

hexdigest(hashname='sha1')

Compute a checksum for the data and return it as a hexadecimal string. Default uses sha1 for speed.
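The examples for digest() and hexdigest() show the same checksum as raw bytes and as hexadecimal text. The sketch below demonstrates that relationship with hashlib directly. It assumes hashname is an algorithm name that hashlib accepts, and the payload is hypothetical data rather than the contents of a real store.

```python
import binascii
import hashlib

payload = b"example chunk bytes"  # hypothetical data, not real store contents

h = hashlib.new("sha1")  # assumption: hashname is a hashlib algorithm name
h.update(payload)

raw = h.digest()      # bytes, comparable to the value hexlified in the digest() example
text = h.hexdigest()  # str, comparable to the value shown in the hexdigest() example

# Hex-encoding the raw digest yields the hexdigest text.
print(binascii.hexlify(raw).decode("ascii") == text)  # True
print(len(raw), len(text))                            # 20 40
```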

Examples

>>> import zarr
>>> z = zarr.empty(shape=(10000, 10000), chunks=(1000, 1000))
>>> z.hexdigest()
'041f90bc7a571452af4f850a8ca2c6cddfa8a1ac'
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z.hexdigest()
'7162d416d26a68063b66ed1f30e0a866e4abed60'
>>> z = zarr.zeros(shape=(10000, 10000), dtype="u1", chunks=(1000, 1000))
>>> z.hexdigest()
'cb387af37410ae5a3222e893cf3373e4e4f22816'

resize(*args)

Change the shape of the array by growing or shrinking one or more dimensions.

Notes

When resizing an array, the data are not rearranged in any way.

If one or more dimensions are shrunk, any chunks falling outside the new array shape will be deleted from the underlying store.
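To estimate which chunks a shrink would affect, the chunk grids before and after the resize can be compared. The sketch below is plain Python with hypothetical helper names (chunk_grid, chunks_outside). It lists chunk indices that lie wholly outside the new shape, and says nothing about whether those chunks were ever written to the store.

```python
from itertools import product


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple((s + c - 1) // c for s, c in zip(shape, chunks))


def chunks_outside(old_shape, new_shape, chunks):
    """Indices of chunks in the old grid that lie wholly outside the new shape."""
    old_grid = chunk_grid(old_shape, chunks)
    new_grid = chunk_grid(new_shape, chunks)
    return [
        idx
        for idx in product(*(range(n) for n in old_grid))
        # A chunk is outside if its index exceeds the new grid in any dimension.
        if any(i >= n for i, n in zip(idx, new_grid))
    ]


# Growing the first dimension while shrinking the second.
removed = chunks_outside((20000, 10000), (30000, 1000), (1000, 1000))
print(len(removed))  # 180
print(removed[0])    # (0, 1)
```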

Examples

>>> import zarr
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z.shape
(10000, 10000)
>>> z.resize(20000, 10000)
>>> z.shape
(20000, 10000)
>>> z.resize(30000, 1000)
>>> z.shape
(30000, 1000)

append(data, axis=0)

Append data along the given axis.

Parameters:
data : array_like

Data to be appended.

axis : int, optional

Axis along which to append.

Returns:
new_shape : tuple

The shape of the array after the data has been appended.

Notes

The size of all dimensions other than axis must match between this array and data.
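The shape rule above can be written down as a small check. The sketch below uses a hypothetical helper, appended_shape, that mirrors the stated constraint and computes the resulting shape. It does not touch any array data and is not zarr's implementation.

```python
def appended_shape(shape, data_shape, axis=0):
    """Return the shape after appending data_shape to shape along axis."""
    if len(shape) != len(data_shape):
        raise ValueError("array and data must have the same number of dimensions")
    for dim, (a, b) in enumerate(zip(shape, data_shape)):
        # Every dimension except the append axis must match exactly.
        if dim != axis and a != b:
            raise ValueError(f"dimension {dim} differs: {a} != {b}")
    return tuple(
        a + b if dim == axis else a
        for dim, (a, b) in enumerate(zip(shape, data_shape))
    )


s1 = appended_shape((10000, 1000), (10000, 1000))
print(s1)  # (20000, 1000)

s2 = appended_shape(s1, (20000, 1000), axis=1)
print(s2)  # (20000, 2000)
```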

Examples

>>> import numpy as np
>>> import zarr
>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z.shape
(10000, 1000)
>>> z.append(a)
(20000, 1000)
>>> z.append(np.vstack([a, a]), axis=1)
(20000, 2000)
>>> z.shape
(20000, 2000)

view(shape=None, chunks=None, dtype=None, fill_value=None, filters=None, read_only=None, synchronizer=None)

Return an array sharing the same data.

Parameters:
shape : int or tuple of ints, optional

Array shape.

chunks : int or tuple of ints, optional

Chunk shape.

dtype : string or dtype, optional

NumPy dtype.

fill_value : object, optional

Default value to use for uninitialized portions of the array.

filters : sequence, optional

Sequence of filters to use to encode chunk data prior to compression.

read_only : bool, optional

True if array should be protected against modification.

synchronizer : object, optional

Array synchronizer.

Notes

WARNING: This is an experimental feature and should be used with care. There are plenty of ways to generate errors and/or cause data corruption.

Examples

Bypass filters:

>>> import zarr
>>> import numpy as np
>>> np.random.seed(42)
>>> labels = ['female', 'male']
>>> data = np.random.choice(labels, size=10000)
>>> filters = [zarr.Categorize(labels=labels,
...                            dtype=data.dtype,
...                            astype='u1')]
>>> a = zarr.array(data, chunks=1000, filters=filters)
>>> a[:]
array(['female', 'male', 'female', ..., 'male', 'male', 'female'],
      dtype='<U6')
>>> v = a.view(dtype='u1', filters=[])
>>> v.is_view
True
>>> v[:]
array([1, 2, 1, ..., 2, 2, 1], dtype=uint8)

Views can be used to modify data:

>>> x = v[:]
>>> x.sort()
>>> v[:] = x
>>> v[:]
array([1, 1, 1, ..., 2, 2, 2], dtype=uint8)
>>> a[:]
array(['female', 'female', 'female', ..., 'male', 'male', 'male'],
      dtype='<U6')

View as a different dtype with the same item size:

>>> data = np.random.randint(0, 2, size=10000, dtype='u1')
>>> a = zarr.array(data, chunks=1000)
>>> a[:]
array([0, 0, 1, ..., 1, 0, 0], dtype=uint8)
>>> v = a.view(dtype=bool)
>>> v[:]
array([False, False,  True, ...,  True, False, False], dtype=bool)
>>> np.all(a[:].view(dtype=bool) == v[:])
True

An array can be viewed with a dtype that has a different item size; however, some care is needed to adjust the shape and chunk shape so that chunk data is interpreted correctly:

>>> data = np.arange(10000, dtype='u2')
>>> a = zarr.array(data, chunks=1000)
>>> a[:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16)
>>> v = a.view(dtype='u1', shape=20000, chunks=2000)
>>> v[:10]
array([0, 0, 1, 0, 2, 0, 3, 0, 4, 0], dtype=uint8)
>>> np.all(a[:].view('u1') == v[:])
True
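The shape and chunk values passed to view() in the example above keep the byte count of the array and of each chunk unchanged. The sketch below shows that arithmetic for the 1-dimensional case with a hypothetical helper, view_geometry, and uses NumPy only to look up item sizes and to show the byte-level reinterpretation. The '<u2' dtype is spelled with an explicit little-endian byte order so the output does not depend on the platform.

```python
import numpy as np


def view_geometry(shape, chunks, old_dtype, new_dtype):
    """Scale a 1-D shape and chunk length so byte counts stay unchanged."""
    old_size = np.dtype(old_dtype).itemsize
    new_size = np.dtype(new_dtype).itemsize
    total_bytes = shape * old_size
    chunk_bytes = chunks * old_size
    if total_bytes % new_size or chunk_bytes % new_size:
        raise ValueError("byte counts are not a multiple of the new item size")
    return total_bytes // new_size, chunk_bytes // new_size


new_shape, new_chunks = view_geometry(10000, 1000, "u2", "u1")
print(new_shape, new_chunks)  # 20000 2000

# Each little-endian 16-bit item becomes two 8-bit items, low byte first.
raw = np.arange(5, dtype="<u2").view("u1")
print(raw)  # [0 0 1 0 2 0 3 0 4 0]
```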

Change fill value for uninitialized chunks:

>>> a = zarr.full(10000, chunks=1000, fill_value=-1, dtype='i1')
>>> a[:]
array([-1, -1, -1, ..., -1, -1, -1], dtype=int8)
>>> v = a.view(fill_value=42)
>>> v[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int8)

Note that resizing or appending to views is not permitted:

>>> a = zarr.empty(10000)
>>> v = a.view()
>>> try:
...     v.resize(20000)
... except PermissionError as e:
...     print(e)
operation not permitted for views

astype(dtype)

Return a view that performs on-the-fly type conversion of the underlying data.

Parameters:
dtype : string or dtype

NumPy dtype.

See also

Array.view

Notes

This method returns a new Array object which is a view on the same underlying chunk data. Modifying any data via the view is currently not permitted and will result in an error. This is an experimental feature and its behavior is subject to change in the future.

Examples

>>> import zarr
>>> import numpy as np
>>> data = np.arange(100, dtype=np.uint8)
>>> a = zarr.array(data, chunks=10)
>>> a[:]
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
       16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
       32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
       48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
       64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
       80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
       96, 97, 98, 99], dtype=uint8)
>>> v = a.astype(np.float32)
>>> v.is_view
True
>>> v[:]
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,
        10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,
        20.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.,
        30.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,  39.,
        40.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,  49.,
        50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,  59.,
        60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,  69.,
        70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.,
        80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.,
        90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99.],
      dtype=float32)