The Array class (zarr.core)

class zarr.core.Array(store, path=None, read_only=False, chunk_store=None, synchronizer=None, cache_metadata=True)

Instantiate an array from an initialized store.

Parameters:

store : MutableMapping

Array store, already initialized.

path : string, optional

Storage path.

read_only : bool, optional

True if the array should be protected against modification.

chunk_store : MutableMapping, optional

Separate storage for chunks. If not provided, store will be used for storage of both chunks and metadata.

synchronizer : object, optional

Array synchronizer.

cache_metadata : bool, optional

If True, array configuration metadata will be cached for the lifetime of the object. If False, array metadata will be reloaded prior to all data access and modification operations, which may incur overhead depending on the storage and the data access pattern.
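
The trade-off behind cache_metadata can be illustrated with a toy reader. This is a minimal sketch, not zarr's implementation: MetadataReader is a hypothetical class, and storing metadata as JSON under the key '.zarray' is an assumption modelled on the Zarr v2 storage format. With caching the store is read once, so a change made by another writer (here, a resize) is not seen; without caching every access reloads.

```python
import json


class MetadataReader:
    """Hypothetical toy class (not part of zarr) contrasting cached and
    reloaded array metadata."""

    def __init__(self, store, cache_metadata=True):
        self.store = store
        self.cache_metadata = cache_metadata
        self.loads = 0  # number of times metadata was read from the store
        self._meta = None

    def metadata(self):
        # Cached: read the store once. Not cached: reload on every access.
        if self._meta is None or not self.cache_metadata:
            self.loads += 1
            # Assumes a JSON document under the Zarr v2-style key '.zarray'.
            self._meta = json.loads(self.store['.zarray'])
        return self._meta


store = {'.zarray': json.dumps({'shape': [100], 'chunks': [10]}).encode()}
cached = MetadataReader(store, cache_metadata=True)
fresh = MetadataReader(store, cache_metadata=False)
for _ in range(3):
    cached.metadata()
    fresh.metadata()
print(cached.loads, fresh.loads)  # 1 3

# Some other writer resizes the array by rewriting its metadata.
store['.zarray'] = json.dumps({'shape': [200], 'chunks': [10]}).encode()
print(cached.metadata()['shape'], fresh.metadata()['shape'])  # [100] [200]
```

The cached reader does fewer store reads but returns stale metadata after the external change; the uncached reader pays one read per access and stays current.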

Attributes

store                A MutableMapping providing the underlying storage for the array.
path                 Storage path.
name                 Array name following h5py convention.
read_only            A boolean, True if modification operations are not permitted.
chunk_store          A MutableMapping providing the underlying storage for array chunks.
shape                A tuple of integers describing the length of each dimension of the array.
chunks               A tuple of integers describing the length of each dimension of a chunk of the array.
dtype                The NumPy data type.
fill_value           A value used for uninitialized portions of the array.
order                A string indicating the order in which bytes are arranged within chunks of the array.
synchronizer         Object used to synchronize write access to the array.
filters              One or more codecs used to transform data prior to compression.
attrs                A MutableMapping containing user-defined attributes.
size                 The total number of elements in the array.
itemsize             The size in bytes of each item in the array.
nbytes               The total number of bytes that would be required to store the array without compression.
nbytes_stored        The total number of stored bytes of data for the array.
cdata_shape          A tuple of integers describing the number of chunks along each dimension of the array.
nchunks              Total number of chunks.
nchunks_initialized  The number of chunks that have been initialized with some data.
is_view              A boolean, True if this array is a view on another array.
compression
compression_opts
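
Several of these attributes follow directly from shape, chunks and itemsize. The sketch below shows that arithmetic, assuming that cdata_shape rounds up so a partial final chunk is counted; derived_attributes is a hypothetical helper, not a zarr method.

```python
import math


def derived_attributes(shape, chunks, itemsize):
    """Hypothetical helper: size-related attributes computed from the
    shape, chunk shape and item size of an array."""
    size = math.prod(shape)
    # Chunks per dimension, rounded up so that a partial final chunk counts.
    cdata_shape = tuple((s + c - 1) // c for s, c in zip(shape, chunks))
    return {'size': size,
            'nbytes': size * itemsize,
            'cdata_shape': cdata_shape,
            'nchunks': math.prod(cdata_shape)}


# The 2-dimensional int32 (4-byte) array from the examples below.
info = derived_attributes((10000, 10000), (1000, 1000), itemsize=4)
print(info['nbytes'], info['cdata_shape'], info['nchunks'])  # 400000000 (10, 10) 100
# A length that is not a multiple of the chunk length adds a partial chunk.
print(derived_attributes((10001,), (1000,), itemsize=4)['cdata_shape'])  # (11,)
```

400000000 bytes is about 381.5 MiB, which matches the nbytes value reported for that array in the examples.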

Methods

__getitem__(item) Retrieve data for some portion of the array.
__setitem__(key, value) Modify data for some portion of the array.
resize(*args) Change the shape of the array by growing or shrinking one or more dimensions.
append(data[, axis]) Append data along an axis.
view([shape, chunks, dtype, fill_value, ...]) Return an array sharing the same data.

__getitem__(item)

Retrieve data for some portion of the array. Most NumPy-style slicing operations are supported.

Returns:

out : ndarray

A NumPy array containing the data for the requested region.

Examples

Set up a 1-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100000000), chunks=1000000, dtype='i4')
>>> z
Array((100000000,), int32, chunks=(1000000,), order=C)
  nbytes: 381.5M; nbytes_stored: 6.4M; ratio: 59.9; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Take some slices:

>>> z[5]
5
>>> z[:5]
array([0, 1, 2, 3, 4], dtype=int32)
>>> z[-5:]
array([99999995, 99999996, 99999997, 99999998, 99999999], dtype=int32)
>>> z[5:10]
array([5, 6, 7, 8, 9], dtype=int32)
>>> z[:]
array([       0,        1,        2, ..., 99999997, 99999998, 99999999], dtype=int32)

Set up a 2-dimensional array:

>>> import zarr
>>> import numpy as np
>>> z = zarr.array(np.arange(100000000).reshape(10000, 10000),
...                chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 9.2M; ratio: 41.6; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Take some slices:

>>> z[2, 2]
20002
>>> z[:2, :2]
array([[    0,     1],
       [10000, 10001]], dtype=int32)
>>> z[:2]
array([[    0,     1,     2, ...,  9997,  9998,  9999],
       [10000, 10001, 10002, ..., 19997, 19998, 19999]], dtype=int32)
>>> z[:, :2]
array([[       0,        1],
       [   10000,    10001],
       [   20000,    20001],
       ...,
       [99970000, 99970001],
       [99980000, 99980001],
       [99990000, 99990001]], dtype=int32)
>>> z[:]
array([[       0,        1,        2, ...,     9997,     9998,     9999],
       [   10000,    10001,    10002, ...,    19997,    19998,    19999],
       [   20000,    20001,    20002, ...,    29997,    29998,    29999],
       ...,
       [99970000, 99970001, 99970002, ..., 99979997, 99979998, 99979999],
       [99980000, 99980001, 99980002, ..., 99989997, 99989998, 99989999],
       [99990000, 99990001, 99990002, ..., 99999997, 99999998, 99999999]], dtype=int32)
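
Retrieving a region only requires the chunks that overlap it; for a selection made of slices, those are the Cartesian product of the overlapping chunk ranges along each dimension. The sketch below computes them in pure Python. chunks_for_slice and chunks_for_selection are hypothetical helpers, only step-1 slices are handled, and this illustrates the idea rather than zarr's own indexing code.

```python
from itertools import product


def chunks_for_slice(length, chunk_len, sel):
    """Hypothetical helper: indices of the chunks that a contiguous 1-D
    slice overlaps along one dimension."""
    start, stop, step = sel.indices(length)
    if step != 1:
        raise NotImplementedError('this sketch handles step-1 slices only')
    if start >= stop:
        return range(0)
    return range(start // chunk_len, (stop - 1) // chunk_len + 1)


def chunks_for_selection(shape, chunks, selection):
    """Chunk grid indices touched by a tuple of slices (one per dimension)."""
    per_dim = [chunks_for_slice(n, c, s)
               for n, c, s in zip(shape, chunks, selection)]
    return list(product(*per_dim))


# 1-dimensional array: length 100000000, chunks of 1000000.
print(list(chunks_for_slice(100000000, 1000000, slice(5, 10))))     # [0]
print(list(chunks_for_slice(100000000, 1000000, slice(-5, None))))  # [99]
# 2-dimensional array: z[:, :2] touches one column of 10 chunks.
print(len(chunks_for_selection((10000, 10000), (1000, 1000),
                               (slice(None), slice(None, 2)))))     # 10
```

So z[5:10] needs a single chunk, while z[:] needs all 100; the cost of a read grows with the number of chunks touched, not only with the number of elements returned.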

__setitem__(key, value)

Modify data for some portion of the array.

Examples

Set up a 1-dimensional array:

>>> import numpy as np
>>> import zarr
>>> z = zarr.zeros(100000000, chunks=1000000, dtype='i4')
>>> z
Array((100000000,), int32, chunks=(1000000,), order=C)
  nbytes: 381.5M; nbytes_stored: 301; ratio: 1328903.7; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Set all array elements to the same scalar value:

>>> z[:] = 42
>>> z[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int32)

Set a portion of the array:

>>> z[:100] = np.arange(100)
>>> z[-100:] = np.arange(100)[::-1]
>>> z[:]
array([0, 1, 2, ..., 2, 1, 0], dtype=int32)

Set up a 2-dimensional array:

>>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
>>> z
Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
  nbytes: 381.5M; nbytes_stored: 323; ratio: 1238390.1; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

Set all array elements to the same scalar value:

>>> z[:] = 42
>>> z[:]
array([[42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       ...,
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42],
       [42, 42, 42, ..., 42, 42, 42]], dtype=int32)

Set a portion of the array:

>>> z[0, :] = np.arange(z.shape[1])
>>> z[:, 0] = np.arange(z.shape[0])
>>> z[:]
array([[   0,    1,    2, ..., 9997, 9998, 9999],
       [   1,   42,   42, ...,   42,   42,   42],
       [   2,   42,   42, ...,   42,   42,   42],
       ...,
       [9997,   42,   42, ...,   42,   42,   42],
       [9998,   42,   42, ...,   42,   42,   42],
       [9999,   42,   42, ...,   42,   42,   42]], dtype=int32)
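
Writing to part of a chunked array happens chunk by chunk: each chunk that the selection overlaps is read (or created from the fill value if it was never initialized), updated, and stored back as a whole. A pure-Python sketch, assuming a dict that maps chunk index to a list of values; write_1d is a hypothetical helper, and a real array would also encode and compress each chunk before storing it.

```python
def write_1d(store, length, chunk_len, fill_value, start, values):
    """Hypothetical helper: write the list `values` at positions
    start .. start+len(values)-1 of a 1-D chunked array held in `store`
    (a dict mapping chunk index -> list of values)."""
    if not values:
        return
    stop = start + len(values)
    if start < 0 or stop > length:
        raise IndexError('write falls outside the array')
    for ci in range(start // chunk_len, (stop - 1) // chunk_len + 1):
        c0, c1 = ci * chunk_len, min((ci + 1) * chunk_len, length)
        # Read: load the stored chunk, or build one from the fill value.
        chunk = list(store.get(ci, [fill_value] * (c1 - c0)))
        # Modify: overwrite only the overlap between the write and this chunk.
        lo, hi = max(start, c0), min(stop, c1)
        chunk[lo - c0:hi - c0] = values[lo - start:hi - start]
        # Write: the whole chunk goes back to the store.
        store[ci] = chunk


store = {}  # 10 elements in chunks of 4; nothing initialized yet
write_1d(store, 10, 4, 0, 2, [7, 7, 7])
print(store)       # {0: [0, 0, 7, 7], 1: [7, 0, 0, 0]}
write_1d(store, 10, 4, 0, 8, [1, 2])
print(len(store))  # 3 chunks initialized
```

Chunks a write never touches stay absent from the store, which is what the "initialized" count in the array summary reports. A real implementation can skip the read step when a write covers a whole chunk.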

resize(*args)

Change the shape of the array by growing or shrinking one or more dimensions.

Notes

When resizing an array, the data are not rearranged in any way.

If one or more dimensions are shrunk, any chunks falling outside the new array shape will be deleted from the underlying store.
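
Because data are not rearranged, a chunk keeps its grid position (and therefore its storage key) across a resize; shrinking drops the chunks whose grid position no longer exists. The sketch below computes which chunk keys would go, assuming dot-separated keys such as '0.0' as used by the Zarr v2 storage format; chunk_keys and keys_deleted_by_resize are hypothetical helpers, not zarr functions.

```python
from itertools import product


def chunk_keys(shape, chunks):
    """Dot-separated keys ('0.0', '0.1', ...) for every chunk in the grid."""
    grid = [(s + c - 1) // c for s, c in zip(shape, chunks)]
    return {'.'.join(map(str, idx)) for idx in product(*(range(n) for n in grid))}


def keys_deleted_by_resize(old_shape, new_shape, chunks):
    """Hypothetical helper: keys of chunks lying entirely outside the new shape."""
    return chunk_keys(old_shape, chunks) - chunk_keys(new_shape, chunks)


# The second resize in the example below: (20000, 10000) -> (30000, 1000).
doomed = keys_deleted_by_resize((20000, 10000), (30000, 1000), (1000, 1000))
print(len(doomed))      # 180: the 20 x 9 chunks beyond the first column of chunks
print('5.0' in doomed)  # False: surviving chunks keep their keys and contents
```

In this sketch a chunk that is only partly outside the new shape is kept, since it still overlaps the array, and growing a dimension deletes nothing.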

Examples

>>> import zarr
>>> z = zarr.zeros(shape=(10000, 10000), chunks=(1000, 1000))
>>> z
Array((10000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 762.9M; nbytes_stored: 323; ratio: 2476780.2; initialized: 0/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.resize(20000, 10000)
>>> z
Array((20000, 10000), float64, chunks=(1000, 1000), order=C)
  nbytes: 1.5G; nbytes_stored: 323; ratio: 4953560.4; initialized: 0/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.resize(30000, 1000)
>>> z
Array((30000, 1000), float64, chunks=(1000, 1000), order=C)
  nbytes: 228.9M; nbytes_stored: 322; ratio: 745341.6; initialized: 0/30
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

append(data, axis=0)

Append data along the given axis.

Parameters:

data : array_like

Data to be appended.

axis : int, optional

Axis along which to append. Defaults to 0.

Returns:

new_shape : tuple

Shape of the array after the data have been appended.

Notes

The size of all dimensions other than axis must match between this array and data.
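
The rule above, together with the returned new_shape, amounts to a small shape calculation. A pure-Python sketch; appended_shape is a hypothetical helper, accepting negative axes is a choice of this sketch, and it says nothing about how the data themselves are written.

```python
def appended_shape(shape, data_shape, axis=0):
    """Hypothetical helper: shape after appending data of shape `data_shape`
    along `axis`; every other dimension must match."""
    if len(shape) != len(data_shape):
        raise ValueError('data must have the same number of dimensions')
    if not -len(shape) <= axis < len(shape):
        raise ValueError('axis out of range')
    axis %= len(shape)  # accept negative axes, NumPy style
    for dim, (a, b) in enumerate(zip(shape, data_shape)):
        if dim != axis and a != b:
            raise ValueError(f'dimension {dim} differs: {a} != {b}')
    new_shape = list(shape)
    new_shape[axis] += data_shape[axis]
    return tuple(new_shape)


print(appended_shape((10000, 1000), (10000, 1000)))          # (20000, 1000)
print(appended_shape((20000, 1000), (20000, 1000), axis=1))  # (20000, 2000)
```

The second call mirrors z.append(np.vstack([a, a]), axis=1) in the example below: the stacked data have 20000 rows, matching the array, so only the column count grows.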

Examples

>>> import numpy as np
>>> import zarr
>>> a = np.arange(10000000, dtype='i4').reshape(10000, 1000)
>>> z = zarr.array(a, chunks=(1000, 100))
>>> z
Array((10000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 38.1M; nbytes_stored: 1.9M; ratio: 20.3; initialized: 100/100
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(a)
(20000, 1000)
>>> z
Array((20000, 1000), int32, chunks=(1000, 100), order=C)
  nbytes: 76.3M; nbytes_stored: 3.8M; ratio: 20.3; initialized: 200/200
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict
>>> z.append(np.vstack([a, a]), axis=1)
(20000, 2000)
>>> z
Array((20000, 2000), int32, chunks=(1000, 100), order=C)
  nbytes: 152.6M; nbytes_stored: 7.5M; ratio: 20.3; initialized: 400/400
  compressor: Blosc(cname='lz4', clevel=5, shuffle=1)
  store: dict

view(shape=None, chunks=None, dtype=None, fill_value=None, filters=None, read_only=None, synchronizer=None)

Return an array sharing the same data.

Parameters:

shape : int or tuple of ints, optional

Array shape.

chunks : int or tuple of ints, optional

Chunk shape.

dtype : string or dtype, optional

NumPy dtype.

fill_value : object, optional

Default value to use for uninitialized portions of the array.

filters : sequence, optional

Sequence of filters to use to encode chunk data prior to compression.

read_only : bool, optional

True if the array should be protected against modification.

synchronizer : object, optional

Array synchronizer.

Notes

WARNING: This is an experimental feature and should be used with care. There are plenty of ways to generate errors and/or cause data corruption.

Examples

Bypass filters:

>>> import zarr
>>> import numpy as np
>>> np.random.seed(42)
>>> labels = [b'female', b'male']
>>> data = np.random.choice(labels, size=10000)
>>> filters = [zarr.Categorize(labels=labels,
...                            dtype=data.dtype,
...                            astype='u1')]
>>> a = zarr.array(data, chunks=1000, filters=filters)
>>> a[:]
array([b'female', b'male', b'female', ..., b'male', b'male', b'female'],
      dtype='|S6')
>>> v = a.view(dtype='u1', filters=[])
>>> v.is_view
True
>>> v[:]
array([1, 2, 1, ..., 2, 2, 1], dtype=uint8)

Views can be used to modify data:

>>> x = v[:]
>>> x.sort()
>>> v[:] = x
>>> v[:]
array([1, 1, 1, ..., 2, 2, 2], dtype=uint8)
>>> a[:]
array([b'female', b'female', b'female', ..., b'male', b'male', b'male'],
      dtype='|S6')
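
The view with filters=[] shows 1s and 2s because the stored chunks hold integer codes rather than strings. A NumPy sketch of such a categorical encoding, assuming (consistently with the codes shown above) that labels map to 1, 2, ... in order; reserving 0 for values not in labels is an assumption of this sketch, and categorize_encode and categorize_decode are hypothetical stand-ins, not the Categorize codec itself.

```python
import numpy as np


def categorize_encode(values, labels, astype='u1'):
    """Sketch of categorical encoding: labels become the codes 1, 2, ..."""
    values = np.asarray(values)
    codes = np.zeros(values.shape, dtype=astype)  # 0: not one of labels (assumed)
    for i, label in enumerate(labels):
        codes[values == label] = i + 1
    return codes


def categorize_decode(codes, labels, dtype):
    """Inverse of categorize_encode for codes 1..len(labels)."""
    out = np.zeros(codes.shape, dtype=dtype)
    for i, label in enumerate(labels):
        out[codes == i + 1] = label
    return out


labels = [b'female', b'male']
data = np.array([b'male', b'female', b'male'])
codes = categorize_encode(data, labels)
print(codes)  # [2 1 2]
# Sorting the codes reorders the decoded strings, as with v and a above.
print(categorize_decode(np.sort(codes), labels, data.dtype))
```

Writing sorted codes through the view and decoding them through the original array is what makes a[:] come back sorted in the example above.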

View as a different dtype with the same itemsize:

>>> data = np.random.randint(0, 2, size=10000, dtype='u1')
>>> a = zarr.array(data, chunks=1000)
>>> a[:]
array([0, 0, 1, ..., 1, 0, 0], dtype=uint8)
>>> v = a.view(dtype=bool)
>>> v[:]
array([False, False,  True, ...,  True, False, False], dtype=bool)
>>> np.all(a[:].view(dtype=bool) == v[:])
True

An array can be viewed with a dtype of a different itemsize; however, some care is needed to adjust the shape and chunk shape so that chunk data are interpreted correctly:

>>> data = np.arange(10000, dtype='u2')
>>> a = zarr.array(data, chunks=1000)
>>> a[:10]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16)
>>> v = a.view(dtype='u1', shape=20000, chunks=2000)
>>> v[:10]
array([0, 0, 1, 0, 2, 0, 3, 0, 4, 0], dtype=uint8)
>>> np.all(a[:].view('u1') == v[:])
True
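
The adjustment above is a byte-count calculation: each chunk keeps the same bytes, so the number of elements per chunk (and in the array) scales by the ratio of the old itemsize to the new one. A NumPy sketch for the 1-D case; rescale_for_itemsize is a hypothetical helper, and the [0, 0, 1, 0, ...] pattern assumes little-endian uint16 data, as in the output above.

```python
import numpy as np


def rescale_for_itemsize(shape, chunks, old_dtype, new_dtype):
    """Hypothetical helper (1-D): shape and chunk length that reinterpret the
    same bytes with a dtype of a different itemsize."""
    old_size = np.dtype(old_dtype).itemsize
    new_size = np.dtype(new_dtype).itemsize
    shape_bytes, chunk_bytes = shape * old_size, chunks * old_size
    if shape_bytes % new_size or chunk_bytes % new_size:
        raise ValueError('byte counts must be multiples of the new itemsize')
    # Each chunk keeps its bytes, so element counts scale by old/new itemsize.
    return shape_bytes // new_size, chunk_bytes // new_size


print(rescale_for_itemsize(10000, 1000, 'u2', 'u1'))  # (20000, 2000)

# One little-endian uint16 chunk of 1000 items is 2000 uint8 values.
chunk = np.arange(1000, dtype='<u2')
as_u1 = np.frombuffer(chunk.tobytes(), dtype='u1')
print(as_u1.size, as_u1[:4])  # 2000 [0 0 1 0]
```

If the chunk length were left at 1000 in the view, each stored chunk would be decoded as the wrong number of elements, which is the kind of error the warning in the notes refers to.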

Change fill value for uninitialized chunks:

>>> a = zarr.full(10000, chunks=1000, fill_value=-1, dtype='i1')
>>> a[:]
array([-1, -1, -1, ..., -1, -1, -1], dtype=int8)
>>> v = a.view(fill_value=42)
>>> v[:]
array([42, 42, 42, ..., 42, 42, 42], dtype=int8)
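
Uninitialized chunks are not stored at all, so their contents are generated from the fill value each time they are read. A view with a different fill_value therefore reads different values in those regions while still sharing the data in initialized chunks. A pure-Python sketch, assuming a dict that maps chunk index to a list of values; read_1d is a hypothetical helper, not a zarr function.

```python
def read_1d(store, length, chunk_len, fill_value):
    """Hypothetical helper: materialize a 1-D chunked array from a dict store,
    substituting fill_value for every chunk that was never written."""
    out = []
    for ci in range((length + chunk_len - 1) // chunk_len):
        n = min(chunk_len, length - ci * chunk_len)  # last chunk may be partial
        out.extend(store.get(ci, [fill_value] * n))
    return out


store = {1: [5, 5, 5]}            # only chunk 1 has been written
print(read_1d(store, 9, 3, -1))   # [-1, -1, -1, 5, 5, 5, -1, -1, -1]
print(read_1d(store, 9, 3, 42))   # [42, 42, 42, 5, 5, 5, 42, 42, 42]
```

Both reads share the same store; only the substituted values for missing chunks differ.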

Note that resizing or appending to views is not permitted:

>>> a = zarr.empty(10000)
>>> v = a.view()
>>> try:
...     v.resize(20000)
... except PermissionError as e:
...     print(e)
not permitted for views