Compressors and filters (zarr.codecs)

This module contains compressor and filter classes for use with Zarr.

Other codecs can be registered dynamically with Zarr. To do so, implement a class that provides the same interface as the classes listed below, then add the class to the codec_registry. See the source code of this module for details.

class zarr.codecs.Codec

Codec abstract base class.

encode(buf)

Encode data in buf.

Parameters:

buf : buffer-like

Data to be encoded. May be any object supporting the new-style buffer protocol or array.array.

Returns:

enc : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol or array.array.

decode(buf, out=None)

Decode data in buf.

Parameters:

buf : buffer-like

Encoded data. May be any object supporting the new-style buffer protocol or array.array.

out : buffer-like, optional

Buffer to store decoded data.

Returns:

out : buffer-like

Decoded data. May be any object supporting the new-style buffer protocol or array.array.
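
To illustrate the buffer-like arguments and the optional out parameter, the following minimal sketch uses the standard-library zlib module rather than a Zarr codec. The function name decode_into, and the choice to copy the decoded bytes into out, are assumptions made for this example and may differ from what the Zarr classes do.

```python
import array
import zlib


def decode_into(buf, out=None):
    """Decompress buf; copy the result into out when a buffer is supplied."""
    dec = zlib.decompress(buf)
    if out is None:
        return dec
    # Flat byte view of the caller's buffer (works for bytearray, array.array, ...).
    view = memoryview(out).cast('B')
    view[:len(dec)] = dec
    return out


values = array.array('i', range(8))
enc = zlib.compress(values)          # array.array supports the buffer protocol
out = array.array('i', [0] * 8)      # preallocated destination of the same size
decode_into(enc, out=out)            # out now holds the decoded integers
```

Passing out avoids allocating a new object for every decoded chunk, at the cost of requiring the caller to size the buffer correctly.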

get_config()

Return a dictionary holding configuration parameters for this codec. All values must be compatible with JSON encoding.

classmethod from_config(config)

Instantiate from a configuration object.
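
The interface above, together with the registration step mentioned in the introduction, can be sketched as follows. This is a self-contained illustration, not Zarr's code: the codec_registry here is a local stand-in dictionary, and the class ZlibSketch, its codec_id attribute, the 'id' configuration key and the helper get_codec are all hypothetical names chosen for the example.

```python
import json
import zlib

# Local stand-in for the registry (assumption: a mapping from codec id to class).
codec_registry = {}


class ZlibSketch:
    """Hypothetical codec providing encode/decode/get_config/from_config."""

    codec_id = 'zlib-sketch'  # hypothetical identifier

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        return zlib.compress(buf, self.level)

    def decode(self, buf, out=None):
        dec = zlib.decompress(buf)
        if out is not None:
            # Copy into the caller's buffer when one is provided.
            memoryview(out).cast('B')[:len(dec)] = dec
            return out
        return dec

    def get_config(self):
        # Only JSON-compatible values.
        return {'id': self.codec_id, 'level': self.level}

    @classmethod
    def from_config(cls, config):
        return cls(level=config['level'])


# Register the class so it can be looked up from a stored configuration.
codec_registry[ZlibSketch.codec_id] = ZlibSketch


def get_codec(config):
    """Hypothetical helper: rebuild a codec from its configuration."""
    return codec_registry[config['id']].from_config(config)


codec = ZlibSketch(level=9)
stored = json.dumps(codec.get_config())   # configuration survives JSON encoding
restored = get_codec(json.loads(stored))
data = b'abc' * 100
roundtrip = restored.decode(codec.encode(data))
```

Keeping the configuration JSON-compatible is what allows a codec to be recorded alongside the data it encoded and reconstructed later.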

class zarr.codecs.Blosc(cname='lz4', clevel=5, shuffle=1)

Provides compression using the blosc meta-compressor.

Parameters:

cname : string, optional

A string naming one of the compression algorithms available within blosc, e.g., ‘blosclz’, ‘lz4’, ‘zlib’ or ‘snappy’.

clevel : integer, optional

An integer between 0 and 9 specifying the compression level.

shuffle : integer, optional

Either 0 (no shuffle), 1 (byte shuffle) or 2 (bit shuffle).
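
A byte shuffle groups the bytes that occupy the same position within each element, which often places similar bytes next to each other before compression. The sketch below illustrates the idea in pure Python with zlib standing in for the compressor. It does not call Blosc, and byte_shuffle and byte_unshuffle are hypothetical helpers written for this example.

```python
import struct
import zlib


def byte_shuffle(data, itemsize):
    """Emit byte 0 of every element, then byte 1 of every element, and so on."""
    return b''.join(data[i::itemsize] for i in range(itemsize))


def byte_unshuffle(data, itemsize):
    """Invert byte_shuffle."""
    n = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * n:(i + 1) * n]
    return bytes(out)


# 1000 little-endian unsigned 32-bit integers.
raw = struct.pack('<1000I', *range(1000))
shuffled = byte_shuffle(raw, 4)
restored = byte_unshuffle(shuffled, 4)

# Compare compressed sizes with and without the shuffle.
print(len(zlib.compress(raw)), len(zlib.compress(shuffled)))
```

Whether shuffling helps depends on the data, so it is worth comparing compressed sizes on a representative chunk.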

class zarr.codecs.Zlib(level=1)

Provides compression using zlib via the Python standard library.

Parameters:

level : int, optional

Compression level, an integer from 0 (no compression) to 9 (maximum compression).

class zarr.codecs.BZ2(level=1)

Provides compression using bzip2 via the Python standard library.

Parameters:

level : int, optional

Compression level, an integer from 1 to 9.
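
Since the Zlib and BZ2 classes are described as wrapping the Python standard library, the effect of the level parameter can be explored with the zlib and bz2 modules directly. The snippet below is a sketch using only those standard-library functions; the sample data is made up for illustration.

```python
import bz2
import zlib

data = b'zarr chunk data ' * 1000

sizes = {}
for level in (1, 9):
    z = zlib.compress(data, level)   # zlib accepts levels 0 to 9
    b = bz2.compress(data, level)    # bz2 accepts levels 1 to 9
    sizes[level] = (len(z), len(b))

# Both modules restore the original bytes exactly.
z_roundtrip = zlib.decompress(zlib.compress(data, 1))
b_roundtrip = bz2.decompress(bz2.compress(data, 1))
print(sizes)
```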

class zarr.codecs.LZMA(format=1, check=-1, preset=None, filters=None)

Provides compression using lzma via the Python standard library (only available under Python 3).

Parameters:

format : integer, optional

One of the lzma format codes, e.g., lzma.FORMAT_XZ.

check : integer, optional

One of the lzma check codes, e.g., lzma.CHECK_NONE.

preset : integer, optional

An integer between 0 and 9 inclusive, specifying the compression level.

filters : list, optional

A list of dictionaries specifying compression filters. If filters are provided, ‘preset’ must be None.
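
The parameters mirror those of the standard-library lzma module, so the difference between using a preset and supplying a filter chain can be shown with lzma.compress directly. This is a sketch with made-up sample data; the particular filter chain is only an example.

```python
import lzma

data = bytes(range(256)) * 64

# Option 1: choose a compression level through preset.
with_preset = lzma.compress(
    data, format=lzma.FORMAT_XZ, check=lzma.CHECK_NONE, preset=6)

# Option 2: describe the filter chain explicitly and leave preset unset.
filters = [
    {'id': lzma.FILTER_DELTA, 'dist': 1},
    {'id': lzma.FILTER_LZMA2, 'preset': 6},
]
with_filters = lzma.compress(
    data, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC32, filters=filters)

# The XZ container records the filter chain, so decoding needs no extra arguments.
roundtrip_preset = lzma.decompress(with_preset)
roundtrip_filters = lzma.decompress(with_filters)
```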

class zarr.codecs.Delta(dtype, astype=None)

Filter to encode data as the difference between adjacent values.

Parameters:

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Notes

If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small. Note also that the first element of each chunk is stored as its actual value rather than as a difference, so the encoded data type generally needs to be large enough to hold values from the array itself, not just the differences between them.
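
The overflow risk described in this note can be seen in a small pure-Python sketch. It is not the library's implementation: delta_encode, delta_decode and wrap_int8 are hypothetical helpers, and wrap_int8 assumes that an out-of-range value wraps around as it would in a fixed-width signed 8-bit integer.

```python
def delta_encode(values):
    """Keep the first value, then store differences between adjacent values."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the values with a running sum."""
    out, total = [], 0
    for d in encoded:
        total += d
        out.append(total)
    return out


def wrap_int8(v):
    """Emulate storing v in a signed 8-bit integer (assumed wraparound)."""
    return (v + 128) % 256 - 128


# Everything fits in 8 bits: the round trip is exact.
ok = [wrap_int8(v) for v in delta_encode([100, 102, 104, 106])]
ok_decoded = delta_decode(ok)

# The differences are small, but the first value (1000) does not fit.
bad = [wrap_int8(v) for v in delta_encode([1000, 1002, 1004])]
bad_decoded = delta_decode(bad)   # no longer equal to the input
```

In the second case every decoded value is wrong, because the error in the first element propagates through the running sum.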

Examples

>>> import zarr
>>> import numpy as np
>>> x = np.arange(100, 120, 2, dtype='i8')
>>> f = zarr.Delta(dtype='i8', astype='i1')
>>> y = f.encode(x)
>>> y
array([100,   2,   2,   2,   2,   2,   2,   2,   2,   2], dtype=int8)
>>> z = f.decode(y)
>>> z
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])

class zarr.codecs.FixedScaleOffset(offset, scale, dtype, astype=None)

Simplified version of the scale-offset filter available in HDF5. Applies the transformation (x - offset) * scale to all chunks. Results are rounded to the nearest integer but are not packed according to the minimum number of bits.

Parameters:

offset : float

Value to subtract from data.

scale : int

Value to multiply data by.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Notes

If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small.
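
The transformation (x - offset) * scale and the size check suggested by this note can be sketched in pure Python. The helper names fso_encode and fso_decode are hypothetical, and the decoding formula (divide by scale, then add offset) is assumed to be the inverse of the documented encoding step.

```python
def fso_encode(values, offset, scale):
    """Apply (x - offset) * scale and round to the nearest integer."""
    return [round((v - offset) * scale) for v in values]


def fso_decode(encoded, offset, scale):
    """Assumed inverse of fso_encode."""
    return [e / scale + offset for e in encoded]


# Ten evenly spaced values from 1000 to 1001.
x = [1000 + i / 9 for i in range(10)]

y1 = fso_encode(x, offset=1000, scale=10)
z1 = fso_decode(y1, offset=1000, scale=10)

# With scale=1000 the largest encoded value is 1000, which exceeds the
# maximum of an unsigned 8-bit integer (255), so a wider type is needed.
y3 = fso_encode(x, offset=1000, scale=10**3)
fits_u1 = max(y3) <= 255
```

A larger scale preserves more decimal places but widens the range of encoded values, so scale and astype need to be chosen together.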

Examples

>>> import zarr
>>> import numpy as np
>>> x = np.linspace(1000, 1001, 10, dtype='f8')
>>> x
array([ 1000.        ,  1000.11111111,  1000.22222222,  1000.33333333,
        1000.44444444,  1000.55555556,  1000.66666667,  1000.77777778,
        1000.88888889,  1001.        ])
>>> f1 = zarr.FixedScaleOffset(offset=1000, scale=10, dtype='f8', astype='u1')
>>> y1 = f1.encode(x)
>>> y1
array([ 0,  1,  2,  3,  4,  6,  7,  8,  9, 10], dtype=uint8)
>>> z1 = f1.decode(y1)
>>> z1
array([ 1000. ,  1000.1,  1000.2,  1000.3,  1000.4,  1000.6,  1000.7,
        1000.8,  1000.9,  1001. ])
>>> f2 = zarr.FixedScaleOffset(offset=1000, scale=10**2, dtype='f8', astype='u1')
>>> y2 = f2.encode(x)
>>> y2
array([  0,  11,  22,  33,  44,  56,  67,  78,  89, 100], dtype=uint8)
>>> z2 = f2.decode(y2)
>>> z2
array([ 1000.  ,  1000.11,  1000.22,  1000.33,  1000.44,  1000.56,
        1000.67,  1000.78,  1000.89,  1001.  ])
>>> f3 = zarr.FixedScaleOffset(offset=1000, scale=10**3, dtype='f8', astype='u2')
>>> y3 = f3.encode(x)
>>> y3
array([   0,  111,  222,  333,  444,  556,  667,  778,  889, 1000], dtype=uint16)
>>> z3 = f3.decode(y3)
>>> z3
array([ 1000.   ,  1000.111,  1000.222,  1000.333,  1000.444,  1000.556,
        1000.667,  1000.778,  1000.889,  1001.   ])

class zarr.codecs.Quantize(digits, dtype, astype=None)

Lossy filter to reduce the precision of floating point data.

Parameters:

digits : int

Desired precision (number of decimal digits).

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.
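
The outputs in the examples for this class are multiples of a power of two (for instance 0.125 and 0.3125 for digits=1). One scheme that reproduces those values is sketched below in pure Python. It is inferred from the example output rather than taken from the library source, and the helper name quantize is hypothetical.

```python
import math


def quantize(values, digits):
    """Round to the nearest multiple of 2**-bits, where 2**bits >= 10**digits."""
    bits = math.ceil(math.log2(10 ** digits))
    scale = 2 ** bits
    return [round(v * scale) / scale for v in values]


# Ten evenly spaced values from 0 to 1.
x = [i / 9 for i in range(10)]

y1 = quantize(x, digits=1)   # multiples of 1/16
y2 = quantize(x, digits=2)   # multiples of 1/128
```

Rounding to a power-of-two step zeroes the low-order bits of the mantissa, which tends to make the data more compressible by a subsequent compressor.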

Examples

>>> import zarr
>>> import numpy as np
>>> x = np.linspace(0, 1, 10, dtype='f8')
>>> x
array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
>>> f1 = zarr.Quantize(digits=1, dtype='f8')
>>> y1 = f1.encode(x)
>>> y1
array([ 0.    ,  0.125 ,  0.25  ,  0.3125,  0.4375,  0.5625,  0.6875,
        0.75  ,  0.875 ,  1.    ])
>>> f2 = zarr.Quantize(digits=2, dtype='f8')
>>> y2 = f2.encode(x)
>>> y2
array([ 0.       ,  0.109375 ,  0.21875  ,  0.3359375,  0.4453125,
        0.5546875,  0.6640625,  0.78125  ,  0.890625 ,  1.       ])
>>> f3 = zarr.Quantize(digits=3, dtype='f8')
>>> y3 = f3.encode(x)
>>> y3
array([ 0.        ,  0.11132812,  0.22265625,  0.33300781,  0.44433594,
        0.55566406,  0.66699219,  0.77734375,  0.88867188,  1.        ])

class zarr.codecs.PackBits

Filter to pack elements of a boolean array into bits in a uint8 array.

Notes

The first element of the encoded array stores the number of padding bits that were added to complete the final byte.
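
The layout described in this note can be sketched in pure Python. The helpers pack_bits and unpack_bits are hypothetical, and the most-significant-bit-first ordering is inferred from the example in this section, where four booleans encode to the byte 144 (binary 10010000).

```python
def pack_bits(flags):
    """Return [padding_count, byte, byte, ...] for a sequence of booleans."""
    pad = -len(flags) % 8                  # bits needed to fill the final byte
    bits = list(flags) + [False] * pad
    out = [pad]
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | int(bool(bit))   # most significant bit first
        out.append(byte)
    return out


def unpack_bits(encoded):
    """Invert pack_bits, dropping the padding bits recorded in element 0."""
    pad, data = encoded[0], encoded[1:]
    bits = [bool((byte >> shift) & 1)
            for byte in data
            for shift in range(7, -1, -1)]
    return bits[:len(bits) - pad]


x = [True, False, False, True]
y = pack_bits(x)       # [4, 144]
z = unpack_bits(y)     # [True, False, False, True]
```

Recording the padding count is what lets the decoder recover the original length when the number of elements is not a multiple of eight.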

Examples

>>> import zarr
>>> import numpy as np
>>> f = zarr.PackBits()
>>> x = np.array([True, False, False, True], dtype=bool)
>>> y = f.encode(x)
>>> y
array([  4, 144], dtype=uint8)
>>> z = f.decode(y)
>>> z
array([ True, False, False,  True], dtype=bool)
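
class zarr.codecs.Categorize(labels, dtype, astype='u1')

Filter encoding categorical string data as integers. As the example for this class shows, each label is encoded as its 1-based position in labels, and any value not found in labels is encoded as 0 and decodes to an empty string.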

Parameters:

labels : sequence of strings

Category labels.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.
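
The mapping used in the example below (labels numbered from 1, with 0 reserved for values that are not in labels) can be sketched in pure Python. The helper names categorize_encode and categorize_decode are hypothetical, and the scheme is inferred from that example rather than from the library source.

```python
def categorize_encode(values, labels):
    """Map each value to its 1-based label index, or 0 if it is not a label."""
    codes = {label: i + 1 for i, label in enumerate(labels)}
    return [codes.get(v, 0) for v in values]


def categorize_decode(encoded, labels):
    """Map codes back to labels; code 0 becomes an empty byte string."""
    return [labels[c - 1] if c > 0 else b'' for c in encoded]


labels = [b'female', b'male']
x = [b'male', b'female', b'female', b'male', b'unexpected']

y = categorize_encode(x, labels)   # [2, 1, 1, 2, 0]
z = categorize_decode(y, labels)   # the unexpected value is lost
```

The encoding is lossy for any value outside labels, so the label list should cover every category expected in the data.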

Examples

>>> import zarr
>>> import numpy as np
>>> x = np.array([b'male', b'female', b'female', b'male', b'unexpected'])
>>> x
array([b'male', b'female', b'female', b'male', b'unexpected'],
      dtype='|S10')
>>> f = zarr.Categorize(labels=[b'female', b'male'], dtype=x.dtype)
>>> y = f.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = f.decode(y)
>>> z
array([b'male', b'female', b'female', b'male', b''],
      dtype='|S10')