Compressors and filters (zarr.codecs)

This module contains compressor and filter classes for use with Zarr.

Other codecs can be registered dynamically with Zarr. All that is required is to implement a class that provides the same interface as the classes listed below, and then to add the class to the codec_registry. See the source code of this module for details.
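A minimal sketch of such a class follows. It assumes the interface consists of a `codec_id` attribute plus `encode`, `decode`, `get_config` and `from_config` methods. The `codec_registry` dict, `register_codec` and `get_codec` below are hypothetical stand-ins written for illustration, not this module's actual objects; consult the module source for the real registration mechanism.

```python
import zlib

# Hypothetical stand-in for the registry: maps a codec_id string to a codec class.
codec_registry = {}


def register_codec(cls):
    """Add a codec class to the (hypothetical) registry under its codec_id."""
    codec_registry[cls.codec_id] = cls
    return cls


@register_codec
class MyZlib:
    """Toy codec wrapping zlib, following the assumed interface."""

    codec_id = "my_zlib"

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        # Compress the raw bytes of the input buffer.
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf, out=None):
        # Decompress; `out` is accepted for interface compatibility but ignored here.
        return zlib.decompress(bytes(buf))

    def get_config(self):
        # Serializable description, including the id used for registry lookup.
        return {"id": self.codec_id, "level": self.level}

    @classmethod
    def from_config(cls, config):
        # Drop the id and pass the remaining keys to the constructor.
        config = dict(config)
        config.pop("id", None)
        return cls(**config)


def get_codec(config):
    """Re-create a codec instance from its configuration via the registry."""
    return codec_registry[config["id"]].from_config(config)


codec = get_codec({"id": "my_zlib", "level": 5})
data = b"zarr" * 100
assert codec.decode(codec.encode(data)) == data
```

Storing only the configuration (including the `id`) alongside the data is what lets a reader rebuild the right codec later by looking it up in the registry.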

class zarr.codecs.Blosc

Codec providing compression using the Blosc meta-compressor.

Parameters:

cname : string, optional

A string naming one of the compression algorithms available within blosc, e.g., ‘zstd’, ‘blosclz’, ‘lz4’, ‘lz4hc’, ‘zlib’ or ‘snappy’.

clevel : integer, optional

An integer between 0 and 9 specifying the compression level.

shuffle : integer, optional

Either NOSHUFFLE (0), SHUFFLE (1) or BITSHUFFLE (2).

blocksize : integer, optional

The requested size of the compressed blocks. If 0 (default), an automatic blocksize will be used.
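To illustrate conceptually what the byte-level SHUFFLE option does, the following pure-Python sketch regroups the bytes of fixed-size elements so that all first bytes come first, then all second bytes, and so on. It illustrates the idea only and is not Blosc's implementation; the helper names are hypothetical. When neighbouring values are similar, this grouping tends to produce long runs of identical bytes that the subsequent compressor can exploit.

```python
import struct
import zlib


def byte_shuffle(buf, itemsize):
    """Transpose bytes so byte k of every element is stored contiguously.

    Assumes len(buf) is a multiple of itemsize.
    """
    n = len(buf) // itemsize
    return bytes(buf[i * itemsize + k] for k in range(itemsize) for i in range(n))


def byte_unshuffle(buf, itemsize):
    """Invert byte_shuffle."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for k in range(itemsize):
        for i in range(n):
            out[i * itemsize + k] = buf[k * n + i]
    return bytes(out)


# 1000 little-endian int32 values that change slowly.
raw = struct.pack("<1000i", *range(1000))
shuffled = byte_shuffle(raw, 4)
assert byte_unshuffle(shuffled, 4) == raw

# Compare compressed sizes with and without shuffling.
print(len(zlib.compress(raw)), len(zlib.compress(shuffled)))
```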

See also

numcodecs.zstd.Zstd, numcodecs.lz4.LZ4

class zarr.codecs.Zlib(level=1)

Codec providing compression using zlib via the Python standard library.

Parameters:

level : int

Compression level, an integer between 0 and 9.

class zarr.codecs.BZ2(level=1)

Codec providing compression using bzip2 via the Python standard library.

Parameters:

level : int

Compression level, an integer between 1 and 9.
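Both Zlib and BZ2 wrap compressors from the Python standard library, so their behaviour at a given level can be previewed directly with the `zlib` and `bz2` modules. This sketch exercises only the standard-library calls and does not use the Zarr classes themselves.

```python
import bz2
import zlib

data = bytes(range(256)) * 64  # 16 KiB of repetitive sample data

for level in (1, 9):
    z = zlib.compress(data, level)  # zlib levels: 0-9
    b = bz2.compress(data, level)   # bz2 levels: 1-9
    # Both compressors are lossless: decompression restores the input exactly.
    assert zlib.decompress(z) == data
    assert bz2.decompress(b) == data
    print(f"level={level}: zlib {len(z)} bytes, bz2 {len(b)} bytes")
```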

class zarr.codecs.LZMA(format=1, check=-1, preset=None, filters=None)

Codec providing compression using lzma via the Python standard library (only available under Python 3).

Parameters:

format : integer, optional

One of the lzma format codes, e.g., lzma.FORMAT_XZ.

check : integer, optional

One of the lzma check codes, e.g., lzma.CHECK_NONE.

preset : integer, optional

An integer between 0 and 9 inclusive, specifying the compression level.

filters : list, optional

A list of dictionaries specifying compression filters. If filters are provided, ‘preset’ must be None.
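To show the shape of the `filters` argument, the sketch below uses the standard-library `lzma` module, which accepts a list of dictionaries, each with an `id` key naming the filter plus filter-specific options. It calls `lzma` directly rather than going through the Zarr codec, and the particular chain (a delta filter followed by LZMA2) is only an illustrative choice.

```python
import lzma

data = bytes(i % 251 for i in range(10_000))

# A custom filter chain: a delta filter (distance 1 byte) followed by LZMA2.
# With an explicit chain, preset must stay None; the compression preset moves
# into the LZMA2 filter's own options instead.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 1},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
compressed = lzma.compress(data, format=lzma.FORMAT_XZ, check=lzma.CHECK_NONE,
                           preset=None, filters=filters)

# The .xz container records the filter chain, so decompression needs no filters.
assert lzma.decompress(compressed) == data

# Supplying both a preset and a filter chain is rejected by the standard library.
try:
    lzma.compress(data, preset=6, filters=filters)
except ValueError as exc:
    print("rejected:", exc)
```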

class zarr.codecs.Delta(dtype, astype=None)

Codec to encode data as the difference between adjacent values.

Parameters:

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

Notes

If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small. Note also that the encoded data for each chunk includes the absolute value of the first element in the chunk, and so the encoded data type in general needs to be large enough to store absolute values from the array.
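The arithmetic behind this note can be sketched in plain Python. The encoded sequence is the first value followed by the differences between neighbours, so even when the differences are small, the first (absolute) value must also fit in `astype`. The `fits` helper is hypothetical and written for illustration; as stated above, the codec itself performs no such check.

```python
def delta_encode(values):
    """Store the first value as-is, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values by cumulative summation."""
    out = []
    total = 0
    for d in encoded:
        total += d
        out.append(total)
    return out


def fits(encoded, lo, hi):
    """Check every encoded value against the range of the target integer type."""
    return all(lo <= v <= hi for v in encoded)


INT8 = (-128, 127)

ok = list(range(100, 120, 2))
assert delta_decode(delta_encode(ok)) == ok
assert fits(delta_encode(ok), *INT8)  # 100 and differences of 2 fit in int8

too_big = list(range(200, 220, 2))
assert not fits(delta_encode(too_big), *INT8)  # first element 200 does not fit
```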

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.arange(100, 120, 2, dtype='i8')
>>> codec = numcodecs.Delta(dtype='i8', astype='i1')
>>> y = codec.encode(x)
>>> y
array([100,   2,   2,   2,   2,   2,   2,   2,   2,   2], dtype=int8)
>>> z = codec.decode(y)
>>> z
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])

class zarr.codecs.AsType(encode_dtype, decode_dtype)

Filter to convert data between different types.

Parameters:

encode_dtype : dtype

Data type to use for encoded data.

decode_dtype : dtype

Data type to use for decoded data.

Notes

If encode_dtype has lower precision than decode_dtype, be aware that data loss can occur when writing data to disk using this filter. No checks are made to ensure that the cast is safe in that direction, so data may be silently corrupted.
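To see why a narrowing cast is dangerous, the sketch below emulates the two's-complement wraparound that a C-style integer conversion to `int8` performs (as NumPy's `astype` does on integer arrays by default): out-of-range values silently wrap rather than raising an error. The helper `to_int8` is hypothetical and written only for illustration.

```python
def to_int8(value):
    """Emulate wrapping an arbitrary Python int into the signed 8-bit range."""
    return (value + 128) % 256 - 128


decoded = [100, 127, 128, 200, -129]
encoded = [to_int8(v) for v in decoded]

# In-range values survive; out-of-range values come back changed, with no error.
assert encoded == [100, 127, -128, -56, 127]
lost = [d for d, e in zip(decoded, encoded) if d != e]
assert lost == [128, 200, -129]
```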

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.arange(100, 120, 2, dtype=np.int8)
>>> x
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118], dtype=int8)
>>> f = numcodecs.AsType(encode_dtype=x.dtype, decode_dtype=np.int64)
>>> y = f.decode(x)
>>> y
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])
>>> z = f.encode(y)
>>> z
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118], dtype=int8)

class zarr.codecs.FixedScaleOffset(offset, scale, dtype, astype=None)

Simplified version of the scale-offset filter available in HDF5. Applies the transformation (x - offset) * scale to all chunks. Results are rounded to the nearest integer but are not packed according to the minimum number of bits.

Parameters:

offset : float

Value to subtract from data.

scale : int

Value by which to multiply the data.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

See also

numcodecs.quantize.Quantize

Notes

If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small.
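The transformation and its inverse can be sketched in plain Python: encoding computes round((x - offset) * scale), and decoding computes e / scale + offset, so the round-trip error is at most half a quantization step, 0.5 / scale. Python's `round` rounds halves to even, as `numpy.around` does. The function names and the explicit range check are hypothetical additions for illustration; the codec itself does not check the range.

```python
def fso_encode(values, offset, scale, lo, hi):
    """Apply round((x - offset) * scale) and range-check against [lo, hi]."""
    out = [round((x - offset) * scale) for x in values]
    # Added for illustration only: the codec performs no such check.
    if not all(lo <= v <= hi for v in out):
        raise OverflowError("encoded values do not fit the target type")
    return out


def fso_decode(encoded, offset, scale):
    """Invert the transformation: x = e / scale + offset."""
    return [e / scale + offset for e in encoded]


x = [1000 + i / 9 for i in range(10)]  # approximately np.linspace(1000, 1001, 10)
enc = fso_encode(x, offset=1000, scale=10, lo=0, hi=255)  # uint8 range
dec = fso_decode(enc, offset=1000, scale=10)
# Rounding bounds the error by half a quantization step, i.e. 0.5 / scale.
assert all(abs(a - b) <= 0.5 / 10 + 1e-9 for a, b in zip(x, dec))

# With scale=10**3 the largest encoded value is 1000, which needs e.g. uint16.
try:
    fso_encode(x, offset=1000, scale=10**3, lo=0, hi=255)
except OverflowError:
    print("scale=1000 needs a wider type than uint8")
```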

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.linspace(1000, 1001, 10, dtype='f8')
>>> x
array([ 1000.        ,  1000.11111111,  1000.22222222,  1000.33333333,
        1000.44444444,  1000.55555556,  1000.66666667,  1000.77777778,
        1000.88888889,  1001.        ])
>>> codec = numcodecs.FixedScaleOffset(offset=1000, scale=10, dtype='f8', astype='u1')
>>> y1 = codec.encode(x)
>>> y1
array([ 0,  1,  2,  3,  4,  6,  7,  8,  9, 10], dtype=uint8)
>>> z1 = codec.decode(y1)
>>> z1
array([ 1000. ,  1000.1,  1000.2,  1000.3,  1000.4,  1000.6,  1000.7,
        1000.8,  1000.9,  1001. ])
>>> codec = numcodecs.FixedScaleOffset(offset=1000, scale=10**2, dtype='f8', astype='u1')
>>> y2 = codec.encode(x)
>>> y2
array([  0,  11,  22,  33,  44,  56,  67,  78,  89, 100], dtype=uint8)
>>> z2 = codec.decode(y2)
>>> z2
array([ 1000.  ,  1000.11,  1000.22,  1000.33,  1000.44,  1000.56,
        1000.67,  1000.78,  1000.89,  1001.  ])
>>> codec = numcodecs.FixedScaleOffset(offset=1000, scale=10**3, dtype='f8', astype='u2')
>>> y3 = codec.encode(x)
>>> y3
array([   0,  111,  222,  333,  444,  556,  667,  778,  889, 1000], dtype=uint16)
>>> z3 = codec.decode(y3)
>>> z3
array([ 1000.   ,  1000.111,  1000.222,  1000.333,  1000.444,  1000.556,
        1000.667,  1000.778,  1000.889,  1001.   ])

class zarr.codecs.Quantize(digits, dtype, astype=None)

Lossy filter to reduce the precision of floating point data.

Parameters:

digits : int

Desired precision (number of decimal digits).

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.

See also

numcodecs.fixedscaleoffset.FixedScaleOffset
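The encoded values in the examples below are multiples of a power of two: 1/16 for digits=1, 1/128 for digits=2 and 1/1024 for digits=3. The pure-Python sketch below reproduces them under the assumption that the step is the largest power of two no greater than 10**-digits, i.e. 2**-ceil(log2(10**digits)). It is a reconstruction consistent with those outputs, not necessarily the library's exact code. Rounding to a binary fraction leaves the trailing mantissa bits zero, which makes the quantized data more compressible.

```python
import math


def quantize(values, digits):
    """Round each value to a multiple of 2**-bits, where 2**-bits <= 10**-digits."""
    bits = math.ceil(math.log2(10 ** digits))
    scale = 2 ** bits
    return [round(v * scale) / scale for v in values]


x = [i / 9 for i in range(10)]
print(quantize(x, 1))
```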

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.linspace(0, 1, 10, dtype='f8')
>>> x
array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])
>>> codec = numcodecs.Quantize(digits=1, dtype='f8')
>>> codec.encode(x)
array([ 0.    ,  0.125 ,  0.25  ,  0.3125,  0.4375,  0.5625,  0.6875,
        0.75  ,  0.875 ,  1.    ])
>>> codec = numcodecs.Quantize(digits=2, dtype='f8')
>>> codec.encode(x)
array([ 0.       ,  0.109375 ,  0.21875  ,  0.3359375,  0.4453125,
        0.5546875,  0.6640625,  0.78125  ,  0.890625 ,  1.       ])
>>> codec = numcodecs.Quantize(digits=3, dtype='f8')
>>> codec.encode(x)
array([ 0.        ,  0.11132812,  0.22265625,  0.33300781,  0.44433594,
        0.55566406,  0.66699219,  0.77734375,  0.88867188,  1.        ])

class zarr.codecs.PackBits

Codec to pack elements of a boolean array into bits in a uint8 array.

Notes

The first element of the encoded array stores the number of bits that were padded to complete the final byte.
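The layout described in this note can be reproduced in a short pure-Python sketch: a header byte holding the padding count, followed by the bits packed most-significant-bit first. This matches the [4, 144] output in the example below. The function names are hypothetical, and the sketch illustrates the layout rather than the codec's implementation.

```python
def pack_bits(bits):
    """Pack booleans MSB-first; the first output byte is the number of pad bits."""
    pad = (8 - len(bits) % 8) % 8
    padded = list(bits) + [False] * pad
    out = bytearray([pad])
    for i in range(0, len(padded), 8):
        byte = 0
        for b in padded[i:i + 8]:
            byte = (byte << 1) | int(b)
        out.append(byte)
    return bytes(out)


def unpack_bits(data):
    """Unpack bits MSB-first and drop the padding recorded in the header byte."""
    pad = data[0]
    bits = [bool((byte >> shift) & 1) for byte in data[1:] for shift in range(7, -1, -1)]
    return bits[:len(bits) - pad]


packed = pack_bits([True, False, False, True])
assert unpack_bits(packed) == [True, False, False, True]
```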

Examples

>>> import numcodecs
>>> import numpy as np
>>> codec = numcodecs.PackBits()
>>> x = np.array([True, False, False, True], dtype=bool)
>>> y = codec.encode(x)
>>> y
array([  4, 144], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array([ True, False, False,  True], dtype=bool)

class zarr.codecs.Categorize(labels, dtype, astype='u1')

Filter encoding categorical string data as integers.

Parameters:

labels : sequence of strings

Category labels.

dtype : dtype

Data type to use for decoded data.

astype : dtype, optional

Data type to use for encoded data.
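The mapping used in the example below can be sketched without NumPy. Labels are numbered from 1 in the order given, and 0 is reserved for values not in `labels`, which decode to an empty value. The function names are hypothetical; the sketch mirrors the behaviour shown in the example, not necessarily the codec's internals.

```python
def categorize_encode(values, labels):
    """Map each label to 1..len(labels); anything unrecognised becomes 0."""
    index = {label: i + 1 for i, label in enumerate(labels)}
    return [index.get(v, 0) for v in values]


def categorize_decode(codes, labels, fill=b""):
    """Map codes back to labels; code 0 decodes to the fill value."""
    lookup = [fill] + list(labels)
    return [lookup[c] for c in codes]


labels = [b"female", b"male"]
x = [b"male", b"female", b"female", b"male", b"unexpected"]
y = categorize_encode(x, labels)
z = categorize_decode(y, labels)
```

Because 0 is reserved, the default astype='u1' leaves room for at most 255 labels.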

Examples

>>> import numcodecs
>>> import numpy as np
>>> x = np.array([b'male', b'female', b'female', b'male', b'unexpected'])
>>> x
array([b'male', b'female', b'female', b'male', b'unexpected'],
      dtype='|S10')
>>> codec = numcodecs.Categorize(labels=[b'female', b'male'], dtype=x.dtype)
>>> y = codec.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = codec.decode(y)
>>> z
array([b'male', b'female', b'female', b'male', b''],
      dtype='|S10')