Compressors and filters (zarr.codecs)
This module contains compressor and filter classes for use with Zarr.
Other codecs can be registered dynamically with Zarr. All that is required
is to implement a class that provides the same interface as the classes listed
below, and then to add the class to the codec_registry. See the source
code of this module for details.

class zarr.codecs.Codec
Codec abstract base class.

encode(buf)
Encode data in buf.
Parameters: buf : buffer-like
Data to be encoded. May be any object supporting the new-style buffer protocol or array.array.
Returns: enc : buffer-like
Encoded data. May be any object supporting the new-style buffer protocol or array.array.

decode(buf, out=None)
Decode data in buf.
Parameters: buf : buffer-like
Encoded data. May be any object supporting the new-style buffer protocol or array.array.
out : buffer-like, optional
Buffer to store decoded data.
Returns: out : buffer-like
Decoded data. May be any object supporting the new-style buffer protocol or array.array.

get_config()
Return a dictionary holding configuration parameters for this codec. All values must be compatible with JSON encoding.

classmethod from_config(config)
Instantiate from a configuration object.
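
The interface above, together with the registration note at the top of this page, can be sketched with a toy codec built on the standard-library zlib module. This is a minimal illustration, not the library's own implementation: the class name `ToyZlib`, its `codec_id` attribute, the `'id'` key in the config, and the local `registry` dict (a stand-in for `codec_registry`, whose real layout is not shown here) are all assumptions made for the example.

```python
import json
import zlib


class ToyZlib:
    """Hypothetical codec exposing encode/decode/get_config/from_config."""

    codec_id = "toy_zlib"  # assumed identifier, not a real Zarr codec id

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):
        # zlib.compress accepts any object supporting the buffer protocol.
        return zlib.compress(buf, self.level)

    def decode(self, buf, out=None):
        dec = zlib.decompress(buf)
        if out is None:
            return dec
        # Copy the decoded bytes into the caller-supplied buffer.
        view = memoryview(out).cast("B")
        view[: len(dec)] = dec
        return out

    def get_config(self):
        # Only JSON-compatible values: a string and an int.
        return {"id": self.codec_id, "level": self.level}

    @classmethod
    def from_config(cls, config):
        config = dict(config)
        config.pop("id", None)  # the id selects the class, not a parameter
        return cls(**config)


# Local stand-in for the module's codec_registry (an assumption).
registry = {ToyZlib.codec_id: ToyZlib}

codec = ToyZlib(level=3)
config = json.loads(json.dumps(codec.get_config()))  # JSON round trip
restored = registry[config["id"]].from_config(config)

data = b"zarr" * 1000
roundtrip = restored.decode(codec.encode(data))
```

Storing the configuration as plain JSON-compatible values is what lets a codec be rebuilt later from metadata alone, which is why `get_config` and `from_config` are part of the interface.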

class zarr.codecs.Blosc(cname='lz4', clevel=5, shuffle=1)
Provides compression using the blosc meta-compressor.
Parameters: cname : string, optional
A string naming one of the compression algorithms available within blosc, e.g., ‘blosclz’, ‘lz4’, ‘zlib’ or ‘snappy’.
clevel : integer, optional
An integer between 0 and 9 specifying the compression level.
shuffle : integer, optional
Either 0 (no shuffle), 1 (byte shuffle) or 2 (bit shuffle).
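
To illustrate what the byte shuffle option refers to, here is a pure-Python sketch of the general idea: bytes at the same position within each fixed-size item are grouped together before compression. The helper names `byte_shuffle` and `byte_unshuffle` are hypothetical, and this is not how blosc implements the operation internally.

```python
import struct


def byte_shuffle(data, itemsize):
    """Group the i-th byte of every item together (items are itemsize bytes)."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    return b"".join(data[i::itemsize] for i in range(itemsize))


def byte_unshuffle(data, itemsize):
    """Invert byte_shuffle."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    n_items = len(data) // itemsize
    out = bytearray(len(data))
    for i in range(itemsize):
        out[i::itemsize] = data[i * n_items:(i + 1) * n_items]
    return bytes(out)


raw = struct.pack("<4H", 1, 2, 3, 4)  # four little-endian 16-bit integers
shuffled = byte_shuffle(raw, itemsize=2)
restored = byte_unshuffle(shuffled, itemsize=2)
```

Here `raw` interleaves low and high bytes, while `shuffled` holds the four low bytes followed by the four high bytes. For slowly varying numeric data this tends to produce longer runs of similar bytes for the compressor to work on.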

class zarr.codecs.Zlib(level=1)
Provides compression using zlib via the Python standard library.
Parameters: level : int
Compression level.
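
Because this codec is described as wrapping zlib from the standard library, the meaning of `level` can be explored with the `zlib` module directly. The sketch below does not use the Zarr class, and the sample data is made up for illustration.

```python
import zlib

data = bytes(range(256)) * 64  # 16 KiB of repetitive sample data

fast = zlib.compress(data, 1)   # level 1: fastest, least compression
small = zlib.compress(data, 9)  # level 9: slowest, most compression

restored = zlib.decompress(fast)
```

Both outputs decompress to the same bytes; the level only trades compression time against output size.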

class zarr.codecs.BZ2(level=1)
Provides compression using bzip2 via the Python standard library.
Parameters: level : int
Compression level.

class zarr.codecs.LZMA(format=1, check=-1, preset=None, filters=None)
Provides compression using lzma via the Python standard library (only available under Python 3).
Parameters: format : integer, optional
One of the lzma format codes, e.g., lzma.FORMAT_XZ.
check : integer, optional
One of the lzma check codes, e.g., lzma.CHECK_NONE.
preset : integer, optional
An integer between 0 and 9 inclusive, specifying the compression level.
filters : list, optional
A list of dictionaries specifying compression filters. If filters are provided, ‘preset’ must be None.
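
The relationship between `preset` and `filters` can be seen with the standard-library `lzma` module, which this codec is described as wrapping. The sketch below calls `lzma` directly rather than the Zarr class; the sample data and the particular filter chain are arbitrary choices for illustration.

```python
import lzma

data = bytes(range(256)) * 64  # made-up sample data

# Option 1: choose a preset and let lzma pick the filter chain.
by_preset = lzma.compress(
    data, format=lzma.FORMAT_XZ, check=lzma.CHECK_NONE, preset=1
)

# Option 2: give an explicit filter chain and leave preset as None.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 1},
    {"id": lzma.FILTER_LZMA2, "preset": 1},
]
by_filters = lzma.compress(
    data, format=lzma.FORMAT_XZ, check=-1, preset=None, filters=filters
)

restored_preset = lzma.decompress(by_preset)
restored_filters = lzma.decompress(by_filters)
```

The default `format=1` in the signature above corresponds to `lzma.FORMAT_XZ`, and `check=-1` asks lzma to use its default integrity check for the chosen format.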

class zarr.codecs.Delta(dtype, astype=None)
Filter to encode data as the difference between adjacent values.
Parameters: dtype : dtype
Data type to use for decoded data.
astype : dtype, optional
Data type to use for encoded data.
Notes
If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small. Note also that the encoded data for each chunk includes the absolute value of the first element in the chunk, and so the encoded data type in general needs to be large enough to store absolute values from the array.
Examples
>>> import zarr
>>> import numpy as np
>>> x = np.arange(100, 120, 2, dtype='i8')
>>> f = zarr.Delta(dtype='i8', astype='i1')
>>> y = f.encode(x)
>>> y
array([100, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int8)
>>> z = f.decode(y)
>>> z
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118])
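
The note about the first element can be checked with a small pure-Python sketch of delta encoding. It does not use the Zarr class; `delta_encode`, `delta_decode` and `fits_int8` are hypothetical helpers written for this illustration.

```python
def delta_encode(values):
    """Keep the first value as-is, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values with a running sum."""
    out = []
    total = 0
    for diff in encoded:
        total += diff
        out.append(total)
    return out


def fits_int8(values):
    """True if every value is representable as a signed 8-bit integer."""
    return all(-128 <= v <= 127 for v in values)


x = list(range(1000, 1020, 2))
enc = delta_encode(x)

differences_fit = fits_int8(enc[1:])  # the steps of 2 are small
whole_chunk_fits = fits_int8(enc)     # the leading 1000 is not
```

Here the differences fit comfortably in an 8-bit integer, but the leading absolute value does not, so an `astype` of `'i1'` would be too small for this data even though it works for the example above, which starts at 100.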

class zarr.codecs.FixedScaleOffset(offset, scale, dtype, astype=None)
Simplified version of the scale-offset filter available in HDF5. Applies the transformation (x - offset) * scale to all chunks. Results are rounded to the nearest integer but are not packed according to the minimum number of bits.
Parameters: offset : float
Value to subtract from data.
scale : int
Value to multiply data by.
dtype : dtype
Data type to use for decoded data.
astype : dtype, optional
Data type to use for encoded data.
Notes
If astype is an integer data type, please ensure that it is sufficiently large to store encoded values. No checks are made and data may become corrupted due to integer overflow if astype is too small.
Examples
>>> import zarr
>>> import numpy as np
>>> x = np.linspace(1000, 1001, 10, dtype='f8')
>>> x
array([ 1000. , 1000.11111111, 1000.22222222, 1000.33333333,
        1000.44444444, 1000.55555556, 1000.66666667, 1000.77777778,
        1000.88888889, 1001. ])
>>> f1 = zarr.FixedScaleOffset(offset=1000, scale=10, dtype='f8', astype='u1')
>>> y1 = f1.encode(x)
>>> y1
array([ 0, 1, 2, 3, 4, 6, 7, 8, 9, 10], dtype=uint8)
>>> z1 = f1.decode(y1)
>>> z1
array([ 1000. , 1000.1, 1000.2, 1000.3, 1000.4, 1000.6, 1000.7,
        1000.8, 1000.9, 1001. ])
>>> f2 = zarr.FixedScaleOffset(offset=1000, scale=10**2, dtype='f8', astype='u1')
>>> y2 = f2.encode(x)
>>> y2
array([ 0, 11, 22, 33, 44, 56, 67, 78, 89, 100], dtype=uint8)
>>> z2 = f2.decode(y2)
>>> z2
array([ 1000. , 1000.11, 1000.22, 1000.33, 1000.44, 1000.56,
        1000.67, 1000.78, 1000.89, 1001. ])
>>> f3 = zarr.FixedScaleOffset(offset=1000, scale=10**3, dtype='f8', astype='u2')
>>> y3 = f3.encode(x)
>>> y3
array([ 0, 111, 222, 333, 444, 556, 667, 778, 889, 1000], dtype=uint16)
>>> z3 = f3.decode(y3)
>>> z3
array([ 1000. , 1000.111, 1000.222, 1000.333, 1000.444, 1000.556,
        1000.667, 1000.778, 1000.889, 1001. ])
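
The transformation (x - offset) * scale and its inverse can be written out in a few lines of plain Python. This is a sketch of the arithmetic only, not the filter's implementation; `fso_encode` and `fso_decode` are hypothetical names, and the decode step is assumed to be the straightforward inverse of the stated formula.

```python
def fso_encode(values, offset, scale):
    """Apply (x - offset) * scale and round to the nearest integer."""
    return [round((v - offset) * scale) for v in values]


def fso_decode(encoded, offset, scale):
    """Invert the transformation: x = enc / scale + offset."""
    return [e / scale + offset for e in encoded]


# Ten evenly spaced values from 1000 to 1001, as in the example above.
x = [1000 + i / 9 for i in range(10)]

y1 = fso_encode(x, offset=1000, scale=10)
z1 = fso_decode(y1, offset=1000, scale=10)

y3 = fso_encode(x, offset=1000, scale=10**3)
largest_y3 = max(y3)
```

With `scale=10**3` the largest encoded value is 1000, which exceeds the maximum of an unsigned 8-bit integer (255). This is why the third filter in the example above uses `astype='u2'` rather than `'u1'`.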

class zarr.codecs.Quantize(digits, dtype, astype=None)
Lossy filter to reduce the precision of floating point data.
Parameters: digits : int
Desired precision (number of decimal digits).
dtype : dtype
Data type to use for decoded data.
astype : dtype, optional
Data type to use for encoded data.
Examples
>>> import zarr
>>> import numpy as np
>>> x = np.linspace(0, 1, 10, dtype='f8')
>>> x
array([ 0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
        0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ])
>>> f1 = zarr.Quantize(digits=1, dtype='f8')
>>> y1 = f1.encode(x)
>>> y1
array([ 0. , 0.125 , 0.25 , 0.3125, 0.4375, 0.5625, 0.6875,
        0.75 , 0.875 , 1. ])
>>> f2 = zarr.Quantize(digits=2, dtype='f8')
>>> y2 = f2.encode(x)
>>> y2
array([ 0. , 0.109375 , 0.21875 , 0.3359375, 0.4453125,
        0.5546875, 0.6640625, 0.78125 , 0.890625 , 1. ])
>>> f3 = zarr.Quantize(digits=3, dtype='f8')
>>> y3 = f3.encode(x)
>>> y3
array([ 0. , 0.11132812, 0.22265625, 0.33300781, 0.44433594,
        0.55566406, 0.66699219, 0.77734375, 0.88867188, 1. ])
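
The encoded values in the example above are all multiples of a power of two (1/16, 1/128 and 1/1024 respectively). The following sketch reproduces them under the assumption that the filter rounds each value to the nearest multiple of 2**-bits, where bits is the smallest integer with 2**bits >= 10**digits. That rule is inferred from the outputs shown here, not taken from the filter's source, and `quantize` is a hypothetical helper.

```python
import math


def quantize(values, digits):
    """Round each value to the nearest multiple of 2**-bits (assumed rule)."""
    bits = math.ceil(math.log2(10 ** digits))
    scale = 2 ** bits
    return [round(v * scale) / scale for v in values]


# Ten evenly spaced values from 0 to 1, as in the example above.
x = [i / 9 for i in range(10)]

y1 = quantize(x, digits=1)  # multiples of 1/16
y2 = quantize(x, digits=2)  # multiples of 1/128
y3 = quantize(x, digits=3)  # multiples of 1/1024
```

Rounding to a power-of-two grid leaves the trailing mantissa bits zero, which is what makes the quantized data more compressible by a subsequent compressor.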

class zarr.codecs.PackBits
Filter to pack elements of a boolean array into bits in a uint8 array.
Notes
The first element of the encoded array stores the number of bits that were padded to complete the final byte.
Examples
>>> import zarr
>>> import numpy as np
>>> f = zarr.PackBits()
>>> x = np.array([True, False, False, True], dtype=bool)
>>> y = f.encode(x)
>>> y
array([ 4, 144], dtype=uint8)
>>> z = f.decode(y)
>>> z
array([ True, False, False, True], dtype=bool)
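
The layout described in the note, a leading padding count followed by the packed bytes, can be sketched in plain Python. `pack_bits` and `unpack_bits` are hypothetical helpers, and the most-significant-bit-first ordering is an assumption chosen because it reproduces the output shown above.

```python
def pack_bits(flags):
    """Pack booleans into bytes, prefixed by the number of padding bits."""
    n_pad = -len(flags) % 8  # bits needed to complete the final byte
    bits = [1 if flag else 0 for flag in flags] + [0] * n_pad
    out = [n_pad]
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit  # most significant bit first
        out.append(byte)
    return bytes(out)


def unpack_bits(encoded):
    """Invert pack_bits, dropping the padding recorded in the first byte."""
    n_pad = encoded[0]
    bits = []
    for byte in encoded[1:]:
        bits.extend(bool((byte >> shift) & 1) for shift in range(7, -1, -1))
    return bits[:len(bits) - n_pad]


x = [True, False, False, True]
y = pack_bits(x)
z = unpack_bits(y)
```

Four booleans need four padding bits to fill one byte, so the first element is 4 and the packed byte is 0b10010000, which is 144.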

class zarr.codecs.Categorize(labels, dtype, astype='u1')
Filter encoding categorical string data as integers.
Parameters: labels : sequence of strings
Category labels.
dtype : dtype
Data type to use for decoded data.
astype : dtype, optional
Data type to use for encoded data.
Examples
>>> import zarr
>>> import numpy as np
>>> x = np.array([b'male', b'female', b'female', b'male', b'unexpected'])
>>> x
array([b'male', b'female', b'female', b'male', b'unexpected'],
      dtype='|S10')
>>> f = zarr.Categorize(labels=[b'female', b'male'], dtype=x.dtype)
>>> y = f.encode(x)
>>> y
array([2, 1, 1, 2, 0], dtype=uint8)
>>> z = f.decode(y)
>>> z
array([b'male', b'female', b'female', b'male', b''],
      dtype='|S10')
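
The mapping used in the example above can be sketched in plain Python: each label is assigned its 1-based position in `labels`, and any value not in `labels` becomes 0, which decodes to an empty string. `categorize_encode` and `categorize_decode` are hypothetical helpers that mirror the example output, not the filter's implementation.

```python
def categorize_encode(values, labels):
    """Map each value to its 1-based label index, or 0 if it is not a label."""
    codes = {label: index for index, label in enumerate(labels, start=1)}
    return [codes.get(value, 0) for value in values]


def categorize_decode(encoded, labels, missing=b""):
    """Map codes back to labels; code 0 (or out of range) becomes `missing`."""
    return [
        labels[code - 1] if 1 <= code <= len(labels) else missing
        for code in encoded
    ]


labels = [b"female", b"male"]
x = [b"male", b"female", b"female", b"male", b"unexpected"]

y = categorize_encode(x, labels)
z = categorize_decode(y, labels)
```

The round trip is lossy for values outside `labels`: `b'unexpected'` is encoded as 0 and comes back as an empty string.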