Release notes¶
2.2.0 (release candidate)¶
Version 2.2.0 is currently at the release candidate stage. To install the latest release candidate version:
$ pip install --pre zarr
Enhancements¶
- Advanced indexing. The Array class has several new methods and properties that enable a selection of items in an array to be retrieved or updated. See the Advanced indexing tutorial section for more information. There is also a notebook with extended examples and performance benchmarks. #78, #89, #112, #172.
- New package for compressor and filter codecs. The classes previously
defined in the zarr.codecs module have been factored out into a separate package called NumCodecs. The NumCodecs package also includes several new codec classes not previously available in Zarr, including compressor codecs for Zstd and LZ4. This change is backwards-compatible with existing code, as all codec classes defined by NumCodecs are imported into the zarr.codecs namespace. However, it is recommended to import codecs from the new package; see the tutorial sections on Compressors and Filters for examples. With contributions by John Kirkham; #74, #102, #120, #123, #139.
- New storage class for DBM-style databases. The
zarr.storage.DBMStore class enables any DBM-style database, such as gdbm, ndbm or Berkeley DB, to be used as the backing store for an array or group. See the tutorial section on Storage alternatives for some examples. #133, #186.
- New storage class for LMDB databases. The zarr.storage.LMDBStore class enables an LMDB “Lightning” database to be used as the backing store for an array or group. #192.
- New storage class using a nested directory structure for chunk files. The zarr.storage.NestedDirectoryStore class has been added, which is similar to the existing zarr.storage.DirectoryStore class but nests chunk files for multidimensional arrays into sub-directories. #155, #177.
- New tree() method for printing hierarchies. The
Group class has a new zarr.hierarchy.Group.tree() method which enables a tree representation of a group hierarchy to be printed. It also provides an interactive tree representation when used within a Jupyter notebook. See the Array and group diagnostics tutorial section for examples. By John Kirkham; #82, #140, #184.
- Visitor API. The Group class now implements the h5py visitor API; see the docs for the zarr.hierarchy.Group.visit(), zarr.hierarchy.Group.visititems() and zarr.hierarchy.Group.visitvalues() methods. By John Kirkham, #92, #122.
- Viewing an array as a different dtype. The Array class has a new zarr.core.Array.astype() method, which is a convenience that enables an array to be viewed as a different dtype. By John Kirkham, #94, #96.
- New open(), save(), load() convenience functions. The function
zarr.convenience.open() provides a convenient way to open a persistent array or group, using either a DirectoryStore or ZipStore as the backing store. The functions zarr.convenience.save() and zarr.convenience.load() are also available and provide a convenient way to save an entire NumPy array to disk and load it back into memory later. See the tutorial section Persistent arrays for examples. #104, #105, #141, #181.
- IPython completions. The Group class now implements __dir__() and _ipython_key_completions_(), which enable tab-completion for group members to be used in any IPython interactive environment. #170.
- New info property; changes to __repr__. The
Group and Array classes have a new info property which can be used to print diagnostic information, including compression ratio where available. See the tutorial section on Array and group diagnostics for examples. The string representation (__repr__) of these classes has been simplified to ensure it is cheap and quick to compute in all circumstances. #83, #115, #132, #148.
- Chunk options. When creating an array, chunks=False can be specified, which will result in an array with a single chunk only. Alternatively, chunks=True will trigger an automatic chunk shape guess. See Chunk optimizations for more on the chunks parameter. #106, #107, #183.
- Zero-dimensional arrays are now supported; by Prakhar Goel, #154, #161.
- Arrays with one or more zero-length dimensions are now fully supported; by Prakhar Goel, #150, #154, #160.
- The .zattrs key is now optional and will now only be created when the first custom attribute is set; #121, #200.
- New Group.move() method supports moving a sub-group or array to a different location within the same hierarchy. By John Kirkham, #191, #193, #196.
- ZipStore is now thread-safe; #194, #192.
- New Array.hexdigest() method computes an Array’s hash with hashlib. By John Kirkham, #98, #203.
- Improved support for object arrays. In previous versions of Zarr, creating an array with dtype=object was possible but could under certain circumstances lead to unexpected errors and/or segmentation faults. To make it easier to properly configure an object array, a new object_codec parameter has been added to array creation functions. See the tutorial section on Object arrays for more information and examples. Also, runtime checks have been added in both Zarr and NumCodecs so that segmentation faults are no longer possible, even with a badly configured array. This API change is backwards compatible: previous code that created an object array and provided an object codec via the filters parameter will continue to work, but a warning will be raised to encourage use of the object_codec parameter. #208, #212.
- Added support for datetime64 and timedelta64 data types; #85, #215.
- Array and group attributes are now cached by default to improve performance with slow stores, e.g., stores accessing data via the network; #220, #218, #204.
- New LRUStoreCache class. The class zarr.storage.LRUStoreCache has been added and provides a means to locally cache data in memory from a store that may be slow, e.g., a store that retrieves data from a remote server via the network; #223.
- New copy functions. The new functions zarr.convenience.copy() and zarr.convenience.copy_all() provide a way to copy groups and/or arrays between HDF5 and Zarr, or between two Zarr groups. The zarr.convenience.copy_store() function provides a more efficient way to copy data directly between two Zarr stores. #87, #113, #137, #217.
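The idea behind the LRUStoreCache entry above, keeping recently read values in memory in front of a slow store, can be sketched with the standard library alone. This is an illustration, not Zarr's implementation: the class name LRUMappingCache and its max_items parameter are hypothetical, and the cache is bounded by item count purely for simplicity.

```python
from collections import OrderedDict
from collections.abc import Mapping


class LRUMappingCache(Mapping):
    """Read-through cache in front of a slow mapping (hypothetical sketch)."""

    def __init__(self, store, max_items):
        self._store = store          # the slow backing mapping
        self._max_items = max_items  # maximum number of cached values
        self._cache = OrderedDict()  # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key):
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self._store[key]  # a KeyError from the store propagates
        self.misses += 1
        self._cache[key] = value
        if len(self._cache) > self._max_items:
            self._cache.popitem(last=False)  # evict the least recently used
        return value

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)


# A plain dict stands in for a slow (e.g. remote) store.
slow_store = {"0.0": b"chunk-a", "0.1": b"chunk-b", "1.0": b"chunk-c"}
cache = LRUMappingCache(slow_store, max_items=2)
for key in ["0.0", "0.1", "0.0", "1.0"]:
    cache[key]
```

After these four reads the second access to "0.0" is served from memory, and "0.1" is the entry evicted when "1.0" arrives.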
Bug fixes¶
- Fixed bug where read_only keyword argument was ignored when creating an array; #151, #179.
- Fixed bugs when using a ZipStore opened in ‘w’ mode; #158, #182.
- Fill values can now be provided for fixed-length string arrays; #165, #176.
- Fixed a bug where the number of chunks initialized could be counted incorrectly; #97, #174.
- Fixed a bug related to the use of an ellipsis (…) in indexing statements; #93, #168, #172.
- Fixed a bug preventing use of other integer types for indexing; #143, #147.
Documentation¶
- Some changes have been made to the Zarr storage specification version 2 document to clarify ambiguities and add some missing information. These changes do not break compatibility with any of the material as previously implemented, and so the changes have been made in-place in the document without incrementing the document version number. See the section on Changes in the specification document for more information.
- A new Advanced indexing section has been added to the tutorial.
- A new String arrays section has been added to the tutorial (#135, #175).
- The Chunk optimizations tutorial section has been reorganised and updated.
- The Persistent arrays and Storage alternatives tutorial sections have been updated with new examples (#100, #101, #103).
- A new tutorial section on Pickle support has been added (#91).
- A new tutorial section on Datetimes and timedeltas has been added.
- A new tutorial section on Array and group diagnostics has been added.
- The tutorial sections on Parallel computing and synchronization and Configuring Blosc have been updated to provide information about how to avoid program hangs when using the Blosc compressor with multiple processes (#199, #201).
Maintenance¶
- A data fixture has been included in the test suite to ensure data format compatibility is maintained; #83, #146.
- The test suite has been migrated from nosetests to pytest; #189, #225.
- Various continuous integration updates and improvements; #118, #124, #125, #126, #109, #114, #171.
- Bump numcodecs dependency to 0.5.3, completely remove nose dependency, #237.
- Fix compatibility issues with NumPy 1.14 regarding fill values for structured arrays, #222, #238, #239.
Acknowledgments¶
Code was contributed to this release by Alistair Miles, John Kirkham and Prakhar Goel.
Documentation was contributed to this release by Mamy Ratsimbazafy and Charles Noyes.
Thank you to John Kirkham, Stephan Hoyer, Francesc Alted, and Matthew Rocklin for code reviews and/or comments on pull requests.
2.1.4¶
- Resolved an issue where calling hasattr on a Group object erroneously raised a KeyError. By Vincent Schut; #88, #95.
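The underlying Python behaviour is worth spelling out: in Python 3, hasattr returns False only when attribute lookup raises AttributeError, so any other exception, such as a KeyError, escapes to the caller. A minimal sketch of the pattern, using a hypothetical MiniGroup class rather than Zarr's actual Group code:

```python
class MiniGroup:
    """Hypothetical stand-in for a group with dot-notation member access."""

    def __init__(self, members):
        self._members = dict(members)

    def __getitem__(self, name):
        return self._members[name]  # raises KeyError for unknown members

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        try:
            return self[name]
        except KeyError:
            # Translate to AttributeError so that hasattr() returns False
            # instead of letting the KeyError propagate.
            raise AttributeError(name) from None


group = MiniGroup({"foo": [1, 2, 3]})
has_foo = hasattr(group, "foo")
has_bar = hasattr(group, "bar")
```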
2.1.3¶
- Resolved an issue with zarr.creation.array() where dtype was given as None (#80).
2.1.1¶
Various minor improvements, including: Group objects support member access
via dot notation (__getattr__); fixed metadata caching for Array.shape
property and derivatives; added Array.ndim property; fixed Array.__array__
method arguments; fixed bug in pickling Array state; fixed bug in pickling
ThreadSynchronizer.
2.1.0¶
- Group objects now support member deletion via the del statement (#65).
- Added zarr.storage.TempStore class for convenience to provide storage via a temporary directory (#59).
- Fixed performance issues with zarr.storage.ZipStore class (#66).
class (#66). - The Blosc extension has been modified to return bytes instead of array objects from compress and decompress function calls. This should improve compatibility and also provides a small performance increase for compressing high compression ratio data (#55).
- Added overwrite keyword argument to array and group creation methods on the zarr.hierarchy.Group class (#71).
- Added cache_metadata keyword argument to array creation methods.
- The functions zarr.creation.open_array() and zarr.hierarchy.open_group() now accept any store as first argument (#56).
2.0.1¶
The bundled Blosc library has been upgraded to version 1.11.1.
2.0.0¶
Hierarchies¶
Support has been added for organizing arrays into hierarchies via groups. See
the tutorial section on Groups and the zarr.hierarchy API docs for more
information.
Filters¶
Support has been added for configuring filters to preprocess chunk data prior
to compression. See the tutorial section on Filters and the zarr.codecs
API docs for more information.
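As an illustration of why preprocessing can help, the sketch below applies a delta transform to a chunk before compressing it with zlib, then reverses both steps. It uses only the standard library; the function names are hypothetical and this is not the zarr.codecs API.

```python
import zlib
from array import array


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    out = array(values.typecode, list(values[:1]))
    out.extend(b - a for a, b in zip(values, values[1:]))
    return out


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out = array(deltas.typecode)
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# A chunk of steadily increasing 64-bit integers.
chunk = array("q", range(1000, 101000, 100))

# Filter first, then compress; decoding applies the steps in reverse order.
encoded = zlib.compress(delta_encode(chunk).tobytes())
decoded_deltas = array("q")
decoded_deltas.frombytes(zlib.decompress(encoded))
restored = delta_decode(decoded_deltas)

# Sizes with and without the filter, for comparison.
unfiltered_size = len(zlib.compress(chunk.tobytes()))
filtered_size = len(encoded)
```

For data like this, the deltas are almost all identical, which is the kind of regularity a general-purpose compressor exploits well.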
Other changes¶
To accommodate support for hierarchies and filters, the Zarr metadata format
has been modified. See the Zarr storage specification version 2 for more
information. To migrate an array stored using Zarr version 1.x, use the
zarr.storage.migrate_1to2() function.
The bundled Blosc library has been upgraded to version 1.11.0.
Acknowledgments¶
Thanks to Matthew Rocklin, Stephan Hoyer and Francesc Alted for contributions and comments.
1.1.0¶
- The bundled Blosc library has been upgraded to version 1.10.0. The ‘zstd’ internal compression library is now available within Blosc. See the tutorial section on Compressors for an example.
- When using the Blosc compressor, the default internal compression library is now ‘lz4’.
- The default number of internal threads for the Blosc compressor has been increased to a maximum of 8 (previously 4).
- Added convenience functions zarr.blosc.list_compressors() and zarr.blosc.get_nthreads().
1.0.0¶
This release includes a complete re-organization of the code base. The major version number has been bumped to indicate that there have been backwards-incompatible changes to the API and the on-disk storage format. However, Zarr is still in an early stage of development, so please do not take the version number as an indicator of maturity.
Storage¶
The main motivation for re-organizing the code was to create an
abstraction layer between the core array logic and data storage (#21).
In this release, any object that implements the MutableMapping
interface can be used as an array store. See the tutorial sections on
Persistent arrays and Storage alternatives, the Zarr storage
specification version 1, and the zarr.storage module documentation for
more information.
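To make the MutableMapping requirement concrete, here is a minimal sketch of a custom store built on the standard library's abstract base class. The class name LoggingStore and the example keys are hypothetical, and passing such an object to Zarr is not shown.

```python
from collections.abc import MutableMapping


class LoggingStore(MutableMapping):
    """Hypothetical in-memory store that records every key written."""

    def __init__(self):
        self._data = {}
        self.writes = []

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self.writes.append(key)
        self._data[key] = bytes(value)  # keep an immutable copy

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = LoggingStore()
store["meta"] = b'{"shape": [4]}'
store["0"] = bytearray(b"\x00\x01\x02\x03")
# Methods such as update(), keys() and the `in` operator are supplied
# by the MutableMapping base class on top of the five methods above.
store.update({"1": b"\x04\x05\x06\x07"})
del store["0"]
```

Only five methods need to be written by hand, which is what makes it practical to back an array with very different kinds of storage.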
Please note also that the file organization and file name conventions
used when storing a Zarr array in a directory on the file system have
changed. Persistent Zarr arrays created using previous versions of the
software will not be compatible with this version. See the zarr.storage
API docs and the Zarr storage specification version 1 for more
information.
Compression¶
An abstraction layer has also been created between the core array
logic and the code for compressing and decompressing array
chunks. This release still bundles the c-blosc library and uses Blosc
as the default compressor; however, other compressors including zlib,
BZ2 and LZMA are also now supported via the Python standard
library. New compressors can also be dynamically registered for use
with Zarr. See the tutorial sections on Compressors and
Configuring Blosc, the Zarr storage specification version 1, and the
zarr.compressors module documentation for more information.
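The general idea of selecting a standard-library compressor by name, and registering further ones at run time, can be sketched as follows. The registry and function names here are hypothetical and do not reproduce the zarr.compressors API.

```python
import bz2
import lzma
import zlib

# Hypothetical registry mapping a name to (compress, decompress) callables.
_registry = {}


def register_compressor(name, compress, decompress):
    _registry[name] = (compress, decompress)


def roundtrip(name, data):
    """Compress then decompress `data` with the named compressor."""
    compress, decompress = _registry[name]
    packed = compress(data)
    return packed, decompress(packed)


register_compressor("zlib", zlib.compress, zlib.decompress)
register_compressor("bz2", bz2.compress, bz2.decompress)
register_compressor("lzma", lzma.compress, lzma.decompress)

# A new compressor can be added at run time; here a no-op pass-through.
register_compressor("none", bytes, bytes)

chunk = bytes(range(256)) * 64
results = {name: roundtrip(name, chunk) for name in sorted(_registry)}
```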
Synchronization¶
The synchronization code has also been refactored to create a layer of
abstraction, enabling Zarr arrays to be used in parallel computations
with a number of alternative synchronization methods. For more
information see the tutorial section on Parallel computing and
synchronization and the zarr.sync module documentation.
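One way to picture a synchronization layer is as an object that hands out one lock per chunk key, so that writers touching the same chunk take turns while writers on different chunks proceed independently. The sketch below uses threads from the standard library; KeyedLocks is a hypothetical name, not the zarr.sync API.

```python
import threading
from collections import defaultdict


class KeyedLocks:
    """Hypothetical synchronizer: one lock per chunk key."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the dict of locks itself
        self._locks = defaultdict(threading.Lock)

    def __getitem__(self, key):
        with self._guard:
            return self._locks[key]


store = {"0": 0}
sync = KeyedLocks()


def increment(key, times):
    for _ in range(times):
        with sync[key]:
            # Read-modify-write of one chunk, guarded by that chunk's lock.
            store[key] = store[key] + 1


threads = [
    threading.Thread(target=increment, args=("0", 1000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Swapping the thread locks for another locking mechanism behind the same interface is the kind of substitution an abstraction layer allows.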
Changes to the Blosc extension¶
NumPy is no longer a build dependency for the zarr.blosc Cython
extension, so setup.py will run even if NumPy is not already
installed, and should automatically install NumPy as a runtime
dependency. Manual installation of NumPy prior to installing Zarr is
still recommended, however, as the automatic installation of NumPy may
fail or be sub-optimal on some platforms.
Some optimizations have been made within the zarr.blosc extension to
avoid unnecessary memory copies, giving a ~10-20% performance
improvement for multi-threaded compression operations.
The zarr.blosc extension now automatically detects whether it
is running within a single-threaded or multi-threaded program and
adapts its internal behaviour accordingly (#27). There is no need for
the user to make any API calls to switch Blosc between contextual and
non-contextual (global lock) mode. See also the tutorial section on
Configuring Blosc.
Other changes¶
The internal code for managing chunks has been rewritten to be more efficient. Now no state is maintained for chunks outside of the array store, meaning that chunks do not carry any extra memory overhead not accounted for by the store. This negates the need for the “lazy” option present in the previous release, and this has been removed.
The memory layout within chunks can now be set as either “C” (row-major) or “F” (column-major), which can help to provide better compression for some data (#7). See the tutorial section on Chunk memory layout for more information.
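The two layouts differ only in how a multidimensional index maps to a position in the flat chunk buffer. A small pure-Python sketch for the two-dimensional case (the function names are hypothetical):

```python
def offset_c(i, j, nrows, ncols):
    """Row-major ("C"): elements of a row are adjacent in the buffer."""
    return i * ncols + j


def offset_f(i, j, nrows, ncols):
    """Column-major ("F"): elements of a column are adjacent in the buffer."""
    return j * nrows + i


def flatten(rows, order):
    """Lay out a list of equal-length rows as a flat list in the given order."""
    nrows, ncols = len(rows), len(rows[0])
    offset = offset_c if order == "C" else offset_f
    flat = [None] * (nrows * ncols)
    for i in range(nrows):
        for j in range(ncols):
            flat[offset(i, j, nrows, ncols)] = rows[i][j]
    return flat


chunk = [[1, 2, 3],
         [4, 5, 6]]
flat_c = flatten(chunk, "C")  # [1, 2, 3, 4, 5, 6]
flat_f = flatten(chunk, "F")  # [1, 4, 2, 5, 3, 6]
```

When values change slowly along one axis, the layout that keeps that axis contiguous places similar values next to each other in the buffer, which is the situation in which a compressor is likely to do better.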
A bug has been fixed within the __getitem__ and __setitem__
machinery for slicing arrays, to properly handle getting and setting
partial slices.
Acknowledgments¶
Thanks to Matthew Rocklin, Stephan Hoyer, Francesc Alted, Anthony Scopatz and Martin Durant for contributions and comments.