3.0 Migration Guide#

Zarr-Python 3 represents a major refactor of the Zarr-Python codebase. Some of the goals motivating this refactor included:

  • adding support for the Zarr format 3 specification (along with the Zarr format 2 specification)

  • cleaning up internal and user facing APIs

  • improving performance (particularly in high latency storage environments like cloud object stores)

To meet these goals, Zarr-Python 3 changes the API in several ways, including significant breaking changes and deprecations.

This page provides a guide explaining breaking changes and deprecations to help you migrate your code from version 2 to version 3. If we have missed anything, please open a GitHub issue so we can improve this guide.

Compatibility target#

The goals described above necessitated some breaking changes to the API (hence the major version update), but where possible we have maintained backwards compatibility in the most widely used parts of the API. This includes the zarr.Array and zarr.Group classes and the “top-level API” (e.g. zarr.open_array() and zarr.open_group()).

Getting ready for 3.0#

Before migrating to Zarr-Python 3, we suggest projects that depend on Zarr-Python take the following actions in order:

  1. Pin the supported Zarr-Python version to zarr>=2,<3. This is a best practice and will protect your users from any incompatibilities that may arise during the release of Zarr-Python 3. This pin can be removed after migrating to Zarr-Python 3.

  2. Limit your imports from the Zarr-Python package. Most of the primary API zarr.* will be compatible in Zarr-Python 3. However, the following breaking API changes are planned:

    • numcodecs.* will no longer be available in zarr.*. To migrate, import codecs directly from numcodecs:

      from numcodecs import Blosc
      # instead of:
      # from zarr import Blosc
      
    • The zarr.v3_api_available feature flag is being removed. In Zarr-Python 3 the v3 API is always available, so you shouldn’t need to use this flag.

    • The following internal modules are being removed or significantly changed. If your application imports from any of the modules below, you will need to either a) modify your application so that it no longer relies on these imports or b) vendor the parts of the modules that you need.

      • zarr.attrs has gone, with no replacement

      • zarr.codecs has gone, use numcodecs instead

      • zarr.context has gone, with no replacement

      • zarr.core remains but should be considered private API

      • zarr.hierarchy has gone; use zarr.Group in place of zarr.hierarchy.Group

      • zarr.indexing has gone, with no replacement

      • zarr.meta has gone, with no replacement

      • zarr.meta_v1 has gone, with no replacement

      • zarr.sync has gone, with no replacement

      • zarr.types has gone, with no replacement

      • zarr.util has gone, with no replacement

      • zarr.n5 has gone, see below for alternative N5 options

  3. Test that your package works with version 3.

  4. Update the pin to include zarr>=3,<4.
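While a project supports both release series during the steps above, it can help to branch on the installed major version. The following is a minimal sketch using only the standard library; the helper names `major_version` and `installed_zarr_major` are hypothetical and not part of Zarr-Python.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '2.18.3'."""
    return int(version_string.split(".")[0])


def installed_zarr_major() -> Optional[int]:
    """Return the major version of the installed zarr package, or None if absent."""
    try:
        return major_version(version("zarr"))
    except PackageNotFoundError:
        return None


ZARR_MAJOR = installed_zarr_major()
if ZARR_MAJOR is None:
    print("zarr is not installed")
elif ZARR_MAJOR >= 3:
    print("running against Zarr-Python 3 or later")
else:
    print("running against Zarr-Python 2 or earlier")
```

A check like this is only useful during the transition; once the pin is updated to zarr>=3,<4 it can be deleted.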

Zarr-Python 2 support window#

Zarr-Python 2.x is still available, though we recommend migrating to Zarr-Python 3 for its performance improvements and new features. Security and bug fixes will be made to the 2.x series for at least six months following the first Zarr-Python 3 release. If you need to use the latest Zarr-Python 2 release, you can install it with:

$ pip install "zarr==2.*"

Note

Development and maintenance of the 2.x release series has moved to the support/v2 branch. Issues and pull requests related to this branch are tagged with the V2 label.

Migrating to Zarr-Python 3#

The following sections provide details on breaking changes in Zarr-Python 3.

The Array class#

  1. Disallow direct construction - the signature for initializing the Array class has changed significantly. Please use zarr.create_array() or zarr.open_array() instead of directly constructing the zarr.Array class.

  2. Defaulting to zarr_format=3 - newly created arrays will use version 3 of the Zarr specification. To continue using version 2, set zarr_format=2 when creating arrays or set default_zarr_version=2 in Zarr’s runtime configuration.

The Group class#

  1. Disallow direct construction - use zarr.open_group() or zarr.create_group() instead of directly constructing the zarr.Group class.

  2. Most of the h5py compatibility methods are deprecated and will issue warnings if used. The following functions are drop-in replacements that have the same signature and functionality:

The Store class#

The Store API has changed significantly in Zarr-Python 3. The most notable changes to the Store API are:

  1. Replaced the MutableMapping base class with a custom abstract base class (zarr.abc.store.Store).

  2. Switched to an asynchronous interface for all store methods that result in IO. This change ensures that all store methods are non-blocking and are as performant as possible.
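To give a feel for what the two changes above mean in practice, here is a toy sketch of an abstract base class with awaitable get and set methods. It is deliberately not the real zarr.abc.store.Store interface: `ToyAsyncStore`, `ToyMemoryStore`, and their method signatures are hypothetical and chosen only to contrast an async interface with dict-style access.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class ToyAsyncStore(ABC):
    """Hypothetical async store base class (NOT zarr's real Store API)."""

    @abstractmethod
    async def get(self, key: str) -> Optional[bytes]:
        """Return the bytes stored under key, or None if the key is absent."""

    @abstractmethod
    async def set(self, key: str, value: bytes) -> None:
        """Store value under key."""


class ToyMemoryStore(ToyAsyncStore):
    """Dict-backed implementation of the toy interface."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    async def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


async def roundtrip() -> List[Optional[bytes]]:
    store = ToyMemoryStore()
    # Schedule all writes together instead of blocking on each one in turn.
    await asyncio.gather(*(store.set(f"c/{i}", bytes([i])) for i in range(4)))
    # Reads are scheduled the same way; results come back in request order.
    return await asyncio.gather(*(store.get(f"c/{i}") for i in range(4)))


chunks = asyncio.run(roundtrip())
print(chunks)
```

With a real high-latency backend, scheduling many requests at once like this is where an asynchronous interface pays off; the dict-backed toy only shows the calling pattern.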

Beyond the changed store interface, a number of deprecated stores were also removed in Zarr-Python 3. See #1274 for more details on the removal of these stores.

The following stores have been removed altogether. Users who need these stores will have to implement their own version in Zarr-Python 3.

  • DBMStore

  • LMDBStore

  • SQLiteStore

  • MongoDBStore

  • RedisStore

At present, these five stores do not have an equivalent in Zarr-Python 3. If you are interested in developing a custom store that targets these backends, see developing custom stores or open an issue to discuss your use case.

Dependencies#

When installing using pip:

  • The new remote dependency group can be used to install a supported version of fsspec, required for remote data access.

  • The new gpu dependency group can be used to install a supported version of cuda, required for GPU functionality.

  • The jupyter optional dependency group has been removed, since v3 contains no Jupyter-specific functionality.
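Because fsspec is now pulled in only through an optional group, code that needs remote access may want to check for it up front. The sketch below does so with the standard library; the helper name `is_importable` is hypothetical, and the install hint assumes the group is requested with pip's usual extras syntax.

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


if is_importable("fsspec"):
    print("fsspec is available for remote data access")
else:
    # Assumed install command, using pip's extras syntax for the group.
    print('fsspec not found; try: pip install "zarr[remote]"')
```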

Miscellaneous#

🚧 Work in Progress 🚧#

Zarr-Python 3 is still under active development, and is not yet fully complete. The following list summarizes areas of the codebase that we expect to build out after the 3.0.0 release. If features listed below are important to your use case of Zarr-Python, please open (or comment on) a GitHub issue.

  • The following functions / methods have not been ported to Zarr-Python 3 yet:

  • The following features (corresponding to function arguments to functions in zarr) have not been ported to Zarr-Python 3 yet. Using these features will raise a warning or a NotImplementedError:

    • cache_attrs

    • cache_metadata

    • chunk_store (#2495)

    • meta_array

    • object_codec (#2617)

    • synchronizer (#1596)

    • dimension_separator

  • The following features that were supported by Zarr-Python 2 have not been ported to Zarr-Python 3 yet:

    • Structured arrays / dtypes (#2134)

    • Fixed-length string dtypes (#2347)

    • Datetime and timedelta dtypes (#2616)

    • Object dtypes (#2617)

    • Ragged arrays (#2618)

    • Context manager support for Groups and Arrays (the __enter__ and __exit__ protocols) (#2619)

    • Big Endian dtypes (#2324)

    • Default filters for object dtypes for Zarr format 2 arrays (#2627)
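When auditing a codebase before migrating, one option is to scan the keyword arguments you pass to Zarr functions for the unported arguments named above (cache_attrs through dimension_separator). The following is a minimal sketch; `UNPORTED_KWARGS` and `find_unported_kwargs` are hypothetical helpers, not part of Zarr-Python, and the set simply copies the names from the list above.

```python
# Argument names copied from the list of unported features above.
UNPORTED_KWARGS = frozenset(
    {
        "cache_attrs",
        "cache_metadata",
        "chunk_store",
        "meta_array",
        "object_codec",
        "synchronizer",
        "dimension_separator",
    }
)


def find_unported_kwargs(kwargs: dict) -> list:
    """Return, in sorted order, the keys of kwargs that are not yet ported."""
    return sorted(UNPORTED_KWARGS.intersection(kwargs))


# Example: keyword arguments an application passed under Zarr-Python 2.
legacy_kwargs = {
    "shape": (10,),
    "dtype": "i4",
    "cache_metadata": True,
    "synchronizer": None,
}

flagged = find_unported_kwargs(legacy_kwargs)
if flagged:
    print(f"not yet ported to Zarr-Python 3: {flagged}")
```

The helper only reports matches and leaves the arguments in place, since silently dropping one (dimension_separator, for example) could change how data is stored.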