[do not merge] proof of concept for unified v2 / v3 codecs #3276
Draft: d-v-b wants to merge 129 commits into zarr-developers:main from d-v-b:feat/numcodecs-compat
… registry load frequency, add object_codec_id for v2 json deserialization
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3276      +/-   ##
==========================================
- Coverage   59.56%   58.15%   -1.42%
==========================================
  Files          78       79       +1
  Lines        8684     9057     +373
==========================================
+ Hits         5173     5267      +94
- Misses       3511     3790     +279
This PR is a proof of concept that demonstrates how we can augment our `Codec` API to support v2 or v3 codecs.

Here are two motivating examples that run under this PR and fail in `main`:

gzip compression
This example shows that we can use the exact same codec (`GZipCodec` or `numcodecs.GZip`) with zarr v2 or zarr v3 to create an array. This fails on `main` if you try to use `GZipCodec` with zarr v2, or `numcodecs.GZip` with zarr v3, even though the underlying gzip compression is identical.

jpeg compression
This example demonstrates using the `Jpeg` codec defined in `imagecodecs` as a compressor for a v2 array and a serializer for a v3 array. Data can be written and read back.

In both examples, the codec is passed to `create_array` as an object that implements either the `numcodecs.abc.Codec` API or the `Codec` API.

What this requires
This functionality requires a set of changes that I would like to introduce in a series of PRs:
- Adding `to_json(zarr_format: Literal[2,3])` methods to all the codecs, so the same object can generate zarr v2 or zarr v3 metadata. (example)
- Adding `from_json(data, zarr_format: Literal[2,3])` methods to all the codecs, so the same object can be created from zarr v2 metadata or zarr v3 metadata. (example)
- A protocol that models the `numcodecs.abc.Codec` API, so we can interact with numcodecs objects type-safely without a numcodecs dependency. (example)
- A `Codec` class that wraps a numcodec-like object and endows it with full codec powers. These numcodec-adapter objects can be cast to an array-array codec, array-bytes codec, or bytes-bytes codec as needed. (example)
- There might be more changes that I'm forgetting right now, but these are the big ones.
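To make the list above concrete, here is a minimal, standard-library-only sketch of how the pieces could fit together. It is not the code in this PR: `NumcodecLike`, `ToyGzip`, `TOY_REGISTRY`, and `NumcodecAdapter` are hypothetical names, the protocol only covers a simplified subset of the `numcodecs.abc.Codec` API, and the v3 layout chosen for wrapped codecs is an assumption. Casting to array-array, array-bytes, or bytes-bytes codecs is left out.

```python
from __future__ import annotations

import gzip
from dataclasses import dataclass
from typing import Any, Literal, Protocol

ZarrFormat = Literal[2, 3]


class NumcodecLike(Protocol):
    """Hypothetical structural stand-in for part of the numcodecs.abc.Codec API."""

    codec_id: str

    def encode(self, buf: bytes) -> bytes: ...
    def decode(self, buf: bytes) -> bytes: ...
    def get_config(self) -> dict[str, Any]: ...


@dataclass(frozen=True)
class ToyGzip:
    """Toy numcodecs-style codec; it knows nothing about zarr formats."""

    level: int = 5
    codec_id = "gzip"  # plain class attribute, not a dataclass field

    def encode(self, buf: bytes) -> bytes:
        return gzip.compress(buf, compresslevel=self.level)

    def decode(self, buf: bytes) -> bytes:
        return gzip.decompress(buf)

    def get_config(self) -> dict[str, Any]:
        return {"id": self.codec_id, "level": self.level}


# Hypothetical lookup table standing in for a real codec registry.
TOY_REGISTRY = {"gzip": ToyGzip}


@dataclass(frozen=True)
class NumcodecAdapter:
    """Hypothetical wrapper that adds dual-format JSON to a numcodecs-style codec."""

    codec: NumcodecLike

    def to_json(self, zarr_format: ZarrFormat) -> dict[str, Any]:
        config = dict(self.codec.get_config())
        if zarr_format == 2:
            # v2 metadata: flat dict keyed by "id"
            return config
        if zarr_format == 3:
            # v3 metadata: "name" plus a nested "configuration" dict
            name = config.pop("id")
            return {"name": name, "configuration": config}
        raise ValueError(f"unsupported zarr_format: {zarr_format!r}")

    @classmethod
    def from_json(cls, data: dict[str, Any], zarr_format: ZarrFormat) -> NumcodecAdapter:
        if zarr_format == 2:
            config = dict(data)
            name = config.pop("id")
        elif zarr_format == 3:
            name = data["name"]
            config = dict(data.get("configuration", {}))
        else:
            raise ValueError(f"unsupported zarr_format: {zarr_format!r}")
        return cls(TOY_REGISTRY[name](**config))
```

Because `NumcodecLike` is a `typing.Protocol`, a static type checker can verify that `ToyGzip` fits the adapter without either class importing numcodecs, which is the point of the third item in the list.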
We also need checks to ensure that a codec claiming to be "gzip" generates gzip-compatible metadata. There are a few ways of doing this (inspect the metadata it generates, or replace it with the in-house codec with the same name), but I haven't implemented either option in this PR.
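As a rough illustration of the first option (inspecting the generated metadata), a check could look something like the sketch below. The function name `is_gzip_compatible` is hypothetical and nothing like it exists in this PR; the sketch assumes the v2 form `{"id": "gzip", "level": ...}`, the v3 form `{"name": "gzip", "configuration": {"level": ...}}`, and an integer level between 0 and 9.

```python
from __future__ import annotations

from typing import Any, Literal


def is_gzip_compatible(metadata: dict[str, Any], zarr_format: Literal[2, 3]) -> bool:
    """Hypothetical check: does this metadata look like gzip codec metadata?"""
    if zarr_format == 2:
        name = metadata.get("id")
        config = {k: v for k, v in metadata.items() if k != "id"}
    elif zarr_format == 3:
        name = metadata.get("name")
        config = metadata.get("configuration", {})
    else:
        raise ValueError(f"unsupported zarr_format: {zarr_format!r}")

    level = config.get("level")
    # bool is a subclass of int, so rule it out explicitly
    level_ok = isinstance(level, int) and not isinstance(level, bool) and 0 <= level <= 9
    # the only configuration key accepted here is "level"
    return name == "gzip" and level_ok and set(config) == {"level"}
```

A check like this could run when a third-party codec is registered or when it first serializes itself; the second option (swapping in the in-house codec) would avoid the inspection entirely.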
What I would like to do next
If we all agree on this strategy, I would like to start breaking this PR into separate pieces and getting them merged. I think we can do all of this in a non-breaking manner, so hopefully we don't need this to be part of 3.2.