Fix get_node_from_slot to handle resharding #3182
base: master
@gerzse I was wondering if there might be any bandwidth to review this PR?
Hi @gerzse, I've added unit tests and a change test, and fixed a bug. My initial commit was intended to solicit feedback before doing that work, but I figured I'd go ahead and complete it in the hope that this change might make it in. I've structured the work as two commits, which you can squash into one before merging. The first demonstrates the problem area here and here (notably, two separate exception types for effectively the same issue). The actual-fix commit eliminates the cause demonstrated by those tests.
@johan-seesaw How did you work around this in your application, since this is taking a while to get merged?
Description of change
Introduce a new Enum and optional flag value to allow reading only from replicas, if the command supports it. It appears the sync version of `RedisCluster` used to support this when `server_type` was passed in to `get_node_from_slot`, but that parameter isn't set anywhere.

This PR addresses the issue of reading from replicas while resharding, which can cause index-out-of-range failures.
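The description doesn't name the new Enum or flag, so the following is only a hypothetical sketch of how a replicas-only read mode could pick a node index. `ReadMode`, `choose_node_index`, and `command_is_read_only` are illustrative names, not redis-py API. It assumes the slot's node list is ordered primary first (index 0), as the `slot_nodes[0]` example below suggests, and that a replicas-only read falls back to the primary when the slot currently has no replicas (also an assumption).

```python
from enum import Enum
from typing import Optional


class ReadMode(Enum):
    """Hypothetical read-mode enum; the PR's actual Enum name is not shown here."""

    PRIMARY_AND_REPLICAS = "primary_and_replicas"  # round-robin over every node
    REPLICAS_ONLY = "replicas_only"  # round-robin over replicas, skipping the primary


def choose_node_index(
    last_used: int,
    node_count: int,
    mode: Optional[ReadMode],
    command_is_read_only: bool,
) -> int:
    """Return an index into a slot's node list, where index 0 is the primary."""
    if mode is None or not command_is_read_only or node_count == 1:
        # Replica reads are not enabled, the command must go to the primary,
        # or (assumed fallback) the slot currently has no replicas.
        return 0
    if mode is ReadMode.REPLICAS_ONLY:
        # Cycle over indices 1..node_count-1; the modulo uses the *current* count.
        return last_used % (node_count - 1) + 1
    # PRIMARY_AND_REPLICAS: cycle over indices 0..node_count-1.
    return (last_used + 1) % node_count
```

Because the modulo is taken against the node count at call time, a stale `last_used` from before a reshard still maps to a valid replica index.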
It is an alternative to, and a more comprehensive solution than, #2989, and it covers both the sync and asyncio implementations (#2988).
This implementation moves the logic to a location shared by the asyncio and sync versions of the library. I have a follow-up PR that introduces additional read-only modes; it was initially part of this PR but has been kept separate to improve the chances that this PR gets merged.
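As a rough illustration of what sharing the logic buys, here is a hypothetical sketch in which a sync and an async client both delegate node selection to a single helper, so a fix lands in one place. `next_node_index`, `SyncClusterSketch`, and `AsyncClusterSketch` are invented stand-ins, not redis-py classes; the network I/O that a real async client would await is omitted.

```python
import asyncio
from typing import Dict, List


def next_node_index(cache: Dict[str, int], primary_name: str, list_size: int) -> int:
    """Hypothetical shared helper: round-robin, storing the last-used index."""
    index = (cache.get(primary_name, -1) + 1) % list_size
    cache[primary_name] = index
    return index


class SyncClusterSketch:
    """Hypothetical stand-in for the sync client's node lookup."""

    def __init__(self, slots_cache: Dict[int, List[str]]) -> None:
        self.slots_cache = slots_cache
        self.last_used: Dict[str, int] = {}

    def get_node_from_slot(self, slot: int) -> str:
        nodes = self.slots_cache[slot]
        return nodes[next_node_index(self.last_used, nodes[0], len(nodes))]


class AsyncClusterSketch:
    """Hypothetical stand-in for the asyncio client; same helper, async caller."""

    def __init__(self, slots_cache: Dict[int, List[str]]) -> None:
        self.slots_cache = slots_cache
        self.last_used: Dict[str, int] = {}

    async def read(self, slot: int) -> str:
        nodes = self.slots_cache[slot]
        # Selection is synchronous; only the (omitted) network call would be awaited.
        return nodes[next_node_index(self.last_used, nodes[0], len(nodes))]
```

With `slots_cache = {0: ["primary", "replica-1"]}`, three reads from either client return `primary`, `replica-1`, `primary`.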
Before the change, we stored the next replica to read from in the `primary_to_idx` cache. After the change, we store the last-read replica/primary. This matters in the following example:

1. A read is served by the node at index `1`, setting `primary_to_idx` to value `2`.
2. The slot's node list shrinks to length `2`.
3. `get_node_from_slot` reads the next-node value as `2` and returns it (and sets the next-node value to `1`, since `(2 + 1) % 2 = 1`, skipping the primary on the next run).
4. The client accesses `slot_nodes[2]`, which doesn't exist, and gets an index exception.

After the change, we store only the last-read index, not the next one, so the situation unfolds as follows:
1. A read is served by the node at index `1`, setting `primary_name_to_last_used_index` to value `1`.
2. The slot's node list shrinks to length `2`.
3. `get_node_from_slot` reads the last-read node value as `1`, increments it, and takes it modulo the length `2`, resulting in `0` being stored in `primary_name_to_last_used_index` and returned.
4. The client accesses `slot_nodes[0]`, the primary, as expected, and everything works.

Fixes #2988
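To make the before/after comparison concrete, here is a minimal simulation of both caching strategies. The class names `NextIndexBalancer` and `LastUsedIndexBalancer` and the `simulate` helper are hypothetical; only `primary_to_idx`, `primary_name_to_last_used_index`, and `slot_nodes` come from the description. It is a sketch of the described behavior, not redis-py's actual code, and it assumes index 0 of the slot's node list is the primary.

```python
from typing import Dict, List


class NextIndexBalancer:
    """Old approach (sketch): caches the *next* index to hand out."""

    def __init__(self) -> None:
        self.primary_to_idx: Dict[str, int] = {}

    def get_server_index(self, primary: str, list_size: int) -> int:
        server_index = self.primary_to_idx.setdefault(primary, 0)
        # The returned value was computed against the *previous* list size,
        # so it can be out of range once the slot's node list has shrunk.
        self.primary_to_idx[primary] = (server_index + 1) % list_size
        return server_index


class LastUsedIndexBalancer:
    """New approach (sketch): caches the *last* index that was used."""

    def __init__(self) -> None:
        self.primary_name_to_last_used_index: Dict[str, int] = {}

    def get_server_index(self, primary: str, list_size: int) -> int:
        last_used = self.primary_name_to_last_used_index.get(primary, -1)
        # The modulo uses the *current* list size, so the index is always valid.
        server_index = (last_used + 1) % list_size
        self.primary_name_to_last_used_index[primary] = server_index
        return server_index


def simulate(balancer) -> str:
    """Two reads with 3 nodes, then one read after the slot shrinks to 2 nodes."""
    slot_nodes: List[str] = ["primary", "replica-1", "replica-2"]
    for _ in range(2):
        slot_nodes[balancer.get_server_index("primary", len(slot_nodes))]
    slot_nodes = ["primary", "replica-1"]  # e.g. a replica moved during resharding
    return slot_nodes[balancer.get_server_index("primary", len(slot_nodes))]
```

`simulate(NextIndexBalancer())` raises `IndexError` on the third read, because it indexes `slot_nodes[2]`, while `simulate(LastUsedIndexBalancer())` returns `"primary"`. Applying the modulo at read time, against the current list size, is what keeps the index in range.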