Fix SlotNotCoveredError when cluster is resharding #2989


Conversation

dinispeixoto

Pull Request check-list

Please make sure to review and check all of these items:

  • Do tests and lints pass with this change?
  • Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?
  • Is there an example added to the examples folder (if applicable)?
  • Was the change added to CHANGES file?

NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.

Description of change

Please provide a description of the change here.

Fixes #2988

@johan-seesaw commented Nov 15, 2023

Interestingly, I think I hit roughly the same bug today, but in the sync version of this code.

https://github.com/redis/redis-py/blob/master/redis/cluster.py#L1392

  File "/python311/lib64/python3.11/site-packages/redis/cluster.py", line 1115, in execute_command
    raise e
  File "/python311/lib64/python3.11/site-packages/redis/cluster.py", line 1101, in execute_command
    res[node.name] = self._execute_command(node, *args, **kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python311/lib64/python3.11/site-packages/redis/cluster.py", line 1210, in _execute_command
    raise e
  File "/python311/lib64/python3.11/site-packages/redis/cluster.py", line 1138, in _execute_command
    target_node = self.nodes_manager.get_node_from_slot(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python311/lib64/python3.11/site-packages/redis/cluster.py", line 1425, in get_node_from_slot
    return self.slots_cache[slot][node_idx]
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
IndexError: list index out of range

I don't really understand why these two seemingly very similar NodesManagers exist in isolation in the sync and asyncio versions of this library. The asyncio version catches the IndexError and exposes it as a SlotNotCoveredError, while the sync version passes it through unchanged. But I think the same fix would apply to both.
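For reference, here is a rough sketch of how the sync lookup could mirror that asyncio behaviour. This is not the redis-py implementation: it is written as a standalone helper, the attribute names (slots_cache, read_load_balancer, get_server_index) are taken from the snippets in this thread, and the exact import location of SlotNotCoveredError may vary by version.

    from redis.exceptions import SlotNotCoveredError

    def get_node_from_slot(nodes_manager, slot, read_from_replicas=False):
        # Sketch: resolve a node for the slot and surface a stale slots
        # cache as SlotNotCoveredError instead of a bare IndexError,
        # mirroring what the asyncio NodesManager reportedly does.
        slots_cache = nodes_manager.slots_cache
        if not slots_cache.get(slot):
            raise SlotNotCoveredError(f'Slot "{slot}" is not covered by the cluster.')

        node_idx = 0
        if read_from_replicas:
            # Round-robin across the primary and replicas serving this slot.
            node_idx = nodes_manager.read_load_balancer.get_server_index(
                slots_cache[slot][0].name, len(slots_cache[slot])
            )

        try:
            return slots_cache[slot][node_idx]
        except IndexError:
            # The slot's node list shrank (e.g. mid-resharding) after the
            # load balancer handed back an index into the old, longer list.
            raise SlotNotCoveredError(
                f'Slot "{slot}" is not covered by the cluster while resharding.'
            )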


# we use the node returned by RR in the load balancer
# if it's part of the slots cache, otherwise we use primary
node = node_idx if node_idx < len(self.slots_cache[slot]) else 0

@johan-seesaw Nov 15, 2023

The only case I can see this happening is when the LoadBalancer object has a pre-existing history with a list_size of, let's say, 3 nodes for the given primary. If an event occurs where the list is no longer 3 (say the new list size is 1) and we enter the LoadBalancer with the existing dictionary holding a value of 2 for this primary, then get_server_index will return 2; the % 1 operation is only applied to the "next value" before it is stored in the dictionary, not to the value being returned.

Perhaps a simpler rewrite would be to store the last-used value, rather than the next-to-use value, in the LoadBalancer class. Then there would be only one modulo operation, and it would always be performed with the current list size.

    def get_server_index(self, primary: str, list_size: int) -> int:
        # default to -1 if not found, so after incrementing it will be 0
        server_index = (self.primary_to_idx.get(primary, -1) + 1) % list_size
        self.primary_to_idx[primary] = server_index
        return server_index
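As a small standalone illustration of that failure mode: the first class below approximates the pre-fix behaviour of caching the next index, the second is the last-used variant suggested above. The class names and the "primary-a" key are made up for the example.

    class NextIndexBalancer:
        # Approximation of the pre-fix behaviour: the cached value is the
        # *next* index, pre-computed with the list_size of the previous call.
        def __init__(self):
            self.primary_to_idx = {}

        def get_server_index(self, primary, list_size):
            server_index = self.primary_to_idx.setdefault(primary, 0)
            self.primary_to_idx[primary] = (server_index + 1) % list_size
            # The returned value may still index into an older, longer list.
            return server_index

    class LastUsedBalancer:
        # The suggested rewrite: cache the last-used index and apply a single
        # modulo with the list_size passed on *this* call.
        def __init__(self):
            self.primary_to_idx = {}

        def get_server_index(self, primary, list_size):
            server_index = (self.primary_to_idx.get(primary, -1) + 1) % list_size
            self.primary_to_idx[primary] = server_index
            return server_index

    buggy, fixed = NextIndexBalancer(), LastUsedBalancer()

    # Two reads while the slot has 3 nodes; the buggy balancer now caches
    # a next index of 2.
    for _ in range(2):
        buggy.get_server_index("primary-a", 3)
        fixed.get_server_index("primary-a", 3)

    # After resharding, the slot is served by a single node.
    print(buggy.get_server_index("primary-a", 1))  # 2 -> IndexError upstream
    print(fixed.get_server_index("primary-a", 1))  # 0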

@petyaslavova
Collaborator

SlotNotCoveredError is now handled by the cluster's retry mechanism, so I'm closing this PR as the issue has been addressed.
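For anyone on a version where the built-in retry does not yet cover this, a rough application-level workaround might look like the sketch below. The key name, attempt count, and delay are placeholders, and it assumes SlotNotCoveredError is importable from redis.exceptions in the installed redis-py version.

    import time

    from redis.cluster import RedisCluster
    from redis.exceptions import SlotNotCoveredError

    rc = RedisCluster(host="localhost", port=7000)

    def get_with_retry(key, attempts=3, delay=0.1):
        # Retry briefly while the cluster finishes resharding and the
        # client's slots cache catches up.
        for attempt in range(attempts):
            try:
                return rc.get(key)
            except SlotNotCoveredError:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay)

    get_with_retry("some-key")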
