Version:
redis-py client: 4.5.5
Cluster engine (ElastiCache): 6.2.6

Platform:
Python 3.8 on ECS, with ElastiCache

Description:
When the cluster is scaling up or down, while slots are being migrated to a new shard, the client raises the following error:
redis.exceptions.SlotNotCoveredError: Command # 9 (HGETALL ...) of pipeline caused error: ('Slot "4718" not covered by the cluster. "require_full_coverage=True"',)
Full stack trace:
File "/root/clients/redis_client.py", line 219, in run_queries
return await pipeline.execute()
File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 1455, in execute
return await self._execute(
File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 1536, in _execute
raise result
File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 1520, in _execute
cmd.result = await client.execute_command(
File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 725, in execute_command
raise e
File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 696, in execute_command
ret = await self._execute_command(target_nodes[0], *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 745, in _execute_command
target_node = self.nodes_manager.get_node_from_slot(
File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 1174, in get_node_from_slot
raise SlotNotCoveredError(
redis.exceptions.SlotNotCoveredError: Command # 6 (HGET ... ...) of pipeline caused error: ('Slot "11246" not covered by the cluster. "require_full_coverage=True"',)
Reproduce locally
It's also possible to reproduce this locally quite easily:
1. Create a simple Redis app that issues pipelines of read-only commands and run a load test against it to force many reads. I also made sure a separate client was writing non-stop to the local cluster. A minimal sketch of such an app follows these steps.
2. Force a reshard of a few thousand slots (e.g. 5k) with redis-cli --cluster reshard 0.0.0.0:7000.
3. The app raises SlotNotCoveredError quite often while the slot migration is in progress; once it finishes, the errors stop.
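For step 1, this is roughly the kind of app I mean. It is a hypothetical sketch, not my actual service: the host, port, key names, and pipeline size are placeholders. It just hammers a local cluster with read-only pipelines while read_from_replicas=True, so reads go through the round-robin load balancer discussed below.

```python
import asyncio

from redis.asyncio.cluster import RedisCluster


async def main() -> None:
    # Placeholder connection details for a local cluster started on port 7000.
    client = RedisCluster(host="127.0.0.1", port=7000, read_from_replicas=True)
    await client.initialize()  # load the initial slots cache
    try:
        while True:
            # A pipeline of read-only commands; while a reshard is running,
            # execute() intermittently raises SlotNotCoveredError.
            async with client.pipeline() as pipe:
                for i in range(20):
                    pipe.hgetall(f"key:{i}")
                await pipe.execute()
    finally:
        await client.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Run it in one terminal, start the reshard from step 2 in another, and the errors show up until the migration completes.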
Possible root cause
I tried debugging the library and here's what I could find so far:
The slots cache is outdated: the slot we need now lives in a different shard. On a MOVED error the client replaces self.slots_cache[slot] with just the redirected_node returned by the error, which is only the master node of the slot's new shard (https://github.com/redis/redis-py/blob/master/redis/asyncio/cluster.py#L1177):
```python
else:
    # The new slot owner is a new server, or a server from a different
    # shard. We need to remove all current nodes from the slot's list
    # (including replications) and add just the new node.
    self.slots_cache[e.slot_id] = [redirected_node]
```
However, at the same time another task may call NodesManager.get_node_from_slot. Since we only read from replicas, it starts by getting a server index from the round-robin load balancer. Say each shard has 1 master and 2 replicas: the balancer can return an index from 0 to 2. But, as shown above, the slots cache for this specific slot now holds a single entry (the master node), so if the load balancer returns 1 or 2 the lookup raises an IndexError, which is surfaced as a SlotNotCoveredError (https://github.com/redis/redis-py/blob/master/redis/asyncio/cluster.py#L1191C1-L1204C14). A toy model of the race follows the snippet below.
```python
try:
    if read_from_replicas:
        # get the server index in a Round-Robin manner
        primary_name = self.slots_cache[slot][0].name
        node_idx = self.read_load_balancer.get_server_index(
            primary_name, len(self.slots_cache[slot])
        )
        return self.slots_cache[slot][node_idx]
    return self.slots_cache[slot][0]
except (IndexError, TypeError):
    raise SlotNotCoveredError(
        f'Slot "{slot}" not covered by the cluster. '
        f'"require_full_coverage={self.require_full_coverage}"'
    )
```
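To make the race concrete, here is a small self-contained toy model (not redis-py code; the function only mimics the load balancer's per-primary counter) showing how an index handed out for a three-node slot list can overflow the cache once the MOVED handler shrinks it to a single node:

```python
# Toy model of the race (illustrative only, not the library's actual classes).
rr_state = {}  # per-primary round-robin counter


def get_server_index(primary: str, list_size: int) -> int:
    idx = rr_state.setdefault(primary, 0)
    rr_state[primary] = (idx + 1) % list_size  # advance for the next caller
    return idx


slot_nodes = ["primary", "replica-1", "replica-2"]  # slots_cache[slot] before the reshard

get_server_index("shard-a", len(slot_nodes))  # -> 0 (primary)
get_server_index("shard-a", len(slot_nodes))  # -> 1 (replica-1); next stored index is 2

# Meanwhile a MOVED redirect rewrites the cache with only the new primary:
slot_nodes = ["new-primary"]

idx = get_server_index("shard-a", len(slot_nodes))  # -> 2, stored before the shrink
slot_nodes[idx]  # IndexError, surfaced as SlotNotCoveredError
```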
To work around the issue I patched the method to check whether the index returned by the load balancer actually exists in the slots cache and, if it doesn't, to fall back to the primary node instead.
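Roughly, the patched method looks like this. It is a simplified sketch of my local workaround, not the upstream code; the signature and surrounding details are abbreviated.

```python
def get_node_from_slot(self, slot, read_from_replicas=False):
    try:
        if read_from_replicas:
            # get the server index in a Round-Robin manner
            primary_name = self.slots_cache[slot][0].name
            node_idx = self.read_load_balancer.get_server_index(
                primary_name, len(self.slots_cache[slot])
            )
            if node_idx < len(self.slots_cache[slot]):
                return self.slots_cache[slot][node_idx]
            # The index comes from stale round-robin state; the cache now only
            # holds the new primary, so fall through and read from it instead.
        return self.slots_cache[slot][0]
    except (IndexError, TypeError):
        raise SlotNotCoveredError(
            f'Slot "{slot}" not covered by the cluster. '
            f'"require_full_coverage={self.require_full_coverage}"'
        )
```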
Happy to create a PR if this makes sense. Sorry if I'm missing something.
Thanks in advance 🙌