SlotNotCoveredError when cluster is resharding #2988


Closed
dinispeixoto opened this issue Oct 6, 2023 · 3 comments · May be fixed by #3182

Comments

@dinispeixoto

dinispeixoto commented Oct 6, 2023

Version:

  • Client (redis-py): 4.5.5
  • Cluster Engine: 6.2.6

Platform:

  • Python 3.8 on ECS, with ElastiCache

Description:

When the cluster is scaling up or down and slots are being migrated to a new shard, the client raises the following error:

redis.exceptions.SlotNotCoveredError: Command # 9 (HGETALL ...) of pipeline caused error: ('Slot "4718" not covered by the cluster. "require_full_coverage=True"',)

Full stack trace:

  File "/root/clients/redis_client.py", line 219, in run_queries
    return await pipeline.execute()
  File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 1455, in execute
    return await self._execute(
  File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 1536, in _execute
    raise result
  File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 1520, in _execute
    cmd.result = await client.execute_command(
  File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 725, in execute_command
    raise e
  File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 696, in execute_command
    ret = await self._execute_command(target_nodes[0], *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 745, in _execute_command
    target_node = self.nodes_manager.get_node_from_slot(
  File "/usr/local/lib/python3.8/dist-packages/redis/asyncio/cluster.py", line 1174, in get_node_from_slot
    raise SlotNotCoveredError(
redis.exceptions.SlotNotCoveredError: Command # 6 (HGET ... ...) of pipeline caused error: ('Slot "11246" not covered by the cluster. "require_full_coverage=True"',)

Reproduce locally
It's also fairly easy to reproduce this locally:

  1. Set up a Redis cluster locally (https://github.com/Grokzen/docker-redis-cluster).
  2. Create a simple Redis app that runs a pipeline of read-only commands and put it under a load test to force many reads (see the sketch after this list). I also kept a separate client writing non-stop to the local cluster.
  3. Force a resharding of a few slots (e.g. 5k) with redis-cli --cluster reshard 0.0.0.0:7000.
  4. The app raises SlotNotCoveredError quite often while the slots are being migrated; once the migration finishes, the errors stop.
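
For step 2, a minimal sketch of such a read-only pipeline loop (the address, key names and iteration counts are just placeholders, not what I ran):

import asyncio

from redis.asyncio.cluster import RedisCluster


async def main():
    # Placeholder address: one node of the local docker-redis-cluster setup.
    client = RedisCluster(host="0.0.0.0", port=7000, read_from_replicas=True)
    await client.initialize()

    # Keep issuing read-only pipelines while a separate client writes
    # non-stop and redis-cli --cluster reshard migrates slots.
    for _ in range(100_000):
        pipe = client.pipeline()
        for i in range(10):
            pipe.hgetall(f"key:{i}")
        await pipe.execute()

    await client.close()


asyncio.run(main())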

Possible root cause
I tried debugging the library and here's what I could find so far:

  1. The slots cache is outdated: the slot we need now lives in a different shard. The client replaces self.slots_cache[slot] with [redirected_node] (taken from the MOVED error), which contains only the master node of the slot's new shard.

https://github.com/redis/redis-py/blob/master/redis/asyncio/cluster.py#L1177

else:
    # The new slot owner is a new server, or a server from a different
    # shard. We need to remove all current nodes from the slot's list
    # (including replications) and add just the new node.
    self.slots_cache[e.slot_id] = [redirected_node]

  2. However, at the same time we might call NodesManager.get_node_from_slot. Since we are only reading from replicas, it starts by getting a server index from the round-robin load balancer. With 1 master and 2 replicas per shard, that index can be anywhere from 0 to 2. But, as we saw above, for this slot the slots cache now holds a single entry (the master node), so whenever the load balancer returns 1 or 2 the lookup raises an IndexError, which surfaces as a SlotNotCoveredError (a toy illustration follows the snippet below).

https://github.com/redis/redis-py/blob/master/redis/asyncio/cluster.py#L1191C1-L1204C14

        try:
            if read_from_replicas:
                # get the server index in a Round-Robin manner
                primary_name = self.slots_cache[slot][0].name
                node_idx = self.read_load_balancer.get_server_index(
                    primary_name, len(self.slots_cache[slot])
                )
                return self.slots_cache[slot][node_idx]
            return self.slots_cache[slot][0]
        except (IndexError, TypeError):
            raise SlotNotCoveredError(
                f'Slot "{slot}" not covered by the cluster. '
                f'"require_full_coverage={self.require_full_coverage}"'
            )
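
To make the race concrete, here's a toy illustration with made-up values (not library code):

slots_cache = {11246: ["primary", "replica-1", "replica-2"]}  # before the redirect

# A concurrent MOVED redirect shrinks the entry to just the new primary...
slots_cache[11246] = ["new-primary"]

# ...but the round-robin load balancer can still hand back an index of 1 or 2
# that it stored while the slot had three nodes, so the lookup blows up with
# IndexError, which get_node_from_slot re-raises as SlotNotCoveredError.
node_idx = 2
node = slots_cache[11246][node_idx]  # IndexError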

  3. To fix the issue, I patched the method to check whether the index returned by the load balancer actually exists in the slots cache, and to fall back to the primary node when it doesn't (a separate application-side stopgap is sketched after the snippet):
                node_idx = self.read_load_balancer.get_server_index(
                    primary_name, len(self.slots_cache[slot])
                )

                # The slots cache may have just been shrunk to only the new
                # primary by a concurrent MOVED redirect, so clamp the
                # round-robin index and fall back to the primary (index 0)
                # instead of raising IndexError.
                node_idx = node_idx if node_idx < len(self.slots_cache[slot]) else 0
                return self.slots_cache[slot][node_idx]
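
Until a fix lands in the library, a possible application-side stopgap (purely an illustrative sketch; the helper name and retry parameters are made up) is to rebuild and retry the pipeline whenever SlotNotCoveredError is raised, since the errors stop once the client has processed the redirects and refreshed its slots cache:

import asyncio

from redis.exceptions import SlotNotCoveredError


async def run_with_retry(build_and_execute, attempts=5, delay=0.1):
    # build_and_execute is any zero-argument coroutine function that creates
    # a fresh pipeline, queues the commands and awaits execute().
    for attempt in range(attempts):
        try:
            return await build_and_execute()
        except SlotNotCoveredError:
            if attempt == attempts - 1:
                raise
            # Give the client a moment to process the MOVED redirects and
            # refresh its slots cache before rebuilding the pipeline.
            await asyncio.sleep(delay)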

Happy to create a PR if this makes sense. Sorry if I'm missing something.

Thanks in advance 🙌

@vishalwadhwa13

vishalwadhwa13 commented Jan 25, 2024

Facing this same issue with autoscaling Redis in v5.0.1

@noorul

noorul commented Mar 26, 2025

@dinispeixoto How did you solve this in your application, since it is yet to be fixed?

@petyaslavova
Collaborator

Fixed with PR #3621
