RedisCluster client failing to reconnect to AWS Elasticache cluster after node failover #3284


Closed
avnandu opened this issue Jun 19, 2024 · 1 comment



avnandu commented Jun 19, 2024

Version: redis-py version 5.0.4

Platform: Python 3.9

Description: After node failovers, our RedisCluster clients sometimes fail to reconnect to the cluster. When we run a test failover (failing over a single primary node once), clients are usually able to reconnect after a short period of ConnectionError/TimeoutError. However, when we perform a node type upgrade, or any other action that causes every node in the cluster to fail over, the clients persistently fail to reconnect and throw RedisClusterException: Redis Cluster cannot be connected. Please provide at least one reachable node: <None, or some IP, Timeout connecting to server>. We also see persistent TimeoutError while this is happening. Our Elasticache redis cluster instances are running redis engine 6.x. We are wondering whether we are configuring the client incorrectly in some way. The client is only initialized once, so we are not creating new clients/connections for each redis command.
Example of how we're initializing the client:

        # Inside our client factory; RedisCluster, Retry, and FullJitterBackoff
        # come from redis.cluster, redis.retry, and redis.backoff respectively.
        args.update(
            host=self.host,  # Elasticache endpoint
            port=self.port,  # default redis port (6379)
            read_from_replicas=True,
            retry=self.retry,  # Retry(backoff=FullJitterBackoff(), retries=7)
            ssl=self.ssl,
            ssl_cert_reqs=None,
        )
        return RedisCluster(**args)

Are there any other params to the client that we need to include? I was wondering if dynamic_startup_nodes could have something to do with the issue.
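For reference, here is a minimal, self-contained sketch of what I had in mind to try, with dynamic_startup_nodes=False so the original DNS endpoint stays in the startup-node list and can be re-resolved after a full-cluster failover. The endpoint hostname below is just a placeholder, and I'm not sure this setting is actually the right fix:

    from redis.backoff import FullJitterBackoff
    from redis.cluster import RedisCluster
    from redis.retry import Retry

    # Placeholder ElastiCache cluster configuration endpoint (not our real one).
    HOST = "my-cluster.xxxxxx.clustercfg.use1.cache.amazonaws.com"

    client = RedisCluster(
        host=HOST,
        port=6379,
        ssl=True,
        ssl_cert_reqs=None,
        read_from_replicas=True,
        retry=Retry(backoff=FullJitterBackoff(), retries=7),
        # Defaults to True, which replaces the DNS endpoint in the startup-node
        # list with the discovered node IPs; False keeps the endpoint so the
        # client can re-resolve it when every node IP changes during a failover.
        dynamic_startup_nodes=False,
    )

    # Simple smoke test against the cluster.
    client.set("example-key", "example-value")
    print(client.get("example-key"))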

@petyaslavova (Collaborator) commented

Related to issue #3604

The same type of use case is described in the issue referenced above, along with possible resolutions.
Closing this issue for now. Please feel free to reopen it if further assistance is needed.
