I can perform the same rolling restart on PgBouncer, with the same settings where applicable, without any errors.
To Reproduce
Steps to reproduce the behavior:
Deploy pgcat on a Kubernetes cluster.
Configure client-side connection pooling in the application with a connection recycle (i.e., lifetime) period that is shorter than pgcat's graceful termination period. The application should be a Python application using SQLAlchemy, with pre-ping disabled.
After the application has connected, perform a rolling deploy.
Expected behavior
Rolling restart should not produce any connection errors on the client's side.
Additional context
I believe this issue stems from the lack of a pessimistic connection health check ("pre-ping") in the codebase I work with. SQLAlchemy (a Python ORM) can issue a health-check query whenever a connection is checked out of its connection pool, but this setting is not enabled by default and my codebase does not use it.
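To make the pre-ping idea concrete, here is a minimal, stdlib-only toy model of a client-side pool with and without a checkout health check. All class and function names are hypothetical, and this is not SQLAlchemy's implementation; in SQLAlchemy itself the corresponding knobs are the `pool_pre_ping` and `pool_recycle` arguments to `create_engine`. The sketch assumes the pooler has silently closed an idle connection that the client pool still holds.

```python
# Hypothetical, stdlib-only toy model of a client-side connection pool.
# pre_ping=True loosely mirrors SQLAlchemy's pool_pre_ping; this is not its code.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyConnection:
    conn_id: int
    server_closed: bool = False  # set when the pooler (e.g. pgcat) drops the connection

    def ping(self) -> bool:
        # Stands in for a cheap "SELECT 1" round trip.
        return not self.server_closed

    def execute(self, sql: str) -> str:
        if self.server_closed:
            raise ConnectionError(f"connection {self.conn_id} was closed by the server")
        return f"ok: {sql}"


@dataclass
class ToyPool:
    pre_ping: bool
    idle: List[ToyConnection] = field(default_factory=list)
    next_id: int = 0

    def _connect(self) -> ToyConnection:
        self.next_id += 1
        return ToyConnection(self.next_id)

    def checkout(self) -> ToyConnection:
        while self.idle:
            conn = self.idle.pop()
            # Without pre-ping the cached connection is handed out blindly.
            if not self.pre_ping or conn.ping():
                return conn
            # Pre-ping failed: drop the dead connection and try the next one.
        return self._connect()

    def checkin(self, conn: ToyConnection) -> None:
        self.idle.append(conn)


def demo(pre_ping: bool) -> str:
    pool = ToyPool(pre_ping=pre_ping)
    conn = pool.checkout()
    pool.checkin(conn)
    conn.server_closed = True  # the pooler closes the idle connection during shutdown
    try:
        return pool.checkout().execute("SELECT 1")
    except ConnectionError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(demo(pre_ping=False))  # error: connection 1 was closed by the server
    print(demo(pre_ping=True))   # ok: SELECT 1
```

The point of the model is that, without pre-ping, a connection closed by the pooler only surfaces as an error when the application uses it; pre-ping trades one extra round trip per checkout for catching it before the application sees it.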
Assuming this is the issue, I am not sure whether I would consider this a bug or a feature request. Ideally there would be an option to let clients disconnect only when they initiate it, which I believe is what pgbouncer does. In pgcat, on a shutdown request, I see messages like the following:
I have the connection lifetime in SQLAlchemy set to 120 seconds, so seeing clients disconnect earlier than that makes me think pgcat is initiating the disconnect. In pgbouncer, I see that it takes the full 2 minutes for clients to drain. I would expect to see something like this:
Essentially, I was expecting graceful shutdown to implement pgbouncer's SHUTDOWN WAIT_FOR_CLIENTS behavior by default:
Stop accepting new connections and shutdown the process once all existing clients have disconnected.
This is based on the zero-downtime example that follows that passage in pgbouncer's documentation:
Run SHUTDOWN WAIT_FOR_CLIENTS (or send SIGTERM) to process A.
Cause all clients to reconnect. Possibly by waiting some time until the client side pooler causes reconnects due to its server_idle_timeout (or similar config). Or if no client side pooler is used, possibly by restarting the clients. Once all clients have reconnected. Process A will exit automatically, because no clients are connected to it anymore.
It seems to me that pgbouncer simply waits for clients to disconnect on their own terms; as far as I can tell, it does not notify clients of the shutdown. This may be less efficient, but it seems like a much safer default.
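The drain behavior described above can be sketched as a tiny state machine. This is a hypothetical, stdlib-only illustration of the documented WAIT_FOR_CLIENTS semantics as quoted in this issue (stop accepting, never close existing clients, exit when none remain); the class and method names are made up and it is not pgbouncer or pgcat code.

```python
# Hypothetical, stdlib-only toy model of the SHUTDOWN WAIT_FOR_CLIENTS semantics:
# stop accepting new clients, never close existing ones, and exit once the last
# client has disconnected on its own. Not pgbouncer or pgcat code.
from typing import Set


class ToyProxy:
    def __init__(self) -> None:
        self.clients: Set[str] = set()
        self.accepting = True

    def connect(self, client: str) -> None:
        if not self.accepting:
            raise ConnectionRefusedError(f"{client}: proxy is shutting down")
        self.clients.add(client)

    def disconnect(self, client: str) -> None:
        # Only ever called on the client's initiative, e.g. when its pool recycles
        # the connection; the proxy itself never closes a client here.
        self.clients.discard(client)

    def shutdown_wait_for_clients(self) -> None:
        # Stop accepting; existing clients keep their connections untouched.
        self.accepting = False

    @property
    def exited(self) -> bool:
        # The process may exit only when shutting down and no clients remain.
        return not self.accepting and not self.clients


if __name__ == "__main__":
    proxy = ToyProxy()
    proxy.connect("app-1")
    proxy.connect("app-2")
    proxy.shutdown_wait_for_clients()
    print(proxy.exited)  # False: two clients still connected
    proxy.disconnect("app-1")
    proxy.disconnect("app-2")
    print(proxy.exited)  # True: drained without closing any client
```

In a Kubernetes rolling deploy this means the old pod stays up until every client has recycled its connection, so the graceful termination period has to be at least as long as the client-side connection lifetime.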
From what I understand of pgcat's code (which is very little, since I do not know Rust), it looks like you are explicitly sending a termination message to a client if it is idle. This would break clients that do not handle that message and expect to check out a connection that is in good health.
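The difference between the two shutdown policies can be compared in a small hypothetical model. Here "terminate_idle" encodes the reading of pgcat's behavior given above (not verified against pgcat's source), "wait_for_clients" encodes pgbouncer's documented behavior, and the client side is assumed to be a pool without pre-ping that reuses cached connections blindly. All names are illustrative, and active clients are deliberately left untouched to keep the model small.

```python
# Hypothetical, stdlib-only comparison of two pooler shutdown policies.
# "terminate_idle" is this issue's reading of pgcat (unverified);
# "wait_for_clients" mirrors pgbouncer's documented SHUTDOWN WAIT_FOR_CLIENTS.
from dataclasses import dataclass
from typing import List


@dataclass
class ClientConn:
    name: str
    idle: bool         # True if the client is between transactions
    open: bool = True  # False once the pooler has closed this connection


def shutdown(conns: List[ClientConn], policy: str) -> None:
    if policy == "terminate_idle":
        for c in conns:
            if c.idle:
                c.open = False  # modeled as: pooler sends a termination message and closes
    elif policy != "wait_for_clients":
        raise ValueError(f"unknown policy: {policy}")
    # "wait_for_clients": every connection stays open until the client closes it.


def failed_checkouts(policy: str, idle_flags: List[bool]) -> int:
    """Count clients whose next query fails because their pooled connection is gone.

    Assumes a client-side pool without pre-ping, so a connection closed by the
    pooler only surfaces as an error when the application uses it.
    """
    conns = [ClientConn(f"client-{i}", idle) for i, idle in enumerate(idle_flags)]
    shutdown(conns, policy)
    return sum(1 for c in conns if not c.open)


if __name__ == "__main__":
    flags = [True, False, True]  # two idle clients, one mid-transaction
    print(failed_checkouts("terminate_idle", flags))    # 2
    print(failed_checkouts("wait_for_clients", flags))  # 0
```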
Correct me if I am completely misunderstanding how the code works.
Describe the bug
Performing a rolling restart of Kubernetes-deployed pgcat results in a portion of client connections failing with the error message: