Commit
I have a hunch why our connections keep dropping.

`rethinkdbdash` uses connection pooling: it keeps a certain number of connections to the database constantly open and reuses them, so you only incur the connection overhead once and every subsequent query just reuses an existing connection.

By default, `rethinkdbdash` sets the minimum number of open connections to 50, the maximum number of connections to 1000 and the idle time before a connection is shut down to 60 minutes. Those are very sensible defaults, as 1000 TCP connections are easily handled by any server, and once they're established why not keep them around? By having a minimum you can quickly serve the common use case, and when a spike of traffic comes in you create a ton of new connections and then keep them around if the traffic spike continues.

Our issue is that we, at any given moment, have 4+ instances of Iris plus 3 other workers running. Assuming we hit a traffic spike, that means at least 7 * 1000 = 7000 TCP connections to a single proxy, which I think is just too much.

This patch adjusts the `rethinkdbdash` connection pooling options so that workers keep 10 connections open by default when nothing is going on and open at most 100 at peak traffic time. (Since they aren't time critical, 100 should be more than enough.) They also shut idle connections down after one minute rather than one hour.

Iris, on the other hand, is now limited to a maximum of 500 connections, keeping the default minimum of 50 open at any given time. Those connections also shut down after one minute rather than one hour.

This means that even if all our instances of Iris get slammed for a second, it'll at most be slow for one minute, since after that `rethinkdbdash` will kill the unnecessary connections and things'll go back to fine again.

----

Note: I have no fucking clue if these numbers are correct, nor how this will impact the performance of our stuff.
I want to test this out to figure out if it helps; if necessary we can always adjust these numbers again (or go back to the default ones). Let's keep a close eye on performance after launching these changes.

I'd like to get today's other performance patches out first, so let's let this sit for a day or so, so we can see what impact those patches had on loading times in Apex Ping.

For more information about the `rethinkdbdash` options, see https://github.com/neumino/rethinkdbdash#importing-the-driver
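
For reference, a rough sketch of the numbers above, not the actual diff. It assumes the pool options are the ones the linked README calls `buffer` (minimum open connections), `max` (maximum connections) and `timeoutGb` (idle time in milliseconds before a connection is closed). The `worstCaseConnections` helper and the variable names are made up for illustration, and 4 Iris instances is only the lower bound of "4+".

```javascript
// Pool settings described in this commit, using the option names from the
// rethinkdbdash README (assumed: buffer, max, timeoutGb in milliseconds).
const ONE_MINUTE_MS = 60 * 1000;

const defaultPool = { buffer: 50, max: 1000, timeoutGb: 60 * ONE_MINUTE_MS };
const workerPool = { buffer: 10, max: 100, timeoutGb: ONE_MINUTE_MS };
const irisPool = { buffer: 50, max: 500, timeoutGb: ONE_MINUTE_MS };

// Hypothetical helper: upper bound on simultaneously open connections if
// every process in the fleet fills its pool at the same time.
function worstCaseConnections(fleet) {
  return fleet.reduce(
    (total, { instances, pool }) => total + instances * pool.max,
    0
  );
}

// Before this patch: 4 Iris instances and 3 workers, all on the defaults.
const before = worstCaseConnections([
  { instances: 4, pool: defaultPool },
  { instances: 3, pool: defaultPool },
]);

// After this patch: Iris capped at 500, workers capped at 100.
const after = worstCaseConnections([
  { instances: 4, pool: irisPool },
  { instances: 3, pool: workerPool },
]);

console.log({ before, after }); // { before: 7000, after: 2300 }

// In a service, the options object would be passed when importing the
// driver, e.g.: const r = require('rethinkdbdash')(workerPool);
```

With these assumed counts the worst case drops from 7000 to 2300 connections, and any extra connections opened during a spike are closed after one idle minute instead of one hour.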