[DOCS] Fine-tunes connection pool and selectors chapters.
szabosteve committed Apr 17, 2020
1 parent 88b7e1c commit 2287b8c
Showing 2 changed files with 124 additions and 79 deletions.
137 changes: 82 additions & 55 deletions docs/connection-pool.asciidoc
@@ -1,37 +1,44 @@
[[connection_pool]]
== Connection Pool

The connection pool is an object inside the client that is responsible for maintaining the current list of nodes.
Theoretically, nodes are either dead or alive.

However, in the real world, things are never so clear. Nodes are sometimes in a gray-zone of _"probably dead but not
confirmed"_, _"timed-out but unclear why"_ or _"recently dead but now alive"_. The connection pool's job is to
manage this set of unruly connections and try to provide the best behavior to the client.

If a connection pool is unable to find an alive node to query against, it will return a `NoNodesAvailableException`.
This is distinct from an exception due to maximum retries. For example, your cluster may have 10 nodes. You execute
a request and 9 out of the 10 nodes fail due to connection timeouts. The tenth node succeeds and the query executes.
The first nine nodes will be marked dead (depending on the connection pool being used) and their "dead" timers will begin
The connection pool is an object inside the client that is responsible for
maintaining the current list of nodes. Theoretically, nodes are either dead or
alive. However, in the real world, things are never so clear. Nodes are
sometimes in a gray-zone of _"probably dead but not confirmed"_, _"timed-out but
unclear why"_ or _"recently dead but now alive"_. The job of the connection pool
is to manage this set of unruly connections and try to provide the best behavior
to the client.

If a connection pool is unable to find an alive node to query against, it
returns a `NoNodesAvailableException`. This is distinct from an exception due to
maximum retries. For example, your cluster may have 10 nodes. You execute a
request and 9 out of the 10 nodes fail due to connection timeouts. The tenth
node succeeds and the query executes. The first nine nodes are marked dead
(depending on the connection pool being used) and their "dead" timers begin
ticking.

When the next request is sent to the client, nodes 1-9 are still considered "dead", so they will be skipped. The request
is sent to the only known alive node (#10), and if this node fails, a `NoNodesAvailableException` is returned. You'll note
this is much less than the `retries` value, because `retries` only applies to retries against alive nodes. In this case,
only one node is known to be alive, so `NoNodesAvailableException` is returned.

When the next request is sent to the client, nodes 1-9 are still considered
"dead", so they are skipped. The request is sent to the only known alive node
(#10), and if this node fails, a `NoNodesAvailableException` is returned. Note
that this happens after far fewer attempts than the `retries` value allows,
because `retries` only applies to retries against alive nodes. In this case,
only one node is known to be alive, so `NoNodesAvailableException` is returned.
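
The difference between the two failure modes can be illustrated with a minimal
sketch. It is a simplified model rather than the client's actual
implementation: the `performRequest()` function, the `NoNodesAvailable` class,
and the `$alive` map are hypothetical names introduced only for this example.

[source,php]
----
// Hypothetical model, not the client's real classes: a pool that only
// retries against nodes it currently believes are alive.
final class NoNodesAvailable extends RuntimeException {}

/**
 * @param array<string,bool>    $alive node name => believed alive
 * @param callable(string):bool $send  returns true when the request succeeds
 */
function performRequest(array &$alive, callable $send, int $retries): string
{
    for ($attempt = 0; $attempt <= $retries; $attempt++) {
        $candidates = array_keys(array_filter($alive));
        if ($candidates === []) {
            // No alive node is left, so the retry budget is irrelevant.
            throw new NoNodesAvailable('No alive nodes found in the pool');
        }
        $node = $candidates[0];
        if ($send($node)) {
            return $node;
        }
        $alive[$node] = false; // mark dead; a real pool also starts a timer
    }
    throw new RuntimeException('Maximum retries exceeded');
}
----

With nine nodes already marked dead and one alive node that fails, this sketch
throws `NoNodesAvailable` after a single attempt, even if `$retries` is set to
a larger value such as 5.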

There are several connection pool implementations that you can choose from:


=== staticNoPingConnectionPool (default)

This connection pool maintains a static list of hosts, which are assumed to be alive when the client initializes. If
a node fails a request, it is marked as `dead` for 60 seconds and the next node is tried. After 60 seconds, the node
is revived and put back into rotation. Each additional failed request will cause the dead timeout to increase exponentially.
This connection pool maintains a static list of hosts, which are assumed to be
alive when the client initializes. If a node fails a request, it is marked as
`dead` for 60 seconds and the next node is tried. After 60 seconds, the node is
revived and put back into rotation. Each additional failed request causes the
dead timeout to increase exponentially.

A successful request will reset the "failed ping timeout" counter.
A successful request resets the "failed ping timeout" counter.
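
The growth of the dead timeout can be sketched as follows. The 60-second base
comes from the description above. The doubling factor and the one-hour cap are
assumptions made for illustration, and `deadTimeout()` is a hypothetical
helper, not part of the client.

[source,php]
----
// Assumed constants: only the 60-second base is stated in the text above.
const BASE_TIMEOUT_SECONDS = 60;
const MAX_TIMEOUT_SECONDS = 3600; // illustrative cap

function deadTimeout(int $consecutiveFailures): int
{
    // First failure => 60 s, second => 120 s, third => 240 s, and so on.
    $timeout = BASE_TIMEOUT_SECONDS * (2 ** max(0, $consecutiveFailures - 1));

    return min($timeout, MAX_TIMEOUT_SECONDS);
}
----

Under these assumptions, three failures in a row yield timeouts of 60, 120, and
240 seconds. A successful request resets the failure count, so the next failure
starts again at 60 seconds.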

If you wish to explicitly set the `StaticNoPingConnectionPool` implementation, you may do so with the `setConnectionPool()`
method of the ClientBuilder object:
If you wish to explicitly set the `StaticNoPingConnectionPool` implementation,
you may do so with the `setConnectionPool()` method of the ClientBuilder object:

[source,php]
----
@@ -42,10 +49,13 @@ $client = ClientBuilder::create()

Note that the implementation is specified via a namespace path to the class.


=== staticConnectionPool

Identical to the `StaticNoPingConnectionPool`, except it pings nodes before they are used to determine if they are alive.
This may be useful for long-running scripts, but tends to be additional overhead that is unnecessary for average PHP scripts.
Identical to the `StaticNoPingConnectionPool`, except it pings nodes before they
are used to determine if they are alive. This may be useful for long-running
scripts but tends to be additional overhead that is unnecessary for average PHP
scripts.

To use the `StaticConnectionPool`:

@@ -58,13 +68,15 @@ $client = ClientBuilder::create()

Note that the implementation is specified via a namespace path to the class.


=== simpleConnectionPool

The `SimpleConnectionPool` simply returns the next node as specified by the Selector; it does not perform track
the "liveness" of nodes. This pool will return nodes whether they are alive or dead. It is just a simple pool of static
hosts.
The `SimpleConnectionPool` returns the next node as specified by the selector;
it does not track node conditions. It returns nodes whether they are dead or
alive. It is a simple pool of static hosts.

The `SimpleConnectionPool` is not recommended for routine use, but it may be a useful debugging tool.
The `SimpleConnectionPool` is not recommended for routine use, but it may be a
useful debugging tool.

To use the `SimpleConnectionPool`:

@@ -77,11 +89,13 @@ $client = ClientBuilder::create()

Note that the implementation is specified via a namespace path to the class.


=== sniffingConnectionPool

Unlike the two previous static connection pools, this one is dynamic. The user provides a seed list of hosts, which the
client uses to "sniff" and discover the rest of the cluster. It achieves this through the Cluster State API. As new
nodes are added or removed from the cluster, the client will update it's pool of active connections.
Unlike the two previous static connection pools, this one is dynamic. The user
provides a seed list of hosts, which the client uses to "sniff" and discover the
rest of the cluster by using the Cluster State API. As new nodes are added or
removed from the cluster, the client updates its pool of active connections.

To use the `SniffingConnectionPool`:

@@ -97,7 +111,8 @@ Note that the implementation is specified via a namespace path to the class.

=== Custom Connection Pool

If you wish to implement your own custom Connection Pool, your class must implement `ConnectionPoolInterface`:
If you wish to implement your own custom Connection Pool, your class must
implement `ConnectionPoolInterface`:

[source,php]
----
@@ -124,7 +139,9 @@ class MyCustomConnectionPool implements ConnectionPoolInterface
}
----

You can then instantiate an instance of your ConnectionPool and inject it into the ClientBuilder:

You can then instantiate an instance of your ConnectionPool and inject it into
the ClientBuilder:

[source,php]
----
@@ -135,9 +152,11 @@ $client = ClientBuilder::create()
->build();
----

If your connection pool only makes minor changes, you may consider extending `AbstractConnectionPool`, which provides
some helper concrete methods. If you choose to go down this route, you need to make sure your ConnectionPool's implementation
has a compatible constructor (since it is not defined in the interface):
If your connection pool only makes minor changes, you may consider extending
`AbstractConnectionPool`, which provides some concrete helper methods. If you
choose to go down this route, you need to make sure your ConnectionPool
implementation has a compatible constructor (since it is not defined in the
interface):

[source,php]
----
@@ -169,7 +188,9 @@ class MyCustomConnectionPool extends AbstractConnectionPool implements Connectio
}
----

If your constructor matches AbstractConnectionPool, you may use either object injection or namespace instantiation:

If your constructor matches AbstractConnectionPool, you may use either object
injection or namespace instantiation:

[source,php]
----
@@ -184,21 +205,27 @@ $client = ClientBuilder::create()

=== Which connection pool to choose? PHP and connection pooling

At first glance, the `sniffingConnectionPool` implementation seems superior. For many languages, it is. In PHP, the
conversation is a bit more nuanced.

Because PHP is a share-nothing architecture, there is no way to maintain a connection pool across script instances.
This means that every script is responsible for creating, maintaining, and destroying connections everytime the script
is re-run.

Sniffing is a relatively lightweight operation (one API call to `/_cluster/state`, followed by pings to each node) but
it may be a non-negligible overhead for certain PHP applications. The average PHP script will likely load the client,
execute a few queries and then close. Imagine this script being called 1000 times per second: the sniffing connection
pool will perform the sniffing and pinging process 1000 times per second. The sniffing process will add a large
amount of overhead

In reality, if your script only executes a few queries, the sniffing concept is _too_ robust. It tends to be more
useful in long-lived processes which potentially "out-live" a static list.

For this reason the default connection pool is currently the `staticNoPingConnectionPool`. You can, of course, change
this default - but we strongly recommend you load test and verify that it does not negatively impact your performance.
At first glance, the `sniffingConnectionPool` implementation seems superior. For
many languages, it is. In PHP, the conversation is a bit more nuanced.

Because PHP is a share-nothing architecture, there is no way to maintain a
connection pool across script instances. This means that every script is
responsible for creating, maintaining, and destroying connections every time the
script is re-run.

Sniffing is a relatively lightweight operation (one API call to
`/_cluster/state`, followed by pings to each node) but it may be a
non-negligible overhead for certain PHP applications. The average PHP script
likely loads the client, executes a few queries, and then closes. Imagine this
script being called 1000 times per second: the sniffing connection pool
performs the sniffing and pinging process 1000 times per second. The sniffing
process then adds a large amount of overhead.
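
A back-of-the-envelope calculation makes the cost concrete. The sketch below
assumes that each script run triggers exactly one `/_cluster/state` call plus
one ping per node. The `sniffingCallsPerSecond()` helper and the 10-node
cluster size are hypothetical values chosen for illustration.

[source,php]
----
// Assumption: one cluster state call plus one ping per node, per script run.
function sniffingCallsPerSecond(int $scriptsPerSecond, int $nodeCount): int
{
    return $scriptsPerSecond * (1 + $nodeCount);
}

// 1000 script runs per second against a 10-node cluster.
echo sniffingCallsPerSecond(1000, 10), PHP_EOL; // 11000
----

Under these assumptions, the cluster receives 11,000 sniffing-related calls per
second before any real query is executed.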

In reality, if your script only executes a few queries, the sniffing concept is
_too_ robust. It tends to be more useful in long-lived processes which
potentially "out-live" a static list.

For this reason, the default connection pool is currently the
`staticNoPingConnectionPool`. You can, of course, change this default, but we
strongly recommend that you load test the change and verify that it does not
negatively impact performance.
66 changes: 42 additions & 24 deletions docs/selectors.asciidoc
@@ -1,19 +1,24 @@
[[selectors]]
== Selectors

The connection pool maintains the list of connections, and decides when nodes should transition from alive to dead (and
vice versa). It has no logic to choose connections, however. That job belongs to the Selector class.
The connection pool maintains the list of connections, and decides when nodes
should transition from alive to dead (and vice versa). It has no logic to choose
connections, however. That job belongs to the selector class.

The job of a selector is to return a single connection from a provided array of
connections. Like the connection pool, there are several implementations to
choose from.

The selector's job is to return a single connection from a provided array of connections. Like the Connection Pool,
there are several implementations to choose from.

=== RoundRobinSelector (Default)

This selector returns connections in a round-robin fashion. Node #1 is selected on the first request, Node #2 on
the second request, etc. This ensures an even load of traffic across your cluster. Round-robin'ing happens on a
per-request basis (e.g. sequential requests go to different nodes).
This selector returns connections in a round-robin fashion. Node #1 is selected
on the first request, Node #2 on the second request, and so on. This ensures an
even load of traffic across your cluster. Round-robining happens on a
per-request basis (for example, sequential requests go to different nodes).
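
The per-request rotation can be sketched in a few lines. This is a simplified
stand-in rather than the client's actual selector: `RoundRobinSketch` is a
hypothetical class, and connections are represented as plain strings.

[source,php]
----
// Hypothetical stand-in for a selector; connections are plain strings here.
final class RoundRobinSketch
{
    private $current = 0;

    public function select(array $connections): string
    {
        // Wrap around to the first connection after the last one.
        $connection = $connections[$this->current % count($connections)];
        $this->current++;

        return $connection;
    }
}

$selector = new RoundRobinSketch();
$nodes = ['node1', 'node2', 'node3'];
echo $selector->select($nodes), PHP_EOL; // node1
echo $selector->select($nodes), PHP_EOL; // node2
----

Each call advances the counter, so sequential requests land on different nodes.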

The `RoundRobinSelector` is default, but if you wish to explicitily configure it you can do:
The `RoundRobinSelector` is the default, but if you wish to explicitly configure
it, you can do so:

[source,php]
----
@@ -24,21 +29,28 @@ $client = ClientBuilder::create()

Note that the implementation is specified via a namespace path to the class.

=== StickyRoundRobinSelector

This selector is "sticky", in that it prefers to reuse the same connection repeatedly. For example, Node #1 is chosen
on the first request. Node #1 will continue to be re-used for each subsequent request until that node fails. Upon failure,
the selector will round-robin to the next available node, then "stick" to that node.

This is an ideal strategy for many PHP scripts. Since PHP scripts are shared-nothing and tend to exit quickly, creating
new connections for each request is often a sub-optimal strategy and introduces a lot of overhead. Instead, it is
better to "stick" to a single connection for the duration of the script.

By default, this selector will randomize the hosts upon initialization, which will still guarantee an even distribution
of load across the cluster. It changes the round-robin dynamics from per-request to per-script.
=== StickyRoundRobinSelector

If you are using <<future_mode>>, the "sticky" behavior of this selector will be non-ideal, since all parallel requests
will go to the same node instead of multiple nodes in your cluster. When using future mode, the default `RoundRobinSelector`
This selector is "sticky" in that it prefers to reuse the same connection
repeatedly. For example, Node #1 is chosen on the first request. Node #1 will
continue to be re-used for each subsequent request until that node fails. Upon
failure, the selector will round-robin to the next available node, then "stick"
to that node.

This is an ideal strategy for many PHP scripts. Since PHP scripts are
shared-nothing and tend to exit quickly, creating new connections for each
request is often a sub-optimal strategy and introduces a lot of overhead.
Instead, it is better to "stick" to a single connection for the duration of the
script.

By default, this selector randomizes the hosts upon initialization, which still
guarantees an even load distribution across the cluster. It changes the
round-robin dynamics from per-request to per-script.
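
The "stick until failure" behavior can be sketched as follows. This is a
simplified model, not the client's actual selector: `StickySketch` and the
`$isAlive` callback are hypothetical, and connections are plain strings.

[source,php]
----
// Hypothetical stand-in: keeps returning the same node while it is alive.
final class StickySketch
{
    private $current = 0;

    /** @param callable(string):bool $isAlive */
    public function select(array $connections, callable $isAlive): string
    {
        $count = count($connections);
        for ($i = 0; $i < $count; $i++) {
            $candidate = $connections[$this->current % $count];
            if ($isAlive($candidate)) {
                return $candidate; // stay on this node
            }
            $this->current++; // round-robin to the next node
        }

        throw new RuntimeException('No alive connection to stick to');
    }
}
----

Calling PHP's `shuffle()` on the host list once, before the first `select()`,
models the randomization described above: each script run sticks to one node,
but different runs start on different nodes.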

If you are using <<future_mode>>, the "sticky" behavior of this selector is
non-ideal, since all parallel requests go to the same node instead of multiple
nodes in your cluster. When using future mode, the default `RoundRobinSelector`
should be preferred.

If you wish to use this selector, you may do so with:
@@ -52,9 +64,11 @@ $client = ClientBuilder::create()

Note that the implementation is specified via a namespace path to the class.


=== RandomSelector

This selector simply returns a random node, regardless of state. It is generally just for testing.
This selector returns a random node, regardless of state. It is generally just
for testing.

If you wish to use this selector, you may do so with:

@@ -67,9 +81,11 @@ $client = ClientBuilder::create()

Note that the implementation is specified via a namespace path to the class.


=== Custom Selector

You can implement your own custom selector. Custom selectors must implement `SelectorInterface`
You can implement your own custom selector. Custom selectors must implement
`SelectorInterface`:

[source,php]
----
@@ -97,7 +113,9 @@ class MyCustomSelector implements SelectorInterface
----
{zwsp} +

You can then use your custom selector either via object injection or namespace instantiation:

You can then use your custom selector either via object injection or namespace
instantiation:

[source,php]
----
