cluster: add aggregate cluster (envoyproxy#7967)
Signed-off-by: Yan Xue <[email protected]>
yxue authored and mattklein123 committed Nov 26, 2019
1 parent 67a760b commit aaafd6b
Showing 23 changed files with 1,791 additions and 239 deletions.
3 changes: 2 additions & 1 deletion CODEOWNERS
@@ -55,7 +55,7 @@ extensions/filters/common/original_src @snowp @klarose
# adaptive concurrency limit extension.
/*/extensions/filters/http/adaptive_concurrency @tonya11en @mattklein123
# http inspector
/*/extensions/filters/listener/http_inspector @crazyxy @PiotrSikora @lizan
/*/extensions/filters/listener/http_inspector @yxue @PiotrSikora @lizan
# attribute context
/*/extensions/filters/common/expr @kyessenov @yangminzhu
# webassembly common extension
@@ -91,3 +91,4 @@ extensions/filters/common/original_src @snowp @klarose
/*/extensions/filters/network/tcp_proxy @alyssawilk @zuercher
/*/extensions/filters/network/echo @htuch @alyssawilk
/*/extensions/filters/udp/udp_proxy @mattklein123 @danzh2010
/*/extensions/clusters/aggregate @yxue @snowp
1 change: 1 addition & 0 deletions api/docs/BUILD
@@ -22,6 +22,7 @@ proto_library(
"//envoy/api/v2/route:pkg",
"//envoy/config/accesslog/v2:pkg",
"//envoy/config/bootstrap/v2:pkg",
"//envoy/config/cluster/aggregate/v2alpha:pkg",
"//envoy/config/cluster/dynamic_forward_proxy/v2alpha:pkg",
"//envoy/config/cluster/redis:pkg",
"//envoy/config/common/dynamic_forward_proxy/v2alpha:pkg",
7 changes: 7 additions & 0 deletions api/envoy/config/cluster/aggregate/v2alpha/BUILD
@@ -0,0 +1,7 @@
# DO NOT EDIT. This file is generated by tools/proto_sync.py.

load("@envoy_api//bazel:api_build_system.bzl", "api_proto_package")

licenses(["notice"]) # Apache 2

api_proto_package()
20 changes: 20 additions & 0 deletions api/envoy/config/cluster/aggregate/v2alpha/cluster.proto
@@ -0,0 +1,20 @@
syntax = "proto3";

package envoy.config.cluster.aggregate.v2alpha;

option java_outer_classname = "ClusterProto";
option java_multiple_files = true;
option java_package = "io.envoyproxy.envoy.config.cluster.aggregate.v2alpha";

import "validate/validate.proto";

// [#protodoc-title: Aggregate cluster configuration]

// Configuration for the aggregate cluster. See the :ref:`architecture overview
// <arch_overview_aggregate_cluster>` for more information.
// [#extension: envoy.clusters.aggregate]
message ClusterConfig {
  // Load balancing clusters in aggregate cluster. Clusters are prioritized based on the order they
  // appear in this list.
  repeated string clusters = 1 [(validate.rules).repeated = {min_items: 1}];
}
7 changes: 7 additions & 0 deletions api/envoy/config/cluster/aggregate/v3alpha/BUILD
@@ -0,0 +1,7 @@
# DO NOT EDIT. This file is generated by tools/proto_sync.py.

load("@envoy_api//bazel:api_build_system.bzl", "api_proto_package")

licenses(["notice"]) # Apache 2

api_proto_package()
20 changes: 20 additions & 0 deletions api/envoy/config/cluster/aggregate/v3alpha/cluster.proto
@@ -0,0 +1,20 @@
syntax = "proto3";

package envoy.config.cluster.aggregate.v3alpha;

option java_outer_classname = "ClusterProto";
option java_multiple_files = true;
option java_package = "io.envoyproxy.envoy.config.cluster.aggregate.v3alpha";

import "validate/validate.proto";

// [#protodoc-title: Aggregate cluster configuration]

// Configuration for the aggregate cluster. See the :ref:`architecture overview
// <arch_overview_aggregate_cluster>` for more information.
// [#extension: envoy.clusters.aggregate]
message ClusterConfig {
  // Load balancing clusters in aggregate cluster. Clusters are prioritized based on the order they
  // appear in this list.
  repeated string clusters = 1 [(validate.rules).repeated = {min_items: 1}];
}
1 change: 1 addition & 0 deletions docs/root/api-v2/config/cluster/cluster.rst
@@ -5,5 +5,6 @@ Cluster
:glob:
:maxdepth: 2

aggregate/v2alpha/*
dynamic_forward_proxy/v2alpha/*
redis/*
145 changes: 145 additions & 0 deletions docs/root/intro/arch_overview/upstream/aggregate_cluster.rst
@@ -0,0 +1,145 @@
.. _arch_overview_aggregate_cluster:

Aggregate Cluster
=================

The aggregate cluster is used for failover between clusters with different configurations, e.g., from
an EDS upstream cluster to a STRICT_DNS upstream cluster, from a cluster using the ROUND_ROBIN load
balancing policy to a cluster using MAGLEV, or from a cluster with a 0.1s connection timeout to a
cluster with a 1s connection timeout. The aggregate cluster loosely couples multiple clusters by
referencing their names in the :ref:`configuration <envoy_api_msg_config.cluster.aggregate.v2alpha.ClusterConfig>`.
The fallback priority is defined implicitly by the ordering in the :ref:`clusters list <envoy_api_field_config.cluster.aggregate.v2alpha.ClusterConfig.clusters>`.
The aggregate cluster uses tiered load balancing: the top-level load balancer first chooses a cluster
and a priority, and then delegates load balancing to the load balancer of the selected cluster. The
top-level load balancer reuses the existing load balancing algorithm by linearizing the priority sets
of multiple clusters into one.

Linearize Priority Set
----------------------

Upstream hosts are divided into multiple :ref:`priority levels <arch_overview_load_balancing_priority_levels>`,
and each priority level contains a list of healthy, degraded and unhealthy hosts. Linearization
simplifies host selection during load balancing by merging the priority levels of multiple clusters
into a single list. For example, suppose the primary cluster has 3 priority levels, the secondary
cluster has 2 and the tertiary cluster has 2, and the failover order is primary, secondary, tertiary.
The priority levels are then linearized as follows:

+-----------+----------------+------------------------------------+
| Cluster   | Priority Level | Priority Level after Linearization |
+===========+================+====================================+
| Primary   | 0              | 0                                  |
+-----------+----------------+------------------------------------+
| Primary   | 1              | 1                                  |
+-----------+----------------+------------------------------------+
| Primary   | 2              | 2                                  |
+-----------+----------------+------------------------------------+
| Secondary | 0              | 3                                  |
+-----------+----------------+------------------------------------+
| Secondary | 1              | 4                                  |
+-----------+----------------+------------------------------------+
| Tertiary  | 0              | 5                                  |
+-----------+----------------+------------------------------------+
| Tertiary  | 1              | 6                                  |
+-----------+----------------+------------------------------------+

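The linearization in the table above can be sketched in Python. This is an illustration, not Envoy's C++ implementation: the function `linearize` and its input (a list of per-cluster priority-level counts, in the order of the aggregate cluster's `clusters` list) are hypothetical.

```python
from typing import List, Tuple


def linearize(levels_per_cluster: List[int]) -> List[Tuple[int, int]]:
    """Return (cluster_index, original_priority) for each linearized priority.

    levels_per_cluster[i] is the number of priority levels in the i-th
    cluster of the aggregate cluster's clusters list (failover order).
    """
    mapping: List[Tuple[int, int]] = []
    for cluster_index, num_levels in enumerate(levels_per_cluster):
        for priority in range(num_levels):
            # Levels are appended in failover order, so a level's position in
            # the list is its priority after linearization.
            mapping.append((cluster_index, priority))
    return mapping


if __name__ == "__main__":
    names = ["primary", "secondary", "tertiary"]
    # Primary has 3 levels, secondary 2 and tertiary 2, as in the table.
    for linear, (cluster, priority) in enumerate(linearize([3, 2, 2])):
        print(f"{names[cluster]:<9} P={priority} -> P={linear}")
```

Keeping the mapping as a flat list means the existing single-cluster priority logic can operate on list indices, while the stored pair records which cluster and which of its own levels a linearized priority came from.
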
Example
-------

A sample aggregate cluster configuration could be:

.. code-block:: yaml

  name: aggregate_cluster
  connect_timeout: 0.25s
  lb_policy: CLUSTER_PROVIDED
  cluster_type:
    name: envoy.clusters.aggregate
    typed_config:
      "@type": type.googleapis.com/envoy.config.cluster.aggregate.v2alpha.ClusterConfig
      clusters:
      # Clusters primary, secondary and tertiary must be defined separately.
      - primary
      - secondary
      - tertiary

Note: :ref:`PriorityLoad retry plugins <envoy_api_field_route.RetryPolicy.retry_priority>` do not
work with the aggregate cluster, because the aggregate load balancer overrides the *PriorityLoad*
during load balancing.


Load Balancing Example
----------------------

The aggregate cluster uses a tiered load balancing algorithm, in which the top tier distributes traffic
to the clusters according to their health scores across all :ref:`priorities <arch_overview_load_balancing_priority_levels>`
in each cluster. Unlike the configuration above, the aggregate cluster in this section contains only
two clusters, primary and secondary.

+-----------------------------------------------------------------------------------------------------------------------+--------------------+----------------------+
| Cluster                                                                                                               | Traffic to Primary | Traffic to Secondary |
+-----------------------------------------------------------------------+-----------------------------------------------+                    |                      |
| Primary                                                               | Secondary                                     |                    |                      |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+                    |                      |
| P=0 Healthy Endpoints | P=1 Healthy Endpoints | P=2 Healthy Endpoints | P=0 Healthy Endpoints | P=1 Healthy Endpoints |                    |                      |
+=======================+=======================+=======================+=======================+=======================+====================+======================+
| 100%                  | 100%                  | 100%                  | 100%                  | 100%                  | 100%               | 0%                   |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+
| 72%                   | 100%                  | 100%                  | 100%                  | 100%                  | 100%               | 0%                   |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+
| 71%                   | 1%                    | 0%                    | 100%                  | 100%                  | 100%               | 0%                   |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+
| 71%                   | 0%                    | 0%                    | 100%                  | 100%                  | 99%                | 1%                   |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+
| 50%                   | 0%                    | 0%                    | 50%                   | 0%                    | 70%                | 30%                  |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+
| 20%                   | 20%                   | 10%                   | 25%                   | 25%                   | 70%                | 30%                  |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+
| 20%                   | 0%                    | 0%                    | 20%                   | 0%                    | 50%                | 50%                  |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+
| 0%                    | 0%                    | 0%                    | 100%                  | 0%                    | 0%                 | 100%                 |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+
| 0%                    | 0%                    | 0%                    | 72%                   | 0%                    | 0%                 | 100%                 |
+-----------------------+-----------------------+-----------------------+-----------------------+-----------------------+--------------------+----------------------+

Note: The example above uses the default :ref:`overprovisioning factor <arch_overview_load_balancing_overprovisioning_factor>`
of 1.4. This means that if 80% of the endpoints in a priority level are healthy, that level is still
considered fully healthy, because 80 * 1.4 = 112 > 100.

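The per-level health score described in the note can be sketched as follows, assuming floor (integer) arithmetic and an overprovisioning factor expressed in percent (140 for 1.4). `level_health` is a hypothetical helper, not an Envoy API.

```python
def level_health(healthy: int, total: int, overprovisioning_factor: int = 140) -> int:
    """Integer health score (0-100) of one priority level.

    overprovisioning_factor is in percent, so the default 140 means 1.4.
    """
    if total == 0:
        return 0  # assumption: an empty level contributes no health
    # Scale the healthy fraction by the factor, then cap at 100.
    return min(100, overprovisioning_factor * healthy // total)


if __name__ == "__main__":
    print(level_health(80, 100))  # 80 * 1.4 = 112, capped -> 100
    print(level_health(71, 100))  # 99.4, floored -> 99
    print(level_health(20, 100))  # 28
```

Under this floor-rounding assumption, 71% healthy endpoints yield a score of 99, which is consistent with the 99%/1% row of the table above.
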
The example shows how the top-level load balancer of the aggregate cluster selects a cluster. For
example, healthy-endpoint percentages of {{20, 20, 10}, {25, 25}} result in a priority load of
{{28%, 28%, 14%}, {30%, 0%}}. When the total health drops below 100, traffic is distributed after
normalizing the levels' health scores to that sub-100 total. For example, healthy-endpoint
percentages of {{20, 0, 0}, {20, 0}} yield a normalized total health of 20 * 1.4 + 20 * 1.4 = 56, so
each cluster receives 20 * 1.4 / 56 = 50% of the traffic, which results in a priority load of
{{50%, 0%, 0%}, {50%, 0%}}.

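The per-level priority loads in both examples can be reproduced with a short Python sketch. It assumes the health scores are already computed (20% healthy gives 28 and 25% gives 35 with the default factor), ignores degraded hosts, and does not redistribute integer rounding remainders; `priority_load` is a hypothetical name.

```python
from typing import List


def priority_load(health: List[int]) -> List[int]:
    """Split 100% of traffic across linearized priority levels.

    health holds the integer health score (0-100) of each level, in
    linearized order.
    """
    normalized_total_health = min(100, sum(health))
    if normalized_total_health == 0:
        return [0] * len(health)  # assumption: nothing healthy, no load assigned
    remaining = 100
    loads: List[int] = []
    for h in health:
        # Each level takes its normalized share, bounded by what is still unassigned.
        load = min(remaining, h * 100 // normalized_total_health)
        loads.append(load)
        remaining -= load
    return loads


if __name__ == "__main__":
    print(priority_load([28, 28, 14, 35, 35]))  # [28, 28, 14, 30, 0]
    print(priority_load([28, 0, 0, 28, 0]))     # [50, 0, 0, 50, 0]
```

In the first call the total health (140) exceeds 100, so no scaling happens and the secondary P=0 level only gets the 30% left over; in the second call the total is 56, so each 28 is scaled up to 50.
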
The load balancer reuses the priority level logic to select the cluster. The priority level logic
works with integer health scores. The health score of a level is (percent of healthy hosts in the
level) * (overprovisioning factor), capped at 100%. P=0 endpoints receive level 0's health score
percent of the traffic, with the rest flowing to P=1 (assuming P=1 is 100% healthy). The integer
percentages of traffic that each cluster receives are collectively called the system's "cluster
priority load". For instance, suppose that in the primary cluster 20% of P=0 endpoints, 20% of P=1
endpoints and 10% of P=2 endpoints are healthy, and that in the secondary cluster 25% of P=0
endpoints and 25% of P=1 endpoints are healthy. The primary cluster then receives
20% * 1.4 + 20% * 1.4 + 10% * 1.4 = 70% of the traffic, and the secondary cluster receives
min(100 - 70, 25% * 1.4 + 25% * 1.4) = min(30, 70) = 30%. The traffic to all clusters sums to 100%.
The normalized health scores and the priority load are pre-computed before the cluster and priority
are selected.

To sum this up in pseudocode:

::

  health(P_X) = min(100, 1.4 * 100 * healthy_P_X_backends / total_P_X_backends), where
                total_P_X_backends is the number of backends for priority P_X after linearization
  normalized_total_health = min(100, Σ(health(P_0)...health(P_X)))
  cluster_priority_load(C_0) = min(100, Σ(health(P_0)...health(P_k)) * 100 / normalized_total_health),
                               where P_0...P_k belong to C_0
  cluster_priority_load(C_X) = min(100 - Σ(cluster_priority_load(C_0)...cluster_priority_load(C_X-1)),
                               Σ(health(P_x)...health(P_X)) * 100 / normalized_total_health),
                               where P_x...P_X belong to C_X

  map from priorities to clusters:

  P_0 ... P_k ... P_x ... P_X
  ^       ^       ^       ^
  cluster C_0     cluster C_X

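The cluster-level part of the pseudocode translates almost line for line into Python. This sketch takes the per-level health scores grouped by cluster (in failover order) and returns each cluster's share of traffic; `cluster_priority_load` is a hypothetical name, and the integer division mirrors the integer health scores described above.

```python
from typing import List


def cluster_priority_load(cluster_health: List[List[int]]) -> List[int]:
    """Top-tier traffic split across clusters.

    cluster_health[i] holds the integer health scores of cluster i's
    priority levels; clusters are listed in failover order.
    """
    all_levels = [h for levels in cluster_health for h in levels]
    normalized_total_health = min(100, sum(all_levels))
    if normalized_total_health == 0:
        return [0] * len(cluster_health)  # assumption: nothing healthy
    loads: List[int] = []
    for levels in cluster_health:
        share = sum(levels) * 100 // normalized_total_health
        # Earlier clusters are served first; each later cluster gets at most what is left.
        loads.append(min(100 - sum(loads), share))
    return loads


if __name__ == "__main__":
    print(cluster_priority_load([[28, 28, 14], [35, 35]]))  # [70, 30]
    print(cluster_priority_load([[28, 0, 0], [28, 0]]))     # [50, 50]
```

Feeding in the health scores for other rows of the load balancing table, for example [[99, 0, 0], [100, 100]] for the 71%/0%/0% row, reproduces that row's traffic columns.
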
The second tier delegates load balancing to the cluster selected in the first step, and that cluster
can use any of the load balancing algorithms specified by :ref:`load balancer type <arch_overview_load_balancing_types>`.
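
The two tiers can be illustrated end to end: the first tier picks a linearized priority from the priority load and maps it back to a cluster, and the second tier hands off to that cluster's own load balancer. In this Python sketch, `choose_priority`, `pick_host` and the callables in `cluster_lbs` are hypothetical stand-ins; how Envoy actually draws the random number and selects a host is not modeled.

```python
import random
from typing import Callable, List, Sequence, Tuple


def choose_priority(priority_load: Sequence[int], roll: int) -> int:
    """Return the linearized priority whose cumulative load range contains roll (0-99)."""
    cumulative = 0
    for priority, load in enumerate(priority_load):
        cumulative += load
        if roll < cumulative:
            return priority
    raise ValueError("priority load must sum to 100")


def pick_host(priority_load: Sequence[int],
              linear_to_cluster: Sequence[Tuple[int, int]],
              cluster_lbs: List[Callable[[int], str]],
              rng: random.Random) -> str:
    """First tier: pick a priority and its cluster. Second tier: delegate to that cluster."""
    linear = choose_priority(priority_load, rng.randrange(100))
    cluster_index, local_priority = linear_to_cluster[linear]
    # Each entry of cluster_lbs stands in for a cluster's own load balancer
    # (ROUND_ROBIN, MAGLEV, ...) and is told which of its priority levels to use.
    return cluster_lbs[cluster_index](local_priority)


if __name__ == "__main__":
    load = [28, 28, 14, 30, 0]  # priority load from the {{20, 20, 10}, {25, 25}} example
    mapping = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
    lbs = [lambda p: f"host from primary P={p}", lambda p: f"host from secondary P={p}"]
    print(pick_host(load, mapping, lbs, random.Random(0)))
```

With this priority load, rolls 0-69 land in the primary cluster and rolls 70-99 land in secondary P=0, matching the 70%/30% split.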
1 change: 1 addition & 0 deletions docs/root/intro/arch_overview/upstream/upstream.rst
@@ -9,6 +9,7 @@ Upstream clusters
health_checking
connection_pooling
load_balancing/load_balancing
aggregate_cluster
outlier
circuit_breaking
upstream_filters
1 change: 1 addition & 0 deletions docs/root/intro/version_history.rst
@@ -7,6 +7,7 @@ Version history
* api: remove all support for v1
* buffer: remove old implementation
* build: official released binary is now built against libc++.
* cluster: added :ref:`aggregate cluster <arch_overview_aggregate_cluster>` that allows load balancing between clusters.
* ext_authz: added :ref:`configurable ability<envoy_api_field_config.filter.http.ext_authz.v2.ExtAuthz.include_peer_certificate>` to send the :ref:`certificate<envoy_api_field_service.auth.v2.AttributeContext.Peer.certificate>` to the `ext_authz` service.
* health check: gRPC health checker sets the gRPC deadline to the configured timeout duration.
* http: added the ability to sanitize headers nominated by the Connection header. This new behavior is guarded by envoy.reloadable_features.connection_header_sanitization which defaults to true.
25 changes: 25 additions & 0 deletions source/extensions/clusters/aggregate/BUILD
@@ -0,0 +1,25 @@
licenses(["notice"]) # Apache 2

load(
    "//bazel:envoy_build_system.bzl",
    "envoy_cc_extension",
    "envoy_package",
)

envoy_package()

envoy_cc_extension(
    name = "cluster",
    srcs = ["cluster.cc"],
    hdrs = [
        "cluster.h",
        "lb_context.h",
    ],
    security_posture = "requires_trusted_downstream_and_upstream",
    deps = [
        "//source/common/upstream:cluster_factory_lib",
        "//source/common/upstream:upstream_includes",
        "//source/extensions/clusters:well_known_names",
        "@envoy_api//envoy/config/cluster/aggregate/v2alpha:pkg_cc_proto",
    ],
)