
[CORE-10974] Add bandwidth QoS controls #9881

Open · wants to merge 12 commits into master from pedro-CORE-10974

Conversation

@coutinhop (Member) commented Feb 21, 2025

Description

Add bandwidth limiting for ingress and egress, implemented via `tc qdisc` configuration based on the {ingress,egress} bandwidth and burst fields of the QoSControls struct in WorkloadEndpoint. In Kubernetes, these values come from annotations on the pods.
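For orientation, a rough sketch of the fields being described (illustrative only; the authoritative struct lives in the libcalico-go WorkloadEndpoint API and may carry more fields):

```go
// Illustrative sketch only, not the actual API definition.
type QoSControls struct {
	IngressBandwidth int64 // bits per second
	EgressBandwidth  int64 // bits per second
	IngressBurst     int64 // bits
	EgressBurst      int64 // bits
}
```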

Felix FV tests added to verify that the `tc qdisc` configuration is properly added and that the bandwidth is properly limited with `iperf3`.

Some code borrowed from the Kubernetes bandwidth plugin (credit given).

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

Add bandwidth limiting QoS control to workloads.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@coutinhop coutinhop self-assigned this Feb 21, 2025
@coutinhop coutinhop requested a review from a team as a code owner February 21, 2025 01:41
@marvin-tigera marvin-tigera added this to the Calico v3.30.0 milestone Feb 21, 2025
@marvin-tigera marvin-tigera added the release-note-required (Change has user-facing impact, no matter how small) and docs-pr-required (Change is not yet documented) labels Feb 21, 2025
@coutinhop coutinhop force-pushed the pedro-CORE-10974 branch 2 times, most recently from 3e04ce5 to 60759ea on February 21, 2025 02:11
@nelljerram (Member) left a comment

Some questions and corrections...

return fmt.Errorf("create ifb qdisc: %s", err)
}
return nil
}
Member

Why is there so much asymmetry between the ingress and egress coding?

Member Author

From my understanding, we can only add one tbf qdisc per interface, and it only works for ingress. So for egress, the workaround is to create an ingress qdisc, an ifb interface, mirror the egress traffic from the original interface to the ifb, and then add the tbf qdisc to the ifb interface...
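For readers following the thread, a minimal sketch of that workaround using the vishvananda/netlink library (names and error handling are simplified and illustrative; this is not the PR's exact code):

```go
import (
	"fmt"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

// setupEgressLimit mirrors traffic arriving on the workload's host-side
// interface to an ifb device and attaches the tbf qdisc there, since tbf
// can only shape traffic leaving the interface it is attached to.
func setupEgressLimit(hostIface, ifbName string, rate uint64, buffer, limit uint32) error {
	host, err := netlink.LinkByName(hostIface)
	if err != nil {
		return fmt.Errorf("lookup %s: %w", hostIface, err)
	}
	ifb := &netlink.Ifb{LinkAttrs: netlink.LinkAttrs{Name: ifbName}}
	if err := netlink.LinkAdd(ifb); err != nil {
		return fmt.Errorf("create ifb: %w", err)
	}
	if err := netlink.LinkSetUp(ifb); err != nil {
		return fmt.Errorf("set ifb up: %w", err)
	}
	// Ingress qdisc on the workload interface, so a filter can attach to it.
	ingress := &netlink.Ingress{QdiscAttrs: netlink.QdiscAttrs{
		LinkIndex: host.Attrs().Index,
		Handle:    netlink.MakeHandle(0xffff, 0),
		Parent:    netlink.HANDLE_INGRESS,
	}}
	if err := netlink.QdiscAdd(ingress); err != nil {
		return fmt.Errorf("create ingress qdisc: %w", err)
	}
	// Match-all u32 filter with a mirred action redirecting packets to the ifb.
	filter := &netlink.U32{
		FilterAttrs: netlink.FilterAttrs{
			LinkIndex: host.Attrs().Index,
			Parent:    netlink.MakeHandle(0xffff, 0),
			Protocol:  unix.ETH_P_ALL,
		},
		Actions: []netlink.Action{netlink.NewMirredAction(ifb.Attrs().Index)},
	}
	if err := netlink.FilterAdd(filter); err != nil {
		return fmt.Errorf("add mirred filter: %w", err)
	}
	// The tbf qdisc then shapes the mirrored traffic on the ifb device.
	tbf := &netlink.Tbf{
		QdiscAttrs: netlink.QdiscAttrs{
			LinkIndex: ifb.Attrs().Index,
			Handle:    netlink.MakeHandle(1, 0),
			Parent:    netlink.HANDLE_ROOT,
		},
		Rate:   rate,
		Buffer: buffer,
		Limit:  limit,
	}
	if err := netlink.QdiscAdd(tbf); err != nil {
		return fmt.Errorf("create ifb qdisc: %w", err)
	}
	return nil
}
```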


Providing some information here, if this can help:

  • LinuxBridge (OpenStack Neutron plugin) uses tbf for ingress and police filter for egress (reflected by the list filter code).
  • Libvirt uses htb for ingress and police for egress. Notice that the direction is reversed because libvirt is doing rx (ingress) limit request -> police and tx (egress) limit request -> htb, which is correct from the hypervisor perspective, but reversed from the interface/VM's perspective.
  • Regardless, all of the tools use the token bucket algorithm (tbf).

- Improve error messages
- Fix ingress bandwidth testing
- Other misc corrections
coutinhop added a commit to coutinhop/operator that referenced this pull request Feb 22, 2025
It will be replaced by felix functionality. Needs to merge in sync with
projectcalico/calico#9881.
@fasaxc (Member) left a comment

Couple of queries from me but it looks like Nell has done a more thorough review already!

I suppose my general question is how do we make sure everything always gets cleaned up, bearing in mind that the CNI plugin can remove the cali device under felix's feet. Will we still clean up any left-over bandwidth control device?

Do we need to worry about races when changing the limits? You remove the limit and then re-add it, so there'll be a moment when a pod could send too much data. Might be OK, depending on the use case (and it might not be possible to adjust the value on the fly).

@coutinhop (Member Author) commented Feb 26, 2025

Thank you!

> Couple of queries from me but it looks like Nell has done a more thorough review already!
>
> I suppose my general question is how do we make sure everything always gets cleaned up, bearing in mind that the CNI plugin can remove the cali device under felix's feet. Will we still clean up any left-over bandwidth control device?

Yes, the tc qdisc configuration is removed when the interface is removed, regardless of who does it (felix or the CNI plugin), so that takes care of the ingress controls. The egress controls (which use a mirror ifb interface) are torn down by felix regardless of who removed the "main" interface...

> Do we need to worry about races when changing the limits? You remove the limit and then re-add it, so there'll be a moment when a pod could send too much data. Might be OK, depending on the use case (and it might not be possible to adjust the value on the fly).

Hm, right, my laziness is probably to blame here, but you make a good point on altering the config in place instead of removing and re-adding. Will do!
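A minimal sketch of the in-place update with the vishvananda/netlink library (assumed helper shape, not the PR's final code): `netlink.QdiscReplace` issues a single netlink replace, so the tbf parameters change atomically and there is no window with the limit removed.

```go
// Sketch: replace the tbf parameters in place rather than del+add.
func updateIngressQdisc(link netlink.Link, rate uint64, buffer, limit uint32) error {
	qdisc := &netlink.Tbf{
		QdiscAttrs: netlink.QdiscAttrs{
			LinkIndex: link.Attrs().Index,
			Handle:    netlink.MakeHandle(1, 0),
			Parent:    netlink.HANDLE_ROOT,
		},
		Rate:   rate,
		Buffer: buffer,
		Limit:  limit,
	}
	if err := netlink.QdiscReplace(qdisc); err != nil {
		return fmt.Errorf("replace tbf qdisc on %s: %w", link.Attrs().Name, err)
	}
	return nil
}
```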

@coutinhop coutinhop removed the release-note-required (Change has user-facing impact, no matter how small) label Feb 26, 2025
@marvin-tigera marvin-tigera added the release-note-required (Change has user-facing impact, no matter how small) label Feb 26, 2025
Guard QoS setup against BPF mode.

Add cleanup of QoS for when BPF is enabled.

Update QoS bandwidth in place instead of remove+readd.

Minor test fixes.
if oldIngressBw != 0 {
err := removeIngressQdisc(old.Name)
if oldName != newName || oldIngressBw != newIngressBw || oldIngressBurst != newIngressBurst {
if oldName == newName && oldIngressBw != 0 && newIngressBw != 0 {
Member

This logic seems to assume that the old values match the dataplane; we try not to make such assumptions in Felix. So what I'd expect to see is logic that reads back the dataplane state, compares it with the desired state and then "fixes" the dataplane by adding/removing/changing the qdisc.

I'd probably use a DeltaTracker for this with a struct that tracks the state of the dataplane:

type qosState struct {
	IngressBandwidth int
	IngressBurst     int
}

then you need to be able to read back those values from the dataplane to populate the `Dataplane()` side of the delta tracker, and populate the `Desired()` side with the values from the datastore.

Then the delta tracker has methods for iterating over all the updates/additions/deletions that are needed.
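To illustrate the read-back-and-fix pattern independent of the DeltaTracker API (hypothetical helper names; the actual tracker is richer than this sketch):

```go
// Hypothetical reconcile sketch: read the dataplane back via netlink rather
// than trusting cached "old" values, then converge it to the desired state.
// readIngressQoS, addIngressQdisc, updateIngressQdisc and removeIngressQdisc
// are illustrative names, not the PR's actual helpers.
func reconcileIngressQoS(ifaceName string, desired *qosState) error {
	actual, err := readIngressQoS(ifaceName) // e.g. netlink.QdiscList + match the tbf qdisc
	if err != nil {
		return fmt.Errorf("read qdisc state for %s: %w", ifaceName, err)
	}
	switch {
	case desired == nil && actual == nil:
		return nil // nothing configured, nothing wanted
	case desired == nil:
		return removeIngressQdisc(ifaceName) // limit no longer wanted
	case actual == nil:
		return addIngressQdisc(ifaceName, desired) // limit missing from dataplane
	case *actual != *desired:
		return updateIngressQdisc(ifaceName, desired) // fix in place
	}
	return nil
}
```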

@zhanz1 left a comment

Introducing myself as I'm making some comments in this PR: this is Zhan from the Bloomberg team. We had a meeting with Nell and Patrick and discussed a parameter that is missing from the Neutron bandwidth QoS API but that we'd like to have in Calico. Please see my comments for more details. Thanks!


latency := latencyInUsec(latencyInMillis)
limitInBytes := limit(uint64(rateInBytes), latency, uint32(burstInBytes))

qdisc := &netlink.Tbf{
@zhanz1 commented Feb 27, 2025

When creating tbf for ingress & egress traffic, we may also need to specify peakrate, which allows users to specify the maximum depletion rate of the bucket (i.e., burst/buffer). @nelljerram This is what I was referring to during our meeting, the parameter that is missing on the Neutron QoS API.

Also, some clarifications on the Neutron QoS parameters (as we discussed in our meeting): Even though it says max_burst_kbps in the table, it should be max-burst-kbits. For example, in the "notes" section of the doc, they wrote that the burst value "will be 800kbit". Looking into the Neutron code and taking LinuxBridge as an example again, even though it says max_burst_kbps when setting up tbf, the underlying function still treats it as burst_kb, not kbps. We will sync up with the Neutron community on our end to get clarification on this.

Currently, the Neutron API only supports max-kbps and max-burst-kbits, which map to:

Neutron               tc-tbf/tc-police/tc-htb
-----------------------------------------------
max-kbps              rate
max-burst-kbits       burst

According to the token bucket algorithm, if we don't specify peakrate, interfaces will be allowed to transmit/receive burst bytes at the maximum available bandwidth on the machine, assuming the bucket is full at the start. This is less than ideal. However, with peakrate, we can control the speed at which tokens flow out of the bucket, which is equivalent to controlling the burst bandwidth.

Since the Neutron QoS API does not support peakrate yet, we can do something just like the limits on connections: a default value that applies to all. We will work with the Neutron community on getting this in.

In terms of the code here, I do see that peakrate is supported in the library you use: tbf, police, and htb (in htb it is called ceil). However, my experience with Go is limited, so please double-check and see if this makes sense.

Member Author

Thanks @zhanz1 for the clarification re: burst, this does make sense. We'll work on adding peakrate to the Tbf config (it is indeed supported; I'll only double-check any necessary unit conversion).
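For the record, a sketch of what setting peakrate on the Tbf config might look like with the vishvananda/netlink library (variable names and unit conversions are assumptions, per the caveat above):

```go
// Sketch only: tbf with peakrate. The *BytesPerSec/*Bytes/*Ticks values are
// assumed to be pre-converted from the user-facing kbit parameters.
qdisc := &netlink.Tbf{
	QdiscAttrs: netlink.QdiscAttrs{
		LinkIndex: link.Attrs().Index,
		Handle:    netlink.MakeHandle(1, 0),
		Parent:    netlink.HANDLE_ROOT,
	},
	Rate:     rateBytesPerSec,     // long-term average rate
	Buffer:   bufferTicks,         // main bucket size, in scheduler ticks
	Limit:    limitBytes,          // bytes that may queue waiting for tokens
	Peakrate: peakrateBytesPerSec, // caps how fast the bucket may drain
	Minburst: minburstBytes,       // peakrate bucket size; typically the MTU
}
if err := netlink.QdiscReplace(qdisc); err != nil {
	return fmt.Errorf("configure tbf qdisc: %w", err)
}
```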

Member

@zhanz1 Do you know if someone is working to address the misnaming of max-burst-kbits on the Neutron API? I guess it will probably be best just to explain this prominently in the API documentation, instead of creating all the pain of changing the field name.

@zhanz1 commented Feb 28, 2025

@coutinhop Thanks for taking a look!
@nelljerram We do have a plan to ask the Neutron community about this issue in the future, but we haven't prioritized that yet. In that conversation, ideally, we want to make sure this parameter is not used or understood differently by other Neutron drivers and their QoS extensions (e.g., another Neutron driver treating it as kbps instead of kbits, although I think this is unlikely), even though in the tc case it is bits rather than bps.

Change logic of maybeUpdateQoSBandwidth() to consider the dataplane
state (actual `tc qdisc` config) instead of the datastore values.

Fix QoS cleanup on BPF startup to not short-circuit on errors.
@nelljerram (Member) left a comment

Just filing my comments so far. I still have more to look at.

@@ -317,6 +318,22 @@ func EnsureQdisc(ifaceName string) (bool, error) {
		log.WithField("iface", ifaceName).Debug("Already have a clsact qdisc on this interface")
		return true, nil
	}

	// Clean up QoS config if it exists
Member

I think it's worth commenting here that this is for transition from an iptables dataplane to eBPF, and reflects that it's currently impossible for eBPF and QoS to coexist.

func GetTBFValues(rateInBits, burstInBits uint64) (rateInBytes uint64, bufferInBytes, limitInBytes uint32) {
	rateInBytes = rateInBits / 8
	burstInBytes := burstInBits / 8
	bufferInBytes = buffer(uint64(rateInBytes), uint32(burstInBytes))
Member

Can we inline `buffer` here? I find the levels of indirection make this code hard to follow.
Ditto for `latencyInUsec` and `limit` in the next two lines.
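To illustrate, the inlined version might read roughly like this (a sketch assuming the helpers match the CNI bandwidth plugin versions they were borrowed from; `rateBits`, `burstBits`, and `latencyMillis` are assumed inputs):

```go
// Sketch of the suggested inlining (illustrative, not the merged code).
rateBytes := rateBits / 8
burstBytes := burstBits / 8
// Time (in usec) to transmit one burst at the configured rate...
bufferUsec := float64(burstBytes) * 1000000.0 / float64(rateBytes)
// ...converted to kernel scheduler ticks for the tbf "buffer" parameter.
bufferTicks := uint32(bufferUsec * netlink.TickInUsec())
// Queue limit: bytes that can arrive during latencyMillis at the configured
// rate, plus one burst.
latencyUsec := latencyMillis * 1000.0
limitBytes := uint32(float64(rateBytes)*latencyUsec/1000000.0) + uint32(burstBytes)
```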

Member

Ditto for `time2Tick` in `buffer`.

Member Author

Done.

}

func time2Tick(time uint32) uint32 {
	return uint32(float64(time) * float64(netlink.TickInUsec()))
Member

Also, I think it would be good to rename `time` to `timeUsec`.

}

func buffer(rate uint64, burst uint32) uint32 {
	return time2Tick(uint32(float64(burst) * float64(netlink.TIME_UNITS_PER_SEC) / float64(rate)))
Member

TIME_UNITS_PER_SEC is actually USEC_PER_SEC, so this is:

BURST_BYTES * USEC_PER_SEC / RATE_BYTES_PER_SEC

= the time in usec for a BURST to be transmitted at RATE

applying time2Tick => the number of ticks needed for a BURST to be transmitted at RATE
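For example, with illustrative numbers (not from the PR): at rate = 1,250,000 bytes/s (10 Mbit/s) and burst = 125,000 bytes (1 Mbit), we get 125000 × 1000000 / 1250000 = 100,000 usec, i.e. the 0.1 s needed to transmit one burst at the configured rate; `time2Tick` then converts that duration into kernel scheduler ticks.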

Member

I don't see how it's then correct to handle that as a number of bytes.

}

func latencyInUsec(latencyInMillis float64) float64 {
	return float64(netlink.TIME_UNITS_PER_SEC) * (latencyInMillis / 1000.0)
Member

It's OK, I see that TIME_UNITS_PER_SEC is actually USEC_PER_SEC. Can we replace `netlink.TIME_UNITS_PER_SEC` everywhere by 1000000? That would be more comprehensible.

Member

(Or, if you really want a constant, by a constant named ONE_MILLION.)


- Generally use TokenBucketState pointers
- Use *TokenBucketState more widely
@nelljerram (Member) left a comment

Filing some more comments. I've made a branch at https://github.com/nelljerram/calico/tree/pedro-CORE-10974 with some of the refactorings I've suggested.

nelljerram and others added 2 commits February 28, 2025 17:10
- Merge suggested changes. A separate 'qos' package still needed to be split from 'intdataplane' in order to avoid an import loop.
- Properly attribute and copyright code from the CNI bandwidth plugin
- Add a proper update function for kdd workloads to the felix FV infra
- Adjust test expectations (using 'Consistently()' where appropriate)
- Correctly use '%w' for outputting errors
- Inline calculations used for converting rate and burst in bits {per second,} to TokenBucketState{} values
- Other misc changes
Labels
docs-pr-required (Change is not yet documented) · release-note-required (Change has user-facing impact, no matter how small)