[WIP]Support nested BGP peering with calico-nodes running in local kubevirt VM pods #9875

Open
wants to merge 24 commits into
base: master

song-jiang
Member

@song-jiang song-jiang commented Feb 20, 2025

Description

This PR adds support for allowing calico-node to peer with calico-node instances running inside KubeVirt VM pods locally, based on the labels of the VM pods.

API changes:

  • New field LocalWorkloadSelector on the BGPPeer resource.
  • New fields localWorkloadPeeringIPV4 and localWorkloadPeeringIPV6 on the BGPConfiguration resource.

Felix changes:

  • It watches BGPPeer resources and calculates which local workloads each BGPPeer selects.
  • It populates endpoint status files with peering information.
  • It adds the localWorkloadPeeringIP to the network interface of each workload selected by a BGPPeer.
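The selection step in the first bullet can be pictured with a minimal Go sketch. The `Workload` type and `selectLocal` function are hypothetical names, and the matcher only handles label equality. Real Calico selectors are expression-based, so this is an illustration and not Felix's implementation.

```go
package main

import "fmt"

// Workload is a hypothetical, simplified stand-in for a local workload endpoint.
type Workload struct {
	Name   string
	Labels map[string]string
}

// selectLocal returns the names of the workloads whose labels contain every
// key/value pair in want. An empty want matches every workload.
func selectLocal(workloads []Workload, want map[string]string) []string {
	var out []string
	for _, w := range workloads {
		matched := true
		for k, v := range want {
			if got, ok := w.Labels[k]; !ok || got != v {
				matched = false
				break
			}
		}
		if matched {
			out = append(out, w.Name)
		}
	}
	return out
}

func main() {
	wls := []Workload{
		{Name: "vm-a", Labels: map[string]string{"color": "red"}},
		{Name: "web", Labels: map[string]string{"app": "web"}},
	}
	fmt.Println(selectLocal(wls, map[string]string{"color": "red"}))
}
```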

Confd changes:

  • It watches endpoint status files updated by Felix.
  • It reconfigures bird.cfg/bird6.cfg based on the peering information read from endpoint status files.
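As a rough illustration of the second bullet, the sketch below renders one BIRD `protocol bgp` stanza per peer with Go's `text/template`, skipping any peer whose IP equals the node's own. The `Peer` type, the `renderPeers` function and the template text are assumptions made for this example. They are not the template used in this PR, which is more involved.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Peer is a hypothetical view of the peering information read from a status file.
type Peer struct {
	Name string
	IP   string
	AS   int
}

// birdData is the value handed to the template.
type birdData struct {
	NodeIP  string
	LocalAS int
	Peers   []Peer
}

// birdTmpl emits one stanza per peer and skips the node itself.
const birdTmpl = `{{range .Peers}}{{if ne .IP $.NodeIP}}protocol bgp {{.Name}} {
  local as {{$.LocalAS}};
  neighbor {{.IP}} as {{.AS}};
}
{{end}}{{end}}`

// renderPeers executes the template and returns the generated config fragment.
func renderPeers(nodeIP string, localAS int, peers []Peer) (string, error) {
	t, err := template.New("bird").Parse(birdTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, birdData{NodeIP: nodeIP, LocalAS: localAS, Peers: peers}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderPeers("10.0.0.1", 64512, []Peer{
		{Name: "self", IP: "10.0.0.1", AS: 64512},
		{Name: "vm1", IP: "10.0.0.2", AS: 64512},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```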

libcalico-go changes:

  • Added status-file-writer and status-file-watcher.

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

Support nested BGP peering with calico-nodes running in local kubevirt VM pods.

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@song-jiang requested a review from a team as a code owner February 20, 2025 11:35
@marvin-tigera added this to the Calico v3.30.0 milestone Feb 20, 2025
@marvin-tigera added the release-note-required and docs-pr-required labels Feb 20, 2025
Contributor

@aaaaaaaalex aaaaaaaalex left a comment

Couldn't spot any glaring issues (though I understand you know of one!)

{{- end}}
# For peer {{.Key}}
{{- if eq $data.ip ($node_ip) }}
# Skipping ourselves ({{$node_ip}})
Member

Why would the node itself show up in the local WEP peers?

logCxt.Debug("Workload endpoint status file created")
epStatus, err := epstatus.GetWorkloadEndpointStatusFromFile(fileName)
if err != nil {
logCxt.WithError(err).Error("Failed to read endpoint status from file, it may just be created.")
Member

This feels like it might be spammy, since we'll always race with Felix writing the file. Can you defer the error (only log an error if the file is still bad after more than 5s)?
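One way to defer the error is to remember when each file first failed to parse and only escalate once that failure has outlived a grace period. The sketch below is a guess at the shape, not code from this PR. The `readFailures` type and its methods are hypothetical, and only the 5s figure comes from the comment above.

```go
package main

import (
	"fmt"
	"time"
)

// grace is how long a status file may stay unreadable before we escalate.
const grace = 5 * time.Second

// readFailures remembers when each file first failed to parse.
type readFailures struct {
	firstSeen map[string]time.Time
}

func newReadFailures() *readFailures {
	return &readFailures{firstSeen: map[string]time.Time{}}
}

// shouldEscalate records a failed read of file at time now and reports
// whether the file has been failing for longer than grace.
func (r *readFailures) shouldEscalate(file string, now time.Time) bool {
	first, ok := r.firstSeen[file]
	if !ok {
		r.firstSeen[file] = now
		return false
	}
	return now.Sub(first) > grace
}

// clear forgets a file once it has been read successfully.
func (r *readFailures) clear(file string) {
	delete(r.firstSeen, file)
}

func main() {
	r := newReadFailures()
	start := time.Now()
	// First failure: probably a race with the writer, so stay quiet.
	fmt.Println(r.shouldEscalate("ep.json", start))
	// Still failing after the grace period: now it is worth an Error log.
	fmt.Println(r.shouldEscalate("ep.json", start.Add(grace+time.Second)))
}
```

The caller would log at Debug while `shouldEscalate` returns false, log at Error once it returns true, and call `clear` after a successful read.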

if len(epStatus.Ipv4Nets) != 0 {
ip, _, err := net.ParseCIDR(epStatus.Ipv4Nets[0])
if err != nil {
log.WithError(err).Error("Workload endpoint status does not have a valid Ipv4Nets, ignore it for now")
Member

I'd probably use Warn for this since you're handling the problem (by ignoring it)

"github.com/projectcalico/calico/libcalico-go/lib/backend/model"
)

var _ = Describe("ActiveBGPPeerCalculator", func() {
Member

This component should be tested through the calculation graph FV suite so that we get the benefits of its "fuzzing" approach.

@@ -234,6 +243,8 @@ func newEndpointManager(
floatingIPsEnabled bool,
nft bool,
) *endpointManager {
nlHandle, _ := netlink.NewHandle()
Member

This doesn't look right (ignoring the error, not shimmable). Use a netlinkshim.HandleManager, which has a mock alternative.
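The general shape of a shimmable handle can be sketched as follows. The `addrHandle` interface, `handleFactory` type and `fakeHandle` mock are hypothetical and much smaller than the real netlinkshim package, whose API is not reproduced here. The point is only that the constructor error is propagated and that tests can inject a fake.

```go
package main

import (
	"errors"
	"fmt"
)

// addrHandle is a hypothetical, minimal slice of a netlink handle. It is not
// the real netlinkshim interface.
type addrHandle interface {
	AddrList(ifaceName string) ([]string, error)
}

// handleFactory creates a handle. Production code would wrap the real netlink
// constructor; tests supply a fake.
type handleFactory func() (addrHandle, error)

type endpointManager struct {
	nl addrHandle
}

// newEndpointManager propagates the factory error instead of discarding it.
func newEndpointManager(newHandle handleFactory) (*endpointManager, error) {
	h, err := newHandle()
	if err != nil {
		return nil, fmt.Errorf("creating netlink handle: %w", err)
	}
	return &endpointManager{nl: h}, nil
}

// fakeHandle is an in-memory stand-in used by tests.
type fakeHandle struct {
	addrs map[string][]string
}

func (f *fakeHandle) AddrList(ifaceName string) ([]string, error) {
	return f.addrs[ifaceName], nil
}

var errNoNetlink = errors.New("netlink unavailable")

func fakeFactory() (addrHandle, error) {
	return &fakeHandle{addrs: map[string][]string{"cali123": {"169.254.0.1/32"}}}, nil
}

func brokenFactory() (addrHandle, error) { return nil, errNoNetlink }

func main() {
	m, err := newEndpointManager(fakeFactory)
	if err != nil {
		panic(err)
	}
	addrs, err := m.nl.AddrList("cali123")
	if err != nil {
		panic(err)
	}
	fmt.Println(addrs)

	_, err = newEndpointManager(brokenFactory)
	fmt.Println(err)
}
```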

}

// Peer information that we track for each active local endpoint.
type EpPeerData struct {
Member

Suggested change
- type EpPeerData struct {
+ type EndpointBGPPeer struct {

Think spelling it out would help in the other files where this name is seen.

var err error
// If LocalBGPPeerIP has been updated, we need to remove old peer IP from all workload interfaces.
for ifaceName := range m.activeWlIfaceNameToID {
err = m.removeBGPPeerIPOnInterface(ifaceName, m.localBGPPeerIP)
Member

It's suspicious that we need to remove the old IP specifically. What if the desired IP changes while Felix is restarting? It seems we'd get stuck, since the old IP would no longer be known.
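One possible way around remembering the old IP is to reconcile against what is actually on the interface. This is only a sketch of that idea and not something proposed in the thread. `planAddrs` and `inPeeringRange` are hypothetical, and the sketch assumes Felix has some way to recognise the addresses it owns (here, an arbitrary example prefix).

```go
package main

import (
	"fmt"
	"strings"
)

// inPeeringRange is an assumed ownership test: any address with this example
// prefix is treated as a peering address that Felix manages.
func inPeeringRange(addr string) bool {
	return strings.HasPrefix(addr, "10.99.")
}

// planAddrs compares the addresses currently on an interface with the single
// desired peering address. Managed addresses other than desired are removed,
// unmanaged addresses are left alone, and desired is added if missing.
func planAddrs(current []string, desired string, managed func(string) bool) (add, remove []string) {
	found := false
	for _, a := range current {
		switch {
		case a == desired:
			found = true
		case managed(a):
			remove = append(remove, a)
		}
	}
	if desired != "" && !found {
		add = append(add, desired)
	}
	return add, remove
}

func main() {
	// The interface still carries a stale peering IP from before a restart.
	add, remove := planAddrs([]string{"10.99.0.1", "fe80::1"}, "10.99.0.2", inPeeringRange)
	fmt.Println(add, remove)
}
```

Because the plan is derived from the observed addresses, a stale IP left behind by a previous Felix process is cleaned up even though the new process never knew it.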


addrs, err := m.nlHandle.AddrList(link, family)
if err != nil {
// Not sure why this would happen, but pass it up.
Member

The link might be deleted out from under you by the CNI plugin.

return nil
}

func lookupLink(nlHandle netlinkHandle, name string) (link netlink.Link, err error, notFound bool) {
Member

I think you should use errors.Is(err, netlink.LinkNotFoundError) in the caller; that's more common to see
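A caller-side check along these lines might look like the sketch below. Note that `errors.Is` compares against an error value (a sentinel), whereas `errors.As` matches by type, so if the library's not-found error is a struct type, `errors.As` is the form that applies. The `linkNotFoundError` type and `lookupLink` function here are local stand-ins, not the netlink library's definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// linkNotFoundError is a local stand-in for a struct-typed "not found" error.
type linkNotFoundError struct {
	name string
}

func (e linkNotFoundError) Error() string { return "link " + e.name + " not found" }

// lookupLink is a hypothetical lookup that only knows about "eth0".
func lookupLink(name string) error {
	if name != "eth0" {
		return fmt.Errorf("lookup failed: %w", linkNotFoundError{name: name})
	}
	return nil
}

// isLinkNotFound lets the caller classify the error, so the lookup function
// does not need a separate notFound return value.
func isLinkNotFound(err error) bool {
	var nf linkNotFoundError
	return errors.As(err, &nf)
}

func main() {
	fmt.Println(isLinkNotFound(lookupLink("cali999")))
	fmt.Println(isLinkNotFound(lookupLink("eth0")))
}
```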

if !errors.Is(err, fs.ErrExist) {
lastError = err
logrus.Error("IterActionNoOp")
Member

Dev error left in?

Labels
  • docs-pr-required: Change is not yet documented
  • release-note-required: Change has user-facing impact (no matter how small)
4 participants