Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6964 mimir istio fixes #6965

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

Paul424
Copy link

@Paul424 Paul424 commented Dec 19, 2023

What this PR does

Solves networking issues when running mimir injected (istio). Following issue's are addressed:

  1. When query_scheduler is enabled the helm chart does not render the headless service for the query-frontend, but istio needs it to register the (headless) svc route that the querier is using to report back query results.
  2. When memcached is enabled the svc cannot connect to the caches because the port naming prevent istio to do proper protocol selection.
  3. grpc ports do need appProtocol setting to make sure istio selects the correct protocol.

Which issue(s) this PR fixes or relates to

Fixes #6964

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

…ing in the sidecar when querier tries to send query results back to the frontend (using the podip)
…oper protocol selection in istio; allow to optionally set appProtocol to tcp
@Paul424 Paul424 requested a review from a team as a code owner December 19, 2023 16:06
@CLAassistant
Copy link

CLAassistant commented Dec 19, 2023

CLA assistant check
All committers have signed the CLA.

…s enabled for istio to register the podip's which is required for querier to report back the results
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution! We don't have much experience with istio, so this contribution is very valuable.

I left a couple of comments and questions, but the core of the change makes sense to me

CHANGELOG.md Outdated
@@ -68,6 +68,7 @@
* `store_gateway_zone_c_node_affinity_matchers`
* [FEATURE] Ingester: Allow automated zone-by-zone downscaling, that can be enabled via the `ingester_automated_downscale_enabled` flag. It is disabled by default. #6850
* [BUGFIX] Update memcached-exporter to 0.14.1 due to CVE-2023-39325. #6861
* [BUGFIX] Always generate query-frontend headless service (otherwise istio doesn't register the pod ip's for querier to report back the results) #6964
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [BUGFIX] Always generate query-frontend headless service (otherwise istio doesn't register the pod ip's for querier to report back the results) #6964
* [BUGFIX] Always generate query-frontend headless service (otherwise, istio doesn't register the pod IPs for querier to report back the results). #6965

we usually put the PR number in the changelog entry, so it's easier to trace back the change

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me make a fix there...

@@ -40,6 +40,7 @@ Entries should include a reference to the Pull Request that introduced the chang
* [BUGFIX] Let the unified gatway/nginx config listen on IPv6 as well. Followup to #5948. #6204
* [BUGFIX] Quote `checksum/config` when using external config. This allows setting `externalConfigVersion` to numeric values. #6407
* [BUGFIX] Update memcached-exporter to 0.14.1 due to CVE-2023-39325. #6861
* [BUGFIX] Allow Mimir to run injected (istio); optionally add appProtocol to memcached svc's and grpc ports and always generate query-frontend headless service (otherwise istio doesn't register the pod ip's for querier to report back the results) #6964
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [BUGFIX] Allow Mimir to run injected (istio); optionally add appProtocol to memcached svc's and grpc ports and always generate query-frontend headless service (otherwise istio doesn't register the pod ip's for querier to report back the results) #6964
* [BUGFIX] Allow Mimir to run injected (istio); optionally add appProtocol to memcached services and gRPC ports and always generate query-frontend headless service (otherwise, istio doesn't register the pod IPs for querier to report back the results). #6965

Comment on lines 2141 to 2143
appProtocol:
# -- Set the optional service protocol. Ex: "tcp"
client: null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it took me a bit to understand the format of this parameter. Is client meant to represent the name of the port on the service? If so let's call it memcached-client, like it actually is on the Service manifest.

Perhaps adding a more explicit doc on the field would also help; something like

Suggested change
appProtocol:
# -- Set the optional service protocol. Ex: "tcp"
client: null
appProtocol:
# -- Set the optional service appProtocol for the memcached-client port. Ex: "tcp"
memcached-client: null

Also, can you apply this to all the ports in this PR?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i had to somehow refer to the port, and indeed client isn't the best choice there, i will rework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aha, that makes more sense now. Sorry for making you change this twice, but can you add a comment explaining what the null value means too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, can you make it explicit that the keys in appProtocol refer to individual ports of the service? A common on each key or on appProtocol itself should be sufficient

Comment on lines 2141 to 2143
appProtocol:
# -- Set the optional service protocol. Ex: "tcp"
client: null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this as a separate comment because it's a different topic: can you help me understand why we wouldn't want to set the appProtocol by default? I tried setting it to grpc in a cluster without istio and nothing broke 😄 but that's not a very rigorous test

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well actually I took this approach from the tempo charts, both the values structure and the choice of not adding it be default. I guess this is the most flexible way of doing it, if a deployment has multiple ports you can just extend the structure and when you run a different mesh requiring a different app-protocol you can easily configure it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough - consistency with tempo and flexibility with other meshes. Would have been time to be compatible by default with istio, but maybe we're not there.

@dimitarvdimitrov
Copy link
Contributor

The CI seems to be failing. The lint-jsonnet step is failing because the rendered kubernetes manifests for jsonnet haven't been updated. You can do that with make build-jsonnet-tests

However, lint-helm seems to be failing with some helm templating issue. I can reproduce the failure locally with running make build-helm-tests. After fixing them you should rerun that command and commit the diff.

@duricanikolic
Copy link
Contributor

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

@nbjohnson
Copy link

@Paul424 Are you able to clean up this MR according to the requests so this is able to get merged? Having better istio support we be a great feature to have and I and I'm sure many others look forward to getting this merged in and having it available

@Paul424
Copy link
Author

Paul424 commented Mar 26, 2024

@nbjohnson I would love to, but I no longer have access to the dev/env I used to work on this PR. Give me some time to setup a new environment... or if someone else wants to fork and complete the PR? fine for me.
Paul

Copy link
Contributor

github-actions bot commented Jan 8, 2025

Thank you for your contribution. This pull request has been marked as stale because it has had no activity in the last 150 days. It will be closed in 30 days if there is no further activity. If you need more time, you can add a comment to the PR.

@github-actions github-actions bot added the stale label Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable Mimir to run injected (istio)
7 participants