etcd DNS query to host DNS #1402

Open
mzaferyahsi opened this issue Dec 7, 2023 · 7 comments

@mzaferyahsi

What happened?

When I installed a fresh vcluster using the helm chart with vcluster-k8s, I noticed that my main pi-hole DNS server (outside of the cluster) is being queried for etcd nodes. I believe this is because CoreDNS cannot resolve these names and therefore forwards the requests to the upstream server.

Logs from pi-hole

Dec  7 23:04:53 dnsmasq[206427]: reply arelon-etcd-17 is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-17.arelon-etcd-headless.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-17.arelon-etcd-headless.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-17.arelon-etcd-headless from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-17.arelon-etcd-headless is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-17.arelon-etcd-headless.vcluster-arelon from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-17.arelon-etcd-headless.vcluster-arelon is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-17.arelon-etcd-headless.vcluster-arelon from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-17.arelon-etcd-headless.vcluster-arelon is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-18.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-18.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-18.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-18.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-18 from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: forwarded arelon-etcd-18 to 10.32.0.1
Dec  7 23:04:53 dnsmasq[206427]: reply arelon-etcd-18 is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-18.arelon-etcd-headless.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-18.arelon-etcd-headless.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-18.arelon-etcd-headless.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-18.arelon-etcd-headless.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-18.arelon-etcd-headless from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-18.arelon-etcd-headless is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-18.arelon-etcd-headless.vcluster-arelon.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-18.arelon-etcd-headless.vcluster-arelon.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-18.arelon-etcd-headless.vcluster-arelon.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-18.arelon-etcd-headless.vcluster-arelon.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-18.arelon-etcd-headless.vcluster-arelon from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-18.arelon-etcd-headless.vcluster-arelon is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-19.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-19.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-19.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-19.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-19 from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: forwarded arelon-etcd-19 to 10.32.0.1
Dec  7 23:04:53 dnsmasq[206427]: reply arelon-etcd-19 is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-19.arelon-etcd-headless.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-19.arelon-etcd-headless.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-19.arelon-etcd-headless from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-19.arelon-etcd-headless is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-19.arelon-etcd-headless.vcluster-arelon.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-19.arelon-etcd-headless.vcluster-arelon.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-19.arelon-etcd-headless.vcluster-arelon from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-19.arelon-etcd-headless.vcluster-arelon is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[A] arelon-etcd-2.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-2.alroze.cloud is NXDOMAIN
Dec  7 23:04:53 dnsmasq[206427]: query[AAAA] arelon-etcd-2.alroze.cloud from 10.21.1.12
Dec  7 23:04:53 dnsmasq[206427]: cached arelon-etcd-2.alroze.cloud is NXDOMAIN
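The suffixes in the log above look like the result of resolver search-list expansion. The following is a minimal sketch of that expansion, assuming standard stub-resolver behaviour for `ndots`. The function name `candidates` and the search list are hypothetical, chosen to match the names in the log; they are not taken from vcluster or from the actual pod's resolv.conf.

```go
package main

import (
	"fmt"
	"strings"
)

// candidates lists the names a stub resolver would try for a lookup,
// in order, given a search list and an ndots threshold.
// This is an illustrative helper, not part of vcluster.
func candidates(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as fully qualified: no expansion.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	var out []string
	// With at least ndots dots, the name is tried as-is first.
	asIsFirst := strings.Count(name, ".") >= ndots
	if asIsFirst {
		out = append(out, name+".")
	}
	for _, s := range search {
		out = append(out, name+"."+s+".")
	}
	// Otherwise the name is tried as-is only after the search list.
	if !asIsFirst {
		out = append(out, name+".")
	}
	return out
}

func main() {
	// Assumed search list; the real one depends on the pod's resolv.conf.
	search := []string{
		"vcluster-arelon.svc.cluster.local",
		"svc.cluster.local",
		"cluster.local",
		"alroze.cloud",
	}
	for _, c := range candidates("arelon-etcd-17.arelon-etcd-headless", search, 2) {
		fmt.Println(c)
	}
}
```

Under these assumptions, any candidate that does not end in the cluster domain is forwarded by the cluster DNS to its upstream, which would explain the `.alroze.cloud` and bare-name queries arriving at pi-hole.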

What did you expect to happen?

DNS queries should not leak to my pi-hole.

How can we reproduce it (as minimally and precisely as possible)?

  1. Set up pi-hole

  2. Deploy a k8s cluster with kubespray
    a. Use the pi-hole IPs as the DNS servers
    b. Use ndots:2
    c. Use cluster_name: cluster.local

  3. Deploy vcluster-k8s with the following values.yaml

# Enable HA mode
enableHA: true

# Scale up syncer replicas
syncer:
  replicas: 3
  extraArgs:
    - "--tls-san=cluster.arelon.xxx.xxx"
    - "--tls-san=10.21.2.10"

ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: nginx

# Scale up etcd
etcd:
  replicas: 3

# Scale up controller manager
controller:
  replicas: 3

# Scale up api server
api:
  replicas: 3

# Scale up DNS server
coredns:
  replicas: 3

storage:
  className: longhorn

sync:
  secrets:
    enabled: true
  persistentvolumes:
    enabled: true
  volumesnapshots:
    enabled: false
  serviceaccounts:
    enabled: true
  networkpolicies:
    enabled: true
  pods:
    enabled: true
    # Sync ephemeralContainers to host cluster
    ephemeralContainers: true
    # Sync readiness gates to host cluster
    status: true

  4. Monitor /var/logs/pihole.log

Anything else we need to know?

This issue also happens on vcluster v0.16.4

Host cluster Kubernetes version

$ kubectl version
Server Version: v1.28.3

Host cluster Kubernetes distribution

Self hosted K8s

vcluster version

$ vcluster --version
0.18.0

vcluster Kubernetes distribution (k3s (default), k8s, k0s)

k8s

OS and Arch

OS: Ubuntu 22.04.3 LTS
Arch: x64
@mzaferyahsi
Author

It seems that the etcd nodes cannot be resolved through the xxx-etcd-headless services. When I create a separate service for each etcd instance, the DNS queries seem to stop.

@FabianKramm
Member

@mzaferyahsi thanks for creating this issue! I'm not sure why this is not working for you, but I don't think we should create a separate service for each replica. Maybe we only need to adjust the certs themselves, and that would already be enough.

@mzaferyahsi
Author

@FabianKramm indeed, that is another solution. However, the number of etcd SANs still needs to be limited, so I still recommend passing the number of etcd replicas as a parameter to the syncer deployment. I've just tested adjusting the etcd SANs as shown below, and it seems to work.

	for i := 0; i < etcdReplicaCount; i++ {
		if etcdEmbedded {
			// this is for embedded etcd
			hostname := vClusterName + "-" + strconv.Itoa(i)
			etcdSans = append(etcdSans, hostname, hostname+"."+vClusterName+"-headless", hostname+"."+vClusterName+"-headless"+"."+currentNamespace)
		} else {
			// this is for external etcd
			etcdHostname := etcdService + "-" + strconv.Itoa(i)
			// etcdSans = append(etcdSans, etcdHostname, etcdHostname+"."+etcdService+"-headless", etcdHostname+"."+etcdService+"-headless"+"."+currentNamespace)
			etcdSans = append(etcdSans, etcdHostname+"."+etcdService+"-headless"+"."+currentNamespace)
		}
	}

Shall I apply the same logic to the embedded etcd case?
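For readers who want to try the idea outside the vcluster codebase, here is a self-contained sketch of the external-etcd branch above: one namespace-qualified SAN per actual replica rather than a fixed upper bound. The helper name `externalEtcdSANs` and the example values are hypothetical and only mirror the snippet in this comment; this is not the code that ships in vcluster.

```go
package main

import (
	"fmt"
	"strconv"
)

// externalEtcdSANs returns one SAN per etcd replica in the form
// <service>-<i>.<service>-headless.<namespace>.
// Hypothetical helper mirroring the external-etcd branch above.
func externalEtcdSANs(etcdService, namespace string, replicas int) []string {
	sans := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		pod := etcdService + "-" + strconv.Itoa(i)
		sans = append(sans, pod+"."+etcdService+"-headless."+namespace)
	}
	return sans
}

func main() {
	// Example values taken from the names visible in the pi-hole log.
	for _, san := range externalEtcdSANs("arelon-etcd", "vcluster-arelon", 3) {
		fmt.Println(san)
	}
}
```

Tying the loop bound to the real replica count is what keeps names for replicas that do not exist out of the certificate.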

@FabianKramm
Member

@mzaferyahsi we are doing some refactoring for this right now, but yes, we can add that later once the refactoring is done.

@PavelGloba

I have the same issue on vcluster 0.19.3: DNS requests for nonexistent etcd replicas are sent to the host cluster's DNS server (CoreDNS in my case).
Also, the services and configuration that the helm chart generates for the existing replicas are incorrect. Domains should end with svc.cluster.local; otherwise the name does not resolve on the first try.
As far as I can see, the master branch still contains code that creates configuration for 20 replicas.

@mzaferyahsi
Author

Indeed, this hasn't been fixed. One workaround I've used is to run the initialization with an old version of vcluster and then upgrade the cluster to the latest one. That way the certificates are generated correctly and are then reused by the new version.

@PavelGloba

PavelGloba commented Oct 14, 2024

I just tried upgrading a fresh installation of 0.17.1 to 0.19.3. It didn't alter any etcd certificates, and the DNS problem is still there.
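One way to check whether an upgrade changed a certificate is to list its DNS SANs before and after. The sketch below uses only the Go standard library. The helper name `dnsSANs` is hypothetical, and the path to the certificate is left to the caller, since where a given vcluster version stores its etcd certificates is not stated in this thread.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// dnsSANs parses the first PEM block in pemBytes as an X.509
// certificate and returns its DNS subject alternative names.
// Illustrative helper, not part of vcluster.
func dnsSANs(pemBytes []byte) ([]string, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil || block.Type != "CERTIFICATE" {
		return nil, errors.New("no PEM certificate block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return nil, err
	}
	return cert.DNSNames, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: sans <certificate.pem>")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	names, err := dnsSANs(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Print one SAN per line so two runs can be compared with diff.
	for _, n := range names {
		fmt.Println(n)
	}
}
```

If the output still contains entries for replicas that do not exist, the certificate was most likely carried over unchanged.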
