This is a mono repository for my home infrastructure and Kubernetes cluster. I try to adhere to Infrastructure as Code (IaC) and GitOps practices using tools like Ansible, Terraform, Kubernetes, Flux, Renovate and GitHub Actions.
There is a template over at onedr0p/flux-cluster-template if you want to try and follow along with some of the practices I use here.
My cluster is k3s provisioned on top of bare-metal Fedora Server using the Ansible Galaxy role ansible-role-k3s. This is a semi hyper-converged cluster: workloads and block storage share the same available resources on my nodes, while a separate server handles (NFS) file storage.
Click here to see my Ansible playbooks and roles.
- actions-runner-controller: Self-hosted GitHub runners.
- calico: Internal Kubernetes networking plugin.
- cert-manager: Creates SSL certificates for services in my Kubernetes cluster.
- external-dns: Automatically manages DNS records from my cluster in a cloud DNS provider.
- external-secrets: Managed Kubernetes secrets using 1Password Connect.
- ingress-nginx: Ingress controller to expose HTTP traffic to pods over DNS.
- rook: Distributed block storage for persistent storage.
- sops: Managed secrets for Kubernetes, Ansible and Terraform which are committed to Git.
- tf-controller: Additional Flux component used to run Terraform from within a Kubernetes cluster.
- volsync and snapscheduler: Backup and recovery of persistent volume claims.
Flux watches my kubernetes folder (see Directories below) and makes the changes to my cluster based on the YAML manifests.
Renovate watches my entire repository looking for dependency updates; when they are found, a PR is automatically created. When a PR is merged, Flux applies the changes to my cluster.
This Git repository contains the following directories under kubernetes.
kubernetes      # Kubernetes cluster defined as code
├── bootstrap   # Flux installation
├── flux        # Main Flux configuration of repository
└── apps        # Apps deployed into my cluster grouped by namespace (see below)
Below is a high-level look at how my directory structure works with Flux. In this brief example you can see that authelia will not be able to run until glauth and cloudnative-pg are running. It also shows that the Cluster custom resource depends on the cloudnative-pg Helm chart. This is needed because the cloudnative-pg Helm chart installs the Cluster custom resource definition.
# Key: <kind> :: <metadata.name>
GitRepository :: home-ops-kubernetes
    Kustomization :: cluster
        Kustomization :: cluster-apps
            Kustomization :: cluster-apps-authelia
                DependsOn:
                    Kustomization :: cluster-apps-glauth
                    Kustomization :: cluster-apps-cloudnative-pg-cluster
                HelmRelease :: authelia
                    DependsOn:
                        HelmRelease :: cloudnative-pg
                        HelmRelease :: glauth
            Kustomization :: cluster-apps-glauth
                HelmRelease :: glauth
            Kustomization :: cluster-apps-cloudnative-pg
                HelmRelease :: cloudnative-pg
            Kustomization :: cluster-apps-cloudnative-pg-cluster
                DependsOn:
                    Kustomization :: cluster-apps-cloudnative-pg
                Cluster :: postgres
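
The DependsOn entries in the example above imply an order in which things can come up. As a rough sketch only (this is not how Flux is implemented), the Kustomization-level edges can be transcribed into a dictionary and sorted with Python's standard `graphlib`; the dictionary and the `reconcile_order` helper are illustrative names, not part of this repository.

```python
from graphlib import TopologicalSorter

# Kustomization name -> set of Kustomizations it depends on,
# transcribed by hand from the DependsOn entries in the example above.
depends_on = {
    "cluster-apps-authelia": {
        "cluster-apps-glauth",
        "cluster-apps-cloudnative-pg-cluster",
    },
    "cluster-apps-glauth": set(),
    "cluster-apps-cloudnative-pg": set(),
    "cluster-apps-cloudnative-pg-cluster": {"cluster-apps-cloudnative-pg"},
}


def reconcile_order(graph):
    """Return one order in which every item comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = reconcile_order(depends_on)
print(order)
```

Several orders are valid (glauth and cloudnative-pg are independent of each other), but authelia always comes last and the cloudnative-pg chart always precedes the Cluster resource that needs its CRD.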
Name | CIDR |
---|---|
Management VLAN | 192.168.1.0/24 |
Kubernetes Nodes VLAN | 192.168.42.0/24 |
Kubernetes external services (Calico w/ BGP) | 192.168.69.0/24 |
Kubernetes pods | 10.42.0.0/16 |
Kubernetes services | 10.43.0.0/16 |
- HAProxy configured on my OPNsense router for the Kubernetes Control Plane Load Balancer.
- Calico configured with externalIPs to expose Kubernetes services with their own IP over BGP (w/ECMP) which is configured on my router.
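
To make the address plan in the CIDR table above easier to reason about, here is a small sketch using Python's standard `ipaddress` module. It maps an address to the network that contains it and checks that none of the ranges overlap. The `network_for` and `any_overlap` helpers are hypothetical names used for illustration.

```python
import ipaddress

# Networks copied from the CIDR table above.
NETWORKS = {
    "Management VLAN": "192.168.1.0/24",
    "Kubernetes Nodes VLAN": "192.168.42.0/24",
    "Kubernetes external services (Calico w/ BGP)": "192.168.69.0/24",
    "Kubernetes pods": "10.42.0.0/16",
    "Kubernetes services": "10.43.0.0/16",
}


def network_for(address):
    """Return the name of the network containing address, or None."""
    ip = ipaddress.ip_address(address)
    for name, cidr in NETWORKS.items():
        if ip in ipaddress.ip_network(cidr):
            return name
    return None


def any_overlap():
    """Return True if any two of the listed networks overlap."""
    nets = [ipaddress.ip_network(cidr) for cidr in NETWORKS.values()]
    return any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])
```

For example, `network_for("192.168.69.10")` lands in the externally exposed services range announced over BGP, while an address outside every listed range returns `None`.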
While most of my infrastructure and workloads are self-hosted, I do rely upon the cloud for certain key parts of my setup. This saves me from having to worry about two things: (1) dealing with chicken/egg scenarios and (2) services I critically need whether my cluster is online or not.
The alternative solution to these two problems would be to host a Kubernetes cluster in the cloud and deploy applications like HCVault, Vaultwarden, ntfy, and Gatus. However, maintaining another cluster and monitoring another group of workloads is a lot more time and effort than I am willing to put in and only saves me roughly $18/month.
Service | Use | Cost |
---|---|---|
GitHub | Hosting this repository and continuous integration/deployments | Free |
Cloudflare | Domain, DNS and proxy management | ~$30/y |
1Password | Secrets with External Secrets | ~$65/y |
Terraform Cloud | Storing Terraform state | Free |
B2 Storage | Offsite application backups | ~$5/m |
UptimeRobot | Monitoring internet connectivity and external facing applications | ~$60/y |
Pushover | Kubernetes Alerts and application notifications | Free |
GCP | Voice interactions with Home Assistant over Google Assistant | Free |
Total | | ~$18/m |
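
The monthly total can be checked by converting the yearly prices in the table above to a monthly figure. This is just the arithmetic written out; free services are omitted and the `monthly_total` helper is an illustrative name.

```python
# Costs copied from the table above, as (amount in USD, billing period).
costs = {
    "Cloudflare": (30, "year"),
    "1Password": (65, "year"),
    "B2 Storage": (5, "month"),
    "UptimeRobot": (60, "year"),
}


def monthly_total(items):
    """Sum all costs, spreading yearly prices evenly over 12 months."""
    total = 0.0
    for amount, period in items.values():
        total += amount / 12 if period == "year" else amount
    return total


total = monthly_total(costs)
print(f"~${total:.2f}/month")  # prints "~$17.92/month"
```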
Over WAN, I have port forwarded ports 80 and 443 to the load balancer IP of my ingress controller that's running in my Kubernetes cluster.
Cloudflare works as a proxy to hide my home's WAN IP and also as a firewall. When not on my home network, all the traffic coming into my ingress controller on ports 80 and 443 comes from Cloudflare. In OPNsense I block all IPs not originating from Cloudflare's list of IP ranges.
Cloudflare is also configured to GeoIP block all countries except a few I have whitelisted.
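
The firewall rule described above boils down to "allow a source address only if it falls inside one of Cloudflare's ranges". The sketch below shows that check in plain Python. It is not OPNsense configuration, and the two ranges are an illustrative subset (an assumption) that should be replaced with the list Cloudflare currently publishes.

```python
import ipaddress

# Illustrative subset only (assumption): replace with the IP ranges
# that Cloudflare currently publishes before relying on this.
ALLOWED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("173.245.48.0/20", "104.16.0.0/13")
]


def is_allowed(source_ip):
    """Return True if source_ip falls inside any allowed range."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in net for net in ALLOWED_RANGES)
```

With these sample ranges, `is_allowed("104.16.1.1")` is true, while an arbitrary address such as `8.8.8.8` is rejected.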
coredns is deployed on my OPNsense router and all DNS queries for my domains are forwarded to k8s_gateway that is running in my cluster. With this setup k8s_gateway has direct access to my cluster's ingresses and services and serves DNS for them in my internal network.
AdGuard Home is deployed on my OPNsense router, with an upstream server pointing to the coredns instance mentioned above. AdGuard Home listens on my MANAGEMENT, SERVER, IOT and GUEST networks on port 53, while coredns only listens on 127.0.0.1:53. In my firewall rules I have NAT port redirection forcing all the networks to use the AdGuard Home DNS server.
external-dns is deployed in my cluster and configured to sync DNS records to Cloudflare. The only ingresses external-dns looks at to gather DNS records to put in Cloudflare are ones on which I explicitly set the annotation external-dns.home.arpa/enabled: "true".
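
The opt-in rule above can be pictured as a filter over ingress objects. The sketch below mirrors that rule in plain Python. It is not external-dns's implementation or configuration, and the sample ingresses and the `hosts_to_publish` helper are hypothetical.

```python
ANNOTATION = "external-dns.home.arpa/enabled"


def hosts_to_publish(ingresses):
    """Return hostnames from ingresses whose opt-in annotation is "true"."""
    hosts = []
    for ingress in ingresses:
        annotations = ingress.get("metadata", {}).get("annotations") or {}
        if annotations.get(ANNOTATION) != "true":
            continue  # not explicitly opted in, so no public DNS record
        for rule in ingress.get("spec", {}).get("rules", []):
            if "host" in rule:
                hosts.append(rule["host"])
    return hosts


# Hypothetical ingresses, reduced to the fields the filter reads.
ingresses = [
    {
        "metadata": {"name": "public-app", "annotations": {ANNOTATION: "true"}},
        "spec": {"rules": [{"host": "app.domain.tld"}]},
    },
    {
        "metadata": {"name": "internal-app"},
        "spec": {"rules": [{"host": "internal.domain.tld"}]},
    },
]

print(hosts_to_publish(ingresses))  # prints ['app.domain.tld']
```

Making publication opt-in means a new ingress stays internal-only by default until the annotation is added.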
Click here to see how else I manage Cloudflare with Terraform.
My home IP can change at any given time. To keep my WAN IP address up to date on Cloudflare, I have deployed a CronJob in my cluster that periodically checks and updates the A record ipv4.domain.tld.
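
The core of such a job is a compare-then-update step. The sketch below shows only that logic and is not the actual CronJob script: how the current WAN IP is discovered and how the Cloudflare record is written are left as a caller-supplied `update` callback, and the addresses are documentation-range placeholders.

```python
def sync_record(current_ip, recorded_ip, update):
    """Call update(current_ip) only when the DNS record is stale.

    Returns True if an update was made, False if the record was current.
    """
    if current_ip == recorded_ip:
        return False
    update(current_ip)
    return True


# Hypothetical stand-in for the real DNS update call: record what
# would have been written instead of talking to any API.
calls = []
changed = sync_record("203.0.113.7", "203.0.113.1", calls.append)
unchanged = sync_record("203.0.113.7", "203.0.113.7", calls.append)
```

Skipping the write when nothing changed keeps the periodic job cheap and avoids needless API calls.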
Device | Count | OS Disk Size | Data Disk Size | RAM | Operating System | Purpose |
---|---|---|---|---|---|---|
Protectli FW6D | 1 | 500GB mSATA | - | 16GB | OPNsense | Router |
Intel NUC8i3BEK | 3 | 256GB NVMe | - | 32GB | Fedora | Kubernetes Masters |
Intel NUC8i5BEH | 3 | 240GB SSD | 1TB NVMe (rook-ceph) | 64GB | Fedora | Kubernetes Workers |
PowerEdge T340 | 1 | 2TB SSD | 8x12TB ZFS (mirrored vdevs) | 64GB | Ubuntu | NFS + Backup Server |
Lenovo SA120 | 1 | - | 6x12TB (+2 hot spares) | - | - | DAS |
Raspberry Pi | 1 | 32GB (SD) | - | 4GB | PiKVM | Network KVM |
TESmart 8 Port KVM Switch | 1 | - | - | - | - | Network KVM (PiKVM) |
APC SMT1500RM2U w/ NIC | 1 | - | - | - | - | UPS |
Unifi USP PDU Pro | 1 | - | - | - | - | PDU |
Thanks to all the people who donate their time to the Kubernetes @Home Discord community. A lot of inspiration for my cluster comes from the people that have shared their clusters using the k8s-at-home GitHub topic. Be sure to check out the Kubernetes @Home search for ideas on how to deploy applications or get ideas on what you can deploy.
See awful commit history
See LICENSE