This is the set of terraform, helm, and docker configurations required to manage, operate, and deploy to a no-nonsense version of Kubernetes we call Coopernetes. This project is still in very early alpha development, and is currently only being used by Colab Coop (https://colab.coop) and itme (https://itme.company). If you are interested in hosting containers and applications on a managed Kubernetes cluster using Coopernetes, or you are interested in deploying the infrastructure yourself, please reach out to [email protected].
- `master` is our primary working branch. It is intended to be generic, and can be cloned and used by anyone to launch a cluster from scratch.
- `itme` and `colab` correspond to the configurations of the two organizations currently using Coopernetes. We each have slightly different needs and architectures, so we're using branches to track the individual changes until we can merge them back to `master`.
All commands can be installed with `brew install`, except for helm plugins, which use `helm plugin install`.
To manage the AWS infrastructure:
- `terraform` (We use 12.24 in this repo. Installing `tfswitch` will allow you to easily switch between terraform versions for different projects. You can install it by following the directions at https://tfswitch.warrensbox.com/Install/)
- `awscli`
- `wget`
To manage and deploy applications on kubernetes:
- `kubectl`
- `helm`
- `helmfile`
- the helm-diff plugin: `helm plugin install https://github.com/databus23/helm-diff`
- the helm-secrets plugin: `helm plugin install https://github.com/zendesk/helm-secrets`
- `gnu-getopt`: used by helm-secrets
- `velero`: used for backup and restore
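Before going further, it can help to confirm that the tools listed above are actually on your `PATH`. The following is a minimal sketch, not part of this repo: `check_tools` is a hypothetical helper, and the binary names passed to it (for example `aws` for the `awscli` package) are assumptions about what each package installs.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example call (binary names are assumptions):
#   check_tools terraform aws wget kubectl helm helmfile velero
```

Helm plugins are not separate binaries, so they are not covered by this check; `helm plugin list` shows which plugins are installed.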
Based on the example at https://github.com/terraform-aws-modules/terraform-aws-eks/blob/7de18cd9cd882f6ad105ca375b13729537df9e68/examples/managed_node_groups/main.tf
- Run `terraform init && terraform apply` in the sops folder to set up the sops configuration, used for secrets management.
  - Secrets are encrypted with helm-secrets, which is configured using a `.sops.yaml` file in the root folder. In this repo, that file is symlinked to `terraform/sops/generated/sops.yaml`, since the KMS key is generated by terraform.
- Create a new folder by copying an existing cluster config, then edit the `terraform.tfvars` file with the desired details to configure the new cluster.
- From inside the `terraform/ENV/eks` folder, run `terraform apply`.
- Terraform generates a couple of config files needed by helmfile / helm. They are put in `terraform/eks/generated`, and the other programs that rely on them link to the files there.
  - The first of these files is `helmfile.yaml`, which is a set of non-secret values generated or configured in terraform that are needed by helmfile. This file is imported as the default environment in helmfile, granting charts and configuration access to terraform values. Anything non-secret you want to pass from terraform to helmfile should live here.
- Configure kubectl with the generated kubeconfig: `aws eks --region us-east-1 --profile=<AWS_PROFILE> update-kubeconfig --name <CLUSTER_NAME>`
- Run `helmfile apply` in the root folder.
- Once you deploy your first application with an Ingress, run `kubectl get ingress --all-namespaces` to list the address associated with the ingress. That is the load balancer for all inbound requests on the cluster. You should create a DNS entry pointing to this load balancer for every service you want to create.
- Port forward into Kibana by running the port-forward command listed below, then go to the Discover menu item, configure the index to `kubernetes_cluster*`, choose `@timestamp` as the time field, and Kibana is ready.
- Once the velero client is installed, you need to run a couple of commands to configure and set up backups:
  - Run `velero client config set namespace=system-backups`. This tells velero what namespace we installed it in.
  - Run `velero backup create test-backup` to test the backup functionality.
  - Run `velero schedule create daily-cluster-backup --schedule="0 0 * * *"` to set up a backup schedule for the cluster.
- Once prometheus-operator is installed, you should add the following dashboard to Grafana: https://grafana.com/grafana/dashboards/8670.
  - You can open the Grafana dashboard by finding the grafana pod in the system-monitoring namespace, and then running: `kubectl port-forward <GRAFANA_POD> -n system-monitoring 3000:3000`
  - You can log in with the user `admin` and the password `prom-operator`. Since you need access to the cluster to port forward, these account credentials can be shared freely.
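The core bring-up steps above (sops, EKS, kubeconfig, helmfile) can be sketched as a single script. This is only an illustration of the order of operations, not a script shipped with this repo: `run_in` and `bootstrap_cluster` are hypothetical names, `terraform/sops` is an assumed location for the sops folder (inferred from the symlink target), and the three arguments are placeholders. By default the sketch only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Hypothetical helper: run (or, by default, just print) a command inside a directory.
run_in() {
  dir=$1
  shift
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "(cd $dir && $*)"
  else
    (cd "$dir" && "$@")
  fi
}

# Arguments: ENV folder name, AWS profile, cluster name (all placeholders).
bootstrap_cluster() {
  env_name=$1
  aws_profile=$2
  cluster_name=$3
  # terraform/sops is an assumed path for "the sops folder".
  run_in terraform/sops terraform init &&
    run_in terraform/sops terraform apply &&
    run_in "terraform/$env_name/eks" terraform apply &&
    run_in . aws eks --region us-east-1 --profile="$aws_profile" \
      update-kubeconfig --name "$cluster_name" &&
    run_in . helmfile apply
}
```

Calling `bootstrap_cluster staging my-profile my-cluster` prints the five commands in order, which is a cheap way to review them before running anything against AWS. The DNS, Kibana, velero, and Grafana steps are left out because they involve manual choices.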
All deployment-related files, including the chart, helmfile, and Dockerfile, should live in a folder called `.deploy` in the root of the repository.
To deploy, launch the `coopernetes-deploy` container in CircleCI and use `coopctl` to deploy.
- Builds a docker image using the Dockerfile at `.deploy/Dockerfile` and the project root as the context.
- Calls `helmfile apply .deploy/helmfile.yaml`.
- Run `velero schedule create daily-<NAMESPACE>-backup --schedule="0 0 * * *" --include-namespaces <NAMESPACE>` to set up a backup schedule for the namespace.
- Keep in mind, nodes have a maximum number of pods they can support, as indicated on the following list: https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt
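The per-node pod limit mentioned above follows from the number of network interfaces (ENIs) an instance type supports and the number of IPv4 addresses per interface. As a rough sketch, assuming the default AWS VPC CNI setup without prefix delegation, the values in that list can be reproduced with the formula `ENIs * (IPv4 addresses per ENI - 1) + 2`. The function name `max_pods` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: estimate the pod limit for a node.
# Arguments: number of ENIs, IPv4 addresses per ENI.
max_pods() {
  enis=$1
  ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

# A t3.medium has 3 ENIs with 6 IPv4 addresses each.
max_pods 3 6   # prints 17
```

When sizing node groups, treat the linked list as the authoritative source; the formula is only useful for sanity-checking it.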
If you are using a custom chart for the project, we recommend putting it at `.deploy/chart/`.
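To catch layout mistakes before a CI run, a project can check that the files the deploy step expects are in place. The sketch below is an assumption about what is worth checking, based only on the paths named above (`.deploy/Dockerfile` and `.deploy/helmfile.yaml`); `check_deploy_layout` is a hypothetical helper, not part of `coopctl`.

```shell
#!/bin/sh
# Hypothetical helper: verify that a project root contains the deploy files.
# Argument: project root (defaults to the current directory).
# Returns 0 when both files exist, 1 otherwise.
check_deploy_layout() {
  root=${1:-.}
  status=0
  for f in .deploy/Dockerfile .deploy/helmfile.yaml; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_deploy_layout` from the project root prints one line per missing file. The optional `.deploy/chart/` folder is deliberately not checked, since a custom chart is not required.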
- https://cert-manager.io/docs/tutorials/acme/ingress/
- https://cert-manager.io/docs/installation/kubernetes/
- log-aggregator (if installed):
kubectl port-forward deployment/efk-kibana 5601 -n system-logging
- grafana:
kubectl port-forward -n system-monitoring prometheus-operator-grafana-RANDOM-ID 3000:3000
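Because the Grafana pod name ends in a random suffix, it has to be looked up before port forwarding. One way to do that, sketched here under the assumption that exactly one pod in the namespace has "grafana" in its name, is to filter the output of `kubectl get pods -o name`. The helper name `pick_grafana_pod` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods -o name` output on stdin and
# print the first pod name containing "grafana", without the "pod/" prefix.
pick_grafana_pod() {
  grep -m1 grafana | sed 's|^pod/||'
}

# Example use against a live cluster:
#   pod=$(kubectl get pods -n system-monitoring -o name | pick_grafana_pod)
#   kubectl port-forward -n system-monitoring "$pod" 3000:3000
```

If no pod matches, the helper prints nothing, so check that `$pod` is non-empty before calling `kubectl port-forward`.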
Autoscaling is not currently set up in the cluster, but it can be enabled by installing the cluster-autoscaler as outlined in this doc: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/autoscaling.md. Autoscaling will mean the cluster shrinks and grows based on our capacity needs. If we annotate our deployed services correctly with expected CPU and memory usage, this will allow the cluster to scale up and down to meet demand.
Spot instances are likely not going to be worth our time to investigate: they are often cheaper than on-demand instances, but have no guaranteed availability. We are probably better off with reserved instances, since our capacity is relatively consistent, but we figured it might be a worthwhile exploration if someone is interested. https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/spot-instances.md
`helmfile` is great for managing infrastructure installed on a case-by-case basis, but in order to package up Coopernetes so that it's easier to use, we will eventually want to create a master helm chart with all the basic installation and configuration options, similar to how https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack installs a bunch of different services using a mix of custom manifests and subcharts.
The benefit of this approach is that we replace the entire helmfile with a single configurable master chart that installs the appropriate backup services, metrics, ingress, etc. The current repo is somewhat brittle, and not easy to share widely with many organizations. However, part of the reason for that brittleness is that terraform and helmfile are tightly integrated, allowing us to configure both AWS and kubernetes with the same repo. To maintain the same level of interoperability, we would likely want to create a Coopernetes terraform module that installs all the AWS-specific resources we need for the master chart. Then, any new team that wanted to deploy Coopernetes could do so with a terraform module and this master chart.