Basic HTTPS on Kubernetes with Traefik

Back in February of 2018 Google’s Security blog announced that Chrome would be start displaying “not secure” for websites starting in July. In doing so they cemented HTTPS as part of the constantly-rising baseline expectations for modern web developers.

These constantly-rising baseline expectations are written into a new generation of tools like Traefik and Caddy. Both are written in Go and both leverage Let’s Encrypt to automate away the requesting and renewal, and by extension the unexpected expiration, of TLS certificates. Kubernetes is another modern tool aimed at meeting some of the other modern baseline expectations around monitoring, scaling and uptime.

Using Traefik and Kubernetes together is a little fiddly, and getting a working deployment on a cloud provider even more so. The aim here is to show how to use Traefik to get Let’s Encrypt based HTTPS working on the Google Kubernetes Engine.
An obvious prerequisite is to have a domain name, and to point it at a static IP you’ve created.

Let’s start with creating our project:

mike@sleepycat:~$ gcloud projects create --name k8s-https
No project id provided.

Use [k8s-https-212614] as project id (Y/n)?  y

Create in progress for [https://cloudresourcemanager.googleapis.com/v1/projects/k8s-https-212614].
Waiting for [operations/cp.6673958274622567208] to finish...done.

Next lets create a static IP.

mike@sleepycat:~$ gcloud beta compute --project=k8s-https-212614 addresses create k8s-https --region=northamerica-northeast1 --network-tier=PREMIUM
Created [https://www.googleapis.com/compute/beta/projects/k8s-https-212614/regions/northamerica-northeast1/addresses/k8s-https].
mike@sleepycat:~$ gcloud beta compute --project=k8s-https-212614 addresses list
NAME       REGION                   ADDRESS        STATUS
k8s-https  northamerica-northeast1  35.203.65.136  RESERVED

Because I am easily amused, I own the domain actually.works. In my settings for that domain I created an A record pointing at that IP address. When you have things set up correctly, you can verify the DNS part is working with dig

mike@sleepycat:~$ dig it.actually.works

; <<>> DiG 9.13.0 <<>> it.actually.works
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62565
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;it.actually.works.		IN	A

;; ANSWER SECTION:
it.actually.works.	3600	IN	A	35.203.65.136

;; Query time: 188 msec
;; SERVER: 192.168.0.1#53(192.168.0.1)
;; WHEN: Tue Aug 07 23:02:11 EDT 2018
;; MSG SIZE  rcvd: 62

With that squared away, we need to create our Kubernetes cluster. Before we can do that we need to get a little administrative stuff out of the way. First we need to get our billing details and link them to our project.

mike@sleepycat:~$ gcloud beta billing accounts list
ACCOUNT_ID            NAME                OPEN  MASTER_ACCOUNT_ID
0X0X0X-0X0X0X-0X0X0X  My Billing Account  True
mike@sleepycat:~$ gcloud beta billing projects link k8s-https-212614 --billing-account 0X0X0X-0X0X0X-0X0X0X
billingAccountName: billingAccounts/0X0X0X-0X0X0X-0X0X0X
billingEnabled: true
name: projects/k8s-https-212614/billingInfo
projectId: k8s-https-212614

Then we’ll need to enable the Kubernetes engine for this project.

mike@sleepycat:~$ gcloud services enable container.googleapis.com --project k8s-https-212614
Waiting for async operation operations/tmo-acf.74966272-39c8-4b7b-b973-8f7fa4dac4fd to complete...
Operation finished successfully. The following command can describe the Operation details:
 gcloud services operations describe operations/tmo-acf.74966272-39c8-4b7b-b973-8f7fa4dac4fd

Let’s create our cluster. Because both Kubernetes and Google move pretty quickly, it’s good to check the current Kubernetes version for your region with something like gcloud container get-server-config --region "northamerica-northeast1". In my case that shows “1.10.5-gke.3” as the newest so I’ll use that for my cluster. If you are interested in beefier machines explore your options with gcloud compute machine-types list --filter="northamerica-northeast1" but for this I’ll slum it with a f1-micro.

mike@sleepycat:~$ gcloud container --project=k8s-https-212614 clusters create "k8s-https" --zone "northamerica-northeast1-a" --username "admin" --cluster-version "1.10.5-gke.3" --machine-type "f1-micro" --image-type "COS" --disk-type "pd-standard" --disk-size "100" --scopes "https://www.googleapis.com/auth/compute","https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" --num-nodes "3" --enable-cloud-logging --enable-cloud-monitoring --addons HorizontalPodAutoscaling,HttpLoadBalancing,KubernetesDashboard --enable-autoupgrade --enable-autorepair

Creating cluster k8s-https...done.
Created [https://container.googleapis.com/v1/projects/k8s-https-212614/zones/northamerica-northeast1-a/clusters/k8s-https].
To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/northamerica-northeast1-a/k8s-https?project=k8s-https-212614
kubeconfig entry generated for k8s-https.
NAME       LOCATION                   MASTER_VERSION  MASTER_IP    MACHINE_TYPE  NODE_VERSION  NUM_NODES  STATUS
k8s-https  northamerica-northeast1-a  1.10.5-gke.3    35.203.64.6  f1-micro      1.10.5-gke.3  3          RUNNING

You will notice that kubectl (which you obviously have installed already) is now configured to access this cluster.
As part of the Traefik setup we are about to do we will need to change some RBAC rules. To do that we will need to create a cluster admin role and load that into our cluster.

mike@sleepycat:~$ cat cluster-admin-rolebinding.yaml 
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: owner-cluster-admin-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: <your_username@your_email_you_use_with_google_cloud.whatever>
---

mike@sleepycat:~$ kubectl apply -f cluster-admin-rolebinding.yaml

With that done we can apply the rest of the config I’ve posted in a snippet here with kubectl apply -f https.yaml.
It’s a fair bit of yaml, but a few things are worth pointing out.

First, we are running a single pod with my helloworld image. It’s just the output of create-react-app that I use for testing stuff.
If you look at the traefik-ingress-service, you will notice we are telling Google we want the service mapped to the static IP we created earlier using loadBalancerIP.

---
apiVersion: v1
kind: Service
metadata:
  name: traefik-ingress-service
  namespace: kube-system
spec:
  loadBalancerIP: 35.203.65.136
  ports:
  - name: http
    port: 80
    protocol: TCP
  - name: https
    port: 443
    protocol: TCP
  - name: admin
    port: 8080
    protocol: TCP
  selector:
    k8s-app: traefik-ingress-lb
  type: LoadBalancer
---

When looking at the traefik-ingress-controller itself, it’s worth noting the choice of kind: Deployment instead of kind: DaemonSet. This choice was made for simplicity’s sake (only a single pod will read/write to my certs-claim volume so no Multi-Attach errors), and means that I will have a single pod acting as my ingress controller. Read more about the tradeoffs here.

Here is the traefik-ingress-controller in it’s entirety. It’s a long chunk of code, but I find this helps see everything in context.

Special note about the args being passed to the container; make sure they are strings. You can end up with some pretty baffling errors if you don’t. Other than that, it’s the full set of options to get you TLS certs and automatic redirects to HTTPS.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-app: traefik-ingress-lb
  name: traefik-ingress-controller
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        k8s-app: traefik-ingress-lb
    spec:
      containers:
      - args:
        - "--api"
        - "--kubernetes"
        - "--logLevel=DEBUG"
        - "--debug"
        - "--defaultentrypoints=http,https"
        - "--entrypoints=Name:http Address::80 Redirect.EntryPoint:https"
        - "--entrypoints=Name:https Address::443 TLS"
        - "--acme"
        - "--acme.onhostrule"
        - "--acme.entrypoint=https"
        - "--acme.domains=it.actually.works"
        - "--acme.email=mike@korora.ca"
        - "--acme.storage=/certs/acme.json"
        - "--acme.httpchallenge"
        - "--acme.httpchallenge.entrypoint=http"
        image: traefik:1.7
        name: traefik-ingress-lb
        ports:
        - containerPort: 80
          hostPort: 80
          name: http
        - containerPort: 443
          hostPort: 443
          name: https
        - containerPort: 8080
          hostPort: 8080
          name: admin
        securityContext:
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
        volumeMounts:
        - mountPath: /certs
          name: certs-claim
      serviceAccountName: traefik-ingress-controller
      terminationGracePeriodSeconds: 60
      volumes:
      - name: certs-claim
        persistentVolumeClaim:
          claimName: certs-claim

The contents of the snippet should be all you need to get up and running. You should be able to visit your domain and see the reassuring green of the TLS lock in the URL bar.

A TLS certificate from Let's Encrypt

If things aren’t working you can get a sense of what’s up with the following commands:

kubectl get all --all-namespaces
kubectl logs --namespace=kube-system traefik-ingress-controller-...

Where to go from here

As you can see, there is a fair bit going on here. We have DNS, Kubernetes, Traefik and the underlying Google Cloud Platform all interacting and it’s not easy to get a minimal “hello world” style demo going when that is the case. Hopefully this shows enough to give people a jumping off point so they can start refining this into a more robust configuration. The next steps for me will be exploring DaemonSets and storing the acme.json in a way that multiple copies of Traefik can access, maybe a key/value like consul. We’ll see what the next layer of learning brings.

Leave a comment