Say you suddenly want your own little Kubernetes cluster...

… say no more, I have one now, here’s what I did to make that happen!

Whyy?

Well. Because.

At least mostly. Here are some reasons:

Basically I want to see how difficult it is and whether it is viable for me. And maybe it will even be nicer than the manual server management I have done so far.

I may also have been at KubeCon 2022 virtually this last week and may have wanted some more experience before we restructure things at work. But the “automated setup” aspect is pretty tempting as well.

Some constraints

As usual, I have some odd choices and constraints:

How?

In short, the cluster runs k3s, the ports are published on a dedicated internal IP using kube-vip, and all of this took one evening of frustration and a morning with a clearer head to figure out.

Note: ⚡ This setup has been running for a few hours only, I don’t know if it is secure enough, fast enough, whatever enough. But it runs and now I can try some more things. ⚡

k3s

k3s is a small-ish Kubernetes distribution delivered as one binary and, I think, easier to set up than a full-blown cluster. E.g. the database is a simple SQLite database and there’s no fun distributed etcd stuff to set up. Running k3s server gives you a Kubernetes cluster.

I installed k3s from the AUR. Do note that the k3s-bin package is lacking the k3s-killall cleanup script, which you will need to clean up the networks, iptables rules and containers that k3s starts. I pretty much always ran k3s-killall when I was changing IPs, testing network settings and the like. If in doubt, get back to a clean (network) state by running k3s-killall.
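
For reference, a rough sketch of the install and cleanup, assuming an AUR helper like yay; the exact package and script names may differ from what is shown here (upstream calls the script k3s-killall.sh):

# build and install k3s from the AUR (k3s-bin skips the cleanup script)
$ yay -S k3s

# get back to a clean (network) state: stop the containers k3s started and
# remove the iptables rules and network interfaces it created
$ sudo k3s-killall.sh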

The devil is in the (network) details, because I don’t want to publish ports on my public IP, and that is not something k3s seems to support out of the box.

I tried a lot of things, perused the docs over and over, and in the end just used kube-vip which was linked in a GitHub issue. But here are some of the steps that I tried:

In short, mostly I don’t understand how networking works properly, iptables even less; but there was also something odd going on.

kube-vip

I’d had enough.

Luckily, kube-vip seemed to do what I want: allowing me to specify an interface and an IP address that ports will be published on. In particular, instead of the builtin LoadBalancer implementation that comes with k3s we use kube-vip, set up according to their docs.

  1. Add a new internal IP for Kubernetes: sudo ip addr add 192.168.0.101 dev lo

    Note that the IP and the interface (lo) are custom and you can choose what you need there, e.g. for my live setup it uses a different IP and listens on the actual network interface. (But because it’s a different IP, ports on that IP are not reachable from the outside.)

  2. Set up some permissions that kube-vip needs when running as a daemonset:

    curl https://kube-vip.io/manifests/rbac.yaml > /var/lib/rancher/k3s/server/manifests/kube-vip-rbac.yaml

  3. Configure kube-vip to listen on your IP and interface ($KVVERSION is the kube-vip version to use, v0.4.4 in my case):

    • fetch the image: k3s ctr content fetch ghcr.io/kube-vip/kube-vip:$KVVERSION
    • generate the daemonset: k3s ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:$KVVERSION vip /kube-vip manifest daemonset --interface lo --address 192.168.0.101 --inCluster --taint --controlplane --services
    • place the generated daemonset in /var/lib/rancher/k3s/server/manifests/kube-vip-daemonset.yaml

    For me this generated daemonset looked something like this:

      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        creationTimestamp: null
        labels:
          app.kubernetes.io/name: kube-vip-ds
          app.kubernetes.io/version: v0.4.4
        name: kube-vip-ds
        namespace: kube-system
      spec:
        selector:
          matchLabels:
            app.kubernetes.io/name: kube-vip-ds
        template:
          metadata:
            creationTimestamp: null
            labels:
              app.kubernetes.io/name: kube-vip-ds
              app.kubernetes.io/version: v0.4.4
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                  - matchExpressions:
                    - key: node-role.kubernetes.io/master
                      operator: Exists
                  - matchExpressions:
                    - key: node-role.kubernetes.io/control-plane
                      operator: Exists
            containers:
            - args:
              - manager
              env:
              - name: vip_arp
                value: "false"
              - name: port
                value: "6443"
              - name: vip_interface
                value: lo
              - name: vip_cidr
                value: "32"
              - name: cp_enable
                value: "true"
              - name: cp_namespace
                value: kube-system
              - name: vip_ddns
                value: "false"
              - name: svc_enable
                value: "true"
              - name: address
                value: 192.168.0.101
              image: ghcr.io/kube-vip/kube-vip:v0.4.4
              imagePullPolicy: Always
              name: kube-vip
              resources: {}
              securityContext:
                capabilities:
                  add:
                  - NET_ADMIN
                  - NET_RAW
            hostNetwork: true
            serviceAccountName: kube-vip
            tolerations:
            - effect: NoSchedule
              operator: Exists
            - effect: NoExecute
              operator: Exists
        updateStrategy: {}
      status:
        currentNumberScheduled: 0
        desiredNumberScheduled: 0
        numberMisscheduled: 0
        numberReady: 0
    

  4. Finally, run k3s server with the correct parameters to use that IP:

    $ k3s server --node-ip 192.168.0.101 --advertise-address 192.168.0.101 --disable traefik --flannel-iface lo --disable servicelb
    INFO[0000] Starting k3s v1.23.6+k3s1 (418c3fa8)
    INFO[0000] Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s
    INFO[0000] Configuring database table schema and indexes, this may take a moment...
    INFO[0000] Database tables and indexes are up to date
    [...]
    

With all of this we disable traefik and the default load balancer (servicelb), and replace the latter with kube-vip which listens on the IP and interface we have specified.
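
To sanity-check the setup, standard kubectl queries are enough; the kube-vip daemonset should be scheduled and services of type LoadBalancer should get the internal IP (output omitted here):

# kube-vip runs as a daemonset in the kube-system namespace
$ kubectl --namespace kube-system get daemonset kube-vip-ds

# LoadBalancer services should show 192.168.0.101 as their EXTERNAL-IP
# once kube-vip has picked them up
$ kubectl get services --all-namespaces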

ufw

We do need to expose one port on our firewall if we want to manage the cluster from the outside using kubectl:

$ sudo ufw allow 6443 comment k3s
Rule added
Rule added (v6)

Now we can deploy things using kubectl or other tools that talk to Kubernetes. (As long as we copy the config to the machine from where we want to run kubectl: https://rancher.com/docs/k3s/latest/en/cluster-access/)
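
For reference, that boils down to copying /etc/rancher/k3s/k3s.yaml from the server and pointing it at the server’s address instead of 127.0.0.1. A minimal sketch; the hostname is a placeholder, and reading the file may need root on the server:

# copy the kubeconfig that k3s generates on the server
$ scp my-server:/etc/rancher/k3s/k3s.yaml ~/.kube/config

# it points at 127.0.0.1 by default, so replace that with the address
# the API server is reachable on from this machine
$ sed -i 's/127.0.0.1/my-server.example.org/' ~/.kube/config

$ kubectl get nodes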

caddy (and exposing ports locally)

To route things to the outside, we can expose a port to the host on the IP we have set up:

# Expose on port 15555 to the host.
#
# With this setup the port is then routed outside the host in some other way,
# e.g. using Caddy outside of Kubernetes.
apiVersion: v1
kind: Service
metadata:
  name: numblr
spec:
  selector:
    app: numblr
  ports:
    - protocol: TCP
      port: 15555
      targetPort: http
  type: LoadBalancer

See https://github.com/heyLu/numblr/blob/main/kubernetes/deployment.yaml for the full deployment including that Service, which you can deploy using kubectl apply -f with the raw file URL (https://raw.githubusercontent.com/heyLu/numblr/main/kubernetes/deployment.yaml).
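
Before wiring it up in Caddy, you can check from the host that the service actually answers on the internal IP and port from the Service above:

# the LoadBalancer service published by kube-vip, reachable from the host
$ curl -i http://192.168.0.101:15555/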

And finally, we can expose this port using Caddy:

example.org {
    reverse_proxy 192.168.0.101:15555
}

Why like this?

I have regular services running already and Caddy set up, so I just want to add services to that for now. In the future I might play with setting up traefik + cert-manager so that subdomains and certificates are handled automatically, replacing Caddy completely.

What now?

Now I have it running live, serving a staging instance of numblr that has a copy of the live database. It runs okay so far, but there seem to be some wrinkles I still have to investigate.

What’s neat is that I can now say kubectl apply -f kubernetes/deployment.yaml when I want to deploy a new version, and I don’t have to do the little scp + ssh + manual service restart dance. And I can add new services in the same way, only having to tell Caddy that there’s a new port to proxy on some new domain.

I think that’s pretty nice, let’s see how it turns out.

Edit from the future: A week later I am pretty happy so far. Better tooling, easy deployments, easy (and fast) access to logs, … And a really nice way to debug things, using ephemeral containers. Quite the nice workflow so far, even when used in the home.
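
For the ephemeral container debugging mentioned above, the core of it is kubectl debug against a running pod; a minimal sketch, with the pod name and the debug image as placeholders:

# attach an ephemeral debug container to a running pod, targeting the
# numblr container's process namespace
$ kubectl debug -it numblr-abc123 --image=busybox --target=numblr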

Fin

That’s it! Have a nice day, I have some flowers to plant now.