k3s

From Leo's Notes
Last edited on 14 June 2020, at 23:45.

k3s is a lightweight version of Kubernetes designed for unattended workloads.

Installation

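These notes use at least 2 nodes per cluster. k3s does not require etcd, so a 3-node quorum isn't needed. By default it stores cluster state in an embedded SQLite database; external datastores such as PostgreSQL are also supported.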

K3OS

Use the K3OS image to quickly build a k3s cluster. Download the ISO and boot it on at least 2 nodes. Log in as rancher and start the installer with sudo os-config. The first node is built as a server (controller), and subsequent nodes as agents (workers). When building a server, specify a token; otherwise, you will need to find the randomly generated token after installation at /var/lib/rancher/k3s/server/node-token:

node1 [/home/rancher]$ cat /var/lib/rancher/k3s/server/node-token
K1061d5fbfbc0952e856faaf32da12cc8718b8991dc2e03a96e0a07348455f305b5::node:4e5800b39e59863b7126eb1c88bb8957

The last value (4e5800b39e59863b7126eb1c88bb8957) is the token which you need to use when setting up agent nodes.
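A minimal sketch for pulling out that last value without copying it by hand. The function name extract_token is made up for these notes; it only assumes the token is whatever follows the final colon, as in the example above.

```shell
# Print the part of a k3s node-token line that follows the last colon.
# Reads one line from stdin, so it works with cat, ssh, or a pipe.
extract_token() {
    # "|| true" keeps going if the input has no trailing newline
    IFS= read -r line || true
    # ${line##*:} strips the longest prefix ending in ":"
    printf '%s\n' "${line##*:}"
}

# Example using the token line shown above
echo 'K1061d5fbfbc0952e856faaf32da12cc8718b8991dc2e03a96e0a07348455f305b5::node:4e5800b39e59863b7126eb1c88bb8957' | extract_token
# prints 4e5800b39e59863b7126eb1c88bb8957
```

On the server node itself this would be used as: extract_token < /var/lib/rancher/k3s/server/node-token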

After the server node is running, you may obtain a kubeconfig file at /etc/rancher/k3s/k3s.yaml. Copy it to ~/.kube/config on the machine where you run kubectl, and ensure the server address in the file resolves to the server node from that machine.
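If the server line in the copied file points at an address that only works on the server itself, something like the following could rewrite it. This is a sketch: set_kubeconfig_server is a made-up helper name, and it assumes the file has a single "server: https://HOST:PORT" line with a plain hostname or IPv4 address (no bracketed IPv6).

```shell
# Replace the host in a kubeconfig "server:" line, keeping scheme and port.
# Usage: set_kubeconfig_server NEW_HOST < k3s.yaml > config
set_kubeconfig_server() {
    # [^:/]+ matches the old host up to the ":" before the port
    sed -E "s#(server: https://)[^:/]+#\1$1#"
}

# Example on a single sample line (the loopback address is an assumption)
printf '    server: https://127.0.0.1:6443\n' | set_kubeconfig_server node1
# prints "    server: https://node1:6443"
```

With the file already copied locally as k3s.yaml, the usage would be: set_kubeconfig_server node1 < k3s.yaml > ~/.kube/config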

Stuff

Label nodes:

$ kubectl label node node1 kubernetes.io/role=master
$ kubectl label node node1 node-role.kubernetes.io/master=""

$ kubectl label node nodeN kubernetes.io/role=node
$ kubectl label node nodeN node-role.kubernetes.io/node=""
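With more than a couple of agents, the two worker label commands can be generated in a loop. A small sketch, with label_worker_cmds being a made-up name; it only prints the commands, so nothing is changed until the output is piped to sh.

```shell
# Print the kubectl commands that label each named node as a worker.
# Nothing is executed here; pipe the output to sh to apply the labels.
label_worker_cmds() {
    for node in "$@"; do
        printf 'kubectl label node %s kubernetes.io/role=node\n' "$node"
        printf 'kubectl label node %s node-role.kubernetes.io/node=""\n' "$node"
    done
}

# Example: show the commands for two agents (node names are placeholders)
label_worker_cmds node2 node3
```

To apply them: label_worker_cmds node2 node3 | sh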

Troubleshooting

I kept getting Get https://acme-v02.api.letsencrypt.org/directory: x509: certificate is valid for 1c00f28ec31e740b40f7d89acf5ff01c.931313b93244347524059db81dadaf52.traefik.default, not acme-v02.api.letsencrypt.org when trying to get ACME certs:

{"level":"error","msg":"Unable to obtain ACME certificate for domains \"load.home.steamr.com\" detected thanks to rule \"Host:load.home.steamr.com\" : cannot get ACME client get directory at 'https://acme-v02.api.letsencrypt.org/directory': Get https://acme-v02.api.letsencrypt.org/directory: x509: certificate is valid for 1c00f28ec31e740b40f7d89acf5ff01c.931313b93244347524059db81dadaf52.traefik.default, not acme-v02.api.letsencrypt.org","time":"2019-11-10T03:59:16Z"}
{"level":"info","msg":"Updated status on ingress default/load-ingress","time":"2019-11-10T03:59:17Z"}
{"level":"info","msg":"Updated status on ingress default/load-ingress","time":"2019-11-10T03:59:17Z"}
{"level":"warning","msg":"Error checking new version: Get https://update.traefik.io/repos/containous/traefik/releases: x509: certificate is valid for 1c00f28ec31e740b40f7d89acf5ff01c.931313b93244347524059db81dadaf52.traefik.default, not update.traefik.io","time":"2019-11-10T04:08:58Z"}

This occurred both with the ACME support in traefik and with cert-manager. It turns out the DNS setup on my network was simply broken, which led to this problem. I had a search domain set, which caused lookups to be resolved as subdomains of home.steamr.com instead.

Verify that a lookup of acme-v02.api.letsencrypt.org resolves to the correct address. If it doesn't, or if it only works with a trailing . at the end, your DNS setup is broken.
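To see which names a search domain could turn the lookup into, a sketch like this lists the candidates from a resolv.conf-style file. It is string manipulation only and does not query DNS; search_candidates is a made-up helper name, and it assumes the usual "search domain1 domain2 ..." syntax, where the last search line wins.

```shell
# List the names a resolver could try for HOST by appending each
# "search" domain from a resolv.conf-style file. No DNS queries are made.
# Usage: search_candidates HOST [RESOLV_CONF]   (default: /etc/resolv.conf)
search_candidates() {
    awk -v host="$1" '
        # remember the domains of the most recent "search" line
        $1 == "search" { n = NF; for (i = 2; i <= NF; i++) d[i] = $i }
        END { for (i = 2; i <= n; i++) print host "." d[i] }
    ' "${2:-/etc/resolv.conf}"
}

# Usage (not run here):
#   search_candidates acme-v02.api.letsencrypt.org
```

With a search domain of home.steamr.com this prints acme-v02.api.letsencrypt.org.home.steamr.com. If a name like that resolves on your network, lookups for the real host may be getting answered with the wrong address.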