Kubernetes is an open source container orchestration system originally developed by Google. By late 2017, both containers and Kubernetes had seen widespread adoption.
The name originates from the Greek word κυβερνήτης, meaning helmsman or pilot (hence the ship's wheel logo). The eight letters in the middle are commonly replaced with '8', with the entire name stylized as 'k8s'.
Kubernetes is well loved by system administrators for making it easy to upgrade, install security patches, configure networks, and make backups without needing to worry about the application level. Other features such as autoscaling and load balancing are built into the core Kubernetes ecosystem.
Developers love it for making it extremely easy to deploy their applications as fully redundant deployments that can automatically scale up or down. Seamless upgrades can also be done and controlled by developers without needing to touch the infrastructure level.
There are many commercial products that build on top of the Kubernetes environment such as Red Hat OpenShift, or VMware PKS. If Kubernetes were the Linux kernel, these commercial products would be like the different Linux distros built on top of the kernel.
Eventually, the excitement around Kubernetes will probably fade. Similar to the Linux kernel, it will be part of the IT infrastructure plumbing and used like any other technology that currently exists.
There are certain use cases where Kubernetes isn't a good fit. For instance, database servers where each node holds unique state and nodes are not interchangeable should not be running under Kubernetes. Other, lighter workloads that don't require the full Kubernetes stack might be better suited to Functions as a Service (FaaS) platforms. These so-called funtainers provide an even lighter way to run code that may be more convenient than containers. Alternatively, clusterless container services such as Amazon Fargate or Azure Container Instances allow hosting of containers without needing to run a full Kubernetes stack.
Kubernetes is based around the concept of a cloud native system. Things in such a system should be:
- Automatable - applications are deployed and managed by machines
- Ubiquitous and flexible - compute is decoupled from physical resources; containerized microservices can be moved from one node to another
- Resilient and Scalable - No single point of failure, distributed and highly available through redundancy and graceful degradation.
- Dynamic - an orchestrator should be able to schedule containers to take maximum advantage of available resources
- Observable - monitoring, logging, and tracing are all available
Kubernetes connects multiple servers into a cluster. A cluster is a combination of master nodes and worker nodes, each with a distinct set of components.
While there is a distinction between the two types of nodes in terms of where components are placed, there is no intrinsic difference between the two. In fact, you can run all the components on a single node, as in Minikube.
Master nodes form the control plane, the brains of the cluster. The control plane runs all the components necessary to maintain the Kubernetes cluster: scheduling containers, managing services, serving the Kubernetes API, and so on. These components can run on a single master or be split across multiple master nodes to ensure high availability.
The control plane should have the following components:
- kube-apiserver - the Control Plane's API, used by the user and by other components on the cluster
- etcd - the distributed key/value database to store cluster configuration
- kube-scheduler - Schedules on which worker node new pods are to be placed
- kube-controller-manager - Performs cluster-level functions such as replicating components, keeping track of worker nodes, interacting with cloud provider services such as load balancers, persistent disk volumes, etc.
To make the control plane resilient, multiple master nodes should be deployed so that the necessary services remain available should a node go down. A failed control plane leaves the cluster unable to respond to any commands, react to cluster changes, or reschedule workloads.
Each worker node in the cluster runs the actual workload that is deployed on the cluster and should have the following components:
- kubelet - Responsible for driving container runtimes and starting workloads that are scheduled on the node. It also monitors pod statuses.
- kube-proxy - Handles networking between pods by load-balancing network traffic between application components
- a container runtime - The application that handles the containers that are running. Typically Docker, or rkt, or CRI-O.
Pods running on the failed node will automatically be rescheduled elsewhere by the control plane. A well designed cloud application that has multiple replicas should not be impacted by the temporary outage of a single pod.
Failure testing should be done to ensure that applications are not affected by node outages. Automatic resilience testing tools such as Netflix's Chaos Monkey can help by randomly killing nodes, Pods, or network connectivity between nodes.
Installing Kubernetes is easy and there are many options available to help you get it set up.
Keep in mind, however, the amount of time and resources it takes to maintain a Kubernetes cluster. There are lots of things that can go wrong with a Kubernetes setup, and maintaining such a system requires a significant amount of time and energy. Things to keep in mind for a self-hosted solution:
- HA control plane and worker nodes
- Cluster set up securely? Patched? Container defaults set appropriately?
- Services in the cluster secure?
- Conformant to CNCF standards?
- Node configuration managed, or has it drifted?
- Data backed up? Persistent storage restore/backups?
It might be a better solution to go with a managed Kubernetes service such as AWS EKS or Google GKE.
If you do wish to go with a self-hosted solution, there are a few Kubernetes installers:
- Kubernetes the hard way
- Rancher Kubernetes Engine
- Puppet kubernetes module
Kubespray uses Ansible to deploy a Kubernetes cluster.
Get Kubespray from https://github.com/kubernetes-sigs/kubespray
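The exact layout changes between Kubespray releases, but a minimal Ansible inventory for it might look something like this (hostnames, IPs, and group names are illustrative, not canonical):

```yaml
# Hypothetical inventory/mycluster/hosts.yaml for a three-node cluster.
all:
  hosts:
    node1:
      ansible_host: 192.168.1.11
    node2:
      ansible_host: 192.168.1.12
    node3:
      ansible_host: 192.168.1.13
  children:
    kube-master:        # control plane nodes
      hosts:
        node1:
    kube-node:          # worker nodes
      hosts:
        node2:
        node3:
    etcd:               # etcd members
      hosts:
        node1:
    k8s-cluster:
      children:
        kube-master:
        kube-node:
```

The cluster is then deployed by running Kubespray's cluster.yml playbook against this inventory with ansible-playbook.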
To get an application running as a service:
- Create a deployment which defines how many replicas should exist, the container to use, and exposed ports.
- This will create a Pod. Each additional replica creates an additional Pod (that ideally resides on a different node).
kubectl get deployments to see deployments
kubectl get pods to see pods
- Create a service which defines the name of the service and the port the service is exposed on
kubectl get service
See the Kubernetes documentation at https://kubernetes.io/docs/concepts/
- A Deployment defines the desired state for pods and ReplicaSets.
- A Pod is a collection of container(s) all residing on one node.
- A ReplicaSet creates or destroys pods so that the actual number of running pods matches the desired count. It is the newer replacement for the ReplicationController.
- A ReplicationController ensures that a specific number of pod replicas are running at any one time. It is a type of Controller.
- A Controller runs a reconciliation loop that ensures the desired state matches the actual cluster state
- A Service is an abstraction that defines the logical set of Pods. Think of it as a load balancer with a DNS name. As pods are added or removed, the service will update accordingly.
- An Ingress defines routes for HTTP requests coming in to the Kubernetes Ingress Controller. Specify a host and URI and its destination service.
- Labels are key/value pairs that Kubernetes attaches to any objects, such as pods, Replication Controllers, Endpoints, and so on.
- Annotations are key/value pairs used to store arbitrary non-queryable metadata.
- Secrets hold sensitive information such as passwords, TLS certificates, OAuth tokens, and ssh keys.
- ConfigMaps are mechanisms used to inject containers with configuration data while keeping containers agnostic of Kubernetes.
- Kubernetes Master
- A master node is responsible for maintaining the state of the Kubernetes cluster. It typically runs a kube-apiserver, kube-scheduler, kube-controller-manager, and etcd.
- Kubernetes Worker
- A worker node is responsible for running the actual applications managed by Kubernetes. It typically runs kubelet, kube-proxy, a container engine (docker, rkt), and other plugins for networking (eg. flannel, via the CNI standard) or cloud integration
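As a sketch of the Ingress concept above (the apiVersion, host, and service names are illustrative; older clusters used the extensions/v1beta1 API instead):

```yaml
# Illustrative Ingress: routes http://guestbook.example.com/ to a
# Service named 'frontend' on port 80. Host and names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-ingress
spec:
  rules:
  - host: guestbook.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
```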
Each concept will be covered in more detail below.
A pod is:
- A collection of application containers
- Guaranteed to be deployed on the same Kubernetes cluster node
- Shares the same cgroup, IP address, and hostname (hence, all containers in a pod run in the same execution environment)
A pod should provide one individual component of an application that can:
- Be scaled independently of all other components in the application (eg. Database with respect to the frontend web server)
- Work even if placed (ie. orchestrated) on a different machine
In general, the right question to ask yourself when designing Pods is, “Will these containers work correctly if they land on different machines?” If the answer is “no,” a Pod is the correct grouping for the containers. If the answer is “yes,” multiple Pods is probably the correct solution. In the example at the beginning of this chapter, the two containers interact via a local filesystem. It would be impossible for them to operate correctly if the containers were scheduled on different machines. (Thinking with Pods, Kubernetes: Up and Running)
Pods are assigned a unique Pod IP address within the cluster. All containers inside a pod can reference each other via localhost. Containers outside a pod can only reach containers in other pods using the Pod IP address or via a Service.
Like all other Kubernetes resources, Pods can be created using a YAML or JSON descriptor. When a Pod descriptor is applied via the
kubectl command, the object is stored in the cluster's
etcd server and is taken into effect immediately by the Kubernetes scheduler. Alternatively, a Pod can be created by invoking
kubectl run (eg.
kubectl run kuard --image=registry/something/something:tag), but this method of creating pods is not recommended: imperative commands leave no manifest behind to version-control, review, or reapply, unlike declarative configuration files.
A pod definition can be in either YAML or JSON, typically YAML, and should contain the following sections:
- apiVersion: The version of the Kubernetes API the object belongs to (for Pods, 'v1')
- kind: Specifies the kind of Kubernetes resource this manifest defines. In this case, a 'Pod'
- metadata: Helpful data to uniquely identify the object. Eg. the Pod's name, UID, namespace
- spec: Object data; for Pods, it should contain an array of containers (including container image, name, ports, environment variables, resources, etc).
For more information on the spec, see https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md.
```yaml
apiVersion: v1            # Pods live in the core 'v1' API group
kind: Pod
metadata:
  name: kuard
spec:
  containers:
  - image: gcr.io/kuar-demo/kuard-amd64:1
    name: kuard
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    env:
    - name: EXAMPLE_VAR   # hypothetical variable name
      value: something
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
```
kubectl explain pods to see possible API object fields. Drill deeper with
kubectl explain pod.spec.
- Create using
kubectl apply -f pod-descriptor.yml
- See it using
kubectl get pods
- Detailed information about it using
kubectl describe pods pod-name
- Delete it using
kubectl delete pods/pod-name or
kubectl delete -f pod-descriptor.yml
Pods that are set for deletion stop receiving new requests. After a 30-second termination grace period, the pods are terminated. This extra time allows a pod to reliably finish active requests.
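The grace period can be tuned per pod via terminationGracePeriodSeconds; a minimal sketch (the pod name and image are placeholders, and 30 is the default value):

```yaml
# Pod spec fragment: terminationGracePeriodSeconds controls how long
# Kubernetes waits between sending SIGTERM and SIGKILL.
apiVersion: v1
kind: Pod
metadata:
  name: graceful-example       # hypothetical name
spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: main
    image: nginx
```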
Logs generated by all containers in the pod can be viewed with
kubectl logs <pod>. A specific container can be singled out using
kubectl logs <pod> -c <container name>. Container logs are rotated automatically daily or after the log file reaches 10MB.
A Controller ensures the cluster state matches the desired state using a reconciliation loop.
The following objects utilize a Controller to perform the necessary updates to maintain the desired state.
ReplicaSets are used if you want to scale your pods (which must be stateless and interchangeable) to a certain number. You define the number of pod replicas that should be running at a given time, which is then enforced by a controller. Typically, ReplicaSets are used by Deployments as a mechanism to orchestrate pod creation, deletion, and updates.
DaemonSets are used to run a Pod on every Kubernetes node in your cluster. As nodes are added or removed, Pods are created or destroyed with them. Some use cases requiring DaemonSets include storage services (glusterd, ceph), monitoring, log collection, etc., on each node.
By default, a DaemonSet deploys pods to all nodes in the cluster unless a nodeSelector property is set in the pod template. A DaemonSet deploys pods even on unschedulable nodes because it bypasses the Scheduler completely.
```yaml
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: ssd-monitor
spec:
  selector:
    matchLabels:
      app: ssd-monitor
  template:
    metadata:
      labels:
        app: ssd-monitor
    spec:
      nodeSelector:   # define your node selector here to exclude / include certain nodes
        disk: ssd
      containers:
      - name: main
        image: luksa/ssd-monitor
```
kubectl get daemonset or
kubectl get ds
StatefulSets manage Pods requiring a sticky identity. Typically, they are used for Pods that are not interchangeable and require persistent identifiers (ie. a stable pod name and DNS name such as database-server-0) and stable storage (ie. a pod always has the same PersistentVolume attached).
Pod creation by default (podManagementPolicy=OrderedReady) is ordered sequentially from 0 up to N-1; conversely, scaling down or termination happens in reverse order, from N-1 down to 0. Alternatively, pods can be set to start up in parallel (podManagementPolicy=Parallel). Deletion of a StatefulSet may occur in any order.
Note that the PersistentVolumes associated with the Pods' PersistentVolumeClaims are not deleted when the Pods or the StatefulSet are deleted. This must be done manually.
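A minimal StatefulSet sketch tying the above together (names, image, and storage size are hypothetical):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database-server          # pods become database-server-0, -1, -2
spec:
  serviceName: database          # headless Service providing stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - name: db
        image: registry/some-database:tag   # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:          # each replica gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```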
There may be other Controllers including:
- Job Controller, which runs a Pod as a job.
A Deployment defines the desired state of Pods or ReplicaSets, which is enforced by the Deployment Controller. A deployment can be versioned so that Kubernetes can roll out a new version with the ability to pause or even roll back the changes at a later time.
The manifest's spec should contain a template holding the information used to create new Pods.
Example Deployment manifest:
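A plausible sketch of such a manifest, matching the guestbook frontend labels and replica count used below (the image is the one used by the upstream guestbook tutorial, but treat it as illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: guestbook
      tier: frontend
  template:                      # the pod template enforced by the controller
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google-samples/gb-frontend:v4   # illustrative image
        ports:
        - containerPort: 80
```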
- Create using
kubectl apply -f frontend-deployment.yml
- See it using
kubectl get deployments
- Pods that this deployment creates can be seen using the label selector.
- This example manifest has 2 labels applied: app, and tier
- See it using
kubectl get pods -l app=guestbook -l tier=frontend
You can change the scale of a deployment:
```
[root@kube guestbook]# kubectl get deployment frontend
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
frontend   3         3         3            3           44m
[root@kube guestbook]# kubectl scale deployment frontend --replicas=5
deployment.extensions/frontend scaled
[root@kube guestbook]# kubectl get deployment frontend
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
frontend   5         5         5            5           45m
[root@kube guestbook]# kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
frontend-654c699bc8-5ngzj   1/1     Running   0          45m
frontend-654c699bc8-77b58   1/1     Running   0          45m
frontend-654c699bc8-ll7cr   1/1     Running   0          22s
frontend-654c699bc8-pqzvp   1/1     Running   0          22s
frontend-654c699bc8-sq682   1/1     Running   0          45m
```
A Service provides an abstraction between the user and the underlying Pod or Pods providing the service. Since Pods are ephemeral, their IP addresses may change as they are created and destroyed, and individual pods may go down; a Service gives applications a stable name, along with load balancing and routing, to maintain the service's availability.
A Service by default provides a single IP address for the set of pods (known as a ClusterIP), which is only accessible within the cluster. This can be changed so that a Service provides a load-balanced external IP (LoadBalancer) or a port exposed on each node (NodePort).
An example manifest:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  # comment or delete the following line if you want to use a LoadBalancer
  type: NodePort
  # if your cluster supports it, uncomment the following to automatically create
  # an external load-balanced IP for the frontend service.
  # type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: guestbook
    tier: frontend
```
- Apply the Service using
kubectl apply -f frontend-service.yaml
- See Services using
kubectl get services
If you use a NodePort and the service looks like this:
```
[root@kube guestbook]# kubectl get service frontend
NAME       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
frontend   NodePort   10.97.195.242   <none>        80:32764/TCP   10m
```
You can get to your service by accessing the Node on port 32764.
A namespace splits complex systems with many components into smaller, distinct groups. Namespaces are also used to separate resources in a multi-tenant environment, or to split resources between production, development, and QA environments.
Resource names must be unique to each namespace. Certain cluster-level resources are not namespaced, such as the Node resource.
Namespaces are "hidden" from each other, but they are not fully isolated by default (depending on which network solution is deployed in the cluster). A service in one Namespace can talk to a service in another Namespace using the DNS name
<Service Name>.<Namespace Name>.svc.cluster.local. Since the default search domain is
svc.cluster.local, a lookup to
<Service Name>.<Namespace Name> would resolve as well.
kubectl get namespaces
Namespace names must contain only letters, digits, and dashes. To create a namespace:
kubectl create namespace xyz
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: xyz
```
Objects cannot see other objects in different namespaces (this applies even to the default namespace). Secrets in a different namespace will not be visible or accessible.
Kubernetes users can be assigned permissions to specific namespaces which can be used to limit a user's access on a shared multi-tenant cluster.
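A hedged sketch of such a per-namespace grant using RBAC (the namespace 'dev' and user 'alice' are hypothetical):

```yaml
# Role: allows reading pods in the 'dev' namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# RoleBinding: grants the Role above to the hypothetical user 'alice'.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: read-pods
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```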
Contexts are like profiles. A context can have a different default namespace or different user credentials, which can be used to manage different clusters.
Change the current context using
kubectl config use-context my-context
Contexts are stored in ~/.kube/config. This file contains credentials to authenticate to the cluster, as well as the default namespace and context values.
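A trimmed sketch of what this file looks like (cluster, user, and context names are illustrative; credential fields are elided):

```yaml
apiVersion: v1
kind: Config
current-context: my-context
clusters:
- name: my-cluster
  cluster:
    server: https://kube.example.com:6443   # hypothetical API endpoint
users:
- name: my-user
  user: {}                # certificates/tokens elided
contexts:
- name: my-context
  context:
    cluster: my-cluster
    user: my-user
    namespace: default    # default namespace for this context
```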
The Kubernetes API is a RESTful API, providing access to the Kubernetes backend.
Objects in the Kubernetes API are represented as JSON or YAML files. Files can be used to create, update, or delete objects from the server.
# kubectl apply -f obj.yaml
# kubectl edit <resource-name> <object-name>
# kubectl delete -f obj.yaml
## or
# kubectl delete <resource-name> <object-name>
All objects can be annotated or given a label.
|Label pod 'bar'|
# kubectl label pods bar color=red ## pass --overwrite if it already exists.
# kubectl label pods bar color- ## the trailing dash removes the 'color' label
Kubernetes Master Node
A Kubernetes master node runs containers that provide the API server, scheduler, and the other components that manage the cluster.
The master node should have the following components:
controller-manager: Responsible for running controllers that regulate behavior in the cluster. Eg. ensuring replicas for a service are available and healthy.
scheduler: Places pods into different nodes in the cluster
etcd: storage for cluster; stores API objects.
All components deployed by Kubernetes run under the kube-system namespace.
```
# kubectl describe nodes kube
...
Non-terminated Pods:          (8 in total)
  Namespace    Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                           ------------  ----------  ---------------  -------------
  kube-system  coredns-576cbf47c7-6mphw       100m (2%)     0 (0%)      70Mi (0%)        170Mi (2%)
  kube-system  coredns-576cbf47c7-75n6g       100m (2%)     0 (0%)      70Mi (0%)        170Mi (2%)
  kube-system  etcd-kube                      0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-apiserver-kube            250m (6%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-controller-manager-kube   200m (5%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-proxy-cmdsn               0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-scheduler-kube            100m (2%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  weave-net-swwgs                20m (0%)      0 (0%)      0 (0%)           0 (0%)
...
```
The Kubernetes proxy (kube-proxy) is responsible for routing network traffic to services in the Kubernetes cluster. It implements the load balancing for the Service abstraction by distributing traffic across a service's pod endpoints. A proxy exists on every node.
```
# kubectl get daemonsets --namespace=kube-system
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-proxy   1         1         1       1            1           <none>          26h
weave-net    1         1         1       1            1           <none>          28m
```
(A DaemonSet runs a copy of a Pod on every node in the cluster; see the DaemonSet section above.)
Kubernetes also runs a DNS server that provides naming and discovery for services in the cluster.
```
# kubectl get deployments --namespace=kube-system
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
coredns   2         2         2            2           26h
# kubectl get services --namespace=kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP   26h
```
The DNS service for the cluster runs on 10.96.0.10. If you log into a container in the cluster, this server will be used as the primary DNS server.
Kubernetes Dashboard UI can be installed. Like the DNS service, it is both a deployment and a service:
```
# kubectl get deployments --namespace=kube-system kubernetes-dashboard
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kubernetes-dashboard   1         1         1            1           2m26s
[root@kube ~]# kubectl get services --namespace=kube-system kubernetes-dashboard
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
kubernetes-dashboard   ClusterIP   10.108.158.128   <none>        443/TCP   2m36s
```
kubectl proxy to proxy the server on
localhost:8001 and then access it in a web browser at
http://localhost:8001/ui. If this is on a remote server, create an SSH tunnel first (eg. ssh -L 8001:localhost:8001 user@remote-server).
Also known as a Worker or Minion node; worker nodes run a container runtime such as Docker.
The Kubelet is a daemon on each node that will start/stop/maintain application containers as directed by the Kubernetes Master (the orchestrator/scheduler/control plane).
The scheduler decides where pods may run by checking a node's taints, and will not schedule pods on nodes that carry taints like
node-role.kubernetes.io/master:NoSchedule. Attempting to do so will result in a
FailedScheduling status and a message of
0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. when looking at pod events using
kubectl describe pods pod-name.
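A pod can opt in to running on tainted nodes with a matching toleration; a minimal sketch (pod name and image are placeholders):

```yaml
# Pod spec fragment: tolerates the master NoSchedule taint, allowing
# the scheduler to place this pod on master nodes.
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod      # hypothetical name
spec:
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: main
    image: nginx
```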
The kubectl command line tool is the official Kubernetes client for interacting with the Kubernetes API.
|Get all nodes|
# kubectl get nodes
|Get all pods|
# kubectl get pods --all-namespaces
|Get information about a node|
# kubectl describe nodes
|See components in the cluster|
# kubectl get componentstatuses
For kubectl get, pass
--no-headers to remove headers for easier parsing, and
-o json|yaml to format output in JSON/YAML.
|Create a pod|
# kubectl run kuard --image=gcr.io/kuar-demo/kuard-amd64:1
# kubectl apply -f kuard-pod.yaml
# kubectl get pods
## Filter by label with -l; can supply multiple of these.
# kubectl get pods -l label=something
# kubectl delete deployments/kuard
# kubectl delete -f kuard-pod.yaml
# kubectl describe pods kuard
# kubectl logs kuard
|Enter a container|
# kubectl exec kuard -- cmd
# kubectl exec -it kuard -- sh
|Copy to/from container|
# kubectl cp podname:/src ./dst
# kubectl cp ./src podname:/dst
A pod manifest looks something like this:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kuard
spec:
  containers:
  - image: gcr.io/kuar-demo/kuard-amd64:1
    name: kuard
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
```
- Command cheat sheet
- Kubernetes: Up & Running
- Couchbase failover demo: https://blog.couchbase.com/databases-on-kubernetes/
- A Kubernetes Guide
- https://kubernetes.io/docs/tutorials/stateless-application/guestbook/ Simple overview on deploying a guestbook application
- MySQL on Kubernetes: https://www.youtube.com/watch?v=J7h0F34iBx0 (not exactly beginner friendly; goes over the MySQLOperator and Vitess)
mirantis.com's blog has some good information.