Kubernetes
Kubernetes is an open source container orchestration system originally developed by Google. By late 2017, both containers and Kubernetes had seen widespread adoption.
The name originates from the Greek word κυβερνήτης, meaning helmsman or pilot (hence the ship's wheel logo). The eight letters in the middle are commonly replaced with '8', stylizing the name as 'k8s'.
Kubernetes is well loved by system administrators because it makes it easy to perform upgrades, install security patches, configure networks, and make backups without needing to worry about the application level. Other features such as autoscaling and load balancing are built into the core Kubernetes ecosystem.
Developers love it for making it extremely easy to deploy their applications as fully redundant deployments that can automatically scale up or down. Seamless upgrades can also be done and controlled by developers without needing to touch the infrastructure level.
There are many commercial products that build on top of the Kubernetes environment such as Red Hat OpenShift, or VMware PKS. If Kubernetes were the Linux kernel, these commercial products would be like the different Linux distros built on top of the kernel.
Eventually, the excitement around Kubernetes will probably fade. Similar to the Linux kernel, it will be part of the IT infrastructure plumbing and used like any other technology that currently exists.
There are certain use cases where Kubernetes isn't a good fit. For instance, database servers where each node holds unique state and nodes are not interchangeable should not run under Kubernetes. Lighter workloads that don't require the full Kubernetes stack might be better suited to Functions as a Service (FaaS) platforms; these so-called "funtainers" provide an even lighter way to run code that may be more convenient than containers. Alternatively, clusterless container services such as Amazon Fargate or Azure Container Instances allow hosting containers without needing to run a full Kubernetes stack.
Cloud Native
Kubernetes is based around the concept of a cloud native system. Things in such a system should be:
- Automatable - applications are deployed and managed by machines
- Ubiquitous and flexible - workloads are decoupled from the physical resources of any one compute node; containerized microservices can be moved from one node to another
- Resilient and Scalable - No single point of failure, distributed and highly available through redundancy and graceful degradation.
- Dynamic - an orchestrator should be able to schedule containers to take maximum advantage of available resources
- Observable - monitoring, logging, and tracing are all available
Cluster Architecture
Kubernetes connects multiple servers into a cluster. A cluster is a combination of master nodes and worker nodes, each with a distinct set of components.
While there is a distinction between the two types of nodes in terms of where components are placed, there is no intrinsic difference between the two. In fact, you can run all the components on a single node, such as in Minikube.
Master Nodes
Master nodes form the control plane, the brains of the cluster. They run all the components necessary to maintain the Kubernetes cluster: scheduling containers, managing services, serving the Kubernetes API, etc. These components can run on a single master or be split across multiple master nodes to ensure high availability.
The control plane should have the following components:
- kube-apiserver - the Control Plane's API, used by the user and by other components on the cluster
- etcd - the distributed key/value database to store cluster configuration
- kube-scheduler - Schedules on which worker node new pods are to be placed
- kube-controller-manager - Performs cluster-level functions such as replicating components, keeping track of worker nodes, interacting with cloud provider services such as load balancers, persistent disk volumes, etc.
To make the control plane resilient, multiple master nodes should be deployed so that the necessary services remain available if a node goes down. A failed control plane leaves the cluster unable to respond to commands, react to cluster changes, or reschedule workloads.
Worker Nodes
Each worker node in the cluster runs the actual workload that is deployed on the cluster and should have the following components:
- kubelet - Responsible for driving container runtimes and starting workloads that are scheduled on the node. It also monitors pod statuses.
- kube-proxy - Handles networking between pods by routing and load-balancing traffic destined for Services to the appropriate backend pods
- a container runtime - The software that actually runs the containers. Typically Docker, rkt, or CRI-O.
If a worker node fails, Pods running on it will automatically be rescheduled elsewhere by the control plane. A well designed cloud application with multiple replicas should not be impacted by the temporary outage of a single pod.
Failure testing should be done to ensure that applications are not affected by node outages. Automatic resilience testing tools such as Netflix's Chaos Monkey can help by randomly killing nodes, Pods, or network connectivity between nodes.
Installation
Installing Kubernetes is easy and there are many options available to help you get it set up.
Keep in mind, however, the amount of time and resources it takes to maintain a Kubernetes cluster. A lot can go wrong with a Kubernetes setup, and maintaining such a system requires significant time and energy. Things to keep in mind on a self-hosted solution are:
- HA control plane and worker nodes
- Cluster set up securely? Patched? Container defaults set appropriately?
- Services in the cluster secure?
- Conformant to CNCF standards?
- Node configuration managed, or has it drifted?
- Data backed up? Persistent storage restore/backups?
- Monitoring?
It might be a better solution to go with a managed Kubernetes service such as Amazon EKS or Google Kubernetes Engine (GKE).
If you do wish to go with a self-hosted solution, there are a few Kubernetes installers:
- kops
- kubespray
- TK8
- Kubernetes the hard way
- kubeadm
- tarmak
- Rancher Kubernetes Engine
- Puppet kubernetes module
- kubeformation
kubespray
Kubespray uses Ansible to deploy a Kubernetes cluster.
Get Kubespray from https://github.com/kubernetes-sigs/kubespray
Quick Start
To get an application running as a service:
- Create a Deployment, which defines how many replicas should exist, the container image to use, and the exposed ports.
- This will create Pods: each replica creates an additional Pod (ideally residing on different nodes).
  - kubectl get deployments to see deployments
  - kubectl get pods to see pods
- Create a Service, which defines the name of the service and the port the service is exposed on.
  - kubectl get service to see services
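As a concrete sketch, assuming a hypothetical deployment named hello and a sample image (kubectl create deployment and kubectl expose are the standard commands for this):

kubectl create deployment hello --image=gcr.io/google-samples/hello-app:1.0    # Deployment + Pods
kubectl scale deployment hello --replicas=3                                    # run three replicas
kubectl expose deployment hello --port=80 --target-port=8080 --type=NodePort   # Service in front of the Pods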
Concepts
See the Kubernetes documentation at https://kubernetes.io/docs/concepts/
- Deployment
- A Deployment defines the desired state for pods and ReplicaSets.
- Pod
- A Pod is a collection of container(s) all residing on one node.
- ReplicaSet
- A ReplicaSet creates or destroys pods depending on scaling and the number of pods running. It is the next-generation ReplicationController.
- ReplicationController
- A ReplicationController ensures that a specific number of pod replicas are running at any one time. It is a type of Controller.
- Controller
- A reconciliation loop that ensures the desired state matches the actual cluster state
- Service
- A Service is an abstraction that defines the logical set of Pods. Think of it as a load balancer with a DNS name. As pods are added or removed, the service will update accordingly.
- Ingress
- An Ingress defines routes for HTTP requests coming in to the Kubernetes Ingress Controller. It maps a host and URI to a destination service (see the example manifest after this list).
- Labels
- Labels are key/value pairs that Kubernetes attaches to any objects, such as pods, Replication Controllers, Endpoints, and so on.
- Annotations
- Annotations are key/value pairs used to store arbitrary non-queryable metadata.
- Secrets
- Secrets hold sensitive information such as passwords, TLS certificates, OAuth tokens, and ssh keys.
- ConfigMap
- ConfigMaps are mechanisms used to inject containers with configuration data while keeping containers agnostic of Kubernetes.
- Kubernetes Master
- A master node is responsible for maintaining the state of the Kubernetes cluster. It typically runs a kube-apiserver, kube-scheduler, kube-controller-manager, and etcd.
- Kubernetes Worker
- A worker node is responsible for running the actual applications managed by Kubernetes. It typically runs kubelet, kube-proxy, a container engine (docker, rkt), and other plugins for networking (flannel) or cloud integration (CNI)
Each concept will be covered in more detail below.
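Ingress, mentioned above, can be sketched with a minimal manifest like this (hostname, service name, and port are hypothetical; extensions/v1beta1 was the current Ingress API version in this era of Kubernetes):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - host: app.example.com          # requests for this host...
    http:
      paths:
      - path: /                    # ...and this URI...
        backend:
          serviceName: frontend    # ...are routed to this Service
          servicePort: 80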
Pods
A pod is:
- A collection of application containers
- Guaranteed to be deployed on the same Kubernetes cluster node
- Shares the same cgroup, IP address, and hostname (hence, all containers in a pod run in the same execution environment)
A pod should provide one individual component of an application that can:
- Be scaled independently of all other components in the application (eg. Database with respect to the frontend web server)
- Work even if placed (ie. orchestrated) on a different machine
In general, the right question to ask yourself when designing Pods is, “Will these containers work correctly if they land on different machines?” If the answer is “no,” a Pod is the correct grouping for the containers. If the answer is “yes,” multiple Pods is probably the correct solution. In the example at the beginning of this chapter, the two containers interact via a local filesystem. It would be impossible for them to operate correctly if the containers were scheduled on different machines.—Thinking with Pods, Kubernetes: Up and Running
Pods are assigned a unique Pod IP address within the cluster. All containers inside a pod can reference each other via localhost. Containers in other pods can only be reached using the Pod IP address or via a Service.
Like all other Kubernetes resources, Pods can be created using a YAML or JSON descriptor. When a Pod descriptor is applied via the kubectl command, the object is stored in the cluster's etcd server and is taken into effect immediately by the Kubernetes scheduler. Alternatively, a Pod can be created by invoking kubectl run (eg. kubectl run kuard --image=registry/something/something:tag), but this imperative method is not recommended: it leaves no manifest behind to version-control, review, or reapply later.
A pod definition can be in either YAML or JSON, typically YAML, and should contain the following sections:
- Kind: Specifies the kind of Kubernetes resource this manifest defines. In this case, a 'Pod'
- Metadata: Helpful data to uniquely identify the object. Eg. the Pod's Name, UID, Namespace
- Spec: Object data; for Pods, it should be an array of containers (including container image, name, ports, environment variables, resources, etc).
For more information on the spec, see https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md.
apiVersion: v1
kind: Pod
metadata:
  name: kuard
spec:
  containers:
  - image: gcr.io/kuar-demo/kuard-amd64:1
    name: kuard
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    env:
    - name: EXAMPLE_VAR    # environment variables need both a name and a value
      value: something
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
Use kubectl explain pods to see possible API object fields. Drill deeper with kubectl explain pod.spec.
- Create using kubectl apply -f pod-descriptor.yml
- See it using kubectl get pods
- Detailed information about it using kubectl describe pods pod-name
- Delete it using kubectl delete pods/pod-name or kubectl delete -f pod-descriptor.yml
Pods that are set for deletion stop receiving new requests. After a 30 second termination grace period, the pods are then terminated. This extra time allows a pod to reliably finish active requests.
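The grace period can be adjusted per pod via the terminationGracePeriodSeconds field in the pod spec, or overridden at deletion time (the pod name here is illustrative):

kubectl delete pods kuard --grace-period=10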
Logs generated by all containers in the pod can be viewed with kubectl logs <pod>. A specific container can be singled out using kubectl logs <pod> -c <container name>. Container logs are rotated automatically daily or after the log file reaches 10MB.
Controllers
A Controller ensures the cluster state matches the desired state using a reconciliation loop.
The following objects utilize a Controller to perform the necessary updates to maintain the desired state.
ReplicaSet
See: https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/
ReplicaSets are used to scale stateless, interchangeable pods to a desired count. You define the number of pod replicas that should be running at a given time, and the ReplicaSet's controller enforces it. Typically, ReplicaSets are used by Deployments as a mechanism to orchestrate pod creation, deletion, and updates.
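A minimal ReplicaSet manifest might look like the following sketch (name, labels, and image are hypothetical):

apiVersion: apps/v1beta2
kind: ReplicaSet
metadata:
  name: frontend
spec:
  replicas: 3                  # desired number of pod replicas
  selector:
    matchLabels:
      app: frontend
  template:                    # pod template used to stamp out replicas
    metadata:
      labels:
        app: frontend          # must match the selector above
    spec:
      containers:
      - name: main
        image: nginx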
DaemonSet
See: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
DaemonSets are used to run a Pod on every Kubernetes node in your cluster. As nodes are added or removed, Pods are created or destroyed with them. Use cases for DaemonSets include per-node storage services (glusterd, ceph), monitoring, log collection, etc.
By default, a DaemonSet deploys pods to all nodes in the cluster unless a nodeSelector property is set in the pod template. A DaemonSet even deploys pods on unschedulable nodes because it bypasses the Scheduler completely.
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: ssd-monitor
spec:
  selector:
    matchLabels:
      app: ssd-monitor
  template:
    metadata:
      labels:
        app: ssd-monitor
    spec:
      nodeSelector:    # define your node selector here to exclude / include certain nodes
        disk: ssd
      containers:
      - name: main
        image: luksa/ssd-monitor
See them using kubectl get daemonset or kubectl get ds
StatefulSets
See Also: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
StatefulSets manage Pods requiring a sticky identity. Typically, they are used for Pods that are not interchangeable and require persistent identifiers (ie. a stable pod name and DNS name such as database-server-0) and stable storage (ie. a pod always has the same PersistentVolume attached).
Pod creation by default (podManagementPolicy=OrderedReady) proceeds sequentially from 0 to N-1; scaling down or termination proceeds in reverse, from N-1 to 0. Alternatively, pods can be set to start up in parallel (podManagementPolicy=Parallel). Deletion of a StatefulSet may occur in any order.
Note that the PersistentVolumes associated with the Pods' PersistentVolumeClaims are not deleted when the Pods or the StatefulSet are deleted. This must be done manually.
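A minimal StatefulSet manifest might look like the following sketch (names, image, and storage size are hypothetical; note that a StatefulSet also requires a governing headless Service, referenced by serviceName):

apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
  name: database-server
spec:
  serviceName: database            # headless Service that gives pods their DNS identity
  replicas: 3                      # creates database-server-0 through database-server-2
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - name: main
        image: mysql:5.7
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:            # each pod gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi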
Others
There may be other Controllers including:
- Job Controller, which runs a Pod as a job.
Deployments
See: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
A Deployment defines the desired state of Pods and ReplicaSets, which is enforced by the Deployment Controller. A Deployment is versioned, so Kubernetes lets you roll out a new version with the ability to pause or even roll back the changes at a later time.
The manifest's spec should contain a template that contains the information about a new Pod.
Example Deployment manifest:
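A sketch based on the guestbook frontend from the tutorial linked under See Also (the image and label values follow that tutorial):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: guestbook
      tier: frontend
  template:                        # pod template; new pods get these labels
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google-samples/gb-frontend:v4
        ports:
        - containerPort: 80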
- Create using kubectl apply -f frontend-deployment.yml
- See it using kubectl get deployments
- Pods that this deployment creates can be seen using the label selector.
  - This example manifest has 2 labels applied: app and tier
  - See them using kubectl get pods -l app=guestbook -l tier=frontend
You can change the scale of a deployment:
[root@kube guestbook]# kubectl get deployment frontend
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
frontend 3 3 3 3 44m
[root@kube guestbook]# kubectl scale deployment frontend --replicas=5
deployment.extensions/frontend scaled
[root@kube guestbook]# kubectl get deployment frontend
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
frontend 5 5 5 5 45m
[root@kube guestbook]# kubectl get pods
NAME READY STATUS RESTARTS AGE
frontend-654c699bc8-5ngzj 1/1 Running 0 45m
frontend-654c699bc8-77b58 1/1 Running 0 45m
frontend-654c699bc8-ll7cr 1/1 Running 0 22s
frontend-654c699bc8-pqzvp 1/1 Running 0 22s
frontend-654c699bc8-sq682 1/1 Running 0 45m
Service
A Service provides an abstraction between the user and the underlying Pod or Pods providing the service. Since Pods are ephemeral, their IP addresses may change as they are created and destroyed, and individual pods may go down; a Service gives applications a stable name, load balancing, and routing to maintain the service's availability.
By default, a Service provides a single IP address for the set of pods (known as a ClusterIP) that is only accessible within the cluster. This can be changed so that a Service provides a load-balanced external IP (LoadBalancer) or a port exposed on each node (NodePort).
An example manifest:
apiVersion: v1
kind: Service
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  # comment or delete the following line if you want to use a LoadBalancer
  type: NodePort
  # if your cluster supports it, uncomment the following to automatically create
  # an external load-balanced IP for the frontend service.
  # type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: guestbook
    tier: frontend
- Apply the Service using kubectl apply -f frontend-service.yaml
- See Services using kubectl get services
If you use a NodePort and the service looks like this:
[root@kube guestbook]# kubectl get service frontend
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
frontend NodePort 10.97.195.242 <none> 80:32764/TCP 10m
You can get to your service by accessing the Node on port 32764.
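For example, from outside the cluster (the node IP here is hypothetical):

curl http://192.168.1.100:32764/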
Namespaces
A namespace splits complex systems with many components into smaller distinct groups. Namespaces can also separate resources in a multi-tenant environment, or split resources between production, development, and QA environments.
Resource names must be unique to each namespace. Certain cluster-level resources are not namespaced, such as the Node resource.
Namespaces are "hidden" from each other, but they are not fully isolated by default (depending on what network solution you deployed in the cluster). A service in one Namespace can talk to a service in another Namespace using the DNS name <Service Name>.<Namespace Name>.svc.cluster.local
. Since the default search domain is svc.cluster.local
, a lookup to <Service Name>.<Namespace Name>
would resolve as well.
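For example, using the frontend Service from the earlier example, a pod in any namespace could reach it with either form:

curl http://frontend.default.svc.cluster.local
curl http://frontend.default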
kubectl get namespaces
Namespaces must contain only letters, digits, and dashes. To create a namespace:
kubectl create namespace xyz
apiVersion: v1
kind: Namespace
metadata:
  name: xyz
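To work with resources in a specific namespace, pass --namespace (or -n) to kubectl; without it, commands apply to the default namespace:

kubectl get pods --namespace=xyz
kubectl apply -f pod-descriptor.yml -n xyz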
Objects cannot reference objects in other namespaces (this applies even to the default namespace). For example, Secrets in a different namespace are not visible or accessible.
Kubernetes users can be assigned permissions to specific namespaces which can be used to limit a user's access on a shared multi-tenant cluster.
Contexts
Contexts are like profiles: each context can have a different default namespace or user credentials, which is useful for managing multiple clusters.
Change the current context using kubectl config use-context my-context
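For example, to create a context with a different default namespace and switch to it (the names here are hypothetical):

kubectl config set-context my-context --namespace=mystuff
kubectl config use-context my-context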
Config File
Located in ~/.kube/config. This file contains credentials to authenticate to the cluster, along with the default namespace and context values.
Kubernetes API
The Kubernetes API is a RESTful API, providing access to the Kubernetes backend.
Objects in the Kubernetes API are represented as JSON or YAML files. These files can be used to create, update, or delete objects on the server.
| Description | Command |
|---|---|
| Create/Update | # kubectl apply -f obj.yaml |
| Edit | # kubectl edit <resource-name> <object-name> |
| Delete | # kubectl delete -f obj.yaml or # kubectl delete <resource-name> <object-name> |
All objects can be annotated or given a label.
| Description | Command |
|---|---|
| Label pod 'bar' color=red | # kubectl label pods bar color=red (pass --overwrite if it already exists) |
| Remove label color from pod 'bar' | # kubectl label pods bar color- |
Kubernetes Master Node
A Kubernetes master node runs containers that provide the API server, scheduler, and other components that manage the cluster.
The master node should have the following components:
- controller-manager: Responsible for running controllers that regulate behavior in the cluster. Eg. ensuring replicas for a service are available and healthy.
- scheduler: Places pods onto the different nodes in the cluster
- etcd: Storage for the cluster; stores API objects.
All components deployed by Kubernetes run under the kube-system namespace.
# kubectl describe nodes kube
...
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
kube-system coredns-576cbf47c7-6mphw 100m (2%) 0 (0%) 70Mi (0%) 170Mi (2%)
kube-system coredns-576cbf47c7-75n6g 100m (2%) 0 (0%) 70Mi (0%) 170Mi (2%)
kube-system etcd-kube 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-apiserver-kube 250m (6%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-controller-manager-kube 200m (5%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-proxy-cmdsn 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-scheduler-kube 100m (2%) 0 (0%) 0 (0%) 0 (0%)
kube-system weave-net-swwgs 20m (0%) 0 (0%) 0 (0%) 0 (0%)
...
The Kubernetes proxy (kube-proxy) is responsible for routing network traffic to services in the Kubernetes cluster; it is what load-balances traffic across a Service's backing pods. A proxy exists on every node.
# kubectl get daemonsets --namespace=kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-proxy 1 1 1 1 1 <none> 26h
weave-net 1 1 1 1 1 <none> 28m
(kube-proxy and weave-net run as DaemonSets so that a copy runs on every node; see the DaemonSet section above.)
Kubernetes also runs a DNS server that provides naming and discovery for services in the cluster.
# kubectl get deployments --namespace=kube-system
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
coredns 2 2 2 2 26h
# kubectl get services --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 26h
The DNS service for the cluster runs on 10.96.0.10. If you log into a container in the cluster, this server will be used as the primary DNS server.
The Kubernetes Dashboard UI can be installed. Like the DNS service, it is both a deployment and a service:
# kubectl get deployments --namespace=kube-system kubernetes-dashboard
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
kubernetes-dashboard 1 1 1 1 2m26s
[root@kube ~]# kubectl get services --namespace=kube-system kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard ClusterIP 10.108.158.128 <none> 443/TCP 2m36s
Run kubectl proxy to proxy the API server on localhost:8001 and then access the dashboard in a web browser at http://localhost:8001/ui. If this is on a remote server, create an SSH tunnel.
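For example (user and host are hypothetical):

ssh -L 8001:localhost:8001 user@remote-server    # then browse http://localhost:8001/ui locally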
Kubernetes Node
Also known as a Worker or Minion node; it runs a container runtime such as Docker.
The Kubelet is a daemon on each node that will start/stop/maintain application containers as directed by the Kubernetes Master (the orchestrator/scheduler/control plane).
The scheduler decides where pods may be placed by checking each node's taints and will not schedule pods on nodes with taints like node-role.kubernetes.io/master:NoSchedule. Attempting to do so results in a FailedScheduling status and a message of 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. when looking at pod events using kubectl describe pods pod-name.
Commands
The kubectl command line tool is the official Kubernetes client for interacting with the Kubernetes API.

| Description | Command |
|---|---|
| Get all nodes | # kubectl get nodes |
| Get all pods | # kubectl get pods --all-namespaces |
| Get information about a node | # kubectl describe nodes |
| See components in the cluster | # kubectl get componentstatuses |
Tips: When using kubectl get, pass:
- --no-headers to remove headers for easier parsing
- -o json|yaml to format output in JSON/YAML
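For example, to list just the pod names for scripting:

kubectl get pods --no-headers | awk '{print $1}'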
| Description | Command |
|---|---|
| Create a pod | # kubectl run kuard --image=gcr.io/kuar-demo/kuard-amd64:1 or # kubectl apply -f kuard-pod.yaml |
| Listing pods | # kubectl get pods (filter by label with -l; multiple may be supplied, eg. # kubectl get pods -l label=something) |
| Delete pod | # kubectl delete deployments/kuard or # kubectl delete -f kuard-pod.yaml |
| Pod details | # kubectl describe pods kuard |
| Pod logs | # kubectl logs kuard |
| Enter a container | # kubectl exec kuard cmd or # kubectl exec -it kuard sh |
| Copy to/from container | # kubectl cp podname:/src ./dst or # kubectl cp ./src podname:/dst |
A pod manifest looks something like this:
apiVersion: v1
kind: Pod
metadata:
  name: kuard
spec:
  containers:
  - image: gcr.io/kuar-demo/kuard-amd64:1
    name: kuard
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
See Also
- Command cheat sheet
- Kubernetes: Up & Running
- Couchbase failover demo: https://blog.couchbase.com/databases-on-kubernetes/
- A Kubernetes Guide
- https://kubernetes.io/docs/tutorials/stateless-application/guestbook/ - Simple overview on deploying a guestbook application
- MySQL on Kubernetes: https://www.youtube.com/watch?v=J7h0F34iBx0 - Not exactly beginner friendly; goes over the MySQL Operator and Vitess
Additional Reading
mirantis.com's blog has some good information.