Kubernetes is an open source container orchestration system originally developed by Google. By late 2017, both containers and Kubernetes had seen widespread adoption.
The name originates from the Greek word κυβερνήτης, meaning helmsman or pilot (hence the ship's wheel logo). The eight letters in the middle are commonly replaced with '8', with the entire name stylized as 'k8s'.
Kubernetes is well loved by system administrators for making it easy to upgrade, install security patches, configure networks, and make backups without needing to worry about the application level. Other features such as autoscaling and load balancing are built into the core Kubernetes ecosystem.
Developers love it for making it extremely easy to deploy their applications as fully redundant deployments that can automatically scale up or down. Seamless upgrades can also be done and controlled by developers without needing to touch the infrastructure level.
There are many commercial products that build on top of the Kubernetes environment such as Red Hat OpenShift, or VMware PKS. If Kubernetes were the Linux kernel, these commercial products would be like the different Linux distros built on top of the kernel.
Eventually, the excitement around Kubernetes will probably fade. Similar to the Linux kernel, it will be part of the IT infrastructure plumbing and used like any other technology that currently exists.
There are certain use cases where Kubernetes isn't a good fit. For instance, database servers where each node holds unique state and nodes are not interchangeable should not be running under Kubernetes. Other, lighter workloads that don't require the full Kubernetes stack might be better suited to Functions as a Service (FaaS) platforms. These so-called funtainers provide an even lighter way to run code that may be more convenient than containers. Alternatively, clusterless container services such as Amazon Fargate or Azure Container Instances allow hosting of containers without needing to run a full Kubernetes stack.
Kubernetes is based around the concept of a cloud native system. Things in such a system should be:
- Automatable - applications are deployed and managed by machines
- Ubiquitous and flexible - compute is decoupled from physical resources; containerized microservices can be moved from one node to another
- Resilient and Scalable - No single point of failure, distributed and highly available through redundancy and graceful degradation.
- Dynamic - an orchestrator should be able to schedule containers to take maximum advantage of available resources
- Observable - monitoring, logging, and tracing are all available
Kubernetes connects multiple servers into a cluster. A cluster is a combination of master nodes and worker nodes, each with a distinct set of components.
While there is a distinction between the two types of nodes in terms of where components are placed, there is no intrinsic difference between the two. In fact, you can run all the components on a single node, as in Minikube.
Master nodes form the control plane, the brains of the cluster. The control plane runs all the components necessary to maintain the Kubernetes cluster: scheduling containers, managing services, serving the Kubernetes API, and so on. These components can run on a single master or be split across multiple master nodes to ensure high availability.
The control plane should have the following components:
- kube-apiserver - the Control Plane's API, used by the user and by other components on the cluster
- etcd - the distributed key/value database to store cluster configuration
- kube-scheduler - Schedules on which worker node new pods are to be placed
- kube-controller-manager - Performs cluster-level functions such as replicating components, keeping track of worker nodes, interacting with cloud provider services such as load balancers, persistent disk volumes, etc.
To make the control plane resilient, multiple master nodes should be deployed so that the necessary services remain available should a node go down. A failed control plane leaves the cluster unable to respond to any commands, react to cluster changes, or reschedule workloads.
Each worker node in the cluster runs the actual workload that is deployed on the cluster and should have the following components:
- kubelet - Responsible for driving container runtimes and starting workloads that are scheduled on the node. It also monitors pod statuses.
- kube-proxy - Handles networking between pods by load-balancing network traffic between application components
- a container runtime - The application that handles the containers that are running. Typically Docker, or rkt, or CRI-O.
Pods running on the failed node will automatically be rescheduled elsewhere by the control plane. A well designed cloud application that has multiple replicas should not be impacted by the temporary outage of a single pod.
Failure testing should be done to ensure that applications are not affected by node outages. Automatic resilience testing tools such as Netflix's Chaos Monkey can help by randomly killing nodes, Pods, or network connectivity between nodes.
Installing Kubernetes is easy and there are many options available to help you get it set up.
Keep in mind, however, the amount of time and resources it takes to maintain a Kubernetes cluster. There are lots of things that can go wrong with a Kubernetes setup, and maintaining such a system requires a significant amount of time and energy. Things to keep in mind for a self-hosted solution:
- HA control plane and worker nodes
- Cluster set up securely? Patched? Container defaults set appropriately?
- Services in the cluster secure?
- Conformant to CNCF standards?
- Node configuration managed, or has it drifted?
- Data backed up? Persistent storage restore/backups?
It might be a better solution to go with a managed Kubernetes service such as AWS EKS or Google GKE.
If you do wish to go with a self-hosted solution, there are a few Kubernetes installers:
- Kubernetes the hard way
- Rancher Kubernetes Engine
- Puppet kubernetes module
Kubespray uses Ansible to deploy a Kubernetes cluster.
Get Kubespray from https://github.com/kubernetes-sigs/kubespray
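The exact layout changes between Kubespray releases, but a minimal Ansible inventory for it might look something like this (hostnames, IPs, and group names are illustrative, not canonical):

```yaml
# Hypothetical inventory/mycluster/hosts.yaml for a three-node cluster.
all:
  hosts:
    node1:
      ansible_host: 192.168.1.11
    node2:
      ansible_host: 192.168.1.12
    node3:
      ansible_host: 192.168.1.13
  children:
    kube-master:        # control plane nodes
      hosts:
        node1:
    kube-node:          # worker nodes
      hosts:
        node2:
        node3:
    etcd:               # etcd members
      hosts:
        node1:
    k8s-cluster:
      children:
        kube-master:
        kube-node:
```

The cluster is then deployed by running Kubespray's cluster.yml playbook against this inventory with ansible-playbook.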
To get an application running as a service:
- Create a deployment which defines how many replicas should exist, the container to use, and exposed ports.
- This will create a Pod. Each additional replica creates an additional Pod (that ideally resides on a different node).
kubectl get deployments to see deployments
kubectl get pods to see pods
- Create a service which defines the name of the service and the port the service is exposed on
kubectl get service
See the Kubernetes documentation at https://kubernetes.io/docs/concepts/
- A Deployment defines the desired state for pods and ReplicaSets.
- A Pod is a collection of container(s) all residing on one node.
- A ReplicaSet creates or destroys pods so that the actual number of running pods matches the desired count. It is the newer replacement for the ReplicationController.
- A ReplicationController ensures that a specific number of pod replicas are running at any one time. It is a type of Controller.
- A Controller runs a reconciliation loop that ensures the desired state matches the actual cluster state
- A Service is an abstraction that defines the logical set of Pods. Think of it as a load balancer with a DNS name. As pods are added or removed, the service will update accordingly.
- An Ingress defines routes for HTTP requests coming in to the Kubernetes Ingress Controller. Specify a host and URI and its destination service.
- Labels are key/value pairs that Kubernetes attaches to any objects, such as pods, Replication Controllers, Endpoints, and so on.
- Annotations are key/value pairs used to store arbitrary non-queryable metadata.
- Secrets hold sensitive information such as passwords, TLS certificates, OAuth tokens, and ssh keys.
- ConfigMaps are mechanisms used to inject containers with configuration data while keeping containers agnostic of Kubernetes.
- Kubernetes Master
- A master node is responsible for maintaining the state of the Kubernetes cluster. It typically runs a kube-apiserver, kube-scheduler, kube-controller-manager, and etcd.
- Kubernetes Worker
- A worker node is responsible for running the actual applications managed by Kubernetes. It typically runs kubelet, kube-proxy, a container engine (docker, rkt), and other plugins for networking (eg. flannel, via the CNI standard) or cloud integration
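As a sketch of the Ingress concept above (the apiVersion, host, and service names are illustrative; older clusters used the extensions/v1beta1 API instead):

```yaml
# Illustrative Ingress: routes http://guestbook.example.com/ to a
# Service named 'frontend' on port 80. Host and names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-ingress
spec:
  rules:
  - host: guestbook.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
```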
Each concept will be covered in more detail below.
A pod is:
- A collection of application containers
- Guaranteed to be deployed on the same Kubernetes cluster node
- Shares the same cgroup, IP address, and hostname (hence, all containers in a pod run in the same execution environment)
A pod should provide one individual component of an application that can:
- Be scaled independently of all other components in the application (eg. Database with respect to the frontend web server)
- Work even if placed (ie. orchestrated) on a different machine
In general, the right question to ask yourself when designing Pods is, “Will these containers work correctly if they land on different machines?” If the answer is “no,” a Pod is the correct grouping for the containers. If the answer is “yes,” multiple Pods is probably the correct solution. In the example at the beginning of this chapter, the two containers interact via a local filesystem. It would be impossible for them to operate correctly if the containers were scheduled on different machines. (Thinking with Pods, Kubernetes: Up and Running)
Pods are assigned a unique Pod IP address within the cluster. All containers inside a pod can reference each other via localhost. Containers outside a pod can only reach containers in other pods using the Pod IP address or via a Service.
Like all other Kubernetes resources, Pods can be created using a YAML or JSON descriptor. When a Pod descriptor is applied via the
kubectl command, the object is stored in the cluster's
etcd server and is taken into effect immediately by the Kubernetes scheduler. Alternatively, a Pod can be created by invoking
kubectl run (eg.
kubectl run kuard --image=registry/something/something:tag), but this method of creating pods is not recommended: imperative commands leave no manifest behind to version-control, review, or reapply, unlike declarative configuration files.
A pod definition can be in either YAML or JSON, typically YAML, and should contain the following sections:
- apiVersion: The version of the Kubernetes API the object belongs to (for Pods, 'v1')
- kind: Specifies the kind of Kubernetes resource this manifest defines. In this case, a 'Pod'
- metadata: Helpful data to uniquely identify the object. Eg. the Pod's name, UID, namespace
- spec: Object data; for Pods, it should contain an array of containers (including container image, name, ports, environment variables, resources, etc).
For more information on the spec, see https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md.
```yaml
apiVersion: v1            # Pods live in the core 'v1' API group
kind: Pod
metadata:
  name: kuard
spec:
  containers:
  - image: gcr.io/kuar-demo/kuard-amd64:1
    name: kuard
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    env:
    - name: EXAMPLE_VAR   # hypothetical variable name
      value: something
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
```
kubectl explain pods to see possible API object fields. Drill deeper with
kubectl explain pod.spec.
- Create using
kubectl apply -f pod-descriptor.yml
- See it using
kubectl get pods
- Detailed information about it using
kubectl describe pods pod-name
- Delete it using
kubectl delete pods/pod-name or
kubectl delete -f pod-descriptor.yml
Pods that are set for deletion stop receiving new requests. After a 30-second termination grace period, the pods are terminated. This extra time allows a pod to reliably finish active requests.
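The grace period can be tuned per pod via terminationGracePeriodSeconds; a minimal sketch (the pod name and image are placeholders, and 30 is the default value):

```yaml
# Pod spec fragment: terminationGracePeriodSeconds controls how long
# Kubernetes waits between sending SIGTERM and SIGKILL.
apiVersion: v1
kind: Pod
metadata:
  name: graceful-example       # hypothetical name
spec:
  terminationGracePeriodSeconds: 30
  containers:
  - name: main
    image: nginx
```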
Logs generated by all containers in the pod can be viewed with
kubectl logs <pod>. A specific container can be singled out using
kubectl logs <pod> -c <container name>. Container logs are rotated automatically daily or after the log file reaches 10MB.
A Controller ensures the cluster state matches the desired state using a reconciliation loop.
The following objects utilize a Controller to perform the necessary updates to maintain the desired state.
ReplicaSets are used if you want to scale your pods (which must be stateless and interchangeable) to a certain number. You define the number of pod replicas that should be running at a given time, which is then enforced by a controller. Typically, ReplicaSets are used by Deployments as a mechanism to orchestrate pod creation, deletion, and updates.
DaemonSets are used to run a Pod on every Kubernetes node in your cluster. As nodes are added or removed, Pods are created or destroyed with them. Some use cases requiring DaemonSets include storage services (glusterd, ceph), monitoring, log collection, etc., on each node.
By default, a DaemonSet deploys pods to all nodes in the cluster unless a nodeSelector property is set in the pod template. A DaemonSet deploys pods even on unschedulable nodes because it bypasses the Scheduler completely.
```yaml
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: ssd-monitor
spec:
  selector:
    matchLabels:
      app: ssd-monitor
  template:
    metadata:
      labels:
        app: ssd-monitor
    spec:
      nodeSelector:   # define your node selector here to exclude / include certain nodes
        disk: ssd
      containers:
      - name: main
        image: luksa/ssd-monitor
```
kubectl get daemonset or
kubectl get ds
StatefulSets manage Pods requiring a sticky identity. Typically, they are used for Pods that are not interchangeable and require persistent identifiers (ie. a stable pod name and DNS name such as database-server-0) and stable storage (ie. a pod always has the same PersistentVolume attached).
Pod creation by default (podManagementPolicy=OrderedReady) is ordered sequentially from 0 up to N-1; conversely, scaling down or termination happens in reverse order, from N-1 down to 0. Alternatively, pods can be set to start up in parallel (podManagementPolicy=Parallel). Deletion of a StatefulSet may occur in any order.
Note that the PersistentVolumes associated with the Pods' PersistentVolumeClaims are not deleted when the Pods or the StatefulSet are deleted. This must be done manually.
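A minimal StatefulSet sketch tying the above together (names, image, and storage size are hypothetical):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: database-server          # pods become database-server-0, -1, -2
spec:
  serviceName: database          # headless Service providing stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
      - name: db
        image: registry/some-database:tag   # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:          # each replica gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```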
There may be other Controllers including:
- Job Controller, which runs a Pod as a job.
A Deployment defines the desired state of Pods or ReplicaSets, which is enforced by the Deployment Controller. A deployment can be versioned so that Kubernetes can roll out a new version with the ability to pause or even roll back the changes at a later time.
The manifest's spec should contain a template holding the information used to create new Pods.
Example Deployment manifest:
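A plausible sketch of such a manifest, matching the guestbook frontend labels and replica count used below (the image is the one used by the upstream guestbook tutorial, but treat it as illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: guestbook
      tier: frontend
  template:                      # the pod template enforced by the controller
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google-samples/gb-frontend:v4   # illustrative image
        ports:
        - containerPort: 80
```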
- Create using
kubectl apply -f frontend-deployment.yml
- See it using
kubectl get deployments
- Pods that this deployment creates can be seen using the label selector.
- This example manifest has 2 labels applied: app, and tier
- See it using
kubectl get pods -l app=guestbook -l tier=frontend
You can change the scale of a deployment:
```
[root@kube guestbook]# kubectl get deployment frontend
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
frontend   3         3         3            3           44m
[root@kube guestbook]# kubectl scale deployment frontend --replicas=5
deployment.extensions/frontend scaled
[root@kube guestbook]# kubectl get deployment frontend
NAME       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
frontend   5         5         5            5           45m
[root@kube guestbook]# kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
frontend-654c699bc8-5ngzj   1/1     Running   0          45m
frontend-654c699bc8-77b58   1/1     Running   0          45m
frontend-654c699bc8-ll7cr   1/1     Running   0          22s
frontend-654c699bc8-pqzvp   1/1     Running   0          22s
frontend-654c699bc8-sq682   1/1     Running   0          45m
```
A Service provides an abstraction between the user and the underlying Pod or Pods providing the service. Since Pods are ephemeral, their IP addresses may change as they are created and destroyed, and individual pods may go down; a Service gives applications a stable name, along with load balancing and routing, to maintain the service's availability.
A Service by default provides a single IP address for the set of pods (known as a ClusterIP), which is only accessible within the cluster. This can be changed so that a Service provides a load-balanced external IP (LoadBalancer) or a port exposed on each node (NodePort).
An example manifest:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  # comment or delete the following line if you want to use a LoadBalancer
  type: NodePort
  # if your cluster supports it, uncomment the following to automatically create
  # an external load-balanced IP for the frontend service.
  # type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: guestbook
    tier: frontend
```
- Apply the Service using
kubectl apply -f frontend-service.yaml
- See Services using
kubectl get services
If you use a NodePort and the service looks like this:
```
[root@kube guestbook]# kubectl get service frontend
NAME       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
frontend   NodePort   10.97.195.242   <none>        80:32764/TCP   10m
```
You can get to your service by accessing the Node on port 32764.
A namespace splits complex systems with many components into smaller, distinct groups. Namespaces are also used to separate resources in a multi-tenant environment, or to split resources between production, development, and QA environments.
Resource names must be unique to each namespace. Certain cluster-level resources are not namespaced, such as the Node resource.
Namespaces are "hidden" from each other, but they are not fully isolated by default (depending on which network solution is deployed in the cluster). A service in one Namespace can talk to a service in another Namespace using the DNS name
<Service Name>.<Namespace Name>.svc.cluster.local. Since the default search domain is
svc.cluster.local, a lookup to
<Service Name>.<Namespace Name> would resolve as well.
kubectl get namespaces
Namespace names must contain only letters, digits, and dashes. To create a namespace:
kubectl create namespace xyz
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: xyz
```
Objects cannot see other objects in different namespaces (this applies even to the default namespace). Secrets in a different namespace will not be visible or accessible.
Kubernetes users can be assigned permissions to specific namespaces which can be used to limit a user's access on a shared multi-tenant cluster.
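A hedged sketch of such a per-namespace grant using RBAC (the namespace 'dev' and user 'alice' are hypothetical):

```yaml
# Role: allows reading pods in the 'dev' namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# RoleBinding: grants the Role above to the hypothetical user 'alice'.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: read-pods
subjects:
- kind: User
  name: alice
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```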
Contexts are like profiles. A context can have a different default namespace or different user credentials, which can be used to manage different clusters.
Change the current context using
kubectl config use-context my-context
Contexts are stored in ~/.kube/config. This file contains credentials to authenticate to the cluster, as well as the default namespace and context values.
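A trimmed sketch of what this file looks like (cluster, user, and context names are illustrative; credential fields are elided):

```yaml
apiVersion: v1
kind: Config
current-context: my-context
clusters:
- name: my-cluster
  cluster:
    server: https://kube.example.com:6443   # hypothetical API endpoint
users:
- name: my-user
  user: {}                # certificates/tokens elided
contexts:
- name: my-context
  context:
    cluster: my-cluster
    user: my-user
    namespace: default    # default namespace for this context
```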
The Kubernetes API is a RESTful API, providing access to the Kubernetes backend.
Objects in the Kubernetes API are represented as JSON or YAML files. Files can be used to create, update, or delete objects from the server.
# kubectl apply -f obj.yaml
# kubectl edit <resource-name> <object-name>
# kubectl delete -f obj.yaml
## or
# kubectl delete <resource-name> <object-name>
All objects can be annotated or given a label.
|Label pod 'bar'|
# kubectl label pods bar color=red ## pass --overwrite if it already exists.
# kubectl label pods bar color- ## the trailing dash removes the 'color' label
Kubernetes Master Node
A Kubernetes master node runs containers that provide the API server, scheduler, and the other components that manage the cluster.
The master node should have the following components:
controller-manager: Responsible for running controllers that regulate behavior in the cluster. Eg. ensuring replicas for a service are available and healthy.
scheduler: Places pods into different nodes in the cluster
etcd: storage for cluster; stores API objects.
All components deployed by Kubernetes run under the kube-system namespace.
```
# kubectl describe nodes kube
...
Non-terminated Pods:          (8 in total)
  Namespace    Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                           ------------  ----------  ---------------  -------------
  kube-system  coredns-576cbf47c7-6mphw       100m (2%)     0 (0%)      70Mi (0%)        170Mi (2%)
  kube-system  coredns-576cbf47c7-75n6g       100m (2%)     0 (0%)      70Mi (0%)        170Mi (2%)
  kube-system  etcd-kube                      0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-apiserver-kube            250m (6%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-controller-manager-kube   200m (5%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-proxy-cmdsn               0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system  kube-scheduler-kube            100m (2%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  weave-net-swwgs                20m (0%)      0 (0%)      0 (0%)           0 (0%)
...
```
The Kubernetes proxy (kube-proxy) is responsible for routing network traffic to services in the Kubernetes cluster. It implements the load balancing for the Service abstraction by distributing traffic across a service's pod endpoints. A proxy exists on every node.
```
# kubectl get daemonsets --namespace=kube-system
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-proxy   1         1         1       1            1           <none>          26h
weave-net    1         1         1       1            1           <none>          28m
```
(A DaemonSet runs a copy of a Pod on every node in the cluster; see the DaemonSet section above.)
Kubernetes also runs a DNS server that provides naming and discovery for services in the cluster.
```
# kubectl get deployments --namespace=kube-system
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
coredns   2         2         2            2           26h
# kubectl get services --namespace=kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP   26h
```
The DNS service for the cluster runs on 10.96.0.10. If you log into a container in the cluster, this server will be used as the primary DNS server.
Kubernetes Dashboard UI can be installed. Like the DNS service, it is both a deployment and a service:
```
# kubectl get deployments --namespace=kube-system kubernetes-dashboard
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kubernetes-dashboard   1         1         1            1           2m26s
[root@kube ~]# kubectl get services --namespace=kube-system kubernetes-dashboard
NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
kubernetes-dashboard   ClusterIP   10.108.158.128   <none>        443/TCP   2m36s
```
kubectl proxy to proxy the server on
localhost:8001 and then access it in a web browser at
http://localhost:8001/ui. If this is on a remote server, create an SSH tunnel first (eg. ssh -L 8001:localhost:8001 user@remote-server).
Also known as a Worker or Minion node; worker nodes run a container runtime such as Docker.
The Kubelet is a daemon on each node that will start/stop/maintain application containers as directed by the Kubernetes Master (the orchestrator/scheduler/control plane).
The scheduler decides where pods may run by checking a node's taints, and will not schedule pods on nodes that carry taints like
node-role.kubernetes.io/master:NoSchedule. Attempting to do so will result in a
FailedScheduling status and a message of
0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. when looking at pod events using
kubectl describe pods pod-name.
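A pod can opt in to running on tainted nodes with a matching toleration; a minimal sketch (pod name and image are placeholders):

```yaml
# Pod spec fragment: tolerates the master NoSchedule taint, allowing
# the scheduler to place this pod on master nodes.
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod      # hypothetical name
spec:
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: main
    image: nginx
```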
The kubectl command line tool is the official Kubernetes client for interacting with the Kubernetes API.
|Get all nodes|
# kubectl get nodes
|Get all pods|
# kubectl get pods --all-namespaces
|Get information about a node|
# kubectl describe nodes
|See components in the cluster|
# kubectl get componentstatuses
For kubectl get, pass
--no-headers to remove headers for easier parsing, and
-o json|yaml to format output in JSON/YAML.
|Create a pod|
# kubectl run kuard --image=gcr.io/kuar-demo/kuard-amd64:1
# kubectl apply -f kuard-pod.yaml
# kubectl get pods
## Filter by label with -l; can supply multiple of these.
# kubectl get pods -l label=something
# kubectl delete deployments/kuard
# kubectl delete -f kuard-pod.yaml
# kubectl describe pods kuard
# kubectl logs kuard
|Enter a container|
# kubectl exec kuard -- cmd
# kubectl exec -it kuard -- sh
|Copy to/from container|
# kubectl cp podname:/src ./dst
# kubectl cp ./src podname:/dst
A pod manifest looks something like this:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kuard
spec:
  containers:
  - image: gcr.io/kuar-demo/kuard-amd64:1
    name: kuard
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
```
- Command cheat sheet
- Kubernetes: Up & Running
- Couchbase failover demo: https://blog.couchbase.com/databases-on-kubernetes/
- A Kubernetes Guide
- https://kubernetes.io/docs/tutorials/stateless-application/guestbook/ Simple overview on deploying a guestbook application
- MySQL on Kubernetes: https://www.youtube.com/watch?v=J7h0F34iBx0 (not exactly beginner friendly; goes over the MySQLOperator and Vitess)
mirantis.com's blog has some good information.