Let’s talk about Kubernetes and its use cases

Durvesh Palkar
12 min read · Mar 12, 2021

In this blog, I will provide an introduction to Kubernetes so that you can understand the motivation behind the tool, what it is, and how you can use it. We’ll also study some use cases of Kubernetes, which will help us understand how Kubernetes is solving industry challenges.

You’ll be able to understand this blog way better if you’re familiar with containers. Read my previous blog, in which I’ve talked about Containers.

✒ WHAT IS KUBERNETES?

Kubernetes is a powerful open-source system, initially developed by Google, for managing containerized applications in a clustered environment. It aims to provide better ways of managing related, distributed components and services across varied infrastructure.

You will also see Kubernetes referred to by its numeronym: k8s. It means the same thing, just easier to type.

K8S, at its basic level, is a system for running and coordinating containerized applications across a cluster of machines. It is a platform designed to completely manage the life cycle of containerized applications and services using methods that provide predictability, scalability, and high availability.

As a Kubernetes user, you can define how your applications should run and the ways they should be able to interact with other applications or the outside world. You can scale your services up or down, perform graceful rolling updates, and switch traffic between different versions of your applications to test features or rollback problematic deployments. Kubernetes provides interfaces and composable platform primitives that allow you to define and manage your applications with high degrees of flexibility, power, and reliability.
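To make this concrete, here is a minimal sketch of the kind of declarative definition Kubernetes works from. Everything in it is illustrative (the name web, the nginx:1.25 image, the replica count), not taken from any particular setup: you declare the desired number of replicas and a rolling-update strategy, and Kubernetes works to keep the cluster in that state.

```yaml
# Hypothetical Deployment manifest; all names and values are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                 # desired number of pod replicas; change this to scale up or down
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate       # replace pods gradually instead of all at once
    rollingUpdate:
      maxUnavailable: 1       # at most one replica may be down during an update
      maxSurge: 1             # at most one extra replica may be created during an update
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25   # updating this image tag triggers a graceful rolling update
          ports:
            - containerPort: 80
```

Applying an edited copy of this file (for example, with a new image tag) is what a rolling update looks like in practice; rolling back is essentially applying the previous version again.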

✒ Kubernetes Components and Architecture

Kubernetes follows a client-server architecture. It’s possible to have a multi-master setup (for high availability), but by default there is a single master server which acts as the controlling node and point of contact. The master server consists of various components including the kube-apiserver, etcd storage, the kube-controller-manager, the cloud-controller-manager, the kube-scheduler, and a DNS server for Kubernetes services. Node components include the kubelet and kube-proxy, running on top of a container runtime such as Docker.

Master Components

Below are the main components found on the master node:

  • etcd cluster: a simple, distributed key-value store used to hold the Kubernetes cluster data (such as the number of pods, their state, namespaces, etc.), API objects, and service discovery details. For security reasons, it is accessible only through the API server. etcd notifies the cluster about configuration changes with the help of watchers: notifications are API requests on each etcd cluster node that trigger the update of information in the node’s storage.
  • kube-apiserver: the Kubernetes API server is the central management entity that receives all REST requests for modifications (to pods, services, replica sets/controllers and others), serving as the frontend to the cluster. It is also the only component that communicates with the etcd cluster, making sure data is stored in etcd and agrees with the service details of the deployed pods.
  • kube-controller-manager: runs a number of distinct controller processes in the background (for example, the replication controller maintains the desired number of pod replicas, the endpoints controller populates endpoint objects, joining services and pods, and so on) to regulate the shared state of the cluster and perform routine tasks. When a change in a service configuration occurs (for example, replacing the image the pods run from, or changing parameters in the configuration YAML file), the controller spots the change and starts working towards the new desired state.
  • cloud-controller-manager: responsible for managing controller processes that depend on the underlying cloud provider (if applicable). For example, when a controller needs to check whether a node was terminated, or to set up routes, load balancers, or volumes in the cloud infrastructure, all of that is handled by the cloud-controller-manager.
  • kube-scheduler: schedules pods (a co-located group of containers in which our application processes run) onto the various nodes based on resource utilization. It reads the workload’s operational requirements and schedules it on the best-fit node. For example, if the application needs 1 GB of memory and 2 CPU cores, its pods will be scheduled on a node with at least that much free capacity (see the pod-spec sketch after this list). The scheduler runs each time pods need to be scheduled, and it must know the total resources available as well as the resources already allocated to existing workloads on each node.
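As a rough illustration of the 1 GB / 2 core example above, this is what such a request might look like in a pod spec. The pod name and image are made up; the point is that the scheduler will only place this pod on a node that still has at least 1Gi of memory and 2 CPU cores unallocated.

```yaml
# Hypothetical pod spec; the scheduler uses the requests below to pick a suitable node.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: app
      image: example.com/demo-app:latest   # illustrative image reference
      resources:
        requests:
          memory: "1Gi"   # the scheduler reserves this much memory on the chosen node
          cpu: "2"        # and this many CPU cores
        limits:
          memory: "2Gi"   # optional upper bounds, enforced at runtime rather than by the scheduler
          cpu: "2"
```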

Node (worker) components

Below are the main components found on a (worker) node:

  • kubelet: the main service on a node; it regularly takes in new or modified pod specifications (primarily through the kube-apiserver) and ensures that the pods and their containers are healthy and running in the desired state. This component also reports to the master on the health of the host where it is running.
  • kube-proxy: a proxy service that runs on each worker node to deal with individual host subnetting and to expose services to the outside world. It forwards requests to the correct pods/containers across the various isolated networks in a cluster (a sketch of a Service exposed this way follows this list).
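As a hedged sketch of what kube-proxy actually forwards traffic for, here is a Service that exposes pods labelled app: web on a port of every node. The names and port numbers are illustrative; kube-proxy on each worker node programs the forwarding rules so that traffic arriving at that node port ends up at one of the matching pods.

```yaml
# Illustrative Service; kube-proxy on every node forwards its traffic to the matching pods.
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort          # expose the service on a static port of each node
  selector:
    app: web              # forward to pods carrying this label
  ports:
    - port: 80            # port clients inside the cluster use
      targetPort: 80      # container port on the pods
      nodePort: 30080     # port opened on every node for external traffic (30000-32767 range)
```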

Kubectl

kubectl is a command-line tool that interacts with the kube-apiserver and sends commands to the master node. Each command is converted into an API call.

Kubernetes Concepts

Making use of Kubernetes requires understanding the different abstractions it uses to represent the state of the system, such as services, pods, volumes, namespaces, and deployments.

  • Pod: generally refers to one or more containers that should be controlled as a single application. A pod encapsulates application containers, storage resources, a unique network IP, and other configuration describing how to run the containers.
  • Service: pods are volatile; that is, Kubernetes does not guarantee that a given physical pod will be kept alive (for instance, the replication controller might kill and start a new set of pods). Instead, a service represents a logical set of pods and acts as a gateway, allowing (client) pods to send requests to the service without needing to keep track of which physical pods actually make up the service.
  • Volume: similar to a container volume in Docker, but a Kubernetes volume applies to a whole pod and is mounted in all containers of the pod. Kubernetes guarantees data is preserved across container restarts; the volume is removed only when the pod gets destroyed. A pod can also have multiple volumes (possibly of different types) associated with it.
  • Namespace: a virtual cluster (a single physical cluster can run multiple virtual ones) intended for environments with many users spread across multiple teams or projects, for isolation of concerns. Resource names inside a namespace must be unique, and resources in one namespace cannot access resources in a different namespace. A namespace can also be allocated a resource quota to prevent it from consuming more than its share of the physical cluster’s overall resources (a combined sketch covering volumes, namespaces, and quotas follows this list).
  • Deployment: describes the desired state of a pod or a replica set in a YAML file. The deployment controller then gradually updates the environment (for example, creating or deleting replicas) until the current state matches the desired state specified in the deployment file. For example, if the YAML file defines 2 replicas for a pod but only one is currently running, an extra one will get created. Note that replicas managed via a deployment should not be manipulated directly, only via new deployments.
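To tie a few of these concepts together, below is a hedged sketch (all names, quota values, and images are made up) of a namespace with a resource quota, plus a pod in that namespace whose two containers share a single emptyDir volume.

```yaml
# Illustrative namespace with a quota limiting how much of the cluster it may consume.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "4"        # total CPU the namespace may request
    requests.memory: 8Gi     # total memory the namespace may request
    pods: "10"               # maximum number of pods in the namespace
---
# A pod in that namespace whose two containers share one emptyDir volume.
# Because the quota constrains cpu/memory requests, pods here must declare requests.
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo
  namespace: team-a
spec:
  volumes:
    - name: scratch
      emptyDir: {}           # lives as long as the pod; survives container restarts
  containers:
    - name: writer
      image: busybox:1.36
      command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
      resources:
        requests:
          cpu: "100m"
          memory: 64Mi
      volumeMounts:
        - name: scratch
          mountPath: /data
    - name: reader
      image: busybox:1.36
      command: ["sh", "-c", "touch /data/out.log && tail -f /data/out.log"]
      resources:
        requests:
          cpu: "100m"
          memory: 64Mi
      volumeMounts:
        - name: scratch
          mountPath: /data
```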

✒ KUBERNETES CASE STUDIES

A. CASE STUDY: Spotify

Spotify: An Early Adopter of Containers, Spotify Is Migrating from Homegrown Orchestration to Kubernetes

Challenge

Launched in 2008, the audio-streaming platform has grown to over 200 million monthly active users across the world. “Our goal is to empower creators and enable a really immersive listening experience for all of the consumers that we have today — and hopefully the consumers we’ll have in the future,” says Jai Chakrabarti, Director of Engineering, Infrastructure and Operations. An early adopter of microservices and Docker, Spotify had containerized microservices running across its fleet of VMs with a homegrown container orchestration system called Helios. By late 2017, it became clear that “having a small team working on the features was just not as efficient as adopting something that was supported by a much bigger community,” he says.

Solution

“We saw the amazing community that had grown up around Kubernetes, and we wanted to be part of that,” says Chakrabarti. Kubernetes was more feature-rich than Helios. Plus, “we wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools.” At the same time, the team wanted to contribute its expertise and influence in the flourishing Kubernetes community. The migration, which would happen in parallel with Helios running, could go smoothly because “Kubernetes fit very nicely as a complement and now as a replacement to Helios,” says Chakrabarti.

Impact

The team spent much of 2018 addressing the core technology issues required for a migration, which started late that year and is a big focus for 2019. “A small percentage of our fleet has been migrated to Kubernetes, and some of the things that we’ve heard from our internal teams are that they have less of a need to focus on manual capacity provisioning and more time to focus on delivering features for Spotify,” says Chakrabarti. The biggest service currently running on Kubernetes takes about 10 million requests per second as an aggregate service and benefits greatly from autoscaling, says Site Reliability Engineer James Wen. Plus, he adds, “Before, teams would have to wait for an hour to create a new service and get an operational host to run it in production, but with Kubernetes, they can do that on the order of seconds and minutes.” In addition, with Kubernetes’s bin-packing and multi-tenancy capabilities, CPU utilization has improved on average two- to threefold.

An early adopter of microservices and Docker, Spotify had containerized microservices running across its fleet of VMs since 2014. The company used an open source, homegrown container orchestration system called Helios, and in 2016–17 completed a migration from on premise data centers to Google Cloud. Underpinning these decisions, “We have a culture around autonomous teams, over 200 autonomous engineering squads who are working on different pieces of the pie, and they need to be able to iterate quickly,” Chakrabarti says. “So for us to have developer velocity tools that allow squads to move quickly is really important.”

But by late 2017, it became clear that “having a small team working on the Helios features was just not as efficient as adopting something that was supported by a much bigger community,” says Chakrabarti. “We saw the amazing community that had grown up around Kubernetes, and we wanted to be part of that. We wanted to benefit from added velocity and reduced cost, and also align with the rest of the industry on best practices and tools.” At the same time, the team wanted to contribute its expertise and influence in the flourishing Kubernetes community.

Chakrabarti points out that for all four of the top-level metrics that Spotify looks at — lead time, deployment frequency, time to resolution, and operational load — “there is impact that Kubernetes is having.”

One success story that’s come out of the early days of Kubernetes is a tool called Slingshot that a Spotify team built on Kubernetes. “With a pull request, it creates a temporary staging environment that self destructs after 24 hours,” says Chakrabarti. “It’s all facilitated by Kubernetes, so that’s kind of an exciting example of how, once the technology is out there and ready to use, people start to build on top of it and craft their own solutions, even beyond what we might have envisioned as the initial purpose of it.”

B. CASE STUDY: OpenAI

Challenge

An artificial intelligence research lab, OpenAI needed infrastructure for deep learning that would allow experiments to be run either in the cloud or in its own data center, and to easily scale. Portability, speed, and cost were the main drivers.

Solution

OpenAI began running Kubernetes on top of AWS in 2016, and in early 2017 migrated to Azure. OpenAI runs key experiments in fields including robotics and gaming both in Azure and in its own data centers, depending on which cluster has free capacity. “We use Kubernetes mainly as a batch scheduling system and rely on our autoscaler to dynamically scale up and down our cluster,” says Christopher Berner, Head of Infrastructure. “This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration.”

Impact

The company has benefited from greater portability: “Because Kubernetes provides a consistent API, we can move our research experiments very easily between clusters,” says Berner. Being able to use its own data centers when appropriate is “lowering costs and providing us access to hardware that we wouldn’t necessarily have access to in the cloud,” he adds. “As long as the utilization is high, the costs are much lower there.” Launching experiments also takes far less time: “One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days. In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work.”

With a mission to ensure powerful AI systems are safe, OpenAI cares deeply about open source — both benefiting from it and contributing safety technology into it. “The research that we do, we want to spread it as widely as possible so everyone can benefit,” says OpenAI’s Head of Infrastructure Christopher Berner. The lab’s philosophy — as well as its particular needs — lent itself to embracing an open source, cloud native strategy for its deep learning infrastructure.

OpenAI started running Kubernetes on top of AWS in 2016, and a year later, migrated the Kubernetes clusters to Azure. “We probably use Kubernetes differently from a lot of people,” says Berner. “We use it for batch scheduling and as a workload manager for the cluster. It’s a way of coordinating a large number of containers that are all connected together. We rely on our autoscaler to dynamically scale up and down our cluster. This lets us significantly reduce costs for idle nodes, while still providing low latency and rapid iteration.”

In the past year, Berner has overseen the launch of several Kubernetes clusters in OpenAI’s own data centers. “We run them in a hybrid model where the control planes — the Kubernetes API servers, etcd and everything — are all in Azure, and then all of the Kubernetes nodes are in our own data center,” says Berner. “The cloud is really convenient for managing etcd and all of the masters, and having backups and spinning up new nodes if anything breaks. This model allows us to take advantage of lower costs and have the availability of more specialized hardware in our own data center.”

“Research teams can now take advantage of the frameworks we’ve built on top of Kubernetes, which make it easy to launch experiments, scale them by 10x or 50x, and take little effort to manage.” (Christopher Berner, Head of Infrastructure for OpenAI)

That path has been simplified by frameworks and tools that two of OpenAI’s teams have developed to handle interaction with Kubernetes. “You can just write some Python code, fill out a bit of configuration with exactly how many machines you need and which types, and then it will prepare all of those specifications and send it to the Kube cluster so that it gets launched there,” says Berner. “And it also provides a bit of extra monitoring and better tooling that’s designed specifically for these machine learning projects.”

The impact that Kubernetes has had at OpenAI is impressive. With Kubernetes, the frameworks and tooling, including the autoscaler, in place, launching experiments takes far less time. “One of our researchers who is working on a new distributed training system has been able to get his experiment running in two or three days,” says Berner. “In a week or two he scaled it out to hundreds of GPUs. Previously, that would have easily been a couple of months of work.”

Plus, the flexibility they now have to use their on-prem Kubernetes cluster when appropriate is “lowering costs and providing us access to hardware that we wouldn’t necessarily have access to in the cloud,” he says. “As long as the utilization is high, the costs are much lower in our data center. To an extent, you can also customize your hardware to exactly what you need.”

Summary

Kubernetes is an orchestration tool for managing distributed services or containerized applications across a cluster of nodes. It was designed to natively support (auto-)scaling, high availability, security, and portability. Kubernetes itself follows a client-server architecture, with a master node composed of the etcd cluster, kube-apiserver, kube-controller-manager, cloud-controller-manager, and kube-scheduler. Client (worker) nodes are composed of the kube-proxy and kubelet components. Core concepts in Kubernetes include pods (a group of containers deployed together), services (a logical group of pods with a stable IP address), and deployments (a definition of the desired state for a pod or replica set, acted upon by a controller if the current state differs from the desired state), among others.
