LFS258 Kubernetes Fundamentals


1.1. Before You Begin


Before you begin, please take a moment to familiarize yourself with the course
navigation:
 The navigation buttons at the bottom of the page will help you move forward or
backward within the course, one page at a time. You can also use the Right Arrow
keystroke to go to the next page, and the Left Arrow keystroke to go to the
previous page. For touchscreen devices, such as phones and tablets,
navigation through swiping is enabled as well.
 To exit the course, you can use the Exit button at the top-right of the page or the X
keystroke.
 The Home button at the top right of the page, or the H keystroke will take you to
the first page of the course.
 The drop-down menu (Table of Contents) at the bottom left helps you navigate to
any page in the course. It will always display the title of the current page. 
 The breadcrumbs at the top of the page indicate your location within the course
(chapter/page).
 Numerous resources are available throughout the course, and can be accessed by
clicking hyperlinks that will open Internet pages in a new window. 
 Where available, you can use the video player functionalities to start, pause, stop,
restart the video, control the volume, turn closed captions on or off, and control the
screen size of the video. Closed captions are enabled for video narrations only. 

1.3. Course Objectives


By the end of this course, you will learn the following:


 The history and evolution of Kubernetes.
 Its high-level architecture and components.
 The API, the most important resources that make up the API, and how to use them.
 How to deploy and manage an application.
 Some upcoming features that will boost your productivity.

1.4. Course Formatting


In order to make it easier to distinguish the various types of content in the course, we use
the color coding and formats below:
 Bold: names of programs or services (or used for emphasis)
 Light blue: designates hyperlinks
 Dark blue: text typed at the command line, and system output shown at the command line.
 

1.6. Course Timing


This course is entirely self-paced; there is no fixed schedule for going through the material.
You can go through the course at your own pace, and you'll always be returned to exactly
where you left off when you come back to start a new session. However, we still suggest
you avoid long breaks in between periods of work, as learning will be faster and content
retention improved.
You have unlimited access to this course for 12 months from the date you registered, even
after you have completed the course.
The chapters in the course have been designed to build on one another. It is probably best
to work through them in sequence; if you skip or only skim some chapters quickly, you
may find there are topics being discussed you have not been exposed to yet. But this is all
self-paced, and you can always go back, so you can thread your own path through the
material.

1.7.a. Exercises - Lab Environment


The lab exercises were written using Google Compute Engine (GCE) nodes. They have
been written to be vendor-agnostic, so they could run on AWS, local hardware, or inside of
virtual machines, to give you the most flexibility and options.
Each node has 3 vCPUs and 7.5G of memory, running Ubuntu 18.04. Smaller nodes
should work, but you should expect a slow response. Other operating system images
are also possible, but there may be a slight difference in some command outputs.
Using GCE requires setting up an account, and will incur expenses if using nodes of the
size suggested. The Getting Started pages can be viewed online.

Amazon Web Service (AWS) is another provider of cloud-based nodes, and requires an
account; you will incur expenses for nodes of the suggested size. You can find
videos and information about how to launch a Linux virtual machine on the AWS website. 

Local virtual machines, using hypervisors such as KVM, VirtualBox, or VMware, can also be
used for the lab systems. Putting the VMs on a private network can make troubleshooting easier. As of
Kubernetes v1.16.1, the minimum (as in barely works) size for VirtualBox is 3vCPU/4G
memory/5G minimal OS for master and 1vCPU/2G memory/5G minimal OS for worker
node.

Finally, using bare-metal nodes, with access to the Internet, will also work for the lab
exercises.

1.7.e. Exercises - Knowledge Check


At the end of each chapter, you will also find a series of knowledge check questions.
These questions, just like the labs, were designed with one main goal in mind: to help you
better comprehend the course content and reinforce what you have learned. It is important
to point out that the labs and knowledge check questions are not graded. We would
like to emphasize as well that you will not be required to take a final exam to complete
this course.

1.8. Course Resources


Resources for this course can be found online. Making updates to this course takes time.
Therefore, if there are any changes in between updates, you can always access course
updates, as well as the course resources online:
 Go to the LFS258 Course Resource webpage.
 The user ID is LFtraining and the password is Penguin2014. 

1.9. Class Forum Guidelines


One great way to interact with peers taking this course is via the Class Forum on
linux.com. This board can be used in the following ways:
 To introduce yourself to other peers taking this course.
 To discuss concepts, tools, and technologies presented in this course, or related to
the topics discussed in the course material.
 To ask questions about course content.
 To share resources and ideas related to Kubernetes and related technologies.
The Class Forum will be reviewed periodically by The Linux Foundation staff, but it is
primarily a community resource, not an 'ask the instructor' service.

1.10. Course Support



Note: If you are using the QuickStart course player engine, and are inactive for more than
30 minutes while signed-in, you will automatically be logged out.

 If you enrolled in the course using the Linux Foundation cart and are logged
out for inactivity, close the course player and the QuickStart tab/window, then log back
into your Linux Foundation portal and re-launch the course from there. Do not use the
login window presented by QuickStart, as you will not be able to log back in from that page.
 If you are using a corporate branded QuickStart portal, you can log back in using
the same URL and credentials that you normally use to access the course.
 We use a single sign-on service to launch the course once users are on their
'myportal' page. Do not attempt to change your password on the QuickStart course
player engine, as this will break your single sign-on. 

For any issues with your username or password, visit The Linux Foundation ID website.
For any course content-related questions, please use the course forum on Linux.com (see
the details in the Class Forum Guidelines section). 
If you need assistance beyond what is available above, please email us at:
[email protected] and provide your Linux Foundation ID username and a
description of your concern.
You can download a list of most frequently asked support questions by clicking on the
Document button below or by using the D keystroke.

1.11. Course Audience and Requirements


If you are a Linux administrator or software developer starting to work with containers and
wondering how to manage them in production, LFS258: Kubernetes Fundamentals is the
course for you. 
In this course, you will learn the key principles that will put you on the journey to managing
containerized applications in production. 
To make the most of this course, you will need the following:
 A good understanding of Linux.
 Familiarity with the command line.
 Familiarity with package managers.
 Familiarity with Git and GitHub.
 Access to a Linux server or Linux desktop/laptop.
 VirtualBox on your machine, or access to a public cloud.

1.12. Software Environment



The material produced by The Linux Foundation is distribution-flexible. This means that
technical explanations, labs and procedures should work on most modern distributions,
and we do not promote products sold by any specific vendor (although we may mention
them for specific scenarios).
In practice, most of our material is written with the three main Linux distribution families in
mind:
 Red Hat/Fedora
 OpenSUSE/SUSE
 Debian.
Distributions used by our students tend to be one of these three alternatives, or a product
that is derived from them.

1.13. Which Distribution to Choose?


You should ask yourself several questions when choosing a new distribution:
 Has your employer already standardized?
 Do you want to learn more?
 Do you want to certify?
While there are many reasons that may force you to focus on one Linux distribution versus
another, we encourage you to gain experience on all of them. You will quickly notice that
technical differences are mainly about package management systems, software versions
and file locations. Once you get a grasp of those differences, it becomes relatively painless
to switch from one Linux distribution to another.
Some tools and utilities have vendor-supplied front-ends, especially for more particular or
complex reporting. The steps included in the text may need to be modified to run on a
different platform.

1.14. Red Hat Family


Fedora is the community distribution that forms the basis of Red Hat Enterprise Linux,
CentOS, Scientific Linux and Oracle Linux. Fedora contains significantly more software
than Red Hat's enterprise version. One reason for this is that a diverse community is
involved in building Fedora; it is not just one company.
The Fedora community produces new versions every six months or so. For this reason, we
decided to standardize the Red Hat/Fedora part of the course material on the latest
version of CentOS 7, which provides much longer release cycles. Once installed, CentOS
is also virtually identical to Red Hat Enterprise Linux (RHEL), which is the most popular
Linux distribution in enterprise environments:

 Current material is based upon the latest release of Red Hat Enterprise Linux
(RHEL) - 7.x at the time of publication, and should work well with later versions
 Supports x86, x86-64, Itanium, PowerPC and IBM System Z
 RPM-based, uses yum (or dnf) to install and update
 Long release cycle; targets enterprise server environments
 Upstream for CentOS, Scientific Linux and Oracle Linux.

Note: CentOS is used for demos and labs because it is available at no cost.

1.15. SUSE Family


The relationship between OpenSUSE and SUSE Linux Enterprise Server is similar to the
one we just described between Fedora and Red Hat Enterprise Linux. In this case,
however, we decided to use OpenSUSE as the reference distribution for the OpenSUSE
family, due to the difficulty of obtaining a free version of SUSE Linux Enterprise Server.
The two products are extremely similar and material that covers OpenSUSE can typically
be applied to SUSE Linux Enterprise Server with no problem:

 Current material is based upon the latest release of OpenSUSE, and should work
well with later versions
 RPM-based, uses zypper to install and update
 YaST available for administration purposes
 x86 and x86-64
 Upstream for SUSE Linux Enterprise Server (SLES)
 Note: OpenSUSE is used for demos and labs because it is available at no cost.

1.16. Debian Family


The Debian distribution is the upstream for several other distributions, including Ubuntu,
Linux Mint, and others. Debian is a pure open source project, and focuses on a key
aspect: stability. It also provides the largest and most complete software repository to its
users.
Ubuntu aims at providing a good compromise between long term stability and ease of use.
Since Ubuntu gets most of its packages from Debian's unstable branch, Ubuntu also has
access to a very large software repository. For those reasons, we decided to use Ubuntu
as the reference Debian-based distribution for our lab exercises:

 Commonly used on both servers and desktops


 DPKG-based, uses apt-get and frontends for installing and updating
 Upstream for Ubuntu, Linux Mint and others
 Current material based upon the latest release of Ubuntu, and should work well
with later versions
 x86 and x86-64
 Long Term Support (LTS) releases available
 Note: Ubuntu is used for demos and labs because, like Debian, it is available at no cost,
but Ubuntu has the more relevant user base.

1.17. New Distribution Similarities


Current trends and changes to the distributions have reduced some of the differences
between the distributions.
 systemd (system startup and service management)
systemd is used by the most common distributions, replacing the SysVinit and
Upstart packages. It also replaces the service and chkconfig commands.
 journald (manages system logs)
journald is a systemd service that collects and stores logging data. It creates and
maintains structured, indexed journals based on logging information that is
received from a variety of sources. Depending on the distribution, text-based
system logs may be replaced.
 firewalld (firewall management daemon)
firewalld provides a dynamically managed firewall with support for network/firewall
zones to define the trust level of network connections or interfaces. It has support
for IPv4, IPv6 firewall settings and for Ethernet bridges. This replaces the iptables
configurations.
 ip (network display and configuration tool)
The ip program is part of the iproute2 package, and is designed to be a
replacement for the ifconfig command from the older net-tools package. The ip
command will show or manipulate routing, network devices, interfaces and tunnels.
Since these utilities are common across distributions, the course content and lab
information will use these utilities.
If your choice of distribution or release does not support these commands, please translate
accordingly.
The following documents may be of some assistance translating older commands to their
systemd counterparts:
 SysVinit Cheat Sheet
 Debian Cheat Sheet
 openSUSE Cheat Sheet

1.20. The Linux Foundation



The Linux Foundation partners with the world's leading developers and companies to solve
the hardest technology problems and accelerate open technology development and
commercial adoption. The Linux Foundation makes it its mission to provide experience
and expertise to any initiative working to solve complex problems through open source
collaboration, providing the tools to scale open source projects: security best practices,
governance, operations and ecosystem development, training and certification, licensing,
and promotion.
Linux is the world's largest and most pervasive open source software project in history.
The Linux Foundation is home to the Linux creator Linus Torvalds and lead maintainer
Greg Kroah-Hartman, and provides a neutral home where Linux kernel development can
be protected and accelerated for years to come. The success of Linux has catalyzed
growth in the open source community, demonstrating the commercial efficacy of open
source and inspiring countless new projects across all industries and levels of the
technology stack.
The Linux Foundation is creating the largest shared technology investment in history. The
Linux Foundation is the umbrella for many critical open source projects that power
corporations today, spanning all industry sectors:
 Big data and analytics: ODPi, R Consortium
 Networking: OpenDaylight, OPNFV
 Embedded: Dronecode, Zephyr
 Web tools: JS Foundation, Node.js
 Cloud computing: Cloud Foundry, Cloud Native Computing Foundation, Open
Container Initiative
 Automotive: Automotive Grade Linux
 Security: The Core Infrastructure Initiative
 Blockchain: Hyperledger
 And many more.

1.21. The Linux Foundation Events


The Linux Foundation produces technical events around the world. Whether it is to provide
an open forum for development of the next kernel release, to bring together developers to
solve problems in a real-time environment, to host work groups and community groups for
active discussions, to connect end users and kernel developers in order to grow Linux and
open source software use in the enterprise or to encourage collaboration among the entire
community, we know that our conferences provide an atmosphere that is unmatched in
their ability to further the platform. The Linux Foundation hosts an increasing number of
events each year, including:
 Open Source Summit North America, Europe, Japan, and China
 MesosCon North America, Europe, and China
 Embedded Linux Conference/OpenIoT Summit North America and Europe
 Open Source Leadership Summit
 Automotive Linux Summit
 Apache: Big Data North America & ApacheCon
 KVM Forum
 Linux Storage Filesystem and Memory Management Summit
 Vault
 Open Networking Summit.

1.22. Training Venues


The Linux Foundation's training is for the community, by the community, and
features instructors and content straight from the leaders of the Linux developer
community.
The Linux Foundation offers several types of training:
 Classroom
 Online
 On-site
 Events-based.
Attendees receive Linux and open source software training that is distribution-flexible,
technically advanced and created with the actual leaders of the Linux and open source
software development community themselves. The Linux Foundation courses give
attendees the broad, foundational knowledge and networking needed to thrive in their
careers today. With either online or in-person training, The Linux Foundation classes can
keep you or your developers ahead of the curve on open source essentials.

1.23. The Linux Foundation Training Offerings


Our current course offerings include:


 Linux Programming & Development Training
 Enterprise IT & Linux System Administration Courses
 Open Source Compliance Courses.
To get more information about specific courses offered by The Linux Foundation, including
technical requirements and other logistics, visit The Linux Foundation training website.

1.24. The Linux Foundation Certifications


The Linux Foundation certifications give you a way to differentiate yourself in a job market
that's hungry for your skills. We've taken a new, innovative approach to open source
certification that allows you to showcase your skills in a way that other peers will respect
and employers will trust:
 You can take your certification exam from any computer, anywhere, at any time
 The certification exams are performance-based
 The exams are distribution-flexible
 The exams are up-to-date, testing knowledge and skills that actually matter in today's IT
environment.
The Linux Foundation and its collaborative projects currently offer several different
certifications:
 Linux Foundation Certified Sysadmin (LFCS)
 Linux Foundation Certified Engineer (LFCE)
 Certified Kubernetes Administrator (CKA)
 Certified Kubernetes Application Developer (CKAD)
 Cloud Foundry Certified Developer (CFCD)
 Certified Hyperledger Sawtooth Administrator (CHSA)
 Certified Hyperledger Fabric Administrator (CHFA).

1.25. Training/Certification Firewall


The Linux Foundation has two separate training divisions: Course Delivery and
Certification. These two divisions are separated by a firewall.
The curriculum development and maintenance division of The Linux Foundation Training
department has no direct role in developing, administering, or grading certification exams.
Enforcing this self-imposed firewall ensures that independent organizations and
companies can develop third party training material, geared towards helping test takers
pass their certification exams.
Furthermore, it ensures that there are no secret "tips" (or secrets in general) that one
needs to be familiar with in order to succeed.
It also permits The Linux Foundation to develop a very robust set of courses that do far
more than teach the test, but rather equip attendees with a broad knowledge of the many
areas they may be required to master to have a successful career in open source system
administration.

1.26. Open Source Guides for the Enterprise


The Linux Foundation in partnership with the TODO Group developed a set of guides
leveraging best practices for:
 Running an open source program office, or
 Starting an open source project in an existing organization.
The Open Source Guides For the Enterprise are available for free online.

1.28. Copyright


Copyright 2017-2020, The Linux Foundation. All rights reserved.


The training materials provided or developed by The Linux Foundation in connection with
the training services are protected by copyright and other intellectual property rights.
Open source code incorporated herein may have other copyright holders and is used
pursuant to the applicable open source license.
Although third-party application software packages may be referenced herein, this is for
demonstration purposes only and shall not constitute an endorsement of any of these
software applications.
All The Linux Foundation training, including all the material provided herein, is supplied
without any guarantees from The Linux Foundation. The Linux Foundation assumes no
liability for damages or legal action arising from the use or misuse of contents or details
contained herein.
Linux is a registered trademark of Linus Torvalds. Other trademarks within this course
material are the property of their respective owners.
If you believe The Linux Foundation materials are being used, copied, or otherwise
improperly distributed, please email [email protected] or call +1-415-723-9709
(USA).

2.3. Learning Objectives


By the end of this chapter, you should be able to:


 Discuss Kubernetes.
 Learn the basic Kubernetes terminology.
 Discuss the configuration tools.
 Learn what community resources are available. 

2.4. What Is Kubernetes?


Running a container on a laptop is relatively simple. But connecting containers across
multiple hosts, scaling them, deploying applications without downtime, and service
discovery, among other aspects, can be difficult.
Kubernetes addresses those challenges from the start with a set of primitives and a
powerful open and extensible API. The ability to add new objects and controllers allows
easy customization for various production needs.
According to the kubernetes.io website, Kubernetes is: 
"an open-source system for automating deployment, scaling, and management of
containerized applications".
A key aspect of Kubernetes is that it builds on 15 years of experience at Google in a
project called Borg.
Google's infrastructure started reaching high scale before virtual machines became
pervasive in the datacenter, and containers provided a fine-grained solution for packing
clusters efficiently. Efficiency in using clusters and managing distributed applications has
been at the core of Google challenges.
In Greek, κυβερνήτης means the Helmsman, or pilot of the ship. Keeping with the
maritime theme of Docker containers, Kubernetes is the pilot of a ship of containers.
Due to the difficulty in pronouncing the name, many will use a nickname, K8s, as
Kubernetes has eight letters. The nickname is said like Kate's.

2.5. Components of Kubernetes


Deploying containers and using Kubernetes may require a change in the development and
the system administration approach to deploying applications. In a traditional environment,
an application (such as a web server) would be a monolithic application placed on a
dedicated server. As the web traffic increases, the application would be tuned, and
perhaps moved to bigger and bigger hardware. After a couple of years, a lot of
customization may have been done in order to meet the current web traffic needs.
Instead of using a large server, Kubernetes approaches the same issue by deploying a
large number of small web servers, or microservices. The server and client sides of the
application expect that there are many possible agents available to respond to a request. It
is also important that clients expect the server processes to die and be replaced, leading
to a transient server deployment. Instead of a large Apache web server with many httpd
daemons responding to page requests, there would be many nginx servers, each
responding.
The transient nature of smaller services also allows for decoupling. Each aspect of the
traditional application is replaced with a dedicated, but transient, microservice or agent. To
join these agents, or their replacements together, we use services and API calls. A service
ties traffic from one agent to another (for example, a frontend web server to a backend
database) and handles new IP or other information, should either one die and be replaced.
Communication to, and internally between, components is API call-driven, which
allows for flexibility. Configuration information is stored in a JSON format, but is most often
written in YAML. Kubernetes agents convert the YAML to JSON prior to persistence to the
database.
Kubernetes is written in Go, a portable language which is something of a hybrid of
C++, Python, and Java. Some claim it incorporates the best (while some claim
the worst) parts of each.
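
Since configuration is written in YAML but stored as JSON, it can be instructive to view
the same object in both formats. A minimal sketch, assuming a working cluster and an
existing Pod named nginx (a hypothetical name):
$ kubectl get pod nginx -o yaml
$ kubectl get pod nginx -o json
The -o flag only changes the output format; the object returned by the kube-apiserver is
the same.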

2.6. Challenges


Containers have seen a huge rejuvenation in the past few years. They provide a great way
to package, ship, and run applications - that is the Docker motto.
The developer experience has been boosted tremendously thanks to containers.
Containers, and Docker specifically, have empowered developers with ease of building
container images, simplicity of sharing images via Docker registries, and providing a
powerful user experience to manage containers.
However, managing containers at scale and architecting a distributed application based on
microservices' principles is still challenging.
You first need a continuous integration pipeline to build your container images, test them,
and verify them. Then, you need a cluster of machines acting as your base infrastructure
on which to run your containers. You also need a system to launch your containers, and
watch over them when things fail and self-heal. You must be able to perform rolling
updates and rollbacks, and eventually tear down the resource when no longer needed.
All of these actions require flexible, scalable, and easy-to-use network and storage. As
containers are launched on any worker node, the network must join the resource to other
containers, while still keeping the traffic secure from others. We also need a storage
structure which provides and keeps or recycles storage in a seamless manner.
One of the biggest challenges to adoption is the applications themselves, inside the
container. They need to be written, or re-written, to be truly transient. If you were to deploy
Chaos Monkey, which would randomly terminate containers, would your customers notice?

2.7. Other Solutions


Built on open source and easily extensible, Kubernetes is definitely a solution to manage
containerized applications.
There are other solutions as well, including:

 Docker Swarm is the Docker Inc. solution. It has been re-architected recently and
is based on SwarmKit. It is embedded with the Docker Engine.
 Apache Mesos is a data center scheduler, which can run containers through the
use of frameworks. Marathon is the framework that lets you orchestrate containers.
 Nomad from HashiCorp, the makers of Vagrant and Consul, is another solution for
managing containerized applications. Nomad schedules tasks defined in Jobs. It
has a Docker driver which lets you define a running container as a task.
 Rancher is a container orchestrator-agnostic system, which provides a single pane
of glass interface for managing applications. It supports Mesos, Swarm, and
Kubernetes.

2.8.a. The Borg Heritage


What primarily distinguishes Kubernetes from other systems is its heritage. Kubernetes is
inspired by Borg - the internal system used by Google to manage its applications (e.g.
Gmail, Apps, GCE).
Google poured the valuable lessons it learned from writing and operating Borg for over
15 years into Kubernetes, which makes Kubernetes a safe choice when deciding what
system to use to manage containers. While a powerful tool, part of the current
growth in Kubernetes is making it easier to work with and handle workloads not found in a
Google data center.
To learn more about the ideas behind Kubernetes, you can read the Large-scale cluster
management at Google with Borg paper.
Borg has inspired current data center systems, as well as the underlying technologies
used in container runtime today. Google contributed cgroups to the Linux kernel in 2007;
it limits the resources used by a collection of processes. Both cgroups and Linux
namespaces are at the heart of containers today, including Docker.
Mesos was inspired by discussions with Google when Borg was still a secret. Indeed,
Mesos builds a multi-level scheduler, which aims to better use a data center cluster.
The Cloud Foundry Foundation embraces the 12 factor application principles. These
principles provide great guidance to build web applications that can scale easily, can be
deployed in the cloud, and whose build is automated. Borg and Kubernetes address these
principles as well.

2.8.b. The Borg Heritage (Cont.)


The Kubernetes Lineage (by Chip Childers, Cloud Foundry Foundation, retrieved from LinkedIn Slideshare)

2.9.a. Kubernetes Architecture


To quickly demystify Kubernetes, let's have a look at the Kubernetes Architecture graphic,
which shows a high-level architecture diagram of the system components. Not all
components are shown. Every node running a container would have kubelet and kube-
proxy, for example.

Kubernetes Architecture 

2.9.b. Kubernetes Architecture (Cont.)


In its simplest form, Kubernetes is made of a central manager (aka master) and some
worker nodes, once called minions (we will see in a follow-on chapter how you can actually
run everything on a single node for testing purposes). The manager runs an API server, a
scheduler, various controllers and a storage system to keep the state of the cluster,
container settings, and the networking configuration.
Kubernetes exposes an API via the API server. You can communicate with the API using a
local client called kubectl, or you can write your own client and use curl commands.
Requests to run containers that come in to the API are forwarded to the kube-scheduler,
which finds a suitable node on which to run each container. Each node in the cluster runs two
processes: a kubelet and kube-proxy. The kubelet receives requests to run the
containers, manages any necessary resources and watches over them on the local node.
kubelet interacts with the local container engine, which is Docker by default, but could be
rkt or cri-o, which is growing in popularity. 
The kube-proxy creates and manages networking rules to expose the container on the
network.
Using an API-based communication scheme allows for non-Linux worker nodes and
containers. Support for Windows Server 2019 was graduated to Stable with the 1.14
release. Only Linux nodes can be master on a cluster.
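
As an illustration of the API-driven design, the minimal sketch below (assuming kubectl
is already configured against a running cluster) starts a local proxy and queries the API
server directly with curl; the paths are standard API endpoints, and kubectl itself makes
the same kind of REST calls under the hood:
$ kubectl proxy --port=8001 &
$ curl http://localhost:8001/api/v1/namespaces/default/pods
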
2.10. Terminology


We have learned that Kubernetes is an orchestration system to deploy and manage
containers. Containers are not managed individually; instead, they are part of a larger
object called a Pod. A Pod consists of one or more containers which share an IP address,
access to storage and namespace. Typically, one container in a Pod runs an application,
while other containers support the primary application.
Orchestration is managed through a series of watch-loops, or controllers. Each controller
interrogates the kube-apiserver for a particular object state, modifying the object until the
declared state matches the current state. These controllers are compiled into the kube-
controller-manager. The default, newest, and feature-filled controller for containers is a
Deployment. A Deployment ensures that resources declared in the PodSpec are
available, such as IP address and storage, and then deploys a ReplicaSet. The
ReplicaSet is a controller which deploys and restarts Pods, directing the container
engine, Docker by default, to spawn or terminate containers until the requested
number is running. Previously, this function was handled by the ReplicationController,
which has been largely superseded by Deployments. There are also Jobs and CronJobs to
handle single or recurring tasks, among others.
Managing thousands of Pods across hundreds of nodes can be a difficult task. To make
management easier, we can use labels, arbitrary strings which become
part of the object metadata. These can then be used when checking or changing the state
of objects without having to know individual names or UIDs. Nodes can have taints to
discourage Pod assignments, unless the Pod has a toleration in its metadata.
There is also space in metadata for annotations which remain with the object but cannot
be used by Kubernetes commands. This information could be used by third-party agents
or other tools.
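
As a quick sketch of how labels are used (the Pod name nginx and the label app=web are
hypothetical, and assume the Pod already exists):
$ kubectl label pod nginx app=web
$ kubectl get pods -l app=web
$ kubectl get pods --show-labels
The selector passed with -l lets you act on groups of objects without knowing their
individual names or UIDs.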

2.11. Innovation


Since its inception, Kubernetes has seen a terrific pace of innovation and adoption. The
community of developers, users, testers, and advocates is continuously growing every
day. The software is also moving at an extremely fast pace, which is even putting GitHub
to the test:
 Given to open source in June 2014
 Thousands of contributors
 More than 83k commits
 More than 28k members on Slack
 Currently, on a three-month major release cycle
 Constant changes.

2.12. User Community


Kubernetes is being adopted at a very rapid pace. To learn more, you should check out
the case studies presented on the Kubernetes website. eBay, Box, Pearson and
Wikimedia have all shared their stories.
Pokémon GO, one of the fastest-growing mobile games, also runs on Google Kubernetes
Engine (GKE), the Kubernetes service from Google Cloud Platform (GCP).

Kubernetes Users
(by Kubernetes, retrieved from the Kubernetes website)

2.13. Tools


There are several tools you can use to work with Kubernetes. As the project has grown,
new tools are made available, while old ones are being deprecated. Minikube is a very
simple tool meant to run a cluster inside a VirtualBox virtual machine. If you have limited resources and do not
want much hassle, it is the easiest way to get up and running. We mention it for those who
are not interested in a typical production environment, but want to use the tool.
Our labs will focus on the use of kubeadm and kubectl, which are very powerful and
complex tools.
There are third-party tools as well, such as Helm, an easy tool for using
Kubernetes charts, and Kompose to translate Docker Compose files into Kubernetes
objects. Expect these tools to change often.

2.14. The Cloud Native Computing Foundation


Kubernetes is open source software with an Apache license. Google donated
Kubernetes to a newly formed collaborative project within The Linux Foundation in July
2015, when Kubernetes reached the v1.0 release. This project is known as the Cloud
Native Computing Foundation (CNCF).
CNCF is not just about Kubernetes; it serves as the governing body for open source
software that solves specific issues faced by cloud native applications (i.e. applications
that are written specifically for a cloud environment).
CNCF has many corporate members that collaborate, such as Cisco, the Cloud Foundry
Foundation, AT&T, Box, Goldman Sachs, and many others.
Note: Since CNCF now owns the Kubernetes copyright, contributors to the source need to
sign a contributor license agreement (CLA) with CNCF, just like any contributor to an
Apache-licensed project signs a CLA with the Apache Software Foundation.

2.15. Resource Recommendations


If you want to go beyond this general introduction to Kubernetes, here are a few things we
recommend:
 Read the Borg paper
 Listen to John Wilkes talking about Borg and Kubernetes
 Add the Kubernetes community hangout to your calendar, and attend at least once.
 Join the community on Slack and go in the #kubernetes-users channel.
 Check out the very active Stack Overflow community.

3.3. Learning Objectives


By the end of this chapter, you should be able to:


 Download installation and configuration tools.
 Install a Kubernetes master and grow a cluster.
 Configure a network solution for secure communications.
 Discuss highly-available deployment considerations.

3.4. Installation Tools


This chapter is about Kubernetes installation and configuration. We are going to review a
few installation mechanisms that you can use to create your own Kubernetes cluster.
To get started without having to dive right away into installing and configuring a cluster,
there are two main choices.
One way is to use Google Kubernetes Engine (GKE), a cloud service from the Google Cloud
Platform that lets you request a Kubernetes cluster with the latest stable version.
Another easy way to get started is to use Minikube. It is a single binary which deploys into
Oracle VirtualBox software, which can run in several operating systems. While Minikube is
local and single node, it will give you a learning, testing, and development
platform. MicroK8s is a newer tool developed by Canonical, aimed at easy installation.
Targeted at appliance-like installations, it currently runs on Ubuntu 16.04 and
18.04.
To be able to use the Kubernetes cluster, you will need to have installed the Kubernetes
command line, called kubectl. This runs locally on your machine and targets the API
server endpoint. It allows you to create, manage, and delete all Kubernetes resources (e.g.
Pods, Deployments, Services). It is a powerful CLI that we will use throughout the rest of
this course. So, you should become familiar with it.
We will use kubeadm, the community-suggested tool from the Kubernetes project, that
makes installing Kubernetes easy and avoids vendor-specific installers. Getting a cluster
running involves two commands: kubeadm init, that you run on one Master node, and
then, kubeadm join, that you run on your Worker or redundant master nodes, and your
cluster bootstraps itself. The flexibility of these tools allows Kubernetes to be deployed in a
number of places. Lab exercises use this method.
We will also talk about other installation mechanisms, such as kubespray or kops,
another way to create a Kubernetes cluster on AWS. We will note you can create your
systemd unit file in a very traditional way. Additionally, you can use a container image
called hyperkube, which contains all the key Kubernetes binaries, so that you can run a
Kubernetes cluster by just starting a few containers on your nodes.

3.5. Installing kubectl


To configure and manage your cluster, you will probably use the kubectl command. You
can use RESTful calls or the Go language, as well.
Enterprise Linux distributions have the various Kubernetes utilities and other files available
in their repositories. For example, on RHEL 7/CentOS 7, you would find kubectl in the
kubernetes-client package.
You can (if needed) download the code from GitHub, and go through the usual steps to
compile and install kubectl.
This command-line tool will use ~/.kube/config as its configuration file. This contains all the
Kubernetes endpoints that you might use. If you examine it, you will see cluster definitions
(i.e. IP endpoints), credentials, and contexts.
A context is a combination of a cluster and user credentials. You can pass these
parameters on the command line, or switch the shell between contexts with a command,
as in:
$ kubectl config use-context foobar
This is handy when going from a local environment to a cluster in the cloud, or from one
cluster to another, such as from development to production.
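
To see which clusters, users, and contexts are defined before switching, you can inspect
the same file with kubectl itself (a minimal sketch; the context name foobar above is
only an example):
$ kubectl config view
$ kubectl config get-contexts
$ kubectl config current-context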

3.6. Using Google Kubernetes Engine (GKE)


Google takes every Kubernetes release through rigorous testing and makes it available via
its GKE service. To be able to use GKE, you will need the following:
 An account on Google Cloud.
 A method of payment for the services you will use.
 The gcloud command line client.
There is extensive documentation to get it installed. Pick your favorite method of
installation and set it up. For more details, you can visit the Installing Cloud SDK web
page. 
You will then be able to follow the GKE quickstart guide and you will be ready to create
your first Kubernetes cluster:
$ gcloud container clusters create linuxfoundation
$ gcloud container clusters list
$ kubectl get nodes

By installing gcloud, you will have automatically installed kubectl. In the commands
above, we created the cluster, listed it, and then, listed the nodes of the cluster with
kubectl. 
Once you are done, do not forget to delete your cluster, otherwise you will keep on
getting charged for it:
$ gcloud container clusters delete linuxfoundation

3.7. Using Minikube



You can also use Minikube, an open source project within the GitHub Kubernetes
organization. While you can download a release from GitHub, following listed directions, it
may be easier to download a pre-compiled binary. Make sure to verify and get the latest
version.
For example, to get the latest version for macOS (darwin), do:
$ curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64
$ chmod +x minikube
$ sudo mv minikube /usr/local/bin
With Minikube now installed, starting Kubernetes on your local machine is very easy:
$ minikube start
$ kubectl get nodes
This will start a VirtualBox virtual machine that will contain a single node Kubernetes
deployment and the Docker engine. Internally, minikube runs a single Go binary called
localkube. This binary runs all the components of Kubernetes together. This makes
Minikube simpler than a full Kubernetes deployment. In addition, the Minikube VM also
runs Docker, in order to be able to run containers.

3.8. Installing with kubeadm


Once you become familiar with Kubernetes using Minikube, you may want to start building
a real cluster. Currently, the most straightforward method is to use kubeadm, which
appeared in Kubernetes v1.4.0, and can be used to bootstrap a cluster quickly. As the
community has focused on kubeadm, it has moved from beta to stable and added high
availability with v1.15.0.
The Kubernetes website provides documentation on how to use kubeadm to create a
cluster.
Package repositories are available for Ubuntu 16.04 and CentOS 7.1. Packages have not
yet been made available for Ubuntu 18.04, but the existing packages will work, as you will see in the lab exercises.
To join other nodes to the cluster, you will need at least one token and an SHA256 hash.
This information is returned by the command kubeadm init. Once the master has
initialized, you would apply a network plugin. Main steps:
 Run kubeadm init on the head node
 Create a network for IP-per-Pod criteria
 Run kubeadm join --token token head-node-IP on worker nodes.
You can also create the network with kubectl by using a resource manifest of the network.
For example, to use the Weave network, you would do the following:
$ kubectl create -f https://git.io/weave-kube
Once all the steps are completed, workers and other master nodes joined, you will have a
functional multi-node Kubernetes cluster, and you will be able to use kubectl to interact
with it.
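
A minimal sketch of the bootstrap commands follows; the pod network CIDR, token, and
hash are placeholders, and the exact values come from your chosen network plugin and
from the kubeadm init output:
On the master:
$ sudo kubeadm init --pod-network-cidr=192.168.0.0/16
On each worker:
$ sudo kubeadm join <head-node-IP>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>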

3.9. Installing a Pod Network


Prior to initializing the Kubernetes cluster, the network must be considered and IP conflicts
avoided. There are several Pod networking choices, in varying levels of development and
feature set:
 Calico 
A flat Layer 3 network which communicates without IP encapsulation, used in
production with software such as Kubernetes, OpenShift, Docker, Mesos and
OpenStack. Viewed as a simple and flexible networking model, it scales well for
large environments. Another network option, Canal, also part of this project, allows
for integration with Flannel. Allows for implementation of network policies.
 Flannel
A Layer 3 IPv4 network between the nodes of a cluster. Developed by CoreOS, it
has a long history with Kubernetes. Focused on traffic between hosts, not how
containers configure local networking, it can use one of several backend
mechanisms, such as VXLAN. A flanneld agent on each node allocates subnet
leases for the host. While it can be configured after deployment, it is much easier
prior to any Pods being added.
 Kube-router
Feature-filled single binary which claims to "do it all". The project is in the alpha
stage, but promises to offer a distributed load balancer, firewall, and router
purposely built for Kubernetes.
 Romana 
Another project aimed at network and security automation for cloud native
applications. Aimed at large clusters, IPAM-aware topology and integration with
kops clusters.
 Weave Net 
Typically used as an add-on for a CNI-enabled Kubernetes cluster.
Many of the projects will mention the Container Network Interface (CNI), which is a CNCF
project. Several container runtimes currently use CNI. As a standard to handle deployment
management and cleanup of network resources, it will become more popular.
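
Whichever plugin you choose, a quick way to confirm it is running after you apply its
manifest (the exact Pod names depend on the plugin) is to list the Pods in the
kube-system namespace and wait for them to reach the Running state:
$ kubectl get pods -n kube-system -o wide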

3.10. More Installation Tools


Since Kubernetes is, after all, like any other application that you install on a server
(whether physical or virtual), all the configuration management systems (e.g. Chef,
Puppet, Ansible, Terraform) can be used. Various recipes are available on the Internet.
Here are just a few examples of installation tools that you can use:

 kubespray 
kubespray is now in the Kubernetes incubator. It is an advanced Ansible playbook
which allows you to set up a Kubernetes cluster on various operating systems and
use different network providers. It was once known as kargo.
 kops
kops lets you create a Kubernetes cluster on AWS via a single command line. Also
in beta for GCE and alpha for VMware vSphere.
 kube-aws
kube-aws is a command line tool that makes use of AWS CloudFormation to
provision a Kubernetes cluster on AWS.
 kubicorn
kubicorn is a tool which leverages the use of kubeadm to build a cluster. It claims
to have no dependency on DNS, runs on several operating systems, and uses
snapshots to capture a cluster and move it.
The best way to learn how to install Kubernetes using step-by-step manual commands is
to examine the Kelsey Hightower walkthrough.

3.11. Installation Considerations


To begin the installation process, you should start experimenting with a single-node
deployment. This single-node will run all the Kubernetes components (e.g. API server,
controller, scheduler, kubelet, and kube-proxy). You can do this with Minikube for example.
Once you want to deploy on a cluster of servers (physical or virtual), you will have many
choices to make, just like with any other distributed system:
 Which provider should I use? A public or private cloud? Physical or virtual?
 Which operating system should I use? Kubernetes runs on most operating systems
(e.g. Debian, Ubuntu, CentOS, etc.), plus on container-optimized OSes (e.g.
CoreOS, Atomic).
 Which networking solution should I use? Do I need an overlay?
 Where should I run my etcd cluster?
 Can I configure Highly Available (HA) head nodes?
To learn more about how to choose the best options, you can read the Picking the Right
Solution article.
With systemd becoming the dominant init system on Linux, your Kubernetes components
will end up being run as systemd unit files in most cases. Or, they will be run via a kubelet
running on the head node (i.e. when using kubeadm).

3.12. Main Deployment Configurations



At a high level, you have four main deployment configurations:
 Single-node
 Single head node, multiple workers
 Multiple head nodes with HA, multiple workers
 HA etcd, HA head nodes, multiple workers.
Which of the four you will use will depend on how advanced you are in your Kubernetes
journey, but also on what your goals are.
With a single-node deployment, all the components run on the same server. This is great
for testing, learning, and developing around Kubernetes.
Adding more workers, a single head node with multiple workers typically consists of a
single-node etcd instance running on the head node, alongside the API server, the
scheduler, and the controller-manager.
Multiple head nodes in an HA configuration and multiple workers add more durability to the
cluster. The API server will be fronted by a load balancer, the scheduler and the controller-
manager will elect a leader (which is configured via flags). The etcd setup can still be
single node.
The most advanced and resilient setup would be an HA etcd cluster, with HA head nodes
and multiple workers. Also, etcd would run as a true cluster, which would provide HA and
would run on nodes separate from the Kubernetes head nodes.
The use of Kubernetes Federations also offers high availability. Multiple clusters are joined
together with a common control plane allowing movement of resources from one cluster to
another administratively or after failure. While Federation has had some issues, there is
hope v2 will be a stronger product.

3.13.a. Systemd Unit File for Kubernetes


In any of these configurations, you will run some of the components as a standard system
daemon. As an example, below is a sample systemd unit file to run the controller-
manager. Using kubeadm will create a system daemon for kubelet, while the rest will be
deployed as containers.
 - name: kube-controller-manager.service
    command: start
    content: |
      [Unit]
      Description=Kubernetes Controller Manager
      Documentation=https://github.com/kubernetes/kubernetes
      Requires=kube-apiserver.service
      After=kube-apiserver.service
      [Service]
      ExecStartPre=/usr/bin/curl -L -o /opt/bin/kube-controller-manager -z /opt/bin/kube-controller-manager https://storage.googleapis.com/kubernetes-release/release/v1.7.6/bin/linux/amd64/kube-controller-manager
      ExecStartPre=/usr/bin/chmod +x /opt/bin/kube-controller-manager
      ExecStart=/opt/bin/kube-controller-manager \
        --service-account-private-key-file=/opt/bin/kube-serviceaccount.key \
        --root-ca-file=/var/run/kubernetes/apiserver.crt \
        --master=127.0.0.1:8080 \
        ...

3.13.b. Systemd Unit Files for Kubernetes (Cont.)


This is by no means a perfect unit file. It downloads the controller binary from the
published release of Kubernetes, and sets a few flags to run.
As you dive deeper in the configuration of each component, you will become more familiar
not only with its configuration, but also with the various existing options, including those for
authentication, authorization, HA, container runtime, etc. Expect them to change.
For example, the API server is highly configurable. The Kubernetes documentation
provides more details about the kube-apiserver.

3.14. Using Hyperkube


While you can run all the components as regular system daemons in unit files, you can
also run the API server, the scheduler, and the controller-manager as containers. This is
what kubeadm does.
Similar to minikube, there is a handy all-in-one binary named hyperkube, which is
available as a container image (e.g.
gcr.io/google_containers/hyperkube:v1.10.12). This is hosted by Google, so
you may need to add a new repository so Docker knows where to pull the image from. You
can find the current release of software here:
https://console.cloud.google.com/gcr/images/google-containers/GLOBAL/hyperkube.
This method of installation consists of running a kubelet as a system daemon and
configuring it to read in manifests that specify how to run the other components (i.e. the
API server, the scheduler, etcd, the controller). In these manifests, the hyperkube image
is used. The kubelet will watch over them and make sure they get restarted if they die.
To get a feel for this, you can simply download the hyperkube image and run a container
to get help usage:
$ docker run --rm gcr.io/google_containers/hyperkube:v1.15.5 /hyperkube apiserver --help
$ docker run --rm gcr.io/google_containers/hyperkube:v1.15.5 /hyperkube scheduler --help
$ docker run --rm gcr.io/google_containers/hyperkube:v1.15.5 /hyperkube controller-manager --help
This is also a very good way to start learning the various configuration flags.

3.15. Compiling from Source


The list of binary releases is available on GitHub. Together with gcloud, minikube, and
kubeadm, these cover several scenarios to get started with Kubernetes.
Kubernetes can also be compiled from source relatively quickly. You can clone the
repository from GitHub, and then use the Makefile to build the binaries. You can build
them natively on your platform if you have a Golang environment properly setup, or via
Docker containers if you are on a Docker host.
To build natively with Golang, first install Golang. Download files and directions can be
found at https://golang.org/doc/install.
Once Golang is working, you can clone the kubernetes repository, around 500MB in
size. Change into the directory and use make:
$ cd $GOPATH
$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ make
On a Docker host, clone the repository anywhere you want and use the make quick-release
command. The build will be done in Docker containers.
The _output/bin directory will contain the newly built binaries.
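A rough sketch of the Docker-based path; the clone location is arbitrary, and the exact layout under _output/ can vary by release:
$ git clone https://github.com/kubernetes/kubernetes
$ cd kubernetes
$ make quick-release
$ ls _output/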
 
4.3. Learning Objectives

Chapter 4. Kubernetes Architecture > 4.3. Learning Objectives

By the end of this chapter, you should be able to:


 Discuss the main components of a Kubernetes cluster.
 Learn details of the master agent kube-apiserver.
 Explain how the etcd database keeps the cluster state and configuration.
 Study the kubelet local agent.
 Examine how controllers are used to manage the cluster state.
 Discover what a Pod is to the cluster.
 Examine network configurations of a cluster.
 Discuss Kubernetes services. 

4.4.a. Main Components

Chapter 4. Kubernetes Architecture > 4.4.a. Main Components

Kubernetes has the following main components:


 Master and worker nodes
 Controllers
 Services
 Pods of containers
 Namespaces and quotas
 Network and policies
 Storage.
A Kubernetes cluster is made of a master node and a set of worker nodes. The cluster is driven entirely via API calls to controllers, covering both interior and exterior traffic. We will take a closer look at these components next.
Most of the processes are executed inside a container. There are some differences,
depending on the vendor and the tool used to build the cluster.

4.4.b. Main Components (Cont.)

Chapter 4. Kubernetes Architecture > 4.4.b. Main Components (Cont.)

Kubernetes Architectural Overview (Retrieved from the Kubernetes website)


4.5. Master Node

Chapter 4. Kubernetes Architecture > 4.5. Master Node

The Kubernetes master runs various server and manager processes for the cluster.
Among the components of the master node are the kube-apiserver, the kube-scheduler,
and the etcd database. As the software has matured, new components have been created to handle dedicated needs, such as the cloud-controller-manager; it handles tasks once handled by the kube-controller-manager, interacting with third-party tools such as Rancher or DigitalOcean for cluster management and reporting.
There are several add-ons which have become essential to a typical production cluster,
such as DNS services. Others are third-party solutions where Kubernetes has not yet
developed a local component, such as cluster-level logging and resource monitoring.

4.6. kube-apiserver

Chapter 4. Kubernetes Architecture > 4.6. kube-apiserver

The kube-apiserver is central to the operation of the Kubernetes cluster.


All calls, both internal and external traffic, are handled via this agent. All actions are accepted and validated by this agent, and it is the only connection to the etcd database. As a result, it acts as the master process for the entire cluster and as the frontend of the cluster's shared state.
Starting as an alpha feature in v1.16 is the ability to separate user-initiated traffic from
server-initiated traffic. Until these features are developed, most network plugins commingle
the traffic, which has performance, capacity, and security ramifications.

4.7. kube-scheduler

Chapter 4. Kubernetes Architecture > 4.7. kube-scheduler

The kube-scheduler uses an algorithm to determine which node will host a Pod of containers. The scheduler views available resources (such as volumes) to bind, and then tries, and retries, to deploy the Pod based on availability and success.
There are several ways you can affect the algorithm, or a custom scheduler could be used instead. You can also bind a Pod to a particular node, though the Pod may remain in a pending state due to other settings.
One of the first settings checked is whether the Pod can be deployed within the current quota restrictions. If so, the taints, tolerations, and labels of the Pod are used, along with those of the nodes, to determine proper placement.
The details of the scheduler can be found on GitHub.
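For example, one way to bind a Pod to a particular node is to name the node in the Pod spec; a minimal sketch, with a hypothetical node name (this bypasses the scheduler entirely):
apiVersion: v1
kind: Pod
metadata:
  name: pinned-pod
spec:
  nodeName: worker-node-1      # hypothetical node; the Pod stays Pending if the node cannot run it
  containers:
  - name: web
    image: nginx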

4.8. etcd Database


Chapter 4. Kubernetes Architecture > 4.8. etcd Database

The state of the cluster, networking, and other persistent information is kept in an etcd
database, or, more accurately, a b+tree key-value store. Rather than finding and
changing an entry, values are always appended to the end. Previous copies of the data
are then marked for future removal by a compaction process. It works with curl and other
HTTP libraries, and provides reliable watch queries.
Simultaneous requests to update a value all travel via the kube-apiserver, which then passes along the requests to etcd in series. The first request would update the database.
The second request would no longer have the same version number, in which case the
kube-apiserver would reply with an error 409 to the requester. There is no logic past that
response on the server side, meaning the client needs to expect this and act upon the
denial to update.
There is a master database along with possible followers. They communicate with each other on an ongoing basis to determine which will be master, and to elect a new one in the event of failure. While very fast and potentially durable, there have been some hiccups with new tools, such as kubeadm, and features like whole-cluster upgrades.

4.9. Other Agents

Chapter 4. Kubernetes Architecture > 4.9. Other Agents

The kube-controller-manager is a core control loop daemon which interacts with the
kube-apiserver to determine the state of the cluster. If the state does not match, the
manager will contact the necessary controller to match the desired state. There are
several controllers in use, such as endpoints, namespace, and replication. The full list has
expanded as Kubernetes has matured.
Remaining in beta in v1.16, the cloud-controller-manager (ccm) interacts with agents
outside of the cloud. It handles tasks once handled by kube-controller-manager. This
allows faster changes without altering the core Kubernetes control process. Each kubelet
must use the --cloud-provider=external setting passed to the binary. You can
also develop your own CCM, which can be deployed as a daemonset as an in-tree
deployment or as a free-standing out-of-tree installation. The cloud-controller-manager is
an optional agent which takes a few steps to enable. You can find more details about the
cloud controller manager online.

4.10. Worker Nodes

Chapter 4. Kubernetes Architecture > 4.10. Worker Nodes

All worker nodes run the kubelet and kube-proxy, as well as the container engine, such
as Docker or rkt. Other management daemons are deployed to watch these agents or
provide services not yet included with Kubernetes.
The kubelet interacts with the underlying Docker Engine also installed on all the nodes,
and makes sure that the containers that need to run are actually running. The kube-proxy
is in charge of managing the network connectivity to the containers. It does so through the use of iptables entries. It also has a userspace mode, in which it monitors Services and Endpoints, using a random port to proxy traffic, as well as an alpha feature using ipvs.
You can also run an alternative to the Docker engine: cri-o or rkt. To learn how you can do
that, you should check the documentation. In future releases, it is highly likely that
Kubernetes will support additional container runtime engines.
Supervisord is a lightweight process monitor used in traditional Linux environments to
monitor and notify about other processes. In the cluster, this daemon monitors both the
kubelet and docker processes. It will try to restart them if they fail, and log events. While
not part of a standard installation, some may add this monitor for added reporting.
Kubernetes does not have cluster-wide logging yet. Instead, another CNCF project is
used, called Fluentd. When implemented, it provides a unified logging layer for the cluster,
which filters, buffers, and routes messages.

4.11. kubelet

Chapter 4. Kubernetes Architecture > 4.11. kubelet

The kubelet agent is the heavy lifter for changes and configuration on worker nodes. It
accepts the API calls for Pod specifications (a PodSpec is a JSON or YAML file that
describes a pod). It will work to configure the local node until the specification has been
met.
Should a Pod require access to storage, Secrets or ConfigMaps, the kubelet will ensure
access or creation. It also sends back status to the kube-apiserver for eventual
persistence.
 Uses PodSpec
 Mounts volumes to Pod
 Downloads secrets
 Passes request to local container engine
 Reports status of Pods and node to cluster.

Kubelet calls other components such as the Topology Manager, which uses hints from
other components to configure topology-aware resource assignments such as for CPU
and hardware accelerators. As an alpha feature, it is not enabled by default.

4.12.a. Services

Chapter 4. Kubernetes Architecture > 4.12.a. Services

With every object and agent decoupled, we need a flexible and scalable agent which connects resources together and will reconnect, should something die and a replacement be spawned. Each Service is a microservice handling a particular bit of traffic, such as a single NodePort or a LoadBalancer to distribute inbound requests among many Pods.
A Service also handles access policies for inbound requests, useful for resource control, as well as for security. A sketch of a simple Service follows the list below.
 Connect Pods together
 Expose Pods to Internet
 Decouple settings
 Define Pod access policy.
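A minimal sketch of a NodePort Service; the labels, ports, and names are placeholders:
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  type: NodePort
  selector:
    app: web           # connects the Service to Pods carrying this label
  ports:
  - port: 80           # ClusterIP port used inside the cluster
    targetPort: 80     # container port on the selected Pods
    nodePort: 30080    # high-number port opened on every node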

4.12.b. Services

Chapter 4. Kubernetes Architecture > 4.12.b. Services

We can use a service to connect one pod to another, or to outside of the cluster. This graphic shows a pod with a primary container, App, with an optional sidecar Logger. Also seen is the pause container, which is used by the cluster to reserve the IP address in the namespace prior to starting the other containers. This container is not seen from within Kubernetes, but can be seen using docker and crictl.

This graphic also shows a ClusterIP, which is used to connect inside the cluster, not the IP of the cluster. As the graphic shows, this can be used to connect to a NodePort for outside the cluster, an IngressController or proxy, or another "backend" pod or pods.
4.13. Controllers

Chapter 4. Kubernetes Architecture > 4.13. Controllers

An important concept for orchestration is the use of controllers. Various controllers ship
with Kubernetes, and you can create your own, as well. A simplified view of a controller is
an agent, or Informer, and a downstream store. Using a DeltaFIFO queue, the source
and downstream are compared. A loop process receives an obj or object, which is an
array of deltas from the FIFO queue. As long as the delta is not of the type Deleted, the
logic of the controller is used to create or modify some object until it matches the
specification.
The Informer which uses the API server as a source requests the state of an object via
an API call. The data is cached to minimize API server transactions. A similar agent is the
SharedInformer; objects are often used by multiple other objects. It creates a shared
cache of the state for multiple requests.
A Workqueue uses a key to hand out tasks to various workers. The standard Go work queue types (rate limiting, delayed, and timed) are typically used.
The endpoints, namespace, and serviceaccounts controllers each manage the
eponymous resources for Pods.

4.14. Pods

Chapter 4. Kubernetes Architecture > 4.14. Pods

The whole point of Kubernetes is to orchestrate the lifecycle of a container. We do not interact with particular containers. Instead, the smallest unit we can work with is a Pod. Some would say a pod of whales or peas-in-a-pod. Due to shared resources, the design of a Pod typically follows a one-process-per-container architecture.
Containers in a Pod are started in parallel. As a result, there is no way to determine which
container becomes available first inside a pod. The use of InitContainers can order startup,
to some extent. To support a single process running in a container, you may need logging,
a proxy, or special adapter. These tasks are often handled by other containers in the same
pod.
There is only one IP address per Pod, for almost every network plugin. If there is more
than one container in a pod, they must share the IP. To communicate with each other, they
can either use IPC, the loopback interface, or a shared filesystem.
While Pods are often deployed with one application container in each, a common reason to have multiple containers in a Pod is for logging. You may find the term sidecar used for a container dedicated to performing a helper task, like handling logs and responding to requests, when the primary application container does not have this ability. The term sidecar, like ambassador and adapter, does not have a special setting, but refers to the concept of what secondary containers are included to do.

4.15.a. Rewrite Legacy Applications

Chapter 4. Kubernetes Architecture > 4.15.a. Rewrite Legacy Applications

Moving legacy applications to Kubernetes often brings up the question if the application
should be containerized as is, or rewritten as a transient, decoupled microservice. The
cost and time of rewriting legacy applications can be high, but there is also value to
leveraging the flexibility of Kubernetes. This video discusses the issue comparing a city
bus (monolithic legacy application) to a scooter (transient, decoupled microservices).

4.16. Containers
Chapter 4. Kubernetes Architecture > 4.16. Containers

While Kubernetes orchestration does not allow direct manipulation on a container level, we
can manage the resources containers are allowed to consume.
In the resources section of the PodSpec you can pass parameters which will be passed
to the container runtime on the scheduled node:
resources:
  limits:
    cpu: "1"
    memory: "4Gi"
  requests:
    cpu: "0.5"
    memory: "500Mi"
Another way to manage resource usage of the containers is by creating a ResourceQuota object, which allows hard and soft limits to be set in a namespace. Quotas can manage more resources than just CPU and memory, and can also limit the number of several object types.
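A minimal sketch of such a ResourceQuota, with illustrative values and a hypothetical namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: dev              # hypothetical namespace
spec:
  hard:
    pods: "10"                # limit the number of Pods in the namespace
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.memory: 16Gi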
A beta feature in v1.12 uses the scopeSelector field in the quota spec to run a pod at a
specific priority if it has the appropriate priorityClassName in its pod spec.

4.17. Init Containers

Chapter 4. Kubernetes Architecture > 4.17. Init Containers

Not all containers are the same. Standard containers are sent to the container engine at the same
time, and may start in any order. LivenessProbes, ReadinessProbes, and StatefulSets can be used to
determine the order, but can add complexity. Another option can be an Init Container, which must
complete before app containers can be started. Should the init container fail, it will be restarted until
completion, without the app container running.
The init container can have a different view of the storage and security settings, which allows us to use utilities and commands that the application would not be allowed to use. Init containers can contain code or utilities that are not in an app. They also have security settings independent from the app containers.
The code below will run the init container until the ls command succeeds; then the database
container will start.
spec:
  containers:
  - name: main-app
    image: databaseD
  initContainers:
  - name: wait-database
    image: busybox
    command: ['sh', '-c', 'until ls /db/dir ; do sleep 5; done; ']

4.18.a. Component Review


Chapter 4. Kubernetes Architecture > 4.18.a. Component Review

Now that we have seen some of the components, let's take another look with some of the connections shown. Not all connections are shown in this diagram. Note that all of the components are communicating with kube-apiserver. Only kube-apiserver communicates with the etcd database.
We also see some commands, which we may need to install separately to work with various components. There is an etcdctl command to interrogate the database and calicoctl to view more of how the network is configured. We can see Felix, which is the primary Calico agent on each machine. This agent, or daemon, is responsible for interface monitoring and management, route programming, ACL configuration and state reporting.
BIRD is a dynamic IP routing daemon used by Felix to read routing state and distribute
that information to other nodes in the cluster. This allows a client to connect to any node,
and eventually be connected to the workload on a container, even if not the node originally
contacted.

4.18.b. Component Review

Chapter 4. Kubernetes Architecture > 4.18.b. Component Review

4.19.a. API Call Flow

Chapter 4. Kubernetes Architecture > 4.19.a. API Call Flow

On the next page, you will find a video that should help you get a better understanding of the API
call flow from a request for a new pod through to pod and container deployment and ongoing
cluster status.
4.20. Node

Chapter 4. Kubernetes Architecture > 4.20. Node

A node is an API object created outside the cluster representing an instance. While a
master must be Linux, worker nodes can also be Microsoft Windows Server 2019. Once
the node has the necessary software installed, it is ingested into the API server.
At the moment, you can create a master node with kubeadm init and worker nodes by
passing join. In the near future, secondary master nodes and/or etcd nodes may be
joined.
If the kube-apiserver cannot communicate with the kubelet on a node for 5 minutes, the
default NodeLease will schedule the node for deletion and the NodeStatus will change
from ready. The pods will be evicted once a connection is re-established. They are no
longer forcibly removed and rescheduled by the cluster.
Each node object exists in the kube-node-lease namespace. To remove a node from
the cluster, first use kubectl delete node <node-name> to remove it from the API
server. This will cause pods to be evacuated. Then, use kubeadm reset to remove
cluster-specific information. You may also need to remove iptables information, depending on whether you plan to re-use the node.
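A rough sketch of those removal steps, with a hypothetical node name:
$ kubectl delete node worker-node-1     # removes the API object; pods are evacuated

(then, on the node itself)
$ sudo kubeadm reset                    # removes cluster-specific information from the node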

4.21. Single IP per Pod

Chapter 4. Kubernetes Architecture > 4.21. Single IP per Pod

A pod represents a group of co-located containers with some associated data volumes. All
containers in a pod share the same network namespace.
The graphic shows a pod with two containers, A and B, and two data volumes, 1 and 2.
Containers A and B share the network namespace of a third container, known as the pause
container. The pause container is used to get an IP address, then all the containers in the
pod will use its network namespace. Volumes 1 and 2 are shown for completeness.
To communicate with each other, containers within pods can use the loopback interface,
write to files on a common filesystem, or via inter-process communication (IPC). 

There is now a network plugin from HPE Labs which allows multiple IP addresses per pod,
but this feature has not grown past this new plugin.

Starting as an alpha feature in 1.16 is the ability to use IPv4 and IPv6 for pods and
services. When creating a service, you would create the endpoint for each address family
separately.

4.22. Container to Outside Path

Chapter 4. Kubernetes Architecture > 4.22. Container to Outside Path

This graphic shows a node with a single, dual-container pod. A NodePort service connects
the Pod to the outside network. Even though there are two containers, they share the
same namespace and the same IP address, which would be configured by kubectl working
with kube-proxy. The IP address is assigned before the containers are started, and will be
inserted into the containers. The container will have an interface like eth0@tun10. This IP
is set for the life of the pod.
The endpoint is created at the same time as the service. Note that it uses the pod IP address, but also includes a port. The service connects network traffic from a high-number node port to the endpoint, using iptables, with ipvs on the way. The kube-controller-
manager handles the watch loops to monitor the need for endpoints and services, as well
as any updates or deletions.
4.23. Networking Setup

Chapter 4. Kubernetes Architecture > 4.23. Networking Setup

Getting all the previous components running is a common task for system administrators
who are accustomed to configuration management. But, to get a fully functional
Kubernetes cluster, the network will need to be set up properly, as well.
A detailed explanation about the Kubernetes networking model can be seen on the
Cluster Networking page in the Kubernetes documentation.
If you have experience deploying virtual machines (VMs) based on IaaS solutions, this will
sound familiar. The only caveat is that, in Kubernetes, the lowest compute unit is not a
container, but what we call a pod.
A pod is a group of co-located containers that share the same IP address. From a networking perspective, a pod can be seen as a virtual machine or a physical host. The network needs to assign IP addresses to pods, and needs to provide traffic routes between all pods on any nodes. 
The three main networking challenges to solve in a container orchestration system are:
 Coupled container-to-container communications (solved by the pod concept).
 Pod-to-pod communications.
 External-to-pod communications (solved by the services concept, which we will
discuss later).
Kubernetes expects the network configuration to enable pod-to-pod communications to
be available; it will not do it for you.
Tim Hockin, one of the lead Kubernetes developers, has created a very useful slide deck for understanding Kubernetes networking: An Illustrated Guide to Kubernetes Networking.

4.24.a. CNI Network Configuration File

Chapter 4. Kubernetes Architecture > 4.24.a. CNI Network Configuration File

To provide container networking, Kubernetes is standardizing on the Container Network Interface (CNI) specification. Since v1.6.0, the goal of kubeadm (the Kubernetes cluster bootstrapping tool) has been to use CNI, but you may need to recompile to do so.
CNI is an emerging specification with associated libraries to write plugins that configure
container networking and remove allocated resources when the container is deleted. Its
aim is to provide a common interface between the various networking solutions and
container runtimes. As the CNI specification is language-agnostic, there are many plugins
from Amazon ECS, to SR-IOV, to Cloud Foundry, and more.

4.24.b. CNI Network Configuration File (Cont.)

Chapter 4. Kubernetes Architecture > 4.24.b. CNI Network Configuration File (Cont.)

With CNI, you can write a network configuration file:


{
    "cniVersion": "0.2.0",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16",
        "routes": [
            { "dst": "0.0.0.0/0" }
         ]
    }
}
This configuration defines a standard Linux bridge named cni0, which will give out IP addresses in the subnet 10.22.0.0/16. The bridge plugin will configure the network interfaces in the correct namespaces to define the container network properly.
The main README of the CNI GitHub repository has more information.

4.25. Pod-to-Pod Communication

Chapter 4. Kubernetes Architecture > 4.25. Pod-to-Pod Communication

While a CNI plugin can be used to configure the network of a pod and provide a single IP
per pod, CNI does not help you with pod-to-pod communication across nodes.
The requirement from Kubernetes is the following:
 All pods can communicate with each other across nodes.
 All nodes can communicate with all pods.
 No Network Address Translation (NAT).
Basically, all IPs involved (nodes and pods) are routable without NAT. This can be
achieved at the physical network infrastructure if you have access to it (e.g. GKE). Or, this
can be achieved with a software defined overlay with solutions like:
 Weave
 Flannel
 Calico
 Romana.
See this documentation page or the list of networking add-ons for a more complete list.

4.26. Mesos

Chapter 4. Kubernetes Architecture > 4.26. Mesos

At a high level, there is nothing different between Kubernetes and other clustering
systems.
A central manager exposes an API, a scheduler places the workloads on a set of nodes,
and the state of the cluster is stored in a persistent layer.
For example, you could compare Kubernetes with Mesos, and you would see the
similarities. In Kubernetes, however, the persistence layer is implemented with etcd,
instead of Zookeeper for Mesos.
You should also consider systems like OpenStack and CloudStack. Think about what runs
on their head node, and what runs on their worker nodes. How do they keep state? How
do they handle networking? If you are familiar with those systems, Kubernetes will not
seem that different.
What really sets Kubernetes apart is its features oriented towards fault-tolerance, self-
discovery, and scaling, coupled with a mindset that is purely API-driven.

Mesos Architecture (by The Apache Software Foundation, retrieved from the  Mesos
website)

5.3. Learning Objectives

Chapter 5. APIs and Access > 5.3. Learning Objectives

By the end of this chapter, you should be able to:


 Understand the API REST-based architecture.
 Work with annotations.
 Understand a simple Pod template.
 Use kubectl with greater verbosity for troubleshooting.
 Separate cluster resources using namespaces.

5.4. API Access

Chapter 5. APIs and Access > 5.4. API Access

Kubernetes has a powerful REST-based API. The entire architecture is API-driven. Knowing where to find resource endpoints and understanding how the API changes between versions can be important to ongoing administrative tasks, as there is much ongoing change and growth. Starting with v1.16, deprecated objects are no longer honored by the API server.
As we learned in the Architecture chapter, the main agent for communication between cluster agents and from outside the cluster is the kube-apiserver. A curl query to the agent will expose the current API groups. Groups may have multiple versions, which evolve independently of other groups, and follow a domain-name format with several names reserved, such as single-word domains, the empty group, and any name ending in .k8s.io.

5.5. RESTful

Chapter 5. APIs and Access > 5.5. RESTful

kubectl makes API calls on your behalf, responding to typical HTTP verbs (GET, POST, DELETE). You can also make calls externally, using curl or another program. With the appropriate certificates and keys, you can make requests, or pass JSON files to make configuration changes.
$ curl --cert userbob.pem --key userBob-key.pem \
  --cacert /path/to/ca.pem \
  https://k8sServer:6443/api/v1/pods
The ability to impersonate other users or groups, subject to RBAC configuration, allows a manual override of authentication. This can be helpful for debugging authorization policies of other users.

5.6. Checking Access

Chapter 5. APIs and Access > 5.6. Checking Access

While there is more detail on security in a later chapter, it is helpful to check the current
authorizations, both as an administrator, as well as another user. The following shows
what user bob could do in the default namespace and the developer namespace,
using the auth can-i subcommand to query:
$ kubectl auth can-i create deployments
yes
$ kubectl auth can-i create deployments --as bob
no
$ kubectl auth can-i create deployments --as bob --namespace developer
yes
There are currently three APIs which can be applied to set who and what can be queried:
 SelfSubjectAccessReview
Access review for any user, helpful for delegating to others.
 LocalSubjectAccessReview
Review is restricted to a specific namespace.
 SelfSubjectRulesReview
A review which shows allowed actions for a user within a particular namespace.
The use of reconcile allows a check of authorization necessary to create an object from
a file. No output indicates the creation would be allowed.
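A brief sketch of such a check, against a hypothetical RBAC manifest, assuming a reasonably recent kubectl (--dry-run avoids making changes):
$ kubectl auth reconcile -f my-rbac-rules.yaml --dry-run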

5.7. Optimistic Concurrency

Chapter 5. APIs and Access > 5.7. Optimistic Concurrency

The default serialization for API calls must be JSON. There is an effort to use Google's
protobuf serialization, but this remains experimental. While we may work with files in a
YAML format, they are converted to and from JSON.
Kubernetes uses the resourceVersion value to determine API updates and implement
optimistic concurrency. In other words, an object is not locked from the time it has been
read until the object is written.
Instead, upon an updated call to an object, the resourceVersion is checked, and a 409
CONFLICT is returned, should the number have changed. The resourceVersion is
currently backed via the modifiedIndex parameter in the etcd database, and is unique
to the namespace, kind, and server. Operations which do not change an object, such as
WATCH or GET, do not update this value.
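You can inspect the current resourceVersion of an object directly; a small sketch using the firstpod example from later in this chapter (the value changes whenever the object is written):
$ kubectl get pod firstpod -o jsonpath='{.metadata.resourceVersion}'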

5.8. Using Annotations

Chapter 5. APIs and Access > 5.8. Using Annotations

Labels are used to work with objects or collections of objects; annotations are not. Instead, annotations allow for metadata to be included with an object that may be helpful outside of the Kubernetes object interaction. Similar to labels, they are key/value maps. They are also able to hold more information, and more human-readable information, than labels.
Having this kind of metadata can be used to track information such as a timestamp,
pointers to related objects from other ecosystems, or even an email from the developer
responsible for that object's creation.
The annotation data could otherwise be held in an exterior database, but that would limit
the flexibility of the data. The more this metadata is included, the easier it is to integrate
management and deployment tools or shared client libraries.
For example, to annotate only Pods within a namespace, you can overwrite the
annotation, and finally delete it:
$ kubectl annotate pods --all description='Production Pods' -n prod
$ kubectl annotate --overwrite pods description="Old Production Pods" -n prod
$ kubectl annotate pods foo description- -n prod

5.9. Simple Pod

Chapter 5. APIs and Access > 5.9. Simple Pod

As discussed earlier, a Pod is the lowest compute unit and individual object we can work
with in Kubernetes. It can be a single container, but often, it will consist of a primary
application container and one or more supporting containers.
Below is an example of a simple pod manifest in YAML format. You can see the
apiVersion (it must match the existing API group), the kind (the type of object to
create), the metadata (at least a name), and its spec (what to create and parameters),
which define the container that actually runs in this pod:
apiVersion: v1
kind: Pod
metadata:
    name: firstpod
spec:
    containers:
    - image: nginx
      name: stan
You can use the kubectl create command to create this pod in Kubernetes. Once it is
created, you can check its status with kubectl get pods. The output is omitted to save
space:
$ kubectl create -f simple.yaml
$ kubectl get pods
$ kubectl get pod firstpod -o yaml
$ kubectl get pod firstpod -o json

5.10. Manage API Resources with kubectl

Chapter 5. APIs and Access > 5.10. Manage API Resources with kubectl

Kubernetes exposes resources via RESTful API calls, which allows all resources to be
managed via HTTP, JSON or even XML, the typical protocol being HTTP. The state of the
resources can be changed using standard HTTP verbs (e.g. GET, POST, PATCH, DELETE,
etc.).
kubectl has a verbose mode argument which shows details from where the command
gets and updates information. Other output includes curl commands you could use to
obtain the same result. While the verbosity accepts levels from zero to any number, there
is currently no verbosity value greater than nine. You can check this out for kubectl
get. The output below has been formatted for clarity:
$ kubectl --v=10 get pods firstpod
....
I1215 17:46:47.860958 29909 round_trippers.go:417]
   curl -k -v -XGET -H "Accept: application/json"
   -H "User-Agent: kubectl/v1.8.5 (linux/amd64)
kubernetes/cce11c6" 
   https://10.128.0.3:6443/api/v1/namespaces/default/pods/firstpod

....
If you delete this pod, you will see that the HTTP method changes from XGET to XDELETE:
$ kubectl --v=9 delete pods firstpod
....
I1215 17:49:32.166115 30452 round_trippers.go:417]
   curl -k -v -XDELETE -H "Accept: application/json, */*"
   -H "User-Agent: kubectl/v1.8.5 (linux/amd64)
kubernetes/cce11c6"
   https://10.128.0.3:6443/api/v1/namespaces/default/pods/firstpod

5.11. Access from Outside the Cluster

Chapter 5. APIs and Access > 5.11. Access from Outside the Cluster

The primary tool used from the command line will be kubectl, which calls curl on your
behalf. You can also use the curl command from outside the cluster to view or make
changes.

The basic server information, with redacted TLS certificate information, can be found in the
output of

$ kubectl config view

If you view the verbose output from a previous page, you will note that the first line
references a configuration file where this information is pulled from, ~/.kube/config:

I1215 17:35:46.725407 27695 loader.go:357]
     Config loaded from file /home/student/.kube/config

Without the certificate authority, key and certificate from this file, only insecure curl
commands can be used, which will not expose much due to security settings. We will use
curl to access our cluster using TLS in an upcoming lab.

5.12.a. ~/.kube/config
Chapter 5. APIs and Access > 5.12.a. ~/.kube/config

Take a look at the output below:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdF.....
    server: https://10.128.0.3:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: LS0tLS1CRUdJTib.....
    client-key-data: LS0tLS1CRUdJTi....

5.12.b. ~/.kube/config (Cont.)

Chapter 5. APIs and Access > 5.12.b. ~/.kube/config (Cont.)

The output on the previous page shows 19 lines of output, with each of the keys being
heavily truncated. While the keys may look similar, close examination shows them to be
distinct:

 apiVersion
As with other objects, this instructs the kube-apiserver where to assign the
data.
 clusters 
This contains the name of the cluster, as well as where to send the API calls. The
certificate-authority-data is passed to authenticate the curl request.
 contexts
A setting which allows easy access to multiple clusters, possibly as various users,
from one configuration file. It can be used to set namespace, user, and cluster.
 current-context
Shows which cluster and user the kubectl command would use. These settings
can also be passed on a per-command basis.
 kind 
Every object within Kubernetes must have this setting; in this case, a declaration of
object type Config.
 preferences 
Currently not used, optional settings for the kubectl command, such as colorizing
output.
 users
A nickname associated with client credentials, which can be client key and
certificate, username and password, and a token. Token and username/password
are mutually exclusive. These can be configured via the kubectl config set-credentials command.

5.13. Namespaces

Chapter 5. APIs and Access > 5.13. Namespaces

The term namespace is used to reference both the kernel feature and the segregation of
API objects by Kubernetes. Both are means to keep resources distinct.
Every API call includes a namespace, using default if not otherwise declared:
https://10.128.0.3:6443/api/v1/namespaces/default/pods.
Namespaces, a Linux kernel feature that segregates system resources, are intended to
isolate multiple groups and the resources they have access to work with via quotas.
Eventually, access control policies will work on namespace boundaries, as well. One could
use Labels to group resources for administrative reasons.
There are four namespaces when a cluster is first created.
 default
This is where all resources are assumed, unless set otherwise.
 kube-node-lease
The namespace where worker node lease information is kept.
 kube-public
A namespace readable by all, even those not authenticated. General information is
often included in this namespace.
 kube-system
Contains infrastructure pods.
Should you want to see all the resources on a system, you must pass the --all-namespaces option to the kubectl command.

5.14. Working with Namespaces

Chapter 5. APIs and Access > 5.14. Working with Namespaces

Take a look at the following commands:


$ kubectl get ns
$ kubectl create ns linuxcon
$ kubectl describe ns linuxcon
$ kubectl get ns/linuxcon -o yaml
$ kubectl delete ns/linuxcon
The above commands show how to view, create and delete namespaces. Note that the describe subcommand shows several settings, such as Labels, Annotations, resource quotas, and resource limits, which we will discuss later in the course.
Once a namespace has been created, you can reference it via YAML when creating a
resource:
$ cat redis.yaml
apiVersion: v1
kind: Pod
metadata:
    name: redis
    namespace: linuxcon
...

5.15. API Resources with kubectl

Chapter 5. APIs and Access > 5.15. API Resources with kubectl

All API resources exposed are available via kubectl. To get more information, do
kubectl help. 
kubectl [command] [type] [Name] [flag]
Expect the list below to change:     
 
 all
 certificatesigningrequests (csr)
 clusterrolebindings
 clusterroles
 clusters (valid only for federation apiservers)
 componentstatuses (cs)
 configmaps (cm)
 controllerrevisions
 cronjobs
 customresourcedefinition (crd)
 daemonsets (ds)
 deployments (deploy)
 endpoints (ep)
 events (ev)
 horizontalpodautoscalers (hpa)
 ingresses (ing)
 jobs
 limitranges (limits)
 namespaces (ns)
 networkpolicies (netpol)
 nodes (no)
 persistentvolumeclaims (pvc)
 persistentvolumes (pv)
 poddisruptionbudgets (pdb)
 podpreset
 pods (po)
 podsecuritypolicies (psp)
 podtemplates
 replicasets (rs)
 replicationcontrollers (rc)
 resourcequotas (quota)
 rolebindings
 roles
 secrets
 serviceaccounts (sa)
 services (svc)
 statefulsets
 storageclasses

5.16. Additional Resource Methods

Chapter 5. APIs and Access > 5.16. Additional Resource Methods

In addition to basic resource management via REST, the API also provides some
extremely useful endpoints for certain resources.
For example, you can access the logs of a container, exec into it, and watch changes to it
with the following endpoints:
$ curl --cert /tmp/client.pem --key /tmp/client-key.pem \
  --cacert /tmp/ca.pem -v -XGET \
  https://10.128.0.3:6443/api/v1/namespaces/default/pods/firstpod/log
This would be the same as the following. If the container does not have any standard out,
there would be no logs.
$ kubectl logs firstpod
Other calls you could make, following the various API groups on your cluster:
GET /api/v1/namespaces/{namespace}/pods/{name}/exec
GET /api/v1/namespaces/{namespace}/pods/{name}/log
GET /api/v1/watch/namespaces/{namespace}/pods/{name}

5.17. Swagger

Chapter 5. APIs and Access > 5.17. Swagger

The entire Kubernetes API uses a Swagger specification. This is evolving towards the OpenAPI initiative. It is extremely useful, as it allows, for example, auto-generating client code. All the stable resource definitions are available on the documentation site.
You can browse some of the API groups via a Swagger UI on the OpenAPI Specification
web page.
5.18. API Maturity

Chapter 5. APIs and Access > 5.18. API Maturity

The use of API groups and different versions allows for development to advance without
changes to an existing group of APIs. This allows for easier growth and separation of work
among separate teams. While there is an attempt to maintain some consistency between
API and software versions, they are only indirectly linked.
The use of JSON and Google's Protobuf serialization scheme will follow the same release
guidelines.
An Alpha level release, noted with alpha in the name, may be buggy and is disabled by
default. Features could change or disappear at any time. Only use these features on a test
cluster which is often rebuilt.
The Beta level, found with beta in the name, has more well-tested code and is enabled by
default. It also ensures that, as changes move forward, they will be tested for backwards
compatibility between versions. It has not been adopted and tested enough to be called
stable. You can expect some bugs and issues.
Use of the Stable version, denoted by only an integer which may be preceded by the letter
v, is for stable APIs.

6.3. Learning Objectives

Chapter 6. API Objects > 6.3. Learning Objectives

By the end of this chapter, you should be able to:


 Explore API versions.
 Discuss rapid change and development.
 Deploy and configure an application using a Deployment.
 Examine primitives for a self-healing application.
 Scale an application.

6.4. Overview

Chapter 6. API Objects > 6.4. Overview

This chapter is about API resources or objects. We will learn about resources in the v1 API group, among others. Code becomes more stable as objects move from alpha versions, to beta, and then to v1, indicating stability.
DaemonSets, which ensure a Pod on every node, and StatefulSets, which stick a
container to a node and otherwise act like a deployment, have progressed
to apps/v1 stability. Jobs and CronJobs are now in batch/v1.
Role-Based Access Control (RBAC), essential to security, has made the leap from
v1alpha1 to the stable v1 status.
As Kubernetes is a fast-moving project, keeping track of changes can be an important part of ongoing system administration. Release notes, as well as discussions of release notes, can be found in version-dependent subdirectories in the Features
tracking repository for Kubernetes releases on GitHub. For example, the v1.17 release
feature status can be found on the Kubernetes v1.17.0 Release Notes page.

Starting with v1.16, deprecated API object versions will respond with an error instead of
being accepted. This is an important change from the historic behavior.

6.5. v1 API Group

Chapter 6. API Objects > 6.5. v1 API Group

The v1 API group is no longer a single group, but rather a collection of groups for each
main object category. For example, there is a v1 group, a storage.k8s.io/v1 group, and an
rbac.authorization.k8s.io/v1, etc. Currently, there are eight v1 groups.
We have touched on several objects in lab exercises. Here are some details for some of
them:
 Node
Represents a machine - physical or virtual - that is part of your
Kubernetes cluster. You can get more information about nodes with
the kubectl get nodes command. You can turn on and off the scheduling to a
node with the kubectl cordon/uncordon commands.
 Service Account
Provides an identifier for processes running in a pod to access the API server and perform actions that they are authorized to do.
 Resource Quota
It is an extremely useful tool, allowing you to define quotas per namespace. For
example, if you want to limit a specific namespace to only run a given number of
pods, you can write a resourcequota manifest, create it with kubectl and the quota
will be enforced.
 Endpoint
Generally, you do not manage endpoints. They represent the set of IPs for Pods
that match a particular service. They are handy when you want to check that a
service actually matches some running pods. If an endpoint is empty, then it means
that there are no matching pods and something is most likely wrong with your
service definition.
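A quick way to explore which groups and resources your cluster serves, assuming a reasonably recent kubectl:
$ kubectl api-versions                                         # lists group/version pairs, such as v1 or rbac.authorization.k8s.io/v1
$ kubectl api-resources --api-group=rbac.authorization.k8s.io  # lists the objects within one group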

6.6. Discovering API Groups

Chapter 6. API Objects > 6.6. Discovering API Groups

We can take a closer look at the output of the request for current APIs. Each of the name
values can be appended to the URL to see details of that group. For example, you could
drill down to find included objects at this URL: https://localhost:6443/apis/apiregistration.k8s.io/v1beta1.
If you follow this URL, you will find only one resource, with a name of apiservices. If it
seems to be listed twice, the lower output is for status. You'll notice that there are different
verbs or actions for each. Another entry is if this object is namespaced, or restricted to only
one namespace. In this case, it is not.
$ curl https://localhost:6443/apis --header "Authorization: Bearer $token" -k
{
  "kind": "APIGroupList",
  "apiVersion": "v1",
  "groups": [
    {
      "name": "apiregistration.k8s.io",
      "versions": [
        {
          "groupVersion": "apiregistration.k8s.io/v1beta1",
          "version": "v1beta1"
        }
      ],
      "preferredVersion": {
        "groupVersion": "apiregistration.k8s.io/v1beta1",
        "version": "v1beta1"
      }
You can then curl each of these URIs and discover additional API objects, their
characteristics and associated verbs.

6.7. Deploying an Application

Chapter 6. API Objects > 6.7. Deploying an Application

Using the kubectl create command, we can quickly deploy an application. We have looked at the Pods created running the application, like nginx. Looking closer, you will find that a Deployment was created, which manages a ReplicaSet, which then deploys the Pod. Let's take a closer look at each object (a short example follows the list):
 Deployment
A controller which manages the state of ReplicaSets and the pods within. The
higher level control allows for more flexibility with upgrades and administration.
Unless you have a good reason, use a deployment.
 ReplicaSet
Orchestrates individual Pod lifecycle and updates. These are newer versions of
Replication Controllers, which differ only in selector support.
 Pod
As we've mentioned, it is the lowest unit we can manage, runs the application
container, and possibly support containers.
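For instance, a quick deployment and a look at the chain of objects it creates; nginx is used here as in earlier examples:
$ kubectl create deployment nginx --image=nginx
$ kubectl get deployments,replicasets,pods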

6.8. DaemonSets

Chapter 6. API Objects > 6.8. DaemonSets

Should you want to have a logging application on every node, a DaemonSet may be a
good choice. The controller ensures that a single pod, of the same type, runs on every
node in the cluster. When a new node is added to the cluster, a Pod, same as deployed on
the other nodes, is started. When the node is removed, the DaemonSet makes sure the
local Pod is deleted. DaemonSets are often used for logging, metrics and security pods,
and can be configured to avoid nodes.
As usual, you get all the CRUD operations via kubectl:
$ kubectl get daemonsets
$ kubectl get ds
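A minimal sketch of a DaemonSet for a hypothetical logging agent; the names and image are placeholders:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: fluentd       # placeholder image; one copy of this Pod runs on every node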

6.9. StatefulSet

Chapter 6. API Objects > 6.9. StatefulSet

According to the Kubernetes documentation, a StatefulSet is the workload API object used to manage stateful applications. Pods deployed using a StatefulSet use the same Pod specification. How this differs from a Deployment is that a StatefulSet considers each Pod as unique and provides ordering to Pod deployment.
In order to track each Pod as a unique object, the controller uses an identity composed of stable storage, stable network identity, and an ordinal. This identity remains with the Pod, regardless of which node it is running on at any one time.
The default deployment scheme is sequential, starting with 0, such as app-0, app-1,
app-2, etc. A following Pod will not launch until the current Pod reaches a running and
ready state. They are not deployed in parallel.
StatefulSets are stable as of Kubernetes v1.9.
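A minimal sketch of a StatefulSet; the headless Service name and image are placeholders, and that Service would need to be created separately:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: app
spec:
  serviceName: app-headless   # hypothetical headless Service governing the network identity
  replicas: 3                 # creates app-0, app-1, app-2, in order
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: nginx
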
6.10. Autoscaling

Chapter 6. API Objects > 6.10. Autoscaling

In the autoscaling group we find the Horizontal Pod Autoscalers (HPA). This is a stable
resource. HPAs automatically scale Replication Controllers, ReplicaSets, or Deployments
based on a target of 50% CPU usage by default. The usage is checked by the kubelet
every 30 seconds, and retrieved by the Metrics Server API call every minute. HPA checks
with the Metrics Server every 30 seconds. Should a Pod be added or removed, HPA waits
180 seconds before further action.
Other metrics can be used and queried via REST. The autoscaler does not collect the
metrics, it only makes a request for the aggregated information and increases or
decreases the number of replicas to match the configuration.
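An HPA can also be created imperatively; a brief sketch against a hypothetical deployment name:
$ kubectl autoscale deployment web --min=2 --max=5 --cpu-percent=50
$ kubectl get hpa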
The Cluster Autoscaler (CA) adds or removes nodes to the cluster, based on the inability
to deploy a Pod or having nodes with low utilization for at least 10 minutes. This allows
dynamic requests of resources from the cloud provider and minimizes expenses for
unused nodes. If you are using CA, nodes should be added and removed through
cluster-autoscaler- commands. Scale-up and down of nodes is checked every 10
seconds, but decisions are made on a node every 10 minutes. Should a scale-down fail,
the group will be rechecked in 3 minutes, with the failing node being eligible in five
minutes. The total time to allocate a new node is largely dependent on the cloud provider.
Another project still under development is the Vertical Pod Autoscaler. This component
will adjust the amount of CPU and memory requested by Pods.

6.11. Jobs

Chapter 6. API Objects > 6.11. Jobs

Jobs are part of the batch API group. They are used to run a set number of pods to completion. If a pod fails, it will be restarted until the number of completions is reached. While they can be seen as a way to do batch processing in Kubernetes, they can also be used to run one-off pods. A Job specification will have a parallelism and a completions key. If omitted, they will be set to one. If they are present, the parallelism number will set the number of pods that can run concurrently, and the completions number will set how many pods need to run successfully for the Job itself to be considered done. Several Job patterns can be implemented, like a traditional work queue.
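A minimal sketch of a Job that runs five Pods to completion, two at a time; the image and command are placeholders:
apiVersion: batch/v1
kind: Job
metadata:
  name: work
spec:
  parallelism: 2      # how many pods may run concurrently
  completions: 5      # how many pods must finish successfully
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ['sh', '-c', 'echo processing item; sleep 3']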
CronJobs work in a similar manner to Linux cron jobs, with the same time syntax. There are some cases where a job would not be run during a time period, or could run twice; as a result, the requested Pod should be idempotent.
An optional spec field is .spec.concurrencyPolicy, which determines how to handle existing jobs should the time segment expire. If set to Allow, the default, another concurrent job will be run. If set to Forbid, the current job continues and the new job is skipped. A value of Replace cancels the current job and starts a new job in its place.

6.12. RBAC
Chapter 6. API Objects > 6.12. RBAC

The last API resources that we will look at are in the rbac.authorization.k8s.io group.
We actually have four resources: ClusterRole, Role, ClusterRoleBinding, and RoleBinding.
They are used for Role Based Access Control (RBAC) to Kubernetes.
$ curl localhost:8080/apis/rbac.authorization.k8s.io/v1
...
    "groupVersion": "rbac.authorization.k8s.io/v1",
    "resources": [
...
        "kind": "ClusterRoleBinding"
...
        "kind": "ClusterRole"
...
        "kind": "RoleBinding"
...
        "kind": "Role"
...
These resources allow us to define Roles within a cluster and associate users to these
Roles. For example, we can define a Role for someone who can only read pods in a
specific namespace, or a Role that can create deployments, but no services. We will talk
more about RBAC later in the course.
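As a hedged illustration of the pod-reader idea above, with placeholder names (the namespace is hypothetical; bob reuses the example user from Chapter 5):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev                    # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
- kind: User
  name: bob
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io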

7.3. Learning Objectives

Chapter 7. Managing State with Deployments > 7.3. Learning Objectives

By the end of this chapter, you should be able to:


 Discuss Deployment configuration details.
 Scale a Deployment up and down.
 Implement rolling updates and roll back.
 Use Labels to select various objects.

7.4. Overview

Chapter 7. Managing State with Deployments > 7.4. Overview

The default controller for a container deployed via kubectl run is a Deployment. While
we have been working with them already, we will take a closer look at configuration
options.
As with other objects, a deployment can be made from a YAML or JSON spec file. When
added to the cluster, the controller will create a ReplicaSet and a Pod automatically. The
containers, their settings and applications can be modified via an update, which generates
a new ReplicaSet, which, in turn, generates new Pods.
The updated objects can be staged to replace previous objects as a block or as a rolling
update, which is determined as part of the deployment specification. Most updates can be
configured by editing a YAML file and running kubectl apply. You can also use
kubectl edit to modify the in-use configuration. Previous versions of the ReplicaSets
are kept, allowing a rollback to return to a previous configuration.
We will also talk more about labels. Labels are essential to administration in Kubernetes,
but are not an API resource. They are user-defined key-value pairs which can be attached
to any resource, and are stored in the metadata. Labels are used to query or select
resources in your cluster, allowing for flexible and complex management of the cluster.
As a label is arbitrary, you could select all resources used by developers, or belonging to a
user, or any attached string, without having to figure out what kind or how many of such
resources exist.
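For example, selection by label is done with the -l (or --selector) flag; the label key and value here are placeholders:
$ kubectl get pods -l app=dev-web
$ kubectl get all -l app=dev-web --all-namespaces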

7.5. Deployments

Chapter 7. Managing State with Deployments > 7.5. Deployments

ReplicationControllers (RC) ensure that a specified number of pod replicas is running at any one time. ReplicationControllers also give you the ability to perform rolling updates. However, those updates are managed on the client side. This is problematic if the client loses connectivity, and can leave the cluster in an unplanned state. To avoid problems when scaling the RCs on the client side, a new resource has been introduced in the extensions/v1beta1 API group: Deployments.
Deployments allow server-side updates to pods at a specified rate. They are used for
canary and other deployment patterns. Deployments generate replicaSets, which offer
more selection features than ReplicationControllers, such as matchExpressions.
$ kubectl create deployment dev-web --image=nginx:1.13.7-alpine
deployment "dev-web" created

7.6.a. Object Relationship

Chapter 7. Managing State with Deployments > 7.6.a. Object Relationship

Here we can see the relationship of objects, from the container (which Kubernetes does not directly manage) up to the deployment. 
7.6.b. Object Relationship

Chapter 7. Managing State with Deployments > 7.6.b. Object Relationship

The boxes and shapes are logical, in that they represent the controllers, or watch loops,
running as a thread of kube-controller-manager. Each controller queries the kube-
apiserver for the current state of the object they track. The state of each object on a
worker node is sent back from the local kubelet.
The graphic in the upper left represents a container running nginx 1.11. Kubernetes does
not directly manage the container. Instead, the kubelet daemon checks the pod
specifications by asking the container engine, which could be Docker or cri-o, for the
current status. The graphic to the right of the container shows a pod which represents a
watch loop checking the container status. kubelet compares the current pod spec against
what the container engine replies and will terminate and restart the pod if necessary.
A multi-container pod is shown next. While there are several names used, such as
sidecar or ambassador, these are all multi-container pods. The names are used to
indicate the particular reason to have a second container in the pod, instead of denoting a
new kind of pod.
On the lower left we see a replicaSet. This controller will ensure you have a certain
number of pods running. The pods are all deployed with the same podspec, which is why
they are called replicas. Should a pod terminate or a new pod be found, the replicaSet will
create or terminate pods until the current number of running pods matches the
specifications. Any of the current pods could be terminated should the spec demand fewer
pods running.
The graphic in the lower right shows a deployment. This controller allows us to manage
the versions of images deployed in the pods. Should an edit be made to the deployment, a
new replicaSet is created, which will deploy pods using the new podSpec. The deployment
will then direct the old replicaSet to shut down pods as the new replicaSet pods become
available. Once the old pods are all terminated, the deployment terminates the old
replicaSet and the deployment returns to having only one replicaSet running.

7.7. Deployment Details

Chapter 7. Managing State with Deployments > 7.7. Deployment Details

On the previous page, we created a new deployment running a particular version of the nginx web server.
To generate the YAML file of the newly created objects, do:
$ kubectl get deployments,rs,pods -o yaml
Sometimes, a JSON output can make it more clear:
$ kubectl get deployments,rs,pods -o json
Now, we will look at the YAML output, which also shows default values, not passed to the
object when created.
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment

 apiVersion
A value of v1 shows that this object is considered to be a stable resource. In this
case, it is not the deployment. It is a reference to the List type.
 items
As the previous line is a List, this declares the list of items the command is
showing.
 - apiVersion
The dash is a YAML indication of the first item, which declares the apiVersion of
the object as apps/v1. This indicates the object is considered
stable. Deployments are the controllers used in most cases.
 kind
This is where the type of object to create is declared, in this case, a deployment.

7.8.a. Deployment Configuration Metadata

Chapter 7. Managing State with Deployments > 7.8.a. Deployment Configuration Metadata

Continuing with the YAML output, we see the next general block of output concerns the
metadata of the deployment. This is where we would find labels, annotations, and other
non-configuration information. Note that this output will not show all possible configuration.
Many settings which are set to false by default are not shown, like podAffinity or
nodeAffinity.
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: 2017-12-21T13:57:07Z
  generation: 1
  labels:
    app: dev-web
  name: dev-web
  namespace: default
  resourceVersion: "774003"
  selfLink: /apis/apps/v1/namespaces/default/deployments/dev-web
  uid: d52d3a63-e656-11e7-9319-42010a800003

7.8.b. Deployment Configuration Metadata (Cont.)

Chapter 7. Managing State with Deployments > 7.8.b. Deployment Configuration Metadata (Cont.)

Next, you can see an explanation of the information present in the deployment metadata
(the file provided on the previous page):

 annotations:
These values do not configure the object, but provide further information that could
be helpful to third-party applications or administrative tracking. Unlike labels, they
cannot be used to select an object with kubectl.
 creationTimestamp :
Shows when the object was originally created. Does not update if the object is
edited.
 generation :
How many times this object has been edited, such as changing the number of
replicas, for example.
 labels :
Arbitrary strings used to select or exclude objects for use with kubectl, or other API
calls. Helpful for administrators to select objects outside of typical object
boundaries.
 name :
This is a required string, which we passed from the command line. The name must
be unique to the namespace.
 resourceVersion :
A value tied to the etcd database to help with concurrency of objects. Any changes
to the database will cause this number to change.
 selfLink :
References how the kube-apiserver will ingest this information into the API.
 uid :
Remains a unique ID for the life of the object.
7.9.a. Deployment Configuration Spec

Chapter 7. Managing State with Deployments > 7.9.a. Deployment Configuration Spec

There are two spec declarations for the deployment. The first will modify the ReplicaSet
created, while the second will pass along the Pod configuration.
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: dev-web
  strategy:
    rollingUpdate:
      maxSurge: 25% 
      maxUnavailable: 25%
    type: RollingUpdate

 spec :
A declaration that the following items will configure the object being created.
 progressDeadlineSeconds:
Time in seconds until a progress error is reported during a change. Reasons could
be quotas, image issues, or limit ranges.
 replicas :
As the object being created is a ReplicaSet, this parameter determines how many
Pods should be created. If you were to use kubectl edit and change this value
to two, a second Pod would be generated.
 revisionHistoryLimit:
How many old ReplicaSet specifications to retain for rollback.
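For example, the retained ReplicaSets for the dev-web deployment created earlier could be
listed with a label selector (the output will vary with your cluster):
$ kubectl get rs -l app=dev-web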

7.9.b. Deployment Configuration Spec (Cont.)

Chapter 7. Managing State with Deployments > 7.9.b. Deployment Configuration Spec (Cont.)

The elements present in the example we provided on the previous page are explained
below (continued):

 selector :
A collection of values ANDed together. All must be satisfied for the replica to
match. Do not create Pods which match these selectors, as the deployment
controller may try to control the resource, leading to issues.
 matchLabels:
Equality-based requirements of the Pod selector. Often found with
the matchExpressions statement, which adds set-based requirements to further
narrow which Pods the controller manages.
 strategy :
A header for values having to do with updating Pods. Works with the later listed
type. Could also be set to Recreate, which would delete all existing pods before
new pods are created. With RollingUpdate, you can control how many Pods are
deleted at a time with the following parameters.
 maxSurge:
Maximum number of Pods over desired number of Pods to create. Can be a
percentage, default of 25%, or an absolute number. This creates a certain number
of new Pods before deleting old ones, for continued access.
 maxUnavailable:
A number or percentage of Pods which can be in a state other than Ready during
the update process.
 type:
Even though it is listed last in the section, due to the level of white space indentation, it
is read as part of strategy, and sets the type of update to use (e.g. RollingUpdate).
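As a sketch, if every existing Pod should be deleted before new Pods are created, the
strategy block shown earlier could instead declare the Recreate type:
  strategy:
    type: Recreate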

7.10.a. Deployment Configuration Pod Template

Chapter 7. Managing State with Deployments > 7.10.a. Deployment Configuration Pod Template

Next, we will take a look at a configuration template for the pods to be deployed. We will
see some similar values.
template:
  metadata:
    creationTimestamp: null
    labels:
      app: dev-web
  spec:
    containers:
    - image: nginx:1.13.7-alpine
      imagePullPolicy: IfNotPresent
      name: dev-web
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
    dnsPolicy: ClusterFirst
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    terminationGracePeriodSeconds: 30
Note: If the meaning is basically the same as what was defined before, we will not repeat
the definition:
 template :
Data being passed to the ReplicaSet to determine how to deploy an object  (in this
case, containers).
 containers :
Key word indicating that the following items of this indentation are for a container.

7.10.b. Deployment Configuration Pod Template (Cont.)

Chapter 7. Managing State with Deployments > 7.10.b. Deployment Configuration Pod Template (Cont.)

Next, we will continue to take a look at a configuration template for the pods to be
deployed. If the meaning is basically the same as what was defined before, we will not
repeat the definition:

 image :
This is the image name passed to the container engine, typically Docker. The
engine will pull the image and use it to create the container.
 imagePullPolicy :
Policy settings passed along to the container engine, about when and if an image
should be downloaded or used from a local cache.
 name :
The leading stub of the Pod names. A unique string will be appended.
 resources :
By default, empty. This is where you would set resource restrictions and settings,
such as a limit on CPU or memory for the containers.
 terminationMessagePath :
A customizable location of where to output success or failure information of a
container.
 terminationMessagePolicy :
The default value is File, which writes the termination message to the path above. It could
also be set to FallbackToLogsOnError, which will use the last chunk of container log if the
message file is empty and the container exited with an error.
 dnsPolicy :
Determines if DNS queries should go to coredns or, if set to Default, use the
node's DNS resolution configuration.
 restartPolicy :
Should the container be restarted if killed? Automatic restarts are part of the typical
strength of Kubernetes.
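As an illustration of the resources field mentioned above, requests and limits could be added
to the container; the values below are arbitrary examples, not defaults:
      resources:
        requests:
          cpu: "250m"
          memory: "64Mi"
        limits:
          cpu: "500m"
          memory: "128Mi"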

7.10.c. Deployment Configuration Pod Template (Cont.)

Chapter 7. Managing State with Deployments > 7.10.c. Deployment Configuration Pod Template (Cont.)

Next, we will continue to take a look at a configuration template for the pods to be
deployed. If the meaning is basically the same as what was defined before, we will not
repeat the definition:

 schedulerName:
Allows for the use of a custom scheduler, instead of the Kubernetes default.
 securityContext :
Flexible setting to pass one or more security settings, such as SELinux context,
AppArmor values, users and UIDs for the containers to use.
 terminationGracePeriodSeconds :
The amount of time to wait for a SIGTERM to run until a SIGKILL is used to
terminate the container.

7.11. Deployment Configuration Status

Chapter 7. Managing State with Deployments > 7.11. Deployment Configuration Status

The Status output is generated when the information is requested:


status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: 2017-12-21T13:57:07Z
    lastUpdateTime: 2017-12-21T13:57:07Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 2
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2
The output above shows what the same deployment would look like if the number of
replicas were increased to two. The timestamps differ from when the deployment was first
generated.
 availableReplicas :
Indicates how many replicas are currently available. This would be compared
to the later value of readyReplicas, which would be used to determine if all
replicas have been fully generated and without error.
 observedGeneration :
Shows how often the deployment has been updated. This information can be used
to understand the rollout and rollback situation of the deployment.

7.12. Scaling and Rolling Updates


Chapter 7. Managing State with Deployments > 7.12. Scaling and Rolling Updates

The API server allows the configuration settings to be updated for most values. There
are some immutable values, which may be different depending on the version of
Kubernetes you have deployed.
A common update is to change the number of replicas running. If this number is set to
zero, there would be no containers, but there would still be a ReplicaSet and Deployment.
This is the backend process when a Deployment is deleted.
$ kubectl scale deploy/dev-web --replicas=4
deployment "dev-web" scaled
$ kubectl get deployments
NAME     READY   UP-TO-DATE  AVAILABLE  AGE
dev-web  4/4     4           4          20s
Non-immutable values can be edited via a text editor, as well. Use edit to trigger an
update. For example, to change the deployed version of the nginx web server to an older
version:
$ kubectl edit deployment nginx
....
      containers:
      - image: nginx:1.8 #<<---Set to an older version
        imagePullPolicy: IfNotPresent
        name: dev-web
....
This would trigger a rolling update of the deployment. While the deployment would show
an older age, a review of the Pods would show a recent update and older version of the
web server application deployed.
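The progress of such a rolling update could be followed, for example, with the command
below, which blocks until the rollout completes or fails:
$ kubectl rollout status deployment nginx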
 
7.13.a. Deployment Rollbacks

Chapter 7. Managing State with Deployments > 7.13.a. Deployment Rollbacks

With some of the previous ReplicaSets of a Deployment being kept, you can also roll back
to a previous revision. The number of previous configurations kept is configurable, and has
changed from version to version. During a rollout or rollback, the Deployment adjusts the
replica counts, decrementing the old ReplicaSet while incrementing the new one. Next, we
will have a closer look at rollbacks, using the --record option of the kubectl create
command, which records the command as an annotation in the resource definition.
$ kubectl create deploy ghost --image=ghost --record
$ kubectl get deployments ghost -o yaml
    deployment.kubernetes.io/revision: "1"
    kubernetes.io/change-cause: kubectl create deploy ghost --image=ghost --record
Should an update fail, due to an improper image version, for example, you can roll back
the change to a working version with kubectl rollout undo:
$ kubectl set image deployment/ghost ghost=ghost:09 --all
$ kubectl rollout history deployment/ghost
deployments "ghost":
REVISION     CHANGE-CAUSE
1            kubectl create deploy ghost --image=ghost --record
2            kubectl set image deployment/ghost ghost=ghost:09 --all
$ kubectl get pods
NAME                     READY    STATUS               RESTARTS   AGE
ghost-2141819201-tcths   0/1      ImagePullBackOff     0          1m
$ kubectl rollout undo deployment/ghost ; kubectl get pods
NAME                     READY   STATUS   RESTARTS   AGE  
ghost-3378155678-eq5i6   1/1     Running   0          7s

7.13.b. Deployment Rollbacks (Cont.)

Chapter 7. Managing State with Deployments > 7.13.b. Deployment Rollbacks (Cont.)

You can roll back to a specific revision with the --to-revision=2 option.
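For example, to return the ghost Deployment used on the previous page to its first revision:
$ kubectl rollout undo deployment/ghost --to-revision=1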
You can also edit a Deployment using the kubectl edit command.
You can also pause a Deployment, and then resume.
$ kubectl rollout pause deployment/ghost
$ kubectl rollout resume deployment/ghost
Please note that you can still do a rolling update on ReplicationControllers with the
kubectl rolling-update command, but this is done on the client side. Hence, if you
close your client, the rolling update will stop.

7.14. Using DaemonSets

Chapter 7. Managing State with Deployments > 7.14. Using DaemonSets

A newer object to work with is the DaemonSet. This controller ensures that a single pod
exists on each node in the cluster. Every Pod uses the same image. Should a new node
be added, the DaemonSet controller will deploy a new Pod on your behalf. Should a node
be removed, the controller will delete the Pod also.
The use of a DaemonSet allows for ensuring a particular container is always running. In a
large and dynamic environment, it can be helpful to have a logging or metric generation
application on every node without an administrator remembering to deploy that application.
Use kind: DaemonSet.
There are ways of affecting the kube-scheduler such that some nodes will not run a
DaemonSet.
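A minimal DaemonSet manifest might look like the sketch below; the name and the fluentd
image are arbitrary examples of a per-node logging agent:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: log-agent
        image: fluentd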
7.15.a. Labels

Chapter 7. Managing State with Deployments > 7.15.a. Labels

Part of the metadata of an object is a label. Though labels are not API objects, they are an
important tool for cluster administration. They can be used to select an object based on an
arbitrary string, regardless of the object type. Note that the label selector of a Deployment
is immutable as of API version apps/v1.
Every resource can contain labels in its metadata. By default, creating a Deployment with
kubectl create adds a label, as we saw in:
....
    labels:
        pod-template-hash: "3378155678"
        run: ghost
....
You could then view labels in new columns:
$ kubectl get pods -l run=ghost
NAME                   READY STATUS  RESTARTS AGE
ghost-3378155678-eq5i6 1/1   Running 0        10m
$ kubectl get pods -L run
NAME                   READY STATUS  RESTARTS AGE RUN
ghost-3378155678-eq5i6 1/1   Running 0        10m ghost
nginx-3771699605-4v27e 1/1   Running 1         1h nginx

7.15.b. Labels (Cont.)

Chapter 7. Managing State with Deployments > 7.15.b. Labels (Cont.)

While you typically define labels in pod templates and in the specifications of
Deployments, you can also add labels on the fly:
$ kubectl label pods ghost-3378155678-eq5i6 foo=bar
$ kubectl get pods --show-labels
NAME                   READY STATUS  RESTARTS AGE LABELS
ghost-3378155678-eq5i6 1/1   Running 0        11m foo=bar,pod-template-hash=3378155678,run=ghost
For example, if you want to force the scheduling of a pod on a specific node, you can use
a nodeSelector in a pod definition, add specific labels to certain nodes in your cluster and
use those labels in the pod.
....
spec:
    containers:
    - image: nginx
      name: nginx
    nodeSelector:
        disktype: ssd
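For the nodeSelector above to match, the node must carry that label, which could be added
with a command like the following (the node name is just an example):
$ kubectl label nodes worker-1 disktype=ssd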

8.3. Learning Objectives

Chapter 8. Services > 8.3. Learning Objectives

By the end of this chapter, you should be able to:


 Explain Kubernetes services.
 Expose an application.
 Discuss the service types available.
 Start a local proxy.
 Use the cluster DNS.

8.4. Services Overview

Chapter 8. Services > 8.4. Services Overview

As touched on previously, the Kubernetes architecture is built on the concept of transient,
decoupled objects connected together. Services are the agents which connect Pods
together, or provide access outside of the cluster, with the idea that any particular Pod
could be terminated and rebuilt. Typically using Labels, the refreshed Pod is connected
and the microservice continues to provide the expected resource via an Endpoint object.
Google has been working on Extensible Service Proxy (ESP), based off the nginx HTTP
reverse proxy server, to provide a more flexible and powerful object than Endpoints, but
ESP has not been adopted much outside of the Google App Engine or GKE environments.
There are several different service types, with the flexibility to add more, as necessary.
Each service can be exposed internally or externally to the cluster. A service can also
connect internal resources to an external resource, such as a third-party database.
The kube-proxy agent, running on each node, watches the Kubernetes API for new Services
and Endpoints being created. It opens random ports and listens for traffic to the
ClusterIP:Port, and redirects the traffic to one of the service's endpoints.
Services provide automatic load-balancing, matching a label query. While there is no
configuration of this option, there is the possibility of session affinity via IP. Also, a
headless service, one with neither a fixed IP address nor load-balancing, can be configured.
Unique IP addresses are assigned and stored in the etcd database. Services currently use
iptables to route traffic, but could leverage other technologies to provide access to
resources in the future.

8.5. Service Update Pattern

Chapter 8. Services > 8.5. Service Update Pattern


Labels are used to determine which Pods should receive traffic from a service. As we have
learned, labels can be dynamically updated for an object, which may affect which Pods
continue to connect to a service.
The default update pattern is for a rolling deployment, where new Pods are added, with
different versions of an application, and due to automatic load balancing, receive traffic
along with previous versions of the application.
Should there be a difference in applications deployed, such that clients would have issues
communicating with different versions, you may consider a more specific label for the
deployment, which includes a version number. When the deployment creates a new
replication controller for the update, the label would not match. Once the new Pods have
been created, and perhaps allowed to fully initialize, we would edit the labels for which the
Service connects. Traffic would shift to the new and ready version, minimizing client
version confusion.

8.6.a. Accessing an Application with a Service

Chapter 8. Services > 8.6.a. Accessing an Application with a Service

The basic step to access a new service is to use kubectl.


$ kubectl expose deployment/nginx --port=80 --type=NodePort
$ kubectl get svc
NAME         TYPE         CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE 
kubernetes   ClusterIP    10.0.0.1     <none>        443/TCP        18h
nginx        NodePort     10.0.0.112   <none>        80:31230/TCP   5s
$ kubectl get svc nginx -o yaml
apiVersion: v1
kind: Service
...
spec:
    clusterIP: 10.0.0.112
    ports:     
    - nodePort: 31230
...
Then, open a browser to http://Public-IP:31230.

8.6.b. Accessing an Application with a Service (Cont.)

Chapter 8. Services > 8.6.b. Accessing an Application with a Service (Cont.)

The kubectl expose command created a service for the nginx deployment. This
service used port 80 and generated a random port on all the nodes. A particular port and
targetPort can also be passed during object creation to avoid random values. The
targetPort defaults to the port, but could be set to any value, including a string
referring to a port on a backend Pod. Each Pod could have a different port, but traffic is still
passed via the name. Switching traffic to a different port would maintain a client
connection, while changing versions of software, for example.
The kubectl get svc command gave you a list of all the existing services, and we saw
the nginx service, which was created with an internal cluster IP.
The range of cluster IPs and the range of ports used for the random NodePort are
configurable in the API server startup options.
Services can also be used to point to a service in a different namespace, or even a
resource outside the cluster, such as a legacy application not yet in Kubernetes.
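For instance, a specific port and targetPort could be passed when exposing the deployment;
the targetPort value below is an arbitrary example:
$ kubectl expose deployment/nginx --port=80 --target-port=8000 --type=NodePort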

8.7. Service Types

Chapter 8. Services > 8.7. Service Types

Services can be of the following types:


 ClusterIP
 NodePort
 LoadBalancer
 ExternalName.
The ClusterIP service type is the default, and only provides access internally (except if
manually creating an external endpoint). The range of ClusterIP used is defined via an API
server startup option.
The NodePort type is great for debugging, or when a static IP address is necessary, such
as opening a particular address through a firewall. The NodePort range is defined in the
cluster configuration.
The LoadBalancer service was created to pass requests to a cloud provider like GKE or
AWS. Private cloud solutions also may implement this service type if there is a cloud
provider plugin, such as with CloudStack and OpenStack. Even without a cloud provider,
the address is made available to public traffic, and packets are spread among the Pods in
the deployment automatically.
A newer service is ExternalName, which is a bit different. It has no selectors, nor does it
define ports or endpoints. It allows the return of an alias to an external service. The
redirection happens at the DNS level, not via a proxy or forward. This object can be useful
for services not yet brought into the Kubernetes cluster. A simple change of the type in the
future would redirect traffic to the internal objects.
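A sketch of an ExternalName service, pointing at an assumed external hostname, could look
like this:
apiVersion: v1
kind: Service
metadata:
  name: legacy-db
spec:
  type: ExternalName
  externalName: db.example.com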
The kubectl proxy command creates a local service to access a ClusterIP. This can be
useful for troubleshooting or development work.

8.8. Services Diagram

Chapter 8. Services > 8.8. Services Diagram


The kube-proxy running on cluster nodes watches the API server service resources. It
presents a type of virtual IP address for services other than ExternalName. The mode for
this process has changed over versions of Kubernetes.
In v1.0, services ran in userspace mode as TCP/UDP over IP or Layer 4. In the v1.1
release, the iptables proxy was added and became the default mode starting with v1.2.
In the iptables proxy mode, kube-proxy continues to monitor the API server for changes
in Service and Endpoint objects, and updates rules for each object when created or
removed. One limitation to the new mode is an inability to connect to a Pod should the
original request fail, so it uses a Readiness Probe to ensure all containers are
functional prior to connection. This mode allows for up to approximately 5000 nodes.
Assuming multiple Services and Pods per node, this leads to a bottleneck in the kernel.
Another mode beginning in v1.9 is ipvs. While in beta, and expected to change, it works in
the kernel space for greater speed, and allows for a configurable load-balancing algorithm,
such as round-robin, shortest expected delay, least connection and several others. This
can be helpful for large clusters, much past the previous 5000 node limitation. This mode
assumes IPVS kernel modules are installed and running prior to kube-proxy.
The kube-proxy mode is configured via a flag sent during initialization, such as
--proxy-mode=iptables; it could also be set to ipvs or userspace.
Traffic from ClusterIP to Pod
(by Kubernetes, retrieved from the Kubernetes website)

8.9. Local Proxy for Development

Chapter 8. Services > 8.9. Local Proxy for Development

When developing an application or service, one quick way to check your service is to run a
local proxy with kubectl. It will capture the shell, unless you place it in the background.
When running, you can make calls to the Kubernetes API on localhost and also reach
the ClusterIP  services on their API URL. The IP and port where the proxy listens can be
configured with command arguments.
Run a proxy:
$ kubectl proxy
Starting to serve on 127.0.0.1:8001
Next, to access a ghost service using the local proxy, we could use the following URL, for
example, at http://localhost:8001/api/v1/namespaces/default/services/ghost.
If the service port has a name, the path will be
http://localhost:8001/api/v1/namespaces/default/services/ghost:<port_name>.

8.10. DNS

Chapter 8. Services > 8.10. DNS


DNS has been provided as CoreDNS by default as of v1.13. The use of CoreDNS allows
for a great amount of flexibility. Once the container starts, it will run a Server for the zones
it has been configured to serve. Then, each server can load one or more plugin chains to
provide other functionality.
The 30 or so in-tree plugins provide most common functionality, with an easy process to
write and enable other plugins as necessary.
Common plugins can provide metrics for consumption by Prometheus, error logging,
health reporting, and TLS to configure certificates for TLS and gRPC servers.
More can be found here: https://coredns.io/plugins/.
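As an illustration, a Corefile loading a typical plugin chain could resemble the following; the
zone and plugin selection are examples, not necessarily your cluster's default:
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
}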

8.11.a. Verifying DNS Registration

Chapter 8. Services > 8.11.a. Verifying DNS Registration

To make sure that your DNS setup works well and that services get registered, the easiest
way is to run a pod in the cluster and exec into it to do a DNS lookup.
Create this sleeping busybox pod with the kubectl create command:
apiVersion: v1
kind: Pod
metadata:
    name: busybox
    namespace: default
spec:
    containers:
    - image: busybox
      name: busy
      command:
        - sleep
        - "3600"
Then, use kubectl exec to do your nslookup like so:
$ kubectl exec -ti busybox -- nslookup nginx
Server: 10.0.0.10
Address 1: 10.0.0.10
Name: nginx
Address 1: 10.0.0.112
You can see that the DNS name nginx (corresponding to the nginx service) is registered
with the ClusterIP of the service.

8.11.b. Verifying DNS Registration (Cont.)

Chapter 8. Services > 8.11.b. Verifying DNS Registration (Cont.)

Other steps, similar to any DNS troubleshooting, would be to check the
/etc/resolv.conf file of the container:
$ kubectl exec busybox -- cat /etc/resolv.conf
Then, check the logs of the DNS Pod(s) in the kube-system namespace (coredns by default,
kube-dns on older clusters). Look for log lines with W for warning, E for error and F for
failure. Also check to make sure the DNS service is up and the DNS endpoints are exposed.
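For instance, the DNS service and its endpoints (conventionally named kube-dns even when
CoreDNS is in use) could be checked with:
$ kubectl -n kube-system get svc,endpoints kube-dns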
9.3. Learning Objectives

Chapter 9. Volumes and Data > 9.3. Learning Objectives

By the end of this chapter, you should be able to:


 Understand and create persistent volumes.
 Configure persistent volume claims.
 Manage volume access modes.
 Deploy an application with access to persistent storage.
 Discuss the dynamic provisioning of storage.
 Configure secrets and ConfigMaps.

9.4. Volumes Overview

Chapter 9. Volumes and Data > 9.4. Volumes Overview

Container engines have traditionally not offered storage that outlives the container. As
containers are considered transient, this could lead to a loss of data, or complex exterior
storage options. A Kubernetes volume shares the Pod lifetime, not the containers within.
Should a container terminate, the data would continue to be available to the new
container.
A volume is a directory, possibly pre-populated, made available to containers in a Pod.
The creation of the directory, the backend storage of the data and the contents depend on
the volume type. As of v1.13, there were 27 different volume types ranging from rbd to
gain access to Ceph, to NFS, to dynamic volumes from a cloud provider like Google's
gcePersistentDisk. Each has particular configuration options and dependencies.
The Container Storage Interface (CSI) adoption enables the goal of an industry standard
interface for container orchestration to allow access to arbitrary storage systems.
Currently, volume plugins are "in-tree", meaning they are compiled and built with the core
Kubernetes binaries. This "out-of-tree" object will allow storage vendors to develop a
single driver and allow the plugin to be containerized. This will replace the existing Flex
plugin which requires elevated access to the host node, a large security concern.
Should you want your storage lifetime to be distinct from a Pod, you can use Persistent
Volumes. These allow for empty or pre-populated volumes to be claimed by a Pod using a
Persistent Volume Claim, then outlive the Pod. Data inside the volume could then be used
by another Pod, or as a means of retrieving data.
There are two API Objects which exist to provide data to a Pod already. Encoded data can
be passed using a Secret and non-encoded data can be passed with a ConfigMap. These
can be used to pass important data like SSH keys, passwords, or even a configuration file
like /etc/hosts.
9.5.a. Introducing Volumes

Chapter 9. Volumes and Data > 9.5.a. Introducing Volumes

A Pod specification can declare one or more volumes and where they are made available.
Each requires a name, a type, and a mount point. The same volume can be made
available to multiple containers within a Pod, which can be a method of container-to-
container communication. A volume can be made available to multiple Pods, with
each given an access mode to write. There is no concurrency checking, which means
data corruption is probable, unless outside locking takes place.

K8s Pod Volumes

9.5.b. Introducing Volumes (Cont.)

Chapter 9. Volumes and Data > 9.5.b. Introducing Volumes (Cont.)

A particular access mode is part of a Pod request. As a request, the user may be granted
more, but not less access, though a direct match is attempted first. The cluster groups
volumes with the same mode together, then sorts volumes by size, from smallest to
largest. The claim is checked against each in that access mode group, until a volume of
sufficient size matches. The three access modes are ReadWriteOnce, which allows read-
write by a single node, ReadOnlyMany, which allows read-only by multiple nodes, and
ReadWriteMany, which allows read-write by many nodes. Thus two pods on the same
node can write to a ReadWriteOnce, but a third pod on a different node would not
become ready due to a FailedAttachVolume error.
When a volume is requested, the local kubelet uses the kubelet_pods.go script to map
the raw devices, determine and make the mount point for the container, then create the
symbolic link on the host node filesystem to associate the storage to the container. The
API server makes a request for the storage to the StorageClass plugin, but the specifics
of the requests to the backend storage depend on the plugin in use.
If a request for a particular StorageClass was not made, then the only parameters used
will be access mode and size. The volume could come from any of the storage types
available, and there is no configuration to determine which of the available ones will be
used.

9.6. Volume Spec

Chapter 9. Volumes and Data > 9.6. Volume Spec

One of the many types of storage available is an emptyDir. The kubelet will create the
directory in the container, but not mount any storage. Any data created is written to the
shared container space. As a result, it would not be persistent storage. When the Pod is
destroyed, the directory would be deleted along with the container.
apiVersion: v1
kind: Pod
metadata:
    name: busybox
    namespace: default
spec:
    containers:
    - image: busybox
      name: busy
      command:
        - sleep
        - "3600"
      volumeMounts:
      - mountPath: /scratch
        name: scratch-volume
    volumes:
    - name: scratch-volume
      emptyDir: {}
The YAML file above would create a Pod with a single container with a volume named
scratch-volume created, which would create the /scratch directory inside the
container.

9.7. Volume Types


Chapter 9. Volumes and Data > 9.7. Volume Types

There are several types that you can use to define volumes, each with their pros and cons.
Some are local, and many make use of network-based resources.
In GCE or AWS, you can use volumes of type gcePersistentDisk or
awsElasticBlockStore, which allows you to mount GCE and EBS disks in your Pods,
assuming you have already set up accounts and privileges.
emptyDir and hostPath volumes are easy to use. As mentioned, emptyDir is an
empty directory that gets erased when the Pod dies, but is recreated when the container
restarts. The hostPath volume mounts a resource from the host node filesystem. The
resource could be a directory, file, socket, character device, or block device. These resources
must already exist on the host to be used. There are two types, DirectoryOrCreate
and FileOrCreate, which create the resources on the host if they do not
already exist, and then use them.
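A minimal sketch of a hostPath volume using DirectoryOrCreate is shown below; the path
and name are arbitrary:
    volumes:
    - name: host-data
      hostPath:
        path: /var/local/data
        type: DirectoryOrCreate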
NFS (Network File System) and iSCSI (Internet Small Computer System Interface) are
straightforward choices for multiple readers scenarios.
rbd for block storage or CephFS and GlusterFS, if available in your Kubernetes cluster,
can be a good choice for multiple writer needs.
Besides the volume types we just mentioned, there are many other possible, with more
being added: azureDisk, azureFile, csi, downwardAPI, fc (fibre channel),
flocker, gitRepo, local, projected, portworxVolume, quobyte, scaleIO,
secret, storageos, vsphereVolume, persistentVolumeClaim, etc.

9.8. Shared Volume Example

Chapter 9. Volumes and Data > 9.8. Shared Volume Example

The following YAML file creates a pod with two containers, both with access to a shared
volume:
containers:
- image: busybox
  name: busy
  volumeMounts:
  - mountPath: /busy
    name: test
- image: busybox
  name: box
  volumeMounts:
    - mountPath: /box
      name: test
volumes:
  - name: test
    emptyDir: {}
$ kubectl exec -ti busybox -c box -- touch /box/foobar
$ kubectl exec -ti busybox -c busy -- ls -l /busy
total 0
-rw-r--r-- 1 root root 0 Nov 19 16:26 foobar
You could use emptyDir or hostPath easily, since those types do not require any
additional setup, and will work in your Kubernetes cluster.
Note that one container wrote, and the other container had immediate access to the data.
There is nothing to keep the containers from overwriting the other's data. Locking or
versioning considerations must be part of the application to avoid corruption.

9.9. Persistent Volumes and Claims

Chapter 9. Volumes and Data > 9.9. Persistent Volumes and Claims

A persistent volume (pv) is a storage abstraction used to retain data longer than the Pod
using it. Pods define a volume of type persistentVolumeClaim (pvc) with various
parameters for size and possibly the type of backend storage known as its
StorageClass. The cluster then attaches the persistentVolume.
Kubernetes will dynamically use volumes that are available, irrespective of their storage
type, allowing claims to any backend storage.
There are several phases to persistent storage:

 Provisioning can be from PVs created in advance by the cluster administrator, or
requested from a dynamic source, such as the cloud provider.
 Binding occurs when a control loop on the master notices the PVC, containing an
amount of storage, access request, and optionally, a particular StorageClass.
The watcher locates a matching PV or waits for the StorageClass provisioner to
create one. The PV must match at least the storage amount requested, but may
provide more.
 The use phase begins when the bound volume is mounted for the Pod to use,
which continues as long as the Pod requires.
 Releasing happens when the Pod is done with the volume and an API request is
sent, deleting the PVC. The volume remains in the state from when the claim is
deleted until available to a new claim. The resident data remains depending on the
persistentVolumeReclaimPolicy.
 The reclaim phase has three options:
- Retain, which keeps the data intact, allowing for an administrator to handle the
storage and data.
- Delete tells the volume plugin to delete the API object, as well as the storage
behind it.
- The Recycle option runs an rm -rf /mountpoint and then makes it available
to a new claim. With the stability of dynamic provisioning, the Recycle option is
planned to be deprecated.

$ kubectl get pv
$ kubectl get pvc

9.10. Persistent Volume


Chapter 9. Volumes and Data > 9.10. Persistent Volume

The following example shows a basic declaration of a PersistentVolume using the
hostPath type.
kind: PersistentVolume
apiVersion: v1
metadata:
    name: 10Gpv01
    labels:
         type: local
spec:
    capacity:
        storage: 10Gi
    accessModes:
        - ReadWriteOnce
    hostPath:
        path: "/somepath/data01"
Each type will have its own configuration settings. For example, an already created Ceph
or GCE Persistent Disk would not need to be configured, but could be claimed from the
provider.
Persistent volumes are not namespaced objects, but persistent volume claims are. A beta
feature of v1.13 allows for static provisioning of Raw Block Volumes, which currently
support the Fibre Channel plugin, AWS EBS, Azure Disk and RBD plugins among others.
The use of locally attached storage has been graduated to a stable feature. This feature is
often used as part of distributed filesystems and databases.

9.11.a. Persistent Volume Claim

Chapter 9. Volumes and Data > 9.11.a. Persistent Volume Claim

With a persistent volume created in your cluster, you can then write a manifest for a claim
and use that claim in your pod definition. In the Pod, the volume uses the
persistentVolumeClaim.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
    name: myclaim
spec:
    accessModes:
        - ReadWriteOnce
    resources:
        requests:
                storage: 8Gi
In the Pod:
spec:
    containers:
....
    volumes:
        - name: test-volume
          persistentVolumeClaim:
                claimName: myclaim

9.11.b. Persistent Volume Claim (Cont.)

Chapter 9. Volumes and Data > 9.11.b. Persistent Volume Claim (Cont.)

The Pod configuration could also be as complex as this:

volumeMounts:
      - name: rbdpd
        mountPath: /data/rbd
  volumes:
    - name: rbdpd
      rbd:
        monitors:
        - '10.19.14.22:6789'
        - '10.19.14.23:6789'
        - '10.19.14.24:6789'
        pool: k8s
        image: client
        fsType: ext4
        readOnly: true
        user: admin
        keyring: /etc/ceph/keyring
        imageformat: "2"
        imagefeatures: "layering"

9.12. Dynamic Provisioning

Chapter 9. Volumes and Data > 9.12. Dynamic Provisioning

While handling volumes with a persistent volume definition and abstracting the storage
provider using a claim is powerful, a cluster administrator still needs to create those
volumes in the first place. Starting with Kubernetes v1.4, Dynamic Provisioning allowed
for the cluster to request storage from an exterior, pre-configured source. API calls made
by the appropriate plugin allow for a wide range of dynamic storage use.
The StorageClass API resource allows an administrator to define a persistent volume
provisioner of a certain type, passing storage-specific parameters.
With a StorageClass created, a user can request a claim, which the API Server fills via
auto-provisioning. The resource will also be reclaimed as configured by the provider. AWS
and GCE are common choices for dynamic storage, but other options exist, such as a
Ceph cluster or iSCSI. A single default StorageClass is possible via an annotation.
Here is an example of a StorageClass using GCE:
apiVersion: storage.k8s.io/v1       
kind: StorageClass
metadata:
  name: fast                         # Could be any name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
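A claim against this class could then be written as in the sketch below; the claim name and
size are arbitrary:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-claim
spec:
  storageClassName: fast
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi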

9.13. Using Rook for Storage Orchestration

Chapter 9. Volumes and Data > 9.13. Using Rook for Storage Orchestration

In keeping with the decoupled and distributed nature of Cloud technology, the Rook project
allows orchestration of storage using multiple storage providers.
As with other agents of the cluster, Rook uses custom resource definitions (CRD) and a
custom operator to provision storage according to the backend storage type, upon API
call.
Several storage providers are supported:
 Ceph
 Cassandra
 CockroachDB
 EdgeFS Geo-Transparent Storage
 Minio Object Store
 Network File System (NFS)
 YugabyteDB.

9.14.a. Secrets

Chapter 9. Volumes and Data > 9.14.a. Secrets

Pods can access local data using volumes, but there is some data you don't want readable
to the naked eye. Passwords may be an example. Using the Secret API resource, the
same password could be encoded or encrypted.
You can create, get, or delete secrets:
$ kubectl get secrets
Secrets can be encoded manually or via kubectl create secret:
$ kubectl create secret generic --help
$ kubectl create secret generic mysql --from-literal=password=root
A secret is not encrypted, only base64-encoded, by default. You must create an
EncryptionConfiguration with a key and proper identity. Then, the kube-apiserver needs
the --encryption-provider-config flag set to a previously configured provider, such as
aescbc or kms. Once this is enabled, you need to recreate every secret, as they are
encrypted upon write.
Multiple keys are possible. Each key for a provider is tried during decryption. The first key
of the first provider is used for encryption. To rotate keys, first create a new key, restart
(all) kube-apiserver processes, then recreate every secret.
You can see the encoded string inside the secret with kubectl. When consumed by a Pod, the
secret is decoded and presented as a string, either as an environment variable or as a file in
a mounted directory, similar to the presentation of a volume.
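For example, the password stored earlier with kubectl create secret could be viewed and
decoded by hand; the encoded string shown is simply the base64 form of root:
$ kubectl get secret mysql -o yaml
$ echo cm9vdA== | base64 -d
root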

9.14.b. Secrets (Cont.)

Chapter 9. Volumes and Data > 9.14.b. Secrets (Cont.)

A secret can be made manually as well, then inserted into a YAML file:
$ echo LFTr@1n | base64
TEZUckAxbgo=
$ vim secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: lf-secret
data:
  password: TEZUckAxbgo=

9.15. Using Secrets via Environment Variables

Chapter 9. Volumes and Data > 9.15. Using Secrets via Environment Variables

A secret can be used as an environmental variable in a Pod. You can see one being
configured in the following example: 
...
spec:
    containers:
    - image: mysql:5.5
      env:
      - name: MYSQL_ROOT_PASSWORD
        valueFrom:
            secretKeyRef:
                name: mysql
                key: password
      name: mysql
There is no limit to the number of Secrets used, but there is a 1MB limit to their size. Each
secret occupies memory, along with other API objects, so very large numbers of secrets
could deplete memory on a host.
They are stored in the tmpfs storage on the host node, and are only sent to the host
running Pod. All volumes requested by a Pod must be mounted before the containers
within the Pod are started. So, a secret must exist prior to being requested.
9.16. Mounting Secrets as Volumes

Chapter 9. Volumes and Data > 9.16. Mounting Secrets as Volumes

You can also mount secrets as files using a volume definition in a pod manifest. The
mount path will contain a file whose name will be the key of the secret created with the
kubectl create secret step earlier.

...
spec:
    containers:
    - image: busybox
      command:
        - sleep
        - "3600"
      volumeMounts:
      - mountPath: /mysqlpassword
        name: mysql
      name: busy
    volumes:
    - name: mysql
      secret:
        secretName: mysql

Once the pod is running, you can verify that the secret is indeed accessible in the
container:

$ kubectl exec -ti busybox -- cat /mysqlpassword/password

LFTr@1n

9.17.a. Portable Data with ConfigMaps

Chapter 9. Volumes and Data > 9.17.a. Portable Data with ConfigMaps

A similar API resource to Secrets is the ConfigMap, except the data is not encoded. In
keeping with the concept of decoupling in Kubernetes, using a ConfigMap decouples a
container image from configuration artifacts. 
They store data as sets of key-value pairs or plain configuration files in any format. The
data can come from a collection of files or all files in a directory. It can also be populated
from a literal value.
A ConfigMap can be used in several different ways. A Pod can use the data as
environmental variables from one or more sources. The values contained inside can be
passed to commands inside the pod. A Volume or a file in a Volume can be created,
including different names and particular access modes. In addition, cluster components
like controllers can use the data.
Let's say you have a file on your local filesystem called config.js. You can create a
ConfigMap that contains this file. The configmap object will have a data section containing
the content of the file:
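Such a ConfigMap could be created, for example, with:
$ kubectl create configmap foobar --from-file=config.js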
$ kubectl get configmap foobar -o yaml
kind: ConfigMap
apiVersion: v1
metadata: 
    name: foobar
data:
    config.js: |
        {
...

9.17.b. Portable Data with ConfigMaps (Cont.)

Chapter 9. Volumes and Data > 9.17.b. Portable Data with ConfigMaps (Cont.)

ConfigMaps can be consumed in various ways:


 Pod environmental variables from single or multiple ConfigMaps
 Use ConfigMap values in Pod commands
 Populate Volume from ConfigMap
 Add ConfigMap data to specific path in Volume
 Set file names and access mode in Volume from ConfigMap data
 Can be used by system components and controllers. 

9.18. Using ConfigMaps

Chapter 9. Volumes and Data > 9.18. Using ConfigMaps

Like secrets, you can use ConfigMaps as environment variables or using a volume mount.
They must exist prior to being used by a Pod, unless marked as optional. They also reside
in a specific namespace.
In the case of environment variables, your pod manifest will use the valueFrom key and
the configMapKeyRef value to read the values. For instance:
env:
- name: SPECIAL_LEVEL_KEY
  valueFrom:
    configMapKeyRef:
      name: special-config
      key: special.how
With volumes, you define a volume with the configMap type in your pod and mount it
where it needs to be used.
volumes:
    - name: config-volume
      configMap:
        name: special-config
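The corresponding mount inside the container could be sketched as follows; the mount path
is arbitrary:
    containers:
    - name: test-container
      image: busybox
      volumeMounts:
      - name: config-volume
        mountPath: /etc/config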

10.3. Learning Objectives

Chapter 10. Ingress > 10.3. Learning Objectives

By the end of this chapter, you should be able to:


 Discuss the difference between an Ingress Controller and a Service.
 Learn about nginx and GCE Ingress Controllers.
 Deploy an Ingress Controller.
 Configure an Ingress Rule.

10.4. Ingress Overview

Chapter 10. Ingress > 10.4. Ingress Overview

In an earlier chapter, we learned about using a Service to expose a containerized
application outside of the cluster. We use Ingress Controllers and Rules to do the same
function. The difference is efficiency. Instead of using lots of services, such as
LoadBalancer, you can route traffic based on the request host or path. This allows for
centralization of many services to a single point.
An Ingress Controller is different than most controllers, as it does not run as part of the
kube-controller-manager binary. You can deploy multiple controllers, each with unique
configurations. A controller uses Ingress Rules to handle traffic to and from outside the
cluster.
There are two supported controllers, GCE and nginx. HAProxy and Traefik are
also in common use. Any tool capable of reverse proxy should work. These agents
consume rules and listen for associated traffic. An Ingress Rule is an API resource that
you can create with kubectl. When you create that resource, it reprograms and
reconfigures your Ingress Controller to allow traffic to flow from the outside to an internal
service. You can leave a service as a ClusterIP type and define how the traffic gets
routed to that internal service using an Ingress Rule.

10.5. Ingress Controller

Chapter 10. Ingress > 10.5. Ingress Controller

An Ingress Controller is a daemon running in a Pod which watches the /ingresses
endpoint on the API server, which is found under the networking.k8s.io/v1beta1
group for new objects. When a new endpoint is created, the daemon uses the configured
set of rules to allow inbound connection to a service, most often HTTP traffic. This allows
easy access to a service through an edge router to Pods, regardless of where the Pod is
deployed.
Multiple Ingress Controllers can be deployed. Ingress resources should use annotations to
select the proper controller. The lack of a matching annotation will cause every controller to
attempt to satisfy the ingress traffic.

Ingress Controller for inbound connections

10.6. nginx

Chapter 10. Ingress > 10.6. nginx

Deploying an nginx controller has been made easy through the use of provided YAML
files, which can be found in the ingress-nginx GitHub repository.
This page has configuration files to configure nginx on several platforms, such as AWS,
GKE, Azure, and bare-metal, among others.
As with any Ingress Controller, there are some configuration requirements for proper
deployment. Customization can be done via a ConfigMap, Annotations, or, for detailed
configuration, a Custom template:
 Easy integration with RBAC
 Uses the annotation kubernetes.io/ingress.class: "nginx"
 L7 traffic requires the proxy-real-ip-cidr setting
 Bypasses kube-proxy to allow session affinity
 Does not use conntrack entries for iptables DNAT
 TLS requires the host field to be defined.
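The ingress.class annotation mentioned above is set in the metadata of the Ingress resource,
for example:
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"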

10.7. Google Load Balancer Controller (GLBC)

Chapter 10. Ingress > 10.7. Google Load Balancer Controller (GLBC)

There are several objects which need to be created to deploy the GCE Ingress Controller.
YAML files are available to make the process easy. Be aware that several objects would
be created for each service, and currently, quotas are not evaluated prior to creation.
The GLBC Controller must be created and started first. Also, you must create a
ReplicationController with a single replica, three services for the application Pod, and an
Ingress with two hostnames and three endpoints for each service. The backend is a group
of virtual machine instances, Instance Group. 
Each path for traffic uses a group of like objects referred to as a pool. Each pool regularly
checks the next hop up to ensure connectivity.
The multi-pool path is:
Global Forwarding Rule -> Target HTTP Proxy -> URL map -> Backend
Service -> Instance Group.
Currently, the TLS Ingress only supports port 443 and assumes TLS termination. It does
not support SNI, only using the first certificate. The TLS secret must contain keys named
tls.crt and tls.key.

10.8. Ingress API Resources

Chapter 10. Ingress > 10.8. Ingress API Resources

Ingress objects are still an extension API, like Deployments and ReplicaSets. A typical
Ingress object that you can POST to the API server is:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ghost
spec:
  rules:
  - host: ghost.192.168.99.100.nip.io
    http:
      paths:
      - backend:
          serviceName: ghost
          servicePort: 2368

You can manage ingress resources like you do pods, deployments, services etc:

$ kubectl get ingress


$ kubectl delete ingress <ingress_name>
$ kubectl edit ingress <ingress_name>

10.9. Deploying the Ingress Controller

Chapter 10. Ingress > 10.9. Deploying the Ingress Controller

To deploy an Ingress Controller, it can be as simple as creating it with kubectl. The
source for a sample controller deployment is available on GitHub.
$ kubectl create -f backend.yaml
The result will be a set of pods managed by a replication controller and some internal
services. You will notice a default HTTP backend which serves 404 pages.
$ kubectl get pods,rc,svc
NAME                              READY  STATUS    RESTARTS  AGE
po/default-http-backend-xvep8     1/1    Running   0         4m
po/nginx-ingress-controller-fkshm 1/1    Running   0         4m
NAME                    DESIRED CURRENT READY AGE
rc/default-http-backend 1       1       0     4m
NAME                      CLUSTER-IP  EXTERNAL-IP  PORT(S)  AGE
svc/default-http-backend  10.0.0.212  <none>       80/TCP   4m
svc/kubernetes            10.0.0.1    <none>       443/TCP  77d

10.10. Creating an Ingress Rule

Chapter 10. Ingress > 10.10. Creating an Ingress Rule

To quickly expose an application with Ingress, you can try to create a rule similar to the one
mentioned on the previous page. First, start a Ghost deployment and expose it with an
internal ClusterIP service:

$ kubectl run ghost --image=ghost

$ kubectl expose deployments ghost --port=2368

With the deployment exposed, create the Ingress rule below; you should then be able to
access the application from outside the cluster.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:     
    name: ghost
spec:
    rules:
    - host: ghost.192.168.99.100.nip.io
      http:
        paths:
        - backend:
            serviceName: ghost
            servicePort: 2368

10.11. Multiple Rules

Chapter 10. Ingress > 10.11. Multiple Rules

On the previous page, we defined a single rule. If you have multiple services, you can
define multiple rules in the same Ingress, each rule forwarding traffic to a specific service.

rules:
- host: ghost.192.168.99.100.nip.io
  http:
    paths:
    - backend:
        serviceName: ghost  
        servicePort: 2368
- host: nginx.192.168.99.100.nip.io
  http:
    paths:
    - backend:
        serviceName: nginx
        servicePort: 80

10.12.a. Intelligent Connected Proxies

Chapter 10. Ingress > 10.12.a. Intelligent Connected Proxies

For more complex connections or resources such as service discovery, rate limiting, traffic
management and advanced metrics, you may want to implement a service mesh. A
service mesh consists of edge and embedded proxies communicating with each other and
handling traffic based on rules from a control plane. Various options are available including
Envoy, Istio, and linkerd:
 Envoy - a modular and extensible proxy favored due to its modular construction,
open architecture and dedication to remaining unmonetized. Often used as a data
plane under other tools of a service mesh.
 Istio - a powerful tool set which leverages Envoy proxies via a multi-component
control plane. Built to be platform-independent; it can be used to make the service
mesh flexible and feature-filled.
 linkerd - Another service mesh purposely built to be easy to deploy, fast, and
ultralight.

10.12.b. Intelligent Connected Proxies


Chapter 10. Ingress > 10.12.b. Intelligent Connected Proxies

Istio Service Mesh (retrieved from the Istio documentation)

11.3. Learning Objectives

Chapter 11. Scheduling > 11.3. Learning Objectives

By the end of this chapter, you should be able to:


 Learn how kube-scheduler schedules Pod placement.
 Use Labels to manage Pod scheduling.
 Configure taints and tolerations.
 Use podAffinity and podAntiAffinity.
 Understand how to run multiple schedulers.

11.4. kube-scheduler
Chapter 11. Scheduling > 11.4. kube-scheduler

The larger and more diverse a Kubernetes deployment becomes, the more important the
administration of scheduling becomes. The kube-scheduler determines which nodes will run
a Pod, using a topology-aware algorithm.
Users can set the priority of a pod, which will allow preemption of lower priority pods. The
eviction of lower priority pods would then allow the higher priority pod to be scheduled.
The scheduler tracks the set of nodes in your cluster, filters them based on a set of
predicates, then uses priority functions to determine on which node each Pod should be
scheduled. The Pod specification as part of a request is sent to the kubelet on the node
for creation.
The default scheduling decision can be affected through the use of Labels on nodes or
Pods. Labels of podAffinity, taints, and pod bindings allow for configuration from the
Pod or the node perspective. Some, like tolerations, allow a Pod to work with a node,
even when the node has a taint that would otherwise preclude a Pod being scheduled.
Not all labels are drastic. Affinity settings may encourage a Pod to be deployed on a node,
but would deploy the Pod elsewhere if the node was not available. Sometimes,
documentation may use the term require, but practice shows the setting to be more of a
request. As beta features, expect the specifics to change. Some settings will evict Pods
from a node should the required condition no longer be true, such as
requiredDuringSchedulingRequiredDuringExecution.
Other options, like a custom scheduler, need to be programmed and deployed into your
Kubernetes cluster.
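As an illustration of taints and tolerations, a node could be tainted and a Pod could declare a
matching toleration; the node name, key, and value below are arbitrary examples:
$ kubectl taint nodes worker-1 dedicated=backend:NoSchedule
spec:
  tolerations:
  - key: dedicated
    operator: Equal
    value: backend
    effect: NoSchedule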

11.5. Predicates

Chapter 11. Scheduling > 11.5. Predicates

The scheduler goes through a set of filters, or predicates, to find available nodes, then
ranks each node using priority functions. The node with the highest rank is selected to run
the Pod.
predicatesOrdering = []string{CheckNodeConditionPred,
GeneralPred, HostNamePred, PodFitsHostPortsPred,
MatchNodeSelectorPred, PodFitsResourcesPred, NoDiskConflictPred,
PodToleratesNodeTaintsPred, PodToleratesNodeNoExecuteTaintsPred,
CheckNodeLabelPresencePred, checkServiceAffinityPred,
MaxEBSVolumeCountPred, MaxGCEPDVolumeCountPred,
MaxAzureDiskVolumeCountPred, CheckVolumeBindingPred,
NoVolumeZoneConflictPred, CheckNodeMemoryPressurePred,
CheckNodeDiskPressurePred, MatchInterPodAffinityPred}
The predicates, such as PodFitsHost or NoDiskConflict, are evaluated in a
particular and configurable order. In this way, a node is subjected to the fewest checks
necessary for new Pod deployment, which is useful for excluding a node from further,
unnecessary checks if it is not in the proper condition.
For example, the HostNamePred filter, also known as HostName, filters out nodes that
do not match the node name specified in the pod specification. Another predicate,
PodFitsResources, makes sure that the available CPU and memory can fit the
resources required by the Pod.
The scheduler can be updated by passing a configuration of kind: Policy, which can
order predicates, give special weights to priorities, and even set
hardPodAffinitySymmetricWeight, which deploys Pods such that if we set Pod A to
run with Pod B, then Pod B should automatically be run with Pod A.

11.6. Priorities

Chapter 11. Scheduling > 11.6. Priorities

Priorities are functions used to weight resources. Unless Pod and node affinity has been
configured, the SelectorSpreadPriority setting, which ranks nodes based on the
number of existing running Pods, will select the node with the fewest Pods. This is a basic
way to spread Pods across the cluster.
Other priorities can be used for particular cluster needs. The
ImageLocalityPriorityMap favors nodes which have already downloaded container
images. The total sum of image sizes is compared, with the largest sum receiving the
highest priority, but it does not check whether those images are the ones about to be used.
Currently, there are more than ten included priorities, which range from checking the
existence of a label to choosing a node with the most requested CPU and memory usage.
You can view a list of priorities at master/pkg/scheduler/algorithm/priorities.
A stable feature as of v1.14 allows the setting of a PriorityClass and assigning Pods to it
via the priorityClassName setting. This allows users to preempt, or evict, lower
priority pods so that their higher priority pods can be scheduled. The kube-scheduler
determines a node where the pending pod could run if one or more existing pods were
evicted. If a node is found, the low priority pod(s) are evicted and the higher priority pod is
scheduled. The use of a Pod Disruption Budget (PDB) is a way to limit the number of
pods preemption evicts to ensure enough pods remain running. The scheduler will remove
pods even if the PDB is violated if no other options are available.
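As an illustrative sketch, a PriorityClass and a Pod that references it might look like the
following; the class name high-priority and Pod name important-app are hypothetical:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000            # higher values are scheduled, and preempt, first
globalDefault: false
description: "For workloads that may preempt lower priority Pods."
---
apiVersion: v1
kind: Pod
metadata:
  name: important-app
spec:
  priorityClassName: high-priority
  containers:
  - name: main
    image: nginx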

11.7.a. Scheduling Policies

Chapter 11. Scheduling > 11.7.a. Scheduling Policies

The default scheduler contains a number of predicates and priorities; however, these can
be changed via a scheduler policy file. 
A short version is shown below:
"kind" : "Policy",
"apiVersion" : "v1",
"predicates" : [
        {"name" : "MatchNodeSelector", "order": 6},
        {"name" : "PodFitsHostPorts", "order": 2},
        {"name" : "PodFitsResources", "order": 3},       
        {"name" : "NoDiskConflict", "order": 4},
        {"name" : "PodToleratesNodeTaints", "order": 5},
        {"name" : "PodFitsHost", "order": 1}
        ],
"priorities" : [
        {"name" : "LeastRequestedPriority", "weight" : 1},
        {"name" : "BalancedResourceAllocation", "weight" : 1},
        {"name" : "ServiceSpreadingPriority", "weight" : 2},
        {"name" : "EqualPriority", "weight" : 1}
        ],
"hardPodAffinitySymmetricWeight" : 10
}
Typically, you will configure a scheduler with this policy using the
--policy-config-file parameter and define a name for this scheduler using the
--scheduler-name parameter. You will then have two schedulers running and will be
able to specify which scheduler to use in the pod specification.
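As a sketch, a second scheduler using this policy might be started with flags similar to the
following (for the scheduler versions described here); the file path and scheduler name are
illustrative:
$ kube-scheduler \
    --policy-config-file=/etc/kubernetes/scheduler-policy.json \
    --scheduler-name=my-custom-scheduler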

11.7.b. Scheduling Policies (Cont.)

Chapter 11. Scheduling > 11.7.b. Scheduling Policies (Cont.)

With multiple schedulers, there could be conflicts in Pod allocation. Each Pod should
declare which scheduler should be used. If separate schedulers determine that a node is
eligible because of available resources and both attempt to deploy, causing the resources
to no longer be available, a conflict occurs. The current solution is for the local kubelet to
return the Pods to the scheduler for reassignment. Eventually, one Pod will succeed and
the other will be scheduled elsewhere.

11.8. Pod Specification

Chapter 11. Scheduling > 11.8. Pod Specification

Most scheduling decisions can be made as part of the Pod specification. A pod
specification contains several fields that inform scheduling, namely:
 nodeName
 nodeSelector
 affinity
 schedulerName
 tolerations
The nodeName and nodeSelector options allow a Pod to be assigned to a single node
or a group of nodes with particular labels.
Affinity and anti-affinity can be used to require or prefer which node is used
by the scheduler. If using a preference instead, a matching node is chosen first, but other
nodes would be used if no match is present.
The use of taints allows a node to be labeled such that Pods would not be scheduled for
some reason, such as the master node after initialization. A toleration allows a Pod to
ignore the taint and be scheduled assuming other requirements are met.
Should none of these options meet the needs of the cluster, there is also the ability to
deploy a custom scheduler. Each Pod could then include a schedulerName to choose
which scheduler to use.
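A minimal sketch of a Pod that selects a scheduler by name; the scheduler name
my-custom-scheduler is hypothetical:
apiVersion: v1
kind: Pod
metadata:
  name: second-scheduler-pod
spec:
  schedulerName: my-custom-scheduler   # must match a running scheduler
  containers:
  - name: main
    image: nginx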

11.9. Specifying the Node Label

Chapter 11. Scheduling > 11.9. Specifying the Node Label

The nodeSelector field in a pod specification provides a straightforward way to target a
node or a set of nodes, using one or more key-value pairs.
spec:
  containers:
  - name: redis
    image: redis
  nodeSelector:
    net: fast
Setting the nodeSelector tells the scheduler to place the pod on a node that matches
the labels. All listed selectors must be met, but the node could have more labels. In
the example above, any node with a key of net set to fast would be a candidate for
scheduling. Remember that labels are administrator-created tags, with no tie to actual
resources. This node could have a slow network.
The pod would remain Pending until a node is found with the matching labels.
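The label itself could be applied with a command like the following; the node name
worker-1 is hypothetical:
$ kubectl label nodes worker-1 net=fast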
The use of affinity/anti-affinity rules can express everything that nodeSelector can, and more.

11.10. Pod Affinity Rules

Chapter 11. Scheduling > 11.10. Pod Affinity Rules

Pods which may communicate a lot or share data may operate best if co-located, which
would be a form of affinity. For greater fault tolerance, you may want Pods to be as
separate as possible, which would be anti-affinity. These settings are used by the
scheduler based on the labels of Pods that are already running. As a result, the scheduler
must interrogate each node and track the labels of running Pods. Clusters larger than
several hundred nodes may see significant performance loss. Pod affinity rules use In,
NotIn, Exists, and DoesNotExist operators.
The use of requiredDuringSchedulingIgnoredDuringExecution means that the
Pod will not be scheduled on a node unless the stated rule is true. If the rule becomes
false in the future, the Pod will continue to run. This could be seen as a hard rule.
Similarly, preferredDuringSchedulingIgnoredDuringExecution will choose a
node with the desired setting before those without. If no properly-labeled nodes are
available, the Pod will execute anyway. This is more of a soft setting, which declares a
preference instead of a requirement.
With the use of podAffinity, the scheduler will try to schedule Pods together. The use
of podAntiAffinity would cause the scheduler to keep Pods on different nodes.
The topologyKey allows a general grouping of Pod deployments. Affinity (or the inverse
anti-affinity) will try to run on nodes with the declared topology key and running Pods with
a particular label. The topologyKey could be any legal key, with some important
considerations. If using requiredDuringScheduling and the admission controller
LimitPodHardAntiAffinityTopology setting, the topologyKey must be set to
kubernetes.io/hostname. If using preferredDuringScheduling, an empty
topologyKey is assumed to be all, or the combination of kubernetes.io/hostname,
failure-domain.beta.kubernetes.io/zone and failure-
domain.beta.kubernetes.io/region.

11.11. podAffinity Example

Chapter 11. Scheduling > 11.11. podAffinity Example

An example of affinity and podAffinity settings can be seen below. This requires a
particular label to be matched when the Pod starts, but the label is not required to remain
if it is later removed.
spec:
  affinity:
    podAffinity:  
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone
Inside the declared topology zone, the Pod can be scheduled on a node running a Pod
with a key label of security and a value of S1. If this requirement is not met, the Pod will
remain in a Pending state.

11.12. podAntiAffinity Example

Chapter 11. Scheduling > 11.12. podAntiAffinity Example


Similarly to podAffinity, we can try to avoid a node with a particular label. In this case,
the scheduler will prefer to avoid a node with a key set to security and value of S2. 
podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 100
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: security
          operator: In
          values:
          - S2
      topologyKey: kubernetes.io/hostname
In a large, varied environment, there may be multiple situations to be avoided. As a
preference, this setting tries to avoid certain labels, but will still schedule the Pod on some
node. As the Pod will still run, we can provide a weight to a particular rule. The weights
can be declared as a value from 1 to 100. The scheduler then tries to choose, or avoid,
the node with the greatest combined weight.

11.13. Node Affinity Rules

Chapter 11. Scheduling > 11.13. Node Affinity Rules

Where Pod affinity/anti-affinity has to do with other Pods, the use of nodeAffinity allows
Pod scheduling based on node labels. This is similar to, and will some day replace, the
nodeSelector setting. The scheduler does not look at other Pods on the system, only at
the labels of the nodes. This should have much less performance impact on the cluster,
even with a large number of nodes.
 Uses In, NotIn, Exists, DoesNotExist operators
 requiredDuringSchedulingIgnoredDuringExecution
 preferredDuringSchedulingIgnoredDuringExecution
 Planned for future: requiredDuringSchedulingRequiredDuringExecution
Until nodeSelector has been fully deprecated, both the selector and required labels must
be met for a Pod to be scheduled.

11.14. Node Affinity Example

Chapter 11. Scheduling > 11.14. Node Affinity Example

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/colo-tx-name
            operator: In
            values:
            - tx-aus
            - tx-dal
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disk-speed
            operator: In
            values:
            - fast
            - quick
The first nodeAffinity rule requires a node with a key of kubernetes.io/colo-tx-
name which has one of two possible values: tx-aus or tx-dal.
The second rule gives extra weight to nodes with a key of disk-speed with a value of
fast or quick. The Pod will be scheduled on some node - in any case, this just prefers a
particular label.

11.15. Taints

Chapter 11. Scheduling > 11.15. Taints

A node with a particular taint will repel Pods without tolerations for that taint. A
taint is expressed as key=value:effect. The key and the value are created by
the administrator.
The key and value used can be any legal string, and this allows flexibility to prevent
Pods from running on nodes based off of any need. If a Pod does not have an existing
toleration, the scheduler will not consider the tainted node.
There are three effects, or ways to handle Pod scheduling:

 NoSchedule
The scheduler will not schedule a Pod on this node, unless the Pod has this
toleration. Existing Pods continue to run, regardless of toleration.
 PreferNoSchedule
The scheduler will avoid using this node, unless there are no untainted nodes that
match the Pod's tolerations. Existing Pods are unaffected.
 NoExecute
This taint will cause existing Pods to be evicted and no future Pods to be
scheduled. Should an existing Pod have a toleration, it will continue to run. If the
Pod's tolerationSeconds is set, it will remain for that many seconds, then be
evicted. Certain node issues will cause the kubelet to add 300-second
tolerations to avoid unnecessary evictions.
If a node has multiple taints, the scheduler ignores those with matching tolerations.
The remaining unignored taints have their typical effect.
The use of TaintBasedEvictions is still an alpha feature. The kubelet uses taints to
rate-limit evictions when the node has problems.
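A taint like the one tolerated in the next section could be applied with a command such as
the following; the node name worker-2 is hypothetical. Appending a minus sign to the key
removes the taint again:
$ kubectl taint nodes worker-2 server=ap-east:NoExecute
$ kubectl taint nodes worker-2 server-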

11.16. Tolerations

Chapter 11. Scheduling > 11.16. Tolerations

Tolerations are set on Pods and allow them to be scheduled on tainted nodes. Tainting
a node provides an easy way to keep Pods away from it; only Pods with a matching
toleration would be scheduled there.
An operator can be included in a Pod specification, defaulting to Equal if not declared.
The operator Equal requires a value to match. With the Exists operator, a value should
not be specified. If an empty key uses the Exists operator, it will tolerate every taint. If
there is no effect, but a key and operator are declared, all effects are matched with
the declared key.
tolerations:
- key: "server"
  operator: "Equal"
  value: "ap-east"
  effect: "NoExecute"
  tolerationSeconds: 3600
In the above example, the Pod will remain on a node tainted with a key of server, a value
of ap-east, and the NoExecute effect for 3600 seconds after the taint is applied. When the
time runs out, the Pod will be evicted.

11.17. Custom Scheduler

Chapter 11. Scheduling > 11.17. Custom Scheduler

If the default scheduling mechanisms (affinity, taints, policies) are not flexible enough for
your needs, you can write your own scheduler. The programming of a custom scheduler is
outside the scope of this course, but you may want to start with the existing scheduler
code, which can be found in the Scheduler repository on GitHub.
If a Pod specification does not declare which scheduler to use, the standard scheduler is
used by default. If the Pod declares a scheduler that is not running, the Pod
would remain in a Pending state forever.
The end result of the scheduling process is that a pod gets a binding that specifies which
node it should run on. A binding is a Kubernetes API primitive in the api/v1 group.
Technically, without any scheduler running, you could still schedule a pod on a node, by
specifying a binding for that pod.
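As a sketch, a Binding object like the one below could place a pod directly on a node; the
pod name mypod and node name worker-1 are hypothetical, and metadata.name must
match the pod being bound:
apiVersion: v1
kind: Binding
metadata:
  name: mypod           # the pod to bind
target:
  apiVersion: v1
  kind: Node
  name: worker-1        # the node to run it on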
You can also run multiple schedulers simultaneously.
You can view the scheduler and other information with:
kubectl get events

12.3. Learning Objectives

Chapter 12. Logging and Troubleshooting > 12.3. Learning Objectives

By the end of this chapter, you should be able to:


 Understand that Kubernetes does not yet have integrated logging.
 Learn which external products are often used to aggregate logs.
 Examine the basic flow of troubleshooting.
 Discuss the use of a sidecar for in-Pod logs.

12.4. Overview

Chapter 12. Logging and Troubleshooting > 12.4. Overview

Kubernetes relies on API calls and is sensitive to network issues. Standard Linux tools and
processes are the best method for troubleshooting your cluster. If a shell, such as bash, is
not available in an affected Pod, consider deploying another similar pod with a shell, like
busybox. DNS configuration files and tools like dig are a good place to start. For more
difficult challenges, you may need to install other tools, like tcpdump.
Large and diverse workloads can be difficult to track, so monitoring of usage is essential.
Monitoring is about collecting key metrics, such as CPU, memory, and disk usage, and
network bandwidth on your nodes, as well as monitoring key metrics in your applications.
These features are being integrated into Kubernetes with the Metrics Server, which is a
cut-down version of the now deprecated Heapster. Once installed, the Metrics Server
exposes a standard API which can be consumed by other agents, such as autoscalers.
This endpoint can be found on the master server at /apis/metrics.k8s.io/.
Logging activity across all the nodes is another feature not built into Kubernetes. Fluentd
can be a useful data collector for a unified logging layer. Having aggregated logs can help
visualize the issues, and provides the ability to search all logs. It is a good place to start
when local network troubleshooting does not expose the root cause. Fluentd can be
downloaded from the Fluentd website.
Another project from CNCF combines logging, monitoring, and alerting and is called
Prometheus - you can learn more from the Prometheus website. It provides a time-series
database, as well as integration with Grafana for visualization and dashboards.
We are going to review some of the basic kubectl commands that you can use to debug
what is happening, and we will walk you through the basic steps to be able to debug your
containers, your pending containers, and also the systems in Kubernetes.
12.5. Basic Troubleshooting Steps

Chapter 12. Logging and Troubleshooting > 12.5. Basic Troubleshooting Steps

The troubleshooting flow should start with the obvious. If there are errors from the
command line, investigate them first. The symptoms of the issue will probably determine
the next step to check. Working from the application running inside a container to the
cluster as a whole may be a good idea. The application may have a shell you can use, for
example:
$ kubectl exec -ti <busybox_pod> -- /bin/sh
If the Pod is running, use kubectl logs pod-name to view the standard out of the
container. Without logs, you may consider deploying a sidecar container in the Pod to
generate and handle logging. The next place to check is networking, including DNS,
firewalls and general connectivity, using standard Linux commands and tools.
Security settings can also be a challenge. RBAC, covered in the security chapter, provides
mandatory or discretionary access control in a granular manner. SELinux and AppArmor
are also common issues, especially with network-centric applications.
A newer feature of Kubernetes is the ability to enable auditing for the kube-apiserver,
which can allow a view into actions after the API call has been accepted.
The issues found with a decoupled system like Kubernetes are similar to those of a
traditional datacenter, plus the added layers of Kubernetes controllers:
 Errors from the command line
 Pod logs and state of Pods
 Use shell to troubleshoot Pod DNS and network
 Check node logs for errors, make sure there are enough resources allocated
 RBAC, SELinux or AppArmor for security settings
 API calls to and from controllers to kube-apiserver
 Enable auditing
 Inter-node network issues, DNS and firewall
 Master server controllers (control plane Pods in pending or error state, errors in log
files, sufficient resources, etc.).

12.6. Ephemeral Containers

Chapter 12. Logging and Troubleshooting > 12.6. Ephemeral Containers

A feature new to the 1.16 version is the ability to add a container to a running pod. This
would allow a feature-filled container to be added to an existing pod without having to
terminate and re-create. Intermittent and difficult to determine problems may take a while
to reproduce, or not exist with the addition of another container.
As an alpha stability feature, it may change or be removed at any time. As well, ephemeral
containers will not be restarted automatically, and several fields, such as ports and
resource requests, are not allowed.
These containers are added through the ephemeralcontainers handler via an API call,
not via the podSpec. As a result, the use of kubectl edit is not possible.
You may be able to use the kubectl attach command to join an existing process within
the container. This can be helpful instead of kubectl exec which executes a new
process. The functionality of the attached process depends entirely on what you are
attaching to.
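For example, attaching to the running process of an existing container might look like the
following; the pod and container names are placeholders:
$ kubectl attach -it <pod-name> -c <container-name>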

12.7. Cluster Start Sequence

Chapter 12. Logging and Troubleshooting > 12.7. Cluster Start Sequence

The cluster startup sequence begins with systemd if you built the cluster using kubeadm.
Other tools may leverage a different method. Use systemctl status
kubelet.service to see the current state and configuration files used to run the kubelet
binary.
 Uses /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
Inside the config.yaml file you will find several settings for the binary, including the
staticPodPath, which indicates the directory where the kubelet will read every yaml file
and start every pod. Putting a yaml file in this directory is a way to troubleshoot the
scheduler, as the pod is created without any request to the scheduler.
 Uses /var/lib/kubelet/config.yaml configuration file
 staticPodPath is set to /etc/kubernetes/manifests/
The four default yaml files will start the base pods necessary to run the cluster:
 The kubelet creates all pods from the *.yaml files in this directory: kube-apiserver, etcd,
kube-controller-manager, kube-scheduler.
Once the watch loops and controllers from kube-controller-manager run using etcd data,
the rest of the configured objects will be created.
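A brief sketch of checking this sequence on a kubeadm-built cluster; the exact output will
vary by installation:
$ systemctl status kubelet.service
$ ls /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml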

12.8. Monitoring

Chapter 12. Logging and Troubleshooting > 12.8. Monitoring

Monitoring is about collecting metrics from the infrastructure, as well as applications.


The long used and now deprecated Heapster has been replaced with an integrated
Metrics Server. Once installed and configured, the server exposes a standard API which
other agents can use to determine usage. It can also be configured to expose custom
metrics, which then could also be used by autoscalers to determine if an action should
take place.
Prometheus is part of the Cloud Native Computing Foundation (CNCF). As a Kubernetes
plugin, it allows one to scrape resource usage metrics from Kubernetes objects across the
entire cluster. It also has several client libraries which allow you to instrument your
application code in order to collect application level metrics.
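Once the Metrics Server is installed, basic resource usage can be inspected from the
command line; the output depends on your cluster:
$ kubectl top nodes
$ kubectl top pods --all-namespaces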

12.9. Plugins

Chapter 12. Logging and Troubleshooting > 12.9. Plugins

We have been using the kubectl command throughout the course. The basic commands
can be used together in a more complex manner extending what can be done. There are
over seventy and growing plugins available to interact with Kubernetes objects and
components.

At the time this course was written, plugins cannot overwrite existing kubectl
commands, nor can they add sub-commands to existing commands. Writing new plugins
should take into account the command line runtime package and a Go library for plugin
authors.

More information can be found in the Kubernetes Documentation, "Extend kubectl with
plugins".

When using a plugin, the declaration of options, such as the namespace or container to
use, must come after the plugin command.

$ kubectl sniff bigpod-abcd-123 -c mainapp -n accounting

Plugins can be distributed in many ways. The use of krew (the kubectl plugin manager)
allows for cross-platform packaging and a helpful plugin index, which makes finding new
plugins easy.

Install the software using steps available in krew's GitHub repository.

$ kubectl krew help

You can invoke krew through kubectl:

kubectl krew [command]...

Usage:
  krew [command]

Available Commands:
  help       Help about any command
  info       Show information about kubectl plugin
  install    Install kubectl plugins
  list       List installed kubectl plugins
  search     Discover kubectl plugins
  uninstall  Uninstall plugins
  update     Update the local copy of the plugin index
  upgrade    Upgrade installed plugins to newer versions
  version    Show krew version and diagnostics

12.10. Managing Plugins

Chapter 12. Logging and Troubleshooting > 12.10. Managing Plugins

The help option explains basic operation. After installation, ensure the plugin directory is
included in $PATH. krew should then allow easy installation and use of plugins.

$ export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
$ kubectl krew search

NAME              DESCRIPTION                                        INSTALLED
access-matrix     Show an RBAC access matrix for server resources   no
advise-psp        Suggests PodSecurityPolicies for cluster.          no
....

$ kubectl krew install tail

Updated the local copy of plugin index.


Installing plugin: tail
Installed plugin: tail
\
 | Use this plugin:

....

 | | Usage:
 | |
 | | # match all pods
 | | $ kubectl tail
 | |
 | | # match pods in the 'frontend' namespace
 | | $ kubectl tail --ns staging
....

In order to view the currently installed plugins, use:

kubectl plugin list

To find new plugins, use:

kubectl krew search

To install a plugin, use:

kubectl krew install new-plugin

Once installed, use the plugin as a kubectl sub-command. You can also upgrade and
uninstall plugins.

12.11. Sniffing Traffic With Wireshark

Chapter 12. Logging and Troubleshooting > 12.11. Sniffing Traffic With Wireshark

Cluster network traffic is encrypted, making troubleshooting of possible network issues
more complex. Using the sniff plugin, you can view the traffic from within. sniff requires
Wireshark and the ability to export a graphical display.

The sniff command will use the first container found unless you pass the -c option to
declare which container in the pod to use for traffic monitoring.

$ kubectl krew install sniff
$ kubectl sniff nginx-123456-abcd -c webcont

12.12. Logging Tools

Chapter 12. Logging and Troubleshooting > 12.12. Logging Tools

Logging, like monitoring, is a vast subject in IT. It has many tools that you can use as part
of your arsenal.
Typically, logs are collected locally and aggregated before being ingested by a search
engine and displayed via a dashboard which can use the search syntax. While there are
many software stacks that you can use for logging, the Elasticsearch, Logstash, and
Kibana Stack (ELK) has become quite common.
In Kubernetes, the kubelet writes container logs to local files (via the Docker logging
driver). The kubectl logs command allows you to retrieve these logs.
Cluster-wide, you can use Fluentd to aggregate logs. Check the cluster administration
logging concepts for a detailed description.
Fluentd is part of the Cloud Native Computing Foundation and, together with Prometheus,
they make a nice combination for monitoring and logging. You can find a detailed walk-
through of running Fluentd on Kubernetes in the Kubernetes documentation.
Setting up Fluentd for Kubernetes logging is a good exercise in understanding
DaemonSets. Fluentd agents run on each node via a DaemonSet, they aggregate the
logs, and feed them to an Elasticsearch instance prior to visualization in a Kibana
dashboard.
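For reference, retrieving logs from a particular container in a Pod might look like the
following; the pod and container names are placeholders:
$ kubectl logs <pod-name> -c <container-name>
$ kubectl logs <pod-name> --previous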

12.13. More Resources

Chapter 12. Logging and Troubleshooting > 12.13. More Resources

There are several things that you can do to quickly diagnose potential issues with your
application and/or cluster. The official documentation offers additional materials to help
you get familiar with troubleshooting:
 General guidelines and instructions (Troubleshooting) 
 Troubleshooting applications
 Troubleshooting clusters
 Debugging Pods
 Debugging Services
 GitHub website for issues and bug tracking
 Kubernetes Slack channel.

13.3. Learning Objectives

Chapter 13. Custom Resource Definitions > 13.3. Learning Objectives

By the end of this chapter, you should be able to:


 Grow available Kubernetes objects.
 Deploy a new Custom Resource Definition.
 Deploy a new resource and API endpoint.
 Discuss aggregated APIs.

13.4. Custom Resources

Chapter 13. Custom Resource Definitions > 13.4. Custom Resources

We have been working with built-in resources, or API endpoints. The flexibility of
Kubernetes allows for the dynamic addition of new resources as well. Once these Custom
Resources have been added, the objects can be created and accessed using standard
calls and commands, like kubectl. The creation of a new object stores new structured data
in the etcd database and allows access via kube-apiserver.
To make a new custom resource part of a declarative API, there needs to be a controller to
retrieve the structured data continually and act to meet and maintain the declared state.
This controller, or operator, is an agent that creates and manages one or more
instances of a specific stateful application. We have worked with built-in controllers such
as Deployments, DaemonSets and other resources.
The functions encoded into a custom operator should be all the tasks a human would need
to perform if deploying the application outside of Kubernetes. The details of building a
custom controller are outside the scope of this course, and thus, not included.
There are two ways to add custom resources to your Kubernetes cluster. The easiest, but
less flexible, way is by adding a Custom Resource Definition (CRD) to the cluster. The
second way, which is more flexible, is the use of Aggregated APIs (AA), which requires a
new API server to be written and added to the cluster.
Either way of adding a new object to the cluster, as distinct from a built-in resource, is
called a Custom Resource.
If you are using RBAC for authorization, you probably will need to grant access to the new
CRD resource and controller. If using an Aggregated API, you can use the same or a
different authentication process.

13.5. Custom Resource Definitions

Chapter 13. Custom Resource Definitions > 13.5. Custom Resource Definitions

As we have already learned, the decoupled nature of Kubernetes depends on a collection of
watcher loops, or controllers, interrogating the kube-apiserver to determine if a particular
configuration is true. If the current state does not match the declared state, the controller
makes API calls to modify the state until the two match.
controller, you can use the existing kube-apiserver to monitor and control the object. The
addition of a Custom Resource Definition will be added to the cluster API path, currently
under apiextensions.k8s.io/v1beta1.
While this is the easiest way to add a new object to the cluster, it may not be flexible
enough for your needs. Only the existing API functionality can be used. Objects must
respond to REST requests and have their configuration state validated and stored in the
same manner as built-in objects. They would also need to exist with the protection rules of
built-in objects.
A CRD allows the resource to be deployed in a namespace or be available in the entire
cluster. The YAML file sets this with the scope: parameter, which can be set to
Namespaced or Cluster.
Prior to v1.8, there was a resource type called ThirdPartyResource (TPR). This has been
deprecated and is no longer available. All resources will need to be rebuilt as CRD. After
upgrading, existing TPRs will need to be removed and replaced by CRDs such that the
API URL points to functional objects.

13.6. Configuration Example


Chapter 13. Custom Resource Definitions > 13.6. Configuration Example

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: backups.stable.linux.com
spec:
  group: stable.linux.com
  version: v1
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    shortNames:
    - bks
    kind: BackUp
apiVersion: Should match the current level of stability, currently
apiextensions.k8s.io/v1beta1.
kind: CustomResourceDefinition The object type being inserted by the kube-apiserver.
name: backups.stable.linux.com The name must match the spec field declared later.
The syntax must be <plural name>.<group>.
group: stable.linux.com The group name will become part of the REST API under
/apis/<group>/<version>, or /apis/stable.linux.com/v1 in this case, with the
version set to v1.
scope: Determines if the object exists in a single namespace or is cluster-wide.
plural: Defines the last part of the API URL, such as
apis/stable.linux.com/v1/backups.
singular and shortNames represent the name displayed and make CLI usage easier.
kind: A CamelCased singular type used in resource manifests.

13.7. New Object Configuration

Chapter 13. Custom Resource Definitions > 13.7. New Object Configuration

apiVersion: "stable.linux.com/v1"
kind: BackUp
metadata:
  name: a-backup-object
spec:
  timeSpec: "* * * * */5"
  image: linux-backup-image
  replicas: 5
Note that the apiVersion and kind match the CRD we created in a previous step. The
spec parameters depend on the controller.
The object will be evaluated by the controller. If the syntax, such as timeSpec, does not
match the expected value, you will receive an error, should validation be configured.
Without validation, only the existence of the variable is checked, not its details.
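Once the CRD and the new object have been created, they can be inspected with standard
commands, for example:
$ kubectl get crd
$ kubectl get backups
$ kubectl describe backup a-backup-object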

13.8. Optional Hooks

Chapter 13. Custom Resource Definitions > 13.8. Optional Hooks

Just as with built-in objects, you can use an asynchronous pre-delete hook known as a
Finalizer. If an API delete request is received, the object metadata field
metadata.deletionTimestamp is updated. The controller then triggers whichever
finalizer has been configured. When the finalizer completes, it is removed from the list. The
controller continues to complete and remove finalizers until the string is empty. Then, the
object itself is deleted.
Finalizer:
metadata:
  finalizers:
  - finalizer.stable.linux.com
Validation:
validation:
    openAPIV3Schema:
      properties:
        spec:
          properties:
            timeSpec:
              type: string
              pattern: '^(\d+|\*)(/\d+)?(\s+(\d+|\*)(/\d+)?){4}$'
            replicas:
              type: integer
              minimum: 1
              maximum: 10
A feature in beta starting with v1.9 allows for validation of custom objects via the OpenAPI
v3 schema. This will check various properties of the object configuration being passed
by the API server. In the example above, the timeSpec must be a string matching a
particular pattern and the number of allowed replicas is between 1 and 10. If the validation
does not match, the error returned is the failed line of validation.

13.9. Understanding Aggregated APIs (AA)

Chapter 13. Custom Resource Definitions > 13.9. Understanding Aggregated APIs
(AA)

The use of Aggregated APIs allows adding additional Kubernetes-type API servers to the
cluster. The added server acts as a subordinate to kube-apiserver, which, as of v1.7, runs
the aggregation layer in-process. When an extension resource is registered, the
aggregation layer watches a passed URL path and proxies any requests to the newly
registered API service.
The aggregation layer is easy to enable. Edit the flags passed during startup of the kube-
apiserver to include --enable-aggregator-routing=true. Some vendors enable
this feature by default.
The creation of the external API server can be done via YAML configuration files or APIs. Configuring
TLS authorization between components and RBAC rules for various new objects is also
required. A sample API server is available on GitHub. A project currently in the incubation
stage is an API server builder which should handle much of the security and connection
configuration.

14.3. Learning Objectives

Chapter 14. Helm > 14.3. Learning Objectives

By the end of this chapter, you should be able to:


 Examine easy Kubernetes deployments using the Helm package manager.
 Understand the Chart template used to describe what application to deploy.
 Discuss how Tiller creates the Deployment based on the Chart.
 Initialize Helm in a cluster.

14.4. Deploying Complex Applications

Chapter 14. Helm > 14.4. Deploying Complex Applications

We have used Kubernetes tools to deploy simple Docker applications. Starting with the
v1.4 release, the goal was to have a canonical location for software. Helm is similar to a
package manager like yum or apt, with a chart being similar to a package. Helm v3 is
significantly different from v2.
A typical containerized application will have several manifests: for Deployments, Services,
and ConfigMaps. You will probably also create some Secrets, Ingress, and other objects.
Each of these will need a manifest.
With Helm, you can package all those manifests and make them available as a single
tarball. You can put the tarball in a repository, search that repository, discover an
application, and then, with a single command, deploy and start the entire application.
The server runs in your Kubernetes cluster, and your client is local, even a local laptop.
With your client, you can connect to multiple repositories of applications.
You will also be able to upgrade or roll back an application easily from the command line.

14.5. Helm v2 and Tiller

Chapter 14. Helm > 14.5. Helm v2 and Tiller


Helm version 2 uses a Tiller pod to deploy in the cluster. This has led to a lot of issues with
security and cluster permissions. The new Helm v3 does not deploy a pod.
The helm tool packages a Kubernetes application using a series of YAML files into a chart,
or package. This allows for simple sharing between users, tuning using a templating
scheme, as well as provenance tracking, among other things.
Helm v2 is made of two components:
 A server called Tiller, which runs inside your Kubernetes cluster.
 A client called Helm, which runs on your local machine.
With the Helm client you can browse package repositories (containing published Charts),
and deploy those Charts on your Kubernetes cluster. The Helm client will download the
chart and pass a request to Tiller to create a release, otherwise known as an instance of
a chart. The release will be made of various resources running in the Kubernetes cluster.

Basic Helm and Tiller Flow

14.6. Helm v3

Chapter 14. Helm > 14.6. Helm v3


With the near complete overhaul of Helm, the processes and commands have changed
quite a bit. Expect to spend some time updating and integrating these changes if you are
currently using Helm v2.
One of the most noticeable changes is the removal of the Tiller pod. This was an ongoing
security issue, as the pod needed elevated permissions to deploy charts. The functionality
is in the command alone, and no longer requires initialization to use.
In version 2, an update to a chart and deployment used a 2-way strategic merge for
patching. This compared the previous manifest to the intended manifest, but not
possible edits done outside of helm commands. Helm v3 uses a 3-way strategic merge;
the third element now checked is the live state of objects.
Among other changes, software installation no longer generates a release name
automatically. One must be provided, or the --generate-name option must be passed.
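A brief sketch of the v3 install syntax, assuming a repository named stable that contains a
mariadb chart; the release name my-db is hypothetical:
$ helm install my-db stable/mariadb
$ helm install stable/mariadb --generate-name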

14.7. Chart Contents

Chapter 14. Helm > 14.7. Chart Contents

A chart is an archived set of Kubernetes resource manifests that make up a distributed
application. You can check out the GitHub repository where the Kubernetes community is
curating charts. Other repositories exist and can be easily created, for example by a
vendor providing software. The use of chart repositories is similar to the use of
independent YUM repositories.
├── Chart.yaml
├── README.md
├── templates
│   ├── NOTES.txt
│   ├── _helpers.tpl
│   ├── configmap.yaml
│   ├── deployment.yaml
│   ├── pvc.yaml
│   ├── secrets.yaml
│   └── svc.yaml
└── values.yaml
The Chart.yaml file contains some metadata about the Chart, like its name, version,
keywords, and so on, in this case, for MariaDB. The values.yaml file contains keys and
values that are used to generate the release in your cluster. These values are replaced in
the resource manifests using the Go templating syntax. And finally,
the templates directory contains the resource manifests that make up this MariaDB
application.
More about creating charts can be found in the Helm v3 documentation.

14.8. Templates

Chapter 14. Helm > 14.8. Templates

The templates are resource manifests that use the Go templating syntax. Variables
defined in the values file, for example, get injected in the template when a release is
created. In the MariaDB example we provided, the database passwords are stored in a
Kubernetes secret, and the database configuration is stored in a Kubernetes
ConfigMap.
We can see that a set of labels are defined in the Secret metadata using the Chart
name, Release name, etc. The actual values of the passwords are read from
the values.yaml file.
apiVersion: v1
kind: Secret
metadata:
    name: {{ template "fullname" . }}
    labels:
        app: {{ template "fullname" . }}
        chart: "{{ .Chart.Name }}-{{ .Chart.Version }}"
        release: "{{ .Release.Name }}"
        heritage: "{{ .Release.Service }}"
type: Opaque
data:     
    mariadb-root-password: {{ default "" .Values.mariadbRootPassword |
b64enc | quote }}
    mariadb-password: {{ default "" .Values.mariadbPassword | b64enc |
quote }}

14.9. Initializing Helm v2

Chapter 14. Helm > 14.9. Initializing Helm v2

Helm v3 does not need to be initialized.


As always, you can build Helm from source, or download a tarball. We expect to see Linux
packages for the stable release soon. The current RBAC security requirements to deploy
helm require the creation of a new serviceaccount and assigning of permissions and
roles. There are several optional settings which can be passed to the helm init
command, typically for particular security concerns, storage options and also a dry-run
option.
$ helm init
...
Tiller (the helm server side component) has been installed into your
Kubernetes Cluster.
Happy Helming!
$ kubectl get deployments --namespace=kube-system
NAMESPACE     NAME           READY  UP-TO-DATE  AVAILABLE  AGE
kube-system   tiller-deploy  1/1    1           1          15s   
The helm v2 initialization should have created a new tiller-deploy pod in your
cluster. Please note that this will create a deployment in the kube-system namespace.
The client will be able to communicate with the tiller pod using port forwarding. Hence,
you will not see any service exposing tiller.
14.10. Chart Repositories

Chapter 14. Helm > 14.10. Chart Repositories

A default repository is included when initializing helm, but it's common to add other
repositories. Repositories are currently simple HTTP servers that contain an index file and
a tarball of all the Charts present.
You can interact with a repository using the helm repo commands.
$ helm repo add testing http://storage.googleapis.com/kubernetes-charts-
testing
$ helm repo list
NAME     URL
stable   http://storage.googleapis.com/kubernetes-charts
local    http://localhost:8879/charts
testing  http://storage.googleapis.com/kubernetes-charts...
Once you have a repository available, you can search for Charts based on keywords.
Below, we search for a redis Chart:
$ helm search redis
WARNING: Deprecated index file format. Try 'helm repo update'
NAME                      VERSION   DESCRIPTION
testing/redis-cluster     0.0.5     Highly available Redis cluster with
multiple se...
testing/redis-standalone  0.0.1     Standalone Redis Master testing/...
Once you find the chart within a repository, you can deploy it on your cluster.

14.11.a. Deploying a Chart

Chapter 14. Helm > 14.11.a. Deploying a Chart

To deploy a Chart, you can just use the helm install command. There may be several
required resources for the installation to be successful, such as available PVs to match
the chart's PVCs. Currently, the only way to discover which resources need to exist is by
reading the README for each chart:

$ helm install testing/redis-standalone


Fetched testing/redis-standalone to redis-standalone-0.0.1.tgz
amber-eel
Last Deployed: Fri Oct 21 12:24:01 2016
Namespace: default
Status: DEPLOYED

Resources:
==> v1/ReplicationController
NAME              DESIRED  CURRENT  READY  AGE
redis-standalone  1        1        0      1s

==> v1/Service
NAME    CLUSTER-IP  EXTERNAL-IP  PORT(S)   AGE
redis   10.0.81.67  <none>       6379/TCP  0s

You will be able to list the release, delete it, even upgrade it and roll back.

$ helm list
NAME       REVISION  UPDATED                   STATUS    CHART
amber-eel  1         Fri Oct 21 12:24:01 2016  DEPLOYED  redis-
standalone-0.0.1

14.11.b. Deploying a Chart (Cont.)

Chapter 14. Helm > 14.11.b. Deploying a Chart (Cont.)

A unique, colorful name will be created for each helm instance deployed. You can also
use kubectl to view new resources Helm created in your cluster.
The output of the deployment should be carefully reviewed. It often includes information on
access to the applications within. If your cluster did not have a required cluster resource,
the output is often the first place to begin troubleshooting.

15.3. Learning Objectives

Chapter 15. Security > 15.3. Learning Objectives

By the end of this chapter, you should be able to:


 Explain the flow of API requests.
 Configure authorization rules.
 Examine authentication policies.
 Restrict network traffic with network policies.

15.4. Overview

Chapter 15. Security > 15.4. Overview

Security is a big and complex topic, especially in a distributed system like Kubernetes.
Thus, we are just going to cover some of the concepts that deal with security in the context
of Kubernetes.
Then, we are going to focus on the authentication aspect of the API server and we will dive
into authorization, looking at things like ABAC and RBAC, which is now the default
configuration when you bootstrap a Kubernetes cluster with kubeadm.
We are going to look at the admission control system, which lets you look at and
possibly modify the requests that are coming in, and do a final deny or accept on those
requests.
Following that, we're going to look at a few other concepts, including how you can secure
your Pods more tightly using security contexts and pod security policies, which are full-
fledged API objects in Kubernetes.
Finally, we will look at network policies. By default, network policies are not turned on,
which lets any traffic flow through all of our pods, in all the different namespaces. Using
network policies, we can define ingress rules that restrict traffic between the different
namespaces. The network tool in use, such as Flannel or Calico, will determine whether a
network policy can be implemented. As Kubernetes becomes more mature, this will
become a strongly suggested configuration.

15.5.a. Accessing the API

Chapter 15. Security > 15.5.a. Accessing the API

To perform any action in a Kubernetes cluster, you need to access the API and go through
three main steps:
 Authentication
 Authorization (ABAC or RBAC)
 Admission Control.
These steps are described in more detail in the official documentation about controlling
access to the API and illustrated by the picture below:

Accessing the API (retrieved from the Kubernetes website)

15.5.b. Accessing the API (Cont.)


Chapter 15. Security > 15.5.b. Accessing the API (Cont.)

Once a request reaches the API server securely, it will first go through any authentication
module that has been configured. The request can be rejected if authentication fails or it
gets authenticated and passed to the authorization step.
At the authorization step, the request will be checked against existing policies. It will be
authorized if the user has the permissions to perform the requested actions. Then, the
requests will go through the last step of admission. In general, admission controllers will
check the actual content of the objects being created and validate them before admitting
the request.
In addition to these steps, the requests reaching the API server over the network are
encrypted using TLS. This needs to be properly configured using SSL certificates. If you
use kubeadm, this configuration is done for you; otherwise, follow Kelsey Hightower's
guide Kubernetes The Hard Way, or the API server configuration options.

15.6. Authentication

Chapter 15. Security > 15.6. Authentication

There are three main points to remember with authentication in Kubernetes:


 In its straightforward form, authentication is done with certificates, tokens or basic
authentication (i.e. username and password).
 Users are not created by the API, but should be managed by an external system.
 System accounts are used by processes to access the API.
There are two more advanced authentication mechanisms: Webhooks which can be used
to verify bearer tokens, and connection with an external OpenID provider.
The type of authentication used is defined in the kube-apiserver startup options. Below
are four examples of configuration options that would need to be set depending on which
authentication mechanism you choose:
--basic-auth-file
--oidc-issuer-url
--token-auth-file
--authorization-webhook-config-file
One or more Authenticator Modules are used: x509 Client Certs; static token, bearer or
bootstrap token; static password file; service account and OpenID connect tokens. Each is
tried until successful, and the order is not guaranteed. Anonymous access can also be
enabled, otherwise you will get a 401 response. Users are not created by the API, and
should be managed by an external system. 
To learn more about authentication, see the official Kubernetes documentation.

15.7. Authorization
Chapter 15. Security > 15.7. Authorization

Once a request is authenticated, it needs to be authorized to be able to proceed through
the Kubernetes system and perform its intended action.
There are three main authorization modes and two global Deny/Allow settings. The three
main modes are:
 ABAC
 RBAC
 WebHook.
They can be configured as kube-apiserver startup options:
--authorization-mode=ABAC
--authorization-mode=RBAC
--authorization-mode=Webhook
--authorization-mode=AlwaysDeny
--authorization-mode=AlwaysAllow
The authorization modes implement policies to allow requests. Attributes of the requests
are checked against the policies (e.g. user, group, namespace, verb).

15.8. ABAC

Chapter 15. Security > 15.8. ABAC

ABAC stands for Attribute Based Access Control. It was the first authorization model in
Kubernetes that allowed administrators to implement the right policies. Today, RBAC is
becoming the default authorization mode.
Policies are defined in a JSON file and referenced to by a kube-apiserver startup option:
--authorization-policy-file=my_policy.json
 For example, the policy file shown below, authorizes user Bob to read pods in the
namespace foobar:
{
    "apiVersion": "abac.authorization.kubernetes.io/v1beta1",
    "kind": "Policy",
    "spec": {
        "user": "bob",
        "namespace": "foobar",
        "resource": "pods",
        "readonly": true     
    }
}
 
You can check other policy examples in the Kubernetes documentation. 
15.9. RBAC

Chapter 15. Security > 15.9. RBAC

RBAC stands for Role Based Access Control.

All resources are modeled as API objects in Kubernetes, from Pods to Namespaces. They
also belong to API Groups, such as core and apps. These resources allow operations
such as Create, Read, Update, and Delete (CRUD), which we have been working with so
far. Operations are called verbs inside YAML files. Adding to these basic components, we
will add more elements of the API, which can then be managed via RBAC.
Rules are operations which can act upon an API group. Roles are a group of rules which
affect, or scope, a single namespace, whereas ClusterRoles have a scope of the entire
cluster.
Operations can be granted to one of three subject types: User Accounts, which do not
exist as API objects, Service Accounts, and Groups. Roles are tied to subjects with a
rolebinding or clusterrolebinding when using kubectl.
RBAC is then writing rules to allow or deny operations by users, roles or groups upon
resources.

15.10. RBAC Process Overview

Chapter 15. Security > 15.10. RBAC Process Overview

While RBAC can be complex, the basic flow is to create a certificate for a user. As a user
is not an API object of Kubernetes, we are requiring outside authentication, such as
OpenSSL certificates. After generating the certificate against the cluster certificate
authority, we can set that credential for the user using a context.
Roles can then be used to configure an association of apiGroups, resources, and the
verbs allowed to them. The user can then be bound to a role limiting what and where they
can work in the cluster.
Here is a summary of the RBAC process:
 Determine or create namespace
 Create certificate credentials for user
 Set the credentials for the user to the namespace using a context
 Create a role for the expected task set
 Bind the user to the role
 Verify the user has limited access.
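To illustrate the role creation and binding steps, a minimal sketch follows; the namespace
accounting, user DevDan, and role name developer are hypothetical:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: accounting
rules:
- apiGroups: [""]                 # "" refers to the core API group
  resources: ["pods"]
  verbs: ["get", "list", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: accounting
subjects:
- kind: User
  name: DevDan                    # matches the CN in the user's certificate
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io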

15.11. Admission Controller

Chapter 15. Security > 15.11. Admission Controller


The last step in letting an API request into Kubernetes is admission control.
Admission controllers are pieces of software that can access the content of the objects
being created by the requests. They can modify the content or validate it, and potentially
deny the request.
Admission controllers are needed for certain features to work properly. Controllers have
been added as Kubernetes matured. As of the v1.12 release, the kube-apiserver uses a
compiled set of controllers. Instead of passing a list, we can enable or disable particular
controllers. If you want to use a controller that is not available by default, you would need
to download from source and compile.
The first controller is Initializers which will allow the dynamic modification of the API
request, providing great flexibility. Each admission controller functionality is explained in
the documentation. For example, the ResourceQuota controller will ensure that the
object created does not violate any of the existing quotas.
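For example, particular controllers can be enabled or disabled with flags passed to the
kube-apiserver; the exact lists below are only illustrative:
--enable-admission-plugins=NamespaceLifecycle,LimitRanger,ResourceQuota
--disable-admission-plugins=PodNodeSelector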

15.12. Security Contexts

Chapter 15. Security > 15.12. Security Contexts

Pods and containers within pods can be given specific security constraints to limit what
processes running in containers can do. For example, the UID of the process, the Linux
capabilities, and the filesystem group can be limited.
This security limitation is called a security context. It can be defined for the entire pod or
per container, and is represented as additional sections in the resources manifests. The
notable difference is that Linux capabilities are set at the container level.
For example, if you want to enforce a policy that containers cannot run their process as the
root user, you can add a pod security context like the one below:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  securityContext:
    runAsNonRoot: true
  containers:
  - image: nginx
    name: nginx
Then, when you create this pod, you will see a warning that the container is trying to run
as root and that it is not allowed. Hence, the Pod will never run:
$ kubectl get pods
NAME   READY  STATUS                                                  RESTARTS  AGE
nginx  0/1    container has runAsNonRoot and image will run as root   0         10s
You can read more in the Kubernetes documentation about configuring security contexts
to give proper constraints to your pods or containers.
15.13.a. Pod Security Policies

Chapter 15. Security > 15.13.a. Pod Security Policies

To automate the enforcement of security contexts, you can define
PodSecurityPolicies (PSP). A PSP is defined via a standard Kubernetes manifest following
the PSP API schema. An example is presented on the next page.
These policies are cluster-level rules that govern what a pod can do, what they can
access, what user they run as, etc.
For instance, if you do not want any of the containers in your cluster to run as the
root user, you can define a PSP to that effect. You can also prevent containers from being
privileged or use the host network namespace, or the host PID namespace.

15.13.b. Pod Security Policies (Cont.)

Chapter 15. Security > 15.13.b. Pod Security Policies (Cont.)

You can see an example of a PSP below:


apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: MustRunAsNonRoot
  fsGroup:
    rule: RunAsAny

For Pod Security Policies to be enabled, you need to configure the admission controller of
the controller-manager to contain PodSecurityPolicy. These policies make even more
sense when coupled with the RBAC configuration in your cluster. This will allow you to
finely tune what your users are allowed to run and what capabilities and low level
privileges their containers will have. 

See the PSP RBAC example on GitHub for more details. 

15.14. Network Security Policies

Chapter 15. Security > 15.14. Network Security Policies

By default, all pods can reach each other; all ingress and egress traffic is allowed. This has
been a high-level networking requirement in Kubernetes. However, network isolation can
be configured and traffic to pods can be blocked. In newer versions of Kubernetes, egress
traffic can also be blocked. This is done by configuring a NetworkPolicy. As all traffic is
allowed by default, you may want to implement a policy that drops all traffic, then add other
policies which allow the desired ingress and egress traffic.

The spec of the policy can narrow down the effect to a particular namespace, which can
be handy. Additional settings include a podSelector, or label, to narrow down which Pods
are affected, while the ingress and egress settings declare traffic to and from IP addresses
and ports.

Not all network providers support the NetworkPolicy kind. A non-exhaustive list of
providers with support includes Calico, Romana, Cilium, Kube-router, and WeaveNet.

In previous versions of Kubernetes, there was a requirement to annotate a namespace as
part of network isolation, specifically the net.beta.kubernetes.io/network-policy= value.
Some network plugins may still require this setting.

On the next page, you can find an example of a NetworkPolicy recipe. More network
policy recipes can be found on GitHub.

15.15.a. Network Security Policy Example

Chapter 15. Security > 15.15.a. Network Security Policy Example

The use of policies has become stable, noted with the v1 apiVersion. The example
below narrows down the policy to affect the default namespace.
Only Pods with the label of role: db will be affected by this policy, and the policy has
both Ingress and Egress settings.
The ingress setting includes the 172.17.0.0/16 network, with the smaller 172.17.1.0/24
range being excluded from this traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress-egress-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - ipBlock:
        cidr: 172.17.0.0/16
        except:
        - 172.17.1.0/24

15.15.b. Network Security Policy Example (Cont.)

Chapter 15. Security > 15.15.b. Network Security Policy Example (Cont.)

    - namespaceSelector:
        matchLabels:
          project: myproject
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 6379
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 5978
These rules also allow ingress from Pods in namespaces labeled project: myproject, as
well as from Pods labeled role: frontend. TCP traffic to port 6379 would be allowed from
these sources.
The egress rules use the to setting, in this case allowing TCP traffic to port 5978 within the
10.0.0.0/24 range.
The use of empty ingress or egress rules denies all traffic of that type for the included Pods,
though this is not suggested. Use another, dedicated NetworkPolicy instead.
Note that there can also be complex matchExpressions statements in the spec, but this
may change as NetworkPolicy matures.
podSelector:
  matchExpressions:
    - {key: inns, operator: In, values: ["yes"]}

15.16.a. Default Policy Example

Chapter 15. Security > 15.16.a. Default Policy Example

The empty podSelector braces will match all Pods in the namespace and, unless another
NetworkPolicy allows it, ingress traffic to them will not be allowed. Egress traffic would be
unaffected by this policy.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
With the potential for complex ingress and egress rules, it may be helpful to create multiple
objects which include simple isolation rules and use easy-to-understand names and labels.
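For instance, a second, clearly named policy could provide the matching egress default. This is only a sketch following the same pattern as the example above:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress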

15.16.b. Default Policy Example (Cont.)

Chapter 15. Security > 15.16.b. Default Policy Example (Cont.)

Some network plugins, such as WeaveNet, may require annotation of the Namespace.
The following shows the setting of a DefaultDeny for the myns namespace:

kind: Namespace
apiVersion: v1
metadata:
  name: myns
  annotations:
    net.beta.kubernetes.io/network-policy: |
     {
        "ingress": {
          "isolation": "DefaultDeny"
        }
     }

16.2. Learning Objectives

Chapter 16. High Availability > 16.2. Learning Objectives

By the end of this chapter, you should be able to:


 Discuss high availability in Kubernetes.
 Discuss collocated and non-collocated databases.
 Learn the steps to achieve high availability in Kubernetes.

16.3. Cluster High Availability

Chapter 16. High Availability > 16.3. Cluster High Availability

A newer feature of kubeadm is the integrated ability to join multiple master nodes with
collocated etcd databases. This allows for higher redundancy and fault tolerance. As long
as the database services remain available, the cluster will continue to run, and a master
node that goes down will catch up with kubelet information once it is brought back online.
Three instances are required for etcd to be able to determine quorum and decide whether
the data is accurate; if quorum is lost or the data is corrupt, the database could become
unavailable. Once etcd is able to determine quorum, it will elect a leader and return to
functioning as it had before failure.
One can either collocate the database with control planes or use an external etcd
database cluster. The kubeadm command makes the collocated deployment easier to
use.
To ensure that workers and other control planes continue to have access, it is a good idea
to use a load balancer. The default configuration leverages SSL, so you may need to
configure the load balancer as a TCP passthrough unless you want the extra work of
certificate configuration. As the certificates are only valid for particular node names,
it is a good idea to use an FQDN instead of an IP address, although there are many
possible ways to handle access.
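A minimal sketch of initializing the first control plane behind such a load balancer is shown below; k8smaster.example.com is a placeholder FQDN for the load balancer, and --upload-certs stores the control plane certificates so that additional masters can retrieve them:
$ sudo kubeadm init --control-plane-endpoint "k8smaster.example.com:6443" --upload-certs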

16.4. Collocated Databases

Chapter 16. High Availability > 16.4. Collocated Databases

The easiest way to gain higher availability is to use the kubeadm command and join at
least two more master servers to the cluster. The command is almost the same as a
worker join, except for an additional --control-plane flag and a --certificate-key value. The
key will probably need to be generated unless the other master nodes are added within two
hours of the cluster initialization.
Should a node fail, you would lose both a control plane and a database instance. As long
as the remaining etcd members retain quorum, this may not be an important issue, but
keep in mind that the database is the one component that cannot simply be rebuilt.
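As a sketch, joining an additional master could look like the following; the endpoint, token, hash, and key values are placeholders:
$ sudo kubeadm join k8smaster.example.com:6443 \
    --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane \
    --certificate-key <key>
If the two-hour window has passed, a fresh certificate key can typically be generated on an existing master with kubeadm init phase upload-certs --upload-certs.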

16.5. Non-Collocated Databases

Chapter 16. High Availability > 16.5. Non-Collocated Databases

Using an external cluster of etcd allows for less interruption should a node fail. Creating a
cluster in this manner requires a lot more equipment to properly spread out services and
takes more work to configure.
The external etcd cluster needs to be configured first. The kubeadm command has
options to configure this cluster, though other tools can be used as well. Once the etcd
cluster is running, its certificates need to be manually copied to the intended first control
plane node.
The kubeadm-config.yaml file needs to be populated with etcd set to external, the cluster
endpoints, and the certificate locations, as sketched below. Once the first control plane is
fully initialized, the redundant control planes need to be added one at a time, each fully
initialized before the next is added.
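A minimal sketch of the relevant portion of such a kubeadm-config.yaml is shown below; the apiVersion may differ with your kubeadm release, and the endpoints, FQDN, and file paths are placeholders to be replaced with the values of your own etcd cluster:
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "k8smaster.example.com:6443"
etcd:
  external:
    endpoints:
    - https://10.0.0.11:2379
    - https://10.0.0.12:2379
    - https://10.0.0.13:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key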
