GCP Professional Network Engineer Certification
Google Cloud Platform (GCP) Professional Cloud Network Engineer Certification Companion: Learn and Apply Network Design Concepts to Prepare for the Exam
Dario Cabianca
Georgetown, KY, USA
Table of Contents

Chapter 1: Exam Overview
    Exam Content
        Exam Subject Areas
        Exam Format
    Summary
Chapter 2: Designing, Planning, and Prototyping a Google Cloud Network
    Designing an Overall Network Architecture
        High Availability, Failover, and Disaster Recovery Strategies
        DNS (Domain Name System) Strategy
        Security and Data Exfiltration Requirements
        Load Balancing
        Applying Quotas per Project and per VPC
        Container Networking
        SaaS, PaaS, and IaaS Services
    Summary
    Exam Questions
        Question 2.1 (VPC Peering)
        Question 2.2 (Private Google Access)
Chapter 3: Implementing Virtual Private Cloud Instances
    Configuring VPCs
        Configuring VPC Resources
        Configuring VPC Peering
        Creating a Shared VPC and Sharing Subnets with Other Projects
        Using a Shared VPC
        Sharing Subnets Using Folders
        Configuring API Access to Google Services (e.g., Private Google Access, Public Interfaces)
        Expanding VPC Subnet Ranges After Creation
Chapter 4: Implementing Virtual Private Cloud Service Controls
    VPC Service Controls Introduction
    Creating and Configuring Access Levels and Service Perimeters
        Perimeters
        Access Levels
Chapter 5: Configuring Load Balancing
    Google Cloud Load Balancer Family
        Backend Services and Network Endpoint Groups (NEGs)
        Firewall Rules to Allow Traffic and Health Checks to Backend Services
        Configuring External HTTP(S) Load Balancers Including Backends and Backend Services with Balancing Method, Session Affinity, and Capacity Scaling/Scaler
        External TCP and SSL Proxy Load Balancers
        Network Load Balancers
        Internal HTTP(S) and TCP Proxy Load Balancers
        Load Balancer Summary
        Protocol Forwarding
        Accommodating Workload Increases Using Autoscaling vs. Manual Scaling
Chapter 6: Configuring Advanced Network Services
    Configuring and Maintaining Cloud DNS
        Managing Zones and Records
        Migrating to Cloud DNS
        DNS Security Extensions (DNSSEC)
        Forwarding and DNS Server Policies
        Integrating On-Premises DNS with Google Cloud
        Split-Horizon DNS
        DNS Peering
        Private DNS Logging
Chapter 7: Implementing Hybrid Connectivity
    Configuring Cloud Interconnect
        Dedicated Interconnect Connections and VLAN Attachments
        Partner Interconnect Connections and VLAN Attachments
Chapter 8: Managing Network Operations
    Logging and Monitoring with Google Cloud’s Operations Suite
        Reviewing Logs for Networking Components (e.g., VPN, Cloud Router, VPC Service Controls)
        Monitoring Networking Components (e.g., VPN, Cloud Interconnect Connections and Interconnect Attachments, Cloud Router, Load Balancers, Google Cloud Armor, Cloud NAT)
Index
Acknowledgments
This book is the result of the study, work, and research I accomplished over the past two years. I could not
have written this book without the help of family, friends, colleagues, and experts in the field of computer
networks and computer science.
When my friend, former colleague, and author Tom Nelson first introduced me to Apress in August 2021,
I had no idea I was about to embark on this wonderful journey.
First and foremost, I am grateful to my wife Margie, who carefully created a conducive space at home
so I could stay focused on this work and prepare quality content for this book (not an easy task with my two
young sons Joseph and Samuele eager to learn networks from their dad).
The team at Apress has been phenomenal for accommodating my schedule a few times and for
providing the necessary guidance in a timely manner. Thanks to Gryffin Winkler, Raymond Blum, Laura
Berendson, Joan Murray, and Jill Balzano. Without your prompt and careful assistance, this work would not
have been possible.
Every concept I explained in the book is the product of scientific curiosity, theory, practice, and
experience I acquired through my professional and academic career.
I was inspired by the idea of a Virtual Private Cloud (VPC) network intended as a logical routing
domain, as clearly described by Emanuele Mazza in the presentation he gave in the “VPC Deep Dive and
Best Practices” session at Google Cloud Next 2018. Not only did this concept consolidate my understanding
of VPCs—whose scope extends the boundaries of zones and regions—but it naturally helped build more
knowledge touching a significant number of exam objectives.
I am also grateful to Luca Prete for his article “GCP Routing Adventures (Vol. 1),” posted on Medium, which helped me explain in a simple yet comprehensive way the concept of BGP (Border Gateway Protocol) routing mode as it pertains to VPCs.
The section about VPC Service Controls implementation required extra work due to the sophistication
of this unique capability offered by GCP. The article “Google Cloud VPC-Service Controls: Lessons Learned”
posted on Medium by my friend Andrea Gandolfi was instrumental in helping me set the context and
document the key features of this product. Thanks Andrea for your great article!
A number of other friends and former colleagues helped me develop my knowledge on some of the
objectives of the exam. These include Daniel Schewe, Ali Ikram, Rajesh Ramamoorthy, Justin Quattlebaum,
Stewart Reed, Stephen Beasey, Chris Smith, Tim DelBosco, and Kapil Gupta. Thanks to all of you for your
constructive feedback and the methodical approach you shared during our problem-solving discussions.
Last, I cannot express enough words of gratitude for the late Prof. Giovanni Degli Antoni (Gianni), who guided me through my academic career at the University of Milan, and my beloved parents Eugenia and
Giuseppe, who always supported me in my academic journey and in life.
Introduction
This book is about preparing you to pass the Google Cloud Professional Cloud Network Engineer
certification exam and—most importantly—to get you started for an exciting career as a Google Cloud
Platform (GCP) network engineer.
There are a number of professional cloud certifications covering a broad array of areas. These
certifications are offered by all three leading public cloud providers in the world, that is, Amazon Web
Services (AWS), Microsoft Azure, and Google Cloud Platform. These areas include cloud architecture,
cloud engineering and operations (also known as DevOps), data engineering, cloud security, and cloud
networking. Among all these areas, the network is the key element of the infrastructure your workloads
use to deliver business value to their users. Think about it. Without the network—whether it be physical
or virtual, covering a local or wide area (LAN and WAN, respectively)—there is no way two (or more)
computers can communicate with each other and exchange data. Back in the 1990s, the former Sun
Microsystems (later acquired by Oracle) introduced a slogan, “the Network is the Computer,” to emphasize
that computers should be networked or—to an extreme—they are not computers. This slogan was ahead
of its time and put an emphasis on the nature of distributed systems, where the parts of a system are not
concentrated into one single unit (computer), but they are spread across multiple units (computers). This
slogan originated when cloud computing didn’t exist. Yet, in my opinion, it is still real and is agnostic
to where your workloads operate, that is, in your company’s data centers, in GCP (or other clouds), or
both. The fundamental difference between computer networking in the data centers (also referred to as
traditional networking or on-premises networking) and in the cloud is that the cloud makes all things “more
distributed.” In fact, if you leverage the capabilities offered by the cloud, it’s easier to design and implement
recovery-oriented architectures for your workloads, which help you mitigate the risk of single points of failure (SPOFs) by enabling self-healing functionality and other fault tolerance techniques. The cloud—when
properly used—can address many other concerns that apply to software and hardware distributed systems.
Don’t worry! Throughout this book, I will teach you what “more distributed” means and how the users of
your workloads can benefit from it. This brings us to who this book is for.
This book is intended for a broad audience of cloud solution architects (in any of the three public cloud
providers), as well as site reliability, security, network, and software engineers with foundational knowledge
of Google Cloud and networking concepts. Basic knowledge of the OSI model, the RFC 1918 (private address space) document, and the TCP/IP, TLS (or SSL), and HTTP protocols is a plus, although it is not required.
I used the official exam guide to organize the content and to present it in a meaningful way. As a result,
the majority of the chapters are structured to map one to one with each exam objective and to provide
detailed coverage of each topic, as defined by Google. The exposition of the content for most of the key
topics includes a theoretical part, which is focused on conceptual knowledge, and a practical part, which
is focused on the application of the acquired knowledge to solve common use cases, usually by leveraging
reference architectures and best practices. This approach will help you gradually set context, get you
familiarized with the topic, and lay the foundations for more advanced concepts.
Given the nature of the exam, whose main objective is to teach you how to design, engineer, and
architect efficient, secure, and cost-effective network solutions with GCP, I have developed a bias for
diagrams, infographic content, and other illustrative material to help you “connect the dots” and visually
build knowledge.
Another important aspect of the exposition includes the use of the Google Cloud Command Line
Interface (gcloud CLI) as the main tool to solve the presented use cases. This choice is deliberate, and the
rationale about it is twofold. On the one side, the exam has a number of questions that require you to know
the gcloud CLI commands. On the other side, the alternatives to the gcloud CLI are the console and other
tools that enable Infrastructure as Code (IaC), for example, HashiCorp Terraform. The former leverages the
Google Cloud user interface and is subject to frequent changes without notice. The latter is a product that is
not in the scope of the exam.
A Google Cloud free account is recommended to make the best use of this book. This approach will
teach you how to use the gcloud CLI and will let you practice the concepts you learned. Chapter 1 will
cover this setup and will provide an overview of the exam, along with the registration process. If you want
to become an expert on shared Virtual Private Cloud (VPC) networks, I also recommend that you create a
Google Workspace account with your own domain. Although this is not free, the price is reasonable, and you
will have your own organization that you can use to create multiple GCP users and manage IAM (Identity
and Access Management) policies accordingly.
In Chapter 2, you will learn the important factors you need to consider to design the network
architecture for your workloads. The concept of a Virtual Private Cloud (VPC) network as a logical routing
domain will be first introduced, along with a few reference topologies. Other important GCP constructs
will be discussed, for example, projects, folders, organizations, billing accounts, Identity and Access
Management (IAM) allow policies, and others, to help you understand how to enable separation of duties—
also known as microsegmentation—effectively. Finally, an overview of hybrid and multi-cloud deployments
will be provided to get you familiarized with the GCP network connectivity products.
Chapter 3 is your VPC “playground.” In this chapter, you’ll use the gcloud CLI to perform a number
of operations on VPCs and their components. You will learn the construct of a subnetwork, intended as a
partition of a VPC, and you will create, update, delete, and peer VPCs. We will deep dive in the setup of a
shared VPC, which we’ll use as a reference for the upcoming sections and chapters. The concepts of Private
Google Access and Private Service Connect will be introduced and implemented. A detailed setup of a
Google Kubernetes Engine (GKE) cluster in our shared VPC will be implemented with examples of internode
connectivity. The fundamental concepts of routing and firewall rules will be discussed, with emphasis on
their applicability scope, which is the entire VPC.
Chapter 4 will be entirely focused on the implementation of VPC Service Controls. This is a topic I
have been particularly interested in covering as a separate chapter, because of its level of sophistication
and because the literature available is dispersed in multiple sources. The chapter provides two deep dive
examples of VPC Service Controls using a shared VPC, including their important dry-run mode feature.
Chapter 5 will cover all the load balancing services you need to know to pass the exam, beginning from
the nine different “flavors” of GCP load balancers. A number of deep dive examples on how to implement
global, external HTTP(S) load balancers with different backend types will be provided. You will become an
expert at choosing the right load balancer based on a set of business and technical requirements, which is
exactly what you are expected to know during the exam and at work.
Chapter 6 will cover advanced network services that provide additional security capabilities to your
workloads. These are Cloud DNS, Cloud NAT, and Packet Mirroring.
In Chapter 7, you will learn how to implement the GCP products that enable hybrid and multi-cloud
connectivity. These include the two “flavors” of Cloud Interconnect (Dedicated and Partner) and the two
flavors of Cloud VPN (HA and Classic).
The last chapter (Chapter 8) concludes our study by teaching you how to perform network operations
as a means to proactively support and optimize the network infrastructure you have designed, architected,
and implemented.
Each chapter (other than Chapter 1) includes at the end a few questions (and the correct answers) to
help you consolidate your knowledge of the covered exam objective.
As in any discipline, you will need to supplement what you learned with experience. The combination
of the two will make you a better GCP network engineer. I hope this book will help you achieve your Google
Cloud Professional Cloud Network Engineer certification and, most importantly, will equip you with the
tools and the knowledge you need to succeed at work.
CHAPTER 1
Exam Overview
You are starting your preparation for the Google Professional Cloud Network Engineer certification. This
certification validates your knowledge to implement and manage network architectures in Google Cloud.
In this chapter, we will set the direction on getting ready for the exam. We will outline resources that
will aid you in your learning strategy. We will explain how you can obtain access to a free tier Google Cloud
account, which will allow you to practice what you have learned. We will provide links to useful additional
study materials, and we will describe how to sign up for the exam.
Exam Content
The Google Cloud Professional Cloud Network Engineer certification is designed for individuals who would
like to validate their knowledge of network infrastructure and services in Google Cloud. You are expected to
have a thorough understanding of the following subject areas:
• Virtual Private Cloud (VPC) networks
• Routing
• Network services
• Hybrid and multi-cloud interconnectivity
• Network operations
The exam does not cover cloud service fundamentals, but some questions on the exam assume
knowledge of these concepts. Some of the broad knowledge areas that you are expected to be familiar
with are
• Compute infrastructure concepts, for example, virtual machines, containers,
container orchestration, serverless compute services
• Site Reliability Engineering (SRE) concepts
• Familiarity with Google Cloud Storage classes
• Familiarity with the TCP and HTTP(S) protocols
• Familiarity with security topics such as Identity and Access Management (IAM),
endpoint protection, and encryption in transit
• Basic knowledge of DevOps best practices
Google doesn’t publish weightings for these subject areas, nor does it tell you how you scored in each domain. The outcome of the exam is pass/fail and is provided immediately upon submitting your exam.
You are expected to learn all the topics in the exam study guide, and they are all covered in this study companion.
Exam Format
The exam consists of 50–60 questions and has a two-hour time limit. All questions are in one of the following formats:
• Multiple choice: Select the most appropriate answer.
• Multiple select: Select all answers that apply. The question will tell you how many
answers are to be selected.
As long as you come well prepared to take the exam, you should be able to complete all questions
within the allotted time. Some questions on the exam may be unscored items to gather statistical
information. These items are not identified to you and do not affect your score.
The registration fee for the Google Professional Cloud Network Engineer certification exam is $200 (plus
tax where applicable).
For the latest information about the exam, navigate to the Google Cloud Certifications page at the
following URL: https://cloud.google.com/certification/cloud-network-engineer.
Register for the Exam
To register for the exam, you need to create a Google Cloud Webassessor account (unless you already have
one) for exams in English by visiting https://webassessor.com/googlecloud.
Scroll down and click as indicated in Figure 1-1.
Upon creating your Webassessor account, you are all set and you can schedule your exam.
Schedule the Exam
Visit https://webassessor.com/googlecloud and log in. Then click the “REGISTER FOR AN EXAM” button
as indicated in Figure 1-3.
Scroll down until you find the Google Cloud Certified Professional Cloud Network Engineer (English)
exam in the list as indicated in Figure 1-4. You will see a blue “Buy Exam” button (in my case, since I am already certified, the button is unavailable). Click the “Buy Exam” button.
You will be asked whether you want to take the exam at a test center (Onsite Proctored) or online at
home (Remote Proctored). Select your preferred choice.
Regardless of where you will take the exam, you will need to present a government-issued identification (ID)
before you start your exam.
If you will take your exam online at your home, you will also need a personal computer or a Mac that
has a reliable webcam and Internet connection and a suitable, distraction-free room or space where you will
be taking your exam.
■■Tip If you take your exam online, make sure you use your own personal computer or Mac to take the exam. Do not attempt to take the exam using your company’s laptop or a computer in the office. This is because a company-owned computer typically uses a VPN (virtual private network) client and software to provide an extra layer of protection to prevent corporate data exfiltration. This software interferes with the proctoring software you need to download and install in order to take your exam.
Depending on your selection, the next screen asks you to select a test center location as indicated in
Figure 1-5.
Upon selecting a test center, you will be prompted to choose the date and time of your exam, agree to
the Google Cloud’s certification terms and conditions, and acknowledge your selections as indicated in
Figure 1-6.
Finally, you will be directed to check out, where you will pay your exam fee ($200 plus taxes).
Exam Results
You are expected to take the exam at the scheduled place and time. After the completion of the exam, you
will immediately receive a Pass/Fail result.
If you achieve a Pass result, your transcript will record the exam as Pass, and a few days later (it may
take a week or even longer), you will receive an email confirming the result, which includes a link to a
Google Cloud Perks website where you can select a gift.
If you fail, your transcript will record the exam as Fail, and you will also receive an email to confirm the
result. Don’t give up if you don’t pass the exam on your first try. Review all the study materials again, paying particular attention to any weak areas you identified while taking the exam, and retake the exam.
Retake Policy
If you don't pass an exam, you can take it again after 14 days. If you don't pass the second time, you must
wait 60 days before you can take it a third time. If you don't pass the third time, you must wait 365 days
before taking it again.
All attempts, regardless of exam language and delivery method (onsite or online testing), count toward
the total permissible attempts, and the waiting period between attempts still applies. Circumventing this
retake policy by registering under a different name or any other means is a violation of the Google Cloud
Exam Terms and Conditions and will result in a denied or revoked certification.
Summary
In this chapter, we covered all the areas that will help prepare you for the Google Professional Cloud Network
Engineer certification exam. We provided an overview of the exam content and the type of questions you will
find on the exam. We explained how to access free training resources from Google Cloud and how to sign
up for a free tier Google Cloud account. The free tier account will allow you to gain hands-on experience
working with Google Cloud.
The next chapter will provide an introduction to the Google Cloud networking capabilities and services
to get you started on your Google Cloud Network Engineer learning journey.
CHAPTER 2
Designing, Planning, and Prototyping a Google Cloud Network
Just as computers in a data center are physically interconnected with cables, and data is transmitted across the network(s) using bridges, repeaters, switches, and routers, in the cloud the compute elements of your workload—for example, Compute Engine instances (commonly known as virtual machines—VMs), Google Kubernetes Engine (GKE) nodes, Cloud Run container instances, and App Engine instances—are also interconnected, but using a different technology, that is, software-defined networking.
Software-defined networking (SDN) is an approach to networking that uses software-based controllers
or application programming interfaces (APIs) to communicate with underlying hardware infrastructure and
direct traffic on a network.
The key difference between SDN and traditional networking is infrastructure: SDN is software-
based as suggested by its name, while traditional networking is hardware-based. Because the control
plane is software-based, SDN is much more flexible than traditional networking. As a result, SDN allows
administrators to control the network, change configuration settings, provision resources, and increase
network capacity—all from a centralized user interface, without the need for more hardware.
In this chapter, you will be introduced to the building blocks of a network in Google Cloud.
We will start by addressing the overall network architecture, which is intended to help you answer your
“why” questions. Every aspect of the Google Cloud network(s) that you will architect, design, build, and
maintain must tie to one (or more) of the areas covered in the overall network architecture. These areas
include high availability, fault tolerance, security, performance, cost, and others.
Then, you will learn how to design a Google Cloud network and how to pick and choose each
component based on your workloads’ requirements and the preceding areas.
Next, you will learn how to apply these concepts to the design of hybrid and multi-cloud networks,
which are prevalent nowadays. The Google Cloud network connectivity products will be introduced, and
their applicability and scope will be explained with a number of reference architectures.
Finally, container networking will be presented, and you will see how Google Cloud provides native
networking capabilities that address most of the container networking concerns.
VPC networks themselves do not define IP address ranges. Instead, each VPC network is composed of one or more partitions called subnets. Each subnet in turn defines one or more IP address ranges. Subnets
are regional resources; each subnet is explicitly associated with a single region.
All compute resources of your workload rely on a VPC network’s routing capabilities for
communication. The VPC connects—by default—the resources to each other. Additionally, the VPC can
be connected to other VPCs in GCP, on-premises networks, or the Internet. However, this external form of
communication does not happen by default. You have to explicitly configure it.
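As a quick taste of the gcloud CLI (our main tool throughout this book), the following sketch creates a custom-mode VPC and one regional subnet; the network name, subnet name, region, and CIDR block are illustrative placeholders:

    # Create a VPC with no automatically created subnets.
    gcloud compute networks create my-vpc \
        --subnet-mode=custom
    # Add one regional subnet with an RFC 1918 primary range.
    gcloud compute networks subnets create my-subnet \
        --network=my-vpc \
        --region=us-central1 \
        --range=10.0.0.0/24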
The design of your overall network architecture is the result of your workload business and technical
requirements:
• Do you need zonal, regional, or global (multi-regional) redundancy for the resources
(e.g., compute, storage, etc.) in your workload?
• Are high performance and low latency must-have characteristics of your workload?
• Does your workload use sensitive data that must be protected in transit, in use, and
at rest?
Disaster Recovery
Disaster recovery (DR) is a key element of a business continuity plan, which involves a set of policies and
procedures to enable the recovery of vital technology infrastructure and systems following a natural or a
human-induced disaster.
High Availability
High availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational
performance, usually uptime (i.e., availability), for a higher than normal period.
You may be wondering how to select which GCP services you need to use in order to meet your workload SLOs. To answer this question, you need to understand how the availability SLI is related to zones and regions. Google Cloud generally designs its products to deliver the levels of availability for zones and regions shown in Table 2-1.
Table 2-1. Availability design goals for zonal and regional GCP services

GCP Service Locality   Examples                                             Availability Design Goal   Implied Downtime
Zonal                  Compute Engine, Persistent Disk                      99.9%                      8.75 hours/year
Regional               Regional Cloud Storage, Replicated Persistent Disk,  99.99%                     52 minutes/year
                       Regional Google Kubernetes Engine
As a result, the selection of the GCP services for your workload is based upon the comparison
of the GCP availability design goals against your acceptable level of downtime, as formalized by your
workload SLOs.
For example, if your workload has an availability SLO greater than 99.99%, you’ll probably want to exclude zonal GCP services because zonal GCP services have an availability design goal of only 99.9%, as indicated in Table 2-1.
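The implied downtime figures in Table 2-1 follow directly from the design goals. As a quick check, using a 365-day year (8,760 hours, or 525,600 minutes):

    (1 − 0.999)  × 8,760 hours/year     ≈ 8.76 hours/year   (zonal, 99.9%)
    (1 − 0.9999) × 525,600 minutes/year ≈ 52.6 minutes/year (regional, 99.99%)

The table rounds these values slightly.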
While best practices on how to architect a workload for resilience, high availability, and performance on
GCP are out of the scope of this book, it is important that you understand how your workload SLOs—RPO,
RTO for DR, and availability for HA—drive the selection and the composition of GCP services in order to
meet your workload business and technical requirements.
Figure 2-4 illustrates how GCP compute, storage, and network services are broken down by locality
(zonal, regional, multi-regional).
These are
1. Access from the Internet or from unauthorized VPC networks
2. Access from authorized VPC networks to GCP resources (e.g., VMs) in authorized projects
3. Copy data with GCP service-to-service calls, for example, from bucket1 to bucket2
Load Balancing
Just like a VPC is a software-defined network, which provides routing among its components, a cloud load
balancer is a software-defined, fully distributed managed service. Since it is not hardware-based, you don't
have to worry about managing a physical load balancing infrastructure.
A load balancer distributes inbound traffic across multiple compute elements of your workload—
that is, backends. By spreading the load, a load balancer reduces the risk that your workload experiences
performance issues.
Figure 2-8 illustrates an example of a logical architecture of a load balancer, where the specific backends
are VMs (Compute Engine instances), Google Kubernetes Engine (GKE) clusters, and Google Cloud Storage
buckets. Notice how the interaction between the load balancer frontend and its backends is mediated by a backend service component, whose intent is to decouple the load and direct it to a given backend.
In designing an overall network architecture, another important decision you need to make is whether cost or performance is the key factor, as directed by your workload requirements. You can’t have both; if both seem necessary, you need to revise your requirements with your stakeholders.
Load balancing is part of a group of GCP network services. The service itself comes in different flavors depending on how relevant cost or performance is to your workload.
In this section, we will describe how cost and performance impact the selection of the load balancer
that best suits your workload requirements. Chapter 5 will provide all the necessary details you need to know
to implement a performant, resilient, modern, cost-effective, and secure load balancing solution for your
workload.
Unlike other public cloud providers, GCP offers tiered network services.
As illustrated in Figure 2-9, the premium tier leverages Google's highly performant, highly optimized,
global backbone network to carry traffic between your users and the GCP region where the frontend services
of your workload’s load balancer are located—in our example us-central1. The public Internet is only used to
carry traffic between the user’s Internet Service Provider (ISP) and the closest Google network ingress point,
that is, Google Edge Point of Presence (PoP).
Conversely, the standard tier leverages the public Internet to carry traffic between your workload GCP
services and your users. While the use of the Internet provides the benefit of a lower cost when compared
to using the Google global backbone (premium tier), choosing the standard tier results in lower network
performance and availability similar to other public cloud providers.
Think of choosing between premium and standard tiers as booking a rail ticket. If cost is your main
concern and travel time is not as important, you should consider traveling with a “standard” train that
meets your budget. Otherwise, if travel time is critical and you are willing to pay extra, you should consider
traveling with a high-speed train, which leverages state-of-the-art high-speed rail infrastructure, that is,
premium tier.
As explained, the rationale about choosing premium tier instead of standard tier for your load balancer
(and other network services) is mainly driven by performance and cost.
If performance and reliability are your key drivers, then you should opt for a load balancer that utilizes
the premium tier.
If cost is your main driver, then you should opt for a load balancer that utilizes the standard tier.
The decision tree in Figure 2-10 will help you select the network tier that best suits the load balancing
requirements for your workload.
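Network tier is chosen per resource, or as a project-wide default. A minimal sketch, assuming cost is your main driver; the address name is a placeholder:

    # Make the standard tier the default for new resources in the project.
    gcloud compute project-info update \
        --default-network-tier=STANDARD
    # Or reserve a single regional external IP address on the standard tier.
    gcloud compute addresses create my-lb-ip \
        --region=us-central1 \
        --network-tier=STANDARD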
Container Networking
Containers have been around for a while. As a developer, you get the benefit of portability because containers allow you to package your workload and its dependencies as a small unit (on the order of megabytes), that is, a container.
Google Cloud allows you to deploy your containers using one of these three container services
(Figure 2-11).
You choose the container service that best fits your workload business and technical requirements based on how much infrastructure you are willing to manage. If the goal is to reduce time-to-market, with fast and agile delivery of products and services for your workload, then let GCP manage and scale the containers for you using Cloud Run, which is a managed serverless compute platform for containers.
Conversely, if your business and technical requirements compel you to exercise more control on
your container infrastructure, then Google Kubernetes Engine (GKE, i.e., managed Kubernetes service) or
Google Compute Engine (you manage all the container infrastructure and the workloads running in your
containers) are options to consider.
GKE sits in the middle between Cloud Run and Google Compute Engine. As a result, it provides the
right mix of manageability and network controls you need in order to properly design your workload
network architecture.
In this section, we will focus on GKE, and we will address some of the architectural choices you will
need to make when designing an overall network architecture for your workload.
■■Note To learn more about HPA, use the official Kubernetes reference: https://kubernetes.io/docs/
tasks/run-application/horizontal-pod-autoscale/.
As per the preceding Kubernetes documentation, an HPA automatically scales the number of pods in
a replication controller, deployment, replica set, or stateful set based on observed CPU utilization (or, with
custom metrics support, on some other application-provided metrics). This shifts your focus to properly planning for expansion (or reduction) of your pods, thereby addressing the problem of exhausting the IP address space originally allocated to your pods.
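As a minimal sketch, assuming a Deployment named web already exists in your cluster, an HPA can be created with a single kubectl command:

    # Keep between 2 and 10 replicas, scaling on a 60% CPU utilization target.
    kubectl autoscale deployment web \
        --cpu-percent=60 \
        --min=2 \
        --max=10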
One of the best practices to properly plan IP allocation for your pods, services, and nodes is to use VPC-
native clusters, which as of the writing of this book are enabled by default.
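As a sketch, a VPC-native cluster is created by enabling alias IPs and pointing GKE at the subnet’s secondary ranges; all names below are placeholders and assume the subnet defines secondary ranges named pods and services:

    gcloud container clusters create my-cluster \
        --region=us-central1 \
        --network=my-vpc \
        --subnetwork=my-subnet \
        --enable-ip-alias \
        --cluster-secondary-range-name=pods \
        --services-secondary-range-name=services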
■■Note To learn more about ingress, use the official Kubernetes reference: https://kubernetes.io/
docs/concepts/services-networking/ingress/.
iptables rules programmed on each node routed requests to the pods. This approach was effective for
small clusters serving a limited number of incoming requests, but it turned out to be inefficient for larger
clusters due to suboptimal data paths with unnecessary hops between nodes.
As new nodes (VMs) were added in response to increasing incoming requests, the load balancer
attempted to equally distribute the load across the nodes. However, if connections were heavily reused (e.g.,
HTTP/2), then requests were served by older nodes. As a result, a number of pods remained underutilized,
while others became overutilized. This imbalance scenario was caused by a mismatch between connections
and requests, as illustrated in Figure 2-13.
Table 2-4 shows how a Google Cloud VPC is different from other public cloud providers’ VPCs.
VPC Specifications
A VPC is intended to be a logical routing domain, which allows implicit connectivity among any compute
resource hosted in one of its partitions or subnets.
A compute resource can be a VM (also known as a Compute Engine instance), a pod in a GKE cluster,
an App Engine flexible environment instance, or any other GCP product built using VMs. From now on, we
will use the terms “VM” and “Compute Engine instance” or simply “instance” interchangeably.
VPCs are designed in accordance with the following specifications:
1. Global: VPCs are global resources, that is, they can span across multiple regions. VPCs are composed of a number of IP range partitions—denoted by CIDR blocks—which are known as subnets. The acronym “CIDR” stands for Classless Inter-domain Routing. More information on CIDR block notation is provided in the following section.
2. Regional subnets: Subnets are regional resources, that is, a subnet is limited to one region.
3. Firewall rules: Traffic to and from compute resources (e.g., VMs, pods) is controlled by firewall rules, which are defined on a per-VPC basis.
4. Implicit connectivity: Compute resources hosted in a (subnet of a) VPC are allowed to communicate with each other by default, unless otherwise specified by a “deny” firewall rule. Implicit connectivity among compute resources in a VPC occurs using RFC 1918 IPv4 addresses. More information on RFC 1918 is provided in the following section.
5. Implicit encryption in transit: Not only does a VPC network provide implicit connectivity between compute resources (e.g., VMs) hosted in its subnets, but this connectivity is automatically encrypted for you. Put differently, VM-to-VM connections within VPC networks and peered VPC networks are automatically authenticated and encrypted at layer 3 (network layer of the OSI model). You don’t have to check a box to enable encryption in transit between VMs.
6. Private Google API access: Compute resources in a VPC can consume Google APIs without requiring public IPs, that is, without using the Internet.
7. IAM: Like any other GCP resource, VPCs can be administered using Identity and Access Management (IAM) roles.
8. Shared VPC: An organization can use a Shared VPC to keep a VPC network in a common host project. Authorized IAM identities from other projects in the same organization can create resources that use subnets of the Shared VPC network.
9. VPC peering: VPCs can be securely connected to other VPCs in different projects or different organizations using VPC peering (see the sketch after this list). Traffic is always encrypted and stays in the Google global backbone without traversing the Internet.
10. Secure hybrid connectivity: VPCs can be securely connected to your on-premises data center (hybrid connectivity) or other clouds (multi-cloud connectivity) using Cloud VPN or Cloud Interconnect. More details will be provided in Chapter 7.
11. No multicast support: VPCs do not support IP multicast addresses within the network. In other words, in a VPC packets can only be sent from one sender to one receiver, each identified by their IP address.
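As a sketch of specification 9, peering is established by creating a peering on each side; the network and project names are placeholders, and a mirror command must be run from the peer project:

    gcloud compute networks peerings create peer-a-to-b \
        --network=vpc-a \
        --peer-project=project-b \
        --peer-network=vpc-b
    # Repeat from project-b with the networks swapped (e.g., peer-b-to-a)
    # so the peering becomes ACTIVE on both sides.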
Subnets
When you create a VPC, you don’t specify an IP range for the VPC itself. Instead, you use subnets to define which partition of your VPC is associated with which IP range. As per the VPC specifications, remember that subnets are regional resources, while VPCs are global.
■■Note From now on, we will be using the terms “IP range” and “CIDR block” interchangeably. For more
information about CIDR block notation, refer to https://tools.ietf.org/pdf/rfc4632.pdf.
Each subnet must have a primary IP range and, optionally, one or more secondary IP ranges for alias
IP. The per-network limits describe the maximum number of secondary ranges that you can define for each
subnet. Primary and secondary IP ranges must be RFC 1918 addresses.
■■Note RFC 1918 is the official Address Allocation for Private Internets standard. In the next sections, when
we discuss connectivity with respect to IP addressing, we will be using the terms “private” and “internal”
interchangeably to signify RFC 1918 connectivity. While you don’t need to read the whole document, section 3 is
recommended (https://datatracker.ietf.org/doc/html/rfc1918#section-3).
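As a minimal sketch, assuming placeholder names and CIDR blocks, a subnet with a primary range and two secondary ranges (e.g., for GKE pods and services) can be created as follows:

    gcloud compute networks subnets create gke-subnet \
        --network=my-vpc \
        --region=us-central1 \
        --range=10.0.0.0/20 \
        --secondary-range=pods=10.4.0.0/14,services=10.8.0.0/20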
For your convenience and to help you acquire more familiarity with the CIDR block IPv4 notation, a
table that maps block size (number of IP addresses) to the number of blocks is shown in Figure 2-16.
External IP Addresses
You can assign an external IP address to a VM or a forwarding rule if the VM needs to communicate with
• The Internet
• Resources in another network (VPC)
• Services other than Compute Engine (e.g., APIs hosted in other clouds)
Resources from outside a VPC network can address a specific resource in your VPC by using the
resource’s external IP address. The interaction is allowed as long as firewall rules enable the connection. We
will provide detailed information of firewall rules in the upcoming sections. For the time being, think of a
firewall rule as a means to inspect traffic and determine whether traffic is allowed or denied based on origin,
destination, ports, and protocols.
In other words, only resources with an external IP address can send and receive traffic directly to and
from outside your VPC.
Compute Engine supports two types of external IP addresses:
• Static external IP addresses: For VMs, static external IP addresses remain attached
to stopped instances until they are removed. Static external IP addresses can be
either regional or global:
• Regional static external IP addresses allow resources of that region or resources
of zones within that region to use the IP address.
• Global static external IP addresses are available only to global forwarding rules,
used for global load balancing. You can't assign a global IP address to a regional
or zonal resource.
• Ephemeral external IP addresses: Ephemeral external IP addresses are available
to VMs and forwarding rules. Ephemeral external IP addresses remain attached to a
VM instance only until the VM is stopped and restarted or the instance is terminated.
If an instance is stopped, any ephemeral external IP addresses that are assigned to
the instance are released back into the general Compute Engine pool and become
available for use by other projects. When a stopped instance is started again, a new
ephemeral external IP address is assigned to the instance.
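As a minimal sketch with placeholder names, you can reserve a regional static external IP address and attach it to a new VM at creation time:

    # Reserve the address so it survives instance stops and restarts.
    gcloud compute addresses create web-ip \
        --region=us-central1
    # Attach the reserved address to a new instance in the same region.
    gcloud compute instances create web-vm \
        --zone=us-central1-a \
        --address=web-ip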
Internal IP Addresses
Just like for external IP addresses, Compute Engine supports static and ephemeral internal IP addresses:
• Static internal IP addresses: Static internal IP addresses are assigned to a project
long term until they are explicitly released from that assignment and remain
attached to a resource until they are explicitly detached from the resource. For VM
instances, static internal IP addresses remain attached to stopped instances until
they are removed.
• Ephemeral internal IP addresses: Ephemeral internal IP addresses are available
to VM instances and forwarding rules. Ephemeral internal IP addresses remain
attached to VM instances and forwarding rules until the instance or forwarding
rule is deleted. You can assign an ephemeral internal IP address when you create a
resource by omitting an IP address specification in your request and letting Compute
Engine randomly assign an address.
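As a sketch with placeholder names, a static internal IP address is reserved from a subnet’s primary range; pass --addresses to pick a specific address, or omit it to let Compute Engine choose one:

    gcloud compute addresses create db-ip \
        --region=us-central1 \
        --subnet=my-subnet \
        --addresses=10.0.0.42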
The infographic in Figure 2-17 illustrates the external and internal IP address concepts for two VPCs in
the same project.
Figure 2-17. Two VPCs in the same project connected to the Internet
Standalone
Start with a single VPC network for resources that have common requirements.
For many simple use cases, a single VPC network provides the features that you need while being
easier to create, maintain, and understand than the more complex alternatives. By grouping resources
with common requirements and characteristics into a single VPC network, you begin to establish the VPC
network border as the perimeter for potential issues.
For an example of this configuration, see the single project, single VPC network reference architecture:
https://cloud.google.com/architecture/best-practices-vpc-design#single-project-single-vpc.
Factors that might lead you to create additional VPC networks include scale, network security, financial
considerations, compliance, operational requirements, and Identity and Access Management (IAM).
Shared
The Shared VPC model allows you to export subnets from a VPC network in a host project to other service
projects in the same organization or tenant.
■■Note The term “organization” dates a while back when the Lightweight Directory Access Protocol (LDAP)
was invented. In this context, we use this term to denote the root node of the Google Cloud resource
hierarchy. All resources that belong to an organization are grouped under its root node. We will use the terms
“organization” and “tenant” interchangeably. For more information, see https://cloud.google.com/
resource-manager/docs/cloud-platform-resource-hierarchy.
If your use case requires cross-organizational private connectivity, then VPC network peering is an
attractive option to consider. In this section, we will focus on Shared VPC connectivity, which by definition
pertains to connectivity in the context of a single organization.
With Shared VPC, VMs in the service projects can connect to the shared subnets of the host project.
Use the Shared VPC model to centralize network administration when multiple teams work together, for example, the developers of an N-tier application.
For organizations with a large number of teams, Shared VPC provides an effective means to extend the
architectural simplicity of a single VPC network across multiple teams.
In this scenario, network policy and control for all networking resources are centralized and easier to
manage. Service project departments can configure and manage their compute resources, enabling a clear
separation of concerns for different teams in the organization.
Resources in service projects can communicate with each other more securely and efficiently across
project boundaries using internal (RFC 1918) IP addresses. You can manage shared network resources—
such as subnets, routes, and firewalls—from a central host project, so you can enforce consistent network
policies across the projects.
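As a sketch, a Shared VPC Admin enables the host project and attaches a service project with two commands; both project IDs are placeholders:

    # Designate the host project for Shared VPC.
    gcloud compute shared-vpc enable host-project-id
    # Attach a service project to the host project.
    gcloud compute shared-vpc associated-projects add service-project-id \
        --host-project=host-project-id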
Another key factor that makes your life easier in designing VPC instances is the fact that Google Cloud
VPCs are global resources, which by definition span multiple regions without using the Internet and—most
importantly—without you having to worry about connecting them somehow (e.g., using IPsec VPN tunnels,
Interconnect, or other means). This is a remarkable advantage, which will simplify your VPC network design
and will reduce implementation costs.
Regarding operational costs, you need to consider that—unlike ingress—egress traffic is charged, and
the amount of these charges depends on a number of factors including
• Whether traffic uses an internal or an external IP address
• Whether traffic crosses zones or regions
• Whether traffic stays within the Google Global Backbone or it uses the Internet
■■Note For more information about VPC pricing, refer to the official Google Cloud page: https://cloud.
google.com/vpc/pricing.
Examples
Figure 2-21 illustrates how SaaS and native tenant isolation work.
We have two consumers, each hosted in their own organization (tenant), which require the same group
of SaaS services. Since the consumers have no knowledge of each other—yet they need common, cross-
cutting SaaS services offered by the “blue” tenant (Org1)—in this scenario their VPCs have a subnet with an overlapping private CIDR block, that is, 10.130.1.0/24.
Note that this wouldn’t have happened had the two consumers been part of the same tenant (or
organization).
SaaS services are deployed in the producer VPCs, which are hosted in the “blue” organization (Org1).
Since the two consumers have an overlapping CIDR block, the producer project requires two VPCs. The
same VPC cannot be peered with multiple consumers that use overlapping IP addresses.
The consumer in the “green” tenant (Org3) can consume SaaS services deployed in a subnet in VPC2.
Likewise, the consumer in the “red” tenant (Org2) can consume SaaS services deployed in a subnet
in VPC1.
As a result, the consumer in the “green” tenant cannot access resources hosted in the “red” tenant.
Conversely, the consumer in the “red” tenant cannot access resources hosted in the “green” tenant. This
demonstrates the concept of native tenant isolation.
The infographic in Figure 2-22 shows a holistic view of all the constructs we reviewed so far, including IP
addresses, subnets, VPCs, routes, and peered VPCs.
The example in Figure 2-23 illustrates how VPC network peering can be used in conjunction with
shared VPC. Two isolated environments (i.e., production and non-production) require common shared
services, which are deployed in a producer VPC (VPC3 in Project3) and are peered with the production VPC
and the non-production VPC (consumer VPCs). The consumer VPCs are two shared VPCs, which expose
their subnets to a number of service projects (for the sake of simplicity, only one service project has been
displayed, but the number can—and should—be greater than one).
Figure 2-23. Multiple shared, peered VPCs with multiple service projects
This reference architecture provides the dual benefit of promoting reuse of common capabilities
(e.g., common, cross-cutting, SaaS services) through VPC network peering and separating concerns through
shared VPCs.
Firewalls
Similarly to your data center's DMZ (demilitarized zone), each VPC network has a built-in firewall that
blocks all incoming (ingress) traffic from outside of the VPC to its VMs and compute resources (e.g., GKE
pods, etc.). You can configure firewall rules to override this default behavior, which is designed to protect
your VPC from untrusted clients in the outside world, that is, the Internet, other VPCs, or other sources
located in other clouds or on-premises.
Unlike traditional DMZs, where a cluster of firewalls separates each trusted zone (as shown in
Figure 2-24), GCP firewalls are globally distributed to allow for resilience and scalability. As a result, there are no
choke points and no single points of failure. GCP firewalls are characterized by the following:
• VPC scope: By default, firewall rules are applied to the whole VPC network.
• Network tag scope: Filtering can be accomplished by applying firewall rules to a set
of VMs by tagging the VMs with a network tag.
• Service account scope: Firewall rules can also be applied to a set of VMs that are
associated with a specific service account. You will need to indicate whether the
service account and the VMs are billed in the same project and choose the service
account name in the source/target service account field.
• Internal traffic: You can also use firewall rules to control internal traffic between
VMs by defining a set of permitted source machines in the rule.
Firewall rules are a flexible means to enforce network perimeter control by using ingress/egress
directions, allow/deny actions with priorities (0 highest priority, 65535 lowest priority), and an expressive
way to denote your sources and targets in the form of CIDR blocks, network tags, or service accounts.
■■Exam tip You cannot delete the implied rules, but you can override them with your own rules.
Google Cloud always blocks some traffic, regardless of firewall rules; for more information, see blocked traffic:
https://cloud.google.com/vpc/docs/firewalls#blockedtraffic.
To monitor which firewall rule allowed or denied a particular connection, see Firewall Rules Logging:
https://cloud.google.com/vpc/docs/firewall-rules-logging.
Example
The example in Figure 2-25 illustrates how two firewall rules can be used to
1. Allow a Global HTTPS load balancer Front End (GFE) to access its backend,
represented by a managed instance group with VMs denoted by the web-server
network tag
2. Deny the Global HTTPS load balancer Front End (GFE) direct access to the VMs
in a database server farm, denoted by the db-server network tag
Notice the separation of concerns achieved by using a shared VPC model, where the backend instances
are billed in their own service project ServiceProject1, which is connected to the shared VPC SharedVPC1
hosted in HostProject.
Likewise, the database instances are billed in their own service project ServiceProject2, which is also
connected to the shared VPC.
The host project handles the network administration of the subnets exposed to the two service projects.
This includes the network security aspects, which are described by the two aforementioned firewall rules.
The firewall table in the figure is deliberately extended to the width of the shared VPC to emphasize its global
distribution scope, which applies to the entire VPC network.
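To make this concrete, the two rules from Figure 2-25 could be expressed with gcloud roughly as follows. This is a minimal sketch: the rule names, network name, port, and priority are illustrative, while the source ranges are the documented GFE ranges (130.211.0.0/22 and 35.191.0.0/16):

    # Allow the GFE to reach the web servers (names and port are illustrative).
    gcloud compute firewall-rules create allow-gfe-to-web \
        --network=sharedvpc1 \
        --direction=INGRESS --action=ALLOW --rules=tcp:443 \
        --source-ranges=130.211.0.0/22,35.191.0.0/16 \
        --target-tags=web-server

    # Deny the GFE direct access to the database servers.
    gcloud compute firewall-rules create deny-gfe-to-db \
        --network=sharedvpc1 \
        --direction=INGRESS --action=DENY --rules=all \
        --source-ranges=130.211.0.0/22,35.191.0.0/16 \
        --target-tags=db-server \
        --priority=900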
More details about firewall configurations and how to effectively use them to secure the perimeter of
your VPC will be provided in the next chapters.
Custom Routes
One common problem you need to address when designing your VPC network instances is how to link them
together efficiently and securely.
In your data center, network interconnectivity requires switches, hubs, bridges, and routers, to
name a few. All these hardware components need to be multiplied as the number of networks grows in
response to increasing traffic. As the networks grow, the associated capital and operational costs may grow
disproportionately.
Google Cloud Platform leverages software-defined routing to address these problems. By design, every
VPC network uses a scalable, distributed, virtual routing mechanism, which is defined as a routing table and
operates at the VPC network level, just like firewall rules.
Moreover, each VM has a controller, which is a built-in component aware of all the routes from the VPC
routing table. Each packet leaving a VM is delivered to the appropriate next hop of an applicable route based
on a routing order. When you add or delete a route, the set of changes is propagated to the VM controllers by
using an eventually consistent design.
VPCs come with system-generated routes, which are automatically created for you by GCP. They provide
internal routing connectivity among a VPC's subnets using the highly performant Google Global Backbone
network, as well as egress routing connectivity to the Internet via a default Internet gateway.
■■Note You still need to create an external IP address if your VMs need access to the Internet. Also, consider
extra cost for traffic egressing your VMs.
Additionally, when you establish a peering connection between two VPCs, peer routes are automatically
created to allow the two VPCs to communicate privately (RFC 1918).
Finally, for any other scenarios where system-generated or peer routes are not suitable to meet your
workload requirements—for example, hybrid or multi-cloud workloads—GCP allows you to create your own
custom routes. These can be of type static or dynamic. The former type (static route) supports a predefined
number of destinations and is best suited for simple network topologies that don’t change very often. The latter
type (dynamic route) leverages a new resource, that is, Cloud Router, which is intended to add and remove
routes automatically in response to changes. Cloud Router leverages the Border Gateway Protocol (BGP) to
exchange routes with a peer of a BGP session. More information will be provided in the next chapter.
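As a quick preview, a static route can be created with a single gcloud command. In this sketch, the route name, network, destination range, tunnel name, and region are all illustrative:

    gcloud compute routes create route-to-on-prem \
        --network=your-vpc \
        --destination-range=192.168.100.0/24 \
        --next-hop-vpn-tunnel=your-tunnel \
        --next-hop-vpn-tunnel-region=us-east1 \
        --priority=1000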
Figure 2-26 summarizes the different types of routes.
Notice how connectivity is allowed between public cloud providers in a multi-cloud topology. An
example is Cloud VPN, which can be configured to establish IPsec tunnels between GCP and Amazon Web
Services. You will learn more about Cloud VPN in the upcoming sections.
Business Requirements
Common requirements and drivers from the business side include
• Reducing capex or general IT spending
• Increasing flexibility and agility to respond better to changing market demands
• Building out capabilities, such as advanced analytics services, that might be difficult
to implement in existing environments
• Improving quality and availability of service
• Improving transparency regarding costs and resource consumption
• Complying with laws and regulations about data sovereignty
• Avoiding or reducing vendor lock-in
Development Requirements
From a development standpoint, common requirements and drivers include
• Automating and accelerating workload rollouts to achieve faster time-to-market and
shorter cycle times
• Leveraging APIs and services to increase development velocity
• Accelerating the provisioning of compute and storage resources
Operational Requirements
Common requirements and drivers to consider from the operations side include
• Ensuring consistent authentication, authorization, auditing, and policies across
computing environments
• Using consistent tooling and processes to limit complexity
• Providing visibility across environments
Architectural Requirements
On the architecture side, the biggest constraints often stem from existing systems and can include
• Dependencies between workloads
• Performance and latency requirements for communication between systems
• Dependencies on hardware or operating systems that might not be available in the
public cloud
• Licensing restrictions
Overall Goals
The goal of a hybrid and multi-cloud strategy is to meet these requirements with a plan that describes
• Which workload should be run in or migrated to each computing environment
• Which pattern to apply across multiple workloads
• Which technology and network topology to use
Any hybrid and multi-cloud strategy is derived from the business requirements. However, how you
derive a usable strategy from the business requirements is rarely clear. The workloads, architecture patterns,
and technologies you choose not only depend on the business requirements but also influence each other in
a cyclic fashion. The diagram represented in Figure 2-28 illustrates this cycle.
So, you’ve done your research using the rationalization process we just described and determined
that some of your workloads—for the time being—need to operate in a hybrid (or multi-cloud) network.
Now what?
Assuming you already have created your organization in GCP, the next step is to implement your hybrid
(or multi-cloud) network topology, and to do that, you need to decide how your company’s on-premises data
center(s) will connect to GCP.
GCP offers a number of options, and the choice you need to make depends on
• Latency: For example, do you need high availability and low latency (e.g., < 2
milliseconds) for your workload?
• Cost: For example, is cost a priority, or are you willing to pay more for lower latency,
stronger security, and better resilience?
A key factor in this use case is the fact that this application is internal, that is, all users will access the
application from the company’s network (e.g., RFC 1918 IP address space) or its extension to GCP. No access
from the Internet or other external networks is allowed.
Another key factor is the fact that you need fast connectivity between your company’s data center
and GCP to effectively leverage the HSM in order to authenticate users and authorize their access to the
requested resource.
With Cloud Interconnect, traffic between your on-premises computing environment and GCP does
not traverse the Internet. Instead, a physical link connects your company’s data centers to a Google Point of
Presence (PoP), which is your entry point to the closest region and the Google Global Backbone network.
This link can be directly installed or facilitated by a service provider.
In the former scenario—Dedicated Interconnect, solution 1—customer A’s network must physically
meet the Google Global Backbone network entry point in a supported colocation facility, also known as a
Cloud Interconnect PoP. This means customer A’s peering router will be installed in the Cloud Interconnect
PoP in order to establish the physical link, which comes in two flavors, that is, 10 Gbps or 100 Gbps pipes.
In the latter scenario—Partner Interconnect, solution 2—customer B uses a service provider to connect
to the Google Global Backbone network. This may be for a number of reasons, for example, cost, that is,
customer B is not willing to lease space in the colocation facility and maintain networking equipment.
Instead, customer B would rather use a managed service provided by a partner.
■■Note Cloud Interconnect does not encrypt traffic. To help secure communication between workloads,
consider using Transport Layer Security (TLS).
■■Note On the Internet, every network is assigned an autonomous system number (ASN) that encompasses
the network's internal network infrastructure and routes. Google's primary ASN is 15169.
There are two ways a network can connect to Google without using RFC 1918 IP addresses:
1. IP transit: Buy services made available by an Internet Service Provider (ISP),
which is connected to Google (ASN 15169).
2. Peering: Connect directly to Google (ASN 15169) in one of the Google Edge PoPs
around the world.
As you can imagine, peering offers lower latency because you avoid the ISP mediation. Therefore, if
your workload requires high-throughput, low-latency, non-RFC 1918 connectivity, then peering is your
best choice.
Just like Interconnect, peering comes in two flavors, direct and carrier.
Direct peering is best suited when your company already has a footprint in one of Google’s PoPs, for
example, it already uses a Dedicated Interconnect circuit.
Carrier peering is best suited when your company chooses to let a carrier manage a peering connection
between its network and GCP.
In our reference model, solutions 3 and 4 show how customers C and D connect to GCP using direct
and carrier peering, respectively.
IPsec VPN
Both Interconnect and peering leverage connectivity between your company’s data centers and a GCP PoP
without using the Internet. This is a great benefit in terms of performance and security. By using a dedicated
communication channel, whether directly or using a carrier, you avoid extra hops while minimizing the
chances of data exfiltration.
However, this comes at the expense of high costs. As of the writing of this book, a 10 Gbps circuit price is
$2.328 per hour using Dedicated Interconnect. A 100 Gbps circuit price is $18.05 per hour. Additionally, you
need to account for costs related to VLAN (Virtual Local Area Network) attachments (which is where your
traffic exchange link is established), as well as egress costs from your VPCs to your company’s data centers
on-premises.
If you are looking for a more cost-effective solution for your private-to-private workloads and are willing
to sacrifice performance and security for cost, then a virtual private network (VPN) IPsec tunnel is the
way to go.
GCP offers a service called Cloud VPN, which provides just that. With Cloud VPN, your on-premises
resources can connect to the resources hosted in your VPC using a tunnel that traverses the Internet.
RFC 1918 traffic is encrypted and routed to its RFC 1918 destination via a router on-premises and a
router on GCP.
If you need resilience in addition to cost-effectiveness, GCP offers a highly available VPN solution,
called HA VPN, which comes with 99.99% uptime.
A complete setup of an HA VPN will be covered in Chapter 7.
■■Note What does “cloud-native” really mean? There is not a definitive answer, but a general consensus
instead. If you want to learn my understanding of “cloud-native,” see www.linkedin.com/pulse/cloud-
native-api-driven-approach-coding-dna-dario-cabianca/.
Nevertheless, GCP provides a number of hybrid and multi-cloud network connectivity products, and
your job as a GCP network engineer is to help your company’s network engineering team pick and choose
which product best suits your company’s short- and long-term visions.
The decision tree in Figure 2-31 can help you determine the first selection of GCP network connectivity
products that fit your workload network requirements.
Depending on the specifics of your workload’s network requirements, you might end up using a
combination of products.
Cloud Router
Cloud Router was briefly presented in the “Custom Routes” section, when the definition of dynamic route
was introduced. In this and the upcoming sections, we will learn more about this product and why it
represents an integral component of your multi-cloud and hybrid strategy.
First and foremost, let’s summarize a few important concepts we already know about Virtual Private
Cloud (VPC) networks and how a Google Cloud VPC is different from other public cloud providers’ VPC
networks.
VPC Routing
A VPC is a logical routing domain, because
1. By design, it provides internal routing connectivity between all its subnets in
any region.
2. Internal connectivity means RFC 1918 IP addressing, that is, traffic stays within
the Google Global Backbone and does not traverse the Internet.
3. A VPC lives within one (and only one) project, which represents a container
you use for billing, securing, and grouping the Google Cloud resources for your
workload.
4. A project may have more than one VPC. In this case, the VPCs within your project
are completely disjoint from each other, and their subnets might even have
overlapping CIDR ranges, as shown in Figure 2-33. Still, they wouldn't be able
to connect to each other, unless you choose to do so (e.g., using an IPsec VPN
tunnel and a NAT to disambiguate the overlapping CIDR ranges).
Figure 2-33. Two VPCs in the same project with overlapping CIDR 10.140.0.0/24
■■Note An autonomous system (AS) is defined as a connected group of one or more IP prefixes run by one
or more network operators, which has a single and clearly defined routing policy (https://datatracker.
ietf.org/doc/html/rfc1930#section-3).
The goal of a Cloud Router is twofold. A Cloud Router exports internal routes outside its
VPC. Conversely, a Cloud Router imports external routes inside its VPC, based on information exchanged
with its BGP peer. The BGP peer can be located in Google Cloud, on-premises, or in another cloud.
An important feature of Cloud Router is that its behavior is controlled by a property of the VPC where it
lives, that is, the dynamic routing mode property. As we will learn later, if the VPC operates in global dynamic
routing mode, then all its Cloud Routers advertise to their BGP peers subnet routes from subnets in all
regions spanned by the Cloud Routers’ VPC.
In contrast, Cloud Routers in a VPC configured with regional dynamic routing mode are limited to
advertising only subnet routes from subnets in the same region as the Cloud Routers.
Figure 2-34 demonstrates this concept by showing how CloudRouter1 in GCP’s Region1 advertises
routes from Region1 and Region2 to its on-premises PeerRouter1 counterpart and how it imports routes
from PeerRouter1 related to the two on-premises data centers.
Figure 2-34. Cloud Router in a VPC configured with global dynamic routing
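A Cloud Router such as CloudRouter1 can be created with a single gcloud command. In this sketch, the region and the private ASN are illustrative:

    gcloud compute routers create cloudrouter1 \
        --network=vpc1 \
        --region=us-east1 \
        --asn=65001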
■■Note The term “topology” derives from the Greek words topos and logos, which mean “locus” (i.e., the
place where something is situated or occurs) and “study,” respectively. Therefore, in the context of network
engineering, a topology is a blueprint for networks, whose logical or physical elements are combined in
accordance with a given pattern.
In the next sections, we will review a few hybrid and multi-cloud network topologies, which will help
you execute your workload migration or modernization strategy. The term “private computing environment”
will be used to denote your company on-premises data centers or another cloud.
Mirrored
In a mirrored topology, as the name suggests, you mirror your cloud computing environment into your private
one. Notice in Figure 2-35 how your workload infrastructure is located in its own shared VPC, which is separated
from its automation components (CI/CD (Continuous Integration/Continuous Deployment) pipelines). These
are located in another VPC, which connects to your private computing environment using RFC 1918 IP space.
Eventually, the production network will “relocate” to a VPC in Google Cloud when the migration of your
workload is completed.
Meshed
A meshed topology is the simplest way to extend your network from your company’s data center(s) into
Google Cloud or from another public cloud provider. The outcome of this topology is a network that
encompasses all your computing environments, whether they be in Google Cloud, on-premises, or in
other clouds.
This topology is best suited when your hybrid or multi-cloud workloads require private (RFC 1918)
cross-environment connectivity.
The design in Figure 2-36 shows an example of a meshed topology that connects a Google Cloud
VPC (VPC1) to a VPC in another cloud (VPC2) by leveraging a Dedicated Cloud Interconnect link.
As you learned before, a key difference between Interconnect and IPsec VPN is that the latter uses the
Internet or direct peering as a communication channel.
As a result, a secure tunnel between the two clouds must be established in order to ensure
authentication, confidentiality, and integrity of the data exchanged between the components of your multi-
cloud workload. This is where Cloud VPN and its counterpart peer VPN gateway in the other public cloud
provider come into play.
The two VPN gateways need to expose a public IP address in order to communicate over the Internet
(or direct peering), and they also need to be configured to establish a security association between them. A
security association may include attributes such as cryptographic algorithm and mode, traffic encryption
key, and parameters for the network data to be passed over the connection. More details about this setup
will be provided in Chapter 7.
Additionally, if VPC1 has been configured to use global dynamic routing mode, only one Cloud Router is
needed. In the other public cloud provider, you should expect one VPN gateway per region.
Gated Egress
A gated topology is intended to privately expose APIs to your workloads.
With a gated egress topology, you want to expose APIs to your GCP workloads, which act as consumers.
This is typically achieved by deploying an API gateway in your private computing environment to act as a
façade for your workloads within.
Figure 2-38 displays this setup, with the API gateway acting as a façade and placed in the private
computing environment, whereas the consumers are GCP VMs hosted in service projects of a shared VPC.
All traffic uses RFC 1918 IP addresses, and communication from your private computing environment
to GCP (i.e., ingress with respect to GCP) is not allowed.
Gated Ingress
Conversely, a gated ingress topology is intended to expose APIs to your private computing environment
workloads, which act as consumers.
As illustrated in Figure 2-39, this time the API gateway is deployed in GCP and consists of a group of
VMs (or virtual network appliances), each equipped with two network interfaces:
1. eth0: Connected to a Transit VPC, which receives incoming RFC 1918 traffic from
the private computing environment
2. eth1: Connected to a Shared VPC, where your workloads operate
An internal load balancer (ILB) is placed in the Transit VPC to balance incoming traffic across the VMs.
All traffic uses RFC 1918 IP addresses, and communication from GCP to your private computing
environment (i.e., egress with respect to GCP) is not allowed.
All inbound and outbound traffic—with respect to GCP—uses RFC 1918 IP addresses, and firewall rules
need to be configured in the Transit VPC and in the DMZ VLAN (also known as Perimeter VLAN) to only
allow API consumption and nothing else.
Figure 2-40 shows this setup, which is obtained by combining gated egress (Figure 2-38) and gated
ingress (Figure 2-39).
Figure 2-40. Reference architecture for gated egress and ingress topology
Handover
This topology is best suited for hybrid or multi-cloud analytics workloads. The approach leverages the
broad spectrum of big data services offered by Google Cloud in order to ingest, store, process, analyze, and
visualize data originated in your private compute environment.
Since there is no private connectivity requirement across environments, it is recommended to use
direct or carrier peering connectivity, which provides high throughput and low latency.
Data pipelines are established to load data (in batches or in real time) from your private compute
environment into a Google Cloud Storage bucket or a Pub/Sub topic. GCP workloads (e.g., Hadoop) process
the data and deliver it in the proper format and channel.
To prevent data exfiltration, it is also recommended to create a VPC service perimeter, as shown in
Figure 2-41, that protects the projects and the service APIs and allows access to the ETL (Extract, Transform,
Load) workloads responsible for data ingestion and processing.
As you learned in the “Custom Routes” section, each VM has a built-in controller, which is aware of
all the routes in the VPC routing table. The value you assign to the bgp-routing-mode flag will update the
controller for each VM in the VPC as explained in the following text.
The MODE value must be one of the following:
• Regional: This is the default option. With this setting, all Cloud Routers in this VPC
advertise to their BGP peers subnets from their local region only and program VMs
with the router's best-learned BGP routes in their local region only.
• Global: With this setting, all Cloud Routers in this VPC advertise to their BGP peers
all subnets from all regions and program VMs with the router's best learned BGP
routes in all regions.
You can set the bgp-routing-mode flag any time, that is, while creating or updating your VPC network.
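For example, the following commands (with an illustrative network name) show both options:

    # Create a custom-mode VPC with regional dynamic routing (the default).
    gcloud compute networks create your-vpc \
        --subnet-mode=custom \
        --bgp-routing-mode=regional

    # Switch the existing VPC to global dynamic routing.
    gcloud compute networks update your-vpc \
        --bgp-routing-mode=global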
■■Note Changing the dynamic routing mode has the potential to interrupt traffic in your VPC or enable/
disable routes in unexpected ways. Carefully review the role of each Cloud Router before changing the dynamic
routing mode.
Redundant VPC
A redundant VPC leverages a combination of Interconnect links and IPsec tunnels to achieve failover and
high availability for your hybrid workloads.
The example in Figure 2-43 illustrates a reference topology of a hybrid setup, where a multi-regional
VPC is extended to two data centers on-premises.
The setup uses two Dedicated Interconnect links as a primary connection and two IPsec tunnels as a
secondary connection in the event the customer-managed peering router becomes unavailable.
For egress route advertisement, there are a number of important points to highlight:
1. The four Cloud Router instances R1, R2, R3, R4 are distributed in two regions.
2. VPC1 is configured to use global dynamic routing mode.
3. As a result of point #2, R1, R2, R3, R4 advertise routes for all VPC subnets, that is,
Subnet1 in Region1 and Subnet2 in Region2.
4. R3, R4 advertise routes using Dedicated Interconnect. This is the primary network
connectivity product selected for this reference topology.
5. R1, R2 advertise routes using Cloud VPN. This is the secondary (backup) network
connectivity product selected for this reference topology.
6. Cloud Router instances are automatically configured to add an inter-region cost
when they advertise subnet routes for subnets outside of their region. This value
(e.g., 103 in our reference topology) is automatically added to the advertised
route priority, the MED (Multi-Exit Discriminator). The higher the MED, the
lower the advertised route priority. This behavior ensures optimal path selection
when multiple routes are available.
7. When the on-premises BGP peer routers PeerRouter1 and PeerRouter2 learn
about Subnet2 routes in Region2, they favor routes using Dedicated Interconnect
rather than VPN because, as stated in point #6, routes advertised by R3, R4
using Dedicated Interconnect have MED values (100) lower than the MED values for
routes advertised by R1, R2 (203 = 100 + 103), resulting in higher priorities.
Let’s review now how this reference topology provides failover and disaster recovery for ingress route
advertisement.
In this scenario (Figure 2-44), the advertised route priority can be left equal (e.g., MED=100) for all
advertised on-premises subnet routes.
By using the same MED, Google Cloud automatically adds the inter-region cost (103) when R1, R2 in
Region1 program routes in VPC1 whose destinations are in Subnet2 ranges.
As a result, on-premises route advertisements are load-balanced across the four instances of Cloud
Router, that is, R1, R2 (in Region1) and R3, R4 (in Region2). However, if the destination is an IP range in
Subnet2, that is, 10.140.1.0/24, then Interconnect links are favored because they have a lower MED, resulting
in higher priority.
■■Note Most, but not all, Google Cloud services expose their API for internal access. The list of supported
services can be found here: https://cloud.google.com/vpc/docs/configure-private-service-
connect-apis#supported-apis.
Put differently, a VM no longer needs an external IP address in order to consume Google APIs and
services, provided the subnet it belongs to has been configured to use Private Google Access and a few DNS and
route configurations are made.
In Figure 2-45, an on-premises network is connected to a shared VPC (VPC1) through a Cloud VPN
tunnel. Traffic from on-premises hosts to Google APIs travels through the IPsec tunnel to VPC1 and then is
sent through a custom route (goto-apis), which uses the default Internet gateway as its next hop (step 1).
This next hop allows traffic to leave the VPC network and be delivered to private.googleapis.com
(199.36.153.8/30).
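A minimal sketch of this custom route, using the names from the figure, might look like the following:

    gcloud compute routes create goto-apis \
        --network=vpc1 \
        --destination-range=199.36.153.8/30 \
        --next-hop-gateway=default-internet-gateway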
Keep in mind that with this setup, the on-premises DNS server has to be configured to map
*.googleapis.com requests to private.googleapis.com, which resolves to the 199.36.153.8/30 IP
address range.
You can resolve the private.googleapis.com and restricted.googleapis.com domains from any
computer connected to the Internet, as shown in Figure 2-46.
Figure 2-46. The public DNS A records for private.googleapis.com and restricted.googleapis.com
Also, the Cloud Router instance CloudRouter1 has to be configured to advertise the 199.36.153.8/30 IP
address range through the Cloud VPN tunnel by using a custom route advertisement (step 2). Traffic going to
Google APIs is routed through the IPsec tunnel to VPC1.
There are a few additional points to highlight:
1. VM1A, VM1B, VM1C, VM1D can access Bucket1 by consuming the
private.googleapis.com endpoint. This is because they all share access to subnet1,
which is configured to allow Private Google Access.
2. VM2B can also access Bucket1 by consuming the private.googleapis.com
endpoint. This time, as you may have noticed, VM2B shares access to subnet2,
which is configured to deny Private Google Access. However, VM2B can leverage
its external IP address to access Bucket1. Even with its external IP address, traffic
remains within Google Cloud without traversing the Internet.
3. All VMs that share access to subnet2 and do not have an external IP address (i.e.,
VM2A, VM2C, VM2D) cannot access Bucket1. This is because subnet2 is configured
to deny Private Google Access.
Similarly to the Private Google Access example, a new custom route goto-apis must be created, this
time by setting the --next-hop-address flag to the PSC endpoint IPv4 address 10.10.110.10 instead of the
default Internet gateway, which has been removed. CloudRouter1 must subsequently advertise this newly
created custom route so that its on-premises peer router is aware of it.
Also, the PSC endpoint must be located in the VPC connected to the on-premises network—in this
example, VPC1. This is another requirement for on-premises VMs, for example, VM3A, VM4A, to consume data
stored in Bucket1 using the PSC endpoint.
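For reference, a PSC endpoint for Google APIs is built from a reserved internal address and a global forwarding rule. The following is a sketch using the endpoint IP from our example; the address name and endpoint name are illustrative:

    # Reserve the internal IPv4 address for the PSC endpoint.
    gcloud compute addresses create psc-endpoint-ip \
        --global \
        --purpose=PRIVATE_SERVICE_CONNECT \
        --addresses=10.10.110.10 \
        --network=vpc1

    # Create the endpoint itself as a global forwarding rule
    # targeting the all-apis bundle.
    gcloud compute forwarding-rules create pscendpoint \
        --global \
        --network=vpc1 \
        --address=psc-endpoint-ip \
        --target-google-apis-bundle=all-apis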
Finally, you must configure your on-premises DNS so that it can make queries to your private DNS
zones. If you've implemented the private DNS zones using Cloud DNS, then you need to complete the
following steps:
1. In VPC1, create an inbound server policy.
2. In VPC1, identify the inbound forwarder entry points in the regions where your
Cloud VPN tunnels or Cloud Interconnect attachments (VLANs) are located.
3. Configure on-premises DNS name servers to forward the DNS names for the PSC
endpoints to an inbound forwarder entry point in the same region as the Cloud
VPN tunnel or Cloud Interconnect attachment (VLAN) that connects to VPC1.
■■Note With PSC, the target service or API does not need to be hosted or managed by Google. A third-
party service provider can be used instead. This point and the fact that PSC is fully managed make PSC the
recommended way to let workloads consume services and APIs privately.
An early allocation of IP ranges for the pods in your cluster has a twofold benefit: preventing conflict
with other resources in your cluster’s VPC network and allocating IP addresses efficiently. For this reason,
VPC-native is the default network mode for all clusters in GKE versions 1.21.0-gke.1500 and later.
In CIDR notation, the mask is denoted by the number after the “/”, that is, /x where 0 ≤ x ≤ 32.
Assuming you have proper IAM permissions, the preceding code will create a new subnet your-subnet
in the region us-west1 and a standard GKE cluster in the zone us-west1-a. The cluster is VPC-native because
the flag --enable-ip-alias is set, resulting in two secondary ranges in the subnet your-subnet, one for the
pod IPs and another for the service IPs.
The cluster will use the IP range 10.4.32.0/28 for its worker nodes, the IP range 10.0.0.0/24 for its
pods, and the IP range 10.4.0.0/25 for its services.
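Since the original commands appear only as a figure, here is a hedged reconstruction of what such a setup could look like with gcloud; the VPC name and the secondary range names are assumptions, while the IP ranges match the ones above:

    # Subnet with named secondary ranges for pods and services.
    gcloud compute networks subnets create your-subnet \
        --network=your-vpc \
        --region=us-west1 \
        --range=10.4.32.0/28 \
        --secondary-range=pods=10.0.0.0/24,services=10.4.0.0/25

    # VPC-native cluster that uses those secondary ranges.
    gcloud container clusters create your-cluster \
        --zone=us-west1-a \
        --network=your-vpc \
        --subnetwork=your-subnet \
        --enable-ip-alias \
        --cluster-secondary-range-name=pods \
        --services-secondary-range-name=services \
        --default-max-pods-per-node=16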
Finally, each worker node will host no more than 16 pods, as illustrated in Figure 2-49.
Now, let’s say you want to decrease the pod density from 16 to 8. This may be because your
performance metrics showed you don’t actually need 16 pods per node to meet your business and technical
requirements. Reducing the pod density will allow you to make better use of your preallocated IP range
10.0.0.0/24 for pods.
You can reduce the pod density from 16 to 8 by creating a node pool in your existing cluster:
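A sketch of such a command follows; the node pool name and zone are illustrative:

    gcloud container node-pools create pool-low-density \
        --cluster=your-cluster \
        --zone=us-west1-a \
        --max-pods-per-node=8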
The preceding code overrides the previously defined cluster-level default maximum of 16 pods per
node with a new value of 8 pods per node.
In Figure 2-50, you can visually spot how the reduced pod density applies to the newly created
node pool.
When you introduce node pools to reduce the pod density of an existing cluster, GKE automatically
assigns a smaller CIDR block to each node in the node pool based on the value of max-pods-per-node. In
our example, GKE has assigned /28 blocks to Node0, Node1, Node2, Node3.
■■Exam tip You can set the maximum number of pods per node only at cluster creation time or, for an
existing cluster, when you create a node pool.
Non-RFC 1918
The first way to expand your GKE cluster’s IP ranges is by using non-RFC 1918 reserved ranges.
Figure 2-51 shows this list of non-RFC 1918 ranges.
From a routing perspective, these IP addresses are treated like RFC 1918 addresses (i.e., 10.0.0.0/8,
172.16.0.0/12, 192.168.0.0/16), and subnet routes for these ranges are exchanged by default over VPC
peering.
However, since these addresses are reserved, they are not advertised over the Internet, and when you
use them, the traffic stays within your GKE cluster and your VPC network.
Figure 2-53. GKE private cluster with limited public endpoint access
Subsequently, you leverage the --master-authorized-networks flag to selectively list the CIDR blocks
authorized to access your cluster control plane.
Figure 2-53 illustrates this configuration: the non-RFC 1918 CIDR block 198.37.218.0/24 is authorized
to access the control plane, whereas the client 198.37.200.103 is denied access.
This is a good choice if you need to administer your private cluster from source networks that are not
connected to your cluster's VPC network using Cloud Interconnect or Cloud VPN.
Figure 2-54. GKE private cluster with unlimited public endpoint access
Summary
This chapter walked you through the important things you need to consider when designing, planning, and
prototyping a network in Google Cloud.
The key drivers that shape the design of the overall network architecture were introduced, that is,
high availability, resilience, performance, security, and cost. You learned how tweaking your workload’s
nonfunctional requirements has an impact on the resulting network architecture.
We introduced the construct of a Virtual Private Cloud network (VPC), along with its foundational
components (private and public IP addresses), its location components (subnets, routes), and its security
components (firewall rules). You learned how to design VPCs to meet your workload business and technical
requirements, by leveraging the global, highly performant, and highly optimized Google Cloud backbone.
Whether your workloads are cloud-native (born in the cloud), or they are being migrated from your
company on-premises data centers (hybrid), or even from other clouds (multi-cloud), you learned how
Google Cloud provides products, services, and reference architectures to meet your needs.
Last, you learned how Google Cloud, as the creator of Kubernetes, offers a diverse set of unique features
that help you choose the best network capabilities for your containerized workloads, including VPC-native
clusters, container-native load balancing, flexible pod density, and private clusters.
In the next chapter, we will deep dive into VPC networks and introduce the tools you need to build the
VPCs for your workloads.
Exam Questions
Question 2.1 (VPC Peering)
Your company just moved to GCP. You configured separate VPC networks for the Finance and Sales
departments. Finance needs access to some resources that are part of the Sales VPC. You want to allow the
private RFC 1918 address space traffic to flow between Sales and Finance VPCs without any additional cost
and without compromising the security or performance. What should you do?
A. Create a VPN tunnel between the two VPCs.
B. Configure VPC peering between the two VPCs.
C. Add a route on both VPCs to route traffic over the Internet.
D. Create an Interconnect connection to access the resources.
Rationale
A is not correct because VPN will hinder the performance and will add
additional cost.
B is CORRECT because VPC network peering allows traffic to flow
between two VPC networks over private RFC 1918 address space without
compromising the security or performance at no additional cost.
C is not correct because RFC 1918 is a private address space and cannot be
routed via public Internet.
D is not correct because Interconnect will cost a lot more to do the same work.
B. Use nslookup -q=TXT spf.google.com to obtain the API IP endpoints used
for Cloud Storage and BigQuery from Google’s netblock. Configure Cloud
Router to advertise these netblocks to your on-premises router using a
flexible routing advertisement. Use gsutil cp files gs://bucketname and
bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].
[TABLE] [PATH_TO_SOURCE] [SCHEMA] on-premises to transfer data to Cloud
Storage and BigQuery.
C. Configure Cloud Router (in your GCP project) to advertise 199.36.153.4/30
to your on-premises router using a flexible routing advertisement (BGP).
Modify your on-premises DNS server CNAME entry from *.googleapis.com
to restricted.googleapis.com. Use gsutil cp files gs://bucketname and
bq --location=[LOCATION] load --source_format=[FORMAT] [DATASET].
[TABLE] [PATH_TO_SOURCE] [SCHEMA] on-premises to transfer data to Cloud
Storage and BigQuery.
D. Use gsutil cp files gs://bucketname and bq --location=[LOCATION]
load --source_format=[FORMAT] [DATASET].[TABLE] [PATH_TO_SOURCE]
[SCHEMA] on-premises to transfer data to Cloud Storage and BigQuery.
Rationale
A is not correct because it adds additional operational complexity and
introduces a single point of failure (Instance) to transfer data. This is not Google-
recommended practice for on-premises private API access.
B is not correct because these netblocks can change, and there is no guarantee
these APIs will not move to different netblocks.
C is CORRECT because it enables on-premises Private Google Access,
allowing VPN and Interconnect customers to reach APIs such as BigQuery
and Google Cloud Storage natively across an Interconnect/VPN connection.
The CIDR block 199.36.153.4/30 is obtained when you try to resolve
restricted.googleapis.com. You need this CIDR block when adding a custom
static route to enable access to Google-managed services that VPC Service
Controls supports. Google Cloud Storage and BigQuery APIs are eligible
services to secure the VPC perimeter using VPC Service Controls. Therefore,
the CNAME type DNS records should resolve to restricted.googleapis.com.
D is not correct because it will utilize an available Internet link to transfer data
(if there is one). This will not satisfy the security requirement of using the VPN
connection to the cloud.
CHAPTER 3

Implementing Virtual Private Cloud Instances
In the previous chapter, you learned about Virtual Private Cloud networks (VPCs) and how to design one
or more VPCs from a set of business and technical requirements for your workload. We showed how every
detail of your VPC design and connectivity traces back to one or more requirements.
The well-architected framework was also introduced as a tool to guide you through the GCP services
and network tier (premium or standard) decision-making process. Think of the collection of all GCP
services, along with their “flavors” and all their possible configurations as a palette. Your workload business
requirements will tell you what should be drawn in your painting. Your workload technical requirements will
tell you for each element in your painting what mix of color, shade, and pattern to use. The well-architected
framework will help you further choose the optimal combination of shapes, colors, shades, and patterns so
the final painting will look great.
At this point, the overall network architecture and topology should have been designed. With reference
to the painting analogy, you should have your canvas, your shapes, and colors selected.
In this chapter, we will take a step further and will teach you how to build your VPCs, and most
importantly we will teach you how to establish connectivity among your VPCs and other networks, in
accordance with the topologies developed in Chapter 2.
Configuring VPCs
Whether your workload runs entirely on GCP, in a hybrid, or in a multi-cloud environment, you usually start
small, with a VPC whose subnets are located in a single region.
As your workload receives more traffic, your VPC may need to gradually expand to more zones and
eventually to other regions in order for your workload to remain performant and resilient.
However, with GCP the original network configuration just seamlessly works. This is because VPCs are
designed for scale and extensibility.
For example, you can extend the IP address range of a subnet anytime. You can even add a new subnet
after a VPC has been created. When you add a new subnet, you specify a name, a region, and at least a
primary IP address range according to the subnet rules.
There are some constraints each subnet must satisfy, for example, no overlapping IP address ranges
within the same VPC, unique names if the subnets are in the same region and project, and others we will
review later. The idea is that you can scale and extend your VPC with few configuration changes and, most
importantly, without needing to recreate it.
■■Note To learn how to install the gcloud CLI on your machine, see https://cloud.google.com/sdk/
gcloud#download_and_install_the.
Last, the gcloud command line is less likely to change than other tools, and typically, changes are
backward compatible.
Figure 3-1 shows how to check the gcloud CLI version. Google maintains the gcloud CLI and all the
software available when you use a Cloud Shell terminal.
Figure 3-1. Using the gcloud CLI from the Google Cloud Shell
In the upcoming sections, you will learn how to create a VPC and how to configure some of its key
components, that is, subnets, firewall rules, and routes.
Creating VPCs
In Google Cloud, VPCs come in two flavors: auto mode (default) and custom mode.
Auto-mode VPCs automatically create a subnet in each region for you. You don’t even have to worry
about assigning an IP range to each subnet. GCP will automatically assign an RFC 1918 IP range from a
predefined pool of RFC 1918 IP addresses.
If you want more control in the selection of subnets for your VPC and their IP ranges, then use custom-
mode VPCs. With custom-mode VPCs, you first create your VPC, and then you manually add subnets to it as
documented as follows.
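For example, a custom-mode VPC can be created with a single command; the network name here matches the one used in the figures that follow:

    gcloud compute networks create your-first-vpc \
        --subnet-mode=custom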
■■Note In GCP, IP ranges are assigned on a per-subnet basis. As a result, you don’t need to allocate IP ranges
until you add subnets to your custom-mode VPC.
■■Exam tip Every Google Cloud new project comes with a default network (an auto-mode VPC) that has
one subnet in each region. The subnet CIDR blocks have IPv4 ranges only and are automatically assigned for
you. The subnets and all subnet ranges fit inside the 10.128.0.0/9 CIDR block. You will need to remember
this CIDR block for the exam. It is best practice to create your own VPCs in custom mode rather than using the
built-in default auto-mode VPC. This is because the default VPC is a very large VPC (it has one subnet per
region) with little flexibility with respect to CIDR blocks and other networking aspects. Use the default VPC for
experimentation only, when you need a VPC “ready-to-go” to quickly test a feature. For all other scenarios, use
custom-mode VPCs.
Creating Subnets
As shown in Figure 3-3, to create a subnet you must specify at a minimum the VPC the subnet is a part
of (you learned in Chapter 2 that subnets are partitions of a VPC), as well as the primary IP range in CIDR
notation.
The gcloud command in Figure 3-3 added a subnet named your-subnet-1 with a primary CIDR block
of 192.168.0.0/27 to the VPC your-first-vpc. Notice that you were asked to specify a region. This happens
when the CLI doesn’t know what the default region for your account is.
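For reference, the equivalent command looks like the following; the region shown here is an assumption, since the figure only indicates that a region had to be supplied:

    gcloud compute networks subnets create your-subnet-1 \
        --network=your-first-vpc \
        --range=192.168.0.0/27 \
        --region=us-east1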
■■Exam tip You cannot use the first two and the last two IP addresses in a subnet’s primary IPv4 range.
This is because these four IP addresses are reserved by Google for internal use. In our preceding example, you
cannot use 192.168.0.0 (network), 192.168.0.1 (default gateway), 192.168.0.30 (reserved by Google for
future use), and 192.168.0.31 (broadcast). The same constraint does not apply to secondary IP ranges of a
subnet (see the next paragraph for more information about secondary ranges).
In addition to a primary IP range, a subnet can optionally be assigned up to two secondary IP ranges.
You learned in Chapter 2 that the first and the second secondary IP ranges are used by GKE to assign IP
addresses to its pods and its services, respectively. More information about secondary IP ranges will be
provided in the upcoming sections.
Listing Subnets
You can list the subnets of your VPC as displayed in Figure 3-4.
Listing VPCs
Likewise, you can list the VPCs in your GCP project as shown in Figure 3-5.
Notice the list shows your-first-vpc and the default VPC, which has one subnet in each region.
Deleting VPCs
If you want to delete a VPC, you need to make sure any resource that uses the VPC has been deleted first.
For example, the command in Figure 3-6 fails because your-first-vpc contains the subnet your-
subnet-1, which has not been deleted yet.
As a result, first we need to delete your-subnet-1, and only afterward we can delete your-first-vpc, as
illustrated in Figure 3-7.
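The two deletion commands follow this pattern; the region is an assumption:

    gcloud compute networks subnets delete your-subnet-1 \
        --region=us-east1
    gcloud compute networks delete your-first-vpc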
Figure 3-8. Creating the first VPC and the first subnet
■■Note Unlike the default VPC, custom VPCs require that you explicitly add firewall rules to allow ssh
(Secure Shell) or rdp (Remote Desktop Protocol) access to the VPC.
Figure 3-13 shows the current setup. The other three VMs shown in each subnet are for illustrative
purposes to emphasize the internal routing capability of a VPC.
Figure 3-13. A project containing two VPCs with eight VMs in each VPC
To test connectivity between the two VMs, we first need to be able to connect to each VM, and for this
to happen, we need to create an ingress firewall rule for each VPC to allow access to the VMs using the SSH
(Secure Shell) protocol.
Figure 3-14 illustrates the creation of the two firewall rules. Notice the direction (ingress or egress) is
omitted because ingress is the default value. Also, as you will learn later in this chapter, firewall rules apply
to the entire VPC.
Figure 3-14. Enabling ssh and ICMP (Internet Control Message Protocol) to the two VPCs
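A hedged sketch of one of the two rules, for vpc1, is shown below; the rule name and the open-to-the-world source range are illustrative, and an analogous rule is needed for vpc2:

    gcloud compute firewall-rules create vpc1-allow-ssh-icmp \
        --network=vpc1 \
        --allow=tcp:22,icmp \
        --source-ranges=0.0.0.0/0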
Let’s now log in to vm1 and test connectivity to vm2. As you can see in Figure 3-15, the ping command
will eventually time out because the two VPCs where vm1 and vm2 reside are completely disjoint.
Now, let’s peer the two VPCs as shown in Figures 3-16 and 3-17.
Since the peering relationship is symmetrical, we also need to peer vpc2 to vpc1.
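A sketch of the two peering commands, assuming both VPCs live in the same project and using illustrative peering names:

    gcloud compute networks peerings create peer-vpc1-to-vpc2 \
        --network=vpc1 \
        --peer-network=vpc2

    gcloud compute networks peerings create peer-vpc2-to-vpc1 \
        --network=vpc2 \
        --peer-network=vpc1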
Once the peering has been established, vm1 can ping vm2 and vice versa, as you can see in Figure 3-18.
In this exercise, user [email protected] will be principal one, that is, a user with Shared VPC
administration access at the organization level. Users [email protected] and [email protected]
will be principals two and three, respectively.
Figures 3-21 to 3-23 illustrate the IAM allow policy setup for principal [email protected].
Figures 3-24 and 3-25 illustrate the IAM allow policy setup for principals [email protected] and
[email protected], respectively.
To scope IAM allow policies at the organization level, we need first to obtain the organization ID, as
described in Figure 3-20.
Last, in order to test connectivity from the subnets, we need to allow incoming traffic using the SSH,
TCP, and ICMP protocols. Firewall rules are defined for the whole VPC. As a result, they apply to all its
subnets. Figure 3-29 illustrates the creation of such a firewall rule.
Figure 3-30. Creating the two service projects frontend-devs and backend-devs
Make sure each of the two newly created projects is linked to a billing account. Remember that a project
can only be linked to one billing account. Also, remember that a billing account pays for a project, which
owns Google Cloud resources.
Figure 3-31 shows how to link the two newly created service projects to a billing account. Notice how
the project IDs (frontend-devs-7734, backend-devs-7736) and not the project names (frontend-devs,
backend-devs) are required.
Also, the billing account ID has been redacted, given the sensitive nature of this data.
For more information on this command, visit https://cloud.google.com/sdk/gcloud/reference/
alpha/billing/accounts/projects/link.
Figure 3-32. Enabling the compute API to service and host projects
The intent of this use case is to show you how to configure a shared VPC with two subnets that are
essentially mutually exclusive. As a result, principals who have permissions to create compute resources in
the subnet-frontend subnet will not be able to create compute resources in the subnet-backend subnet and
vice versa.
Likewise, all compute resources attached to subnet-frontend will be billed to the billing account
associated to the frontend-devs project, and all compute resources attached to subnet-backend will be
billed to the billing account associated to the backend-devs project. These two billing accounts may be the
same, although this is not required.
To associate a project to a host project, use the gcloud compute shared-vpc associated-projects
add command as illustrated in Figure 3-36.
Upon completion of the preceding command, the two projects frontend-devs and backend-devs are
officially service projects of a shared VPC.
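For reference, the host project must first be enabled as a Shared VPC host, and each service project is then attached to it. A sketch using the project IDs from this exercise:

    gcloud compute shared-vpc enable vpc-host-nonprod-pu645-uh372

    gcloud compute shared-vpc associated-projects add frontend-devs-7734 \
        --host-project=vpc-host-nonprod-pu645-uh372
    gcloud compute shared-vpc associated-projects add backend-devs-7736 \
        --host-project=vpc-host-nonprod-pu645-uh372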
As you can see by editing the JSON file (Figure 3-38), no role bindings are present in this IAM allow
policy. This means access to the GCP resource subnet-frontend is implicitly denied for anyone.
Therefore, we are going to add a new role binding that maps the principal [email protected] to the
IAM role roles/compute.networkUser. This role allows service owners to create VMs in a subnet of a shared
VPC as you will see shortly.
■■Note You should have a basic understanding of Google Cloud IAM allow policies. For more details, visit
https://cloud.google.com/iam/docs/policies.
Last, let’s apply this IAM allow policy to our resource. This can be done by using the gcloud beta
compute networks subnets set-iam-policy as illustrated in Figure 3-40.
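The overall flow looks like the following sketch; the region is an assumption, and the member to add is whichever frontend developer principal you chose:

    # Download the current policy (empty bindings at first).
    gcloud beta compute networks subnets get-iam-policy subnet-frontend \
        --project=vpc-host-nonprod-pu645-uh372 \
        --region=us-east1 \
        --format=json > subnet-frontend-policy.json

    # Edit the file to add a binding such as:
    #   {"role": "roles/compute.networkUser",
    #    "members": ["user:<frontend developer email>"]}

    # Apply the updated policy to the subnet.
    gcloud beta compute networks subnets set-iam-policy subnet-frontend \
        subnet-frontend-policy.json \
        --project=vpc-host-nonprod-pu645-uh372 \
        --region=us-east1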
■■Note While editing the file subnet-frontend-policy.json, make sure you don’t use the tab character
for indentation; otherwise, the YAML parser will throw an error.
Next, let’s repeat the same procedure to ensure the principal [email protected] is the only admin
of subnet-backend (Figures 3-41 to 3-44).
Creating VMs
Next, let’s create two VMs: the first in subnet-frontend and the second in subnet-backend. Figures 3-48 and
3-49 display the VM creation.
Given the existing project boundaries, principal [email protected] has permissions to create a VM
in subnet-frontend, but not in subnet-backend.
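A sketch of the first VM creation, run as the frontend principal, follows; the zone is an assumption, and the subnet is referenced by its full path in the host project:

    gcloud compute instances create vm1 \
        --project=frontend-devs-7734 \
        --zone=us-east1-b \
        --subnet=projects/vpc-host-nonprod-pu645-uh372/regions/us-east1/subnetworks/subnet-frontend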
Verifying VM Connectivity
Finally, let’s connect via the SSH protocol to each VM and verify the two VMs can connect to each other.
Figure 3-50 illustrates how to connect with SSH. Once connected, we use the hostname Linux command
to determine the internal IP address of vm1.
Figure 3-51 shows the same process to determine the internal IP address of vm2.
As you can see, the connectivity is successful. Let’s repeat the same test to validate connectivity from
vm2 to vm1.
As shown in Figure 3-53, even though there is 20% packet loss, the test is still successful; the reported
loss is an artifact of interrupting the ping command after just a few seconds.
Deleting VMs
In order to avoid incurring unnecessary costs, if you no longer need the two VMs we just created, it is
always a good idea to delete them. This will keep your cloud costs under control and will reinforce the
concept that infrastructure in the cloud is ephemeral by nature.
Figures 3-54 and 3-55 display how to delete vm1 and vm2, respectively.
■■Exam tip Notice how gcloud asked in which zone the VMs are located. Remember, as you learned in
Chapter 2, VMs are zonal resources.
The rationale for choosing between the two API bundles is based on the level of security needed to
protect your workloads. If your security requirements mandate that you protect your workload from data
exfiltration, then your workload will need to consume the vpc-sc bundle. In most of the remaining use
cases, the all-apis bundle will suffice.
This section will teach you what you need to do to let compute resources in your VPC (e.g., VMs, GKE
clusters, serverless functions, etc.) consume the Google APIs.
■■Exam tip Egress traffic is charged based on whether the traffic uses an internal or external IP address,
whether the traffic crosses zone or region boundaries within Google Cloud, whether the traffic leaves or stays
inside Google Cloud, and the network tier of traffic that leaves Google’s network (premium or standard). For
more information on network pricing, visit https://cloud.google.com/vpc/network-pricing.
Unlike routing and firewall rules, which are scoped to the entire VPC, Private Google Access
operates on a per-subnet basis; that is, it can be toggled for a single subnet.
To enable PGA, make sure that the user updating the subnet has the permission compute.subnetworks.
setPrivateIpGoogleAccess. In our dariokart.com organization, user Gianni needs this permission, which is
contained in the role roles/compute.networkAdmin. In Figure 3-57, we update the IAM allow policy
attached to the organization with a binding that maps the user Gianni to roles/compute.networkAdmin.
Next, as shown in Figure 3-58, we update the subnet that requires PGA and validate the change.
The flag --enable-private-ip-google-access is also available when you create a subnet with the
gcloud compute networks subnets create command.
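For example, the update in Figure 3-58 looks approximately like this:

    gcloud compute networks subnets update subnet-frontend \
        --region us-east1 \
        --enable-private-ip-google-access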
■■Exam tip The key difference between PSC and PGA is that Private Google Access still uses external
IP addresses. It allows access to the external IP addresses used by App Engine and other eligible APIs and
services. PSC lets you access Google APIs via internal IP addresses instead, always keeping traffic in the Google
Global Backbone.
Let’s configure a PSC endpoint in our shared VPC to provide VMs with access to the all-apis bundle.
First and foremost, you need to make sure your network administrator has the following roles:
• roles/servicedirectory.editor
• roles/dns.admin
Similar to the role binding we added before, we update the organization IAM allow policy as shown in
Figures 3-59 and 3-60.
Next, in the project where the PSC endpoint will be created, you need to enable the Service
Directory API and the Cloud DNS API. The project we will be using is our host project vpc-host-nonprod,
whose ID is vpc-host-nonprod-pu645-uh372. The compute.googleapis.com API is also required, but we
already enabled it as a prerequisite to create our shared VPC in the host project.
To discover the exact name of the Service Directory API and the Cloud DNS API, use the gcloud
command in Figure 3-61.
Figure 3-61. Resolving Service Directory API and the Cloud DNS API names
Now, we are ready to enable the two required APIs in the project (Figure 3-62).
Figure 3-62. Enabling the Service Directory API and the Cloud DNS API
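Here is a sketch of both steps, reusing the host project ID from the text:

    # Discover the exact service names (one of several possible filters)
    gcloud services list --available | grep -E "servicedirectory|dns"

    # Enable the two required APIs in the host project
    gcloud services enable servicedirectory.googleapis.com dns.googleapis.com \
        --project vpc-host-nonprod-pu645-uh372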
Another prerequisite to enable PSC is that the subnet from which the VMs will consume the all-apis
bundle must have PGA enabled. We already enabled PGA for subnet-frontend in the previous section.
Now, log in as the Shared VPC administrator, and follow the steps to create a private IP address
(Figure 3-63) and a PSC endpoint (Figure 3-64).
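Here is a sketch of those two steps; the reserved address value is an arbitrary assumption (any internal
address that doesn't overlap your subnets works), while the endpoint name psc2allapis comes from the text:

    # Reserve a global internal IP address for the PSC endpoint
    gcloud compute addresses create psc2allapis-ip \
        --global \
        --purpose PRIVATE_SERVICE_CONNECT \
        --addresses 10.255.255.254 \
        --network your-app-shared-vpc \
        --project vpc-host-nonprod-pu645-uh372

    # Create the PSC endpoint (a global forwarding rule) for the all-apis bundle
    gcloud compute forwarding-rules create psc2allapis \
        --global \
        --network your-app-shared-vpc \
        --address psc2allapis-ip \
        --target-google-apis-bundle all-apis \
        --project vpc-host-nonprod-pu645-uh372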
In order to validate the newly created PSC endpoint, we are going to create a VM in subnet-backend
(Figure 3-65) and a bucket in the project backend-devs (Figure 3-66). We will show that the VM can list the
objects in the bucket.
In Figure 3-67, we want to test HTTP connectivity from our VM to our PSC endpoint, which is mapped
to our reserved internal IP address.
We also want to make sure—always from the VM in subnet-backend—that we can call the GCP
storage API, which can be accessed internally (using RFC 1918 IP space) by prefixing the name of the PSC
forwarding rule psc2allapis as shown in Figure 3-68.
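Here is a sketch of that call from the VM, assuming a private DNS record for p.googleapis.com is in
place and my-bucket is a hypothetical bucket name:

    # The service name (storage) is prefixed to the PSC forwarding rule name
    curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://storage-psc2allapis.p.googleapis.com/storage/v1/b/my-bucket/o"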
Next, in Figure 3-69, we list the content of our bucket using the gsutil command from vm2.
As you can see, the command returned the newly created file a.txt. This confirms that the VM can
consume the GCP storage API internally by using a Private Service Connect endpoint.
Finally, to avoid incurring unnecessary charges, let’s clean up the resources we just created for this
exercise, namely, the VM (Figure 3-70), the bucket (Figure 3-71), and the PSC endpoint (Figure 3-72).
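The cleanup commands look roughly like this, with a hypothetical zone and bucket name:

    gcloud compute instances delete vm2 --zone us-central1-b
    gsutil rm -r gs://my-bucket
    gcloud compute forwarding-rules delete psc2allapis --global
    gcloud compute addresses delete psc2allapis-ip --global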
Configuring Routing
Every VPC comes with a distributed, scalable, virtual routing system.
Routes define the paths network traffic takes from a VM to other destinations. Network traffic is
composed of packets, each characterized by a source and a destination, which can be inside your VPC
network—for example, in another VM in the same or a different subnet—or outside of it.
Each VM has a controller that is kept informed of all applicable routes from the VPC’s routing table.
Each packet leaving a VM is delivered to the appropriate next hop of an applicable route based on a routing
order. When you add or delete a route, the set of changes is propagated to the VM controllers by using an
eventually consistent design.
Routes are defined as a network-wide configuration. Let’s see the routes defined in our shared VPC
(Figure 3-74).
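Here is a sketch, assuming the command behind Figure 3-74 is routes list:

    gcloud compute routes list \
        --filter "network:your-app-shared-vpc" \
        --project vpc-host-nonprod-pu645-uh372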
As you can see, each route is identified by the combination of the following elements: its name, the
VPC network it belongs to, a destination IP range, a priority, a next hop, and (optionally) a list of instance tags.
Static Routes
Static routes can be created using the gcloud compute routes create command. An easy way to learn how
to use this command is the gcloud help command (Figure 3-76, and its output in Figure 3-77).
In addition to the route NAME, you must provide a destination range of outgoing packets the route will
apply to, which is denoted by the flag --destination-range=DESTINATION_RANGE. The broadest destination
range is 0.0.0.0/0.
Another required flag is the next hop, which can be one, and one only, of the following flags:
• Next hop gateway (--next-hop-gateway=default-internet-gateway): It denotes a
path to the Internet (0.0.0.0/0) through the default Internet gateway, which routes
packets to external IP addresses or Google API endpoints (Private Google Access).
• Next hop instance by name (--next-hop-instance=VM_HOSTNAME): You can denote
the next hop of a route by using an existing VM’s hostname and its zone only if the
VM and the route are in the same project. In shared VPC setups, where a VM is in
a service project and the route is in its associated host project, you need to use the
VM’s internal IP address instead. The VM must already exist, and its hostname must
match VM_HOSTNAME when creating or updating the route. Also, the VM must have at
least one network interface in the route’s VPC. This flag requires that you specify the
zone of the VM as well by setting the flag --next-hop-instance-zone.
• Next hop instance by address (--next-hop-address=VM_IPV4_ADDRESS): It denotes
the internal IPv4 address of a VM (from the VM’s primary IP or alias IP ranges) that
should handle matching packets. The VM must have at least one network interface
in the route’s VPC, whose internal IP address must match VM_IPV4_ADDRESS. The VM
must also be configured with IP forwarding enabled (e.g., by setting the flag --can-
ip-forward when creating the VM with the gcloud compute instances create
command).
• Next hop internal TCP/UDP load balancer (--next-hop-ilb=FORWARDING_RULE_
NAME): It denotes the name or the IP address of an internal network TCP/UDP load
balancer forwarding rule. When configuring the forwarding rule, the flag --load-
balancing-scheme must be INTERNAL. When configuring the custom static route
with an ILB as next hop, you cannot assign values to the --destination-range flag
that match or are more specific than the IP range of a subnet route destination in the
route’s VPC. This is because the former would “eclipse” the latter, and subnet routes
cannot be deleted or overridden, as indicated in Figure 3-75.
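Putting the required pieces together, a minimal custom static route pointing at the default Internet
gateway looks like this sketch (the route name is hypothetical):

    gcloud compute routes create default-egress \
        --network your-app-shared-vpc \
        --destination-range 0.0.0.0/0 \
        --next-hop-gateway default-internet-gateway \
        --priority 1000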
Dynamic Routes
Custom dynamic routes are managed by one or more cloud routers in your VPC. You no longer have to
create routes using the gcloud command because a cloud router does the job for you.
A cloud router is a regional, fully distributed and managed Google Cloud service that uses the Border
Gateway Protocol (BGP) to advertise IP address ranges. It programs custom dynamic routes based on the
BGP advertisements that it receives from a peer router in the same or in a different VPC.
A cloud router is used to provide dynamic routes for
• Dedicated Interconnect
• Partner Interconnect
• HA VPN
• Classic VPN with dynamic routing
You may be thinking that by using dynamic routing instead of static routing, we just moved the problem
to the cloud router. After all, we still need to create a cloud router responsible for advertising newly defined
routes to its peer. Likewise, on the receiving side, we need to create a cloud router responsible for learning
newly defined routes from its peer.
Put differently, whether you choose to create custom static routes or you choose to create a cloud router
that does the job for you, you still need to create something manually, for example, using gcloud. So, what’s
the advantage of using dynamic routing vs. static routing?
The main advantage is that dynamic routes—unlike static routes—are change-tolerant, that is, when
the routing infrastructure changes, you don’t have to do anything because the cloud router has been
programmed to learn the change and automatically readjust the routes for you.
A metaphor I often like to use is traveling from location A to location B using a physical map or using a
modern GPS system, for example, a mobile phone with access to Google Maps.
For short distances, the chances of heavy traffic, construction work, or inclement weather—which won’t be
reflected on your physical map—are low. As a result, your “good old map” will work most of the time.
For longer distances, a number of unpredictable factors will likely impact your travel itinerary, resulting
in detours, unexpected delays, and other hurdles. This is when your GPS system comes in handy by
detecting issues ahead of time and by dynamically recalculating routes for you in order to select an optimal
path toward your destination (location B).
In conclusion, static routes are great for small networks, which are not often subject to change. For large
enterprises, using static routes is simply not a sustainable solution. That’s when a cloud router “shines.” Let’s
see how you create a cloud router (Figures 3-78 and 3-79).
At a minimum, you have to provide a name for your cloud router and a VPC network where the cloud
router will operate.
The optional flags you need to know for the exam are
• Advertisement mode (--advertisement-mode=MODE): MODE can be one of the
two—CUSTOM or DEFAULT. The former indicates that you will manually configure BGP
route advertisement. The latter indicates that Google will automatically configure
BGP route advertisement for you.
• Autonomous system number (--asn=ASN): It denotes the ASN for the router. It
must be a 16-bit or 32-bit private ASN as defined in https://tools.ietf.org/html/
rfc6996, for example, --asn=64512.
• Async (--async): It denotes that you want the router to return immediately, without
waiting for the operation in progress to complete.
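Putting it together, a cloud router creation command looks like this minimal sketch, reusing the router
name and private ASN from this chapter's example:

    gcloud compute routers create router-a \
        --network your-app-shared-vpc \
        --region us-east1 \
        --asn 64512 \
        --advertisement-mode DEFAULT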
Routing Order
A route is a rule that specifies how certain packets should be handled by the VPC. Routes are associated
with VMs by tag, and the set of routes for a particular VM is called its routing table. For each packet leaving
a VM, the system searches that VM’s routing table for a single best matching route. Routes match packets
by destination IP address, preferring smaller or more specific ranges over larger ones (see --destination-
range). If there is a tie, the system selects the route with the smallest priority value (see --priority). If there
is still a tie, it uses the layer 3 and 4 packet headers to select just one of the remaining matching routes.
For static routes, the packet is then forwarded as specified by --next-hop-address, --next-hop-
instance, --next-hop-vpn-tunnel, or --next-hop-gateway of the winning route.
For dynamic routes, the next hop will be determined by the cloud router route advertisement.
Packets that do not match any route in the sending VM’s routing table will be dropped.
As highlighted, our shared VPC uses the default value REGIONAL. This means that any cloud router
associated with our shared VPC—remember, routing is a network-wide configuration—advertises subnets
and propagates learned routes to all VMs in the same region where the router is configured.
For example, if we create a cloud router for your-app-shared-vpc in us-east1, then the routes it
learned from other VPCs and the subnets it propagated are available only to VMs in subnet-frontend,
because this subnet is in the us-east1 region. VMs in subnet-backend have no clue about “foreign” routes,
that is, routes involving other VPCs connected to our shared VPC.
If you want your cloud router to advertise subnets and propagate learned routes to VMs in all subnets
of your VPC—regardless of which region they belong to—then set the --bgp-routing-mode flag to GLOBAL, as
indicated in Figure 3-81.
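The change itself is a one-liner:

    gcloud compute networks update your-app-shared-vpc \
        --bgp-routing-mode GLOBAL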
Figure 3-82. Route exchange between two VPCs configured with bgp-routing-mode set to REGIONAL
On the left side, we have our shared VPC your-app-shared-vpc you are familiar with, which was
configured with the default BGP routing mode, that is, REGIONAL.
On the right side, we have another VPC your-app-connected-vpc, which is connected to the first VPC
network using HA VPN—you will learn more about HA VPN in Chapter 7. For the sake of simplicity, we are
showing in the diagram only the first two tunnels, tunnel1-a and tunnel1-b, even though there are two
other tunnels to ensure fault tolerance. VPC your-app-connected-vpc is also configured with the default
BGP routing mode, that is, REGIONAL.
Behind the scenes, there are two Cloud VPN gateways, one in each VPC network, which are responsible
for enabling the secure IPsec channels. What is relevant to this diagram are the two cloud routers, which are
regional resources—both in us-east1—and are responsible for establishing a BGP session in which subnet
and custom routes are being exchanged.
Let’s focus on the two regional route tables now: one for region us-east1 and another for region
us-central1. I intentionally chose to group routes by region because it’s important that you relate the
concept of a route to a physical region. After all, packets of data are being exchanged across a medium,
which at the physical layer of the OSI model (layer 1) is implemented by interconnecting physical
components, such as network interface controllers (NICs), Ethernet hubs, network switches, and
many others.
As you can see, the first row in the us-east1 regional table shows a subnet route advertised by router-a
to router-b.
Symmetrically, the second row in the us-east1 regional table shows a subnet route advertised by
router-b to router-a.
In addition to the network, destination (prefix), and next hop, the cloud routers include the priority for
the advertised route.
You control the advertised priority by defining a base priority for the prefixes. We will show you how to
set up this value in the upcoming section “Updating the Base Priority for Advertised Routes.” For the time
being, we are using the base priority default value of 100.
■■Note The region for both advertised subnet routes matches the region of the cloud routers, that is,
us-east1.
On the other hand, the us-central1 regional table shows no routes for either of the two VPCs. This
is because setting the BGP routing mode to REGIONAL in both VPCs caused the two cloud routers to
exchange with each other only routes related to their own region, that is, us-east1. As a result, VMs
in 192.168.9.0/27 cannot reach any VMs in 192.168.0.0/27 or 192.168.1.0/27. Likewise, VMs in
192.168.1.0/27 cannot reach any VMs in 192.168.8.0/27 or 192.168.9.0/27.
Now let’s update both VPCs by setting the BGP routing mode to GLOBAL. With this simple change as
illustrated in Figure 3-83, it’s a very different story. Let’s see why.
Figure 3-83. Route exchange between two VPCs configured with bgp-routing-mode set to GLOBAL
First, the two us-east1 regional routes we reviewed before are still there (rows 1 and 3), but this time
router-a and router-b are no longer limited to advertising routes from their own region. Instead, a VPC
configured to use bgp-routing-mode set to GLOBAL always tells its cloud routers to advertise routes (system-
generated, custom, peering) from any region spanned by the VPC.
Consequently, router-a also advertised the subnet route for subnet-backend in your-app-shared-vpc.
Likewise, router-b also advertised the subnet route for subnet-backend in your-app-connected-vpc. The
effect of these two actions resulted in the addition of rows 2 and 4 to the regional table us-east1. We will
explain the rationale for the priority set to 305 shortly. For the time being, just remember that the VPC
BGP dynamic routing property set to GLOBAL has caused the cloud routers to advertise subnets located in a
different region—that is, us-central1—from the one where they operate, that is, us-east1.
You may wonder why you would even bother with the default value REGIONAL of bgp-routing-mode for a
VPC if the number of routes advertised is limited to a single region.
Here is the caveat, and it is why I reiterated that a cloud router is a regional resource, always tied to a
region: inter-region routes add latency and egress charges to traffic leaving the region through the
advertised route. As a result, an inter-region cost is added to the base priority of the route.
■■Note With reference to the well-architected framework, one of the pillars you need to consider when
designing the network architecture for your workload is cost. By letting your VPC routing capability become
global, you need to be aware of the effects of this choice. You get an enhanced connectivity across the
components of your workload with a solid degree of automation by fully leveraging the BGP protocol and the
scalability of the Google Global Backbone. However, this comes at the expense of higher cost and increased
latency due to inter-region paths your workload traffic needs to cross.
The inter-region cost is a system-generated value, which is dynamically calculated by Google, and
its value depends on a number of factors including distance between the regions, network performance,
available bandwidth, and others. Its value ranges from 201 to 9999.
When cloud router-a advertises the prefix 192.168.1.0/27 to cloud router-b as illustrated in row 2 of
the us-east1 regional table, it adds the inter-region cost (e.g., C=205) to the base priority of the route (we
used the default value 100). As a result, the advertised inter-regional route has a priority of 305. The same
principle applies to row 4.
You learned so far that a cloud router associated with a VPC configured with BGP routing mode set to
GLOBAL advertises routes in any regional table the VPC spans across—not just the region where the cloud
router operates.
Let’s focus now on the us-central1 regional table. This time, you wouldn’t expect it to be empty, right?
Your assumption would be correct. In fact, this regional table contains the same rows as the regional table
us-east1, but the priority of each route is lowered by adding the extra inter-region cost (C=205) to it.
■■Note One important thing you need to be aware of for the exam is that when talking about routes, lower
numbers mean higher priorities. If you want to increase the priority of a route, you need to subtract something
from the route priority. A route with priority 0 (zero) always wins because it has the highest possible priority a
route can have.
If you think about it, it all makes sense because the path to connect a VM in subnet-backend of your-
app-connected-vpc and another VM in subnet-backend of your-app-shared-vpc requires two inter-region
costs in addition to the base priority, that is, 100 + 2*C.
Figure 3-83 illustrates this concept.
The only required argument is the name of your cloud router, that is, NAME. If you don’t specify a region using
the --region flag, gcloud will try to determine the region where the router you are inquiring about operates.
Assuming a cloud router in the specified region exists, the output of this command includes the
following three sections:
• Best routes (Result -> bestRoutes): List of all routes programmed by the router
NAME in the regional routing table REGION, filtered by the VPC network associated with
the router
• Best routes for router (Result -> bestRoutesForRouter): List of all routes the
router NAME has learned from routers in other regions
• Advertised routes (Result -> bgpPeerStatus -> advertisedRoutes): List of all
advertised routes by the router NAME
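Here is a minimal sketch, assuming the command being described is gcloud compute routers
get-status:

    gcloud compute routers get-status router-a --region us-east1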
With this in mind, you can set the base priority for advertised routes with the gcloud compute routers
update-bgp-peer command (Figures 3-86 and 3-87).
At a minimum, you need to specify the name of your cloud router NAME and the name of its BGP peer
PEER_NAME and substitute the highlighted ADVERTISED_ROUTE_PRIORITY with the value B we just discussed.
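Here is a sketch, where the peer name and the base priority value of 200 are hypothetical:

    gcloud compute routers update-bgp-peer router-a \
        --region us-east1 \
        --peer-name peer-to-router-b \
        --advertised-route-priority 200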
Figure 3-88. Adding another pair of HA VPN gateways between us-central1 subnets
The addition of this second pair of HA VPN gateways to the us-central1 region has caused the
following changes to the regional routing tables:
• There are no more routes with priority 510 (lowest priority in the regional routing
tables).
• Each subnet has exactly two routes with the same priority, that is, 305.
If multiple custom routes exist for the same destination, have the same priority, and have an active
next hop tunnel (i.e., one with an established IKE (Internet Key Exchange) Security Association), Google
Cloud uses equal-cost multipath (ECMP) routing to distribute packets among the tunnels.
This balancing method is based on a hash, so all packets from the same flow use the same tunnel as
long as that tunnel is up.
Now, let’s say the us-central1 HA VPN tunnels (tunnel2-c, tunnel2-d) are your preferred path to route
traffic from VMs in subnet-frontend in your-app-connected-vpc (CIDR block 192.168.8.0/27) to VMs in
subnet-backend in your-app-shared-vpc (CIDR block 192.168.1.0/27).
To achieve this, you need to update the base route priority for advertised routes that use the other HA
VPN tunnels, that is, tunnel1-a and tunnel1-b. Remember, higher numbers mean lower priorities. You
can update this value by using the gcloud compute routers update-bgp-peer command as illustrated in
Figure 3-87. In our use case, by setting a value of 10200 (as shown in Figure 3-89), we are sure that any other
path leading to the same destination has a higher priority, including the path that uses tunnel2-c—our
preferred route.
■■Note The advertised route (base) priority B combined with the inter-region cost C is the overall route
priority and is referred to as the Multi-exit Discriminator (MED). MED = B + C
This is because, in the worst possible scenario, an inter-regional route could have a MED of 10199,
obtained as follows:
MED = 200 (lowest regional base priority) + 9999 (highest inter-region cost)
Figure 3-90 illustrates the effect of updating the base priority for advertised routes.
Keep in mind you don’t have to take the advertised base route priority of the tunnel1-a route to this
extreme (10200) in order to make it lower priority than the route that uses tunnel2-c. In fact, any number
greater than 100 would have been sufficient to make the tunnel1-a route a less preferable path than
tunnel2-c. However, since you don’t have control over the inter-region costs (as of the writing of this book,
its range is 201 ≤ C ≤ 9999), it is always a good idea to use a large number.
Another example of a routing policy is the use of network (or instance) tags to limit the VMs the route
will be applied to. This type of routing policy is specific to static routes, because with static routes you
manually create the route. As a result, you have direct access to the --tags flag as explained in the “Static
Routes” section.
Instead, with dynamic routes a cloud router manages the routes for you, and the --tags flag is not
available to limit the VMs that can use the advertised dynamic routes. In fact, the only way to limit the VMs is
by using the --set-advertisement-ranges flag during creation or update of a cloud router. In other words,
the selection of VMs the dynamic routes apply to is by CIDR block only, and not by instance tag.
To learn how a routing policy with tags works, let’s say you are a member of the network administrators
group for your company. You installed Network Address Translation (NAT) software in a VM you created in a
subnet of your Google Cloud shared VPC.
■■Note From Wikipedia: Network Address Translation (NAT) is a technique of modifying the network address
information in the IP packet headers while transferring the packet across a traffic routing device; such a
technique remaps a given address space into another address space. This allows multiple computers to share
a single public IP address, which has become necessary because there are not enough IPv4 addresses for every
computer in the world.
You now have your NAT gateway ready for use. The next step is to create a static route to allow VMs in
your company VPC networks access to the Internet so they can download OS and runtime updates. The
question is, which VMs should use the NAT? Do you want any VMs, or do you want to use a more restrictive
policy to only allow servers? Typically, enterprises control access to the Internet with egress policies
targeting a specific set of users, and that’s where tags come in handy.
The command in Figure 3-91 creates a static route nat-vm-internet-route in our shared VPC your-
app-shared-vpc whose destination is the Internet (0.0.0.0/0) via our NAT gateway as the next hop. Notice
that since we use a next hop instance by name (--next-hop-instance=nat-vm), we must specify a zone
using the --next-hop-instance-zone=us-east1-a flag. Last, to select only VMs that host server software, we
include the flag --tags=server.
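Reconstructed from the description above, the command in Figure 3-91 is approximately:

    gcloud compute routes create nat-vm-internet-route \
        --network your-app-shared-vpc \
        --destination-range 0.0.0.0/0 \
        --next-hop-instance nat-vm \
        --next-hop-instance-zone us-east1-a \
        --tags server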
Notice that the command created our static route even though the VM was not created yet. Also, since
we did not specify a priority, the default route priority (1000) was chosen.
Finally, to let existing VMs in your-app-shared-vpc use this newly created route, all you have to do is to
add the server tag as indicated in Figure 3-92.
As you can see, this time the gcloud CLI reported an error because it expected an existing VM to which
to add the server tag.
Instead, for new VMs we should have used the gcloud compute instances create command with
the --tags=server flag.
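Here is a sketch of both variants, with hypothetical VM names:

    # Tag an existing VM
    gcloud compute instances add-tags existing-server-vm \
        --tags server --zone us-east1-a

    # Tag a new VM at creation time
    gcloud compute instances create new-server-vm \
        --tags server --zone us-east1-a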
It is always good practice to clean up after you’re done. To delete the newly created static route, use the
gcloud compute routes delete command as indicated in Figure 3-93.
Figure 3-94. NAT implementation with internal load balancer as a next hop
There are a few design constraints you need to know when using an ILB as a next hop:
• Distinct network interface controllers (NICs) attached to the same VM cannot be in
the same VPC.
• The VMs comprising the backend instances of your ILB must route traffic to a
different VPC from the one where your ILB has been created. Put differently, your
ILB cannot be used to route traffic between subnets in the same VPC. This is because
subnet routes cannot be overridden.
• The ILB acts as a TCP/UDP pass-through load balancer, that is, it does not alter the
packets’ source and destination.
Figure 3-95. Hub and spoke with internal load balancer as a next hop
The network vpc1 is a hub network, which exposes shared capabilities to a number of consumers—
or spokes.
The consumer networks vpc3 and vpc4 are peered to the hub and use these capabilities to fulfill the
business and technical requirements for their workloads.
To allow for scalability, we created an ILB in the hub network, which will distribute incoming TCP/
UDP traffic to the interface nic0 of four backends nva-be1, nva-be2, nva-be3, and nva-be4. Each of these
backends is a network virtual appliance, which allows IP forwarding and uses another interface nic1 to
route traffic to vpc2 following packet inspection.
Assuming firewall rules allow ingress/egress traffic through the ICMP protocol, since vpc1 and vpc3 are
peered, vm1 and vm2 can ping vm5 and vice versa.
Likewise, since vpc1 and vpc4 are peered, vm3 and vm4 can also ping vm5 and vice versa.
This connectivity (vm1->vm5, vm2->vm5, vm3->vm5, vm4->vm5) is denoted using dotted lines
and is automatically provided to you when the peering relations vpc1-vpc3 and vpc1-vpc4 are established.
Also, since none of the VMs have an external IP, all traffic remains in RFC 1918 space and doesn’t
traverse the Internet.
For packets egressing the VMs in the spokes to reach VMs (vm6) in vpc2, you need to configure your
peering relations in the hub network—vpc1-vpc3 and vpc1-vpc4—so that custom routes from the hub can
be exported into the spokes.
You also need to allow the peering relation in the spokes—vpc3-vpc1 and vpc4-vpc1—to accept custom
routes from the hub.
The diagram shows the gcloud commands you need to know in order to create peering relations with
the ability to export or import custom routes.
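Here is a sketch of those commands for the vpc1-vpc3 pair (the vpc1-vpc4 pair is analogous); whether
you create the peerings with these flags or update them afterward, the flags are the same:

    # In the hub: export custom routes toward the spoke
    gcloud compute networks peerings update vpc1-vpc3 \
        --network vpc1 \
        --export-custom-routes

    # In the spoke: import custom routes from the hub
    gcloud compute networks peerings update vpc3-vpc1 \
        --network vpc3 \
        --import-custom-routes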
In our example, the hub VPC has a custom static route goto-ilb, which routes traffic to an internal
TCP/UDP load balancer, configured to distribute incoming traffic to four multi-NIC NVAs.
The top of the diagram also shows the creation of two hub peerings:
• vpc1-vpc3, which is configured to export custom routes—including goto-ilb
• vpc3-vpc1, which is configured to import custom routes—including goto-ilb
As a result, packets sent from VMs in the spoke vpc3, whose destination is the internal IP address of
vm6, that is, 192.168.2.34, can be properly routed to their destination—provided the packet inspection
performed by one of the four backend NVAs is successful.
The exact same setup can be performed for all other spokes, for example, vpc4, thereby allowing vm3
and vm4 to send packets to vm6.
Without explicitly configuring custom route export from the hub (or provider) network and custom
route import from the spoke (or consumer) network, no packets sent from any VM in the spokes would have
reached VMs in vpc2.
Provided firewall rules and network policies allow traffic (we’ll review these topics in the upcoming
sections), since the cluster is VPC-native each pod can route traffic to and from any other pod in the cluster
by leveraging subnet-a’s alias IP range.
■■Exam tip The inter-pod routes denoted in blue come free with VPC-native clusters and don’t count against
the project route quota.
Now that you have learned what VPC-native clusters are and why you should use them, you may wonder how
to create one. The “secret” to creating a VPC-native cluster is to leverage the --enable-ip-alias flag in
the gcloud container clusters create command, as described in Figure 3-97.
Keep in mind that when you use the --enable-ip-alias flag to create your VPC-native GKE cluster, you
have the option to select your own CIDR ranges for the pods and services, or you can let Google do the job
for you.
The former method is called user-managed alias IP assignment, and it requires that you create a subnet
before creating your cluster.
The latter method is called GKE-managed alias IP assignment, and it allows you to let GKE create the
subnet and the cluster simultaneously. This can be achieved by using the --create-subnetwork flag in the
gcloud container clusters create command.
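Here is a sketch of both assignment modes, with hypothetical names throughout:

    # User-managed alias IP assignment: subnet and secondary ranges pre-exist
    gcloud container clusters create my-cluster \
        --zone us-east1-c \
        --enable-ip-alias \
        --subnetwork my-subnet \
        --cluster-secondary-range-name pod-range \
        --services-secondary-range-name svc-range

    # GKE-managed alias IP assignment: GKE creates the subnet for you
    gcloud container clusters create my-cluster \
        --zone us-east1-c \
        --enable-ip-alias \
        --create-subnetwork name=my-auto-subnet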
To summarize, with GKE and alias IP ranges, you get
• Central IP management for GKE: You can centrally manage your cluster’s RFC 1918
IP addresses because your node, pod, and service IP addresses are allocated from
the subnet’s primary range, first secondary range, and second secondary range,
respectively.
• Native networking capabilities: You can scale your GKE cluster without worrying
about exceeding your project route quota. This is because alias IP ranges provide
native support for routing.
• Pod direct connectivity: You can create firewall rules that target pod IP addresses
instead of node IP addresses. Also, pod connectivity is not limited to Google Cloud
workloads. Your firewall rules can use source IP ranges located in your on-premises
data center and targets defined by your cluster pod IP ranges.
■■Exam tip Once your route-based GKE cluster has been created, if you change your mind and decide to
make it VPC-native, you cannot update the cluster. The --enable-ip-alias flag is only available at creation
time. As a result, before spinning up your clusters, make sure you have a good understanding of the business
and the technical requirements for your apps. Do you need a route-based or a VPC-native cluster? Will the
cluster live in a standalone or a shared VPC network?
Second—as you will see in the upcoming example—a principal with the network admin role in the
host project must create the subnet where your cluster will live as well as its secondary IP ranges for its pods
and services. As you learned in the “Shared VPC Deep Dive” section, the service admin who will create the
cluster must have subnet-level IAM permissions to create the cluster in the subnet being shared.
Third, each service project’s GKE service account (generated by enabling the container API) must be
granted the Host Service Agent User role in the host project.
To summarize, when you deploy a GKE cluster in a shared VPC, the cluster must be VPC-native, its
alias IP assignment must be user-managed, and a special role needs to be granted in the host project to
each service project’s GKE service account.
Let’s now consolidate these concepts with a real example.
Figure 3-98. Enabling the container API to service and host projects
As a result, a new service account—specific to GKE—is created in each project. For our example,
we need to know the name of the GKE service account in each service project. This is because the GKE
service account in each service project will need subnet-level permissions to create the GKE cluster in the
shared subnet.
In order to get the name, we need to find out what the project number is (Figure 3-99).
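This is a one-liner, assuming frontend-devs is the project ID:

    gcloud projects describe frontend-devs \
        --format "value(projectNumber)"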
You will see in the next section how the project number relates to the GKE service account name.
■■Note Each resource in Google Cloud has an IAM policy (also referred to as an IAM allow policy) attached
to it, which clearly states who can access the resource and in what capacity. The core part of the policy is a
list of bindings, which are nothing but a collection of members-role pairs. The “who” part is expressed in the
members value of the binding by a list of principals, which can be users (prefixed by user:), service accounts
(prefixed by serviceAccount:), Google groups (prefixed by group:), or even your entire organization’s
domain (prefixed by domain:). The “in what capacity” part is expressed in the role value of the binding,
which is exactly one Google Cloud role per list of principals. If you need to add more than one role to the same
list of principals, another binding is required. Think of an IAM policy as a private property sign “No trespassing:
authorized personnel only.” The sign is attached to any room of a private property, not just the external
perimeter. The rules of inheritance apply to this metaphor as well, in that if the property is, for example, a
military facility, where classified data is stored, an individual who wants to read the classified data will need
access to the military facility first and also access to the room where the classified data is stored, that is, the
union of roles to access the facility and access the room. A regular visitor won’t be allowed access to the room.
If you remember, in the previous shared VPC deep dive example, we allowed the principal
[email protected] to access subnet-frontend with the role roles/compute.networkUser, which granted
him permission (among others in the role) to create VMs in that subnet.
Notice the syntax of the policy file in JSON format with the sections I described in the previous note,
including bindings, members, and role (Figure 3-101).
Now we add bindings for the project’s built-in service accounts and the GKE service account
(Figure 3-102).
Notice that these service account emails include the project number we determined in the
previous section.
Now we can use the gcloud beta compute networks subnets set-iam-policy command to enforce
this IAM allow policy to the subnet (Figure 3-103).
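Here is a sketch of the binding and the enforcement step; PROJECT_NUMBER stands for the service
project number we looked up earlier, and the service account patterns shown are the standard ones (an
assumption on my part, since the figure is not reproduced here):

    # subnet-frontend-policy.json now contains members such as:
    # "user:[email protected]",
    # "serviceAccount:PROJECT_NUMBER@cloudservices.gserviceaccount.com",
    # "serviceAccount:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com"
    # all bound to "roles/compute.networkUser"
    gcloud beta compute networks subnets set-iam-policy subnet-frontend \
        subnet-frontend-policy.json \
        --region us-east1 --project vpc-host-nonprod-pu645-uh372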
Let’s repeat the same steps for subnet-backend in us-central1 (Figures 3-104 to 3-107).
Finally, let’s enforce the newly updated IAM policy to the subnet subnet-backend.
Figure 3-108. Updating host project IAM policy with binding for subnet-frontend
Figure 3-109. Updating host project IAM policy with binding for subnet-backend
First, the primary IP range 192.168.0.0/26 will be used for cluster worker nodes. The /26 mask
indicates there are 2^(32 − 26) = 2^6 = 64 IP addresses available—actually, the effective number of available IP
addresses is 60 because the first two and the last two IP addresses in the range are reserved by Google, as we
learned in the section “Creating Subnets.”
Notice the command is similar to the one we used in the shared VPC deep dive example at the
beginning of this chapter, with the difference that the second keyword in gcloud is container instead of
compute.
As highlighted in blue, the IP ranges for the pods and services look good as per our specification.
Let’s repeat the validation for the backend-devs service project (Figure 3-111).
As highlighted in green, the IP ranges for the pods and services look good as per our specification.
At this point, all the preliminary steps to deploy our clusters in our shared VPC have been completed.
The command is quite long, and it failed. Let’s find out why.
■■Exam tip The exam has a number of questions, which include lengthy gcloud commands and often
complex scenarios. Don’t let this discourage you! Throughout this book, you’ll learn how to quickly scan a long
gcloud command and select only the relevant keywords—the relevant flags in this case—which will help you
choose the correct answer.
With the default settings for standard clusters, GKE allocates a /24 mask (256 IP addresses) for pods
on each node and provisions three worker nodes per cluster. See the official Google Cloud documentation
for details.
Our subnet-frontend has allocated a /24 mask for pods (pod-cidr-frontend in Figure 3-110), that is,
256 IP addresses.
However, in the gcloud command, we have requested the number of nodes to be two without specifying
a maximum number of pods per node, that is, the default /24 mask will be used for each of the two nodes.
As a result, the overall CIDR range for pods should have been at least 192.168.13.0/23 instead of
the existing 192.168.13.0/24. This miscalculation has caused the cluster creation to fail due to IP space
exhaustion in the pod range.
Now you know how to fix the error! As you correctly guessed, let’s try again, but this time by forcing
a limit on the maximum number of pods per node. Instead of the default value of 256 (actually, to reduce
address reuse, GKE uses 110 as the default maximum number of pods per node), let’s use a maximum of 10
pods per node, as shown in Figure 3-113.
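Here is a sketch of the retry; the services secondary range name is hypothetical, while the other values
come from this chapter's example:

    gcloud container clusters create frontend-cluster \
        --project frontend-devs \
        --zone us-east1-c \
        --enable-ip-alias \
        --network projects/vpc-host-nonprod-pu645-uh372/global/networks/your-app-shared-vpc \
        --subnetwork projects/vpc-host-nonprod-pu645-uh372/regions/us-east1/subnetworks/subnet-frontend \
        --cluster-secondary-range-name pod-cidr-frontend \
        --services-secondary-range-name svc-cidr-frontend \
        --num-nodes 2 \
        --max-pods-per-node 10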
The deployment will take a few minutes. Upon completion, make sure to review the summary of the
deployment, which indicates, among others, the cluster name, the location, the master IP address (the
control plane endpoint), the worker node’s machine type, the number of worker nodes, and the status.
Let’s make sure the frontend-cluster worker nodes use the correct CIDR range 192.168.0.0/26 (Figure 3-114).
Likewise, let’s make sure the backend-cluster worker nodes use the correct CIDR range 192.168.1.0/26 (Figure 3-116).
Figure 3-117 illustrates the resources we created so far and how they relate to each other from network,
security, and cost points of view.
From a networking standpoint, the frontend-cluster and backend-cluster have built-in routes,
which allow them to send and receive traffic to and from each other—provided firewall rules allow, as we
will discuss in the next section. Green routes are defined at the worker node level (VMs), whereas blue
routes are defined at the pod level. Notice the blue routes all collapse in the Alias IP Integrated Routing
blue octagon, which resides in the first secondary IP range of each subnet. Also, for the sake of simplicity,
we did not visualize the clusters’ services VPC-native routing, which are implemented by the second
secondary IP range for each cluster, that is, 192.168.130.0/25 for frontend-cluster (Figure 3-110) and
192.168.150.0/25 for backend-cluster (Figure 3-111).
■■Note Since we did not create the clusters using the --enable-private-nodes flag, by default the
clusters are public, that is, each worker node has an external IP (denoted in pink) in addition to its internal IP
(denoted in green).
From a security standpoint, the frontend developer [email protected] has the editor role in his
own project frontend-devs and can only use subnet-frontend (Figure 3-110) for any container-related
compute resources.
As an editor in the project frontend-devs, Joseph can create resources in his project, whose
consumption is paid by the billing account linked to frontend-devs.
Let’s focus on GKE now. Since GKE is a managed service, when Joseph requests the creation of a
GKE cluster in frontend-devs, Google uses the frontend-cluster GKE service account (created when we
enabled the container API in the project), that is, the service account of the form service-PROJECT_NUMBER@
container-engine-robot.iam.gserviceaccount.com, to create its underlying infrastructure. Since this service
account has the role roles/compute.networkUser bound to it at the subnet-frontend level, the GKE cluster
will be allowed to use subnet-frontend’s primary and secondary IP ranges.
As a result, frontend-cluster—along with its underlying compute infrastructure, for example, VMs for
the worker nodes, pods, services, and others—is owned by the project frontend-devs, but its “territory” or
more formally its scope of concern is subnet-frontend, which is located in the us-east1 region, along with
its primary and secondary ranges.
Symmetrically, backend-cluster—along with its underlying compute infrastructure, for example, VMs
for the worker nodes, pods, services, and others—is owned by the project backend-devs, but its “territory” is
subnet-backend, which is located in the us-central1 region, along with its primary and secondary ranges.
Last, from a billing standpoint, the charges incurred by the frontend-cluster usage are paid by the
billing account linked to the project frontend-devs, which is denoted in blue in Figure 3-117.
Likewise, the charges incurred by the backend-cluster usage are paid by the billing account linked to
the project backend-devs, which is denoted in green in Figure 3-117.
This network design promotes separation of concerns from network, security, and cost standpoints
while delivering the necessary native connectivity between workloads running in containerized
applications.
Testing Connectivity
In this example, we will use the Secure Shell (SSH) protocol to connect to one of the two frontend-
cluster worker nodes, and we will perform a few connectivity tests.
Even though routing connectivity is established, packet transmission cannot succeed until firewall rules
allow traffic from origin to destination.
With our travel metaphor, there may be routes connecting location A with location B, but if a checkpoint
in the middle won’t allow traffic, you won’t make it to location B.
In this example, the VPC your-app-shared-vpc was already configured to allow ingress SSH, TCP, and
ICMP (Internet Control Message Protocol) traffic as illustrated in Figure 3-29.
Nevertheless, let’s list all firewall rules for the VPC and double-check (Figure 3-118).
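Here is a sketch of the listing:

    gcloud compute firewall-rules list \
        --filter "network:your-app-shared-vpc" \
        --project vpc-host-nonprod-pu645-uh372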
Notice that behind the scenes GKE—as a fully managed Kubernetes engine—has already created
ingress firewall rules to allow traffic into the VPC using a predefined group of protocols and ports.
Let’s connect via SSH to one of the two frontend-cluster worker nodes (Figure 3-119).
Let’s test connectivity to another worker node in the same cluster (Figure 3-121).
This connectivity test is important because it’s telling us that components built in subnet-frontend
(located in zone us-east1-c) can use containerized apps built in subnet-backend (located in zone
us-central1-c).
When I say “can use” I mean they can natively consume containerized apps in subnet-backend. There
is no intermediary between the two, which is good because latency is reduced and risks of packet losses or
even data exfiltration are minimized.
This native connectivity is the product of Shared VPC and VPC-native clusters.
Now, let’s test connectivity to another pod in backend-cluster (Figure 3-124).
This time, the ping failed. Why? The reason is that we told GKE to create a cluster with a
maximum of ten pods per node and an initial size of two nodes (see Figure 3-115). The available IP space
we set up the backend-cluster with was a /24 (see pod-cidr-backend in Figure 3-111), for a total of 256 IP
addresses.
As a result, only 20 out of 256 IP addresses in the 192.168.15.0/24 CIDR range are being used.
Figure 3-127. Cloning the GKE sample app repo from GitHub
Enabling the container API is the first prerequisite to create and use a GKE cluster. Without this action,
you won’t be able to create your GKE cluster.
■■Note If you worked through the shared VPC cluster deep dive example, the container API has already been
enabled for the project frontend-devs, so you won’t need to perform this step. Also, in this example, we will
use the default VPC, and not your-app-shared-vpc. The default VPC is an auto-mode network that has
one subnet in each region. On the other hand, our shared VPC is a custom-mode network with only two subnets:
subnet-frontend in us-east1 and subnet-backend in us-central1.
Next, let’s create our GKE cluster (Figure 3-129). Notice that a region or a zone must be specified to
denote respectively a regional or a zonal cluster.
Regional clusters have multiple control planes across multiple compute zones in a region, while zonal
clusters have one control plane in a single compute zone.
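Here is a minimal sketch of both flavors, using the cluster name test from this example:

    # Zonal cluster: a single control plane in one compute zone
    gcloud container clusters create test --zone us-east1-c

    # Regional cluster: control plane replicated across the region's zones
    gcloud container clusters create test --region us-east1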
The network policy in the first use case is intended to limit incoming traffic to only requests originating
from another containerized application labeled app=foo. Let’s review how this rule is expressed in the form
of policy-as-code by viewing the hello-allow-from-foo.yaml file (Figure 3-131).
The network policy is a YAML file (Figure 3-132), and there are a few things to mention:
• The network policy is a Kubernetes API resource: kind: NetworkPolicy
• The network policy is a namespaced Kubernetes resource: it applies to pods in the
namespace where it is created (not to the entire GKE cluster).
• The network policy type is ingress. The other available types are egress, or both.
• The network policy subject entity is a pod. Other entities can be namespace(s) or
CIDR block(s).
• Since the entity this ingress network policy applies to is a pod, a pod selector
construct is included in the spec section to match the desired pod, that is, a pod
whose label’s key app matches the value hello.
• Likewise, since the target of this ingress policy is another pod, a pod selector is also
included in the ingress definition to match the desired pod, that is, a pod whose
label’s key app matches the value foo.
Let’s now apply this ingress network policy to the cluster (Figure 3-133).
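Here is a reconstruction of the manifest based on the bullet points above, applied with kubectl; treat it
as an approximation of the book's hello-allow-from-foo.yaml:

    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: hello-allow-from-foo
    spec:
      policyTypes:
      - Ingress
      podSelector:
        matchLabels:
          app: hello
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: foo
    EOF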
With the ingress network policy in effect, the former request succeeded, whereas the latter timed out.
We will use this containerized web app labeled app=hello-2 to validate the network policy in the
second use case, which is intended to limit outgoing traffic originating from the app labeled app=foo to only
the following two destinations:
• Pods in the same namespace with the label app=hello
• Cluster pods or external endpoints on port 53 (UDP and TCP)
Let’s view the manifest for this egress network policy (Figures 3-136 and 3-137).
The manifest declares the policy type as egress, and it applies to pods in the test cluster whose label’s
key app matches the value foo.
Notice the egress section denotes two targets: pods whose label key app matches the value
hello and the set of protocol-port pairs {(TCP, 53), (UDP, 53)}.
Let’s now apply this egress network policy to the cluster (Figure 3-138).
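Reconstructed from the description above, the egress manifest looks approximately like this:

    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: foo-allow-to-hello
    spec:
      policyTypes:
      - Egress
      podSelector:
        matchLabels:
          app: foo
      egress:
      - to:
        - podSelector:
            matchLabels:
              app: hello
      - ports:
        - port: 53
          protocol: TCP
        - port: 53
          protocol: UDP
    EOF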
With the egress network policy in effect, the only allowed outbound connection from the containerized
app labeled app=foo is the endpoint http://hello-web:8080. This is because this endpoint is exposed by a
service that lives in a pod labeled app=hello, which is an allowed connection as shown in the egress section
of Figure 3-137.
Since there is no entry denoted by app=hello-2 in the egress section of the manifest YAML file, the
containerized app labeled app=foo is not allowed to connect to the endpoint http://hello-web-2:8080.
Deleting the Cluster
As a good practice to avoid incurring charges, let’s now delete our test GKE cluster, which is still up and
running along with its three e2-medium nodes all located in us-east1-c (Figure 3-140).
We will use the gcloud container clusters delete command as indicated in Figure 3-141. Notice the
zone is required because our GKE cluster test is zonal.
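For reference, the command is simply:

    gcloud container clusters delete test --zone us-east1-c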
Additional Guidelines
There are a couple of things to remember about cluster network policies, which are important for the exam.
First, in a cluster network policy, you define only the allowed connections between pods. There is no
action allowed/denied like in firewall rules (you will learn more about firewall rules in the following section).
If you need to enable connectivity between pods, you must explicitly define it in a network policy.
Second, if you don’t specify any network policies in your GKE cluster namespace, the default behavior is
to allow any connection (ingress and egress) among all pods in the same namespace.
In other words, all pods in your cluster that have the same namespace can communicate with each
other by default.
Let’s view a couple of examples of how to override this behavior.
The extreme opposite of the default behavior is to isolate all pods in a namespace from both ingress and
egress traffic. The network policy in Figure 3-143 shows you how to achieve just that.
If you want to deny only ingress traffic to your pods, but allow egress traffic, just remove the Egress item
from the list in the policyTypes node as shown in Figure 3-144.
Conversely, if you want to allow incoming traffic for all your pods, just add the ingress node and
indicate all pods {} as the first item in the list as illustrated in Figure 3-145.
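Here is a sketch of the deny-all policy just described (the policy name is hypothetical); the empty
podSelector {} selects all pods in the namespace:

    kubectl apply -f - <<EOF
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all
    spec:
      podSelector: {}
      policyTypes:
      - Ingress
      - Egress
    EOF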
The control plane is the single pane of glass for your GKE cluster, and the Kubernetes API server is the
hub for all interactions with the cluster. The control plane exposes a private endpoint and, by default, a
public endpoint. Both endpoints can be directly accessed via the HTTPS or gRPC (gRPC Remote Procedure
Call) protocols.
The VPC network where the control plane Google-managed VMs live is peered with the VPC where your
cluster worker nodes live. The worker nodes and the control plane VMs interact with each other using the
Kubernetes API server.
In the previous example (Figure 3-117), each worker node had an internal IP address denoted in green
and an external IP address denoted in pink.
You learned that the internal IP address is used to route RFC 1918 traffic between VMs in the same VPC
network. It doesn’t matter if the two VMs are located in the same subnet or in different subnets. The only two
constraints for the traffic to flow are that the subnets are part of the same VPC network and the VPC firewall
rules allow traffic to flow between the two VMs.
Conversely, the external IP address of a VM is used to route traffic to the Internet or to Google public
APIs and services.
What we just described is the default GKE cluster configuration as it relates to worker node IP
addressing.
By default, GKE clusters are created with external IP addresses for master (control plane)
and worker nodes.
There are scenarios where your workload security requirements constrain your design to exclude
external IP addresses for any of your cluster’s worker nodes. When you use this network design, the cluster
is called a private cluster because all worker nodes are isolated from the Internet, that is, zero worker nodes
have an external IP address.
This network design can be accomplished by creating your cluster with the --enable-private-
nodes flag.
■■Exam tip Even though a cluster is private—that is, none of its worker nodes use an external IP address—
its control plane can still be accessed from the Internet. If you want to completely isolate your private cluster’s
control plane from the Internet, you must use the --enable-private-endpoint flag at cluster creation,
which essentially tells GKE that the cluster is managed exclusively using the private IP address of the master
API endpoint. This is the most secure configuration for GKE clusters in that none of its worker nodes have
external IP addresses, and the control plane public endpoint has been disabled. See Figure 2-52 in Chapter 2.
The --enable-private-nodes flag requires that you also specify an RFC 1918 CIDR block for your
cluster control plane (also known as master) nodes unless your cluster is Autopilot (see the official Google
Cloud documentation for more info).
This makes sense because you don’t want any of your cluster nodes—whether they be master nodes (i.e.,
Google-managed VMs in a google-owned-project) or worker nodes (i.e., Google-managed VMs in
your-project)—to have an external IP address. You already specified the CIDR block for the worker nodes
when you “told” gcloud which subnet in your-project to use, and now you need to “tell” gcloud where your
master nodes will live. How do you do that?
You accomplish this by setting the --master-ipv4-cidr flag to a CIDR block with a /28 mask.
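Putting these flags together, a private cluster creation command might look like the following sketch
(cluster name, zone, subnet, and CIDR block are illustrative; private nodes assume a VPC-native cluster,
hence --enable-ip-alias):

    gcloud container clusters create private-cluster \
        --zone=us-central1-a \
        --enable-ip-alias \
        --enable-private-nodes \
        --enable-private-endpoint \
        --master-ipv4-cidr=172.16.0.0/28 \
        --subnetwork=your-subnet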
If your cluster has been created with the --enable-private-endpoint flag, then the only way to
access the control plane is by using its private endpoint. This can be achieved from an on-premises client
connected with VPN or Interconnect (Dedicated or Partner), provided the client NIC is associated with an IP
address in an authorized CIDR block.
If the cluster has been created without the --enable-private-endpoint flag, then the control plane
can be accessed by using its public endpoint from on-premises authorized clients connected to the Internet
or by using its private endpoint either from requestors in the VPC where the worker nodes live or from
on-premises authorized clients connected with VPN or Interconnect.
Now, how do you tell GKE which CIDR block is authorized to access the cluster control plane?
This can be achieved by using the --enable-master-authorized-networks flag when you create
your cluster. This flag requires that you specify the list of CIDR blocks by using the
--master-authorized-networks flag. You can specify up to 100 CIDR blocks for private clusters and up to 50 CIDR blocks for
public clusters.
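For instance, a hypothetical command authorizing two CIDR blocks might look like this (cluster name,
zone, and ranges are illustrative):

    gcloud container clusters update private-cluster \
        --zone=us-central1-a \
        --enable-master-authorized-networks \
        --master-authorized-networks=203.0.113.0/28,198.51.100.0/24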
There are several flags in the command gcloud container clusters create you need to know for
the exam. Each flag has a specific purpose, and a proper use of a single or a combination of them allows you
to define the optimal configuration for your GKE cluster, in accordance with your workload security and
network requirements.
Figure 3-147 summarizes what we just learned in the form of a decision tree.
The top-right box indicates the least restrictive access to your GKE cluster’s master endpoint, that is, a
public cluster with no restrictions to its public endpoint.
The bottom-right box indicates the most restrictive—and recommended for workloads managing
sensitive data—access to your GKE cluster’s master endpoint, that is, a private cluster with public endpoint
disabled.
Going from the top-right box down, more access restrictions are added, resulting in a reduction of the
attack surface of your cluster’s control plane.
Last, your cluster’s control plane private endpoint is exposed using an internal network TCP/UDP load
balancer (ILB) in the google-owned-vpc. By default, clients consuming the Kubernetes API with the private
endpoint must be located in the same region where the ILB lives (the ILB is a regional resource). These
can be VMs hosted in Google Cloud or on-premises VMs connected through Cloud VPN tunnels or VLAN
attachments.
By adding the --enable-master-global-access flag, you are allowing clients in any region to consume
your cluster Kubernetes API using its private endpoint.
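A sketch of how this flag might be applied to an existing cluster (cluster name and zone are illustrative):

    gcloud container clusters update private-cluster \
        --zone=us-central1-a \
        --enable-master-global-access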
The first firewall rule allows only incoming traffic over the TCP protocol and port 443 targeting the VMs
denoted by the web-server network tag.
The second firewall rule denies incoming traffic over the TCP protocol and port 5432 targeting the VMs
denoted by the db-server network tag.
■■Note The Source CIDR blocks in Figure 3-148 refer to Google Front Ends (GFEs), which are located in the
Google Edge Network and are meant to protect your workload infrastructure from DDoS (Distributed Denial-of-
Service) attacks. You will learn more about GFEs in Chapter 5.
■■Exam tip Service accounts and network tags are mutually exclusive and can’t be combined in the
same firewall rule. However, they are often used in complementary rules to reduce the attack surface of your
workloads.
The target of a firewall rule indicates a group of VMs in your VPC network, which are selected by
network tags or by associated service accounts. The definition of a target varies based on the rule direction,
that is, ingress or egress.
If the direction is ingress, the target of your firewall rule denotes a group of destination VMs in your
VPC, whose traffic from a specified source outside of your VPC is allowed or denied. For this reason, ingress
firewall rules cannot use the destination parameter.
Conversely, if the direction is egress, the target of your firewall rule denotes a group of source VMs in
your VPC, whose traffic to a specified destination outside of your VPC is allowed or denied. For this reason,
egress firewall rules cannot use the source parameter.
Let’s review the syntax to create a firewall rule; a hedged example follows the parameter list below.
Use the parameters as follows. More details about each parameter are available in the SDK reference
documentation.
• --network: The network where the rule will be created. If omitted, the rule will be
created in the default network. If you don’t have a default network or want to create
the rule in a specific network, you must use this field.
• By default, firewall rules are created and enforced automatically; however, you can
change this behavior.
• If both --disabled and --no-disabled are omitted, the firewall rule is created
and enforced.
• --disabled: Add this flag to create the firewall rule but not enforce it. The firewall
rule will remain in a disabled state until you update the firewall rule to enable it.
• --no-disabled: Add this flag to ensure the firewall rule is enforced.
• You can enable Firewall Rules Logging for a rule when you create or update it.
Firewall Rules Logging allows you to audit, verify, and analyze the effects of your
firewall rules. Firewall Rules Logging will be reviewed in detail in Chapter 8.
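As an illustration—not taken from the figures—here is a sketch of a rule similar to the first one described
earlier in this section (the network name is an assumption; the source ranges shown are the commonly
documented GFE ranges mentioned in the Note above):

    gcloud compute firewall-rules create allow-https-web \
        --network=my-vpc \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:443 \
        --source-ranges=130.211.0.0/22,35.191.0.0/16 \
        --target-tags=web-server \
        --priority=1000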
■■Exam tip You cannot change the direction (i.e., ingress, egress) of an existing firewall rule. For example, an
existing ingress firewall rule cannot be updated to become an egress rule. You have to create a new rule with
the correct parameters, then delete the old one. Similarly, you cannot change the action (i.e., deny, allow) of an
existing firewall rule.
Priority
The firewall rule priority is an integer from 0 to 65535, inclusive. Lower integers indicate higher priorities. If
you do not specify a priority when creating a rule, it is assigned a default priority of 1000.
The relative priority of a firewall rule determines if the rule is applicable when evaluated against others.
The evaluation logic works as follows:
A rule with a deny action overrides another with an allow action only if the two rules have the same
priority. Using relative priorities, it is possible to build allow rules that override deny rules, and vice versa.
Example
Consider the following example where two firewall rules exist (a sketch of both rules follows the list):
• An ingress rule from sources 0.0.0.0/0 (anywhere) applicable to all targets, all
protocols, and all ports, having a deny action and a priority of 1000
• An ingress rule from sources 0.0.0.0/0 (anywhere) applicable to specific targets with
the network tag webserver, for traffic on TCP 80, with an allow action
The priority of the second rule determines whether TCP traffic on port 80 is allowed for the webserver
network targets:
• If the priority of the second rule > 1000, it will have a lower priority, so the first rule
denying all traffic will apply.
• If the priority of the second rule = 1000, the two rules will have identical priorities, so
the first rule denying all traffic will apply.
• If the priority of the second rule < 1000, it will have a higher priority, thus allowing
traffic on TCP 80 for the webserver targets. Absent other rules, the first rule would
still deny other types of traffic to the webserver targets, and it would also deny all
traffic, including TCP 80, to instances without the webserver network tag.
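For illustration, the two rules in this example might be created as follows (the network name is an
assumption; the priorities match the third bullet’s scenario):

    # Rule 1: deny all ingress traffic at the default priority
    gcloud compute firewall-rules create deny-all-ingress \
        --network=my-vpc \
        --direction=INGRESS \
        --action=DENY \
        --rules=all \
        --source-ranges=0.0.0.0/0 \
        --priority=1000

    # Rule 2: allow TCP 80 to webserver targets; 900 < 1000, so this rule wins
    gcloud compute firewall-rules create allow-web-80 \
        --network=my-vpc \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=tcp:80 \
        --source-ranges=0.0.0.0/0 \
        --target-tags=webserver \
        --priority=900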
Protocols and Ports
You can narrow the scope of a firewall rule by specifying protocols or protocols and ports. You can specify a
protocol or a combination of protocols and their ports. If you omit both protocols and ports, the firewall rule
is applicable for all traffic on any protocol and any port. Table 3-1 shows examples of how protocols and ports
can be combined when creating or updating a firewall rule. These combinations of protocols and ports can be
assigned to the --allow or the --rules flags in the aforementioned gcloud compute firewall-rules create
command (Figure 3-149).
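For instance, illustrative values you might assign to these flags (not taken from Table 3-1) include:

    --rules=tcp:22                 (SSH only)
    --rules=tcp:80,tcp:443         (HTTP and HTTPS)
    --rules=udp:5000-6000          (a UDP port range)
    --rules=icmp                   (a protocol with no ports)
    --rules=all                    (all protocols, all ports)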
Direction
The direction of a firewall rule can be either ingress or egress. The direction is always defined from the
perspective of your VPC.
• The ingress direction describes traffic sent from a source to your VPC. Ingress rules
apply to packets for new sessions where the destination of the packet is the target.
• The egress direction describes traffic sent from your VPC to a destination. Egress
rules apply to packets for new sessions where the source of the packet is the target.
• If you omit a direction, GCP uses ingress as default.
Example
Consider an example connection between two VMs in the same network. Traffic from VM1 to VM2 can be
controlled using either of these firewall rules:
• An ingress rule with a target of VM2 and a source of VM1
• An egress rule with a target of VM1 and a destination of VM2
Additionally, we included a few exam tips, which come in handy when trying to remember the
parameters required by ingress (default) or egress firewall rules using gcloud.
■■Exam tip For the exam, you will need to remember that destination ranges are not valid parameters
for ingress firewall rules. Likewise, source ranges are not valid parameters for egress rules. A good way to
remember this is by memorizing the timezone acronyms IST and EDT, respectively, for ingress rules and egress
rules: in the former scenario (Ingress direction), you use Source and Target parameters, whereas in the latter
(Egress direction), you use Destination and Target parameters only.
Exam Questions
Question 3.1 (Routing)
You need to configure a static route as a backup to an existing static route. You want to ensure that the new
route is only used when the existing route is no longer available. What should you do?
A.
Create a network tag with a value of backup for the new static route.
B.
Set a lower priority value for the new static route than the existing static route.
C.
Set a higher priority value for the new static route than the existing static route.
D.
Configure the same priority value for the new static route as the existing
static route.
Rationale
A is not correct because a route with a specific network tag is only applied to
instances with the tag value.
B is not correct because static route priority uses lower values to indicate a higher
priority.
C is CORRECT because the higher value will make the route take effect only
when the lower value route is not available.
D is not correct because if you use the same value, GCP uses equal-cost
multipath routing and would use the new route some of the time when the
existing static route is still available.
A.
Block all traffic from source tag “web.”
B.
Allow traffic from source tag “app” to port 80 only.
C.
Allow all traffic from source tag “app” to target tag “db.”
D.
Allow ingress traffic from 0.0.0.0/0 on port 80 and 443 for target tag “web.”
E.
Allow ingress traffic using source filter = IP ranges where source IP ranges =
10.10.10.0/24.
Rationale
A is not correct because web VM still needs to talk to app VM.
B is not correct because this is not required as per the requirements.
C is CORRECT because this rule will allow traffic from app VM to DB VM.
D is CORRECT because this will allow outside users to send request to web VM.
E is not correct because this will provide web VM access to DB VM.
A.
Enable VPC Flow Logs for the Test subnet also.
B.
Make sure that there is a valid entry in the route table.
C.
Add a firewall rule to allow traffic from the Test subnet to the Web subnet.
D.
Create a subnet in another VPC, and move the web servers in the new subnet.
Rationale
A is not correct because enabling the flow logs in subnet “Test” will still not
provide any data as the traffic is being blocked by the firewall rule.
B is not correct because subnets are part of the same VPC and do not need
routing configured. The traffic is being blocked by the firewall rule.
C is CORRECT because the traffic is being blocked by the firewall rule. Once
configured, the request will reach to the VM and the flow will be logged in the
stackdriver.
D is not correct because the traffic is being blocked by the firewall rule and not
due to subnet being in the same VPC.
Rationale
A is incorrect because the firewall denies traffic if both the permit and deny have
the same priority regardless of rule order.
B is CORRECT because the firewall will allow traffic to pass with the proper
allow ingress rule with a priority lower than the default value of 1000.
C is incorrect because the scenario described does not apply to egress traffic. By
design, the firewall is stateful, and if the tunnel exists, traffic will pass.
D is incorrect because the scenario described does not apply to egress traffic.
By design, the firewall is stateful, and if the tunnel exists, traffic will pass and
the priority value is set higher than the default, meaning the rule would not be
considered.
CHAPTER 4

Implementing Virtual Private Cloud Service Controls
In computer engineering, networking and security go hand in hand. There is no way to effectively design a
network architecture without considering security. The opposite holds true as well.
In the cloud, this “symbiotic” relationship between networking and security is even more important
than in traditional data centers.
So far, you were introduced to firewall rules as a means to secure the perimeter of your VPCs.
VPC service controls were also briefly discussed in Chapter 2 as an approach to prevent exfiltration of
sensitive data.
In this chapter, we will deep dive into VPC Service Controls, and you will understand how this GCP
service is a lot more than a product to prevent data exfiltration. When effectively implemented, VPC Service
Controls will strengthen your enterprise security posture by introducing context-aware safeguards into your
workloads.
To get started, the concepts of an access policy and an access level will be formally defined,
unambiguously. These are the building blocks of a service perimeter, which is the key GCP resource you use
to protect services used by projects in your organization.
You will then learn how to selectively pick and choose what services consumed by your workloads need
context-aware protection.
We will walk you through these constructs through a deep dive exercise on service perimeters, which
will help you visualize and understand how all these pieces fit together.
You will then learn how to connect service perimeters by using bridges, and you will get familiar with
Cloud Logging to detect perimeter violations and respond to them.
Finally, the different behaviors of a service perimeter will be presented through another deep dive
exercise, which will help you understand the different behavior between an enforced perimeter and a
perimeter in dry-run mode.
What if a malicious user were to compromise this service account and try to access data in
some VM in a subnet of your “firewall-protected” VPC?
Perimeters
So what are the components of a perimeter, and most importantly how does a service perimeter differ from a
network perimeter (i.e., VPC firewall rules)?
The components of a perimeter are
• Resources: These are containers of resources the perimeter needs to protect from
data exfiltration.
• Restricted services: These are the Google API endpoints (e.g., storage.googleapis.com)
whose access is restricted to the resources within the perimeter.
• VPC allowed services: These are the Google API endpoints that can be accessed
from network endpoints within the perimeter.
• Access levels: These are means to classify the context of a request based on device,
geolocation, source CIDR range, and identity.
• Ingress policy: This is a set of rules that allow an API client outside the perimeter to
access resources inside the perimeter.
• Egress policy: This is a set of rules that allow an API client inside the perimeter to
access resources outside the perimeter.
Before learning each component, let’s first review how to create a perimeter with the gcloud command.
A perimeter is a GCP resource, which can be created with the gcloud CLI as shown in Figure 4-1.
In addition to its perimeter ID (PERIMETER), you must provide an access policy by assigning the policy
ID to the --policy flag—unless you have already set a default access policy for your project, folder, or your
entire organization. You’ll learn about access policies in the upcoming “Service Perimeter Deep Dive”
section. For the time being, all you need to know is that an access policy is a Google Cloud resource where
you store perimeter components.
The only required flag is the --title, which is a short, human-readable title for the service perimeter.
The relevant, optional flags you need to know for the exam are
• Access levels (--access-levels=[LEVEL, …]): It denotes a comma-separated list of
IDs for access levels (in the same policy) that an intra-perimeter request must satisfy
to be allowed.
• Resources (--resources=[RESOURCE, …]): It’s a list of projects you want to protect
by including them in the perimeter and is denoted as a comma-separated list of
project numbers, in the form projects/<projectnumber>.
• Restricted services (--restricted-services=[SERVICE, …]): It denotes a comma-
separated list of Google API endpoints to which the perimeter boundary applies
(e.g., storage.googleapis.com).
• VPC allowed services (--vpc-allowed-services=[SERVICE, …]): It requires the
flag --enable-vpc-accessible-services and denotes a comma-separated list of
Google API endpoints accessible from network endpoints within the perimeter. In
order to include all restricted services, use the keyword RESTRICTED-SERVICES.
• Ingress policies (--ingress-policies=YAML_FILE): It denotes a path to a file
containing a list of ingress policies. This file contains a list of YAML-compliant
objects representing ingress policies, as described in the API reference.
• Egress policies (--egress-policies=YAML_FILE): It denotes a path to a file
containing a list of egress policies. This file contains a list of YAML-compliant objects
representing egress policies, as described in the API reference.
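Pulling these flags together, a hypothetical perimeter creation command might look like this sketch
(perimeter name, policy ID, project numbers, and level name are all illustrative):

    gcloud access-context-manager perimeters create my_perimeter \
        --title="My Perimeter" \
        --policy=1234567890 \
        --resources=projects/111111111111,projects/222222222222 \
        --restricted-services=storage.googleapis.com \
        --access-levels=my_level \
        --enable-vpc-accessible-services \
        --vpc-allowed-services=RESTRICTED-SERVICES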
Access Levels
An access level is a one-directional form of validation. It only validates ingress requests to access resources
inside the perimeter.
Whether the request originates from the Internet, from your corporate network, or from network
endpoints within the perimeter, an access level performs the validation you specify and determines whether
the access to the requested resource is granted or denied.
The validation is based on your workload security requirements, which include endpoint verification
(i.e., device attributes), identity verification (i.e., principal attributes), geolocation, and dependencies with
other access levels.
When you create an access level, you need to decide whether you need a basic access level or a custom
access level. For most use cases, a basic level of validation suffices, while a few require a higher degree
of sophistication.
When you use the gcloud command to create or update an access level, both types (basic and custom)
are expressed in the form of a YAML file, whose path is assigned to the --basic-level-spec or
the --custom-level-spec flag, respectively. The two flags are mutually exclusive.
A basic access level YAML spec file is a list of conditions built using assignments to a combination of
one or more of the five following attributes:
• ipSubnetworks: Validates the IPv4 or IPv6 CIDR block of the requestor. RFC 1918
blocks are not allowed.
• regions: Validates the region(s) of the requestor.
• requiredAccessLevels: Validates whether the request meets the criteria of one
or more dependent access levels, which must be formatted as <accessPolicies/
policy-name/accessLevels/level-name>.
• members: Validates whether the request originated from a specific user or service
account.
• devicePolicy: Requires endpoint verification and validates whether the device of
the requestor meets specific criteria, including
• requireScreenlock: Boolean
• allowedEncryptionStatuses: Predefined list of values
• requireCorpOwned: Boolean
• osConstraints
• osType: Predefined list of values
• minimumVersion: Requires osType
The reference guide to the complete list of basic access level attributes can be found at
https://cloud.google.com/access-context-manager/docs/access-level-attributes#ip-subnetworks
An example of a YAML file can be found at https://cloud.google.com/access-context-manager/docs/
example-yaml-file. Also, for basic access levels, you need to choose whether all conditions are to be met or
just one. This is done using the --combine-function flag, whose allowed values are AND (default) and OR.
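As an illustration, a minimal basic access level spec combining two of these attributes might look like the
following sketch (the CIDR block and the principal are assumptions):

    # Condition 1: request must originate from this public CIDR block
    - ipSubnetworks:
      - 203.0.113.0/24
    # Condition 2: request must come from this principal
    - members:
      - user:alice@example.com

With --combine-function=AND, both conditions must hold; with OR, either one suffices.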
For more complex access patterns, use a custom access level. A custom access level YAML spec file
contains a list of Common Expression Language (CEL) expressions formatted as a single key-value pair:
expression: CEL_EXPRESSION.
Similarly to basic access levels, the spec file lets you create expressions based on attributes from the
following four objects:
• origin: Contains attributes related to the origin of the request, for example,
(origin.ip == "203.0.113.24" && origin.region_code in ["US", "IT"])
• request.auth: Contains attributes related to authentication and authorization
aspects of the request, for example, request.auth.principal == "accounts.
google.com/1134924314572461055"
• levels: Contains attributes related to dependencies on other access levels, for
example, levels.allow_corporate_ips where allow_corporate_ips is another
access level
• device: Contains attributes related to devices the request originates from, for
example, device.is_corp_owned_device == true
To learn how to build Common Expression Language (CEL) expressions for custom access levels, refer
to the Custom Access Level Specification: https://cloud.google.com/access-context-manager/docs/
custom-access-level-spec.
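For example, a single-expression custom spec file might look like this sketch (the expression itself is
illustrative):

    expression: "origin.region_code in ['US', 'IT'] && device.is_corp_owned_device == true"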
The synopsis of the gcloud command to create an access level is displayed in Figure 4-2 for your reference.
In addition to LEVEL, that is, the fully qualified identifier for the level, and the access policy POLICY
(required only if you haven’t set a default access policy), you must specify a title for your access level.
As you learned before, the level type flags are mutually exclusive. With basic access level (as noted in
Figure 4-2), you have to decide whether all or at least one condition must be true for the validation to pass or
fail. This can be achieved by setting the --combine-function flag to the value "and" (default) or the value "or".
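A hypothetical invocation might therefore look like this (level name, title, file name, and policy ID are
assumptions):

    gcloud access-context-manager levels create my_level \
        --title="My Level" \
        --basic-level-spec=CONDITIONS.yaml \
        --combine-function=and \
        --policy=1234567890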
In the next section, we will put these concepts to work by walking you through a simple example of a
perimeter and an access level. With a real example, all these concepts will make sense, and you’ll be ready to
design perimeters and access levels in Google Cloud like a “pro.”
■■Exam tip Do not confuse an IAM allow policy with an access policy. Both constructs use the term “access”
after all—IAM stands for Identity and Access Management. However, an IAM policy, also known as an allow
policy, is strictly related to what identities (or principals) are allowed to do on a given resource, whether it
be a VM, a Pub/Sub topic, a subnet, a project, a folder, or an entire organization. Each of these resources (or
containers of resources) has an IAM policy attached to them. Think of it as a sign that lists only the ones who
are allowed to do something on the resource. The “something” is the list of verbs—permissions—and is
expressed in the form of an IAM role, for example, roles/networkUser or roles/securityAdmin, which
is indeed a set of permissions. Conversely, while access policies are also focused on access, they take into
consideration a lot more than just identity and role bindings. Unlike IAM policies, access policies are applicable
to resource containers only, that is, projects, folders, and organizations (one only), and they are used to enable
conditional access to resources in the container based on the device, request origin (e.g., source CIDR blocks),
request authentication/authorization, and dependencies with other access levels.
In our exercise, for the sake of simplicity we are going to create an access policy whose scope is the
entire dariokart.com organization, as displayed in Figure 4-4.
■■Exam tip The only required flags are the access policy title (--title) and its parent organization
(--organization). You can also create an access policy scoped to a specific folder or a specific project in
your organization. This can be achieved by setting the folder (or project) number as value to the --scopes
flag. For more details, visit https://cloud.google.com/sdk/gcloud/reference/access-context-
manager/policies/create#--scopes.
The name of the access policy is system generated. You only get to choose its title. It’s also a good idea to
set the policy as the default access policy for our organization, as illustrated in Figure 4-5.
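For reference, a sketch of these two operations (the organization ID and policy number are placeholders;
the config property sets the default policy for subsequent commands):

    gcloud access-context-manager policies create \
        --organization=123456789012 \
        --title=dariokart_policy

    gcloud config set access_context_manager/policy POLICY_NUMBER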
Since the access level is basic, the YAML file is a simple list of conditions. The conditions apply to the
attributes of any of these four objects:
• origin, for example, origin.ip, origin.region_code
• request.auth, for example, request.auth.principal
• levels, for example, levels.<access_level_name>
• device, for example, device.os_type, device.encryption_status
The complete list of objects and their attributes can be referenced at https://cloud.google.com/
access-context-manager/docs/custom-access-level-spec#objects.
The syntax of the YAML file uses the Common Expression Language. See https://github.com/google/
cel-spec/blob/master/doc/langdef.md for more details.
Our YAML file is very simple. We want to enforce a basic access level stating that only user
gianni@dariokart.com is authorized to perform storage API actions.
The first constraint—that is, only a selected user is authorized to do something—is expressed by the
condition in the YAML file, as shown in Figure 4-7.
The second constraint, that is, preventing any user other than gianni@dariokart.com from consuming
storage.googleapis.com, will be enforced when we create the perimeter in the next section.
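Although the file itself appears only in Figure 4-7, the condition plausibly reduces to a single members
attribute, along these lines:

    # Only this principal satisfies the access level
    - members:
      - user:gianni@dariokart.com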
With the YAML file saved, we can now create our access level (Figure 4-8).
Creating a Perimeter
Because the user [email protected] is the shared VPC administrator, it makes sense that he creates the
perimeter for our shared VPC. The perimeter will encompass the two service projects and the host project.
In order for [email protected] to be able to create a perimeter, we are going to grant him the
roles/accesscontextmanager.policyAdmin role at the organization level, as shown in Figure 4-9.
■■Note You should always use the least privilege principle when designing the security architecture for your
workloads. However, this exercise is solely intended to explain how access levels and perimeters work together
to enforce access control over the Google Cloud Storage API, and for the sake of simplicity, we haven’t strictly
used the principle.
With all permissions in place, we can finally create the perimeter dariokart_perimeter and associate it
to our newly created access level dariokart_level. Figure 4-10 shows you how to create the perimeter.
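Sketches of the two commands behind Figures 4-9 and 4-10 might look like this (the organization ID, the
administrator’s email, and the project numbers are placeholders):

    gcloud organizations add-iam-policy-binding 123456789012 \
        --member=user:ADMIN_EMAIL \
        --role=roles/accesscontextmanager.policyAdmin

    gcloud access-context-manager perimeters create dariokart_perimeter \
        --title=dariokart_perimeter \
        --resources=projects/HOST_NUM,projects/BACKEND_NUM,projects/FRONTEND_NUM \
        --restricted-services=storage.googleapis.com \
        --access-levels=dariokart_level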
Testing the Perimeter
This is the fun part!
Users [email protected] and [email protected] have roles/editor roles in their projects,
backend-devs and frontend-devs, respectively.
As a result, without perimeters and access levels they should be able to modify the state of resources
in their respective project, such as creating new VMs and changing or deleting existing VMs or other GCP
resources.
As you can see, the response returned an HTTP status code 403, which clearly explained the reason why
the request failed, namely, the perimeter blocked the request after checking the organization access policy.
The same response is returned after [email protected] attempted to create a bucket in his project,
as shown in Figure 4-12.
This is what we expected, right? Since the YAML file in Figure 4-7 did not include [email protected]
nor [email protected] as authorized members of the access level, the result is that none of them
can perform any actions invoking the storage API, even though both principals have editor roles in their
respective project.
Now, let’s check whether the authorized user gianni@dariokart.com is allowed to create a bucket in
both backend-devs and frontend-devs projects.
Before we do that, we need to grant gianni@dariokart.com permissions to create storage objects in
both projects.
This can be achieved by binding the role roles/storage.admin to the principal gianni@dariokart.com
at the project level scope for each project as illustrated in Figures 4-13 and 4-14.
With these two role bindings, user gianni@dariokart.com has the same storage permissions that user
[email protected] has in project backend-devs and that user [email protected] has in project
frontend-devs.
As a result, we can rest assured the accessibility test we are about to perform with gianni@dariokart.com
is an “apples-to-apples” comparison among the three principals.
Finally, let’s check whether gianni@dariokart.com can effectively create a bucket in each project
(Figures 4-15 and 4-16).
As expected, user gianni@dariokart.com is allowed to create a bucket in each project in which he had
proper permissions.
This time, the basic access level dariokart_level associated with the perimeter dariokart_perimeter
has authorized gianni@dariokart.com to perform any storage API operation, resulting in the successful
creation of a bucket in each service project within the perimeter.
A picture is worth a thousand words! Figure 4-17 provides a holistic view of what we just accomplished
in this exercise.
You may wonder, what’s the custom static route whose destination is the CIDR block 199.36.153.4/30?
To answer this question, we need to mention that not every Google API can be protected by VPC Service
Controls. However, the many Google APIs supported—including storage.googleapis.com—are only
accessible with routes whose destination is the CIDR block 199.36.153.4/30 and whose next hop is the
default-internet-gateway.
The fully qualified domain name restricted.googleapis.com resolves to the CIDR block
199.36.153.4/30, and this block is not routable from the Internet.
Instead, this block can only be routed from within the Google Global Backbone.
In other words, if you try to ping this CIDR block from a terminal in a computer connected to your
Internet Service Provider (ISP), you will get a request timeout. However, if you try from a VM in a subnet of
your VPC, you will get a response.
Deleting the Buckets
As always, to avoid incurring unexpected charges, it is a good idea to delete the buckets we just created
(Figure 4-18).
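A sketch of the cleanup (the bucket names are placeholders):

    gsutil rb gs://BACKEND_BUCKET_NAME
    gsutil rb gs://FRONTEND_BUCKET_NAME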
So how do you limit access to the Google APIs exposed by the restricted VIP from network endpoints
within the perimeter?
The answer depends on whether you are creating a new perimeter or you are updating an existing
perimeter.
If you are creating a new perimeter with the gcloud access-context-manager perimeters create
command, then use both flags:
• --enable-vpc-accessible-services
• --vpc-allowed-services=[API_ENDPOINT,...]
Keep in mind that the perimeter boundary is only enforced on the list of API endpoints assigned to
the --restricted-services flag, regardless of whether they are on the list assigned to the --vpc-allowed-
services flag.
The list of API endpoints assigned to the --vpc-allowed-services flag has a default value of all
services, that is, all services on the configured restricted VIP are accessible using Private Google Access by
default. If you want to be more selective, provide a comma-delimited list as follows:
• Empty comma-delimited list (""): None of the services on the configured restricted
VIP are accessible using Private Google Access.
• All restricted services (RESTRICTED-SERVICES): Use the keyword RESTRICTED-
SERVICES to denote the list of all restricted services, as specified by the value of
the --restricted-services flag. All these restricted API endpoints will be accessible
using Private Google Access.
• Selected services (e.g., "bigquery.googleapis.com"): Only services explicitly
selected by you will be accessible using Private Google Access.
If you are updating an existing perimeter with the command gcloud access-context-manager
perimeters update, then use
• --enable-vpc-accessible-services if the list of VPC allowed services is empty and
you are going to add services
• --add-vpc-allowed-services=[API_ENDPOINT,...] to add new services
• --remove-vpc-allowed-services=[API_ENDPOINT,...] to remove existing services
If you want to disable the ability to restrict access to services from network endpoints within the
perimeter, use both
• --no-enable-vpc-accessible-services
• --clear-vpc-allowed-services
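For illustration, hedged sketches of these update scenarios (the perimeter name and the added service are
assumptions):

    # Add one more allowed service to an existing perimeter
    gcloud access-context-manager perimeters update my_perimeter \
        --enable-vpc-accessible-services \
        --add-vpc-allowed-services=bigquery.googleapis.com

    # Disable the VPC accessible services feature altogether
    gcloud access-context-manager perimeters update my_perimeter \
        --no-enable-vpc-accessible-services \
        --clear-vpc-allowed-services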
Perimeter Bridges
A perimeter comes in two “flavors”: a regular perimeter (default option) or a perimeter bridge. From now
on, when we use the term perimeter we will always refer to a regular perimeter, which protects a group of
projects (even one single project) by controlling the use of selected Google APIs—that you get to choose
when you create the perimeter—to only the projects inside the perimeter.
While perimeters are an effective means to prevent data exfiltration, there are scenarios where projects
in different perimeters need to interact with each other and share data.
This is when perimeter bridges come in handy. A perimeter bridge allows inter-perimeter
communication between two projects in two different perimeters. Perimeter bridges are symmetrical,
allowing projects from each perimeter bidirectional and equal access within the scope of the bridge.
However, the access levels and service restrictions of the project are controlled solely by the perimeter the
project belongs to.
In fact, when you choose to create a perimeter bridge, you should not attach an access level to it, nor
should you specify which APIs are to be restricted. I’ll explain in a minute why I said “should not” instead of
“cannot.”
■■Exam tip A project may have multiple perimeter bridges connecting it to other projects. However, for this to
happen, the project must already belong to a regular perimeter. Also, all “bridged” projects must be part of the
same organization.
To create a perimeter bridge, use the gcloud access-context-manager perimeters create command
with the flag --perimeter-type=bridge.
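A hypothetical bridge creation command might look like this (the bridge name and project numbers are
placeholders; note the absence of access levels and restricted services):

    gcloud access-context-manager perimeters create my_bridge \
        --title=my_bridge \
        --perimeter-type=bridge \
        --resources=projects/111111111111,projects/333333333333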
If you try to update an existing regular perimeter to a perimeter bridge, the gcloud access-context-manager
perimeters update command does not return an error status code, as displayed in Figure 4-20.
You would think that the operation succeeded, right? However, as you can see from the console in
Figure 4-21—in this exceptional case, we use the console because the console reveals more details (the
perimeter type) than gcloud—the perimeter type has not changed!
If you think about it, this makes sense because unlike regular perimeters, which are essentially groups
of one, two, three, or more projects, a perimeter bridge connects only two projects that belong to different
perimeters. As a result, which one of the two access levels should be used? And which one of the two lists of
restricted (or unrestricted) APIs should be used?
Arguably, the gcloud access-context-manager perimeters update command should prohibit a
patch operation that sets --perimeter-type=bridge when the existing regular perimeter already has an
access level or a list of restricted services defined.
In fact, if we try to change dariokart_perimeter’s type from regular to bridge using the console, this
option is grayed out, as you can see in Figure 4-22.
Figure 4-23 shows what we just learned and visualizes the key characteristics of perimeter bridges.
Audit Logging
By default, VPC Service Controls write to Cloud Logging all requests that are denied because of security
policy violations. More information about Cloud Logging—as it pertains to network operations and
optimization—will be provided in Chapter 8.
The most useful fields to check in the logs to better understand a violation are
• timestamp: The date and time when the violation happened.
• protopayload_auditlog.requestMetadata.callerIp: The client IP address.
• protopayload_auditlog.methodName: The API and method invoked.
• protopayload_auditlog.authenticationInfo.principalEmail: The user email or
service account that invoked the API.
• protopayload_auditlog.metadataJson: This field contains interesting information
in JSON format, for example, name of violated perimeter, direction (ingress/egress),
source project, target project, and unique ID.
Dry-Run Mode
You can think of VPC service controls as a firewall that controls which Google APIs the components of your
workload are authorized to consume.
The VPC service perimeter construct was introduced to establish the boundary around these
accessible APIs.
In order to further verify the identity of each request, you may attach one or more access levels to a VPC
service perimeter.
When used in conjunction with other network controls (e.g., firewall rules and other network services
you will learn in the next chapter), VPC service controls are an effective way to prevent data exfiltration.
However, you need to use caution when configuring VPC service controls for your workload. A
misconfiguration may be overly permissive by allowing requests to access some Google APIs they shouldn’t
be allowed to consume. Conversely, a misconfiguration may also be too restrictive to the point that even
authorized requests are mistakenly denied access to one or more accessible Google APIs.
You may wonder, why not test a VPC service control in a non-production environment first, and then if
it works and it passes all forms of test, “promote” this configuration to production?
This is a feasible approach, but it’s not practical because a VPC service control configuration takes into
consideration a large number of “moving parts,” which include a combination of identity data, contextual
data (e.g., geolocation, origination CIDR block, device type, device OS, etc.), and infrastructure data.
Moreover, your non-production VPC service control configuration may work in your non-production
cloud environment, but it may fail in production because, for example, your request didn’t originate from
the expected CIDR block using the expected principal and a trusted device.
Wouldn’t it be nice to be able to test your VPC service control configuration directly in the targeted
environment without creating adverse effects on the environment itself?
This is exactly what a dry-run mode configuration does. A VPC service control dry-run configuration
will not prevent unauthorized requests from accessing the targeted Google APIs. Instead, it will simply
report a perimeter dry-run violation in the audit logs.
For this reason, it is best practice to leverage dry-run mode perimeter configurations with increasing
level of restrictions. Put differently, a VPC service perimeter dry-run configuration should always be more
restrictive than the perimeter enforced configuration.
Upon enforcement (or “promotion”) of the dry-run configuration, unauthorized requests will be denied
access to the targeted Google APIs, and a VPC service control violation will be reported in the audit logs.
Dry-Run Concepts
A VPC service perimeter dry-run configuration can be in one of these four states:
1.
Inherited from enforced: By default, the dry-run configuration is identical to the
enforced configuration. This happens, for example, if you create a perimeter in
enforced mode from scratch, as we did in the service perimeter deep dive before.
2.
Updated: The dry-run configuration is different from the enforced configuration.
This happens when you want to apply additional restrictions to the perimeter,
and you choose to test them in the dry-run configuration instead of the enforced
configuration. The upcoming perimeter dry-run deep dive exercise will show you
exactly this approach.
3.
New: The perimeter has a dry-run configuration only. No enforced
configurations exist for the perimeter. The status persists as new until an
enforced configuration is created for the perimeter.
4.
Deleted: The perimeter dry-run configuration was deleted. The status persists as
deleted until a dry-run configuration is created.
Perimeter Dry-Run
In this exercise, you will learn how to use a dry-run mode perimeter configuration to test your workload
security posture.
First, we will configure subnet-frontend with private connectivity to Google APIs and services.
Then, we will update dariokart_perimeter to allow the two principals [email protected] and
[email protected] to use the restricted service, that is, the storage API.
Last, we will limit the access of Google APIs from network endpoints within the perimeter to only the
storage API. This change constrains even more the actions an authorized principal can do—whether these
actions originated from inside or outside of the perimeter. Put differently, the only action allowed will be
consumption of the storage API from authorized requestors within the perimeter.
Now—and this is the important part—instead of applying this change immediately (as we did in
the previous deep dive exercise), you will learn how to test this change by creating a perimeter dry-run
configuration.
You will test the perimeter dry-run configuration, and if this works as expected, then you will enforce it.
At the end of this exercise, you will have learned
• How to create and to enforce a dry-run mode perimeter configuration
• How to find perimeter violations in the audit logs
• How VPC accessible services work
• How to fully set up private connectivity to Google APIs and services
Let’s get started!
We will set up private connectivity for subnet-frontend and make sure that any request for the
compute Google API, which originates from this subnet, gets routed to the restricted VIP, that is, the CIDR
block 199.36.153.4/30.
We chose the compute Google API, that is, compute.googleapis.com, as an example just to prove that
when the VPC allowed services are set for a perimeter, only these services can be accessed from network
endpoints within the perimeter.
There are a number of steps you need to complete in order to establish private connectivity for a subnet.
First, you need to make sure the subnet is configured to use Private Google Access.
This is achieved by using the --enable-private-ip-google-access flag when you create or update
a subnet. We already enabled Private Google Access for subnet-frontend in the “Private Google Access”
section of Chapter 3 (Figure 3-58). Let’s check to make sure this flag is still enabled (Figure 4-24).
As you can see in Figure 4-24, Private Google Access is enabled for subnet-frontend but is disabled for
subnet-backend.
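For reference, re-enabling the flag on a subnet would look something like this sketch (the region is an
assumption):

    gcloud compute networks subnets update subnet-frontend \
        --region=us-east1 \
        --enable-private-ip-google-access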
Second, we need to make sure there are routes whose next hop is the default-internet-gateway and
whose destination is the restricted VIP.
To ensure the highest level of network isolation, I deleted the default Internet route (Figure 4-25), and
I created a custom static route whose next hop is the default-internet-gateway and whose destination is the
restricted VIP (Figure 4-26).
■■Note While you cannot delete system-generated subnet routes, you can delete the default Internet route.
To do so, run the gcloud compute routes list command and find out the name of the route whose
destination range is 0.0.0.0/0. Then run the gcloud compute routes delete command as shown in
Figure 4-25.
■■Exam tip One of the effects of deleting the default Internet route is that you won’t be able to ssh into any
VM with an external IP.
Since I didn’t want to lose the ability to ssh into VMs with an external IP, I chose to recreate a default
Internet route, but this time with lower priority (1500) than the newly created custom static route (defaulted
to 1000).
Figure 4-27 shows how to create the new default Internet route.
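Sketches of the three route operations just described (route and network names are illustrative; the default
route name is system generated):

    # Find and delete the default Internet route
    gcloud compute routes list --filter="destRange=0.0.0.0/0"
    gcloud compute routes delete DEFAULT_ROUTE_NAME

    # Custom static route to the restricted VIP
    gcloud compute routes create restricted-vip-route \
        --network=your-shared-vpc \
        --destination-range=199.36.153.4/30 \
        --next-hop-gateway=default-internet-gateway

    # Recreated default Internet route at a lower priority (higher number)
    gcloud compute routes create default-internet-route \
        --network=your-shared-vpc \
        --destination-range=0.0.0.0/0 \
        --next-hop-gateway=default-internet-gateway \
        --priority=1500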
So, just to make sure I have the routes I need, in Figure 4-28 I listed all the routes that use the default
Internet gateway in our shared VPC.
The first route will allow egress traffic to the Internet, which will allow me (among many things) to ssh
to VMs with external IPs I created in any of my subnets.
The second route will allow egress traffic to the restricted VIP.
■■Exam tip In order for your workloads to privately consume Google APIs, you need to allow egress traffic
on TCP port 443 to reach the restricted VIP.
Third, with Private Google Access enabled and a route to the restricted VIP, we need to make sure
firewall rules won’t block any outbound tcp:443 traffic whose destination is the restricted VIP.
As a result, in Figure 4-29 I created a firewall rule allow-restricted-apis.
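A sketch of such a rule (the network name is an assumption):

    gcloud compute firewall-rules create allow-restricted-apis \
        --network=your-shared-vpc \
        --direction=EGRESS \
        --action=ALLOW \
        --rules=tcp:443 \
        --destination-ranges=199.36.153.4/30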
Fourth, we need to set up a DNS resource record set to resolve the fully qualified domain name
compute.googleapis.com.—yes, the trailing dot is required!—to one of the four IPv4 addresses in the
restricted VIP.
An easy way to do this is by leveraging the Cloud DNS response policies feature, which was introduced
precisely to simplify private access to Google APIs, as illustrated in Figure 4-30.
A Cloud DNS response policy is a Cloud DNS private zone construct that contains rules instead of
records for our shared VPC name resolution.
Last, in Figure 4-31 we need to add a rule that lets the fully qualified domain name compute.
googleapis.com. resolve to one of the four IPv4 addresses in the restricted VIP CIDR block
199.36.153.4/30.
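Hedged sketches of the response policy and its rule (policy, rule, and network names are illustrative):

    gcloud dns response-policies create restricted-apis \
        --networks=your-shared-vpc \
        --description="Resolve Google API names to the restricted VIP"

    gcloud dns response-policies rules create compute-api-rule \
        --response-policy=restricted-apis \
        --dns-name="compute.googleapis.com." \
        --local-data=name="compute.googleapis.com.",type="A",ttl=300,rrdatas="199.36.153.4"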
With these five steps successfully completed, you can rest assured that tcp:443 egress traffic from
subnet-frontend, whose destination is the compute Google API, will reach its destination without using the
Internet.
Updating the Perimeter
Now we need to make sure the updated access level is reassigned to the perimeter. This is accomplished
by leveraging the --set-access-levels flag in the gcloud access-context-manager perimeters update
command (Figure 4-34).
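A minimal sketch of this update (assuming the default access policy has already been set):

    gcloud access-context-manager perimeters update dariokart_perimeter \
        --set-access-levels=dariokart_level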
Testing the Perimeter
In this section, we will test the perimeter by showing that the principal [email protected] is allowed to
issue a GET request targeting the compute Google API endpoint in order to list the VMs (instances) in its
project.
Let’s first look at the perimeter description (Figure 4-35) to check whether the perimeter actually allows
requestors from a project inside the perimeter, matching one of the three principals in the abovementioned
access level—[email protected] being one of the three—to consume the https://compute.
googleapis.com endpoint.
■■Note For the request to hit the restricted VIP, we need to make sure the VM has no external IP address.
Even though our shared VPC has an Internet route, since the VM has no external IP address we cannot
ssh to the VM from Cloud Shell by default—the default option is to ssh into the VM using its external IP
address and the default Internet route.
In order to SSH to our “private” VM, we need to leverage the Identity-Aware Proxy (IAP) tunnel.
As always, we start by enabling its corresponding API in the project (Figure 4-37).
Then the principal [email protected] needs to be granted the tunnel resource accessor role.
In accordance with the least privilege principle, we will bind this role to this principal in his own project
as scope.
Figure 4-38 shows how to update the project frontend-devs IAM allow policy accordingly.
With these preliminary steps completed, we can SSH into vm1, as shown in Figure 4-39.
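Hedged sketches of these three steps (the zone and the principal’s email are assumptions):

    gcloud services enable iap.googleapis.com --project=frontend-devs

    gcloud projects add-iam-policy-binding frontend-devs \
        --member=user:DEV_EMAIL \
        --role=roles/iap.tunnelResourceAccessor

    gcloud compute ssh vm1 --zone=us-east1-b --tunnel-through-iap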
Now that we are securely logged into vm1, let’s test dariokart_perimeter.
In this test, we will create an HTTP GET request, whose target is the compute Google API endpoint, with
a URI constructed to invoke the instances.list method.
See https://cloud.google.com/compute/docs/reference/rest/v1/instances/list for more details.
The request will be authenticated using [email protected]’s access token, which is redacted in
Figure 4-40.
■■Note An access token is a temporary form of authentication, and by default it expires after one hour. In
Figure 4-40, I used the gcloud auth print-access-token command from a gcloud session on my
personal machine authenticated as principal [email protected]. I could have used the same command
from Cloud Shell—provided I was authenticated as [email protected].
Finally, we execute the HTTP GET request using the curl command in Figure 4-41.
I copied the access token, and I pasted it as the value of the Bearer key in the HTTP header section.
As expected, the request was successful and the Google compute API returned a response listing the
details of vm1—the only instance in the project frontend-devs.
In Figure 4-41, you can see the detailed response in JSON format. The access token was redacted.
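A sketch of the same test (the zone is an assumption):

    TOKEN="$(gcloud auth print-access-token)"
    curl -H "Authorization: Bearer ${TOKEN}" \
        "https://compute.googleapis.com/compute/v1/projects/frontend-devs/zones/us-east1-b/instances"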
Since we don’t want to enforce this new configuration just yet, we are going to leverage the dry-run
feature of service perimeters in order to thoroughly test this configuration and to make sure it provides the
expected level of protection.
When you use a dry-run configuration for the first time, it is by default inherited from the perimeter
enforced configuration.
The enforced configuration allows all VPC accessible APIs to be consumed by authorized requestors
inside the perimeter (Figure 4-35).
As a result, we want to first remove all VPC allowed services from the inherited dry-run
configuration—step 1.
This is achieved by using the --remove-vpc-allowed-services flag (set to the ALL keyword), in
conjunction with the --enable-vpc-accessible-services flag, as displayed in Figure 4-42.
Subsequently, we want to add the restricted services, that is, storage.googleapis.com, to the list of
VPC allowed services—step 2.
This is achieved by using the --add-vpc-allowed-services flag (set to the RESTRICTED-SERVICES
keyword), also in conjunction with the --enable-vpc-accessible-services flag, as shown in Figure 4-43.
After completing these two steps, dariokart_perimeter’s dry-run configuration (surrounded by the
green rectangle) has drifted from the enforced configuration (surrounded by the yellow rectangle), as
illustrated in Figure 4-44.
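Assuming the dry-run variant of the update command, the two steps might look like this sketch:

    # Step 1: clear the inherited list of VPC allowed services
    gcloud access-context-manager perimeters dry-run update dariokart_perimeter \
        --remove-vpc-allowed-services=ALL \
        --enable-vpc-accessible-services

    # Step 2: allow only the restricted services (here, the storage API)
    gcloud access-context-manager perimeters dry-run update dariokart_perimeter \
        --add-vpc-allowed-services=RESTRICTED-SERVICES \
        --enable-vpc-accessible-services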
An easier way to see the differences between the two configurations is by using the
gcloud access-context-manager perimeters dry-run describe command, as shown in Figure 4-45.
The “+” sign denotes a row that is present in the perimeter dry-run configuration, but is absent from the
perimeter enforced configuration.
Conversely, the “-” sign denotes a row that is present in the perimeter enforced configuration, but is
absent from the perimeter dry-run configuration.
Now that you learned the difference between dariokart_perimeter’s enforced and dry-run
configurations, we are ready to test the perimeter dry-run configuration.
Even though the request succeeded, the audit logs tell a different story.
Figure 4-47 illustrates how to retrieve from Cloud Logging the past ten minutes (--freshness=600s) of
logs that are reported by VPC Service Controls (read 'protoPayload.metadata.@type:"type.googleapis.
com/google.cloud.audit.VpcServiceControlAuditMetadata"').
Figure 4-47. Reading VPC Service Control audit logs in the past ten minutes
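A sketch of the command behind Figure 4-47 (the project is an assumption):

    gcloud logging read \
        'protoPayload.metadata.@type:"type.googleapis.com/google.cloud.audit.VpcServiceControlAuditMetadata"' \
        --freshness=600s \
        --project=frontend-devs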
Notice the dryRun: true key-value pair (surrounded by the dark blue rectangle) to denote the audit log
was generated by a perimeter dry-run configuration.
As you can see in Figure 4-49, the perimeter’s newly enforced configuration matches exactly the spec
section in Figure 4-44.
Put differently, the dry-run configuration was promoted to enforced mode, and the new dry-run
configuration now matches the enforced one because it is inherited from it.
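The promotion itself is presumably a one-liner (assuming the default access policy has been set):

    gcloud access-context-manager perimeters dry-run enforce dariokart_perimeter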
In this exercise, you learned about the powerful dry-run feature available to GCP perimeter resources.
When you design your service perimeters, it is best practice to thoroughly test their configurations before
enforcing them on your production workloads.
A perimeter dry-run configuration is perfect for that. Depending on how mission-critical your workload
is, the dry-run phase may last a few days, a few weeks, or even longer if you want to perform cybersecurity
threat hunting in an effort to reduce your workload attack surface.
Cleaning Up
Before deleting vm1, let’s check whether compute.googleapis.com resolves to one of the four IP addresses in
the restricted VIP range (Figure 4-51).
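A quick check and the subsequent deletion might look like this sketch (the zone is an assumption):

    # From a shell on vm1: expect an address in 199.36.153.4/30
    nslookup compute.googleapis.com

    # From an admin session: delete the VM
    gcloud compute instances delete vm1 --zone=us-east1-b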
Final Considerations
There are a few “gotchas” you need to be aware of for the exam. They relate to scenarios that include the
combination of VPC Service Controls and VPCs connected in a number of ways. The main two common
scenarios are briefly explained in the upcoming sections.
Exam Questions
Question 4.1 (Perimeter with Shared VPC)
You are troubleshooting access denied errors between Compute Engine instances connected to a Shared
VPC and BigQuery datasets. The datasets reside in a project protected by a VPC Service Controls perimeter.
What should you do?
A.
Add the host project containing the Shared VPC to the service perimeter.
B.
Add the service project where the Compute Engine instances reside to the
service perimeter.
C.
Create a service perimeter between the service project where the Compute
Engine instances reside and the host project that contains the Shared VPC.
D.
Create a perimeter bridge between the service project where the Compute
Engine instances reside and the perimeter that contains the protected BigQuery
datasets.
Rationale
A is CORRECT because if you're using Shared VPC, you must always include
the host project in a service perimeter along with any service project that
consumes the Shared VPC.
B is not correct because the entire Shared VPC (host and all service projects)
must be treated as one service perimeter.
C is not correct because a service perimeter already exists.
D is not correct because there is no need to create a perimeter bridge.
Rationale
A is not correct because Cloud Run has nothing to do with VPC Service Controls.
B is not correct because the “native” perimeter mode does not exist.
C is not correct because the “enforced” mode achieves the opposite of what the
requirement is.
D is CORRECT because the “dry-run” mode is intended to test the perimeter
configuration and to monitor usage of services without preventing access to
resources. This is exactly what the requirement is.
CHAPTER 5
Configuring Load Balancing
Enterprise applications are architected, designed, and built for elasticity, performance, security, cost-effectiveness, and resilience. These are the five pillars of the well-architected framework.
■■Note The well-architected framework is an organized and curated collection of best practices, which are
intended to help you architect your workloads by taking full advantage of the cloud. Even though each public
cloud provider has created its own framework, all these frameworks share the aforementioned five pillars as key tenets. If you want to learn more, I recommend starting from the Google Cloud Architecture
framework: https://cloud.google.com/architecture/framework.
In this chapter, you will learn how to choose the most appropriate combination of Google Cloud load
balancing services that will help you architect, design, and build your workloads to be elastic, performant,
secure, cost-effective, and resilient.
By elasticity we mean the ability of a workload to scale horizontally (scale in or scale out) based on the
number of incoming requests it receives. If the workload receives low traffic, then the workload should
only consume a small set of compute, network, and storage resources. As the number of incoming requests increases, the workload should gradually acquire additional compute, network, and storage resources to serve the increasing load while maintaining its SLOs.
Performance is typically measured in terms of requests per second (RPS) and request latency, that is, how many requests per second the workload can process while meeting its SLOs and the average time (usually expressed in milliseconds, ms) the workload takes to serve a request. More metrics may be added, but for the scope of this chapter, we will focus mainly on RPS and latency as the SLIs for performance.
The focus areas of security are identity protection, network protection, data protection, and application
protection. Data protection includes encryption at rest, in transit, and in use.
Cost is another important pillar that has to be considered when designing the architecture for your
workloads. Remember, in the cloud you pay for what you use, but ultimately your workloads need a
network to operate. You will learn in this chapter that all the load balancing services available in Google
Cloud come in different “flavors” based on the network tier you choose, that is, premium tier or standard
tier—we introduced the two network service tiers in Chapter 2. The former is more expensive than the
latter, but has the key benefit of leveraging the highly reliable, highly optimized, low-latency Google Global
Backbone network instead of the Internet to connect the parts of your load balancing service. Some load balancing services are available only in the premium tier. For more information, visit https://cloud.google.com/network-tiers#tab1.
Last, resilience indicates the ability of the workload to recover from failures, whether they be in its
frontend, in its backends, or any other component of its architecture.
As of the writing of this book, there are nine types of load balancers, which are grouped in Figure 5-1
by scope, that is, global or regional—the former to denote a load balancer with components (backends) in
multiple regions and the latter with all components in a single region.
■■Note Do not confuse the scope of a load balancer (global vs. regional) with its “client exposure,” that is,
external vs. internal. An external load balancer denotes a load balancer that accepts Internet traffic, whereas an
internal load balancer only accepts RFC 1918 traffic.
In addition to the nine load balancer types, there is the network tier dimension: some, but not all, types come in both of the network tiers you learned about in Chapter 2.
To avoid confusion, from now on I will denote each load balancer type with a specific number as
indicated in Figure 5-1.
For the exam, you are required to know which load balancer type is best suited for a specific use case.
You will need to determine the most appropriate type of load balancer, as well as a combination of Google
Cloud services that meet the requirements in the question. These are typically requirements related to the
pillars of the well-architected framework, that is, operational efficiency, performance, resilience, security,
and cost-effectiveness.
I have included in Figure 5-1 Cloud Armor’s compatibility, which is a key service that goes hand in hand with global external load balancers. You will learn about Cloud Armor in detail at the end of this chapter. For the
time being, think of Cloud Armor—as the name suggests—as a GCP network service intended to provide an
extra layer of protection for your workloads.
When an HTTPS load balancer combines “forces” with two other network services, that is, Cloud
Armor and Identity-Aware Proxy, your workload is also more secure because it is protected from Distributed
Denial-of-Service (DDoS) and other layer 7 attacks, for example, the Open Web Application Security Project
(OWASP) top ten vulnerabilities.
You will learn first the common components of a load balancer, regardless of its type. Once you have
acquired a good understanding of the parts that make up a load balancer, you will learn the differentiating
capabilities each load balancer has.
The value of the backend service attribute --port-name (e.g., port1) must match a value in the --named-ports list of key-value pairs for the instance group (e.g., port1:443,port2:80,port3:8080). In this case, the backend service uses the value 443 as the port for communication with the instance group’s VMs over the https protocol. This is because port1 matches the key-value pair port1:443 in the instance group’s --named-ports list, thereby resolving port1 to https:443:
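A minimal sketch of this wiring (the resource names here are hypothetical):

# Map named ports on the instance group (port1 resolves to 443).
gcloud compute instance-groups set-named-ports ig1 \
  --named-ports=port1:443,port2:80,port3:8080 \
  --zone=us-central1-c

# The backend service subscribes to the named port port1 over HTTPS.
gcloud compute backend-services create bs-https \
  --global --protocol=HTTPS --port-name=port1 \
  --health-checks=hc1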
2. Create two VMs:
3. Create the NEG:
--default-port=80
Created [https://www.googleapis.com/compute/beta/projects/project/zones/us-central1-c/networkEndpointGroups/neg1].
NAME  LOCATION       TYPE            ENDPOINT_TYPE   DEFAULT_PORT  ENDPOINTS
neg1  us-central1-c  LOAD_BALANCING  GCE_VM_IP_PORT  80            0
4. Update the NEG with endpoints:
5. Create a health check:
6. Create the backend service:
7. Add the NEG neg1 as a backend to the backend service:
8. Create a URL map using the backend service backendservice1:
9. Create the target HTTP proxy using the URL map urlmap1:
10. Create the forwarding rule, that is, the key component of the load balancer’s frontend configuration, and attach it to the newly created target HTTP proxy httpproxy1. A consolidated sketch of steps 4 through 10 follows this list.
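The individual commands appear in the book’s figures; this is only a hedged approximation (the VM names, rate target, and forwarding rule name are assumptions):

# 4. Add the two VMs as endpoints of the NEG.
gcloud compute network-endpoint-groups update neg1 \
  --zone=us-central1-c \
  --add-endpoint=instance=vm-a,port=80 \
  --add-endpoint=instance=vm-b,port=80

# 5. Create an HTTP health check.
gcloud compute health-checks create http hc1 --port=80

# 6. Create the backend service.
gcloud compute backend-services create backendservice1 \
  --global --protocol=HTTP --health-checks=hc1

# 7. Add the NEG as a backend.
gcloud compute backend-services add-backend backendservice1 \
  --global \
  --network-endpoint-group=neg1 \
  --network-endpoint-group-zone=us-central1-c \
  --balancing-mode=RATE --max-rate-per-endpoint=100

# 8. Create the URL map.
gcloud compute url-maps create urlmap1 --default-service=backendservice1

# 9. Create the target HTTP proxy.
gcloud compute target-http-proxies create httpproxy1 --url-map=urlmap1

# 10. Create the global forwarding rule (the load balancer frontend).
gcloud compute forwarding-rules create fr1 \
  --global --target-http-proxy=httpproxy1 --ports=80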
At the time of writing this book, the health check protocol can be one of the following options: grpc, http, https, http2, ssl, tcp. Legacy health checks only support http and https.
The scope indicates whether the backends are located in a single region or in multiple regions.
Keep in mind that health checks must be compatible with the type of load balancer and the backend
types. For example, some load balancers only support legacy health checks—that is, they only support http
and https protocols—while others support grpc, http, https, http2, ssl, and tcp.
In addition to compatibility, for the health check to work effectively, an ingress allow firewall rule that
allows traffic to reach your load balancer backends must be created in the VPC where the backends live.
Figure 5-4 summarizes the minimum required firewall rules for each load balancer.
The two TCP/UDP network load balancers (denoted by 8 and 9 in Figure 5-4) are the only nonproxy-based load balancers. Put differently, these load balancers preserve the original client IP address because the connection is not terminated at the load balancer; packets flow directly to their destination in one of the backends. Due to this behavior, the source ranges for their compatible health checks are different from the ones applicable to proxy-based load balancers (1–7 in Figure 5-4).
Also, regional load balancers that use the open source Envoy proxy (load balancers 5, 6, 7 in Figure 5-4)
require an additional ingress allow firewall rule that accepts traffic from a specific subnet referred to as the
proxy-only subnet.
These load balancers terminate incoming connections at the load balancer frontend. Traffic is then sent
to the backends from IP addresses located on the proxy-only subnet.
As a result, in the region where your Envoy-based load balancer operates, you must first create the
proxy-only subnet.
A key requirement is to set the --purpose flag to the value REGIONAL_MANAGED_PROXY, as shown in the
following code snippet, where we assume the VPC network lb-network already exists:
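A sketch along those lines (the subnet name and region are assumptions; the range mirrors the value used later in this section):

gcloud compute networks subnets create proxy-only-subnet \
  --purpose=REGIONAL_MANAGED_PROXY \
  --role=ACTIVE \
  --region=us-central1 \
  --network=lb-network \
  --range=10.129.0.0/23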
■■Exam tip There can only be one active proxy-only subnet per region at a given time. All Envoy-based load balancers in that region share the same proxy-only subnet, which is why it is recommended to use a CIDR block with a mask of /23 for the proxy-only subnet.
Upon creation of the proxy-only subnet, you must add another firewall rule to the load balancer VPC—lb-network in our preceding example. This firewall rule allows traffic from the proxy-only subnet to flow into the subnet where the backends live—the two subnets are different, even though they both reside in the load balancer VPC. This means adding one rule that allows TCP ports 80, 443, and 8080 from the proxy-only subnet’s range, that is, 10.129.0.0/23.
The firewall rule targets all VMs that are associated with the network tag allow-health-checks, along the lines of the following sketch.
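This is a hedged sketch (the rule name is hypothetical):

gcloud compute firewall-rules create allow-proxy-only-traffic \
  --network=lb-network \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:80,tcp:443,tcp:8080 \
  --source-ranges=10.129.0.0/23 \
  --target-tags=allow-health-checks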
A regional (external HTTP(S)) load balancer is intended to serve content from a specific region, whereas a global (external HTTP(S)) load balancer is intended to serve content from multiple regions around the world. A typical use case for a regional (external HTTP(S)) load balancer is compliance with data sovereignty laws requiring backends to operate in a specific geolocation.
Global external HTTP(S) load balancers, by contrast, leverage a specific Google infrastructure, which is distributed globally in the Google Edge Network and operates using Google’s global backbone network and Google’s control plane. This infrastructure is called the Google Front End (GFE) and uses the CIDR blocks 130.211.0.0/22 and 35.191.0.0/16.
■■Exam tip There are a few IP ranges you need to remember for the exam. The GFE ranges, that is, 130.211.0.0/22 and 35.191.0.0/16, are definitely on that list.
Modes of Operation
An external HTTP(S) load balancer comes in three “flavors”:
• Global external HTTP(S) load balancer: This is the load balancer type 1 in
Figure 5-1 and is implemented as a managed service on Google Front Ends (GFEs).
It uses the open source Envoy proxy to support advanced traffic management
capabilities such as traffic mirroring, weight-based traffic splitting, request/
response-based header transformations, and more.
• Global external HTTP(S) load balancer (classic): This is the load balancer type
2 in Figure 5-1 and is global in premium tier, but can be configured to be regional in
standard tier. This load balancer is also implemented on Google Front Ends (GFEs).
• Regional external HTTP(S) load balancer: This is the load balancer type 5 in
Figure 5-1 and is implemented as a managed service on the open source Envoy
proxy. It includes advanced traffic management capabilities such as traffic mirroring,
weight-based traffic splitting, request/response-based header transformations,
and more.
Architecture
The architecture of all external HTTP(S) load balancers (global and regional) shows common components,
such as forwarding rules, target proxies, URL maps, backend services, and backends.
Yet, there are a few differences among the three types (1, 2, 5 in Figure 5-1) we just described.
Instead of listing each common component of the architecture, and explaining what its intended
purpose is, and how it differs across the three types, I like to visualize them in a picture and go from there.
Figure 5-5 illustrates the architecture of the two global external HTTP(S) load balancers (“regular” and
classic, respectively, types 1 and 2), whereas the bottom part shows the regional external HTTP(S) load
balancer (type 5) architecture. Let’s start from the client, located in the very left part of the figure.
Forwarding Rule
An external forwarding rule specifies an external IP address, port, and target HTTP(S) proxy. Clients use the
IP address and port to connect to the load balancer from the Internet.
■■Exam tip Type 1, 2, 5 load balancers only support HTTP and HTTPS traffic on TCP ports 80 (or 8080) and
TCP port 443, respectively. If your clients require access to your workload backend services using different TCP
(or UDP) ports, consider using load balancer types 3, 4, 6, 7, 8, 9 instead (Figure 5-1). More information will be
provided in the upcoming sections.
Global forwarding rules support external IP addresses in IPv4 and IPv6 format, whereas regional
forwarding rules only support external IP addresses in IPv4 format.
Figure 5-6. Multiple SSL certificates feature of global HTTP(S) load balancer
SSL Policies
From now on, the term SSL refers to both the SSL (Secure Sockets Layer) and TLS (Transport Layer Security)
protocols.
SSL policies define the set of TLS features that external HTTP(S) load balancers use when negotiating
SSL with clients.
For example, you can use an SSL policy to configure the minimum TLS version and features that every
client should comply with in order to send traffic to your external HTTP(S) load balancer.
■■Exam tip SSL policies affect only connections between clients and the target HTTP(S) proxy (Connection
1 in Figure 5-5). SSL policies do not affect the connections between the target HTTP(S) proxy and the backends
(Connection 2).
To define an SSL policy, you specify a minimum TLS version and a profile. The profile selects a set of
SSL features to enable in the load balancer.
Three preconfigured Google-managed profiles let you specify the level of compatibility appropriate for
your application. The three preconfigured profiles are as follows:
• COMPATIBLE: Allows the broadest set of clients, including clients that support only
out-of-date SSL features, to negotiate SSL with the load balancer
• MODERN: Supports a wide set of SSL features, allowing modern clients to
negotiate SSL
• RESTRICTED: Supports a reduced set of SSL features, intended to meet stricter
compliance requirements
A fourth CUSTOM profile lets you select SSL features individually.
The SSL policy also specifies the minimum version of the TLS protocol that clients can use to establish a
connection. A profile can also restrict the versions of TLS that the load balancer can negotiate. For example,
ciphers enabled in the RESTRICTED profile are only supported by TLS 1.2. Choosing the RESTRICTED
profile effectively requires clients to use TLS 1.2 regardless of the chosen minimum TLS version.
If you do not choose one of the three preconfigured profiles or create a custom SSL policy, your load
balancer uses the default SSL policy. The default SSL policy is equivalent to an SSL policy that uses the
COMPATIBLE profile with a minimum TLS version of TLS 1.0.
Use the gcloud compute target-https-proxies create or update commands to attach an SSL policy
(--ssl-policy) to your target HTTP(S) proxy.
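For example (the policy and proxy names here are hypothetical), you might create a strict policy and attach it as follows:

gcloud compute ssl-policies create my-ssl-policy \
  --profile=RESTRICTED --min-tls-version=1.2

gcloud compute target-https-proxies update my-https-proxy \
  --ssl-policy=my-ssl-policy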
■■Exam tip You can attach an SSL policy to more than one target HTTP(S) proxy. However, you cannot
configure more than one SSL policy for a particular target proxy. Any changes made to SSL policies don’t alter
or interrupt existing load balancer connections.
URL Map
The target HTTP(S) proxy uses a URL map to decide where to route the new request (Connection 2 in Figure 5-5)—remember, the first connection, which carried the original client IP, has been terminated.
Since the HTTP(S) load balancer operates at layer 7, it is fully capable of determining where to route the request based on HTTP attributes (i.e., the request path, cookies, or headers). When I say “where to route the request,” I really mean to which backend service or backend bucket the target HTTP(S) proxy should route the request.
In fact, the target HTTP(S) proxy forwards client requests to specific backend services or backend
buckets. The URL map can also specify additional actions, such as sending redirects to clients.
Backend Service
A backend service distributes requests to healthy backends. Unlike regional global external HTTP(S)
load balancers, the two global external HTTP(S) load balancers also support backend buckets. The key
components of a backend service are
• Scope defined by one of the two flags:
• Global (--global) to indicate the backend service operates with backends
located in multiple regions
• Regional (--region=REGION) to indicate the backend service operates with
backends located in the specified region
• Load balancing scheme specifies the load balancer type. Set the scheme to EXTERNAL_MANAGED for type 1 and type 5 load balancers because they both use the open source Envoy proxy. Set the scheme to EXTERNAL for type 2, that is, the global external HTTP(S) load balancer (classic).
• Protocol is the protocol the backend service uses to communicate to the backends
(MIGs, various types of NEGs, buckets). The only available options are HTTP, HTTPS,
and HTTP/2. This selection is “unforgiving” in that the load balancer does not fall
back to one of the other protocols if it is unable to negotiate a connection to the
backend with the specified protocol. HTTP/2 requires TLS.
• Port name is only applicable to backends whose type is instance groups. Each
instance group backend exports a list including one or more named ports (see the
gcloud compute instance-groups get-named-ports command), which map
a user-configurable name to a port number. The backend service’s port name
subscribes to exactly one named port on each instance group. The resolved port
number is used by the backend service to send traffic to the backend.
• Health check is the resource used by a backend service to ensure incoming requests
are sent to healthy backends. For global external HTTP(S) load balancer health
check probes, you must create in the load balancer VPC an ingress allow firewall
rule that allows traffic to reach your backends from the GFE CIDR blocks, that is,
130.211.0.0/22 and 35.191.0.0/16. For regional external HTTP(S) load balancers,
you must create in the load balancer VPC an ingress allow firewall rule that allows
traffic to reach your backends from the CIDR block of the proxy-only subnet.
Although it is not required, it is a best practice to use a health check whose protocol
matches the protocol of the backend service.
• Timeout is the amount of time (in seconds) that the backend service waits for a
backend to return a full response to a request. If the backend does not reply at all, the
load balancer returns a 502 Bad Gateway error to the client.
• Enable Cloud CDN (Content Delivery Network) enables Cloud CDN for the backend service. Cloud CDN caches HTTP responses at the edge of Google’s network and is disabled by default. Use --enable-cdn to enable it and --no-enable-cdn to disable it.
• Identity-Aware Proxy (IAP): If enabled, you can provide values for oauth2-client-id and oauth2-client-secret. For example, --iap=enabled,oauth2-client-id=foo,oauth2-client-secret=bar turns IAP on, and --iap=disabled turns it off.
• Connection draining timeout is used during removal of VMs from instance groups.
This guarantees that for the specified time all existing connections to a VM will
remain untouched, but no new connections will be accepted. This defaults to zero
seconds, that is, connection draining is disabled.
• Session affinity must be one of
• CLIENT_IP, that is, routes requests to instances based on the hash of the client’s
IP address.
• GENERATED_COOKIE, that is, routes requests to backend VMs or endpoints in
a NEG based on the contents of the cookie set by the load balancer.
• HEADER_FIELD, that is, routes requests to backend VMs or endpoints in a NEG based on the value of the HTTP header named in the --custom-request-header flag.
• HTTP_COOKIE, that is, routes requests to backend VMs or endpoints in a NEG
based on an HTTP cookie named in the HTTP_COOKIE flag (with the
optional --affinity-cookie-ttl flag). If the client has not provided the cookie,
the target HTTP(S) proxy generates the cookie and returns it to the client in a
Set-Cookie header.
• NONE, that is, session affinity is disabled.
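A hedged sketch pulling several of these flags together (all names and values are illustrative only):

gcloud compute backend-services create bs-web \
  --global \
  --load-balancing-scheme=EXTERNAL_MANAGED \
  --protocol=HTTPS \
  --port-name=port1 \
  --health-checks=hc1 \
  --timeout=30s \
  --enable-cdn \
  --connection-draining-timeout=60s \
  --session-affinity=CLIENT_IP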
Backends
Backends are the ultimate destination of your load balancer incoming traffic.
Upon receiving a packet, they perform computation on its payload, and they send a response back to
the client.
As shown in Figure 5-5, the type of backend depends on the scope of the external HTTP(S) load
balancer, as well as its network tier.
In gcloud, you can add a backend to a backend service using the gcloud compute backend-services
add-backend command.
When you add an instance group or a NEG to a backend service, you specify a balancing mode, which
defines a method measuring backend load and a target capacity. External HTTP(S) load balancing supports
two balancing modes:
• RATE, for instance groups or NEGs, is the target maximum number of requests
(queries) per second (RPS, QPS). The target maximum RPS/QPS can be exceeded if
all backends are at or above capacity.
• UTILIZATION is the backend utilization of VMs in an instance group.
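For instance (hypothetical names; a RATE target for a managed instance group backend):

gcloud compute backend-services add-backend bs-web \
  --global \
  --instance-group=mig1 \
  --instance-group-zone=us-central1-a \
  --balancing-mode=RATE \
  --max-rate-per-instance=100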
How traffic is distributed among backends depends on the mode of the load balancer.
There are several ways to create the networking infrastructure required for the load balancer to operate
in container-native mode. You will learn two ways to accomplish this.
The first way is called container-native load balancing through ingress, and its main advantage is that all
you have to code is the GKE cluster that will host your containers, a deployment for your workload, a service
to mediate access to the pods hosting your containers, and a Kubernetes ingress resource to allow requests
to be properly distributed among your pods. The deployment, the service, and the Kubernetes ingress can all
be coded declaratively using YAML files, as we’ll see shortly.
Upon creating these four components, Google Cloud does the job for you by creating the global external
HTTP(S) load balancer (classic), along with all the components you learned so far. These are the backend
service, the NEG, the backends, the health checks, the firewall rules, the target HTTP(S) proxy, and the
forwarding rule. Not bad, right?
In the second way, we let Google Cloud create only the NEG for you. In addition to the GKE cluster, your workload deployment, and the Kubernetes service, you will have to create all the load balancing infrastructure described earlier yourself. This approach is called container-native load balancing through standalone zonal NEGs.
While this approach is more Infrastructure-as-a-Service oriented, it will help you consolidate the knowledge you need in order to master the configuration of a global external HTTP(S) load balancer that operates in container-native load balancing mode. Let’s get started!
Figures 5-8, 5-9, and 5-10 display the YAML manifests for the workload deployment, the service, and the
ingress Kubernetes resources, respectively.
Next, we use the manifests to create a deployment (Figure 5-11), a service (Figure 5-12), and an ingress
(Figure 5-13) Kubernetes resource.
The creation of a Kubernetes ingress had the effect of triggering a number of actions behind the scenes.
These included the creation of the following resources in the project frontend-devs:
1. A global external HTTP(S) load balancer (classic).
2. A target HTTP(S) proxy.
3. A backend service in each zone—in our case, since the cluster is a single zone, we have only a backend service in us-east1-d.
4. A global health check attached to the backend service.
5. A NEG in us-east1-d. The endpoints in the NEG and the endpoints of the Service are kept in sync.
Figure 5-14 displays the description of the newly created Kubernetes ingress resource.
■■Note Since we deployed the GKE cluster in a shared VPC network, the kubectl command notified us that
a firewall rule in the shared VPC your-app-shared-vpc is required to allow ingress traffic from the GFE CIDR
blocks to the GKE cluster worker nodes using the deployment port and protocol tcp:9376.
User [email protected] doesn’t have the permission to add the required firewall rule, but
[email protected] does, as you can see in Figure 5-15.
Figure 5-15. Creating a firewall rule to allow health checks to access backends
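The exact command appears in the figure; a hedged approximation (the host project ID and the node network tag are hypothetical, since the tag is generated by GKE at cluster creation):

gcloud compute firewall-rules create allow-gfe-to-gke-nodes \
  --project=your-host-project \
  --network=your-app-shared-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:9376 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16 \
  --target-tags=gke-node-network-tag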
In our example, when we tested the load balancer, we received a 502 status code “Bad Gateway”
(Figure 5-16).
There are a number of reasons why the test returned a 502 error. For more information on how to
troubleshoot 502 errors, see https://cloud.google.com/load-balancing/docs/https/troubleshooting-
ext-https-lbs.
This exercise was intended to show you the automatic creation of the load balancer and all its
components from the declarative configuration of the ingress manifest and the annotation in the Kubernetes
service manifest.
If you are curious to learn more, I pulled the logs and I discovered that the target HTTP(S) proxy was
unable to pick the backends (Figure 5-17). Can you discover the root cause?
Last, let’s clean up the resources we just created in order to avoid incurring unexpected charges
(Figure 5-18). Notice how the --zone flag is provided because the nodes of the cluster are VMs, and VMs are
zonal resources.
It’s worth mentioning that when you delete the cluster, Google Cloud automatically deletes the load balancer and all its components except the NEG (this is a known issue) and the firewall rule we had to create in the shared VPC (because we deployed the cluster in a shared VPC). You will have to delete these two resources manually.
Next, let’s create another GKE VPC-native, zonal cluster, this time in the subnet-backend, which is also
hosted in your-app-shared-vpc and is located in the region us-central1 (Figure 5-20).
Unlike before, in this exercise we just create two Kubernetes resources (instead of three), that is, a
deployment and a service, which targets the deployment. There is no ingress Kubernetes resource created
this time.
The manifest YAML files for the deployment and the service are displayed in Figures 5-21 and 5-22,
respectively.
Notice the difference between the two in the metadata.annotation section of the manifest (Figure 5-22,
lines 5–7).
During the creation of the service, the Kubernetes-GKE system integrator reads the annotation section,
and—this time—it doesn’t tell Google Cloud to create the load balancer and all its components anymore
(line 7 in Figure 5-22 is commented).
Let’s see what happens after creating the deployment and the service (Figure 5-23).
As you can see from line 6 of Figure 5-22, we are telling the Kubernetes-GKE system integrator to create a new NEG named app-service-80-neg that listens on TCP port 80 and distributes incoming HTTP requests to the pods hosting our app deployment. With container-native load balancing, the pods hosting our deployment are the backends serving incoming HTTP traffic.
To validate what we just described, let’s list the NEGs in our project (Figure 5-24).
Indeed, here it is! Our newly created NEG uses a GCE_VM_IP_PORT endpoint type to indicate that incoming HTTP(S) requests resolve to either the primary internal IP address of a Google Cloud VM’s NIC (one of the GKE worker nodes, i.e., 192.168.1.0/26) or an alias IP address on a NIC, for example, the pod IP addresses in our VPC-native cluster neg-demo-cluster, that is, 192.168.15.0/24.
See Figure 3-11 as a reminder of the list of usable subnets for containers in our shared VPC setup in
Chapter 3.
So, now that you have your NEG ready, what’s next? With container-native load balancing through
standalone zonal NEGs, all you have automatically created is your NEG, nothing else. To put this NEG to
work—literally, so the endpoints (pods) in the NEG can start serving HTTP(S) requests—you are responsible
for creating all required load balancing infrastructure to use the NEG.
I’ll quickly walk you through this process to familiarize you with each component of a global external HTTP(S) load balancer as well as the related gcloud commands.
First, we need an ingress firewall rule to allow health check probes and incoming HTTP(S) traffic to
reach the NEG. Incoming traffic originates at the start point of Connection 2 in Figure 5-5, which in our case
is the GFE because our external HTTP(S) load balancer is global. The target of the ingress firewall rule is the
GKE worker nodes, whose network tags are assigned at creation time by GKE, as shown in Figure 5-25.
Since our network setup uses a shared VPC, the principal [email protected] is not responsible for
managing the network.
Let’s set the CLUSTER_NAME and NETWORK_TAGS environment variables for a principal who has
permissions to create the firewall rule, for example, [email protected] (Figure 5-26).
Last, we need to create the frontend components, that is, the URL map, which will be attached to a new
target HTTP(S) proxy, which will be attached to a global forwarding rule.
Figure 5-29 describes the creation of the aforementioned frontend components of the HTTP(S) load
balancer.
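A hedged sketch of those commands (the frontend resource names are hypothetical; the backend service name matches the one referenced next):

gcloud compute url-maps create neg-demo-urlmap \
  --default-service=neg-demo-cluster-lb-backend

gcloud compute target-http-proxies create neg-demo-proxy \
  --url-map=neg-demo-urlmap

gcloud compute forwarding-rules create neg-demo-fr \
  --global --target-http-proxy=neg-demo-proxy --ports=80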
Notice how the URL map uses the newly created backend service neg-demo-cluster-lb-backend to
bridge the load balancer frontend and backend resources.
This is also visually represented in the URL map boxes in Figure 5-5.
As you can see, container-native load balancing through standalone zonal NEGs requires extra work on
your end. However, you get more control and flexibility than using container-native load balancing through
ingress because you create your own load balancing infrastructure and you configure it to your liking.
Last, remember to delete the load balancer components and the cluster to avoid incurring unexpected
charges (Figure 5-30).
Figure 5-30. Deleting load balancer infrastructure and the GKE cluster
■■Exam tip There are multiple ways to set up an HTTPS load balancer in a shared VPC. One way is to use
a service project with ID backend-devs-7736 to host all the load balancer resources. Another way is to
distribute the frontend resources (i.e., the reserved external IP address, the forwarding rule, the target proxy, the
SSL certificate(s), the URL map) in a project (it can be the host or one of the service projects) and the backend
resources (i.e., the backend service, the health check, the backend) in another service project. This latter
approach is also called cross-project backend service referencing.
Figure 5-31 gives you an idea of what our setup will look like upon completion.
Notice the setup will be using the HTTPS protocol, which requires the creation of an SSL certificate in
use by the target HTTPS proxy.
For global external HTTPS load balancers, the SSL certificate is used for domain validation, and it can
be a self-managed or a Google-managed SSL certificate.
■■Exam tip Regional external HTTPS load balancers can only use self-managed SSL certificates.
First and foremost, let’s refresh our memory on who can do what in the service project backend-devs.
This is done by viewing the project’s IAM allow policy as illustrated in Figure 5-32.
The two principals [email protected] and [email protected] hold the owner and the
editor roles, respectively.
As per Google recommendation, we will create a Google-managed SSL certificate (Figure 5-33).
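A hedged approximation of that command (the certificate name and the second domain are assumptions):

gcloud compute ssl-certificates create dariokart-cert \
  --global \
  --domains=dariokart.com,www.dariokart.com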
Notice the --domains flag, which requires a comma-delimited list of domains. Google Cloud will validate each domain in the list. For the load balancer to work, we will need to wait until both the managed.status and the managed.domainStatus properties are ACTIVE. This process is lengthy and may take one or more hours, depending on how quickly your domain provider can provide evidence that you own the domain. In my case, the validation took about an hour because I bought my domain dariokart.com from Google Workspace. Nevertheless, while we are waiting for the SSL certificate to become active, we can move forward with the remaining steps.
Next, we need to create an instance template, which will be used by the managed instance group to
create new instances (VMs) as incoming traffic increases above the thresholds we will set up in the backend
service.
Since our global external HTTPS load balancer is intended to serve HTTPS requests, we need to make
sure our instances come with an HTTP server preinstalled. As a result, our instance template will make sure
Apache2 will be installed on the VM upon startup.
Also, the startup script is configured to show the hostname of the VM, which will be used by the
backend service to serve the incoming HTTPS request, as illustrated in Figure 5-34.
■■Note The VMs will be using a network tag allow-health-check to be allowed to be health-checked
from the CIDRs of Google Front End (GFE). We will use this network tag when we configure the load balancer
frontend.
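A minimal sketch of such a template (the template name, image, and script details are assumptions consistent with the description above):

gcloud compute instance-templates create lb-backend-template \
  --machine-type=n1-standard-1 \
  --region=us-central1 \
  --network=projects/your-host-project/global/networks/your-app-shared-vpc \
  --subnet=projects/your-host-project/regions/us-central1/subnetworks/subnet-backend \
  --tags=allow-health-check \
  --image-family=debian-11 --image-project=debian-cloud \
  --metadata=startup-script='#!/bin/bash
apt-get update
apt-get install -y apache2
echo "Page served from: $(hostname)" > /var/www/html/index.html
systemctl restart apache2'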
With the instance template ready, we can create our managed instance group (MIG). The MIG will start
with two n1-standard-1 size VMs in us-central1-a, as shown in Figure 5-35.
■■Note With a shared VPC setup, you need to make sure the zone (us-central1-a) you are using to host
your MIG’s VMs is part of the region (us-central1) where the instance template operates, which in turn needs
to match the region of the subnet where your backend service will run (us-central1).
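Something along these lines (the MIG name is inferred from the VM names shown later in the exercise):

gcloud compute instance-groups managed create lb-backend-example \
  --zone=us-central1-a \
  --size=2 \
  --template=lb-backend-template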
Once the MIG is ready, we need to tell it which named port its VMs will be listening on in order to serve incoming HTTPS traffic. Guess what named port we will use for HTTPS traffic? HTTPS, right ☺? No, it’s actually going to be HTTP!
This is because our backends will be running in Google Cloud, and when backends run in Google Cloud,
traffic from the Google Front End (GFE) destined to the backends is automatically encrypted by Google. As a
result, there is no need to use an HTTPS named port.
Figure 5-36 illustrates how to set the list of key-value pairs for the desired MIG’s protocol and port. In
our case, the list contains only one key-value pair, that is, http:80.
■■Exam tip An instance group can use multiple named ports, provided each named port uses a different
name. As a result, the value of the --named-ports flag is a comma-delimited list of named-port:port
key:value pairs.
With our managed instance group properly created and configured, we need to make sure health
probes can access the two backend VMs using the named port http (see previous command in Figure 5-36),
which is mapped to the port 80 in the MIG.
As a result, the ingress firewall rule (IST type, i.e., Ingress Source and Target) in Figure 5-37 must be
created in our shared VPC—as you learned in Chapter 3, firewall rules are defined on a per–VPC network
basis—to allow incoming HTTP traffic (tcp:80) originating from the GFE CIDR blocks (130.211.0.0/22,
35.191.0.0/16) to reach the two backend VMs, which are tagged with the network tag allow-health-check.
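A hedged sketch of that rule (the rule name is hypothetical; it is created in the Shared VPC host project):

gcloud compute firewall-rules create fw-allow-health-check \
  --project=your-host-project \
  --network=your-app-shared-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:80 \
  --source-ranges=130.211.0.0/22,35.191.0.0/16 \
  --target-tags=allow-health-check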
As of the writing of this book, global external HTTP(S) load balancers support IPv4 and IPv6 IP
addresses, whereas regional external HTTP(S) load balancers only support IPv4 IP addresses.
To make sure our backend VMs are continuously checked for health, we need to create a health check.
■■Exam tip As previously mentioned, just because you are using HTTPS—like in this exercise—on your
forwarding rule (Connection 1 in Figure 5-5), you don’t have to use HTTPS in your backend service and
backends (Connection 2). Instead, you can use HTTP for your backend service and backends. This is because
traffic between Google Front Ends (GFEs) and your backends is automatically encrypted by Google for backends
that reside in a Google Cloud VPC network.
In Figure 5-39, we will create a health check to monitor the health status of our backend VMs. Then, we
will use the newly created health check to create a backend service. Finally, we will add our newly created
managed instance group (the MIG we created in Figure 5-35) to the backend service.
With the backend service ready, we can create the remaining frontend resources, that is, the URL map,
which is required along with our SSL certificate to create the target HTTPS proxy, and the globally scoped
forwarding rule.
Figure 5-40 displays the creation of the aforementioned frontend GCP resources.
Figure 5-40. URL map, target HTTPS proxy, and forwarding rule creation
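Approximately (the resource names are assumptions; the reserved global address 35.244.158.100 is the one shown in Figure 5-38, and dariokart-cert is the certificate sketched earlier):

gcloud compute url-maps create web-map-https \
  --default-service=web-backend-service

gcloud compute target-https-proxies create https-lb-proxy \
  --url-map=web-map-https \
  --ssl-certificates=dariokart-cert

gcloud compute forwarding-rules create https-content-rule \
  --global \
  --address=35.244.158.100 \
  --target-https-proxy=https-lb-proxy \
  --ports=443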
The last step in this setup is to add a DNS A record, which is required to point our domain dariokart.com
to our load balancer.
Our load balancer global external IP address is 35.244.158.100 (Figure 5-38).
Since I bought my domain from Google Workspace, I will use the Google Workspace Admin Console
and Google Domains to create two DNS A records. Figure 5-41 shows the newly created DNS A records.
Figure 5-41. Adding DNS A records to resolve domain names to the VIP
When you save the changes, you get a notification that it may take some time to propagate the DNS
changes over the Internet, but in most cases, the propagation happens within an hour or less, depending on
how fast your domain registration service operates.
Now it’s a matter of waiting for the SSL certificate to become active so that an SSL handshake can be established between the clients and the target HTTPS proxy.
After about an hour, the SSL certificate became active, as you can see in Figure 5-42.
Figure 5-42. The SSL certificate’s managed domain status becomes active
Figure 5-43 confirms all tests “hitting” the domain with HTTPS were successful!
Figure 5-43. Testing the HTTPS load balancer from Cloud Shell
More specifically, the first HTTPS request was served by the VM lb-backend-example-t0vc, and the
second request was served by the VM lb-backend-example-1dbf.
To further validate that the HTTPS load balancer can serve traffic from the Internet, I tried https://
dariokart.com from my phone. The result was also successful (Figure 5-44).
Figure 5-44. Testing the HTTPS load balancer from a mobile device
Now it’s time to clean up—each resource we just created is billable—not to mention that the HTTPS
load balancer is exposed to the public Internet.
Figures 5-45 and 5-46 show you how to delete the load balancer’s resources.
After saving the changes, the two A records are gone (Figure 5-49).
What if your workload requires load balancing for traffic other than HTTP(S)?
To answer this question, you need to find out where the clients of your workload are located and
whether you want the load balancer to remember the client IP address.
If your workload requires access from the Internet (i.e., if your load balancer forwarding rule is external), and its compute backends need to be distributed in more than one region—for example, because of reliability requirements—then Google Cloud offers the global external SSL proxy load balancer (type 3) and the global external TCP proxy load balancer (type 4).
Figure 5-50 illustrates the architecture for these types of load balancer.
As noted in the Global Target (SSL or TCP) Proxy column in Figure 5-50, these global load balancers
are offered in the premium or in the standard tier. The difference is that with the premium tier, latency
is significantly reduced (compared to the standard tier) because ingress traffic enters the Google Global
Backbone from the closest Point of Presence (PoP) to the client—PoPs are geographically distributed around
the globe. In contrast, with the standard tier, ingress traffic stays on the Internet longer and enters the Google Global Backbone in the GCP region where the global forwarding rule lives.
Moreover, because these load balancers use a target proxy as an intermediary, they “break” the client connection into two connections. As a result, the second connection has no knowledge of the client IP address where the request originated.
■■Exam tip Even though SSL proxy and TCP proxy load balancers don’t preserve the client IP address by default, there are workarounds to let them “remember” the client IP address. One of these workarounds is configuring the target (SSL or TCP) proxy to prepend a PROXY protocol version 1 header to retain the original connection information, as illustrated in the following:

gcloud compute target-ssl-proxies update my-ssl-lb-target-proxy \
  --proxy-header=[NONE | PROXY_V1]
a. Instance groups
b. Zonal standalone NEGs
c. Zonal hybrid NEGs
■■Exam tip External TCP proxy load balancers do not support end-to-end (client to backend) encryption.
However, connection 2 (proxy to backend) does support SSL if needed.
A key differentiating feature of a Google Cloud network load balancer is that it preserves
the source client IP address by default.
This is because there are no proxies to terminate the client connection and start a new connection to
direct incoming traffic to the backends.
As you can see in Figure 5-51, there is no target proxy between the forwarding rule and the backend
service, and there is only one connection from the clients to the backends. For this reason, a network load
balancer is also referred to as a pass-through load balancer to indicate that the source client IP address is
passed to the backends intact.
A network load balancer comes in two flavors based on whether the IP address of the forwarding rule
is external (type 9) or internal (type 8). The former allows client IP addresses to be denoted in IPv6 format,
but this feature requires the premium network tier. The latter doesn’t allow clients in IPv6 format because
the IP space is RFC 1918, and there is no concern about IP space saturation. For this reason, clients for type 8
network load balancers are denoted with the Google Cloud Compute Engine icon (VM) to indicate that they
are either VMs in Google Cloud or VMs on-premises.
■■Exam tip Network load balancers are regionally scoped to indicate that backends can only be located in a
single region.
The way session affinity is handled is another differentiating factor that makes a network load balancer
unique when compared to others in Google Cloud.
While HTTP(S)-based load balancers leverage HTTP-specific constructs (e.g., cookies, headers, etc.), a
network load balancer—by virtue of being a layer 4 load balancer—can leverage any combination of source,
destination, port, and protocols to determine session affinity semantics.
The following are some of the common use cases when a network load balancer is a good fit:
• Your workload requires load balancing of non-TCP traffic (e.g., UDP, ICMP, ESP) or of a TCP port not supported by other load balancers.
• It is acceptable to have SSL traffic decrypted by your backends instead of by the load
balancer. The network load balancer cannot perform this task. When the backends
decrypt SSL traffic, there is a greater CPU burden on the backends.
• It is acceptable to have SSL traffic decrypted by your backends using self-managed
certificates. Google-managed SSL certificates are only available for HTTP(S) load
balancers (types 1, 2, 5, and 6) and external SSL proxy load balancers (type 3).
• Your workload is required to forward the original client packets unproxied.
• Your workload is required to migrate an existing pass-through-based workload
without changes.
• Your workload requires advanced network DDoS protection, which necessitates the
use of Google Cloud Armor.
Examples
A simple example of a type 8 load balancer is a shopping cart application as illustrated in Figure 5-52. The load balancer operates in us-central1, but its internal regional forwarding rule is configured with the --allow-global-access flag set to true.
As a result, clients in a different region (scenario “a”) of the same VPC can use the shopping cart
application. Also, clients in a VPC peered to the ILB’s VPC can use the application (scenarios “b” and “d”).
Even on-premises clients can use the shopping cart application (scenario “c”).
Figure 5-53 illustrates another example of a type 8 load balancer, which is used to distribute TCP/UDP
traffic between separate (RFC 1918) tiers of a three-tier web application.
Figure 5-53. Three-tier application with a regional internal network TCP/UDP load balancer
Implementation
Let’s see now how a regional internal network TCP/UDP load balancer works.
As illustrated in Figure 5-54, a type 8 load balancer is a software-defined load balancer that leverages Andromeda, the Google Cloud software-defined network virtualization stack. Instead of funneling traffic through a dedicated middle-proxy device, Andromeda distributes load balancing across the virtualization stack, thereby eliminating the risk of choke points and helping clients select the optimal backend ready to serve requests.
In comparing traditional internal load balancing with software-defined load balancing on GCP,
Figure 5-54 also highlights the distributed nature of software-defined components, that is, network load
balancing, but also GCP firewall rules. The latter are also software-defined, distributed resources, which
result in the elimination of choke points—when compared to traditional firewalls, as you learned in
Chapters 2 and 3.
You definitely need an HTTPS proxy solution, which terminates the connection from the clients and
takes the burden of decrypting the incoming packets.
The Google Cloud internal HTTP(S) load balancer (type 6) does just that!
The top part of Figure 5-55 illustrates the architecture of this type of load balancer.
This load balancer type is regional and proxy-based, operates at layer 7 in the OSI model, and enables
you to run and scale your services behind an internal IP address.
These services operate in the form of backends hosted on one of the following:
• Zonal/regional instance groups
• Zonal NEGs
• Regional serverless NEGs
• Zonal hybrid NEGs
• Regional Private Service Connect NEGs
By default, the Google Cloud internal HTTP(S) load balancer (type 6) is accessible only from IPv4 clients located in the same region as the internal regional forwarding rule.
If your workload requires access from clients located in a region other than the internal forwarding
rule’s region, then you must configure the load balancer internal forwarding rule to allow global access as
indicated in Figure 5-55.
Last, an internal HTTP(S) load balancer is a managed service based on the open source Envoy proxy.
This enables rich traffic control capabilities based on HTTP(S) parameters. After the load balancer has been
configured, Google Cloud automatically allocates Envoy proxies in your designated proxy-only subnet to
meet your traffic needs.
Another “flavor” of Google Cloud internal, proxy-based load balancers is the Google Cloud internal TCP
proxy load balancer (type 7). See the bottom part of Figure 5-55.
This type of load balancer is essentially the internal version of the Google Cloud external TCP proxy
load balancer (type 4), with a couple of caveats you need to remember for the exam.
■■Exam tip The type 7 load balancer’s target proxy is an Envoy proxy managed by Google Cloud. As a result,
this proxy is located in your designated proxy-only subnet instead of a GFE location.
Unlike the case of type 4 load balancers, Regional Private Service Connect NEGs are a valid backend option
for this type of load balancer. This makes sense because the load balancer is internal, and by operating in
RFC 1918 IP address space, it has full access to Google APIs and services with a proper subnet configuration
(gcloud compute networks subnets update --enable-private-ip-google-access).
Additionally, the decision tree in Figure 5-57 is also provided to help you choose what load balancer
best suits your workload load balancing requirements.
There are many criteria that drive a decision on the best-suited load balancer. These can be grouped in
categories that map to the five pillars of the well-architected framework, as outlined in the following:
• Protocol from client to frontend: Performance, technical requirements
• Protocol from frontend to backends: Performance
• Backends: Performance, operational efficiency
• IP addresses: Technical requirements
• Network topology: Resilience, operational efficiency
• Failover: Resilience
• Session affinity: Business requirements
• Routing: Performance
• Autoscaling and self-healing: Elasticity, resilience
• Security: Security
The decision tree in Figure 5-57 is not exhaustive, but highlights the right mix of criteria (in the features
column) you need to consider for your load balancer, for example, backends and security.
■■Exam tip Cloud Armor is supported for all external load balancer types but the regional external HTTP(S)
load balancer (type 5). Identity-Aware Proxy (IAP) is supported by all HTTP(S) load balancer types (types 1, 2,
5, and 6). SSL offload is supported by all external proxy-based load balancer types but the global external TCP
proxy load balancer (type 4).
Protocol Forwarding
Protocol forwarding is a Compute Engine feature that lets you create forwarding rule objects that can send
packets to a single target Compute Engine instance (VM) instead of a target proxy.
A target instance contains a single VM that receives and handles traffic from the corresponding
forwarding rule.
To create a target instance, you can use the gcloud command:
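This is a minimal sketch (the target instance name and zone are assumptions):

gcloud compute target-instances create your-target-instance \
  --instance=your-vm \
  --zone=us-central1-a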
■■Note The preceding command assumes you already have a VM your-vm, and it creates a target instance
Google Cloud resource, which is different from the actual VM resource.
You then use the newly created target instance resource to create the forwarding rule with protocol
forwarding:
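For example, a hedged sketch for external protocol forwarding of ESP traffic (the rule name, region, and protocol choice are assumptions):

gcloud compute forwarding-rules create fr-esp \
  --region=us-central1 \
  --ip-protocol=ESP \
  --target-instance=your-target-instance \
  --target-instance-zone=us-central1-a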
For external protocol forwarding and virtual private networks (VPNs), Google Cloud supports protocol
forwarding for the AH (Authentication Header), ESP (Encapsulating Security Payload), ICMP (Internet
Control Message Protocol), SCTP (Stream Control Transmission Protocol), TCP (Transmission Control
Protocol), and UDP (User Datagram Protocol) protocols.
For internal protocol forwarding, only TCP and UDP are supported.
■■Exam tip The serving capacity of a load balancer is always defined in the load balancer’s backend service.
When you configure autoscaling for a MIG that serves requests from an HTTP(S) load balancer (types 1, 2, 5,
and 6), the serving capacity of your load balancer is based on either utilization or rate (requests per second, i.e.,
RPS, or queries per second, i.e., QPS) as shown in Figure 5-5.
The values of the following fields in the backend services resource determine the backend’s behavior:
• A balancing mode, which defines how the load balancer measures backend
readiness for new requests or connections.
• A target capacity, which defines a target maximum number of connections, a target
maximum rate, or target maximum CPU utilization.
• A capacity scaler, which adjusts overall available capacity without modifying the
target capacity. Its value can be either 0.0 (preventing any new connections) or a
value between 0.1 (10%) and 1.0 (100% default).
These fields can be set using the gcloud compute backend-services add-backend command, whose
synopsis is displayed in Figure 5-58.
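For example (hypothetical names), to target 80% utilization on a MIG backend while halving its available capacity:

gcloud compute backend-services add-backend bs-web \
  --global \
  --instance-group=mig1 \
  --instance-group-zone=us-central1-a \
  --balancing-mode=UTILIZATION \
  --max-utilization=0.8 \
  --capacity-scaler=0.5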
Each load balancer type supports different balancing modes, and the balancing mode depends on the backend type associated with the backend service.
Google Cloud has three balancing modes:
• CONNECTION: Determines how the load is spread based on the number of concurrent
connections that the backend can handle.
• RATE: The target maximum number of requests (queries) per second (RPS, QPS). The
target maximum RPS/QPS can be exceeded if all backends are at or above capacity.
• UTILIZATION: Determines how the load is spread based on the utilization of
instances in an instance group.
Figure 5-59 shows which balancing mode is supported for each of the nine load balancer types based
on backends.
Security Policies
Google Cloud Armor uses security policies to protect your application from common web attacks. This is achieved by providing layer 7 filtering and by parsing incoming requests so that traffic can be blocked before it reaches your load balancer’s backend services or backend buckets.
Each security policy comprises a set of rules that filter traffic based on conditions such as an incoming request’s IP address, IP range, region code, or request headers.
Google Cloud Armor security policies are available only for backend services of global external HTTP(S)
load balancers (type 1), global external HTTP(S) load balancers (classic) (type 2), global external SSL proxy
load balancers (type 3), or global external TCP proxy load balancers (type 4). The load balancer can be in a
premium or standard tier.
The backends associated to the backend service can be any of the following:
• Instance groups
• Zonal network endpoint groups (NEGs)
• Serverless NEGs: One or more App Engine, Cloud Run, or Cloud Functions services
• Internet NEGs for external backends
• Buckets in Cloud Storage
■■Exam tip When you use Google Cloud Armor to protect a hybrid or a multi-cloud deployment, the backends
must be Internet NEGs. Google Cloud Armor also protects serverless NEGs when traffic is routed through a load
balancer. To ensure that only traffic that has been routed through your load balancer reaches your serverless
NEG, see Ingress controls.
Google Cloud Armor also provides advanced network DDoS protection for regional external TCP/UDP network load balancers (type 9), protocol forwarding, and VMs with public IP addresses. For more information about advanced DDoS protection, see Configure advanced network DDoS protection.
Adaptive Protection
Google Cloud Armor Adaptive Protection helps you protect your Google Cloud applications, websites, and
services against L7 DDoS attacks such as HTTP floods and other high-frequency layer 7 (application-level)
malicious activity. Adaptive Protection builds machine learning models that do the following:
• Detect and alert on anomalous activity
• Generate a signature describing the potential attack
• Generate a custom Google Cloud Armor WAF (web application firewall) rule to block
the signature
You enable or disable Adaptive Protection on a per–security policy basis.
Full Adaptive Protection alerts are available only if you subscribe to Google Cloud Armor Managed
Protection Plus. Otherwise, you receive only a basic alert, without an attack signature or the ability to deploy
a suggested rule.
When you select a sensitivity level for your WAF rule, you opt in signatures at the sensitivity levels less
than or equal to the selected sensitivity level. In the following example, you tune a preconfigured WAF rule
by selecting the sensitivity level of 1:
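A sketch of such a command, assuming a policy named my-policy and an arbitrary rule priority of 100:

gcloud compute security-policies rules create 100 \
    --security-policy my-policy \
    --expression "evaluatePreconfiguredWaf('sqli-v33-stable', {'sensitivity': 1})" \
    --action deny-403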
Custom rule match conditions use a dedicated rules language (a subset of the Common Expression Language); for example, the expression inIpRange(origin.ip, '9.9.9.0/24') matches requests originating from the 9.9.9.0/24 CIDR block.
Figure 5-60. Example of two security policies applied to different backend services
In the example, these are the Google Cloud Armor security policies:
• mobile-clients-policy, which applies to external users of your game services
• internal-users-policy, which applies to your organization’s test-network team
You apply mobile-clients-policy to the game service, whose backend service is called games, and
you apply internal-users-policy to the internal test service for the testing team, whose corresponding
backend service is called test-network.
If the backend instances for a backend service are in multiple regions, the Google Cloud Armor security
policy associated with the service is applicable to instances in all regions. In the preceding example, the
security policy mobile-clients-policy is applicable to instances 1, 2, 3, and 4 in us-central1 and to
instances 5 and 6 in us-east1.
Example
Create the Google Cloud Armor security policies:
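A sketch of these commands (the descriptions and the deny actions on the default rules are illustrative):

gcloud compute security-policies create mobile-clients-policy \
    --description "policy for external users"

gcloud compute security-policies create internal-users-policy \
    --description "policy for internal test users"

gcloud compute security-policies rules update 2147483647 \
    --security-policy mobile-clients-policy \
    --action "deny-404"

gcloud compute security-policies rules update 2147483647 \
    --security-policy internal-users-policy \
    --action "deny-502"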
In the preceding rules update commands, the first (and only) positional argument denotes the rule priority, which is an integer ranging from 0 (highest) to 2147483647 (lowest). The default rule always has the lowest priority.
Add rules to the security policies:
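For example (the rule priority of 1000 is arbitrary):

gcloud compute security-policies rules create 1000 \
    --security-policy mobile-clients-policy \
    --description "allow traffic from 192.0.2.0/24" \
    --src-ip-ranges "192.0.2.0/24" \
    --action "allow"

gcloud compute security-policies rules create 1000 \
    --security-policy internal-users-policy \
    --description "allow traffic from 198.51.100.0/24" \
    --src-ip-ranges "198.51.100.0/24" \
    --action "allow"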
In the preceding commands, the two CIDR blocks 192.0.2.0/24 and 198.51.100.0/24 denote IP address ranges reserved for documentation and examples (RFC 5737).
Attach the security policies to the backend services:
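For example:

gcloud compute backend-services update games \
    --security-policy mobile-clients-policy \
    --global

gcloud compute backend-services update test-network \
    --security-policy internal-users-policy \
    --global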
The GFE determines whether a cached response to the user's request exists in the cache, and if it does, it returns the cached response to the user without any further action. This interaction is called a cache hit, because Cloud CDN was able to serve the user's request by retrieving the cached response directly from the cache, thereby avoiding an extra round-trip to the origin servers (backends), as well as the time spent regenerating the content. A cache hit is displayed with a green arrow in Figure 5-61.
Conversely, if the GFE determines that a cached response does not exist in the cache (for example, when the cache has no entries or when a request has been sent for the first time), the request is forwarded to the HTTP(S) load balancer and eventually reaches the origin servers (backends) for processing. Upon completion, the computed content is packaged in an HTTP(S) response, which is sent back to the cache (replenishing it) and is then returned to the user. This interaction is called a cache miss, because the GFE could not retrieve a response from the cache and had to reach the origin servers to serve the user's request. A cache miss is displayed with a red arrow in Figure 5-61.
If the origin server’s response to this request is cacheable, Cloud CDN stores the response in the Cloud
CDN cache for future requests. Data transfer from a cache to a client is called cache egress. Data transfer to a
cache is called cache fill.
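You enable Cloud CDN on a backend service (or backend bucket) with the --enable-cdn flag of the gcloud compute backend-services update command; a sketch, where BACKEND_SERVICE_NAME is a placeholder:

gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --enable-cdn \
    --global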
Optional flags:
• --no-cache-key-include-protocol
• --no-cache-key-include-host
• --no-cache-key-include-query-string
Additionally, upon enabling Cloud CDN, you can choose whether Cloud CDN should cache all content, cache only static content, or selectively pick which content to cache based on a setting in the origin server. This is controlled through the --cache-mode optional flag, whose value can be one of the following (an example follows the list):
• FORCE_CACHE_ALL, which caches all content, ignoring any private, no-store, or no-
cache directives in Cache-Control response headers.
• CACHE_ALL_STATIC, which automatically caches static content, including common
image formats, media (video and audio), and web assets (JavaScript and CSS).
Requests and responses that are marked as uncacheable, as well as dynamic content
(including HTML), aren’t cached.
• USE_ORIGIN_HEADERS, which requires the origin to set valid caching headers to
cache content. Responses without these headers aren’t cached at Google’s edge and
require a full trip to the origin on every request, potentially impacting performance
and increasing load on the origin servers.
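For example, to cache only static content (BACKEND_SERVICE_NAME is a placeholder):

gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --cache-mode=CACHE_ALL_STATIC \
    --global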
■■Warning Setting the cache-mode to FORCE_CACHE_ALL may result in Cloud CDN caching private,
per-user (Personally Identifiable Information—PII) content. You should only enable this on backends that are not
serving private or dynamic content, such as storage buckets. To learn more, visit https://csrc.nist.gov/
glossary/term/PII.
Cacheable Responses
A cacheable response is an HTTP response that Cloud CDN can store and quickly retrieve, allowing for faster load times, lower latencies, and a better user experience. Not all HTTP responses are cacheable. Cloud CDN stores responses in cache if all the conditions listed in Figure 5-62 are true.
■■Exam tip You don’t need to memorize all the preceding criteria for the exam. However, the ones you should
remember are the first two and the last, that is, Cloud CDN must be enabled for a backend service or a backend
bucket; only responses to GET requests may be cached, and there is a limit to cacheable content size.
• http://dariokart.com/images/supermario.jpg?user=user1
• https://dariokart.com/images/supermario.jpg?user=user2
• https://media.dariokart.com/images/supermario.jpg
• https://www.dariokart.com/images/supermario.jpg
■■Exam tip You can change which parts of the URI are used in the cache key. While the filename and path
must always be part of the key, you can include or omit any combination of protocol, host, or query string when
customizing your cache key.
Use this command to add the strings user and time to an exclude list:
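A sketch, with BACKEND_SERVICE_NAME as a placeholder:

gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --cache-key-query-string-blacklist=user,time \
    --global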
Cache Invalidation
After an object is cached, it remains in the cache until it expires or is evicted to make room for new content.
You can control the expiration time through the standard HTTP header Cache-Control (www.rfc-editor.org/rfc/rfc9111#section-5.2).
Cache invalidation is the action of forcibly removing an object (a key-value pair) from the cache prior to
its normal expiration time.
■■Exam tip The Cache-Control HTTP header field holds the directives (instructions) displayed in
Figure 5-63—in both requests and responses—that control caching behavior. You don’t need to know each
of the sixteen Cache-Control directives, but it’s important you remember the two directives no-store and
private. The former indicates not to store any content in any cache—whether it be a private cache (e.g.,
local cache in your browser) or a shared cache (e.g., proxies, Cloud CDN, and other Content Delivery Network
caches). The latter indicates to only store content in private caches.
Path Pattern
Each invalidation request requires a path pattern that identifies the exact object or set of objects that
should be invalidated. The path pattern can be either a specific path, such as /supermario.png, or an entire
directory structure, such as /pictures/*. The following rules apply to path patterns:
• The path pattern must start with /.
• It cannot include ? or #.
• It must not include an * except as the final character following a /.
• If it ends with /*, the preceding string is a prefix, and all objects whose paths begin
with that prefix are invalidated.
• The path pattern is compared with the path component of the URL, which is
everything between the hostname and any ? or # that might be present.
■■Exam tip If you have URLs that contain a query string, for example, /images.php?image=supermario.
png, you cannot selectively invalidate objects that differ only by the value of the query string. For example, if
you have two images, /images.php?image=supermario.png and /images.php?image=luigi.png, you
cannot invalidate only luigi.png. You have to invalidate all images served by images.php, by using
/images.php as the path pattern.
The next sections describe how to invalidate your Cloud CDN cached content.
For example, if a file located at /images/luigi.jpg has been cached and needs to be invalidated, you
can use several methods to invalidate it, depending on whether you want to affect only that file or a wider
scope. In each case, you can invalidate for all hostnames or for only one hostname.
To invalidate a single file for a single host, add the --host flag as follows:
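A sketch, where LOAD_BALANCER_NAME stands for the name of your load balancer's URL map:

gcloud compute url-maps invalidate-cdn-cache LOAD_BALANCER_NAME \
    --path "/images/luigi.jpg" \
    --host dariokart.com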
By default, the Google Cloud CLI waits until the invalidation has completed. To perform the
invalidation in the background, append the --async flag to the command line.
To invalidate the whole directory for a single host, add the --host flag as follows:
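For example:

gcloud compute url-maps invalidate-cdn-cache LOAD_BALANCER_NAME \
    --path "/images/*" \
    --host dariokart.com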
To perform the invalidation in the background, append the --async flag to the command line.
Invalidate Everything
To invalidate all directories for all hosts, use the command
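For example:

gcloud compute url-maps invalidate-cdn-cache LOAD_BALANCER_NAME \
    --path "/*"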
To invalidate all directories for a single host, add the --host flag as follows:
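For example:

gcloud compute url-maps invalidate-cdn-cache LOAD_BALANCER_NAME \
    --path "/*" \
    --host dariokart.com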
To perform the invalidation in the background, append the --async flag to the command line.
Signed URLs
Signed URLs give time-limited resource access to anyone in possession of the URL, regardless of whether the
user has a Google Account.
A signed URL is a URL that provides limited permission and time to make a request. Signed URLs
contain authentication information in their query strings, allowing users without credentials to perform
specific actions on a resource. When you generate a signed URL, you specify a user or service account that
must have sufficient permission to make the request associated with the URL.
After you generate a signed URL, anyone who possesses it can use the signed URL to perform specified
actions (such as reading an object) within a specified period of time.
To keep the keys secret, the key values aren’t included in responses to any API requests. If you lose a
key, you must create a new one.
■■Exam tip Keep the generated key file private, and do not expose it to users or store it directly in source
code. Consider using a secret storage mechanism such as Cloud Key Management Service to encrypt the key
and provide access to only trusted applications.
First, generate a strongly random key and store it in the key file with the following command:
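For example (KEY_FILE_NAME is a placeholder for your key file's path):

head -c 16 /dev/urandom | base64 | tr +/ -_ > KEY_FILE_NAME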
To list the keys on a backend service or backend bucket, run one of the following commands:
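A sketch; the key names appear under cdnPolicy.signedUrlKeyNames in the output:

gcloud compute backend-services describe BACKEND_SERVICE_NAME --global

gcloud compute backend-buckets describe BACKEND_BUCKET_NAME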
When URLs signed by a particular key should no longer be honored, run one of the following
commands to delete that key from the backend service or backend bucket. This will prevent users from
consuming the URL that was signed with KEY_NAME:
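For example:

gcloud compute backend-services delete-signed-url-key BACKEND_SERVICE_NAME \
    --key-name KEY_NAME

gcloud compute backend-buckets delete-signed-url-key BACKEND_BUCKET_NAME \
    --key-name KEY_NAME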
Signing URLs
Use these instructions to create signed URLs by using the gcloud compute sign-url command as follows:
gcloud compute sign-url URL \
    --key-name KEY_NAME \
    --key-file KEY_FILE_NAME \
    --expires-in TIME_UNTIL_EXPIRATION \
    [--validate]
This command reads and decodes the base64url encoded key value from KEY_FILE_NAME and then
outputs a signed URL that you can use for GET or HEAD requests for the given URL.
For example, the command
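(a sketch; the key name and key file are illustrative)

gcloud compute sign-url \
    "https://dariokart.com/images/supermario.jpg" \
    --key-name my-key \
    --key-file sig-key-file \
    --expires-in 1h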
creates a signed URL that expires in one hour. For more information about time format, visit https://cloud.google.com/sdk/gcloud/reference/topic/datetimes.
■■Exam tip The URL must be a valid URL that has a path component. For example, http://dariokart.com is invalid, but https://dariokart.com/ and https://dariokart.com/whatever are both valid URLs.
If the optional --validate flag is provided, this command sends a HEAD request with the resulting
URL and prints the HTTP response code.
If the signed URL is correct, the response code is the same as the result code sent by your backend.
If the response code isn’t the same, recheck KEY_NAME and the contents of the specified file, and make
sure that the value of TIME_UNTIL_EXPIRATION is at least several seconds.
If the --validate flag is not provided, the following are not verified:
• The inputs
• The generated URL
• The generated signed URL
The URL returned from the Google Cloud CLI can be distributed according to your needs.
■■Note We recommend signing only HTTPS URLs, because HTTPS provides a secure transport that prevents
the signature component of the signed URL from being intercepted. Similarly, make sure that you distribute the
signed URLs over secure transport protocols such as TLS/HTTPS.
Custom Origins
A custom origin is an Internet network endpoint group (NEG), that is, a backend that resides outside of
Google Cloud and is reachable across the Internet.
The best practice is to create the Internet NEG with the INTERNET_FQDN_PORT endpoint type and
an FQDN (Fully Qualified Domain Name) value as an origin hostname value. This insulates the Cloud CDN
configuration from IP address changes in the origin infrastructure. Network endpoints that are defined by
using FQDNs are resolved through public DNS. Make sure that the configured FQDN is resolvable through
Google Public DNS.
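A sketch of creating such an Internet NEG and adding an FQDN endpoint (the NEG name, domain, and port are assumptions):

gcloud compute network-endpoint-groups create custom-origin-neg \
    --network-endpoint-type=INTERNET_FQDN_PORT \
    --global

gcloud compute network-endpoint-groups update custom-origin-neg \
    --add-endpoint="fqdn=origin.dariokart.com,port=443" \
    --global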
After you create the Internet NEG, the type cannot be changed between INTERNET_FQDN_PORT and
INTERNET_IP_PORT. You need to create a new Internet NEG and change your backend service to use the
new Internet NEG.
Figure 5-65 shows an Internet NEG used to deploy an external backend with HTTP(S) load balancing
and Cloud CDN.
Best Practices
In this last section, you will learn a few load balancing best practices that I want to share based on my experience and research with GCP.
Leverage the OSI layer 3–7 protection and the geolocation and WAF (web application firewall) defense
capabilities offered by Cloud Armor.
When Cloud Armor is combined with IAP, you significantly strengthen your workloads' security posture.
Identity-Aware Proxy is a Google Cloud service that accelerates you on your way to a Zero Trust
Security Model.
■■Exam tip QUIC is a transport layer protocol (layer 4) developed by Google, which is faster, more efficient,
and more secure than earlier protocols, for example, TCP. For the exam, you need to know that QUIC is only
supported by global HTTP(S) load balancers, that is, types 1 and 2. For increased speed, QUIC uses the UDP
transport protocol, which is faster than TCP but less reliable. QUIC sends several independent streams of data over a single connection, a technique known as multiplexing, so that a lost packet in one stream does not stall the others. For better security, everything sent over QUIC is automatically encrypted. Ordinarily, data has to be sent over HTTPS to be encrypted, but QUIC has TLS encryption built in by default.
Exam Questions
Question 5.1 (Backend Services)
You are configuring the backend service for a new Google Cloud HTTPS load balancer. The application
requires high availability and multiple subnets and needs to scale automatically. Which backend
configuration should you choose?
A.
A zonal managed instance group
B.
A regional managed instance group
C.
An unmanaged instance group
D.
A network endpoint group
Rationale
A is not correct because it would only allow the use of a single zone within
a region.
B is CORRECT because it allows the application to be deployed in multiple
zones within a region.
C is not correct because it does not allow for autoscaling.
D is not correct because a single NEG cannot distribute traffic across multiple subnets.
A.
Maximum CPU utilization: 60 and Maximum RPS: 80
B.
Maximum CPU utilization: 80 and Capacity: 80
C.
Maximum RPS: 80 and Capacity: 80
D.
Maximum CPU: 60, Maximum RPS: 80, and Capacity: 80
Rationale
A is not correct because this reduces both the CPU utilization and requests per
second, resulting in more than a 20% reduction.
B is CORRECT because you are changing the overall instance group
utilization by 20%.
C is not correct because this reduces the requests per second by more than 20%.
D is not correct because this reduces both max CPU and RPS, resulting in greater
than 20%.
A.
Create a new load balancer, and update VPC firewall rules to allow test clients.
B.
Create a new load balancer, and update the VPC Service Controls perimeter to
allow test clients.
C.
Add the backend service to the existing load balancer, and modify the existing
Cloud Armor policy.
D.
Add the backend service to the existing load balancer, and add a new Cloud
Armor policy and target test-network.
Rationale
A is not correct because the HTTPS load balancer acts as a proxy and doesn’t
provide the correct client IP address.
B is not correct because VPC Service Controls protects Google Managed Services.
C is not correct because this change would allow everyone to access the test service.
D is CORRECT because this provides integration and support for multiple
backend services. Also, a Cloud Armor Network Security Policy is attached
to backend services in order to whitelist/blacklist client CIDR blocks, thus
allowing traffic to specific targets. In this case, a Cloud Armor Network
Security Policy would allow (whitelist) incoming requests originated by the
selected testers’ IP ranges to reach the test-network backend and deny them
(blacklist) access to the game backends.
Rationale
A is CORRECT because Cloud CDN will front (cache) static content from a
Cloud Storage bucket and move the graphical resources closest to the users.
B, C are not correct because Cloud CDN requires an HTTP(S) proxy.
D is not correct because dynamic routing will not help serve additional web
clients.
Rationale
A is not correct because the stated limitation is not the result of CPU utilization,
and this method is inefficient.
B is not correct because doubling the number of instances is inefficient.
C is not correct because the stated limitation is on requests per second, not CPU
utilization.
D is CORRECT because the autoscaling method leverages the load balancer
and efficiently scales the instances.
CHAPTER 6
Configuring Advanced Network Services
You learned in Chapter 5 how Google Cloud load balancing comprises an ecosystem of products and
services. Load balancing alone includes nine different types of load balancers, and most of them are
available in the two network service tiers, that is, premium and standard.
While load balancing focuses on the performance and reliability aspects of your workloads, there are
other important factors you need to consider when designing the network architecture of your workloads.
In this chapter, our focus will shift toward security. I already mentioned it once, but you should also have started to notice how security and networking are two sides of the same coin: there is no well-architected workload that does not address networking and security together.
In this chapter, you will learn how to configure three advanced network services, that is, Cloud DNS,
Cloud NAT, and Packet Mirroring policies.
These three advanced network services supplement nicely the capabilities offered by the GCP load
balancers and when properly used will reinforce the security posture of your workloads.
Let’s get started!
■■Note Cloud DNS creates NS (Name Server) and SOA (Start of Authority) records for you automatically when
you create the zone. Do not change the name of your zone’s NS record, and do not change the list of name
servers that Cloud DNS selects for your zone.
■■Note As you learned in Chapter 3, every Google Cloud new project has a default network (an auto-mode
VPC) that has one subnet in each region. The subnet CIDR blocks have IPv4 ranges only and are automatically
assigned for you. The subnets and all subnet ranges fit inside the 10.128.0.0/9 CIDR block.
If you receive an accessNotConfigured error, you must enable the Cloud DNS API.
To change the networks to which a private zone is visible:
gcloud dns managed-zones update ZONE_NAME \
    --networks=default,your-app-shared-vpc \
    --visibility=private \
    --forwarding-targets=8.8.8.8,8.8.4.4
■■Exam tip VPC network peering is not the same as DNS peering. VPC network peering allows VMs in
multiple projects (even in different organizations) to reach each other, but it does not change name resolution.
Resources in each VPC network still follow their own resolution order.
In contrast, through DNS peering, you can allow requests to be forwarded for specific zones to another VPC
network. This lets you forward requests to different Google Cloud environments, regardless of whether the VPC
networks are connected.
VPC network peering and DNS peering are also set up differently. For VPC network peering, both VPC networks
need to set up a peering relationship to the other VPC network. The peering is then automatically bidirectional.
DNS peering unidirectionally forwards DNS requests and does not require a bidirectional relationship between
VPC networks. A VPC network referred to as the DNS consumer network performs lookups for a Cloud DNS
peering zone in another VPC network, which is referred to as the DNS producer network. Users with the IAM
permission dns.networks.targetWithPeeringZone on the producer network’s project can establish DNS
peering between consumer and producer networks. To set up DNS peering from a consumer VPC network, you
require the DNS peer role for the producer VPC network’s host project. We will discuss DNS peering in detail
shortly, but if you can’t wait to see how this works, have a look at Figure 6-3.
Managing Records
Managing DNS records with the Cloud DNS API involves sending change requests to the API. This section describes how to make changes, consisting of additions to and deletions from your resource record sets collection, and how to send the desired changes to the API using the import, export, and transaction commands.
Before learning how to perform an operation on a DNS resource record, let’s review the list of resource
record types. Figure 6-1 displays the complete list.
You add or remove DNS records in a resource record set by creating and executing a transaction that
specifies the operations you want to perform. A transaction is a group of one or more record changes that
should be propagated altogether and atomically, that is, either all or nothing in the event the transaction
fails. The entire transaction either succeeds or fails, so your data is never left in an intermediate state.
You start a transaction using the gcloud dns record-sets transaction start command as follows:
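For example:

gcloud dns record-sets transaction start --zone=MANAGED_ZONE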
where --zone is the name of the managed zone whose record sets you want to manage.
To add a record to a transaction, you use the transaction add command as follows:
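A sketch, using the host1.dariokart.com record that appears later in this section:

gcloud dns record-sets transaction add "192.0.2.91" \
    --name=host1.dariokart.com. \
    --ttl=300 \
    --type=A \
    --zone=MANAGED_ZONE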
where
• --name is the DNS or domain name of the record set to add.
• --ttl is the TTL (time to live in seconds) for the record set.
• --type is the record type described in Figure 6-1.
• --zone is the name of the managed zone whose record sets you want to manage.
To execute a transaction, you use the execute command as follows:
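For example:

gcloud dns record-sets transaction execute --zone=MANAGED_ZONE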
where
• --zone is the name of the managed zone whose record sets you want to manage.
To remove a record as part of a transaction, you use the remove command as follows:
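For example:

gcloud dns record-sets transaction remove "192.0.2.91" \
    --name=host1.dariokart.com. \
    --ttl=300 \
    --type=A \
    --zone=MANAGED_ZONE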
where
• --name is the DNS or domain name of the record set to remove.
• --ttl is the TTL (time to live in seconds) for the record set.
• --type is the record type described in Figure 6-1.
To replace an existing record, issue the remove command followed by the add command.
■■Note You can also edit transaction.yaml in a text editor to manually specify additions, deletions, or
corrections to DNS records. To view the contents of transaction.yaml, run
gcloud dns record-sets transaction describe
To import record sets, you can use import and export to copy record sets into and out of a managed zone. The formats you can import from and export to are either BIND zone file format or YAML records format.
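For example, to import record sets from a BIND zone file (ZONE_FILE is a path placeholder):

gcloud dns record-sets import ZONE_FILE \
    --zone-file-format \
    --zone=MANAGED_ZONE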
To export a record set, use the dns record-sets export command. To specify that the record sets are
exported into a BIND zone–formatted file, use the --zone-file-format flag. For example:
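A sketch:

gcloud dns record-sets export ZONE_FILE \
    --zone-file-format \
    --zone=MANAGED_ZONE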
■■Exam tip If you omit the --zone-file-format flag, the gcloud dns record-sets export
command exports the record set into a YAML-formatted records file.
---
kind: dns#resourceRecordSet
name: dariokart.com.
rrdatas:
- ns-gcp-private.googledomains.com.
ttl: 21600
type: NS
---
kind: dns#resourceRecordSet
name: dariokart.com.
rrdatas:
- ns-gcp-private.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
ttl: 21600
type: SOA
---
kind: dns#resourceRecordSet
name: host1.dariokart.com.
rrdatas:
- 192.0.2.91
ttl: 300
type: A
To display the current DNS records for your zone, use the gcloud dns record-sets list command:
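For example:

gcloud dns record-sets list --zone=MANAGED_ZONE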
The command outputs the JSON response for the resource record set for the first 100 records (default).
You can specify these additional parameters:
• limit: Maximum number of record sets to list.
• name: Only list record sets with this exact domain name.
• type: Only list records of this type. If present, the --name parameter must also be
present.
■■Warning If your import file contains NS or SOA records for the apex of the zone, they will conflict with
the preexisting Cloud DNS records. To use the preexisting Cloud DNS records (recommended), ensure that you
remove the NS or SOA records from your import file. However, there are use cases for overriding this behavior, which go beyond the scope of the exam.
To import record sets correctly, you must remove the apex records:
Verify the Migration
To monitor and verify that the Cloud DNS name servers have picked up your changes, you can use the Linux
watch and dig commands.
First, look up your zone's Cloud DNS name servers using the gcloud dns managed-zones describe
command:
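For example (ZONE_NAME is your managed zone's name):

gcloud dns managed-zones describe ZONE_NAME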
nameServers:
- ns-cloud-a1.googledomains.com.
- ns-cloud-a2.googledomains.com.
- ns-cloud-a3.googledomains.com.
- ns-cloud-a4.googledomains.com.
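Then, query one of those name servers directly with dig (dariokart.com stands in for your zone's domain):

dig dariokart.com @ZONE_NAME_SERVER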
Replace ZONE_NAME_SERVER with one of the name servers returned when you ran the previous
command.
If the output shows that all changes have propagated, you’re done. If not, you can check intermittently,
or you can automatically run the command every two seconds while you wait for the name servers to
change. To do that, run the following:
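For example, using the Linux watch command, which reruns the given command every two seconds by default:

watch dig dariokart.com @ZONE_NAME_SERVER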
■■Note A DNS policy that enables outbound DNS forwarding disables resolution of Compute Engine internal
DNS and Cloud DNS managed private zones. An outbound server policy is one of two methods for outbound
DNS forwarding.
As a Google Cloud Professional Cloud Network Engineer, you need to understand your business and
technical requirements so that you can determine where an authoritative service for all domain resolution
takes place.
Does it make sense to have an authoritative service for all domain resolution on-premises,
in Google Cloud, or both?
Let's discuss these three approaches and learn when each one is better suited than the others.
Split-Horizon DNS
Split-horizon DNS can provide a mechanism for security and privacy management by logical or physical
separation of DNS resolution for internal network access (RFC 1918) and access from an insecure, public network (e.g., the Internet).
Cloud DNS can be used as the authoritative name server to resolve your domains on the Internet
through public DNS zones and use private DNS zones to perform internal DNS resolution for your private
GCP networks.
DNS Peering
In large Google Cloud environments, Shared VPC is a very scalable network design that lets an organization
connect resources from multiple projects to a common Virtual Private Cloud (VPC) network, so that
they can communicate with each other securely and efficiently using internal IPs. Typically shared by
many application teams, a central team (or platform team) often manages the Shared VPC’s networking
configuration, while application teams use the network resources to create applications in their own service
projects.
In some cases, application teams want to manage their own DNS records (e.g., to create new DNS
records to expose services, update existing records, etc.). There’s a solution to support fine-grained IAM
policies using Cloud DNS peering. In this section, we will explore how to use it to give your application
teams autonomy over their DNS records while ensuring that the central networking team maintains fine-
grained control over the entire environment.
With DNS peering, you can create a Cloud DNS private peering zone and configure it to perform DNS
lookups in a VPC network where the records for that zone’s namespace are available.
The VPC network where the DNS private peering zone performs lookups is called the DNS producer
network, as indicated in Figure 6-3. The project that owns the producer network is called the producer
project, referred to as project-p in the figure.
The VPC network where DNS queries originate is called the DNS consumer network. The project that
owns the consumer network is called the consumer project, referred to as project-c in the same figure.
Figure 6-3 shows you how to create a DNS peering zone with the gcloud CLI.
First, as indicated in step a, the service account associated with the consumer network (vpc-c) must be granted the roles/dns.peer role in the producer project, that is, project-p.
Next, as indicated in step b, the same service account must be granted the roles/dns.admin role in the
consumer project, that is, project-c.
Finally, you create a new managed private peering zone by running the gcloud dns managed-zones
create as indicated in Figure 6-3.
When the setup is completed, any DNS query to resolve a hostname with suffix p.dariokart.com, for
example, leaderboard.p.dariokart.com, is sent to the DNS private zone in the producer VPC, as shown
in step d.
■■Exam tip You may wonder how a DNS private peering zone setup is any different than any other DNS private
zone. After all, the gcloud dns managed-zones create command shows no indication that the new zone
uses DNS peering. The answer is “hidden” in step a. By granting the roles/dns.peer IAM role to the consumer
service account, we are basically giving this principal access to target networks with DNS peering zones. In fact,
the only permission included in such IAM role is the permission dns.networks.targetWithPeeringZone.
Put differently, principals with the IAM permission dns.networks.targetWithPeeringZone on the producer
network’s project can establish DNS peering between consumer and producer networks.
Cloud DNS peering is not to be confused with VPC peering, and it doesn’t require you to configure any
communication between the source and destination VPC. All the DNS flows are managed directly in the Cloud
DNS backend: each VPC talks to Cloud DNS, and Cloud DNS can redirect the queries from one VPC to the other.
So, how does DNS peering allow application teams to manage their own DNS records?
The answer is by using DNS peering between a Shared VPC and other Cloud DNS private zones that are
managed by the application teams. Figure 6-4 illustrates this setup.
For each application team that needs to manage its own DNS records, you provide them with
• Their own private DNS subdomain, for example, t1.dariokart.com
• Their own Cloud DNS private zone(s) in a dedicated project (DNSproject1), plus a
standalone VPC (dns-t1-vpc) with full IAM permissions
You can then configure DNS peering for the specific DNS subdomain to their dedicated Cloud DNS
zone—you just learned how to configure DNS peering with the steps (a-b-c-d) illustrated in Figure 6-3.
In the application team’s standalone VPC (dns-t1-vpc), they have Cloud DNS IAM permissions only on
their own Cloud DNS instance and can manage only their DNS records.
A central team, meanwhile, manages the DNS peering and decides which Cloud DNS instance is
authoritative for which subdomain, thus allowing application teams to only manage their own subdomain.
By default, all VMs that consume the Shared VPC use Cloud DNS in the Shared VPC as their local
resolver. This Cloud DNS instance answers for all DNS records in the Shared VPC by using DNS peering to
the application teams’ Cloud DNS instances or by forwarding to on-premises for on-premises records.
To summarize, shared-vpc acts as the DNS consumer network and dns-t1-vpc and dns-t2-vpc act as
the DNS producer networks.
As a result, the flow is the following:
• (Step a) A VM in ServiceProject1 tries to ping game.t2.dariokart.com.
• (Step a) The VM uses Cloud DNS as its local DNS resolver.
• (Step a) The VM tries to resolve game.t2.dariokart.com, which is a DNS record
owned by team 2.
• (Step b) The VM then sends the DNS request to the Shared VPC Cloud DNS.
• (Step c) This Cloud DNS is configured with DNS peering and sends everything under
the t2.dariokart.com subdomain to Cloud DNS in DNSproject2.
• (Step d) Team 2 is able to manage its own DNS records, but only in its dedicated DNS
project, that is, DNSproject2. It has a private zone there for *.t2.dariokart.com and
an A record for game.t2.dariokart.com that resolves to 10.150.1.8.
The command output should return the name of the associated DNS policy:
your-shared-vpc-dns-policy
Then, run the gcloud dns policies describe command as follows:
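A sketch, filtering the output down to the logging flag:

gcloud dns policies describe your-shared-vpc-dns-policy \
    --format="value(enableLogging)"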
The command output should return the status of the Cloud DNS logging feature (True for enabled,
False for disabled).
Wouldn't it be nice for your internal VMs to reach the Internet without requiring an external IP address?
That’s where Cloud NAT comes into play. Cloud NAT is a distributed, software-defined managed
service, which lets certain compute resources without external IP addresses create outbound connections to
the Internet.
Architecture
Cloud NAT is not based on proxy VMs or network appliances. Rather, it configures the Andromeda software-
defined network that powers your VPC, so that it provides Source Network Address Translation (SNAT)
for VMs without external IP addresses. Cloud NAT also provides Destination Network Address Translation
(DNAT) for established inbound response packets. Figure 6-5 shows a comparison between traditional NAT
proxies and Google Cloud NAT.
With Cloud NAT, you achieve a number of benefits when compared to a traditional NAT proxy. As you can see, these benefits match the five pillars of the well-architected framework.
First and foremost, with Cloud NAT you achieve better security because your internal VMs (or other
compute resource instance types) are not directly exposed to potential security threats originating from the
Internet, thereby minimizing the attack surface of your workloads.
You also get higher availability because Cloud NAT is fully managed by Google Cloud. All you need
to do is to configure a NAT gateway on a Cloud Router, which provides the control plane for NAT, holding
configuration parameters that you specify.
Finally, you also achieve better performance and scalability because Cloud NAT can be configured
to automatically scale the number of NAT IP addresses that it uses, and it does not reduce the network
bandwidth per VM.
You can specify which subnets are allowed to use the Cloud NAT instance by selecting exactly one of
these flags:
• --nat-all-subnet-ip-ranges, which allows all IP ranges of all subnets in the
region, including primary and secondary ranges, to use the Cloud NAT instance
• --nat-custom-subnet-ip-ranges=SUBNETWORK[:RANGE_NAME],[…], which lets you
specify a list of the subnet’s primary and secondary IP ranges allowed to use the
Cloud NAT instance
• SUBNETWORK: Specifying a subnetwork name includes only the primary subnet
range of the subnetwork.
■■Exam tip Each NAT IP address on a Cloud NAT gateway offers 64,512 TCP source ports and 64,512 UDP
source ports. TCP and UDP each support 65,536 ports per IP address, but Cloud NAT doesn’t use the first 1024
well-known (privileged) ports.
Customizing Timeouts
Cloud NAT uses predefined timeout settings based on the connection type.
A connection is a unique 5-tuple, consisting of the NAT source IP address and source port combined with a unique destination 3-tuple (destination IP address, destination port, and protocol).
Use the gcloud compute routers nats create command to create a NAT gateway with custom
timeout settings:
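A sketch, with illustrative names and timeout values (each timeout flag shown is optional):

gcloud compute routers nats create nat-config \
    --router=my-router \
    --region=us-central1 \
    --nat-all-subnet-ip-ranges \
    --auto-allocate-nat-external-ips \
    --udp-idle-timeout=60s \
    --icmp-idle-timeout=30s \
    --tcp-established-idle-timeout=1200s \
    --tcp-transitory-idle-timeout=30s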
Logging and Monitoring
Cloud NAT logging allows you to log NAT connections and errors.
When you enable Cloud NAT logging, a single log entry can be generated for each of the following
scenarios:
• When a network connection is created
• When a packet is dropped because no port was available for NAT
You can choose to log both kinds of events or only one or the other.
All logs are sent to Cloud Logging.
■■Note Dropped packets are logged only if they are egress (outbound) TCP and UDP packets. No dropped
incoming packets are logged. For example, if an inbound response to an outbound request is dropped for any
reason, no error is logged.
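To enable logging, use the --enable-logging flag when creating or updating a NAT gateway; a sketch:

gcloud compute routers nats update NAT_GATEWAY \
    --router=ROUTER_NAME \
    --region=REGION \
    --enable-logging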
where
• NAT_GATEWAY denotes the name of the NAT gateway.
• ROUTER_NAME denotes the name of the Cloud Router that hosts the NAT gateway.
• REGION denotes the region of the Cloud Router.
{
  insertId: "1the8juf6vab1t"
  jsonPayload: {
    connection: {
      src_ip: "10.0.0.1"
      src_port: 45047
      nat_ip: "203.0.113.17"
      nat_port: 34889
      dest_ip: "198.51.100.142"
      dest_port: 80
      protocol: "tcp"
    }
    allocation_status: "OK"
    gateway_identifiers: {
      gateway_name: "my-nat-1"
      router_name: "my-router-1"
      region: "europe-west1"
    }
    endpoint: {
      project_id: "service-project-1"
      vm_name: "vm-1"
      region: "europe-west1"
      zone: "europe-west1-b"
    }
    vpc: {
      project_id: "host-project"
      vpc_name: "network-1"
      subnetwork_name: "subnetwork-1"
    }
    destination: {
      geo_location: {
        continent: "Europe"
        country: "France"
        region: "Nouvelle-Aquitaine"
        city: "Bordeaux"
      }
    }
  }
  logName: "projects/host-project/logs/compute.googleapis.com%2Fnat_flows"
  receiveTimestamp: "2018-06-28T10:46:08.123456789Z"
  resource: {
    labels: {
      region: "europe-west1-d"
      project_id: "host-project"
      router_id: "987654321123456"
      gateway_name: "my-nat-1"
    }
    type: "nat_gateway"
  }
  labels: {
    nat.googleapis.com/instance_name: "vm-1"
    nat.googleapis.com/instance_zone: "europe-west1-b"
    nat.googleapis.com/nat_ip: "203.0.113.17"
    nat.googleapis.com/network_name: "network-1"
    nat.googleapis.com/router_name: "my-router-1"
    nat.googleapis.com/subnetwork_name: "subnetwork-1"
  }
  timestamp: "2018-06-28T10:46:00.602240572Z"
}
Monitoring
Cloud NAT exposes key metrics to Cloud Monitoring that give you insights into your fleet's usage of NAT
gateways.
Metrics are sent automatically to Cloud Monitoring. There, you can create custom dashboards, set up
alerts, and query the metrics.
The following are the required Identity and Access Management (IAM) roles:
• For Shared VPC users with VMs and NAT gateways defined in different projects,
access to the VM level metrics requires the roles/monitoring.viewer IAM role for
the project of each VM.
• For the NAT gateway resource, access to the gateway metrics requires the roles/
monitoring.viewer IAM role for the project that contains the gateway.
Cloud NAT provides a set of predefined dashboards that display activity across your gateway:
• Open connections
• Egress data processed by NAT (rate)
• Ingress data processed by NAT (rate)
• Port usage
• NAT allocation errors
• Dropped sent packet rate
• Dropped received packet rate
You can also create custom dashboards and metrics-based alerting policies.
■■Exam tip You need to know a few constraints on packet mirroring policies. For a given packet mirroring
policy: (1) All mirrored sources must be in the same project, VPC network, and Google Cloud region. (2) Collector
instances must be in the same region as the mirrored sources’ region. (3) Only a single collector destination can
be used.
As you can see, there are a number of preliminary steps you need to complete in order to create a packet
mirroring policy. These include
1.
Permissions: For Shared VPC topologies, you must have the compute.
packetMirroringUser role in the project where the collector instances are
created and the compute.packetMirroringAdmin in the project where the
mirrored instances are created.
2.
Collector instances: You must create an instance group, which will act as the
destination of your mirrored traffic.
3.
Internal TCP/UDP network load balancer: You must create a type 8 load
balancer, configured to use the collector instances as backends.
4.
Firewall rules: Mirrored traffic must be allowed to go from the mirrored source
instances to the collector instances, which are the backends of the internal TCP/
UDP network load balancer.
Upon completion of the four preliminary steps, you can create a packet mirroring policy using the
command gcloud compute packet-mirrorings create as explained in the following:
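A sketch of the command's shape (all values are placeholders):

gcloud compute packet-mirrorings create POLICY_NAME \
    --region=REGION \
    --network=NETWORK_NAME \
    --collector-ilb=FORWARDING_RULE_NAME \
    --mirrored-subnets=SUBNET_NAME \
    --filter-protocols=PROTOCOL \
    --filter-direction=DIRECTION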
• PROTOCOL: One or more IP protocols to mirror. Valid values are tcp, udp, icmp, esp,
ah, ipip, sctp, or an IANA (Internet Assigned Numbers Authority) protocol number.
You can provide multiple protocols in a comma-separated list. If the filter-protocols
flag is omitted, all protocols are mirrored.
• DIRECTION: The direction of the traffic to mirror relative to the VM. By default, this
is set to both, which means that both ingress and egress traffic are mirrored. You
can restrict which packets are captured by specifying ingress to capture only ingress
packets or egress to capture only egress packets.
In the next section, you will learn some of the most relevant reference topologies you can use for your
workloads’ network packet inspection requirements.
Figure 6-7. Packet mirroring policy with source and collector in the same VPC
In Figure 6-7, the packet mirroring policy is configured to mirror subnet-mirrored and send mirrored
traffic to the internal TCP/UDP network load balancer configured in subnet-collector. Google Cloud
mirrors the traffic on existing and future VMs in the subnet-mirrored. This includes all traffic to and from
the Internet, on-premises hosts, and Google services.
Figure 6-8. Packet mirroring policy with source and collector in peered VPCs
The two VPCs VPC1 and VPC2 are peered. All resources are located in the same region us-central1, which complies with the second constraint you just learned in the previous exam tip.
The packet mirroring policy policy-1 is similar to the policy in Figure 6-7 in that policy-1 is configured
to collect traffic from subnet-mirrored-1 and send it to the forwarding rule of the internal load balancer in
subnet-collector—mirrored sources and collector instances are all in the same VPC.
However, this is not the case for policy-2 because this policy is configured with mirrored sources (all
VMs in subnet-mirrored-2) and collector instances (the backend VMs of the load balancer) in different VPC
networks.
As a result, policy-2 can be created by the owners of Project1 or the owners of Project2 under one of
the following conditions:
• The owners of Project1 must have the compute.packetMirroringAdmin role on the
network, subnet, or instances to mirror in Project2.
• The owners of Project2 must have the compute.packetMirroringUser role in Project1.
■■Note The collector instances are in a service project, which means they are billed to the billing account
associated to the service project, even though they consume subnet-collector in the host project. This is
how Shared VPC works.
In this reference topology, the packet mirroring policy has also been created in the service project and is
configured to mirror ingress and egress traffic for all VMs that have a network interface in subnet-mirrored.
■■Exam tip In this topology, service or host project users can create the packet mirroring policy. To do so, users
must have the compute.packetMirroringUser role in the service project where the collector destination is
located. Users must also have the compute.packetMirroringAdmin role on the mirrored sources.
Figure 6-10. Packet mirroring policy with collector in the host project
This reference topology is a perfect use case of a Shared VPC for what it was intended to do. You learned
in Chapter 2 that the idea of a Shared VPC is all about separation of duties, by letting developers manage
their own workloads in their own service project without worrying about network setups, and network
engineers manage the network infrastructure in the host project. Packet inspection is a network concern.
As a result, it makes sense to let network engineers own the collector instances (i.e., the backend VMs along
with all the internal TCP/UDP network load balancer resources) and the packet mirroring policies in the
host project.
■■Exam tip In this topology, service or host project users can create the packet mirroring policy. To do so,
users in the service project must have the compute.packetMirroringUser role in the host project. This is
because the collector instances are created in the host project. Alternatively, users in the host project require
the compute.packetMirroringAdmin role for mirrored sources in the service projects.
■■Exam tip If you need to mirror more than one network interface (NIC) of a multi-NIC VM, you must create
one packet mirroring policy for each NIC. This is because each NIC connects to a unique VPC network.
■■Exam tip A network interface card (NIC) can be connected to one and only one VPC network.
In this use case, the multi-NIC VMs are configured as backend instances in a managed instance group.
These multi-NIC VMs can be commercial solutions from third parties or solutions that you build yourself.
The managed instance group (MIG) is added to the backend service of an internal TCP/UDP network
load balancer (referenced by the regional internal forwarding rule ilb-a). See Figure 6-11.
Since each backend VM has two NICs, and since each NIC maps one to one to exactly one VPC, the
same group of VMs can be used by the backend service of another internal TCP/UDP network load balancer
(referenced in Figure 6-11 by the regional internal forwarding rule ilb-b).
In the VPC network called vpc-a, the internal network TCP/UDP load balancer referenced by the regional
internal forwarding rule ilb-a distributes traffic to the nic0 network interface of each VM in the backend MIG.
Likewise, in the VPC network called vpc-b, the second internal network TCP/UDP load balancer
referenced by the regional internal forwarding rule ilb-b distributes traffic to a different network
interface, nic1.
As you can see, this is another way to let two VPCs exchange traffic with each other. This is achieved
by leveraging two custom static routes, whose next hop is the forwarding rule of their internal TCP/UDP
network load balancer and whose destination is the CIDR block of the subnet in the other VPC.
■■Exam tip The multi-NIC VMs must be allowed to send and receive packets with nonmatching destination or source IP addresses. This can be accomplished by using the --can-ip-forward flag in the gcloud compute instances create command: https://cloud.google.com/sdk/gcloud/reference/compute/instances/create#--can-ip-forward.
Configuring the Networks
This reference topology uses two custom-mode VPC networks named vpc-a and vpc-b, each with
one subnet.
Each backend VM has two network interfaces, one attached to each VPC network (nic0 attached to VPC
vpc-a, nic1 attached to VPC vpc-b).
The subnets, subnet-a and subnet-b, use the 10.13.1.0/24 and 10.15.1.0/24 primary IP address
ranges, respectively, and they both reside in the us-central1 region.
■■Exam tip In this reference topology, the subnets attached to each NIC must all share the same region, for
example, us-central1, because VMs are zonal resources.
#!/bin/bash
# Enable IP forwarding:
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "net.ipv4.ip_forward=1" > /etc/sysctl.d/20-iptables.conf
# Read the VM network configuration from the metadata server:
md_vm="http://metadata.google.internal/computeMetadata/v1/instance"
md_net="$md_vm/network-interfaces"
nic0_gw="$(curl -s $md_net/0/gateway -H "Metadata-Flavor:Google")"
nic0_mask="$(curl -s $md_net/0/subnetmask -H "Metadata-Flavor:Google")"
nic0_addr="$(curl -s $md_net/0/ip -H "Metadata-Flavor:Google")"
nic0_id="$(ip addr show | grep $nic0_addr | awk '{print $NF}')"
nic1_gw="$(curl -s $md_net/1/gateway -H "Metadata-Flavor:Google")"
2.
Create a common instance template named third-party-template-multinic,
which will be used to create new VMs in both the vpc-a and vpc-b VPC networks,
when an autoscaling event is triggered:
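A sketch, assuming a Debian image and the preceding startup script saved locally as startup.sh (both are assumptions):

gcloud compute instance-templates create third-party-template-multinic \
    --region=us-central1 \
    --network-interface=subnet=subnet-a,no-address \
    --network-interface=subnet=subnet-b,no-address \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --can-ip-forward \
    --metadata-from-file=startup-script=startup.sh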
■■Exam tip The --can-ip-forward flag is required for the instance template creation. This setting lets each
backend VM forward packets with any source IP in the vpc-a and vpc-b VPCs, not just the ones whose source
IP matches one of the VM’s NICs.
3.
Create a common managed instance group named third-party-instance-
group that will also be used by the two backend services, one in the vpc-a and
the other one in the vpc-b VPC networks:
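For example (the group size of 3 is an assumption):

gcloud compute instance-groups managed create third-party-instance-group \
    --region=us-central1 \
    --template=third-party-template-multinic \
    --size=3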
2.
Use the previously created health check to create two internal backend services
in the us-central1 region: one named backend-service-a in the vpc-a VPC,
and the other one named backend-service-b in the vpc-b VPC (see Figure 6-12).
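A sketch, assuming the health check was named hc-http-80 (the name is an assumption):

gcloud compute backend-services create backend-service-a \
    --load-balancing-scheme=internal \
    --protocol=tcp \
    --region=us-central1 \
    --health-checks=hc-http-80 \
    --health-checks-region=us-central1 \
    --network=vpc-a

gcloud compute backend-services create backend-service-b \
    --load-balancing-scheme=internal \
    --protocol=tcp \
    --region=us-central1 \
    --health-checks=hc-http-80 \
    --health-checks-region=us-central1 \
    --network=vpc-b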
3.
Add to each of the two backend services the managed instance group you created earlier (third-party-instance-group), which contains the third-party virtual appliances as backends:
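For example:

gcloud compute backend-services add-backend backend-service-a \
    --instance-group=third-party-instance-group \
    --instance-group-region=us-central1 \
    --region=us-central1

gcloud compute backend-services add-backend backend-service-b \
    --instance-group=third-party-instance-group \
    --instance-group-region=us-central1 \
    --region=us-central1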
4.
Create two regional, internal forwarding rules: one associated with the subnet-a
and the other one associated with the subnet-b. Connect each forwarding rule
to its respective backend service, that is, backend-service-a and backend-
service-b:
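For example:

gcloud compute forwarding-rules create ilb-a \
    --load-balancing-scheme=internal \
    --ip-protocol=TCP \
    --ports=all \
    --network=vpc-a \
    --subnet=subnet-a \
    --region=us-central1 \
    --backend-service=backend-service-a

gcloud compute forwarding-rules create ilb-b \
    --load-balancing-scheme=internal \
    --ip-protocol=TCP \
    --ports=all \
    --network=vpc-b \
    --subnet=subnet-b \
    --region=us-central1 \
    --backend-service=backend-service-b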
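Finally, create the two custom static routes that use each forwarding rule as the next hop for traffic destined to the other VPC's subnet (a sketch; the route names are assumptions):

gcloud compute routes create route-to-subnet-b \
    --network=vpc-a \
    --destination-range=10.15.1.0/24 \
    --next-hop-ilb=ilb-a \
    --next-hop-ilb-region=us-central1

gcloud compute routes create route-to-subnet-a \
    --network=vpc-b \
    --destination-range=10.13.1.0/24 \
    --next-hop-ilb=ilb-b \
    --next-hop-ilb-region=us-central1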
■■Note With the optional --tags flag, one or more network tags can be added to the route to indicate that
the route applies only to the VMs with the specified tag. Omitting this flag tells Google Cloud that the custom
static route applies to all VMs in the specified VPC network, whose value is set using the --network flag. In
this example, the route applies to all VMs in each VPC. Remember, routes (just like firewall rules) are global
resources that are defined at the VPC level.
This last step concluded the actual configuration of this reference topology. Let’s validate the setup.
Creating the First VM
Let’s now create a VM with the IP address 10.13.1.70 in the subnet-a (10.13.1.0/24). The creation of the
VM installs the Apache web server, which will serve incoming traffic on the TCP port 80:
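A sketch (the VM name, zone, and image are assumptions):

gcloud compute instances create vm-a \
    --zone=us-central1-a \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --subnet=subnet-a \
    --private-network-ip=10.13.1.70 \
    --metadata=startup-script='#! /bin/bash
apt-get update
apt-get install apache2 -y'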
Creating the Second VM
Similarly, let’s create another VM with the IP address 10.15.1.70 in the subnet-b (10.15.1.0/24). The
creation of the VM installs the Apache web server, which will serve incoming traffic on the TCP port 80:
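A sketch, mirroring the first VM; afterward, you can verify backend health with the get-health command:

gcloud compute instances create vm-b \
    --zone=us-central1-a \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --subnet=subnet-b \
    --private-network-ip=10.15.1.70 \
    --metadata=startup-script='#! /bin/bash
apt-get update
apt-get install apache2 -y'

gcloud compute backend-services get-health backend-service-a --region=us-central1

gcloud compute backend-services get-health backend-service-b --region=us-central1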
You should see a message that confirms both load balancers are in a healthy status.
Exam Questions
Question 6.1 (Cloud DNS)
You are migrating to Cloud DNS and want to import your BIND zone file.
Which command should you use?
A.
gcloud dns record-sets import ZONE_FILE --zone MANAGED_ZONE
B.
gcloud dns record-sets import ZONE_FILE --replace-origin-ns --zone MANAGED_ZONE
C.
gcloud dns record-sets import ZONE_FILE --zone-file-format --zone MANAGED_ZONE
D.
gcloud dns record-sets import ZONE_FILE --delete-all-existing --zone MANAGED_ZONE
Rationale
A is not correct because the default behavior of the command is to expect
ZONE_FILE in YAML format.
B is not correct because the --replace-origin-ns flag indicates that NS records for the origin of a zone should be imported if defined, which is not what the question asked.
C is CORRECT because the --zone-file-format flag indicates that the input
records file is in BIND zone format. If omitted, the ZONE_FILE is expected in
YAML format.
D is not correct because the --delete-all-existing flag indicates that all
existing record sets should be deleted before importing the record sets in the
records file, which is not what the question asked.
Rationale
A is not correct because the fact that the instance uses multiple NICs is not related to its inability to use Cloud NAT for outbound NAT.
B is CORRECT because the existence of an external IP address on an interface
always takes precedence and always performs one-to-one NAT, without using
Cloud NAT.
C is not correct because the custom static routes don’t use the default Internet
gateway as the next hop.
D is not correct because the question asked to select the cause of the inability
of the instance to use the Cloud NAT for outbound traffic. However, this answer
describes a scenario for inbound traffic.
Rationale
A is not correct because it does not allow GCP hostnames to be resolved from on-premises, which is one of the two requirements.
B is CORRECT because both requirements are met. See https://cloud.google.com/dns/docs/best-practices#hybrid-architecture-using-hub-vpc-network-connected-to-spoke-vpc-networks.
C and D are not correct because you don’t need to configure the on-premises
DNS as an alternate DNS server to meet the requirements.
CHAPTER 7
Implementing Hybrid Connectivity
Cloud computing unlocks a significant number of capabilities due to its ability to deliver an unprecedented
level of computing power at a relatively low cost.
It’s no secret that every large enterprise is moving away from its corporate data centers and has invested
in a cloud adoption program.
However, a cloud adoption program faces a number of challenges, mainly due to the choices technology and business leaders need to make to balance the right mix of innovation while maintaining "business as usual."
Innovation maps to delivering cloud-native solutions, which leverage modern compute products and
services, for example, serverless products like Cloud Run, Cloud Functions, or App Engine.
"Business as usual" maps to migrating existing applications that generate business value to the cloud.
Put differently, unless you are a small startup beginning from scratch, chances are you need a plan to move your applications to the cloud. Unplugging the connectivity to your data center and magically flipping a switch to enable your applications in the cloud is not a realistic option.
That’s why Google Cloud has developed a number of offerings that let your company’s data centers (or
your local development environment) connect to Google Cloud in a variety of different ways, ranging from
solutions that prioritize performance, reliability, and reduced latencies to others that prioritize cost savings
and easy setups.
In this chapter, you will learn what these connectivity offerings are, how to configure them, and most
importantly how to choose which one(s) best suits the requirements for your workload.
■■Exam tip Cloud Interconnect circuits do not traverse the Internet and, by default, do not encrypt data
in transit.
Prerequisites
As a Google Cloud professional cloud network engineer, you are responsible for making sure the following
prerequisites are met before ordering Dedicated Interconnect:
• Your network must physically meet Google’s global backbone in a colocation facility.
Use the gcloud compute interconnects locations list command to list the
colocation facilities close to you.
• You must provide your own routing equipment. Your on-premises router is typically
located in the colocation facility. However, you can also extend your connection to a
router outside of the colocation facility.
• In the colocation facility, your network devices must support the following technical
requirements:
• 10 Gbps circuits, single-mode fiber, 10GBASE-LR (1310 nm) or 100 Gbps circuits,
single-mode fiber, 100GBASE-LR4
• IPv4 link-local addressing
• LACP (Link Aggregation Control Protocol), even if you’re using a single circuit
• EBGP-4 (External Border Gateway Protocol version 4 - https://www.ietf.org/
rfc/rfc4271.txt) with multi-hop
• 802.1Q VLANs
How It Works
You provision a Dedicated Interconnect connection between the Google global backbone and your own
network. The diagram in Figure 7-1 shows a single Dedicated Interconnect connection between a Virtual
Private Cloud (VPC) network and your on-premises network.
Figure 7-1. Example of a Dedicated Interconnect connection. Portions of this page are reproduced under
the CC-BY license and shared by Google: https://cloud.google.com/network-connectivity/docs/
interconnect/concepts/dedicated-overview
For the basic setup shown in Figure 7-1, a Dedicated Interconnect connection is provisioned between
the Google global backbone and the on-premises router in a common colocation facility.
VLAN Attachments
VLAN attachments (also known as interconnectAttachments) determine which Virtual Private Cloud
(VPC) networks can reach your on-premises network through a Dedicated Interconnect connection.
Billing for VLAN attachments starts when you create them and stops when you delete them.
A VLAN attachment is always associated to a Cloud Router. This Cloud Router creates a BGP session for
the VLAN attachment and its corresponding on-premises peer router. The Cloud Router receives the routes
that your on-premises router advertises. These routes are added as custom dynamic routes in your VPC network. The Cloud Router also advertises routes for Google Cloud resources to the on-premises peer router.
■■Note It is possible to associate multiple, different VLAN attachments to the same Cloud Router.
3.
Testing the connection: Google sends you automated emails with configuration
information for two different tests. First, Google sends an IP address
configuration to test light levels on every circuit in an Interconnect connection.
After those tests pass, Google sends the final IP address configuration to test
the IP connectivity of each connection’s production configuration. Apply these
configurations to your routers so that Google can confirm connectivity. If you
don’t apply these configurations (or apply them incorrectly), Google sends an
automated email with troubleshooting information. After all tests have passed,
your Interconnect connection is ready to use.
4.
Creating a VLAN attachment: When your Interconnect connection is ready to
use, you need to connect Virtual Private Cloud (VPC) networks to your on-
premises network. To do that, first create a VLAN attachment, specifying an
existing Cloud Router that’s in the VPC network that you want to reach.
5.
Configuring on-premises routers: After you create a VLAN attachment, to
start sending traffic between networks, you need to configure your on-premises
router to establish a BGP session with your Cloud Router. To configure your on-
premises router, use the VLAN ID, interface IP address, and peering IP address
provided by the VLAN attachment.
The process is illustrated in Figure 7-2, with emphasis on the gcloud CLI commands you need to know
for the exam.
■■Note The link type that you select when you create an Interconnect connection cannot be changed later.
For example, if you select a 10 Gbps link type and need a 100 Gbps link type later, you must create a new
Interconnect connection with the higher capacity.
To create a Dedicated Interconnect connection, use the gcloud compute interconnects create command.
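A minimal sketch, where every uppercase value is a placeholder, and the link type and count reflect a single 10 Gbps circuit:

gcloud compute interconnects create INTERCONNECT_NAME \
    --customer-name=CUSTOMER_NAME \
    --interconnect-type=DEDICATED \
    --link-type=LINK_TYPE_ETHERNET_10G_LR \
    --location=LOCATION \
    --requested-link-count=1 \
    --noc-contact-email=EMAIL_ADDRESS \
    --description=STRING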
• EMAIL_ADDRESS and STRING: Optional; for the NOC (Network Operations Center) contact, you can specify only one email address—you don’t need to enter your own address because you are included in all notifications. If you are creating a connection through workforce identity federation, providing an email address with the --noc-contact-email flag is required.
For redundancy, Google recommends creating a duplicate Interconnect connection that is in the same location but in a different edge availability domain (metro availability zone).
After you order an Interconnect connection, Google emails you a confirmation and allocates ports
for you. When the allocation is complete, Google generates LOA-CFAs for your connections and emails
them to you.
All the automated emails are sent to the NOC (Network Operations Center) technical contact and the
email address of the Google Account used when ordering the Interconnect connection. You can also get your
LOA-CFAs by using the Google Cloud console.
You can use the Interconnect connection only after your connections have been provisioned and tested
for light levels and IP connectivity.
Retrieving LOA-CFAs
After you order a Dedicated Interconnect connection, Google sends you and the NOC (technical contact)
an email with your Letter of Authorization and Connecting Facility Assignment (LOA-CFA) (one PDF file per
connection). You must send these LOA-CFAs to your vendor so that they can install your connections. If you
don’t, your connections won’t get connected.
If you can’t find the LOA-CFAs in your email, retrieve them from the Google Cloud console (in the
Cloud Interconnect page, select Physical connections). This is one of the very few operations that require the
use of the console. You can also respond to your order confirmation email for additional assistance.
After the status of an Interconnect connection changes to PROVISIONED, the LOA-CFA is no longer
valid, necessary, or available in the Google Cloud console.
■■Note To retrieve the LOA-CFAs, you must be granted the permission compute.interconnects.create, which is available in the following IAM roles: roles/owner, roles/editor, roles/compute.networkAdmin.
■■Note LACP is required because it allows you to adjust the capacity of an Interconnect connection without
disrupting traffic. An Interconnect connection can be shared by multiple VLAN attachments.
The example in Table 7-1 shows an IP address configuration similar to the one that Google sends you
for the test. Replace these values with the values that Google sends you for your network.
Apply the test IP address that Google has sent you to the interface of your on-premises router that connects to Google. For testing, you must configure this interface in access mode with no VLAN tagging.
Google tests your connection by pinging the link-local IP address with LACP enabled. Google tests
once, 30 minutes after detecting light, and then every 24 hours thereafter.
After a successful test, Google sends you an email notifying you that your connection is ready to use.
If a test fails, Google automatically retests the connection once a day for a week.
After all tests have passed, your Interconnect connection can carry traffic, and Google starts billing it.
However, your connection isn’t associated with any Google Virtual Private Cloud (VPC) networks. The next
step will show you how to attach a VPC network to your Dedicated Interconnect connection.
■■Exam tip The Cloud Router can use any private autonomous system number (64512–65534 or 4200000000–4294967294) or the Google public ASN, that is, 16550.
After all three checks are completed, you can create a VLAN attachment using the following command:
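A minimal sketch, assuming an attachment named my-attachment, an Interconnect connection named myinterconnect, and the Cloud Router my-router in us-central1 (the optional flags match the example values discussed next):

gcloud compute interconnects attachments dedicated create my-attachment \
    --interconnect=myinterconnect \
    --router=my-router \
    --region=us-central1 \
    --candidate-subnets=169.254.180.80/29 \
    --bandwidth=400m \
    --vlan=1000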
At a minimum, you must specify the name of your Dedicated Interconnect connection and the name of your Cloud Router resource, which are passed to the flags --interconnect and --router, respectively.
If you don’t specify a region, you may be prompted to enter one. Again, this is because a Cloud Router is a regional resource, and a VLAN attachment is always associated with a Cloud Router.
The optional --candidate-subnets flag is a list of up to 16 CIDR blocks, all in the link-local address space 169.254.0.0/16, the same address space Google used to ping your on-premises router in the previous step. You can use this list to restrict the CIDR blocks Google can use to allocate a BGP IP address for the Cloud Router in your Google Cloud project and for your on-premises router.
■■Exam tip The BGP IP CIDR blocks that you specify as values of the --candidate-subnets flag must be
unique among all Cloud Routers in all regions of a VPC network.
The optional --bandwidth flag denotes the maximum provisioned capacity of the VLAN attachment. In the example, its value is set to 400 Mbps. As of the time of writing this book (April 2023), you can only choose from the following discrete list: 50m (50 Mbps), 100m, 200m, 300m, 400m, 500m, 1g, 2g, 5g, 10g (default), 20g, 50g (50 Gbps). This ability to “tweak” the capacity of a VLAN attachment is possible because the LACP protocol is required, as we observed in the previous section.
The optional --vlan flag denotes the VLAN ID for this attachment and must be an integer in the range
2–4094. You cannot specify a VLAN ID that is already in use on the Interconnect connection. If your VLAN ID
is in use, you are asked to choose another one. If you don’t enter a VLAN ID, an unused, random VLAN ID is
automatically selected for the VLAN attachment.
Upon creation of the VLAN attachment, use the describe command to extract the VLAN ID, the Cloud
Router IP address, and the customer router IP address:
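For example, assuming the attachment created earlier:

gcloud compute interconnects attachments describe my-attachment \
    --region=us-central1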
You will need these values to configure your Cloud Router and your on-premises router. Here is an
example of an output:
cloudRouterIpAddress: 169.254.180.81/29
creationTimestamp: '2022-03-13T10:31:40.829-07:00'
customerRouterIpAddress: 169.254.180.82/29
id: '7'
interconnect: https://www.googleapis.com/compute/v1/projects/my-project/global/
interconnects/myinterconnect
kind: compute#interconnectAttachment
name: my-attachment
operationalStatus: ACTIVE
privateInterconnectInfo:
tag8021q: 1000
region: https://www.googleapis.com/compute/v1/projects/my-project/regions/us-central1
router: https://www.googleapis.com/compute/v1/projects/my-project/regions/us-central1/
routers/my-router
Associate your newly created VLAN attachment to your Cloud Router by adding an interface that connects to it. The interface IP address is automatically configured using your attachment’s cloudRouterIpAddress:
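A sketch, where the interface name is an illustrative choice:

gcloud compute routers add-interface my-router \
    --interface-name=my-interface \
    --interconnect-attachment=my-attachment \
    --region=us-central1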
Last, associate a BGP peer to your Cloud Router by adding the customer router to the newly added
interface:
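A sketch, with placeholder peer values:

gcloud compute routers add-bgp-peer my-router \
    --interface=my-interface \
    --peer-name=my-peer \
    --peer-asn=PEER_ASN \
    --region=us-central1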
The first three flags, that is, --interface, --peer-asn, and --peer-name, are mandatory. This makes
sense because you are adding a peer router to your Cloud Router’s interface, which is associated to your
VLAN attachment. As a result, you must provide at a minimum the interface name, your peer ASN, and your
peer name.
For the peer ASN, use the same number that you used to configure your on-premises router. The peer IP
address is automatically configured using your attachment’s customerRouterIpAddress.
The --advertised-route-priority optional flag denotes the base priority of routes advertised to the
BGP peer. As you learned in Chapter 2, the value must be in the 0–65535 range.
The --md5-authentication-key optional flag can be added if you want to use MD5 authentication.
■■Exam tip By default, any VPC network can use Cloud Interconnect. To control which VPC networks can use
Cloud Interconnect, you can set an organization policy.
Figure 7-3. Reference Dedicated Interconnect layer 3 topology. Portions of this page are reproduced under
the CC-BY license and shared by Google: https://cloud.google.com/network-connectivity/docs/
interconnect/how-to/dedicated/configuring-onprem-routers
Prerequisites
The only two prerequisites are
1.
Supported service provider: You must select a supported service provider to establish connectivity between their network and your on-premises network. The list of supported service providers is available at https://cloud.google.com/network-connectivity/docs/interconnect/concepts/service-providers#by-location.
2.
Cloud Router: You must have a Cloud Router in the region where your selected service provider operates.
How It Works
You select a service provider from the previous list and establish connectivity.
Next, you create a VLAN attachment in your Google Cloud project, but this time you specify that your
VLAN attachment is for a Partner Interconnect connection. This action generates a unique pairing key that
you use to request a connection from your service provider. You also need to provide other information such
as the connection location and capacity.
After the service provider configures your VLAN attachment, you activate your connection to start using
it. Depending on your connection, either you or your service provider then establishes a Border Gateway
Protocol (BGP) session. Figure 7-4 illustrates this setup.
Figure 7-4. Example of a Partner Interconnect connection. Portions of this page are reproduced under the CC-
BY license and shared by Google: https://cloud.google.com/network-connectivity/docs/interconnect/
concepts/partner-overview
VLAN Attachments
The main difference here is that with Partner Interconnect, a VLAN attachment generates a pairing key that
you share with your service provider.
■■Note Unlike Dedicated Interconnect, with Partner Interconnect you delegate to a service provider the task
of setting up the connectivity between your on-premises router and the Cloud Router in your Google Cloud
project.
The pairing key is a unique key that lets the service provider identify and connect to your Virtual Private
Cloud (VPC) network and associated Cloud Router. The service provider requires this key to complete the
configuration of your VLAN attachment.
Creating a VLAN Attachment
For Partner Interconnect, the only item in the checklist you must complete—in addition to the necessary
permissions and the connectivity to a selected service provider—is to have an existing Cloud Router in the
VPC network and region that you want to reach from your on-premises network. If you don’t have an existing
Cloud Router, you must create one.
■■Exam tip Unlike Dedicated Interconnect, Partner Interconnect requires that your Cloud Router uses the
Google public ASN, that is, 16550 (www.whatismyip.com/asn/16550/).
It is best practice to provision multiple VLAN attachments into your VPC network to maximize throughput and increase cost savings. For each BGP session, Google Cloud recommends using the same MED values to let the traffic use equal-cost multipath (ECMP) routing over all the configured VLAN attachments.
The following example creates a VLAN attachment in edge availability domain availability-domain-1, associated with the Cloud Router my-router in the region us-central1:
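A sketch of the command; describing the new attachment afterward returns output like the block that follows:

gcloud compute interconnects attachments partner create my-attachment \
    --region=us-central1 \
    --router=my-router \
    --edge-availability-domain=availability-domain-1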
adminEnabled: false
edgeAvailabilityDomain: AVAILABILITY_DOMAIN_1
creationTimestamp: '2017-12-01T08:29:09.886-08:00'
id: '7976913826166357434'
kind: compute#interconnectAttachment
labelFingerprint: 42WmSpB8rSM=
name: my-attachment
pairingKey: 7e51371e-72a3-40b5-b844-2e3efefaee59/us-central1/1
region: https://www.googleapis.com/compute/v1/projects/customer-project/regions/us-central1
router: https://www.googleapis.com/compute/v1/projects/customer-project/regions/us-central1/
routers/my-router
selfLink: https://www.googleapis.com/compute/v1/projects/customer-project/regions/us-
central1/interconnectAttachments/my-attachment
state: PENDING_PARTNER
type: PARTNER
Remember not to share the pairing key with anyone you don’t trust. The pairing key is sensitive data.
The state of the VLAN attachment is PENDING_PARTNER until the service provider completes your VLAN attachment configuration. After the configuration is complete, the state of the attachment changes to ACTIVE or PENDING_CUSTOMER.
■■Note Billing for VLAN attachments starts when your service provider completes their configurations,
whether or not you preactivated your attachments. Your service provider configures your attachments when
they are in the PENDING_CUSTOMER or ACTIVE state. Billing stops when you or the service provider deletes the
attachments (when they are in the DEFUNCT state).
If you use a layer 2 connection, you can improve the security posture of your BGP sessions by enforcing
MD5 authentication.
■■Exam tip To build fault tolerance—with a second VLAN attachment—repeat the same process for the
redundant VLAN attachment. Make sure you use the same Cloud Router and the same metropolitan area, but
use a different edge availability domain.
Activating Your Connection
When the state has changed to PENDING_CUSTOMER, the ball is in your court, and you need to activate your connection in order to use it. You activate your connection by updating your VLAN attachment with the --admin-enabled flag:
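A sketch, assuming the attachment name used earlier:

gcloud compute interconnects attachments partner update my-attachment \
    --region=us-central1 \
    --admin-enabled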
To verify the resulting BGP configuration, describe your Cloud Router:

gcloud compute routers describe my-router \
    --region us-central1
bgp:
advertiseMode: DEFAULT
asn: 16550
bgpPeers:
- interfaceName: auto-ia-if-my-attachment-c2c53a710bd6c2e
ipAddress: 169.254.67.201
managementType: MANAGED_BY_ATTACHMENT
name: auto-ia-bgp-my-attachment-c2c53a710bd6c2e
peerIpAddress: 169.254.67.202
creationTimestamp: '2018-01-25T07:14:43.068-08:00'
description: 'test'
id: '4370996577373014668'
interfaces:
- ipRange: 169.254.67.201/29
linkedInterconnectAttachment: https://www.googleapis.com/compute/alpha/projects/customer-
project/regions/us-central1/interconnectAttachments/my-attachment-partner
managementType: MANAGED_BY_ATTACHMENT
name: auto-ia-if-my-attachment-c2c53a710bd6c2e
kind: compute#router
name: partner
network: https://www.googleapis.com/compute/v1/projects/customer-project/global/
networks/default
region: https://www.googleapis.com/compute/v1/projects/customer-project/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/customer-project/regions/us-
central1/routers/my-router
Figure 7-6. Reference Partner Interconnect layer 3 topology. Portions of this page are reproduced under
the CC-BY license and shared by Google: https://cloud.google.com/network-connectivity/docs/
interconnect/how-to/partner/configuring-onprem-routers
■■Note From Wikipedia, IPsec (Internet Protocol Security) is a secure network protocol suite that
authenticates and encrypts packets of data to provide secure encrypted communication between two
computers over an Internet Protocol network. It is used in virtual private networks (VPNs).
IPsec includes protocols for establishing mutual authentication between agents at the beginning of a session
and negotiation of cryptographic keys to use during the session. IPsec can protect data flows between a pair of
hosts (host-to-host), between a pair of security gateways (network-to-network), or between a security gateway
and a host (network-to-host). IPsec uses cryptographic security services to protect communications over
Internet Protocol (IP) networks. It supports network-level peer authentication, data origin authentication, data
integrity, data confidentiality (encryption), and replay protection (protection from replay attacks).
Traffic traveling between the two networks is encrypted by one VPN gateway and then decrypted by the
other VPN gateway. This action protects your data as it travels over the Internet. You can also connect two
VPC networks with each other with Cloud VPN.
■■Exam tip Unlike Dedicated or Partner Interconnect, Cloud VPN always encrypts traffic in transit by design.
This is because Cloud VPN is built on top of IPsec tunnels. Also, traffic traveling over an IPsec tunnel traverses
the Internet, and the maximum bandwidth of an IPsec tunnel is 3 Gbps. Finally, Cloud VPN is not supported in
standard tier.
Cloud VPN comes in two “flavors”: HA (high availability) VPN and Classic VPN.
The former is the recommended choice from Google, due to its higher reliability (99.99% SLA) and its
adaptability to topology changes.
■■Exam tip The two IPsec tunnels must originate from the same region.
Put differently, with HA VPN you cannot have a tunnel originating from an HA VPN gateway with a network
interface in us-east1 and another tunnel originating from another network interface (associated to the same
HA VPN gateway) in us-central1.
How It Works
When you create an HA VPN gateway, Google Cloud automatically reserves two external IPv4 addresses, one
for each (of the two) network interfaces. The two IPv4 addresses are chosen from a unique address pool to
support high availability.
When you delete the HA VPN gateway, Google Cloud releases the IP addresses for reuse.
The best way to understand how an HA VPN gateway works is with an illustration (Figure 7-7).
Figure 7-7. HA VPN simplest topology. Portions of this page are reproduced under CC-BY license and shared
by Google: https://cloud.google.com/network-connectivity/docs/vpn/concepts/topologies
Figure 7-7 shows the simplest HA VPN topology, with one HA VPN gateway equipped with two network
interfaces—each associated with its own regional external IP address.
The HA VPN gateway connects to one peer on-premises router, which has one external IP address (i.e.,
one network card).
The HA VPN gateway uses two tunnels, which are connected to the single external IP address on the
peer router.
In Google Cloud, the REDUNDANCY_TYPE for this configuration takes the value SINGLE_IP_INTERNALLY_REDUNDANT.
The topology in Figure 7-7 provides 99.99% availability on Google Cloud, but there is a single point of
failure on-premises.
Other topologies can offer a higher level of resilience on-premises, for example, by adding an extra network interface to the peer on-premises router (TWO_IPS_REDUNDANCY) or by using two peer routers each with two network interfaces (FOUR_IPS_REDUNDANCY).
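To create the HA VPN gateway, here is a minimal sketch, assuming the gateway name, network, and region shown in the output that follows:

gcloud compute vpn-gateways create ha-vpn-gw-a \
    --network=network-a \
    --region=us-central1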
Created [https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-central1/
vpnGateways/ha-vpn-gw-a].
NAME INTERFACE0 INTERFACE1 NETWORK REGION
ha-vpn-gw-a 203.0.113.16 203.0.113.23 network-a us-central1
As expected, the output shows the two interfaces, each associated with its own regional (us-central1) external IPv4 address.
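Next, create an external VPN gateway resource to represent your peer (on-premises) VPN gateway. A sketch, assuming a single interface 0 whose IP address is PEER_GW_IP_0:

gcloud compute external-vpn-gateways create peer-gw \
    --interfaces=0=PEER_GW_IP_0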
Created [https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/
externalVpnGateways/peer-gw].
NAME INTERFACE0
peer-gw PEER_GW_IP_0
Then create a Cloud Router to manage the BGP sessions for the tunnels:

gcloud compute routers create ROUTER_NAME \
    --region=REGION \
    --network=NETWORK \
    --asn=GOOGLE_ASN
Substitute as follows:
• ROUTER_NAME: The name of the Cloud Router, which must be in the same region
as your HA VPN gateway
• REGION: The Google Cloud region where you created your HA VPN gateway and you
will create your IPsec tunnel
• NETWORK: The name of your Google Cloud VPC network
• GOOGLE_ASN: Any private ASN (64512 through 65534, 4200000000 through
4294967294) that you are not already using in the peer network
■■Exam tip The Google ASN is used for all BGP sessions on the same Cloud Router, and it cannot be
changed later.
An example output is
Created [https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-central1/
routers/router-a].
NAME REGION NETWORK
router-a us-central1 network-a
Creating IPsec Tunnels
Now that you have your HA VPN gateway, the representation of your peer on-premises gateway, and your
Cloud Router, you can establish two IPsec tunnels, one for each interface on the HA VPN gateway.
When creating IPsec tunnels, specify the peer side of the IPsec tunnels as the external VPN gateway that
you created earlier.
In our simplest scenario, both IPsec tunnels connect to interface 0 of the external VPN gateway:
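A minimal sketch of the two tunnel-creation commands, using the placeholders defined in the list that follows (HA_VPN_GW_NAME, the name of the HA VPN gateway created earlier, is an added placeholder):

gcloud compute vpn-tunnels create TUNNEL_NAME_IF0 \
    --peer-external-gateway=PEER_GW_NAME \
    --peer-external-gateway-interface=PEER_EXT_GW_IF0 \
    --region=REGION \
    --ike-version=IKE_VERS \
    --shared-secret=SHARED_SECRET \
    --router=ROUTER_NAME \
    --vpn-gateway=HA_VPN_GW_NAME \
    --interface=INT_NUM_0

gcloud compute vpn-tunnels create TUNNEL_NAME_IF1 \
    --peer-external-gateway=PEER_GW_NAME \
    --peer-external-gateway-interface=PEER_EXT_GW_IF0 \
    --region=REGION \
    --ike-version=IKE_VERS \
    --shared-secret=SHARED_SECRET \
    --router=ROUTER_NAME \
    --vpn-gateway=HA_VPN_GW_NAME \
    --interface=INT_NUM_1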
Substitute as follows:
• TUNNEL_NAME_IF0 and TUNNEL_NAME_IF1: A name for each tunnel; naming the
tunnels by including the gateway interface name can help identify the tunnels later.
• PEER_GW_NAME: The name of the external peer gateway created earlier.
• PEER_EXT_GW_IF0: The interface number configured earlier on the external peer gateway.
• Optional: The --vpn-gateway-region is the region of the HA VPN gateway to
operate on. Its value should be the same as --region. If not specified, this option is
automatically set. This option overrides the default compute/region property value
for this command invocation.
• IKE_VERS: 1 for IKEv1 or 2 for IKEv2. If possible, use IKEv2 for the IKE version. If
your peer gateway requires IKEv1, replace --ike-version 2 with --ike-version 1. To
allow IPv6 traffic, you must specify IKEv2.
• SHARED_SECRET: Your preshared key (shared secret), which must correspond
with the preshared key for the partner tunnel that you create on your peer gateway;
for recommendations, see generate a strong preshared key.
• INT_NUM_0: The number 0 for the first interface on the HA VPN gateway that you
created earlier.
• INT_NUM_1: The number 1 for the second interface on the HA VPN gateway that
you created earlier.
An example output is
Created [https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-central1/vpnTunnels/
tunnel-a-to-on-prem-if-0].
NAME REGION GATEWAY VPN_INTERFACE PEER_GATEWAY PEER_INTERFACE
tunnel-a-to-on-prem-if-0 us-central1 ha-vpn-gw-a 0 peer-gw 0
Created [https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-central1/vpnTunnels/
tunnel-a-to-on-prem-if-1].
NAME REGION GATEWAY VPN_INTERFACE PEER_GATEWAY PEER_INTERFACE
tunnel-a-to-on-prem-if-1 us-central1 ha-vpn-gw-a 1 peer-gw 0
• MASK_LENGTH: 30; each BGP session on the same Cloud Router must use a unique
/30 CIDR from the 169.254.0.0/16 block.
• TUNNEL_NAME_0 and TUNNEL_NAME_1: The tunnel associated with the HA VPN
gateway interface that you configured.
• AUTHENTICATION_KEY (optional): The secret key to use for MD5 authentication.
For each of the two IPsec tunnels, you must
• Add an interface to your Cloud Router associated to the IPsec tunnel
• Add a BGP peer to the interface
Let’s start with the first IPsec tunnel:
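A sketch, assuming the interface and peer IP addresses shown in the router output later in this section:

gcloud compute routers add-interface ROUTER_NAME \
    --interface-name=if-tunnel-a-to-on-prem-if-0 \
    --ip-address=169.254.0.1 \
    --mask-length=MASK_LENGTH \
    --vpn-tunnel=TUNNEL_NAME_0 \
    --region=REGION

gcloud compute routers add-bgp-peer ROUTER_NAME \
    --peer-name=PEER_NAME_0 \
    --interface=if-tunnel-a-to-on-prem-if-0 \
    --peer-ip-address=169.254.0.2 \
    --peer-asn=PEER_ASN \
    --region=REGION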
Substitute PEER_NAME_0 with a name for the peer VPN interface, and substitute PEER_ASN with the
ASN configured for your peer VPN gateway.
Let’s repeat the same for the second IPsec tunnel:
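The same sketch, adapted for the second tunnel:

gcloud compute routers add-interface ROUTER_NAME \
    --interface-name=if-tunnel-a-to-on-prem-if-1 \
    --ip-address=169.254.1.1 \
    --mask-length=MASK_LENGTH \
    --vpn-tunnel=TUNNEL_NAME_1 \
    --region=REGION

gcloud compute routers add-bgp-peer ROUTER_NAME \
    --peer-name=PEER_NAME_1 \
    --interface=if-tunnel-a-to-on-prem-if-1 \
    --peer-ip-address=169.254.1.2 \
    --peer-asn=PEER_ASN \
    --region=REGION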
Substitute PEER_NAME_1 with a name for the peer VPN interface, and substitute PEER_ASN with the
ASN configured for your peer VPN gateway.
bgp:
advertiseMode: DEFAULT
asn: 65001
bgpPeers:
- interfaceName: if-tunnel-a-to-on-prem-if-0
ipAddress: 169.254.0.1
name: bgp-peer-tunnel-a-to-on-prem-if-0
peerAsn: 65002
peerIpAddress: 169.254.0.2
- interfaceName: if-tunnel-a-to-on-prem-if-1
ipAddress: 169.254.1.1
name: bgp-peer-tunnel-a-to-on-prem-if-1
peerAsn: 65004
peerIpAddress: 169.254.1.2
creationTimestamp: '2018-10-18T11:58:41.704-07:00'
id: '4726715617198303502'
interfaces:
- ipRange: 169.254.0.1/30
linkedVpnTunnel: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-
central1/vpnTunnels/tunnel-a-to-on-prem-if-0
name: if-tunnel-a-to-on-prem-if-0
- ipRange: 169.254.1.1/30
linkedVpnTunnel: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-
central1/vpnTunnels/tunnel-a-to-on-prem-if-1
name: if-tunnel-a-to-on-prem-if-1
kind: compute#router
name: router-a
network: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/global/networks/
network-a
region: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/PROJECT_ID/regions/us-central1/
routers/router-a
This completes the setup from Google Cloud. The final step is to configure your peer VPN gateway on-
premises. You will need assistance from your on-premises network administrator to properly configure it
and fully validate the IPsec tunnels and their fault tolerance.
It is best practice to migrate your production traffic from Classic VPN to HA VPN, whenever possible.
The only scenario when you should retain Classic VPN is when your on-premises VPN devices don’t
support BGP and thus can’t be used with HA VPN.
However, you should consider upgrading those devices to solutions that support BGP, which is a more
flexible and reliable solution than static routing.
Policy-Based Routing
Policy-based routing is a technique that forwards and routes data packets based on policies or filters.
Network administrators can selectively apply policies based on specific parameters such as source and
destination IP address, source or destination port, traffic type, protocols, access list, packet size, or other
criteria and then route the packets on user-defined routes.
Route-Based Routing
In contrast to policy-based routing, a route-based VPN works on IPsec tunnel interfaces as the endpoints
of your virtual network. All traffic passing through an IPsec tunnel interface is routed into the VPN. Rather
than relying on an explicit policy to dictate which traffic enters the VPN, static and/or dynamic IP routes are
formed to direct the desired traffic through the IPsec tunnel interface.
■■Exam tip A route-based VPN is required when there is a requirement for redundant VPN connections, or
there is a need for dynamic routing within an IPsec tunnel.
■■Exam tip Direct peering and carrier peering do not use Cloud Routers.
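To update the base priority of advertised routes, you use the update-bgp-peer command; a sketch with placeholder names, where B is an integer priority:

gcloud compute routers update-bgp-peer ROUTER_NAME \
    --peer-name=BGP_PEER_NAME \
    --advertised-route-priority=B \
    --region=REGION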
The preceding command updates the base priority for routes advertised by ROUTER_NAME to its BGP peer
BGP_PEER_NAME with the integer number B.
For a visual representation on how MED applies to route priorities, we are going to use the example in
Chapter 3, section “Updating the Base Priority for Advertised Routes.”
Assuming the inter-region cost is denoted by the letter C, here is a summary of how the MED is calculated, based on whether your Cloud Router’s VPC network operates in regional or global BGP routing mode:
• Regional dynamic routing mode: The Cloud Router only advertises prefixes of
subnet ranges in the same region in its VPC. Each range is advertised as a prefix with
priority: MED = B (Figure 7-9).
• Global dynamic routing mode: The Cloud Router advertises prefixes of subnet
ranges in the same region in its VPC. Each range is advertised as a prefix with
priority: MED = B. Additionally, the Cloud Router advertises prefixes of subnet ranges
in different regions in its VPC. Each range is advertised as a prefix with priority: MED
= B+C (Figure 7-10).
Figure 7-9. MED for VPCs configured with regional BGP routing mode
Notice in Figure 7-9 how the two Cloud Routers router-a and router-b advertise prefixes of subnet
routes in their own region only, that is, us-east1.
There are no prefixes of subnet routes in the us-central1 route table.
This is because the two VPC networks your-app-shared-vpc and your-app-connected-vpc are configured with regional BGP routing mode.
Figure 7-10 shows the effect of updating them with global BGP routing mode.
Figure 7-10. MED for VPCs configured with global BGP routing mode
This time, the two cloud routers router-a and router-b advertise prefixes of subnet routes in all regions
of their respective VPC.
As a result, both route tables are populated with prefixes of subnet routes.
Notice how the priorities for prefixes of subnets in us-central1 (as shown in the lower table in
Figure 7-10) carry the extra inter-region cost C=205.
■■Exam tip The inter-region cost is an integer number between 201 and 9999, inclusive. It is defined by
Google and is specific to the exact combination of the two regions, that is, the region of the subnet whose prefix
is being advertised (e.g., us-east1) and the region of the BGP peer router (e.g., us-central1). This number
may vary over time based on factors such as network performance, latency, distance, and available bandwidth
between regions.
When your BGP peer routers receive the advertised prefixes and their priorities, they create routes that
are used to send packets to your VPC network.
• For Partner Interconnect, Google Cloud selects unused link-local IPv4 addresses
automatically.
• For HA VPN and Classic VPN using dynamic routing, you can specify the BGP
peering IP addresses when you create the BGP interface on the Cloud Router.
Router appliances use internal IPv4 addresses of Google Cloud VMs as BGP IP addresses.
IPv6 Support
Cloud Router can exchange IPv6 prefixes, but only over BGP IPv4 sessions.
■■Exam tip Cloud Router does not support BGP IPv6 sessions natively.
For Cloud Router to be able to exchange IPv6 prefixes, the subnets must be configured to operate in
dual stack mode (by using the flag --stack-type=IPV4_IPV6 in the gcloud compute networks subnets
create/update commands), and you must enable IPv6 prefix exchange in an existing BGP IPv4 session by
toggling the --enable-ipv6 flag in the gcloud compute routers update-bgp-peer command.
By default, internal IPv6 subnet ranges are advertised automatically.
External IPv6 subnet ranges are not advertised automatically, but you can advertise them manually by
using custom route advertisements.
You can enable IPv6 prefix exchange in BGP sessions that are created for HA VPN tunnels.
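A minimal sketch of the two commands involved; names are placeholders, and depending on your configuration, enabling dual stack may require additional flags such as --ipv6-access-type:

gcloud compute networks subnets update SUBNET_NAME \
    --region=REGION \
    --stack-type=IPV4_IPV6

gcloud compute routers update-bgp-peer ROUTER_NAME \
    --peer-name=PEER_NAME \
    --enable-ipv6 \
    --region=REGION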
■■Note Remember, the subnet routes to be excluded can be only the ones in your Cloud Router’s region or
all the subnet routes in all regions spanned by your Cloud Router’s VPC. This set of subnet routes is based on
whether your Cloud Router’s VPC network BGP routing mode is set to regional or global.
Resilience
It is best practice to enable Bidirectional Forwarding Detection (BFD) on your Cloud Routers and on your
peer BGP routers (on-premises or in other clouds) if they support this feature.
BFD is a UDP-based detection protocol that provides a low-overhead method of detecting failures in the
forwarding path between two adjacent routers.
When configured with default settings, BFD detects failure in 5 seconds, compared to 60 seconds for
BGP-based failure detection. With BFD implemented on Cloud Router, end-to-end detection time can be as
short as 5 seconds.
Enabling BFD on both sides will make your network more resilient.
You can enable BFD on your Cloud Router by setting the --bfd-session-initialization-mode flag to ACTIVE in the gcloud compute routers add-bgp-peer/update-bgp-peer commands, as shown in the following code snippet:
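A sketch with placeholder names:

gcloud compute routers update-bgp-peer ROUTER_NAME \
    --peer-name=PEER_NAME \
    --bfd-session-initialization-mode=ACTIVE \
    --region=REGION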
■■Exam tip You don’t need to know all the BFD settings to pass the exam. What you need to know instead is that enabling BFD is one way to achieve resilience, and the way you do it is by setting the BFD_SESSION_INITIALIZATION_MODE to ACTIVE. To learn how to configure BFD in detail, use the Google Cloud documentation: https://cloud.google.com/network-connectivity/docs/router/concepts/bfd#bfd-settings.
Reliability
Enable graceful restart on your on-premises BGP device. With graceful restart, traffic between networks isn’t
disrupted in the event of a Cloud Router or on-premises BGP device failure—as long as the BGP session is
reestablished within the graceful restart period.
For high reliability, set up redundant routers and BGP peers, even if your on-premises device supports
graceful restart. In the event of nontransient failures, you are protected even if one path fails.
Last, to ensure that you do not exceed Cloud Router limits, use Cloud Monitoring to create alerting
policies. For example, you can use the metrics for learned routes to create alerting policies for the limits for
learned routes.
High Availability
If graceful restart is not supported or enabled on your device, configure two on-premises BGP devices with
one tunnel each to provide redundancy. If you don’t configure two separate on-premises devices, Cloud
VPN tunnel traffic can be disrupted in the event of a Cloud Router or an on-premises BGP device failure.
Notice that in this configuration, the redundant on-premises BGP device provides failover only and not
load sharing.
Security
Enable MD5 authentication on your BGP peers, if they support this feature.
This will add an extra layer of security to your BGP sessions.
Exam Questions
Question 7.1 (Interconnect Attachments)
You need to give each member of your network operations team least-privilege access to create, modify, and
delete Cloud Interconnect VLAN attachments.
What should you do?
A.
Assign each user the editor role.
B.
Assign each user the compute.networkAdmin role.
C.
Give each user the following permissions only: compute.interconnectAttachments.create, compute.interconnectAttachments.get.
D.
Give each user the following permissions only: compute.interconnectAttachments.create, compute.interconnectAttachments.get, compute.routers.create, compute.routers.get, compute.routers.update.
Rationale
A is incorrect because the editor role is too permissive. The editor role contains
permissions to create and delete resources for most Google Cloud services.
B is CORRECT because it contains the minimum set of permissions to create,
modify, and delete Cloud Interconnect VLAN attachments. You learned this
in the “VLAN Attachments” section. The keyword to consider in this question
is the ability to delete VLAN attachments, whose permission is included in the
compute.networkAdmin role, but is not included in the permissions in answer D.
C is incorrect because it doesn’t include the permission to delete VLAN
attachments.
D is incorrect because it doesn’t include the permission to delete VLAN
attachments.
Rationale
A is incorrect because the packets are directed to your on-premises resource,
which is located behind a Cloud VPN gateway.
B is incorrect because the name of the tunnel is required, and not the IP address
of the Cloud VPN gateway.
C is CORRECT because you can use the gcloud compute routes create command with the flags --next-hop-vpn-tunnel and --next-hop-vpn-tunnel-region.
D is incorrect because the custom static route won’t be able to directly reach the
on-premises resource without going through the VPN gateways.
Rationale
A is CORRECT because it’s the cheapest and the quickest option to set up that
meets the requirements.
B, C, D are incorrect because Dedicated and Partner Interconnect are not as
cheap as Cloud VPN, and their setup will likely take more than 24 hours.
Rationale
A is incorrect because you request the VLAN attachment from Google Cloud, and
not from your partner’s portal.
B is incorrect because after establishing connectivity with a supported service
provider (partner)—as explained in Figure 7-5—the next step is to create a VLAN
attachment and retrieve the pairing key resulting from the VLAN attachment
creation. You will need the pairing key to provision a physical connection to
Google Cloud using the partner’s portal.
C is CORRECT because you need to create a VLAN attachment from Google
Cloud, as explained in Figure 7-5, where the gcloud CLI is used. You can also
use the Google Cloud console.
D is incorrect because the specified gcloud command is used to activate the
Partner Interconnect connection, as explained in Figure 7-5.
Rationale
A is incorrect because your Cloud Router should be centralized and not
distributed across all VPC networks.
B is incorrect because the IT service project admin doesn’t need to create other
VLAN attachments or Cloud Routers in the IT service project.
C is CORRECT because you must create VLAN attachments and Cloud
Routers for an Interconnect connection only in the Shared VPC host project.
D is incorrect because service project admins don’t need to create other VLAN
attachments or Cloud Routers in the service projects.
CHAPTER 8
Managing Network Operations
This is the last chapter of our study. You’ve come a long way from the beginning of this book, where you
learned the tenets of “well architecting” a Google Cloud network. You then learned in Chapter 3 how to
implement Virtual Private Cloud (VPC) networks. The concept of a VPC as a logical routing domain is the
basis of every network architecture in the cloud. As a natural progression, in Chapters 5 and 6 you learned
how to leverage the wide spectrum of network services, which uniquely differentiate Google Cloud from
other public cloud service providers. Last, you learned in Chapter 7 how to implement hybrid topologies—
which are prevalent in any sector—along with all considerations related to resilience, fault tolerance,
security, and cost.
Now that you have all your network infrastructure set up and running, what’s next?
Well, you (and your team of Google Cloud professional network engineers) are in charge of maintaining
this infrastructure to make sure it operates in accordance with the SLOs (Service Level Objectives) for your
workloads.
In this chapter, you will learn how to use the products and services offered by Google Cloud to assist you
in this compelling task.
However, Cloud Operations is a lot more than just a suite of products to troubleshoot incidents. When
combined with Google Cloud data engineering, data analytics, and machine learning products, Cloud
Operations can proactively detect anomalies, identify trends, reveal security threats, and most importantly
respond to them in a timely fashion.
Cloud Logging
As illustrated in Figure 8-2, Cloud Logging is a fully managed service that allows you to collect, store, and
route (forward) logs and events from Google Cloud, from other clouds (e.g., AWS, Azure, etc.), and from on-
premises infrastructure. As of the writing of this book, you can collect logging data from over 150 common
application components.
Cloud Logging includes built-in storage for logs called log buckets, a user interface called the Logs
Explorer, and an API to manage logs programmatically (see Figure 8-3). Cloud Logging lets you read and
write log entries, query your logs with advanced filtering capabilities, and control how and where you want
to forward your logs for further analysis or for compliance.
By default, your Google Cloud project automatically stores all logs it receives in a Cloud Logging log
bucket referred to as _Default. For example, if you create a Cloud Router, then all logs your Cloud Router
generates are automatically stored for you in this bucket. However, if you need to, you can configure a
number of aspects about your log storage, such as which logs are stored, which are discarded, and where the
logs are stored.
As you can see in Figure 8-3, the vpc-host-nonprod project contains two log buckets, _Default and
_Required. They are both globally scoped and have a retention of 30 days and 400 days, respectively. The
_Required log bucket is also automatically generated by Google. This bucket is locked to indicate that it
cannot be updated.
■■Exam tip By locking a log bucket, you are preventing any updates on the bucket. This includes the log
bucket’s retention policy. As a result, you can't delete the bucket until every log in the bucket has fulfilled the
bucket's retention period. Also, locking a log bucket is irreversible.
You can also route, or forward, log entries to the following destinations, which can be in the same
Google Cloud project or in a different Google Cloud project:
• Cloud Logging log buckets: Provides built-in storage in Cloud Logging. A log bucket
can store logs collected by multiple Google Cloud projects. You specify the data
retention period, the data storage location, and the log views on a log bucket. Log
views let you control which logs in a log bucket a user is authorized to access. Log
buckets are recommended storage when you want to troubleshoot your applications
and services, or you want to quickly analyze your log data. Analysis on your log
bucket data can be performed by enabling Log Analytics and then by linking the log
bucket to BigQuery.
• Pub/Sub topics: Provides support for third-party integrations, such as Splunk. Log
entries are formatted into JSON and then delivered to a Pub/Sub topic. You can then
use Dataflow to process your log data and stream it to other destinations.
• BigQuery datasets: Provides storage of log entries in BigQuery datasets. You can use
big data analysis capabilities on the stored logs. If you need to combine your Cloud
Logging data with other data sources, then you can route your logs to BigQuery. An
alternative is to store your logs in log buckets that are upgraded to use Log Analytics
and then are linked to BigQuery—these are known as external tables in BigQuery.
• Cloud Storage buckets: Provides inexpensive, archival storage of log data in Cloud
Storage. Log entries are stored as JSON files.
Log Types
Logs are classified in the following categories (not mutually exclusive):
• Platform logs: These are logs written by the Google Cloud services you use in your
project. These logs can help you debug and troubleshoot issues and help you better
understand the Google Cloud services you’re using. For example, VPC Flow Logs
record a sample of network flows sent from and received by VMs.
• Component logs: These are similar to platform logs, but they are generated by
Google-provided software components that run on your systems. For example, GKE
provides software components that users can run on their own VM or in their own
data center. Logs are generated from the user’s GKE instances and sent to a user’s
cloud project. GKE uses the logs or their metadata to provide user support.
• Security logs: These logs help you answer “who did what, where, and when” and are
comprised of
• Cloud Audit Logs, which provide information about administrative activities
and accesses within your Google Cloud resources. Enabling audit logs helps
your security, auditing, and compliance entities monitor Google Cloud data and
systems for possible vulnerabilities or external data misuse, for example, data
exfiltration.
• Access Transparency logs, which provide you with logs of actions taken by
Google staff when accessing your Google Cloud content. Access Transparency
logs can help you track compliance with your legal and regulatory requirements
for your organization.
• User-written logs: These are logs written by custom applications and services.
Typically, these logs are written to Cloud Logging by using one of the following
methods:
• Ops agent or the Logging agent (based on the fluentd open source data collector)
• Cloud Logging API
• Cloud Logging client libraries or the gcloud CLI
• Multi-cloud logs and hybrid cloud logs: These refer to logs from other cloud
service providers like Microsoft Azure, or AWS, and also logs from your on-premises
infrastructure.
In this example, I’ll create a Cloud Router (Figure 8-4), and I’ll show you how to read logs from the
_Default bucket (Figure 8-5).
To read logs, use the gcloud logging read command as shown in Figure 8-5.
Figure 8-5. Cloud Audit log entry (request) for router creation operation
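The command was along these lines (a sketch; the --limit and --format values are illustrative):

gcloud logging read 'resource.type=gce_router' \
    --limit=5 \
    --format=json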
In the first rectangle, you can see the filter I used “resource.type=gce_router” to select only logs
applicable to cloud routers.
As you can see, a Cloud Audit log was captured, which included detailed information on who (gianni@dariokart.com) performed the action (type.googleapis.com/compute.routers.insert) and when, along with a wealth of useful metadata.
In the fourth rectangle, you can see the resulting Cloud Router resource name.
The second page of the log entry (Figure 8-6) shows the response.
The rectangle shows that the resource gianni-router was successfully created, and it’s in
running status.
■■Note When a request requires a long-running operation, Cloud Logging adds multiple log entries.
The operation node in Figure 8-5 includes a last:true key-value pair to indicate that this is the last
entry logged.
To avoid incurring charges, I am going to delete the Cloud Router, and I will verify that this operation
(deletion) is properly logged. Figure 8-7 shows you how to delete the Cloud Router. Notice that the --region
flag is provided because a Cloud Router is a regional resource.
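A sketch of that command, with an illustrative region placeholder:

gcloud compute routers delete gianni-router \
    --region=REGION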
Figure 8-8 shows one of the two log entries (the last) for the delete operation.
Figure 8-8. Cloud Audit log entry (request) for router deletion operation
Note in Figure 8-8 the use of the flag --freshness=t10m to retrieve the latest log entries within the past
ten minutes.
Before moving to the next section, and for the sake of completeness, I want to quickly show you an
alternative—equally expressive—approach to reading logs by using the provided user interface, referred to
as the Logs Explorer.
■■Note My preference, and general recommendation for the exam, is that you get very familiar with the
gcloud CLI, instead of the tools offered by the console, like Logs Explorer. This is because the exam is focused
on gcloud, and there’s a reason behind it. User interfaces available in the console change frequently, whereas
the gcloud CLI and the Google Cloud REST APIs don’t change as frequently as user interfaces. Also, gcloud
code is a natural path into Infrastructure as Code (e.g., Terraform), which is strongly encouraged. Nevertheless,
there are a few compelling use cases where the console is required because the gcloud CLI doesn’t offer the
expected functionality.
With that being said, in Figure 8-9 I included a screenshot of my Log Explorer session during my
troubleshooting of the global external HTTP(S) load balancer (classic) with NEGs you learned in Chapter 5.
Cloud Monitoring
Cloud Monitoring uses the logs managed by Cloud Logging to automatically capture metrics, for example,
firewall/dropped_bytes_count, https/backend_latencies, and many others.
Metrics are grouped by cloud services (e.g., GCP services, AWS services), by agents, and by third-party applications. For a comprehensive list of metrics, visit https://cloud.google.com/monitoring/api/metrics.
Cloud Monitoring also allows you to create custom metrics based on your workloads’ unique business
and technical requirements.
Using the Metrics Explorer and the Monitoring Query Language, you can analyze your workload’s metrics on the fly, discovering correlations, trends, and abnormal behavior.
You can leverage these insights to build an overall view of health and performance of your workloads’
code and infrastructure, making it easy to spot anomalies using Google Cloud visualization products.
This is all great information, but you cannot just sit and watch, right? You need a more proactive
approach for your SRE team, so that when an abnormal behavior (or an incident) is detected, you can
promptly respond to it.
This is when Cloud Monitoring alerts come into play. With alerts, you can create policies on
performance metrics, uptime, and Service Level Objectives (SLOs). These policies will ensure your SRE team
and network engineers are promptly notified and are ready to respond when your workloads don’t perform
as expected.
Figure 8-10 summarizes the Cloud Monitoring capabilities.
In the next section, we’ll use a simple example to show you how Cloud Monitoring can notify your SRE
team and help them respond to incidents.
Figure 8-11. A JSON file defining the filter for a custom metric
Finally, we will test the alerting policy by performing an action that triggers an alert and notifies the
selected recipients in the channel configured for the alerting policy.
First, let’s create a custom metric that measures the number of times a route was changed (created or
deleted) in our shared VPC.
You can create a custom metric using the gcloud logging metrics create command, which requires
you to define your metric during the creation process.
You can pass the definition of your custom metric by using the --log-filter flag or the --config-from-file flag. The former accepts the filter inline; the latter expects the path to a JSON or a YAML file such as the one in Figure 8-11, which includes the metric definition. To learn how this JSON or YAML file needs to be structured, visit https://cloud.google.com/logging/docs/reference/v2/rest/v2/projects.metrics#resource:-logmetric
Figure 8-11 shows our JSON file. Notice how the filter property selects resources whose type is gce_route whenever a route is deleted or inserted:
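Here is a sketch of what the LogMetric definition looks like; the exact figure content may differ slightly:

{
  "name": "route-change-count",
  "description": "Counts route creations and deletions in the shared VPC",
  "filter": "resource.type=\"gce_route\" AND (protoPayload.methodName:\"routes.delete\" OR protoPayload.methodName:\"routes.insert\")"
}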
The custom metric alone doesn’t alert anyone. All it does is measure the number of times a route was
deleted or created in our shared VPC.
For this custom metric to be effective, it must be associated with an alerting policy, which needs to
know at a minimum
1.
To whom a notification should be sent
2.
When to raise an alert for this metric
The first item requires you to create a notification channel (Figure 8-13).
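You can create the channel with the gcloud CLI as well; a sketch, assuming an email channel named dariokart_channel (depending on your gcloud version, this command may sit in the alpha or beta track):

gcloud beta monitoring channels create \
    --display-name="dariokart_channel" \
    --type=email \
    --channel-labels=email_address=USER@example.com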
■■Note Alpha is a limited availability test before releases are cleared for more widespread use. Google’s
focus with alpha testing is to verify functionality and gather feedback from a limited set of customers. Typically,
alpha participation is by invitation and subject to pre-general availability terms. Alpha releases may not contain
all features, no SLAs are provided, and there are no technical support obligations. However, alphas are generally
suitable for use in test environments. Alpha precedes beta, which precedes GA (general availability).
The second item is addressed by defining a triggering condition when you create the alerting policy.
To walk you through the process—this time, as an exception—I’ll use the console. Go to Cloud Logging,
and select the log-based metric section. This will bring you to the page displayed in Figure 8-14.
Click the vertical ellipsis icon as indicated in Figure 8-14 with the arrow, and select “Create alert from
metric,” as indicated in Figure 8-15.
In the next screen, select “Notifications and name,” check “dariokart_channel,” and click OK, as
indicated in Figure 8-16.
Upon confirming your channel, you get a notification that multiple channels are recommended for
redundancy (Figure 8-17).
Last, scroll down, assign a name to the alerting policy, and click “Create Policy,” as indicated in
Figure 8-18.
Our monitoring alerting policy is ready to go.
Before testing our alerting policy, let’s have a look at it.
In Figure 8-19, you can see our newly created alerting policy’s details.
■■Note To understand each element of an alerting policy and a metrics threshold, visit respectively https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.alertPolicies#resource:-alertpolicy and https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.alertPolicies#metricthreshold.
Our policy has one condition, which is based on our custom metric. The condition also uses a threshold, which specifies that any count greater than zero should trigger an alert. This is formalized in the filter, trigger, and comparison properties of the conditionThreshold object.
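For reference, here is a trimmed sketch of how this conditionThreshold object might look in the Monitoring API; the metric type follows the logging.googleapis.com/user/ convention for log-based metrics, and the route-changes name is the assumption carried over from earlier:

"conditionThreshold": {
  "filter": "metric.type=\"logging.googleapis.com/user/route-changes\"",
  "comparison": "COMPARISON_GT",
  "thresholdValue": 0,
  "duration": "0s",
  "trigger": { "count": 1 }
}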
Now that you understand how to create a monitoring alerting policy and its components, we can finally
test it by deliberately creating an event that will trigger an alert.
Our custom metric measures the number of times a route has changed (created or deleted) in our
shared VPC. The easiest way to test this policy is by deleting an existing route. Let’s try!
In Figure 8-20, I first made sure that the route I wanted to delete existed in our shared VPC. Upon
confirming its existence, I used the gcloud compute routes delete command to delete the custom static
route goto-restricted-apis.
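Assuming the shared VPC host project is the active gcloud project, the two commands look like the following sketch; the --filter expression on the list command is just one of several ways to confirm the route exists:

gcloud compute routes list --filter="name=goto-restricted-apis"
gcloud compute routes delete goto-restricted-apis --quiet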
Shortly after, I used the Cloud Monitoring dashboard to verify whether the alerting policy detected a
triggering event.
As you can see in Figure 8-21, the event was captured (as pointed to by the arrow). The rectangle shows
the condition threshold, as defined in our alerting policy.
As a result, an incident was added (Figure 8-22), and an email alert was sent to the recipient(s) of the
notification channel, as shown in Figure 8-23.
Before moving to the next section, here’s a quick note about costs.
The good news is that there are no costs associated with using alerting policies. For more information,
visit https://cloud.google.com/monitoring/alerts#limits.
This concludes our section about logging and monitoring network components using the Cloud
Operations suite. In the next section, you will learn how to address security-related network operations.
Figure 8-24. Network firewall policies overview. Portions of this page are reproduced under the CC-BY license
and shared by Google: https://cloud.google.com/firewall
A denotes the principal, that is, the identity attempting the action, which can be a user, a group, or a service account.
■■Note A service account is a "tricky" type of identity in that it is simultaneously an identity and a resource.
B denotes the verb or the action A wants to perform on resource C. In this context, the verb is expressed as a Google Cloud permission, for example, compute.instanceGroups.create.
■■Exam tip Remember the difference between a permission and a role. The former always denotes a single verb, whereas the latter denotes a group of permissions. For example, a permission can be compute.instances.create, while a role (which includes that permission) can be Compute Admin (roles/compute.admin).
C denotes the Google Cloud resource the actor A wants to perform an action on. The term resource is
not intended to be used in its generic form (e.g., an object or an entity). Instead, I used the term resource
deliberately because C is really a REST (Representational State Transfer) resource. As a result, C is always
uniquely identified by a URI (Uniform Resource Identifier).
Whether you are a software (cloud) engineer, a software (cloud) architect, a database administrator, or simply a user of an application, chances are you have run into a permission issue, for example, an HTTP 403 Forbidden error while browsing the Web.
Google Cloud provides an effective tool to quickly diagnose and resolve IAM permission issues. The
tool is called the Policy Troubleshooter and can be accessed using the Google Cloud console, the gcloud CLI,
or the REST API.
Policy Troubleshooter
Given the aforementioned three inputs, that is, A, B, and C, Policy Troubleshooter examines all Identity and
Access Management (IAM) policies that apply to the resource (C). It then determines whether the principal’s
roles include the permission on that resource and, if so, which policies bind the principal to those roles.
When used from the console, Policy Troubleshooter also determines whether there are any deny
policies that could impact the principal’s access. The gcloud CLI and the REST API don’t provide
information about deny policies.
Let’s see how it works with an example.
I want to verify whether user gianni is allowed to delete subnet-backend in our shared VPC. Simply put:
• A = [email protected]
• B = compute.subnetworks.delete
• C = subnet-backend
Since I want to find out about allow and deny resource policies for subnet-backend, not just the IAM (allow) policies, I am going to use the Policy Troubleshooter from the console.
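For reference, the equivalent check from the gcloud CLI might look like the following sketch; HOST_PROJECT_ID and REGION are placeholders, and remember that the CLI reports allow policies only:

gcloud policy-troubleshoot iam \
    //compute.googleapis.com/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/subnet-backend \
    --principal-email=gianni@dariokart.com \
    --permission=compute.subnetworks.delete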
From the “IAM & Admin” menu in the left bar, select “Policy Troubleshooter” and fill out the form as
follows.
The resource field in Figure 8-25 dynamically populates the resources the selected principal ([email protected]) has visibility on.
Figure 8-25. Selecting the resource in Policy Troubleshooter form for gianni
Figure 8-26 shows the selection of the permission for the same principal.
Figure 8-26. Selecting the permission in Policy Troubleshooter form for gianni
When I clicked the “Check Access” button, the Policy Troubleshooter confirmed that the principal
[email protected] has permission to delete subnet-backend as shown in Figure 8-27.
If you remember the way our shared VPC was set up in Chapter 3, this makes sense because principal
[email protected] was granted the compute.networkAdmin role at the organization level (Figure 3-57),
which includes the permission to delete subnets.
Figure 8-28 shows the outcome of this verification.
Next, let's test the same permission on the same resource from the standpoint of another principal. In Figure 8-30, I am selecting one of the two service project administrators, for example, [email protected].
Figure 8-30. Selecting the resource in Policy Troubleshooter form for samuele
When I clicked the “Check Access” button, the Policy Troubleshooter confirmed that the principal
[email protected] does not have permission to delete subnet-backend as shown in Figure 8-31.
This outcome makes sense as well because, as shown in Figure 8-32, the principal [email protected] has the editor role in the backend-devs project and the compute.networkUser role in subnet-backend. The latter role doesn't include the permission to delete subnets, as shown in Figure 8-33.
■■Exam tip The compute.networkAdmin role gives you permissions to create, modify, and delete networking resources, except for firewall rules and SSL certificates. You need the compute.securityAdmin role to be able to create, modify, and delete firewall rules and SSL certificates.
In this section, you learned how to manage network security operations for firewalls and IAM. You learned how network firewall policies are an effective way to control firewall rules at different levels of your organization. You also learned how to use the Policy Troubleshooter to diagnose and resolve IAM permission issues. In the next section, we will shift our focus to best practices for resolving common connectivity issues.
How It Works
Connection draining uses a timeout setting on the load balancer’s backend service, whose duration must be
from 0 to 3600 seconds inclusive.
For the specified duration of the timeout, existing requests to the removed VM or endpoint are given
time to complete. The load balancer does not send new TCP connections to the removed VM. After the
timeout duration is reached, all remaining connections to the VM are closed.
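For example, to set a 300-second drain timeout on an existing backend service, an update along these lines should work; BACKEND_SERVICE_NAME is a placeholder, and regional backend services take --region instead of --global:

gcloud compute backend-services update BACKEND_SERVICE_NAME \
    --connection-draining-timeout=300 \
    --global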
The events that trigger connection draining are
• Manual removal of a VM from an instance group
• Programmatic removal of a VM from a managed instance group (MIG) by performing a resize(), deleteInstances(), recreateInstances(), or abandonInstances() REST API call
• Removal of an instance group from a backend service
• Deletion of a VM as part of autoscaling
• A MIG update using the managed instance group updater
• Manual removal of an endpoint from a zonal NEG
It can take up to 60 seconds after your specified timeout duration has elapsed for the VM in the
(managed) instance group to be terminated.
■■Exam tip If you enable connection draining on multiple backend services that share the same instance
groups or NEGs, the largest timeout value is used.
How It Works
VPC Flow Logs collects samples of each VM’s TCP, UDP, ICMP, ESP (Encapsulating Security Payload), and
GRE (Generic Routing Encapsulation) protocol flows.
Samples of both inbound and outbound flows are collected as shown in Figure 8-34.
■■Exam tip When enabled, VPC Flow Logs collects flow samples for all the VMs in the subnet. You cannot
pick and choose which VM should have flow logs collected—it’s all or nothing.
You enable VPC Flow Logs using the --enable-flow-logs flag when you create or update a subnet
with the gcloud CLI. The following code snippets show how to enable VPC Flow Logs at subnet creation and
update time, respectively:
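Both snippets are minimal sketches; the subnet, network, and region names, as well as the IP range, are placeholders:

gcloud compute networks subnets create SUBNET_NAME \
    --network=NETWORK_NAME \
    --range=10.1.0.0/24 \
    --region=REGION \
    --enable-flow-logs

gcloud compute networks subnets update SUBNET_NAME \
    --region=REGION \
    --enable-flow-logs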
Cost Considerations
As you can see in Figure 8-34, the two arrows represent flow log samples being ingested into Cloud Logging at a frequency specified by the AGGREGATION_INTERVAL variable.
You can also control the amount of sampling (from zero to one inclusive) with the SAMPLE_RATE variable. If you are not careful, a large volume of data may be collected for each VM in your subnet, which may result in significant charges.
Luckily, the console provides a view of the estimated logs generated per day, based on the assumption that AGGREGATION_INTERVAL is set to its default value of five seconds. The estimate is also based on data collected over the previous seven days. You can use this estimated volume to get an idea of how much enabling VPC Flow Logs would cost you.
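Both knobs map to flags on the subnet commands. As a sketch, the following update raises the aggregation interval and lowers the sampling rate to reduce log volume; the names are placeholders and the values are illustrative:

gcloud compute networks subnets update SUBNET_NAME \
    --region=REGION \
    --logging-aggregation-interval=interval-1-min \
    --logging-flow-sampling=0.25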
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
To view flow logs for a specific subnet in your project (that have VPC Flow Logs enabled), use this
logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
resource.labels.subnetwork_name="SUBNET_NAME"
To view flow logs for a specific VM in your project (whose NIC is attached to a subnet that has VPC Flow
Logs enabled), use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
jsonPayload.src_instance.vm_name="VM_NAME"
To view flow logs for a specific CIDR block, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
ip_in_net(jsonPayload.connection.dest_ip, CIDR_BLOCK)
To view flow logs for a specific GKE cluster, use this logging query:
resource.type="k8s_cluster"
logName="projects/PROJECT_ID/logs/vpc_flows"
resource.labels.cluster_name="CLUSTER_NAME"
To view flow logs for only egress traffic from a subnet, use this logging query:
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows" AND
jsonPayload.reporter="SRC" AND
jsonPayload.src_vpc.subnetwork_name="SUBNET_NAME" AND
(jsonPayload.dest_vpc.subnetwork_name!="SUBNET_NAME" OR NOT jsonPayload.dest_vpc.subnetwork_name:*)
To view flow logs for all egress traffic from a VPC network, use this logging query:
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows" AND
jsonPayload.reporter="SRC" AND
jsonPayload.src_vpc.vpc_name="VPC_NAME" AND
(jsonPayload.dest_vpc.vpc_name!="VPC_NAME" OR NOT jsonPayload.dest_vpc:*)
To view flow logs for specific ports and protocols, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Fvpc_flows"
jsonPayload.connection.dest_port=PORT
jsonPayload.connection.protocol=PROTOCOL
This concludes the section about VPC Flow Logs. In the next section, you will learn when to enable
firewall logs and firewall insights and which option better suits your network operations needs.
■■Exam tip You cannot enable Firewall Rules Logging for the implied deny ingress and implied allow egress rules. For more details about the implied rules, visit https://cloud.google.com/vpc/docs/firewalls#default_firewall_rules.
In Figure 8-35, you can see how each of the two firewall rules associated with the VPC has its own log stream, which ingests connection records into Cloud Logging.
In contrast to VPC Flow Logs, firewall rule logs are not sampled. Instead, connection records—whether
the connections are allowed or denied—are continuously collected and sent to Cloud Logging.
As shown in Figure 8-35, each connection record includes the source and destination IP addresses, the
protocol and ports, date and time, and a reference to the firewall rule that applied to the traffic.
The figure also reminds you (as you learned in Chapters 2 and 3) that firewall rules are defined at the VPC level because they are global resources. They also operate as distributed, software-defined firewalls. As a result, they don't become choke points the way traditional firewalls do.
To enable or disable Firewall Rules Logging for an existing firewall rule, you update the rule as shown below. When you enable logging, you can control whether metadata fields are included; omitting them saves on storage costs.
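Here is a sketch of both operations, with RULE_NAME as a placeholder; the --logging-metadata=exclude-all flag omits the metadata fields to save on storage:

gcloud compute firewall-rules update RULE_NAME \
    --enable-logging \
    --logging-metadata=exclude-all

gcloud compute firewall-rules update RULE_NAME \
    --no-enable-logging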
■■Exam tip Firewall rule logs are created in the project that hosts the network containing the VM instances
and firewall rules. With Shared VPC, VM instances are created and billed in service projects, but they use a
Shared VPC network located in the host project. As a result, firewall rule logs are stored in the host project.
Firewall rule logs are initially stored in Cloud Logging. Here are some guidelines on how to filter the
data in Cloud Logging to select the firewall rule logs that best suit your network operation needs.
To view all firewall logs, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"
To view firewall logs specific to a given subnet, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"
resource.labels.subnetwork_name="SUBNET_NAME"
To view firewall logs specific to a given VM, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"
jsonPayload.instance.vm_name="INSTANCE_ID"
To view firewall logs for connections from a specific country, use this logging query:
resource.type="gce_subnetwork"
logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"
jsonPayload.remote_location.country=COUNTRY
where the variable COUNTRY denotes the ISO 3166-1 alpha-3 code of the country whose connections you are inquiring about.
Firewall Insights
Firewall Insights analyzes your firewall rules and provides guidance on how you can optimize them. It produces insights, recommendations, and metrics about how your firewall rules are being used. Firewall Insights also uses machine learning to predict future firewall rule usage.
For example, by using Firewall Insights you learn which firewall rules are overly permissive, and you can leverage the generated recommendations to make them stricter.
Additional insights include firewall rules that overlap existing rules, rules with no hits, and unused
firewall rule attributes such as IP address and port ranges. Insights are classified as follows:
• Shadowed firewall rule insights, which are derived from data about how you have
configured your firewall rules. A shadowed rule shares attributes—such as IP address
ranges—with other rules of higher or equal priority.
• Overly permissive rule insights, including each of the following:
• Allow rules with no hits
• Allow rules with unused attributes
• Allow rules with overly permissive IP addresses or port ranges
• Deny rule insights with no hits during the observation period.
Firewall Insights uses the logs produced by Firewall Rules Logging to generate metrics. You can use
these metrics to determine whether your firewall rules protect your workloads as expected and—most
importantly—in accordance with the least privilege principle and cloud security best practices.
You can also use these metrics to discover malicious attempts to access your network and act on them by leveraging Cloud Monitoring.
Insights and recommendations are accessible through the gcloud CLI by using one of the three following commands, the first of which lists the names of the available insights whose type is google.compute.firewall.Insight:
• gcloud recommender insights list --insight-type=google.compute.firewall.Insight --location=LOCATION
• gcloud recommender insights describe INSIGHT --insight-type=google.compute.firewall.Insight --location=LOCATION
• gcloud recommender insights mark-accepted INSIGHT --etag=ETAG --insight-type=google.compute.firewall.Insight --location=LOCATION
Replace as follows:
• LOCATION: The location defining the scope of your insights, for example, global
• INSIGHT: The name of the insight resource
• ETAG: Fingerprint of the insight. Provides optimistic locking when updating states. For more info, visit https://cloud.google.com/storage/docs/hashes-etags#etags.
Insights and recommendations can also be viewed with the console in one of the following locations:
• On the Network Intelligence page
• On the details page for a VPC firewall rule
• On the details page for a VPC network interface
Firewall Insights metrics can be viewed using Cloud Monitoring.
To view IPsec tunnel details, use the gcloud compute vpn-tunnels describe command as follows:
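A minimal sketch, with the tunnel name and region as placeholders; the --format expression is an optional refinement that surfaces only the status fields:

gcloud compute vpn-tunnels describe TUNNEL_NAME \
    --region=REGION \
    --format="yaml(status, detailedStatus)"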
The output will return a message that shows the status of the investigated IPsec tunnel.
If the detailedStatus property value doesn’t provide enough information to help you resolve the issue,
you can perform root cause analysis by using the following checklist:
1. Verify that the remote peer IP address configured on the Cloud VPN gateway is correct.
2. Verify that traffic flowing from your on-premises hosts is reaching the peer gateway.
3. Verify that traffic is flowing between the two VPN gateways in both directions. In the VPN logs, check for reported incoming messages from the other VPN gateway.
4. Check that the configured IKE versions are the same on both sides of the tunnel.
5. Check that the shared secret is the same on both sides of the tunnel.
6. If your peer VPN gateway is behind one-to-one NAT, ensure that you have properly configured the NAT device to forward UDP traffic to your peer VPN gateway on ports 500 and 4500.
7. If the VPN logs show a no-proposal-chosen error, this error indicates that Cloud VPN and your peer VPN gateway were unable to agree on a set of ciphers. For IKEv1, the set of ciphers must match exactly. For IKEv2, there must be at least one common cipher proposed by each gateway. Make sure that you use supported ciphers to configure your peer VPN gateway.
8. Make sure that you configure your peer and Google Cloud routes and firewall rules so that traffic can traverse the tunnel. You might need to contact your network administrator for help.
If the issue still persists, use Cloud Logging with the gcloud CLI (or with Logs Explorer), as you learned in the first section of the chapter. Figure 8-36 shows a list of common Cloud VPN events you may want to consult for further troubleshooting.
In Figure 8-37, you can find an example. The parent (also referred to as root) span describes the latency
observed by the end user and is drawn in the top part of the figure. Each of the child spans describes how
a particular service in a distributed system was called and responded with latency data captured for each.
Child spans are shown under the root span.
All Cloud Run, Cloud Functions, and App Engine standard applications are automatically traced, and
libraries are available to trace applications running elsewhere after minimal setup.
If your application doesn't run in any of the aforementioned services, it must be instrumented to submit traces to Cloud Trace. You can instrument your code by using the Google client libraries; however, the recommended approach is to use OpenTelemetry, an actively developed open source tracing framework. For more information, visit https://opentelemetry.io/.
An alternative approach to instrumenting your application is to use the Cloud Trace API and write
custom methods to send tracing data to Cloud Trace.
Either way, your application will need to be instrumented to create traces and send them to Cloud Trace
for inspection and analysis. The resulting instrumentation code leverages HTTP headers to submit the trace
data to Cloud Trace for further analysis.
Spans are created as needed programmatically. This allows Cloud Trace to explicitly measure the
latency of the services that are part of our application.
To view your application's traces, you need to use the Google Cloud console. From the navigation menu, select Trace.
The trace list shows all the traces that were captured by Cloud Trace over the selected time interval, as
shown in Figure 8-38.
Figure 8-38. Trace list. Portions of this page are reproduced under the CC-BY license and shared by Google:
https://cloud.google.com/trace/docs/finding-traces#viewing_recent_traces
You can filter by label to select only the traces for a given span (e.g., RootSpan: Recv as shown in
Figure 8-39).
Figure 8-39. Filtering traces by label. Portions of this page are reproduced under the CC-BY license and
shared by Google: https://cloud.google.com/trace/docs/finding-traces#filter_traces
You can use this filter to drill down into a relevant trace and find latency data in the "waterfall chart" (Figure 8-40).
Figure 8-40. Waterfall chart for a trace. Portions of this page are reproduced under the CC-BY license and
shared by Google: https://cloud.google.com/trace/docs/viewing-details#timeline
At the top, you can see the root span, with all the child spans below it. The chart shows the latency for each span, letting you quickly determine the main source of latency in the application overall.
In this section, you learned how to use Cloud Trace to measure and analyze both the overall latency of
user requests and the way services interact. This will help you find the primary contributor to latency issues.
Whether you want to diagnose connectivity issues to prevent outages, improve network security and compliance, or save time with intelligent monitoring, Network Intelligence Center has the tools you need to strengthen the security posture of your applications and infrastructure, be they in Google Cloud or in hybrid and multi-cloud environments.
Exam Questions
Question 8.1 (VPC Flow Logs, Firewall Rule Logs)
Your company is running out of network capacity to run a critical application in the on-premises data center.
You want to migrate the application to GCP. You also want to ensure that the security team does not lose
their ability to monitor traffic to and from Compute Engine instances.
Which two products should you incorporate into the solution? (Choose two.)
A. VPC Flow Logs
B. Firewall logs
C. Cloud Audit logs
D. Cloud Trace
E. Compute Engine instance system logs
Rationale
A and B are CORRECT because the requirement is to let the security team keep their ability to monitor traffic to and from Compute Engine instances. VPC Flow Logs allows you to monitor a sample of the flows that reach and leave each of the VMs in your subnet, provided the subnet has been configured to enable VPC Flow Logs. Firewall logs allow you to continuously collect TCP and UDP connection records for each of the firewall rules associated with your VPC.
C is incorrect because Cloud Audit logs are a specific type of logs that describe
administrative activities and accesses within your Google Cloud resources. They
don’t provide connection details, packet information, and other network-specific
component data, which are required to monitor traffic to and from Compute
Engine instances.
D is incorrect because Cloud Trace is a product intended to measure latency
between the components of your workloads, which is not a requirement in the
question.
E is incorrect because Compute Engine instance system logs are intended for
automated Google Cloud actions that modify the configuration of resources,
which is not a requirement in the question.
A. Enable logging on the default Deny Any Firewall Rule.
B. Enable logging on the VM instances that receive traffic.
C. Create a logging sink forwarding all firewall logs with no filters.
D. Create an explicit Deny Any rule and enable logging on the new rule.
Rationale
A is incorrect because you cannot enable Firewall Rules Logging for the implied
deny ingress and implied allow egress rules.
B is incorrect because enabling logging on the VMs won’t tell you anything about
allowed or denied connections, which is the requirement.
C is incorrect because a logging sink won’t change the current scenario.
D is CORRECT because to capture denied connections, you need to create a
new firewall rule with --action=deny and --direction=ingress and enable
logging on it with the flag --enable-logging.
Rationale
A is CORRECT because the Compute Security Admin role (roles/compute.securityAdmin) gives you permissions to create, modify, and delete firewall rules and SSL certificates and also to configure Shielded VM settings.
B is incorrect because the service project admin by design doesn’t give you
permissions to manage network infrastructure—including firewall rules—in the
host project, where the shared VPC is located.
C and D are incorrect because the specified IAM roles are both overly permissive.
Rationale
A is incorrect because the compute.networkUser role doesn’t give you
permissions to manage networking resources.
B is CORRECT because compute.networkAdmin is the role with the least set
of privileges that gives you permissions to manage network resources, and
it lets you read firewall rules. If you need permissions to modify firewall
rules (and SSL certificates), then the compute.securityAdmin role should be
granted as well.
C is incorrect because the permission compute.firewalls.list lets you list firewall
rules but not read them.
D is incorrect because the compute.networkViewer role won’t let you modify
network components.
Rationale
A is CORRECT because the no-proposal-chosen error indicates that Cloud
VPN and your peer VPN gateway were unable to agree on a set of ciphers.
For IKEv1, the set of ciphers must match exactly. For IKEv2, there must be at
least one common cipher proposed by each gateway. Make sure that you use
supported ciphers to configure your peer VPN gateway.
B, C, and D are incorrect because the issue is a VPN misconfiguration.