GCP ACE Notes
Different services breadth
Monday, February 1, 2021 10:49 PM
• Compute
a. Compute Engine (GCE):
i. Is a Zonal service
ii. Provides fast booting VMs on demand for rent
iii. Normal instances are expensive but become cheaper with the 'sustained use
discount' if used long enough
iv. Preemptible instances are even cheaper, but can be terminated at any time by Google
b. Kubernetes Engine (GKE):
i. Used to be called Google Container Engine
ii. Provides container orchestration with autoscaling
iii. Is a Regional resource
iv. Kubernetes' in-cluster RBAC doesn't integrate with GCP IAM (in-cluster permissions are configured separately)
c. App engine (GAE):
i. Platform as a Service
ii. Is a Regional resource
iii. Auto scales based on load
d. Cloud functions (GCF):
i. Runs code in response to an event, in Node.js, Python, Java, or Go
ii. Function as a Service (or serverless)
iii. Is a Regional resource
iv. Pay for CPU and RAM assigned to a function per 100ms (minimum)
v. Each function automatically gets an HTTP endpoint
vi. Can be triggered by GCS, PUB/SUB, etc
vii. Auto horizontal scaling: runs as many copies as needed to handle the load
viii. Use cases are apps whose usage varies with demand, e.g. chatbots, message processors, IoT, automation, etc.
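As a rough sketch (commands not from these notes; function, topic, and region names are placeholders), deploying one function per trigger type with gcloud could look like:
# HTTP-triggered function (automatically gets an HTTPS endpoint)
gcloud functions deploy my-http-fn --runtime=python39 --trigger-http --region=us-central1 --allow-unauthenticated
# Pub/Sub-triggered function (runs whenever a message lands on the topic)
gcloud functions deploy my-pubsub-fn --runtime=python39 --trigger-topic=my-topic --region=us-central1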
• Storage
a. Local SSD:
i. Is a zonal resource, i.e. attached to an individual instance
ii. Ephemeral in general
iii. Pay per GB
b. Persistent disk:
i. Is Zonal resource but can be regional too in certain cases
ii. Flexible, block based, network attached storage
iii. Can be resized while still in use
iv. Can be shared among different VMs, but only in read only mode
c. Cloud filestore:
i. Fully managed file based storage
ii. Comparable to a NAS
iii. Is zonal and is reachable by every instance in that zone's VPC
iv. Primarily used for application migration to the cloud
d. Google cloud storage:
i. Can be regional or multi-regional
ii. Versioning can also be done wherein GCS stores different versions of the same
object
iii. Has integrated site hosting and CDN functionality too
iv. Storage classes include: multi-regional, regional, nearline, coldline (lifecycle rules can transition objects between them)
v. Pay for data operations and GB stored
vi. Nearline and coldline: Also pay for GBs retrieved, plus early deletion fee if
<30/90 days respectively
• Databases
a. Cloud SQL:
i. Reliable mysql, sql server and postgres db
ii. Is a Regional resource
b. Cloud spanner:
i. Horizontally scalable, strongly consistent, relational db
ii. Scales from 1 to 1000 nodes
iii. Regional, Multi-regional and Global resource
iv. Used mostly for big production uses and cost can go to $1000s with more
nodes
c. BigQuery:
i. Serverless column store data warehouse for analytics using SQL
ii. Scales internally to handle petabyte scale data
iii. Caches results for 24 hrs for free, in case they need to be re-used
iv. Storage gets cheaper if the table is not modified for 90 days (read-only operations don't reset this)
v. Supports streaming inserts for streaming data as well
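As a rough illustration of the 'analytics using SQL' and streaming-insert points above (dataset/table names are hypothetical, not from these notes):
# Run a standard-SQL query from the CLI
bq query --use_legacy_sql=false 'SELECT name, COUNT(*) AS n FROM `my-project.my_dataset.my_table` GROUP BY name'
# Stream rows into the table from a newline-delimited JSON file
bq insert my_dataset.my_table rows.json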
d. BigTable:
i. Low latency and high throughput NoSQL db for large operational analytics apps
ii. Supports open source HBase api
iii. Is zonal resource
iv. Scales seamlessly and unlimitedly
v. Made for huge workloads
vi. Can be cheaper if using hdd instead of ssd
e. Datastore:
i. Managed and autoscaled NoSQL with indexes, queries, and ACID transactions
ii. Has some in-built indexes and supports manual custom indexes
iii. Regional and multi-regional resource
iv. Pay for gb of data used, and data IOs done on the db
f. Firebase realtime db and cloud firestore:
i. Firebase is Zonal resource. Firestore is Multi-regional resource
ii. NoSQL document store with ~real-time client updates via managed web sockets (which can be complicated in AWS)
iii. Firestore has collections, documents, and contained data. The Realtime DB revolves around one huge central DB (in the central US), and all its data lives there
• Data transfer
a. Data transfer appliance:
i. Rackable, high-capacity server to physically ship data to GCS
ii. 100TB/400TB capacities, i.e. faster than network transfer for large datasets
iii. Is global resource
b. Storage transfer service:
i. Copies objects for you, so you don't need to set up machines
ii. Source could be anything-S3, http endpoints, any other GCS bucket.
Destination is always a GCS bucket
iii. Free to use but have to pay for actions
iv. Is global resource
• External networking
a. Google domains
i. Google's registrar for getting domain names
ii. Global resource. Gives private whois records. Supports DNSSEC
b. Cloud DNS
i. Scalable, reliable, managed DNS service
ii. Build and use custom nameserver service
iii. 100% uptime guarantee. Supports both public and private managed zones, so a zone can be restricted to a particular VPC or served to the internet.
iv. Pay a fixed fee per zone to distribute DNS around the world.
c. Static IP:
i. Reserve static ip addr to assign to resources
ii. Regional IPs used for GCE and network balancers
iii. DNS is preferred over static
iv. Pay for reserved static IPs left unused (discourages hoarding)
d. Load balancer
i. High-perf, scalable traffic distribution integrated with autoscaling and Cloud CDN
ii. Built on the Andromeda SDN. Handles spikes without pre-warming.
iii. No need to build instances or devices
iv. Can be regional or global
v. Pay for ingress traffic processed (load-balanced ingress is billable, unlike normal ingress)
e. Cloud CDN
i. Low latency content delivery based on HTTP(s) CLB integrated with GCE and
GCS
ii. Supports HTTP/2 and HTTPS but no custom origins, only supports GCP
iii. Simple to integrate with load balancer. Caches results to improve latency
iv. Pay per HTTP(S) request volume. You also pay per cache invalidation request (used to force cached content to update), hence a good cache strategy can help
• Internal Networking
a. VPC
i. Global IPv4 SDN for GCP
ii. Automatic mode is easy, custom gives control
iii. VPC is global but subnets are regional
b. Cloud Interconnect
i. Connect external networks to google's
ii. Private connection to a VPC via VPN or Dedicated/Partner Interconnect; the interconnect options also come with SLAs
iii. Dedicated interconnect doesn't encrypt the data as opposed to vpn
iv. Public google services accessible via External peering (no SLAs!)
1. Direct peering for high volume
2. Carrier peering via a partner for lower volume
v. Significantly reduced egress rates (compared to normal internet egress)
c. Cloud VPN
▪ Ipsec vpn using public internet for low-data volume
▪ For persistent static connections
▪ Pay per hour for tunnel usage
d. Dedicated interconnect
i. Direct physical link from vpc to on-prem
ii. VLAN is private connection to vpc in 1 region, no public gcp apis
iii. Private links but not encrypted, have to add your own
iv. Pay per 10 Gbps link, plus per VLAN attachment
v. You however pay reduced egress rates through this
e. Cloud Router
i. Dynamic routing for hybrid linking vpc to external networks
ii. Free to setup but vpc egress charges
f. CDN Interconnect
i. Connect vpc to external CDNs not GCP's cdn
ii. Free to enable but pay for egress
• Machine learning
a. Cloud ML engine
i. For training ML models and making predictions
ii. Based on tensorflow
iii. Enables apps/dev to use tensorflow on any size of dataset
iv. Integrates with gcs, datalab, dataflow
v. HyperTune automatically tunes models to avoid manual tweaking
vi. Pay per hour for training
vii. Pay per provisioned node-hour plus prediction request volume made
b. Cloud vision
i. Image processing
ii. Already pre-trained model
iii. Pay per image, based on detection features requested
c. Cloud speech
i. Auto speech recog. with pre-trained models for 110 languages
ii. Pay per 15sec audio that is processed
d. Cloud Natural Language
i. Analyze text for sentiment, intent, content classification and extracts info
ii. Good to use with speech api, vision and translation
iii. cost depends on what analysis type and is charged per 1000 characters
e. Cloud translation
i. Translate semantics, auto detect source language
ii. Helps support non-native language
iii. Cost is per character processed
f. Dialogflow
i. Build conversational interfaces for websites, mobile apps, messaging apps , iot
ii. Has pre-trained model and service for accepting, parsing, lexing input and
responding
iii. Add custom code to handle chat bots or use pre-built agents
iv. Free plan has unlimited chat but limited voice responses. Paid plan is premium and has no such limits
g. Cloud video intelligence
i. Annotates videos in gcs with info about what they contain ex. Offensive
content
ii. Pay per minute video is processed
h. Cloud job discovery
i. Helps career sites, company job boards, to improve engagement and
conversion
ii. Integrates with job/hiring systems
• Big data and IoT
a. Cloud IoT core:
i. Managed service to connect, manage, and ingest data from devices globally
ii. Device manager handles device identity, auth, config or control
iii. Connect securely using MQTT or HTTPS
iv. Can publish telemetry to cloud pub/sub for processing
v. 2-way device communication handles configuration and updates
vi. Pay per MB of data exchanged
b. Cloud Pub/Sub
i. Infinitely-scalable messaging for ingestion, decoupling
ii. Allows subscribers to read based on Topics
iii. Is global resource, and subscribers can be load balanced worldwide
iv. Push mode delivers to https endpoints and succeeds on http success status
code
1. Slow-start algo ramps up on success and backs off & retries on failures
v. Pull mode delivers messages to requesting clients and waits for ack to delete
1. lets client set rate of consumption and supports batching and long-polling
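A minimal pull-mode walkthrough with gcloud, assuming hypothetical topic/subscription names (not from these notes):
gcloud pubsub topics create my-topic                          # create a topic
gcloud pubsub subscriptions create my-sub --topic=my-topic    # attach a pull subscription
gcloud pubsub topics publish my-topic --message="hello"       # publish a message
gcloud pubsub subscriptions pull my-sub --auto-ack --limit=1  # pull it and acknowledge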
c. Cloud data prep
i. Visually explore, clean and prepare data for analysis without running servers
ii. Data wrangling for business analytics not IT pros
iii. Managed version of Trifacta Wrangler, and managed by Trifacta, not Google
iv. Source data from GCS, BQ or file upload-formatted in json,csv
v. Pay for underlying dataflow job, plus management overhead
d. Data proc
i. Batch map reduce processing via configurable, managed spark and hadoop
cluster
ii. Handles scale even while running jobs
iii. Switches between versions of spark, hadoop and others
iv. Best for moving existing spark/hadoop setup to gcp, but you should use Data
flow for processing new data pipelines
e. Data flow
i. Smartly autoscaled and fully managed batch or stream map reduce like
processing
ii. Autoscales and dynamically redistributes lagging work, mid-job to optimize run
time
iii. Dataflow Shuffle service for batch jobs offloads shuffle ops from workers, for big gains
iv. Pay per second for vCPUs, RAM, SSD; Dataflow Shuffle mode costs extra
f. Datalab
i. Tool for data exploration, analysis, visualization and ML
ii. Similar to jupyter notebooks
iii. Pay for gce, gae instance storing your notebook
g. Data studio
i. Big data visualization tool for dashboards
ii. No charge for this
h. Genomics
i. Store and process genomes
ii. Query complete genomic info or large projects in secs
iii. Requester pays sharing model for pricing
• Identity and access (Core Security)
a. Roles
i. Collections of permissions to use or manage GCP resources
b. IAM
i. Controls access to gcp resources: authorization but not authentication really
ii. Member is user, group, domain, service account, or the public (I.e all users)
iii. Policies bind members to roles at hierarchy level
c. Service accounts
i. Account that represents apps not users
ii. Always use service accounts rather than user accounts or api keys, for most
development work
iii. Use cloud platform managed keys for most gcp resource usage
d. Identity
i. Gsuite or gcloud type service
e. Security key enforcement
i. Usb or bluetooth 2-step verification
f. Resource manager
i. Centrally manage and secure organization's project
ii. Is root node in hierarchy
g. Cloud IAP
i. Guards apps running on gcp via identity verification, not VPN access
ii. Based on clb and iam
iii. Grant access to any iam, group or service accounts
h. Cloud audit logging
i. Maintains non-tamperable audit logs for each project and organization
ii. Admin activity-400 days retention
iii. Access Transparency-400 day retention
1. shows actions by google staff
iv. data access 30 day retention
1. for gcp visible services
• Monitoring and responding
a. Cloud armor
i. Edge level protection from DDoS and other on global https load balancer
ii. Manage IPs with cidr based allow/block lists
iii. You can preview effect of changes before turning on
iv. Pay monthly charge per policy and rule
b. Security scanner
i. Free but limited GAE app vulnerability scanner
ii. Can identify cross-site scripting, Flash injection, mixed content, outdated libraries
c. Cloud DLP
i. Finds and optionally redacts sensitive info in unstructured data streams
ii. Helps minimize what you collect, expose or copy to other systems
iii. 50+ sensitive data detectors including card number, SSN, passport #, etc.
iv. Can scan text and images
v. Pay for amount of data processed per GB and get cheaper with volume
d. Event threat detection
i. Automatically scans stackdriver logs for suspicious activity
ii. Uses threat intelligence including safe browsing
iii. Can detect malware, cryptomining, outgoing DDoS attacks, brute-force SSH
iv. Can monitor access to gcp resources via abusive IAM access, or look for
patterns out of the normal
v. Integrates with Stackdriver, BQ, Pub/sub, etc for more analysis
e. Cloud SCC
i. Security info and event mgmt (SIEM)
ii. Can integrate with other vendor SIEM products
iii. Helps you prevent, detect, and respond to threats from a single pane of glass
iv. Integrates with ETD, scanner, DLP
v. Free to use. Can alert when needed
• Encryption key mgmt
a. Cloud KMS
i. Manage and use crypto keys
ii. Supports symmetric (AES) and asymm (RSA)
iii. Move secrets out of code and into env
iv. Integrated with IAM and Audit logging
v. Rotate keys for new encryption either automatically or on demand
vi. Keeps older key versions to decrypt data encrypted with them
b. Cloud HSM
i. Cloud KMS keys backed by FIPS 140-2 certified HSMs
ii. Device hosts encryption keys and performs crypto ops
iii. KMS uses HSM to store keys
iv. Same apis and features as KMS
• Operations and Management Services-Stackdriver
a. Stackdriver
i. Family of services for monitoring, logging and diagnosing apps
ii. Can work on GCP, AWS, Hybrid etc
iii. Simple usage based pricing
b. SD Logging
i. Store, search, analyze, monitor and alert on log data and events
ii. Debug issues via integration with SD monitoring, trace and error reporting
iii. Create real time metrics from log data
c. SD Error reporting
i. Counts, analyzes, aggregates errors and crashes
ii. Groups similar errors together automatically
iii. Links notifications to errors and gives time charts, occurrences, affected user counts, etc.
iv. Exception stack trace parser knows java, python, js, ruby, c#, php, go
d. SD Trace
i. Tracks and displays tree and timings across distributed systems
ii. Automatically captures from GCE
iii. Detects app latency shift (degradation) over time by evaluating perf results
e. SD Debugger
i. Grabs program state in live deploys with low overhead
ii. Logpoints repeat for up to 24 hrs; fuller snapshots run once but can be conditional
iii. Source view supports cloud repos, but you can also upload
iv. Supports Java and Python on GCE, GAE, GKE; Go only on GCE/GKE
v. Auto-enabled for GAE; agents can be installed elsewhere
vi. Is free to use
f. SD Profiler
i. Continuous cpu and memory profiling to improve perf and reduce cost
ii. Low overhead: typically <5% max
iii. Is Agent based, which need to be installed
g. Deployment manager
i. Create/manage resources via declarative templates
ii. Supports yaml, python, jinja2
iii. Supports input/output parameters with json schema
h. Cloud billing api
i. Programmatically manage billing for projects and pricing
ii. List billing accs, enable/disable accs, list billable SKUs, get public pricing
iii. Won't show the current bill; you have to export billing data to see that.
• Development and APIs
a. Cloud source repositories
i. Similar to github
ii. No enhanced features like pull requests, unlike GitHub
iii. Integrates with SD debugger
iv. Pay per month per usage per user
b. Cloud Build
i. Takes source code to build, test and deploy it-CI/CD service
ii. Similar to Jenkins
iii. Free and runs builds in parallel (up to 10 at a time)
iv. Plus scans for package vulnerabilities
v. Docker: simple build and push functions
vi. Json and Yaml supported
vii. Pay per minute of build time after you go over 120 free minutes in a day
c. Container Registry
i. Fast, private and docker image storage with docker v2 api
ii. Creates and manages multi-regional gcs bucket, then translates gcr calls to gcs
iii. Integrates with cloud build and sd logs
d. Cloud Endpoints
i. Handles authorization, monitoring, logging and API keys like nginx
ii. Proxy instances are distributed and hook into cloud load balancer
iii. Super fast <1ms
iv. Handles auth and logging
v. Handles gRPC and can transcode HTTP
vi. Pay per call to your API
e. Apigee
i. Enterprise API management platform for whole API lifecycle
ii. Transform calls between protocols: SOAP, REST, XML, binary
iii. Authenticates via OAuth or role based
iv. Throttle traffic with quotas, manage API versions etc
v. Enterprise grade and a bit expensive
f. Test lab for Android
i. Cloud infra for running test matrix across REAL android devices
ii. Production grade devices flashed with real Android versions
iii. Automatic, can add custom scripts, monitor progress
GCP commands (highlighted are only in gsutil cli not gui)
Thursday, December 12, 2019 8:36 PM
gcloud auth login : Authorize with a user account without setting up a configuration.
gcloud auth activate-service-account : Authorize with a service account instead of a user account. Useful for authorizing non-interactively and without a web browser.
gcloud config [COMMAND] : Create and manage Cloud SDK configurations and properties.
gcloud config configurations [COMMAND] : Create, activate, and manage named configurations.
• When you install the SDK, the following components are installed by default: [ref]
gcloud (Default gcloud CLI Commands) : Tool for interacting with Google Cloud. Only commands at the General Availability and Preview release levels are installed with this component. You must separately install the gcloud alpha Commands and/or gcloud beta Commands components if you want to use commands at other release levels.
bq (BigQuery Command-Line Tool) : Tool for working with data in Google BigQuery.
gsutil (Cloud Storage Command-Line Tool) : Tool for performing tasks related to Google Cloud Storage.
core (Cloud SDK Core Libraries) : Libraries used internally by the SDK tools.
GCLOUD Commands:
Basic syntax for gcloud follows: gcloud <global flags> <service/product> <group/area> <command> <flags>
<parameters>
Global flags: For all commands in the gcloud CLI, users can specify a number of global flags to modify command
behavior. The --account <account>, --project <project id>, --billing-project <billing acc> and --configuration flags allow
you to override the current defaults for your environment. These are useful to avoid modifying or switching
configurations to run a few quick commands. Other global flags including --quiet, --flatten, --format (ex: json,yaml,csv
etc), and --verbosity allow users to modify the output from running commands—often useful when running scripts or
debugging operations.
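For example (project ID is a placeholder), global flags can point a single command at another project and reshape its output without touching the active configuration:
# List instances in another project, as JSON, with prompts suppressed
gcloud compute instances list --project=other-project-id --format=json --quiet
# Same data, but only selected fields as CSV
gcloud compute instances list --project=other-project-id --format="csv(name,zone,status)"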
a. gcloud services enable <api name > : Enable api
4) gcloud compute ssh --project=<PROJECT_ID> --zone=<ZONE VM_NAME> : SSH into vm from gcloud
5) gcloud config set compute/region REGION : Set region for gcloud commands
6) gcloud compute regions list: A list of regions can be fetched
7) gcloud config unset compute/region: To unset the property
8) gcloud compute project-info describe : gcloud compute project-info describe displays all data associated with the
Compute Engine project resource. The project resource contains data such as global quotas, common instance
metadata, and the project's creation time.
• gcloud config set : sets the specified property in your active configuration only. A property governs the behavior of a
specific aspect of Cloud SDK such as the service account to use or the verbosity level of logs. To set the property across
all configurations, use the --installation flag. For more information regarding creating and using configurations, see the Gcloud Configurations section below.
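A few common property sets as a sketch (values are placeholders):
gcloud config set project my-project-id       # core/project
gcloud config set compute/zone us-central1-a  # default zone for compute commands
gcloud config set compute/region us-central1  # default region
gcloud config list                            # show properties of the active configuration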
• Cloud Shell provisions 5 GB of free persistent disk storage mounted as your $HOME directory on the virtual machine
instance. This storage is on a per-user basis and is available across projects. Unlike the instance itself, this storage does
not time out on inactivity. All files you store in your home directory, including installed software, scripts and user
configuration files like .bashrc and .vimrc, persist between sessions. Your $HOME directory is private to you and cannot
be accessed by other users.
Gcloud help compute instances create : To get help filling out command flags
GSUTIL Commands:
gsutil uses the prefix gs:// to indicate a resource in Cloud Storage:
Ex: gs://BUCKET_NAME/OBJECT_NAME
1) Gsutil -> used to connect to gcp storage
2) Gsutil mb -l northamerica-northeast1 gs://<bucket name>: To create bucket in northeast region
3) Gsutil label get gs://<storage name>: Get bucket labels in json format
a. Gsutil label get gs://<storage name> > test.json : Save bucket labels to json file
b. Gsutil label set test.json gs://<storage name>: Set labels for bucket from json file
4) Gsutil label ch -l "extralabel:extravalue" <bucket storage name>: Change label. While you can do a 'get' to pull the
bucket labels in json format, edit it and set it back to the bucket with 'set', changing/adding individual labels is
better this way.
5) Gsutil versioning get <bucket name>: get if versioning is enabled or not
a. Gsutil versioning set on <bucket name>: enable versioning. This can be done only through the cli or api.
Object versioning will make sure that the bucket doesn't lose any objects due to running the delete command
or if we add a new version of the object
6) Gsutil ls -a <bucket name>: list all files and archive files too
7) Only way to delete object is to remove it by name
8) Gsutil cp <bucket name>/** <dest>: copy all files. Use -r to copy AND maintain file structure
9) Gsutil acl ch -u AllUsers:R <file location inside bucket>: Make objects available to all users in public internet. Ch
says change, -u says users, :R says Read only access.
Gcloud Configurations:
Configurations are a group of settings or properties. 'gcloud init' initializes the base configuration and is helpful across multiple projects. You can create additional configurations using 'gcloud config configurations create' and then set common properties on them (this is different from 'gcloud config configurations list', which shows all configurations). But you need to activate a configuration ('gcloud config configurations activate') to switch to it. 'gcloud config list' will list the properties of the current config. The IS_ACTIVE column in the configurations list shows which configuration is currently being used.
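A sketch of that workflow (configuration and project names are placeholders):
gcloud config configurations create dev        # create a named configuration (activates it by default)
gcloud config set project my-dev-project       # properties now land in the 'dev' configuration
gcloud config configurations list              # IS_ACTIVE shows which one is in use
gcloud config configurations activate default  # switch back to another configuration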
Stackdriver, Operations and mgmt
Tuesday, January 14, 2020 11:00 PM
Stackdriver agents can be used for monitoring and logging and diagnosing on gcp or even aws.
The stackdriver agent pushes logs to stackdriver. To authenticate the transactions, the SD agent needs to use an
authentication token that the SD console can accept.
Stackdriver never pulls logs from instances, instances push data to stackdriver. The host project stores Stackdriver's
metadata.
Stackdriver always requires a host project to run on. If you use a single project you can make it the host project. However
if you want to run stackdriver on multiple projects, you should create an empty project just to run stackdriver.
You can export logs by creating one or more sinks that include a logs query and an export destination. Supported
destinations for exported log entries are Cloud Storage, BigQuery, and Pub/Sub.
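For example, a sink can be created from the CLI (sink and bucket names are placeholders, not from these notes):
# Export all ERROR-and-above entries to a GCS bucket
gcloud logging sinks create error-sink storage.googleapis.com/my-log-archive-bucket --log-filter='severity>=ERROR'
# The command output shows the sink's writer service account, which must be granted write access on the destination
gcloud logging sinks describe error-sink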
○ You can refine the scope of the logs displayed in the Logs Explorer through the Refine scope panel. You have
the option to only search logs within the current project or to search logs based on one or more storage views.
○ Logs-based metrics are Cloud Monitoring metrics that are based on the content of log entries. For example,
the metrics can record the number of log entries containing particular messages, or they can extract latency
information reported in log entries. You can use logs-based metrics in Cloud Monitoring charts and alerting
policies. Logs-based metrics are calculated from both included and excluded logs.
○ Logs-based metrics apply only to a single Google Cloud project. You cannot create them for logs buckets or for
other Google Cloud resources such as Cloud Billing accounts or organizations.
○ System (logs-based) metrics are predefined by Logging. These metrics record the number of logging events
that occurred within a specific time period. These metrics are free of charge and available to all Google Cloud
projects.
○ User-defined (logs-based) metrics are created by a user on a Google Cloud project. They count the number of
log entries that match a given filter, or keep track of particular values within the matching log entries. When
you create a filter for the log entries that you want to count in your metric, you can use regular expressions.
○ Metric types include: Counter (All system based are. Counts number of entries matching your filter) and
Distribution (gives count, mean, std dev)
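A sketch of creating a user-defined counter metric (metric name and filter are illustrative):
gcloud logging metrics create error_count \
  --description="GCE log entries at ERROR or above" \
  --log-filter='resource.type="gce_instance" AND severity>=ERROR'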
• SD monitoring (now Google Cloud Monitoring):
○ Performance, uptime and health of cloud apps. Can log and alert
○ By default monitors: CPU utilization, basic disk traffic information, uptime info and network traffic. Additional
metrics require the collectd based daemon agent to be installed.
○ Cloud Monitoring uses Workspaces to organize and manage its information. A Workspace can manage the
monitoring data for a single Google Cloud project, or it can manage the data for multiple Google Cloud projects
and AWS accounts. However, a Google Cloud project or an AWS account can only be associated with one
Workspace at a time.
○ After installing the Monitoring agent, you can monitor supported third-party applications by adding
application-specific collectd configurations.
○ The following services have the agent pre-installed: App engine, Dataflow, Dataproc, GKE uses cloud
operations for GKE.
○ Edit the Monitoring agent configuration file /etc/stackdriver/collectd.conf and then restart it (to make any
changes).
○ Custom metrics. Collectd metrics that have the metadata key stackdriver_metric_type and a single data source
are handled as custom metrics and sent to Monitoring using the projects.timeSeries.create method in the
Monitoring API.
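A sketch of installing and restarting the Monitoring agent on a Debian/Ubuntu VM, assuming the standard agent-repo script is still available at this URL:
curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
sudo bash add-monitoring-agent-repo.sh --also-install    # add the repo and install stackdriver-agent
sudo vi /etc/stackdriver/collectd.conf                   # adjust configuration (e.g. third-party plugins)
sudo service stackdriver-agent restart                   # pick up the changes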
• Sd error reporting (Cloud Error Reporting):
○ Counts, analyzes, aggregates, and tracks crashes in a central interface. Alerts when new error cannot be
grouped with existing ones. Can understand java,python,go,c#,php and ruby
○ You can report errors from your application by sending them directly to Cloud Logging with proper formatting
or by calling an Error Reporting API endpoint that sends them for you.
○ You need the google-fluentd library to call logger api from your code.
○ Modify your application so that it logs exceptions and their stack traces to Logging. Choose a log name for your
error information to keep it separate from other logged information.
○ Error logging is automatically available for app engine, cloud functions and cloud run. GKE needs agent
installed.
• SD trace (or Cloud trace):
○ Tracks and displays call tree and timings across systems. Can capture traces from apps on java, nodejs, ruby
and go. Can detect latency issues over time by evaluating performance reports on apps.
○ Cloud Trace provides distributed tracing data for your applications.
○ Trace allows us to analyze calls made from any source ex: http requests from anywhere, can be better
analyzed for latency using trace.
○ After instrumenting your application, you can inspect latency data for a single request and view the aggregate
latency for an entire application in the Cloud Trace console.
○ Cloud Trace recommends using OpenTelemetry. OpenTelemetry is an open-source product from the merger
between OpenCensus and OpenTracing.
○ For your application to submit traces to Cloud Trace, it must be instrumented. You can instrument your code
by using the Google client libraries (except for app engine):
○ Use OpenTelemetry and the associated Cloud Trace client library. This is the recommended way to instrument
your applications.
○ Use OpenCensus if an OpenTelemetry client library is not available for your language.
○ Use the Cloud Trace API and write custom methods to send tracing data to Cloud Trace.
○ Cloud Trace doesn't sample every request, this is customizable. However you can force it to trace a curl you
make, by adding the header "X-Cloud-Trace-Context:TRACE_ID/SPAN_ID;o=TRACE_TRUE" to your curl
command
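For example, a forced trace on one request might look like this (the trace ID is an arbitrary 32-hex-digit value you generate; the URL is a placeholder):
# o=1 marks the request as "trace this"
curl -H "X-Cloud-Trace-Context: 105445aa7843bc8bf206b120001000ff/1;o=1" https://my-service.example.com/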
• SD debugger (now Cloud debugger):
○ Grabs program state in live deploys. Can be used to debug issues on apps at any specific times. Source view
supports logs from cloud source repos, github, bitbucket, local machine traces being uploaded.
○ Cloud Debugger is a feature of Google Cloud that lets you inspect the state of an application at any code
location without using logging statements and without stopping or slowing down your applications.
○ The gcloud debug command group provides interaction with Cloud Debugger, allowing you to list and
manipulate debugging targets, snapshots and logpoints.
○ Import the 'google-python-cloud-debugger' library in your python code to add snapshots and logpoints to your
code.
▪ Debug Snapshots: After you have deployed or started your app, you can open Cloud Debugger in the
Google Cloud Console. Debugger allows you to capture and inspect the call stack and local variables in
your app without stopping or slowing it down. After you set a snapshot, the Debugger agent tests the
snapshot on a subset of your instances. After the Debugger agent verifies the snapshot can execute
successfully, the snapshot is applied to all your instances. This takes about 40 seconds.
▪ Debug Logpoints: Logpoints allow you to inject logging into running services without restarting or
interfering with the normal function of the service. Every time any instance executes code at the logpoint
location, Cloud Debugger logs a message. Output is sent to the appropriate log for the target's
environment.
• SD profiler (now Cloud Profiler):
○ Watch app cpu and memory usage realtime to improve perf of apps. Agent based wherein the agent sends data
to profiler to explore for 30 days, and is free
○ Needs the agent installed: 'google-cloud-profiler'
○ Import the googlecloudprofiler module and call the googlecloudprofiler.start function as early as possible in
your initialization code.
○ The information provided shows which code consumes the most CPU time and wall time (start-to-end time)
○ Cloud Trace is not supported on Google Cloud Storage. Stackdriver Trace runs on Linux in the following
environments: Compute Engine, Google Kubernetes Engine (GKE), App Engine flexible environment, App
Engine standard environment.
• Cloud billing api: Manage billing for gcp projects. List of accounts, details on each project, change account for each
project.
• VPC flow logs can be used for network monitoring, forensics, real-time security analysis, and expense optimization.
Flow logs are aggregated by connection, at 5-second intervals, from Compute Engine VMs and exported in real time.
By subscribing to Cloud Pub/Sub, you can analyze flow logs using real-time streaming APIs.
• If you want to adjust log sampling and aggregation, click Configure logs and adjust any of the following:
○ The Aggregation interval
○ Whether or not to Include metadata in the final log entries. By default, Include metadata only includes certain fields. Refer to Customizing metadata fields for details. To customize metadata fields, you must use the gcloud command-line interface or the API.
○ The Sample rate. 100% means that all entries are kept.
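A sketch of enabling and tuning flow logs on an existing subnet with gcloud (subnet and region are placeholders):
gcloud compute networks subnets update my-subnet --region=us-central1 \
  --enable-flow-logs \
  --logging-aggregation-interval=interval-5-sec \
  --logging-flow-sampling=0.5 \
  --logging-metadata=include-all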
Logging and cloud audit
Tuesday, February 9, 2021 2:45 PM
About the Logging agent
In its default configuration, the Logging agent streams logs from common third-party
applications and system software to Logging; review the list of default logs. You can configure
the agent to stream additional logs; go to Configuring the Logging agent for details on agent
configuration and operation.
It is a best practice to run the Logging agent on all your VM instances. The agent runs under
both Linux and Windows. To install the Logging agent, go to Installing the agent.
If you are running specialized logging workloads that require higher throughput and/or
improved resource-efficiency compared to the standard Cloud Logging agent, consider using
the Ops agent.
The logging agent works on both GCP instances and AWS EC2 instances. If the GCP instance has no external route to the internet, you must allow internal routing (e.g. Private Google Access) so the agent can reach the Logging API.
The following VM instances support Logging using their own software, so manually installing
the Logging agent on them is not supported:
• App Engine standard environment instances. App Engine includes built-in support for
Logging. For more information, go to Writing application logs.
• App Engine flexible environment instances. Apps running in the App Engine flexible
environment can write logs that are in addition to what is included in the App Engine
standard environment. For more information, go to Writing application logs.
• Google Kubernetes Engine node instances. You can enable Cloud Operations for GKE, an
integrated monitoring and logging solution, for your new or existing container clusters.
• For instances running on Anthos clusters on VMware, the agent collects system logs but
doesn't collect application logs.
• Cloud Run container instances. Cloud Run includes built-in support for Logging. For more
information, go to Logging and viewing logs.
• Cloud Functions HTTP and background functions. Cloud Functions includes built-in support
for Logging. For more information, go to Writing, Viewing, and Responding to Logs.
All logs, including audit logs, platform logs, and user logs, are sent to the Cloud Logging API
where they pass through the Logs Router. The Logs Router checks each log entry against
existing rules to determine which log entries to ingest (store), which log entries to include in
exports, and which log entries to discard.
Exporting involves writing a filter that selects the log entries you want to export, and choosing a
destination from the following options:
• Cloud Storage: JSON files stored in Cloud Storage buckets.
• BigQuery: Tables created in BigQuery datasets.
• Pub/Sub: JSON messages delivered to Pub/Sub topics. Supports third-party integrations,
such as Splunk, with Logging.
• Another Google Cloud project: Log entries held in Cloud Logging logs buckets.
The filter and destination are held in an object called a sink. Sinks can be created in Google
Cloud project, organizations, folders, and billing accounts.
Access control
To create or modify a sink, you must have the Identity and Access Management roles Owner or
Logging/Logs Configuration Writer in the sink's parent resource. To view existing sinks, you
must have the IAM roles Viewer or Logging/Logs Viewer in the sink's parent resource. For more
information, go to Access control.
To export logs to a destination, the sink's writer service account must be permitted to write to
the destination.
without logging into Google Cloud.
• To view these logs, you must have the IAM roles Logging/Private Logs Viewer or
Project/Owner.
• Data Access audit logs-- except for BigQuery Data Access audit logs-- are disabled by
default because audit logs can be quite large. If you want Data Access audit logs to be
written for Google Cloud services other than BigQuery, you must explicitly enable them.
Enabling the logs might result in your Cloud project being charged for the additional logs
usage.
and set the sink's includeChildren parameter to True. That sink can then export log
entries from the organization or folder, plus (recursively) from any contained folders,
billing accounts, or projects. You can use the sink's filter to specify log entries from
projects, resource types, or named logs.
The supported destinations for sinks are the following:
• A Cloud Storage bucket
• A Pub/Sub topic
• A BigQuery table
• A Cloud Logging bucket
• Sinks. Cloud Logging compares the log entry against a sink’s filter to determine whether to
route the log entry to the sink's destination. Matching log entries are then compared
against the sink's exclusion filters to determine whether to discard the log entry or to
route it to the sink's destination. Logs sinks can be used to route log entries to supported
destinations.
• Exclusions. By default, every project has a _Default logs sink that routes all logs to be
stored in a _Default logs bucket in Cloud Logging. Logs exclusions control the exclusion
filters for the _Default log sink and can be used to prevent matching logs from being
stored in Cloud Logging by default.
Billing budgets and alerts
Tuesday, August 18, 2020 5:26 PM
• When creating a project through gcloud, it does not automatically link the project to
your billing account; you can use gcloud beta billing to do that.
1) Billing accounts:
A billing acct is a type of resource that stays at the organization level, not project level. Hence
billing accounts don’t affect projects directly.
Consists of payment methods to link to a project. Each project will have 1 billing acc. You can
change billing accounts. 2 types of billing accounts
A) Self-served: charged when you hit your billing threshold or every 30 days, whichever comes first
B) Invoiced: For organizations
Billing account user can only link projects to billing accounts (from the org or billing acc
perspective), but project billing manager can link and unlink projects from billing accounts
(from org or project level perspective).
To change the billing account of a project: Use Project Owner or Project Billing Manager on the
project, AND Billing Account Administrator or Billing Account User for the target Cloud Billing
account.
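A sketch of linking a project and checking the result (IDs are placeholders); at the time of these notes the commands sat under the beta group:
gcloud beta billing accounts list                  # find the billing account ID
gcloud beta billing projects link my-project --billing-account=0X0X0X-0X0X0X-0X0X0X
gcloud beta billing projects describe my-project   # confirm the link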
Both the "Owner" role and the "Project Billing Manager" role each grant the same billing
permissions--and these are exactly the
resourcemanager.projects.createBillingAssignment permission
2) Budgets and alerts: Can link budget alerts to individual projects or individual billing accounts.
If attached to a billing account, it will alert when the budget for the total account is reached,
not each project. Can create alerts based on billing history too (previous month bill). Billing
alerts are sent to the billing admin when thresholds are hit. You can use pub/sub to interact
with these alerts.
3) Billing exports: You can export billing data for better understanding using either BigQuery
export or File Export. The file export can store it in CSV or JSON format. You can also automatically have
it store a file per day into the storage bucket, so everyday new files would be added. However
billing exports are NOT real time updated, so there can be a delay of even few hours between
action and export.
Billing export has to be set up for each billing account.
Billing export to BigQuery can use a BQ dataset in the same project or, even better, a BQ dataset in another
project. This way the resources are separate but better organized. Billing export is not real-
time, there is a delay of few hours. You should also tag different resources being tracked for
better understanding.
used instead. If you need to migrate an existing billing account:
• Get the necessary permissions for migration:
• roles/billing.admin on the source Organization.
• roles/billing.creator on the destination Organization.
• Go to the Billing page in the Cloud Console.
• Go to the Billing page
• Click on the name of the billing account you want to move.
• At the top of the Overview page, click Change organization.
• Select the destination Organization, and then click Ok.
• The billing account is now associated with the specified Organization.
Metadata and its use with startup and stop scripts, MIG
Tuesday, December 17, 2019 9:09 PM
https://cloud.google.com/compute/docs/storing-retrieving-metadata
Every instance stores its metadata on a metadata server. You can query this metadata server programmatically, from within the instance
and from the Compute Engine API. You can query for information about the instance, such as the instance's host name, instance ID,
startup and shutdown scripts, custom metadata, and service account information. Your instance automatically has access to the
metadata server API without any additional authorization.
The metadata server is particularly useful when used in combination with startup and shutdown scripts because you can use the
metadata server to programmatically get unique information about an instance, without additional authorization. For example, you can
write a startup script that gets the metadata key-value pair for an instance's external IP and use that IP in your script to set up a
database. Because the default metadata keys are the same on every instance, you can reuse your script without having to update it for
each instance. This helps you create less brittle code for your applications.
Metadata is stored in the format key:value. There is a default set of metadata entries that every instance has access to. You can also set
custom metadata. Metadata ALWAYS uses http and not https to get data.
To access the metadata server, you can query the metadata URL.
Getting metadata:
You can query the contents of the metadata server by making a request to the following root URLs from within a virtual machine
instance. Use the http://metadata.google.internal/computeMetadata/v1/ URL to make requests to the metadata server.
Note: When you make a request to get information from the metadata server, your request and the subsequent metadata response
never leave the physical host that is running the virtual machine instance. Since the instance is connected to the metadata server internally
over the VM machine, you can just use http instead of https
All metadata values are defined as sub-paths below these root URLs.
You can query for default metadata values only from within the associated instance. You cannot query an instance's default metadata
from another instance or directly from your local computer. You can use standard tools like curl or wget from the instance to its
metadata server.
When you query for metadata, you must provide the following header in all of your requests:
Metadata-Flavor: Google (this is required only for GCE not app engine. In app engine you need to use OAuth or
service token to authenticate)
This header indicates that the request was sent with the intention of retrieving metadata values, rather than unintentionally from an
insecure source, and lets the metadata server return the data you requested. If you don't provide this header, the metadata server
denies your request.
Ex: curl -H Metadata-Flavor:Google metadata.google.internal/computeMetadata/v1/
For automating startup: you need to add the key and value of the storage bucket to allow the instance to store data directly to the bucket.
count towards that limit.
CLI ex: gcloud compute instances create example-instance \ --metadata foo=bar
Compute Engine enforces a combined total limit of 512 KB for all metadata entries. Maximum size limits are also applied to each key and
value as follows:
• Each metadata key has a maximum limit of 128 bytes
• Each metadata value has a maximum limit of 256 KB
In particular, SSH keys are stored as custom metadata under the ssh-keys key. If your metadata content for this key exceeds the 256 KB
limit, you won't be able add more SSH keys. If you run into this limit, consider removing unused keys to free up metadata space for new
keys.
Startup and shutdown script contents might also be stored as custom metadata and count toward these size limitations, if you provide
the startup or shutdown script contents directly. To avoid this, store your startup or shutdown script as a file hosted at an external
location, such as Cloud Storage, and provide the startup script URL when creating an instance. These files are downloaded onto the VM
instance, rather than stored in the metadata server.
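For example (bucket and instance names are placeholders), pointing an instance at a script hosted in GCS keeps the script contents out of the metadata size limits:
gcloud compute instances create my-vm --zone=us-central1-a \
  --metadata=startup-script-url=gs://my-bucket/startup.sh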
Guest attributes are a specific type of custom metadata that your applications can write to while running on your instance. Any
application or user on your instance can both read and write data to these guest attribute metadata values.
Use guest attributes only for use cases that require small amounts of data that don't change frequently. The best use cases for guest
attributes have the following characteristics:
• Startup scripts that can signal successful initialization by setting a custom status value in guest attributes.
• Configuration management agents that can publish a guest OS name and version to guest attributes.
• Inventory management agents that can publish list of packages installed in the VM instance to guest attributes.
• Workload orchestration software that can signal completion of an operation in the guest to the software control plane by setting a
custom status value in guest attributes.
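A sketch of the guest-attribute flow (namespace and key names are hypothetical): enable them via instance metadata, then write and read from inside the VM:
# Enable guest attributes at instance creation
gcloud compute instances create my-vm --metadata=enable-guest-attributes=TRUE
# From inside the VM: write a value...
curl -X PUT -d "done" -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/guest-attributes/startup/state
# ...and read it back
curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/guest-attributes/startup/state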
IAM, Service accounts
Thursday, December 12, 2019 9:29 PM
The GCP IAM model for access management has three main parts:
• Member. A member can be a Google Account (for end users), a service account (for apps and virtual machines), a Google group, or a Google
Workspace or Cloud Identity domain that can access a resource. The identity of a member is an email address associated with a user, service
account, or Google group; or a domain name associated with Google Workspace or Cloud Identity domains.
• Role. A role is a collection of permissions. Permissions determine what operations are allowed on a resource. When you grant a role to a member,
you grant all the permissions that the role contains.
• Policy. The IAM policy binds one or more members to a role. When you want to define who (member) has what type of access (role) on a
resource, you create a policy and attach it to the resource. Policies are only to Allow access not Deny.
Cloud identity is just the identity aspect of G-Suite without the rest of the apps. It is possible to use a non-google provider account if it is supported by
google cloud directory sync (viz AD sync for google).
Roles:
There are more than 200+ roles.
Types of roles:
1. Primitive: Roles: Project Viewer, editor, owner. Owner can do everything an editor does but also handle billing and acct services. Predates Google
IAM but still exists. Too generic for general use.
2. Predefined: More granular. For specific uses. IAM for different services.
3. Custom roles: Like predefined but custom made. Can be made as specific as needed but better to use least privilege policy. Custom roles allow us
to group permissions to different services and assign members any of those roles. You can also segregate roles based on services and billing
accounts. However when you create a custom role, you add permissions based on what google has defined at that time, incase google updates
their permissions your custom role won't be updated so you would have to update it.
Each permission follows service.resource.verb, and any user who gets access is allowed access to that resource, for example,
pubsub.subscriptions.consume.
You can use get-iam-policy in json/yaml format, edit the json/yaml file and can then set the new policy using set-iam-policy.
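For example (project ID is a placeholder):
gcloud projects get-iam-policy my-project --format=json > policy.json
# edit policy.json (add/remove bindings), then write it back
gcloud projects set-iam-policy my-project policy.json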
A Policy is a collection of bindings. A binding binds one or more members to a single role.
JSON Format:
{
"bindings": [
{
"role": "roles/resourcemanager.organizationAdmin",
"members": [
"user:[email protected]",
"group:[email protected]",
"domain:google.com",
"serviceAccount:[email protected]"
]
} ] }
• Bindings can be multiple so it’s a list, Role is singular so its just a dict key:value, members can be multiple so it’s a list.
Instead, prefer to use :"gcloud [group] add-iam-policy-binding [resource name] --role=[role-id-grant] --member=user: [user email]"
"gcloud [group] remove-iam-policy-binding [resource name] --role=[role-id-grant] --member=user: [user email]"
since they are simpler, less work and less error prone. This also helps avoid race condition, wherein 2 user edit permissions at the same time, resulting
in only 1 getting used. Race condition creates issue with get-iam and set-iam policy (where gcloud will try to add multiple policies at once, and some
might interfere with other when not done sequentially), however in gcloud add-iam policy commands only the targeted command is sent to user not
the whole set of commands.
https://cloud.google.com/iam/docs/using-iam-securely
Folders are nodes in the Cloud Platform Resource Hierarchy. A folder can contain projects, other folders, or a combination of both. Organizations can
use folders to group projects under the organization node in a hierarchy. For example, your organization might contain multiple departments, each with
its own set of Google Cloud resources. Folders allow you to group these resources on a per-department basis. Folders are used to group resources that
share common IAM policies. While a folder can contain multiple folders or resources, a given folder or resource can have exactly one parent.
• To create a folder under the Organization resource using the gcloud command-line tool, run the following command.
• gcloud alpha resource-manager folders create \
--display-name=[DISPLAY_NAME] \
--organization=[ORGANIZATION_ID]
• To create a folder whose parent is another folder:
• gcloud alpha resource-manager folders create \
--display-name=[DISPLAY_NAME] \
--folder=[FOLDER_ID]
• gcloud iam roles copy [--dest-organization=DEST_ORGANIZATION] [--dest-project=DEST_PROJECT] [--destination=DESTINATION] [--source=SOURCE] [--
source-organization=SOURCE_ORGANIZATION] [--source-project=SOURCE_PROJECT] [GCLOUD_WIDE_FLAG …] : Copy IAM roles from 1
project/org to another
• --dest-organization=DEST_ORGANIZATION
○ The organization of the destination role.
• --dest-project=DEST_PROJECT
○ The project of the destination role.
• --destination=DESTINATION
○ The destination role ID for the new custom role. For example: viewer.
• --source=SOURCE
○ The source role ID. For predefined roles, for example: roles/viewer. For custom roles, for example: myCompanyAdmin.
• --source-organization=SOURCE_ORGANIZATION
○ The organization of the source role if it is an custom role.
• --source-project=SOURCE_PROJECT
○ The project of the source role if it is an custom role.
• gcloud group remove-iam-policy-binding resource \ --member=member --role=role-id : To revoke or remove an IAM
role. Group can be project or organization
• Testing permissions
• Most Google Cloud resources expose the testIamPermissions() method, which allows you to programmatically check whether the currently
authenticated caller has been granted one or more specific IAM permissions on the resource. The testIamPermissions() method takes a resource
identifier and a set of permissions as input parameters, and returns the set of permissions that the caller is allowed.
• Access scopes only apply to the default service account and not custom service accounts, since google assumes we have the correct settings in the
custom account.
• Kubernetes' own in-cluster RBAC doesn't integrate with GCP IAM, hence you need to manually configure Kubernetes roles/permissions for GKE workloads.
• Google recommends predefined roles instead of primitive roles like Project Editor.
Members:
In IAM, you grant access to members. Members can be of the following types:
• Google Account: A Google Account represents a developer, an administrator, or any other person who interacts with Google Cloud. Any email address
that's associated with a Google Account can be an identity, including gmail.com or other domains. New users can sign up for a Google Account by going
to the Google Account signup page.
• Service account: A service account is an account for an application instead of an individual end user. When you run code that's hosted on Google Cloud,
the code runs as the account you specify. You can create as many service accounts as needed to represent the different logical components of your
application.
• Google group: A Google group is a named collection of Google Accounts and service accounts. Every Google group has a unique email address that's
associated with the group. You can find the email address that's associated with a Google group by clicking About on the homepage of any Google
group.
• Google Workspace domain: A Google Workspace domain represents a virtual group of all the Google Accounts that have been created in an
organization's Google Workspace account. Google Workspace domains represent your organization's internet domain name (such as example.com), and
when you add a user to your Google Workspace domain, a new Google Account is created for the user inside this virtual group (such as
[email protected]). Like Google Groups, Google Workspace domains cannot be used to establish identity, but they enable convenient permission
management.
• Cloud Identity domain: A Cloud Identity domain is like a Google Workspace domain because it represents a virtual group of all Google Accounts in an
organization. However, Cloud Identity domain users don't have access to Google Workspace applications and features.
• All authenticated users: The value allAuthenticatedUsers is a special identifier that represents all service accounts and all users on the internet who
have authenticated with a Google Account. This identifier includes accounts that aren't connected to a Google Workspace or Cloud Identity domain,
such as personal Gmail accounts. Users who aren't authenticated, such as anonymous visitors, aren't included. Some resource types do not support this
member type.
• All users: The value allUsers is a special identifier that represents anyone who is on the internet, including authenticated and unauthenticated users.
Some resource types do not support this member type.
• Service accounts: Google says "For almost all cases whether you are developing locally or in a production, you should use service accounts rather
than user accounts or api keys". Service accounts are meant to be used by services and code to leverage gcp api in an authorized manner. Some are
maintained by google, others can be built by us. You need to specify to your code explicitly where your service account keys are, else it will choose
default viz google's own service acc.
• Service accounts are associated with private/public RSA key-pairs that are used for authentication to Google.
• Service accounts do not have passwords
• Note that service accounts can be thought of as both a resource and as an identity.
○ When thinking of the service account as an identity, you can grant a role to a service account, allowing it to access a resource (such as a
project).
○ When thinking of a service account as a resource, you can grant roles to other users to access or manage that service account.
• Service Account User (roles/iam.serviceAccountUser): Allows members to indirectly access all the resources that the service account can access. For
example, if a member has the Service Account User role on a service account, and the service account has the Cloud SQL Admin role
(roles/cloudsql.admin) on the project, then the member can impersonate the service account to create a Cloud SQL instance.
• When granted together with roles/compute.instanceAdmin.v1, roles/iam.serviceAccountUser gives members the ability to create and manage
instances that use a service account. Specifically, granting roles/iam.serviceAccountUser and roles/compute.instanceAdmin.v1 together gives members
permission to:
• Create an instance that runs as a service account.
• Attach a persistent disk to an instance that runs as a service account.
• Set instance metadata on an instance that runs as a service account.
• Use SSH to connect to an instance that runs as a service account.
• Reconfigure an instance to run as a service account.
• You can grant roles/iam.serviceAccountUser one of two ways:
• Recommended. Grant the role to a member on a specific service account. This gives a member access to the service account for which they are an
iam.serviceAccountUser but prevents access to other service accounts for which the member is not an iam.serviceAccountUser.
• Grant the role to a member on the project level. The member has access to all service accounts in the project, including service accounts that are
created in the future.
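• For example, a sketch with placeholder names showing the recommended per-service-account grant versus the broader project-level grant:
gcloud iam service-accounts add-iam-policy-binding my-sa@my-project.iam.gserviceaccount.com --member="user:[email protected]" --role="roles/iam.serviceAccountUser"
gcloud projects add-iam-policy-binding my-project --member="user:[email protected]" --role="roles/iam.serviceAccountUser"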
Google Cloud Client Libraries will automatically find and use the service account credentials.
Accessing private data on behalf of a service account outside Google Cloud environments (service account key): You need to create a service account and
download its private key as a JSON file. You then pass the file to the Google Cloud Client Libraries so they can generate the service account credentials at
runtime; the libraries find and use these credentials by reading the GOOGLE_APPLICATION_CREDENTIALS environment variable.
• Automatic:
• If your application runs inside a Google Cloud environment that has a default service account, your application can retrieve the service account
credentials to call Google Cloud APIs.
• We recommend using this strategy because it is more convenient and secure than manually passing credentials.
• Additionally, we recommend you use Google Cloud Client Libraries for your application. Google Cloud Client Libraries use a library called
Application Default Credentials (ADC) to automatically find your service account credentials.
• ADC looks for the credentials in the following order:
• If GOOGLE_APPLICATION_CREDENTIALS is set as an environment variable, ADC uses the key file it points to.
GCP Page 33
• If GOOGLE_APPLICATION_CREDENTIALS is NOT set, ADC uses the service account attached to the resource running your code.
• If neither is available, ADC falls back to the default service account for that service.
• Manual:
• Create a service account using 'gcloud iam service-accounts create <acc-name>'
• Grant permissions using 'gcloud projects add-iam-policy-binding <project id> --member="serviceAccount:<acc-name>@<project id>.iam.gserviceaccount.com" --
role="roles/owner"'
• Generate the key file (in JSON) using 'gcloud iam service-accounts keys create <file name>.json --iam-account=<acc-name>@<project
id>.iam.gserviceaccount.com'
• Pass this JSON file to the env variable GOOGLE_APPLICATION_CREDENTIALS using export, i.e. export GOOGLE_APPLICATION_CREDENTIALS="<path to
file>/key.json"
• To make gcloud itself use the service account (and to verify that the key works), execute gcloud auth activate-service-account --key-file=[KEY_FILE] with the
JSON key you created.
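• A quick check after activating the key (sketch; key.json and the account email are placeholders):
gcloud auth activate-service-account --key-file=key.json
gcloud auth list
gcloud config set account <acc-name>@<project id>.iam.gserviceaccount.com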
APP Engine
• Roles/appengine.appAdmin (App Engine Admin): Read/Write/Modify access to all application configuration and settings. Lowest resource: Project.
• Roles/appengine.appViewer (App Engine Viewer): Read-only access to all application configuration and settings (not deployed source code). Lowest resource: Project.
• Roles/appengine.codeViewer (App Engine Code Viewer): Read-only access to all application configuration, settings, and deployed source code. This is the only
App Engine predefined role that grants access to view the source code (not even Admin can). Lowest resource: Project.
• Roles/appengine.deployer (App Engine Deployer): Read-only access to all application configuration and settings. Write access only to create a new version;
cannot modify existing versions other than deleting versions that are not receiving traffic. Lowest resource: Project.
Note: The App Engine Deployer (roles/appengine.deployer) role alone grants adequate permission to deploy using the App Engine Admin API. To use other
App Engine tooling, like gcloud commands, you must also have the Compute Storage Admin (roles/compute.storageAdmin) and Cloud Build Editor
(roles/cloudbuild.builds.editor) roles.
• Roles/appengine.serviceAdmin (App Engine Service Admin): Read-only access to all application configuration and settings. Write access to module-level and
version-level settings. Cannot deploy a new version. Lowest resource: Project.
• Roles/appengine.appCreator (App Engine Creator): Ability to create the App Engine resource for the project. Lowest resource: Project.
All the above roles can, on their own, deploy to App Engine via the API; if you want to use the gcloud tool to deploy, you must also add the Compute Storage Admin
(roles/compute.storageAdmin) and Cloud Build Editor (roles/cloudbuild.builds.editor) roles.
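For example, a rough sketch (my-project and [email protected] are placeholders) of granting a member everything needed to deploy with gcloud:
gcloud projects add-iam-policy-binding my-project --member="user:[email protected]" --role="roles/appengine.deployer"
gcloud projects add-iam-policy-binding my-project --member="user:[email protected]" --role="roles/compute.storageAdmin"
gcloud projects add-iam-policy-binding my-project --member="user:[email protected]" --role="roles/cloudbuild.builds.editor"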
Compute Engine
• Roles/compute.admin (Compute Admin): Full control of all Compute Engine resources. If the user will be managing virtual machine instances that are
configured to run as a service account, you must also grant the roles/iam.serviceAccountUser role.
• Roles/compute.imageUser (Compute Image User): Permission to list and read images without having other permissions on the image. Granting the
compute.imageUser role at the project level gives users the ability to list all images in the project and create resources, such as instances and persistent disks,
based on images in the project.
• Roles/compute.instanceAdmin (Compute Instance Admin (beta)): Permissions to create, modify, and delete virtual machine instances. This includes
permissions to create, modify, and delete disks, and also to configure Shielded VM (beta) settings. If the user will be managing virtual machine instances that
are configured to run as a service account, you must also grant the roles/iam.serviceAccountUser role. For example, if your company has someone who
manages groups of virtual machine instances but does not manage network or security settings and does not manage instances that run as service accounts,
you can grant this role on the organization, folder, or project that contains the instances, or you can grant it on individual instances.
• Roles/compute.instanceAdmin.v1 (Compute Instance Admin (v1)): Full control of Compute Engine instances, instance groups, disks, snapshots, and images.
Read access to all Compute Engine networking resources. If you grant a user this role only at an instance level, then that user cannot create new instances.
• Roles/compute.viewer (Compute Viewer): Read-only access to get and list Compute Engine resources, without being able to read the data stored on them.
For example, an account with this role could inventory all of the disks in a project, but it could not read any of the data on those disks.
LoadBalancer Roles
• Roles/compute.loadBalancerAdmin (Compute Load Balancer Admin): Permissions to create, modify, and delete load balancers and associated resources.
For example, if your company has a load balancing team that manages load balancers, SSL certificates for load balancers, SSL policies, and other load
balancing resources, and a separate networking team that manages the rest of the networking resources, then grant the load balancing team's group the
loadBalancerAdmin role.
Networking roles
• Roles/compute.networkAdmin (Compute Network Admin): Permissions to create, modify, and delete networking resources, except for firewall rules and SSL
certificates. The network admin role allows read-only access to firewall rules, SSL certificates, and instances (to view their ephemeral IP addresses). The
network admin role does not allow a user to create, start, stop, or delete instances. For example, if your company has a security team that manages firewalls
and SSL certificates and a networking team that manages the rest of the networking resources, then grant the networking team's group the networkAdmin role.
• Roles/compute.networkUser (Compute Network User): Provides access to a shared VPC network. Once granted, service owners can use VPC networks and
subnets that belong to the host project. For example, a network user can create a VM instance that belongs to a host project network but they cannot delete or
create new networks in the host project.
• Roles/compute.networkViewer (Compute Network Viewer): Read-only access to all networking resources. For example, if you have software that inspects
your network configuration, you could grant that software's service account the networkViewer role.
Security Roles
• Roles/compute.orgSecurityPolicyAdmin (Compute Org Security Policy Admin): Full control of Compute Engine Organization Security Policies.
• Roles/compute.orgSecurityPolicyUser (Compute Org Security Policy User): View or use Compute Engine Security Policies to associate with the organization
or folders.
SSH Login roles
• Roles/compute.osAdminLogin (Compute OS Admin Login): Access to log in to a Compute Engine instance as an administrator user.
• Roles/compute.osLogin (Compute OS Login): Access to log in to a Compute Engine instance as a standard user.
• Roles/compute.osLoginExternalUser (Compute OS Login External User): Available only at the organization level. Access for an external user to set OS Login
information associated with this organization. This role does not grant access to instances. External users must be granted one of the required OS Login roles
in order to allow access to instances using SSH.
Packet mirroring
• Roles/compute.packetMirroringAdmin (Compute Packet Mirroring Admin): Specify resources to be mirrored.
• Roles/compute.packetMirroringUser (Compute Packet Mirroring User): Use Compute Engine packet mirrorings.
Firewall rules
• Roles/compute.securityAdmin (Compute Security Admin): Permissions to create, modify, and delete firewall rules and SSL certificates, and also to configure
Shielded VM (beta) settings. For example, if your company has a security team that manages firewalls and SSL certificates and a networking team that
manages the rest of the networking resources, then grant the security team's group the securityAdmin role.
Shared VPC role
• Roles/compute.xpnAdmin (Compute Shared VPC Admin): Permissions to administer shared VPC host projects, specifically enabling the host projects and
associating shared VPC service projects to the host project's network. This role can only be granted on the organization by an organization admin.
Google Cloud recommends that the Shared VPC Admin be the owner of the shared VPC host project. The Shared VPC Admin is responsible for granting the
compute.networkUser role to service owners, and the shared VPC host project owner controls the project itself. Managing the project is easier if a single
principal (individual or group) can fulfill both roles.
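For example, a sketch (placeholder names) of the common pairing described above, Compute Instance Admin (v1) plus Service Account User on the VM's service account:
gcloud projects add-iam-policy-binding my-project --member="user:[email protected]" --role="roles/compute.instanceAdmin.v1"
gcloud iam service-accounts add-iam-policy-binding my-sa@my-project.iam.gserviceaccount.com --member="user:[email protected]" --role="roles/iam.serviceAccountUser"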
GCP Page 36
BigQuery roles
• roles/bigquery.dataEditor (BigQuery Data Editor), entry continued from the previous page: This role cannot be applied to individual models or routines. When
applied to a dataset, it provides permissions to read the dataset's metadata and list tables in the dataset, and to create, update, get, and delete the dataset's
tables. When applied at the project or organization level, this role can also create new datasets. Permissions (tail of the list): bigquery.tables.updateData,
bigquery.tables.updateTag, resourcemanager.projects.get, resourcemanager.projects.list.
• roles/bigquery.dataOwner (BigQuery Data Owner), lowest resource: table or view. When applied to a table or view, this role provides permissions to: read and
update data and metadata for the table or view; share the table or view; delete the table or view. This role cannot be applied to individual models or routines.
When applied to a dataset, it provides permissions to: read, update, and delete the dataset; create, update, get, and delete the dataset's tables. When applied
at the project or organization level, this role can also create new datasets. Permissions: bigquery.datasets.*, bigquery.models.*, bigquery.routines.*,
bigquery.tables.*, resourcemanager.projects.get, resourcemanager.projects.list.
• roles/bigquery.dataViewer (BigQuery Data Viewer), lowest resource: table or view. When applied to a table or view, this role provides permissions to read data
and metadata from the table or view. This role cannot be applied to individual models or routines. When applied to a dataset, it provides permissions to: read
the dataset's metadata and list tables in the dataset; read data and metadata from the dataset's tables. When applied at the project or organization level, this
role can also enumerate all datasets in the project. Additional roles, however, are necessary to allow the running of jobs. Permissions: bigquery.datasets.get,
bigquery.datasets.getIamPolicy, bigquery.models.export, bigquery.models.getData, bigquery.models.getMetadata, bigquery.models.list, bigquery.routines.get,
bigquery.routines.list, bigquery.tables.export, bigquery.tables.get, bigquery.tables.getData, bigquery.tables.getIamPolicy, bigquery.tables.list,
resourcemanager.projects.get, resourcemanager.projects.list.
• roles/bigquery.jobUser (BigQuery Job User), lowest resource: project. Provides permissions to run jobs, including queries, within the project. Permissions:
bigquery.jobs.create, resourcemanager.projects.get, resourcemanager.projects.list.
• roles/bigquery.metadataViewer (BigQuery Metadata Viewer), lowest resource: table or view. When applied to a table or view, this role provides permissions to
read metadata from the table or view. This role cannot be applied to individual models or routines. When applied to a dataset, it provides permissions to: list
tables and views in the dataset; read metadata from the dataset's tables and views. When applied at the project or organization level, it provides permissions
to: list all datasets and read metadata for all datasets in the project; list all tables and views and read metadata for all tables and views in the project.
Additional roles are necessary to allow the running of jobs. Permissions: bigquery.datasets.get, bigquery.datasets.getIamPolicy, bigquery.models.getMetadata,
bigquery.models.list, bigquery.routines.get, bigquery.routines.list, bigquery.tables.get, bigquery.tables.getIamPolicy, bigquery.tables.list,
resourcemanager.projects.get, resourcemanager.projects.list.
• roles/bigquery.readSessionUser (BigQuery Read Session User): Access to create and use read sessions. Permissions: bigquery.readsessions.*,
resourcemanager.projects.get, resourcemanager.projects.list.
• roles/bigquery.resourceAdmin (BigQuery Resource Admin): Administer all BigQuery resources. Permissions: bigquery.bireservations.*,
bigquery.capacityCommitments.*, bigquery.jobs.get, bigquery.jobs.list, bigquery.jobs.listAll, bigquery.reservationAssignments.*, bigquery.reservations.*,
resourcemanager.projects.get, resourcemanager.projects.list.
• roles/bigquery.resourceEditor (BigQuery Resource Editor): Manage all BigQuery resources, but cannot make purchasing decisions. Permissions:
bigquery.bireservations.get, bigquery.capacityCommitments.get, bigquery.capacityCommitments.list, bigquery.jobs.get, bigquery.jobs.list, bigquery.jobs.listAll,
bigquery.reservationAssignments.*, bigquery.reservations.*, resourcemanager.projects.get, resourcemanager.projects.list.
• roles/bigquery.resourceViewer (BigQuery Resource Viewer): View all BigQuery resources but cannot make changes or purchasing decisions. Permissions:
bigquery.bireservations.get, bigquery.capacityCommitments.get, bigquery.capacityCommitments.list, bigquery.jobs.get, bigquery.jobs.list, bigquery.jobs.listAll,
bigquery.reservationAssignments.list, bigquery.reservationAssignments.search, bigquery.reservations.get, bigquery.reservations.list,
resourcemanager.projects.get, resourcemanager.projects.list.
• roles/bigquery.user (BigQuery User), lowest resource: dataset. When applied to a dataset, this role provides the ability to read the dataset's metadata and list
tables in the dataset. When applied to a project, it also provides the ability to run jobs, including queries, within the project. A member with this role can
enumerate their own jobs, cancel their own jobs, and enumerate datasets within a project. Additionally, it allows the creation of new datasets within the project;
the creator is granted the BigQuery Data Owner role (roles/bigquery.dataOwner) on these new datasets. Permissions: bigquery.bireservations.get,
bigquery.capacityCommitments.get, bigquery.capacityCommitments.list, bigquery.config.get, bigquery.datasets.create, bigquery.datasets.get,
bigquery.datasets.getIamPolicy, bigquery.jobs.create, bigquery.jobs.list, bigquery.models.list, bigquery.readsessions.*, bigquery.reservationAssignments.list,
bigquery.reservationAssignments.search, bigquery.reservations.get, bigquery.reservations.list, bigquery.routines.list, bigquery.savedqueries.get,
bigquery.savedqueries.list, bigquery.tables.list, bigquery.transfers.get, resourcemanager.projects.get, resourcemanager.projects.list.
GCP Page 40
Cloud Storage roles
• roles/storage.hmacKeyAdmin (Storage HMAC Key Admin): Full control over HMAC keys in a project. This role can only be applied to a project. Permissions:
storage.hmacKeys.*
• roles/storage.admin (Storage Admin): Grants full control of buckets and objects. When applied to an individual bucket, control applies only to the specified
bucket and objects within the bucket. Permissions: firebase.projects.get, resourcemanager.projects.get, resourcemanager.projects.list, storage.buckets.*,
storage.objects.*
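For example, a sketch (placeholder names) of granting an analyst read access plus the ability to run queries at the project level; dataset-level grants are managed
through the dataset's access controls (console or the bq tool) rather than gcloud:
gcloud projects add-iam-policy-binding my-project --member="user:[email protected]" --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding my-project --member="user:[email protected]" --role="roles/bigquery.jobUser"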
GCP Page 41
IAM Best practices and scenarios
Tuesday, February 9, 2021 5:52 PM
Best practices
• Mirror your Google Cloud resource hierarchy structure to your organization structure. The
Google Cloud resource hierarchy should reflect how your company is organized, whether
it's a startup, an SME, or a large corporation. A startup may start out with a flat resource
hierarchy with no organization resource. When more people start collaborating on
projects and the number of projects increase, getting an organization resource might
make sense. An organization resource is recommended for larger companies with multiple
departments and teams where each team is responsible for their own set of applications
and services.
• Use projects to group resources that share the same trust boundary. For example,
resources for the same product or microservice can belong to the same project.
• Set policies at the organization level and at the project level rather than at the resource
level. As new resources are added, you may want them to automatically inherit policies
from their parent resource. For example, as new virtual machines are added to the project
through auto-scaling, they automatically inherit the policy on the project.
For more information about how to set policies, see Granting, changing, and revoking
access.
• Grant roles to a Google group instead of to individual users when possible. It is easier to
manage members in a Google group than to update an IAM policy. Make sure to control
the ownership of the Google group used in IAM policies.
For more information about how to manage Google groups, see Google Groups help.
• Use the security principle of least privilege to grant IAM roles; that is, only give the least
amount of access necessary to your resources.
To find the appropriate predefined role, see the predefined roles reference. If there are
no appropriate predefined roles, you can also create your own custom roles.
• Grant roles at the smallest scope needed. For example, if a user only needs access to
publish messages to a Pub/Sub topic, grant the Publisher role to the user for that topic
only (see the command sketch after this list).
• Remember that the policies for child resources inherit from the policies for their parent
resources. For example, if the policy for a project grants a user the ability to administer
Compute Engine virtual machine (VM) instances, then the user can administer any
Compute Engine VM in that project, regardless of the policy you set on each VM.
• If you need to grant a role to a user or group that spans across multiple projects, set that
role at the folder level instead of setting it at the project level.
• Use labels to annotate, group, and filter resources.
GCP Page 42
• Audit your policies to ensure compliance. Audit logs contain all setIamPolicy() calls,
so you can trace when a policy has been created or modified.
• Audit the ownership and the membership of the Google groups used in policies.
• If you want to limit project creation in your organization, change the organization access
policy to grant the Project Creator role to a group that you manage.
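• Command sketch for the smallest-scope example above (my-topic and [email protected] are placeholders):
gcloud pubsub topics add-iam-policy-binding my-topic --member="user:[email protected]" --role="roles/pubsub.publisher"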
GCP Page 43
• The central IT team must be able to:
○ Associate projects with billing accounts.
○ Turn off billing for projects.
○ View the credit card information.
○ They must not have permissions to view the project contents.
○ Developers should be able to view the actual costs of the Google Cloud resources
being consumed, but shouldn't be able to turn billing off, associate billing with
projects, and view the credit card information.
• IT Dept: Grant them the Billing Account Administrator role. They can associate projects
with billing, turn off billing on projects, and view credit card info, but cannot view the projects' contents.
• Service account: Create a service account that is used for automating project creation.
Grant it the Billing Account User role to enable billing on projects.
"role": "roles/billing.user",
"members": [
serviceAccount:[email protected] ]
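A rough gcloud equivalent (the billing account ID and service account email are placeholders; the billing commands live in the beta component):
gcloud beta billing accounts add-iam-policy-binding 0X0X0X-0X0X0X-0X0X0X --member="serviceAccount:project-creator@my-project.iam.gserviceaccount.com" --role="roles/billing.user"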
• Developers of project: Grant them the Viewer role to allow the developers to view the
expenses for the projects they own.
• Cost aggregation
• In this scenario, a company wants to calculate and keep track of how much each team,
department, service, or project is costing them. For example, keeping track of how much
a test deployment costs them each month.
• This can be tracked by using the following practices:
• Use projects to organize resources. Cost is shown per project and project IDs are included
in billing export.
• Annotate projects with labels that represent additional grouping information. For
example, environment=test. Labels are included in billing export to allow you to slice and
dice further. However, labels on a project are permissioned the same way as the rest of
the project's metadata which means a project owner can change labels. You can educate
your employees about what not to change and then monitor (through audit logs), or grant
them only granular permissions so they can't change project metadata.
• You can export to JSON and CSV, but exporting directly to BigQuery is the solution we
GCP Page 44
recommend. This is easily configurable from the billing export section of the billing
console.
• If each cost center must pay a separate invoice or pay in a separate currency for some
workloads, then a separate billing account for each cost center is required. However this
approach would require an affiliate agreement signed for each billing account
GCP Page 45
roles/compute.instanceAdmin (at service project level)
• For this scenario you need three separate IAM policies: one for the organization, one for
GCP Page 46
the host project, and one for the service projects.
• The first IAM policy, which needs to be attached at the organization level, grants the
network team the roles they need to administer shared VPC host projects and to manage
all network resources. This includes the ability to associate service projects with the host
project. The network admin role also grants the network team the ability to view but not
modify firewall rules. It also grants the security team the ability to set IAM policies and
manage firewall rules and SSL certificates in all projects in the organization.
○ Network Team: Grant them the xpn (share vpc) admin role, network admin role to
create and share vpc
role: roles/compute.xpnAdmin, roles/compute.networkAdmin
○ Security Team: Grant security admin and resource manager roles to set iam policies
and manage firewall rules and SSL certs
role: roles/compute.securityAdmin, roles/resourcemanager.organizationAdmin
• The second IAM policy is attached at the host project level, and enables the developers to use the
shared VPC networks and subnets
○ Developers: roles/compute.networkUser
• The 3rd IAM policy is at each service project level, which allows developers using the
project to manage instances in the service project and to use shared subnets in the host
project.
• You could place all service projects in a folder and set this particular policy at that level of
the hierarchy. This would allow all projects created in that folder to inherit the
permissions set on the folder.
○ Developers: roles/compute.networkUser, roles/compute.instanceAdmin
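• A command sketch of the Shared VPC setup itself (host-project, service-project, subnet, and group names are placeholders):
gcloud compute shared-vpc enable host-project
gcloud compute shared-vpc associated-projects add service-project --host-project=host-project
gcloud compute networks subnets add-iam-policy-binding shared-subnet --region=us-east1 --member="group:[email protected]" --role="roles/compute.networkUser"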
GCP Page 47
• This requires an IAM policy bound at each team's allocated folder.
○ DevteamLeads01: roles/resourcemanager.folderAdmin,
roles/resourcemanager.projectCreator
○ Security team: roles/compute.securityAdmin, roles/compute.networkAdmin,
roles/compute.instanceAdmin
○ Devteam: roles/compute.instanceAdmin, roles/bigquery.admin
GCP Page 48
SSH to Compute engine
Thursday, January 28, 2021 2:18 AM
To check ssh-keys in metadata from within an instance you can run: curl -H "Metadata-Flavor: Google"
metadata.google.internal/computeMetadata/v1/project/attributes/ssh-keys (project-wide keys) or
metadata.google.internal/computeMetadata/v1/instance/attributes/ssh-keys (instance-level keys)
GCP Page 49
• iam.serviceAccounts.actAs on the project if setting project-wide metadata
1. OS login method:
a. Enabling/disabling os login:
i. Gcloud: Enable os-login either at the project level to all instances or at each instance level using either
gcloud compute project-info add-metadata \ --metadata enable-oslogin=TRUE or gcloud compute instances
add-metadata <vm name> \ --metadata enable-oslogin=TRUE
ii. Console: In Compute Engine -> Metadata -> edit/custom metadata -> set key enable-oslogin to TRUE to enable or FALSE to
disable. Do this in the project's Compute Engine metadata for a project-wide effect, or in an individual instance's custom
metadata for an instance-level effect.
iii. Alternatively you can also create instances with enable-oslogin true, by adding it in the create vm page or by
gcloud compute instances create <vm name> \ --metadata enable-oslogin=TRUE
b. Add IAM roles to the user you want OS Login to accept, else they won't get the correct SSH access
i. Grant one of the following instance access roles in IAM->User name.
1. roles/compute.osLogin, which doesn't grant administrator permissions
2. roles/compute.osAdminLogin, which grants administrator permissions
ii. If your VM instance uses a service account, then each user must be configured to have the
roles/iam.serviceAccountUser role on the service account. This is useful if your app (in GCE) requires ssh
access. For outside apps manually configure the service account for it.
iii. For users that are outside of your organization to access your VMs, in addition to granting an instance access
role, grant the roles/compute.osLoginExternalUser role. This role must be granted at the organization level
by an organization administrator.
c. Create ssh keys associated with that username: You need to create and provide your ssh public keys to gcloud so
that it can use those to allow your username to login
i. Open a terminal on your workstation and use the ssh-keygen command to generate a new key. Specify the -C
flag to add a comment with your username.
• ssh-keygen -t rsa -f ~/.ssh/<KEY_FILENAME> -C <USERNAME>
• where: <KEY_FILENAME> is the name that you want to use for your SSH key files. For example, a
filename of my-ssh-key generates a private key file named my-ssh-key and a public key file named my-
ssh-key.pub. <USERNAME> is the username for the user connecting to the instance.
ii. This command generates a private SSH key file and a matching public SSH key (at
~/.ssh/<KEY_FILENAME>.pub) with the following structure: ssh-rsa <KEY_VALUE> <USERNAME> where:
<KEY_VALUE> is the key value generated by ssh-keygen. It is a long string of characters. <USERNAME> is the
user that created the key. You can modify this value to be more descriptive.
iii. Restrict access to your private key so that only you can read it and nobody can write to it.
• chmod 400 ~/.ssh/<KEY_FILENAME> where <KEY_FILENAME> is the name that you used for your SSH
key files.
iv. Repeat this process for every user who needs a new key. Before uploading to GCP, locate the
public SSH keys that you made and any existing public SSH keys that you want to add to a project or instance.
v. Add the public key to your gcloud using gcloud compute os-login ssh-keys add --key-file .ssh/id_rsa.pub and
do the same for all usernames
d. SSH from external pc using command: ssh -i <path to private key> username@external_ip
1. The username can be their Google identity (the account they use to log in to GCP)
GCP Page 50
<USERNAME>
b. Inside GCP add your ssh public key by going inside the console->Compute engine->Metadata->ssh keys->edit->add keys and
adding the public key
c. This can be done at the project or instance level
d. SSH from external pc using command: ssh -i <path to private key> username@external_ip
• To add or remove instance-level public SSH keys with the gcloud tool:
• If your instance already has instance-level public SSH keys, get those public SSH keys from metadata:
• Get the existing metadata for the instance:
○ gcloud compute instances describe [INSTANCE_NAME]
• From the output, find the ssh-keys metadata value:
○ Note: The ssh-keys metadata value doesn't appear if your instance doesn't have existing instance-level public SSH keys.
○ ...
○ metadata:
○ fingerprint: QCofVTHlggs=
○ items:
○ ...
○ - key: ssh-keys
○ value: |-
○ [USERNAME_1]:ssh-rsa [EXISTING_KEY_VALUE_1] [USERNAME_1]
○ [USERNAME_2]:ssh-rsa [EXISTING_KEY_VALUE_2] [USERNAME_2]
○ ...
○ where:
○ [USERNAME_1] and [USERNAME_2] are the usernames for your existing keys.
○ [EXISTING_KEY_VALUE_1] and [EXISTING_KEY_VALUE_2] are public key values that are already applied to your instance.
○ Note: If a public SSH key has an expiration time, that key has a slightly different format than the keys in this example.
• Copy the public SSH keys under the ssh-keys metadata value.
• Create and open a new text file on your local workstation.
• In the file, create a list of all of the public SSH keys that you want to add or keep in instance-level metadata. If you currently have
public SSH keys in instance-level metadata, any keys that you don't include in your list are removed.
○ For example, the sample list below removes the key for [USERNAME_1] because their SSH key is omitted. It also keeps the
SSH key for [USERNAME_2] and adds the SSH key for [USERNAME_3] because their SSH keys are included in the list.
○ [USERNAME_2]:ssh-rsa [EXISTING_KEY_VALUE_2] [USERNAME_2]
○ [USERNAME_3]:ssh-rsa [NEW_KEY_VALUE] [USERNAME_3]
○ where:
○ [USERNAME_1],[USERNAME_2], and [USERNAME_3] are the usernames of the public SSH keys.
GCP Page 51
○ [EXISTING_KEY_VALUE_1] is a public key value for an SSH key that you want to remove.
○ [EXISTING_KEY_VALUE_2] is a public key value for an SSH key that you want to keep.
○ [NEW_KEY_VALUE] is a public key value for an SSH key that you want to add.
• Save and close the file.
• In the command prompt, use the compute instances add-metadata command to set the instance-only ssh-key value. Include the --
metadata-from-file flag and specify the path to the public key file list that you made.
○ gcloud compute instances add-metadata [INSTANCE_NAME] --metadata-from-file ssh-keys=[LIST_PATH]
○ where:
○ [INSTANCE_NAME] is the name of the instance where you want to apply the public SSH key file.
○ [LIST_PATH] is the path to your list of public SSH keys.
GCP Page 52
Compute engine
Friday, February 12, 2021 11:42 AM
• When choosing boot disk for a compute engine, you can choose from multiple options what the image attached to the
disk is:
Scenarios                         | Machine image | Persistent disk snapshot | Custom image | Instance template
Single disk backup                | Yes           | Yes                      | Yes          | No
Multiple disk backup              | Yes           | No                       | No           | No
Differential backup               | Yes           | Yes                      | No           | No
Instance cloning and replication  | Yes           | No                       | Yes          | Yes
VM instance configuration         | Yes           | No                       | No           | Yes
○ Machine Image: A machine image is a Compute Engine resource that stores all the configuration, metadata,
permissions, and data from one or more disks required to create a virtual machine (VM) instance. You can use a
machine image in many system maintenance scenarios, such as instance creation, backup and recovery, and
instance cloning. You have to create the instance from within this image, not the GCE menu.
○ Persistent disk snapshot: Create snapshots to periodically back up data from your zonal persistent disks or regional
persistent disks. You can create snapshots from disks even while they are attached to running instances. Snapshots
are global resources, so you can use them to restore data to a new disk or instance within the same project. You
can also share snapshots across projects.
○ Instance templates: Instance templates define the machine type, boot disk image or container image, labels, and
other instance properties. You can then use an instance template to create a MIG or to create individual VMs.
Instance templates are a convenient way to save a VM instance's configuration so you can use it later to create
VMs or groups of VMs.
○ Custom Images: A custom image is a boot disk image that you own and control access to. You can clone images
from other GCE images, your own on-prem images. You can copy an image into another, by using the first image as
the source for the 2nd image.
○ Public images: Standard linux and windows OS images provided by gcp to everyone.
• A VM instance’s availability policy determines how it behaves when an event occurs that requires Google to move your
VM to a different host machine. For example, you can choose to keep your VM instances running while Compute Engine
live migrates them to another host or you can choose to terminate your instances instead.
○ You can update an instance’s availability policy at any time to control how you want your VM instances to behave.
○ You can change an instance’s availability policy by configuring the following two settings:
▪ The VM instance’s maintenance behavior, which determines whether the instance is live migrated or
terminated when there is a maintenance event.
▪ The instance’s restart behavior, which determines whether the instance automatically restarts if it crashes or
gets terminated.
○ The default maintenance behavior for instances is to live migrate, but you can change the behavior to terminate
your instance during maintenance events instead.
○ Configure an instance’s maintenance behavior and automatic restart setting using the onHostMaintenance and
automaticRestart properties.
○ All instances are configured with default values unless you explicitly specify otherwise.
▪ onHostMaintenance: Determines the behavior when a maintenance event occurs that might cause your
instance to reboot.
• [Default] migrate, which causes Compute Engine to live migrate an instance when there is a
maintenance event.
• terminate, which terminates an instance instead of migrating it.
▪ automaticRestart: Determines the behavior when an instance crashes or is terminated by the system.
GCP Page 53
• [Default] true, so Compute Engine restarts an instance if the instance crashes or is terminated.
• false, so Compute Engine does not restart an instance if the instance crashes or is terminated.
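○ For example, a sketch (my-vm and the zone are placeholders) of changing both settings on an existing instance:
gcloud compute instances set-scheduling my-vm --zone=us-central1-a --maintenance-policy=TERMINATE --no-restart-on-failure
gcloud compute instances set-scheduling my-vm --zone=us-central1-a --maintenance-policy=MIGRATE --restart-on-failure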
• Snapshots and images:
○ An image is a complete backup of your server including all volumes (and OS).
▪ A snapshot can be done from a specific volume (for example you have a server with a volume containing the
OS and another one containing the application data, and you want to use different snapshot strategies on
both volumes) but mostly not OS.
○ You need to have a persistent disk before you can create a snapshot or an image.
○ Create snapshots to periodically back up data from your zonal persistent disks or regional persistent disks.
○ You can create snapshots from disks even while they are attached to running instances. Snapshots are global
resources, so you can use them to restore data to a new disk or instance within the same project. You can also
share snapshots across projects.
○ If you create a snapshot of your persistent disk while your application is running, the snapshot might not capture
pending writes that are in transit from memory to disk. Because of these inconsistencies, the snapshot might not
reflect the exact state of your application at the time you captured the snapshot.
○ You cannot edit a snapshot schedule. To change a schedule that is already attached to a disk, you must first detach
the schedule from the disk and delete it. Then you can create a new schedule and attach it to the disk (see the command sketch after this list).
○ If you delete a snapshot schedule, all auto-generated snapshots associated with the snapshot schedule are kept
permanently. However, after the schedule is deleted, it can no longer generate snapshots.
○ You can create disk images from the following sources:
▪ A persistent disk, even while that disk is attached to an instance
▪ A snapshot of a persistent disk
▪ Another image in your project
▪ An image that is shared from another project
▪ A compressed RAW image in Cloud Storage
GCP Page 54
○ You can revert a deprecation (make an image active again), by changing the deprecation state to ACTIVE.
○ You can only delete custom images that you, or someone who has access to the project, have added. Use the
Google Cloud Console, gcloud command-line tool, or the Compute Engine API method to delete the image.
▪ gcloud compute images delete <IMAGE_NAME>
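○ A command sketch for the snapshot-schedule and image workflows above (disk, image, schedule names, region, and zone are placeholders):
gcloud compute resource-policies create snapshot-schedule daily-backup --region=us-central1 --start-time=04:00 --daily-schedule --max-retention-days=14
gcloud compute disks add-resource-policies my-disk --zone=us-central1-a --resource-policies=daily-backup
gcloud compute images create my-image --source-disk=my-disk --source-disk-zone=us-central1-a
gcloud compute images deprecate my-image --state=DEPRECATED
gcloud compute images deprecate my-image --state=ACTIVE
The last command reverts the deprecation, as described above.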
• When to use a machine image: The following table compares the use of machine images, persistent disk snapshots,
instance templates, and custom images.
Scenarios                         | Machine image | Persistent disk snapshot | Custom image | Instance template
Single disk backup                | Yes           | Yes                      | Yes          | No
Multiple disk backup              | Yes           | No                       | No           | No
Differential backup               | Yes           | Yes                      | No           | No
Instance cloning and replication  | Yes           | No                       | Yes          | Yes
VM instance configuration         | Yes           | No                       | No           | Yes
○ From the preceding table, you can see that machine images are the most ideal resources for the following use
cases:
▪ Disk backups
▪ Instance cloning and replication
• Instance templates are immutable i.e you can't edit an existing template. You can duplicate/copy it into another template and
then edit that.
• Instance groups: are used to group multiple instances (typically created from an instance template) into one logical unit.
• Managed instance groups allow autoscaling and are used for multiple similar instances.
• Unmanaged instance groups are for dissimilar instances and don't support autoscaling, though they do support load balancing.
An unmanaged instance group cannot be multi-zone.
• Note: With load balancing alone, you'll have to know ahead of time how much capacity you need so you can keep additional
instances running and registered with the load balancer to serve higher loads. Alternatively, you can stop worrying about it
and autoscale based on, say, CPU usage so that instances increase or decrease dynamically based on the load.
• Deleting a managed instance group will delete all the instances it had created, since it owns them. However deleting an
unmanaged instance group wont delete any instances.
• maxSurge specifies the maximum number of instances that can be created over the desired number of instances. If maxSurge
is set to 0, the rolling update cannot create additional instances and is forced to update existing instances resulting in a
reduction in capacity. Therefore, it does not satisfy our requirement to ensure that the available capacity does not decrease
during the deployment.
• maxUnavailable - specifies the maximum number of instances that can be unavailable during the update process. When
maxUnavailable is set to 1, the rolling update updates 1 instance at a time. i.e. it takes 1 instance out of service, updates it,
and puts it back into service. This option results in a reduction in capacity while the instance is out of service. Example - if we
have 10 instances in service, this combination of settings results in 1 instance at a time taken out of service for an upgrade
while the remaining 9 continue to serve live traffic. That's a reduction of 10% in available capacity and does not satisfy our
requirement to ensure that the available capacity does not decrease during the deployment.
• Manually setting the size of a MIG
• If a managed instance group is not already set to automatically scale, you can resize the group manually to change the
number of instances.
• If you increase the size, the managed instance group uses the current instance template to add new instances.
• If you decrease the size, the managed instance group deletes VMs from the group. The group deletes instances with a
currentAction of DELETING, CREATING, and RECREATING before it deletes instances that are running with no
scheduled actions.
• To remove an instance from the instance group without deleting the instance, use the abandon-instances command. Once
instances have been abandoned, the currentSize of the group is automatically reduced as well to reflect the change.
• Abandoning an instance does not reboot or delete the underlying virtual machine instances, but just removes the
instances from the instance group. If you would like to delete the underlying instances, use the delete-instances
command instead.
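• A sketch of the resize/abandon/delete commands for a zonal MIG (names and zone are placeholders):
gcloud compute instance-groups managed resize my-mig --size=5 --zone=us-central1-a
gcloud compute instance-groups managed abandon-instances my-mig --instances=my-vm-1 --zone=us-central1-a
gcloud compute instance-groups managed delete-instances my-mig --instances=my-vm-1 --zone=us-central1-a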
GCP Page 56
• Target based autoscaling:
○ Scaling based on CPU utilization: You can autoscale based on the average CPU utilization of a managed instance
group (MIG).
▪ gcloud compute instance-groups managed set-autoscaling example-managed-instance-group \
--max-num-replicas 20 \
--target-cpu-utilization 0.60 \
--cool-down-period 90
○ Scaling based on the serving capacity of an external HTTP(S) load balancer
▪ This is autoscaling in conjunction with loadbalancing.
▪ This means that autoscaling adds or removes VM instances in the group when the load balancer indicates
that the group has reached a configurable fraction of its fullness, where fullness is defined by the target
capacity of the selected balancing mode of the backend instance group.
▪ The load balancer offers 2 balancing modes: Utilization and Rate
• Utilization specifies the max target for avg backend utilization. Rate specifies target number of requests
per second on a per-instance or per-group basis
▪ Autoscaling does not work with maximum requests per group because this setting is independent of the
number of instances in the instance group. The load balancer continuously sends the maximum number of
requests per group to the instance group, regardless of how many instances are in the group.
○ Scaling based on Cloud Monitoring metrics
▪ Scale based on monitored metrics provided on a per-instance basis or a per-group basis and handled in
Cloud Monitoring (Cloud Logging log entries cannot be used directly)
▪ These can be cloud monitoring based or custom built metrics.
▪ Per-instance metrics: Per-instance metrics provide data for each VM in a MIG separately, indicating resource
utilization for each instance. When using per-instance metrics, the MIG cannot scale below a size of 1 VM
because the autoscaler requires metrics about at least one running VM in order to operate.
▪ Valid utilization metric for scaling meets the following criteria:
• The standard metric must contain data for a gce_instance monitored resource. You can use the
timeSeries.list API call to verify whether a specific metric exports data for this resource.
• The standard metric describes how busy an instance is, and the metric value increases or decreases
proportionally to the number of VMs in the group.
▪ Per-group metrics: Per-group metrics allow autoscaling with a standard or custom metric that does not
export per-instance utilization data. Instead, the group scales based on a value that applies to the whole
group and corresponds to how much work is available for the group or how busy the group is. The group
scales based on the fluctuation of that group metric value and the configuration that you define.
• Scaling based on schedules:
○ You can use schedule-based autoscaling to allocate capacity for anticipated loads.
○ For each scaling schedule, specify the following:
▪ Capacity: minimum required VM instances
▪ Schedule: start time, duration, and recurrence (for example, once, daily, weekly, or monthly)
○ Each scaling schedule is active from its start time and for the configured duration. During this time, autoscaler
scales the group to have at least as many instances as defined by the scaling schedule.
• Cool down period:
○ The cool down period is also known as the application initialization period. Compute Engine uses the cool down
period for scaling decisions in two ways:
○ To omit unusual usage data after a VM is created and while its application is initializing.
GCP Page 57
○ If predictive autoscaling is enabled, to inform the autoscaler how much time in advance to scale out ahead of
anticipated load, so that applications are initialized when the load arrives.
○ Specify a cool down period to let your instances finish initializing before the autoscaler begins collecting usage
information from them. By default, the cool down period is 60 seconds.
• Stabilization period
○ For the purposes of scaling in, the autoscaler calculates the group's recommended target size based on peak load
over the last 10 minutes. These last 10 minutes are referred to as the stabilization period.
○ Using the stabilization period, the autoscaler ensures that the recommended size for your managed instance group
is always sufficient to serve the peak load observed during the previous 10 minutes.
○ This 10-minute stabilization period might appear as a delay in scaling in, but it is actually a built-in feature of
autoscaling. The delay ensures that the smaller group size is enough to support peak load from the last 10 minutes
• Predictive autoscaling
○ If you enable predictive autoscaling to optimize your MIG for availability, the autoscaler forecasts future load
based on historical data and scales out a MIG in advance of predicted load, so that new instances are ready to
serve when the load arrives.
○ Predictive autoscaling works best if your workload meets the following criteria:
○ Your application takes a long time to initialize—for example, if you configure a cool down period of more than 2
minutes.
○ Your workload varies predictably with daily or weekly cycles.
• Scale-in controls
○ If your workloads take many minutes to initialize (for example, due to lengthy installation tasks), you can reduce
the risk of response latency caused by abrupt scale-in events by configuring scale-in controls. Specifically, if you
expect load spikes to follow soon after declines, you can limit the scale-in rate to prevent autoscaling from
reducing a MIG's size by more VM instances than your workload can tolerate.
○ To configure scale-in controls, set the following properties in your autoscaling policy.
○ Maximum allowed reduction: The number of VM instances that your workload can afford to lose (from its peak
size) within the specified trailing time window.
○ Trailing time window: Define how long the autoscaler should wait before removing instances, as defined by the
maximum allowed reduction. With a longer trailing time window, the autoscaler considers more historical peaks,
making scale-in more conservative and stable.
Updating a managed instance group:
• A managed instance group contains one or more virtual machine instances that are controlled using an instance
template.
• To update instances in a managed instance group, you can make update requests to the group as a whole, using the
Managed Instance Group Updater feature.
○ The Managed Instance Group Updater allows you to easily deploy new versions of software to instances in your
managed instance groups, while controlling the speed of deployment, the level of disruption to your service, and
the scope of the update.
○ The Updater offers two primary advantages:
▪ The rollout of an update happens automatically to your specifications, without the need for additional user
input after the initial request.
▪ You can perform partial rollouts which allows for canary testing.
○ By allowing new software to be deployed inside an existing managed instance group, there is no need for you to
reconfigure the instance group or reconnect load balancing, autoscaling, or autohealing each time new version of
software is rolled out.
○ Without the Updater, new software versions must be deployed either by creating a new managed instance group
GCP Page 58
with a new software version, requiring additional set up each time, or through a manual, user-initiated, instance-
by-instance recreate.
▪ Both of these approaches require significant manual steps throughout the process.
○ A rolling update is an update that is gradually applied to all instances in an instance group until all instances have
been updated. You can control various aspects of a rolling update, such as how many instances can be taken offline
for the update, how long to wait between updating instances, whether the update affects all or just a portion of
instances, and so on.
○ There is no explicit command for rolling back an update to a previous version, but if you decide to roll back an
update (either a fully committed update or a canary update), you can do so by making a new update request and
passing in the instance template that you want to roll back to.
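○ A rolling-update sketch (group, template names, and zone are placeholders); rolling back is simply another start-update that passes the old template:
gcloud compute instance-groups managed rolling-action start-update my-mig --version=template=my-new-template --max-surge=3 --max-unavailable=0 --zone=us-central1-a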
• Machine types
• E2 machine types are cost-optimized VMs that offer up to 32 vCPUs with up to 128 GB of memory with a maximum of 8
GB per vCPU. E2 machines have a predefined CPU platform running either an Intel or the second generation AMD EPYC
Rome processor. E2 VMs provide a variety of compute resources for the lowest price on Compute Engine, especially
when paired with committed-use discounts.
• N2 machine types offer up to 80 vCPUs, 8 GB of memory per vCPU, and are available on the Intel Cascade Lake CPU
platforms.
○ N2D machine types offer up to 224 vCPUs, 8 GB of memory per vCPU, and are available on second generation AMD
EPYC Rome platforms.
○ N1 machine types offer up to 96 vCPUs, 6.5 GB of memory per vCPU, and are available on Intel Sandy Bridge, Ivy
Bridge, Haswell, Broadwell, and Skylake CPU platforms.
• Shared-core machine types are available in the E2 and N1 families. These machine types timeshare a physical core. This
can be a cost-effective method for running small, non-resource intensive applications.
○ E2: e2-micro, e2-small, and e2-medium shared-core machine types have 2 vCPUs available for short periods of
bursting.
○ N1: f1-micro and g1-small shared-core machine types have up to 1 vCPU available for short periods of bursting.
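• To compare what is available in a zone, you can list and inspect machine types (sketch; the zone and filter are just examples):
gcloud compute machine-types list --zones=us-central1-a --filter="name~^e2"
gcloud compute machine-types describe e2-medium --zone=us-central1-a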
GCP Page 59
• N2-highmem machine types go up to 80 vCPU and 640 GB RAM.
• N2-highcpu-2 starts with 2 vCPU and 2 GB RAM and goes up to 80 vCPU and 80 GB RAM.
GCP Page 60
GCP Page 61
Networking, VPC and Firewalls
Monday, January 6, 2020 1:48 AM
• Google provides Premium and Standard network service tiers for reaching its services. Premium tier
routing ('cold potato') keeps your traffic on Google's private backbone for as long as possible, handing
it off to the public internet close to the destination. Standard tier routing ('hot potato') hands traffic off
to the public internet early, so it traverses the general internet before reaching Google's network.
• Reserved IP addresses in a subnet
• Every subnet has four reserved IP addresses in its primary IP range. There are no reserved IP
addresses in the secondary IP ranges.
Reserved IP address    | Description                                                                                              | Example
Network                | First address in the primary IP range for the subnet                                                    | 10.1.2.0 in 10.1.2.0/24
Default gateway        | Second address in the primary IP range for the subnet                                                   | 10.1.2.1 in 10.1.2.0/24
Second-to-last address | Second-to-last address in the primary IP range for the subnet, reserved by Google Cloud for future use  | 10.1.2.254 in 10.1.2.0/24
Broadcast              | Last address in the primary IP range for the subnet                                                     | 10.1.2.255 in 10.1.2.0/24
• Static IP: Reserve static IPs in projects to assign to resources. You pay for static external IPs that are
reserved but not in use.
• You can expand a subnet's CIDR range at any time, but the new range must contain the
old one, i.e. the old range must be a subset of the new one.
• gcloud compute networks subnets expand-ip-range NAME --prefix-
length=PREFIX_LENGTH [--region=REGION] [GCLOUD_WIDE_FLAG …]
○ Name: Name of subnetwork
○ Prefix-Length: The new prefix length of the subnet
○ REGION: Region of the subnetwork to operate on. Is optional if already set in
gcloud init
GCP Page 62
Google Cloud offers two types of VPC networks, determined by their subnet creation mode:
• When an auto mode VPC network is created, one subnet from each region is
automatically created within it. These automatically created subnets use a set of
predefined IP ranges that fit within the 10.128.0.0/9 CIDR block. As new Google Cloud
regions become available, new subnets in those regions are automatically added to
auto mode VPC networks by using an IP range from that block. In addition to the
automatically created subnets, you can add more subnets manually to auto mode VPC
networks in regions that you choose by using IP ranges outside of 10.128.0.0/9.
• When a custom mode VPC network is created, no subnets are automatically created.
This type of network provides you with complete control over its subnets and IP
ranges. You decide which subnets to create in regions that you choose by using IP
ranges that you specify.
Gcloud commands to create a custom vpc and adding 2 subnets in different regions:
○ gcloud compute networks create testvpc --project=<project name> --subnet-
mode=custom --mtu=1460 --bgp-routing-mode=regional
○ gcloud compute networks subnets create subnet1 --project=<project name> --
range=192.168.0.0/24 --network=testvpc --region=us-east1
○ gcloud compute networks subnets create subnet2 --project=<project name> --
range=192.168.1.0/24 --network=testvpc --region=us-west1
GCP Page 63
Enabling Private Google Access helps you save your egress traffic costs.
○ If Private Google Access is disabled on a subnet, VM instances with only internal IP
addresses can no longer reach Google APIs and services; they can still send traffic
within the VPC network. If such instances still need to reach Google services, you
have to give them an external IP address (or re-enable Private Google Access).
○ Private Google Access has no impact on instances that have external IP addresses.
Instances with an external IP address can reach the internet and need no explicit
setup to send requests to the external IP addresses of Google APIs and services.
○ You can enable Private Google Access on a subnet by subnet basis, not on the whole
network, it is a setting for subnets in a VPC network.
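○ Sketch of enabling it on one subnet (subnet and region are placeholders):
gcloud compute networks subnets update subnet1 --region=us-east1 --enable-private-ip-google-access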
• Always blocked traffic: Google Cloud always blocks the traffic that is described in the
following table. Your firewall rules cannot be used to allow any of this traffic
Always blocked traffic | Applies to
Certain GRE traffic (beta) | Traffic in Cloud VPN tunnels; traffic on Cloud Interconnect attachments (VLANs); traffic for forwarding rules (load balancing or protocol forwarding)
Protocols other than TCP, UDP, ICMP, AH, ESP, SCTP, and GRE to external IP addresses of Google Cloud resources | The type of resource further limits the protocol. For example, Network TCP/UDP Load Balancing supports only TCP and UDP. Also, a forwarding rule for protocol forwarding only processes a single protocol. Refer to the protocol forwarding documentation for a list of supported protocols.
Egress traffic to TCP destination port 25 (SMTP) | Traffic from: instances to external IP addresses on the internet; instances to external IP addresses of instances
You can switch a VPC network from auto mode to custom mode. This is a one-way
conversion; custom mode VPC networks cannot be changed to auto mode VPC networks.
Shared VPC:
• In any organization you can share VPC among projects. HOST project owns the shared
vpc while the projects are granted access.
• A Shared VPC network is a VPC network defined in a host project and made available
as a centrally shared network for eligible resources in service projects. Shared VPC
networks can be either auto or custom mode, but legacy networks are not supported.
• When a host project is enabled, you have two options for sharing networks:
○ You can share all host project subnets. If you select this option, then any new
subnets created in the host project, including subnets in new networks, will also
be shared.
○ You can specify individual subnets to share. If you share subnets individually,
then only those subnets are shared unless you manually change the list.
• Organization policies and IAM permissions work together to provide different levels of
access control. Organization policies enable you to set controls at the organization,
folder, or project level.
• If you are an organization policy administrator, you can specify the following Shared
VPC constraints in an organization policy:
○ You can limit the set of host projects to which a non-host project or non-host
projects in a folder or organization can be attached. The constraint applies when
a Shared VPC Admin attaches a service project with a host project. The constraint
doesn't affect existing attachments. Existing attachments remain intact even if a
policy denies new ones. For more information, see the
constraints/compute.restrictSharedVpcHostProject constraint.
○ You can specify the Shared VPC subnets that a service project can access at the
project, folder, or organization level. The constraint applies when you create new
resources in the specified subnets and doesn't affect existing resources. Existing
resources continue to operate normally in their subnets even if a policy prevents
new resources from being added. For more information, see the
constraints/compute.restrictSharedVpcSubnetworks constraint.
• When defining each Service Project Admin, a Shared VPC Admin can grant permission
to use the whole host project or just some subnets:
○ Project-level permissions: A Service Project Admin can be defined to have
permission to use all subnets in the host project if the Shared VPC Admin grants
the role of compute.networkUser for the whole host project to the Service
Project Admin. The result is that the Service Project Admin has permission to use
all subnets in all VPC networks of the host project, including subnets and VPC
networks added to the host project in the future.
○ Subnet-level permissions: Alternatively, a Service Project Admin can be granted a
more restrictive set of permissions to use only some subnets if the Shared VPC
Admin grants the role of compute.networkUser for those selected subnets to the
Service Project Admin. A Service Project Admin who only has subnet-level
permissions is restricted to using only those subnets. After new Shared VPC
networks or new subnets are added to the host project, a Shared VPC Admin
should review the permission bindings for the compute.networkUser role to
ensure that the subnet-level permissions for all Service Project Admins match the
intended configuration.
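○ For example (a sketch; the subnet name, region, and member are placeholders), granting subnet-level permissions to a Service Project Admin:
▪ gcloud compute networks subnets add-iam-policy-binding shared-subnet --region=us-west1 --project=host-project-id --member="user:[email protected]" --role="roles/compute.networkUser"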
• Shared VPC Admins have full control over the resources in the host project, including
administration of the Shared VPC network. They can optionally delegate certain
network administrative tasks to other IAM members:
○ Network Admin: the Shared VPC Admin defines a Network Admin by granting an IAM
member (in the host project or in the organization) the Network Admin
(compute.networkAdmin) role on the host project. Network Admins have full control over
all network resources except for firewall rules and SSL certificates.
○ Security Admin: a Shared VPC Admin can define a Security Admin by granting an IAM
member (in the host project or in the organization) the Security Admin
(compute.securityAdmin) role on the host project. Security Admins manage firewall rules
and SSL certificates.
• The implied rules cannot be removed, but they have the lowest possible priorities. You can
create rules that override them as long as your rules have higher priorities (priority numbers
less than 65535). Because deny rules take precedence over allow rules of the same priority, an
ingress allow rule with a priority of 65535 never takes effect.
• Source can be an IP address range (CIDR), source tags, or source service accounts.
• Target can be all instances in the network, target tags, or target service accounts (tags and
service accounts cannot be combined).
• If the target is "All instances": the source can be IP ranges plus either source tags or source
service accounts.
• If the target is a tag: the source can be IP ranges and/or source tags.
• If the target is a service account: the source can be IP ranges and/or source service accounts.
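• For example (a sketch; the network, tag, and source range are placeholders), an ingress allow rule that targets tagged instances:
○ gcloud compute firewall-rules create allow-ssh-tagged --network=testvpc --direction=INGRESS --allow=tcp:22 --source-ranges=203.0.113.0/24 --target-tags=ssh-allowed --priority=1000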
• Pre-populated rules:
• The default network is pre-populated with firewall rules that allow incoming
connections to instances. These rules can be deleted or modified as necessary:
• default-allow-internal
○ Allows ingress connections for all protocols and ports among instances in the
network. This rule has the second-to-lowest priority of 65534, and it effectively
permits incoming connections to VM instances from others in the same network.
This rule allows traffic in 10.128.0.0/9 (from 10.128.0.1 to 10.255.255.254), a
range that covers all subnets in the network.
• default-allow-ssh
○ Allows ingress connections on TCP destination port 22 from any source to any
instance in the network. This rule has a priority of 65534.
• default-allow-rdp
○ Allows ingress connections on TCP destination port 3389 from any source to any
instance in the network. This rule has a priority of 65534, and it enables
connections to instances running the Microsoft Remote Desktop Protocol (RDP).
• default-allow-icmp
○ Allows ingress ICMP traffic from any source to any instance in the network. This
rule has a priority of 65534, and it enables tools such as ping.
• Hierarchical firewall policies let you create and enforce a consistent firewall policy across
your organization (across multiple VPCs).
• You can assign hierarchical firewall policies to the organization as a whole or to
individual folders.
• These policies contain rules that can explicitly deny or allow connections, as do Virtual
Private Cloud (VPC) firewall rules. In addition, hierarchical firewall policy rules can
delegate evaluation to lower-level policies or VPC network firewall rules with a
goto_next action.
• Hierarchical firewall policy rules are defined in a firewall policy resource that acts as a
container for firewall rules. The rules defined in a firewall policy are not enforced until
the policy is associated with a node (an organization or a folder).
• VPC firewall rules are evaluated. VPC firewall rules either allow or deny connections.
• Lower-level rules cannot override a rule from a higher place in the resource hierarchy.
This lets organization-wide admins manage critical firewall rules in one place.
• Target networks
○ You can restrict a hierarchical firewall policy rule to VMs in only specified
networks. Specifying the target network in the rule gives you control over which
VPC networks are configured with that rule. Combined with goto_next or allow, it
lets you create exceptions for specific networks when you want to define an
otherwise restrictive policy.
• Target service accounts
○ You can specify a target service account for a rule. Such rules are applied only to
VMs owned by the specified service account. Hierarchical firewall policy rules do
not support targeting by instance tags.
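• For example (a sketch; the short name and organization ID are placeholders, and the exact flags should be checked against the gcloud reference), creating a hierarchical firewall policy and associating it with the organization:
○ gcloud compute firewall-policies create --short-name=org-fw-policy --organization=123456789012
○ gcloud compute firewall-policies associations create --firewall-policy=org-fw-policy --organization=123456789012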
Connecting On-prem to GCP:
• Cloud Interconnect: connects external networks to GCP. Private connectivity to a VPC is
provided via Cloud VPN or Dedicated/Partner Interconnect.
• Cloud Interconnect provides low latency, highly available connections that enable you
to reliably transfer data between your on-premises and Google Cloud Virtual Private
Cloud (VPC) networks.
• Cloud Interconnect connections also provide internal IP address communication,
which means internal IP addresses are directly accessible from both networks.
• Cloud interconnect contains the following:
Dedicated Interconnect, Partner Interconnect, Direct Peering, and Carrier Peering can
all help you optimize egress traffic from your VPC network and can help you reduce
your egress costs.
• Cloud VPN by itself does not reduce egress costs.
• Dedicated Interconnect: a direct physical link between your VPC and your on-prem network.
Each VLAN attachment is private to one VPC in one region and does not reach public Google
services. The link is private but not encrypted.
• Cloud VPN: an IPsec VPN to connect to a VPC over the public internet. Supports static or
dynamic routing.
• “You must configure routes so that Google API traffic is forwarded through your Cloud
VPN or Cloud Interconnect connection, firewall rules on your on-premises firewall to
allow the outgoing traffic, and DNS so that traffic to Google APIs resolves to the IP
range you’ve added to your routes.”
• “You can use Cloud Router Custom Route Advertisement to announce the Restricted
Google APIs IP addresses through Cloud Router to your on-premises network.
• The Restricted Google APIs IP range is 199.36.153.4/30. While this is technically a
public IP range, Google does not announce it publicly. This IP range is only accessible to
hosts that can reach your Google Cloud projects through internal IP ranges, such as
through a Cloud VPN or Cloud Interconnect connection.”
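• For example (a sketch; the router name and region are placeholders, and 199.36.153.4/30 is the Restricted Google APIs range quoted above), advertising that range from a Cloud Router:
○ gcloud compute routers update onprem-router --region=us-west1 --advertisement-mode=CUSTOM --set-advertisement-groups=ALL_SUBNETS --set-advertisement-ranges=199.36.153.4/30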
• Peering: public connectivity to Google via Direct Peering, or Carrier Peering through a partner
for lower volumes (no SLAs).
• Similar to Interconnect, except peering provides connectivity to the entire Google suite, i.e.
GCP plus Google Workspace applications.
• Unless you need to access Google Workspace applications as described in the
preceding use case, Partner Interconnect is the recommended way to connect to
Google through a service provider.
• CDN Interconnect: direct, low-latency connectivity to certain third-party CDN providers (NOT to Cloud CDN).
• Work with your supported CDN provider to learn what locations are supported and
how to correctly configure your deployment to use intra-region egress routes.
• Typical use cases for CDN Interconnect
○ High-volume egress traffic. If you're populating your CDN with large data files
from Google Cloud, you can use the CDN Interconnect links between Google
Cloud and selected providers to automatically optimize this traffic and save
money.
○ Frequent content updates. Cloud workloads that frequently update data stored in
CDN locations benefit from using CDN Interconnect because the direct link to the
CDN provider reduces latency for these CDN destinations.
○ For example, if you have frequently updated data served by the CDN originally
hosted on Google Cloud, you might consider using CDN Interconnect.
• Cloud DNS:
• A managed public and private DNS service with 100% uptime.
• Low latency. Supports DNSSEC. Priced as a fixed fee per managed zone plus a fee per lookup.
• Cloud DNS offers both public zones and private managed DNS zones. A public
zone is visible to the public internet, while a private zone is visible only from one
or more Virtual Private Cloud (VPC) networks that you specify.
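• For example (a sketch; the zone name, DNS name, and network are placeholders), creating a private zone visible only to testvpc:
○ gcloud dns managed-zones create internal-zone --description="Private zone for testvpc" --dns-name="internal.example.com." --visibility=private --networks=testvpc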
Load Balancers
Wednesday, February 17, 2021 2:09 PM
• Load balancer:
The type of traffic that you need your load balancer to handle is another factor in determining
which load balancer to use:
• For HTTP and HTTPS traffic, use:
• External HTTP(S) Load Balancing
• Internal HTTP(S) Load Balancing
• Can be single or multi-regional
• For TCP traffic, use:
• TCP Proxy Load Balancing
• Network Load Balancing
• Internal TCP/UDP Load Balancing
• Can be single or multi-regional
• For UDP traffic, use:
• Network Load Balancing
• Internal TCP/UDP Load Balancing
• Is only single regional
• SSL Proxy Load Balancing is intended for non-HTTP(S) traffic. Although SSL Proxy Load Balancing
can handle HTTPS traffic, we don't recommend this. You should instead use HTTP(S) Load
Balancing for HTTPS traffic. HTTP(S) Load Balancing also does the following, which makes it a
better choice in most cases:
• Negotiates HTTP/2 and SPDY/3.1.
• Rejects invalid HTTP requests or responses.
• Forwards requests to different VMs based on URL host and path.
• Integrates with Cloud CDN.
• Spreads the request load more evenly among backend instances, providing better
backend utilization. HTTPS load balances each request separately, whereas SSL Proxy Load
Balancing sends all bytes from the same SSL or TCP connection to the same backend
instance.
• The backend instances must allow connections from the load balancer GFE (google front
end)/health check ranges. This means that you must create a firewall rule that allows traffic
from 130.211.0.0/22 and 35.191.0.0/16 to reach your backend instances or endpoints. These IP
address ranges are used as sources for health check packets and for all load-balanced packets
sent to your backends.
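• For example (a sketch; the network name is a placeholder, the ranges are the health-check/GFE ranges listed above), allowing those ranges in:
○ gcloud compute firewall-rules create allow-lb-health-checks --network=testvpc --direction=INGRESS --allow=tcp --source-ranges=130.211.0.0/22,35.191.0.0/16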
• All load balancers forward traffic to a backend service which can include backend instances
(normal or managed instance groups) or a network endpoint group.
• Health checks ensure that Compute Engine forwards new connections only to instances that are
up and ready to receive them. Compute Engine sends health check requests to each instance at
the specified frequency; once an instance exceeds its allowed number of health check failures,
it is no longer considered an eligible instance for receiving new traffic. Existing connections will
not be actively terminated which allows instances to shut down gracefully and to close TCP
connections.
• The health checker continues to query unhealthy instances, and returns an instance to the
pool when the specified number of successful checks is met. If all instances are marked as
UNHEALTHY, the load balancer directs new traffic to all existing instances.
• Network Load Balancing relies on legacy HTTP Health checks for determining instance
health. Even if your service does not use HTTP, you’ll need to at least run a basic web
server on each instance that the health check system can query.
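• For example (a sketch; the name, port, path, and thresholds are placeholders), creating an HTTP health check with an explicit interval and failure threshold:
○ gcloud compute health-checks create http basic-http-hc --port=80 --request-path=/healthz --check-interval=10s --timeout=5s --healthy-threshold=2 --unhealthy-threshold=3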
• Google Cloud External HTTP(S) Load Balancing is a global, proxy-based Layer 7 load balancer
that enables you to run and scale your services worldwide behind a single external IP address.
External HTTP(S) Load Balancing distributes HTTP and HTTPS traffic to backends hosted on
Compute Engine and Google Kubernetes Engine (GKE).
• Supports only ports 80 and 8080 for HTTP, and port 443 for HTTPS
• When you configure an external HTTP(S) load balancer in Premium Tier, it uses a global
external IP address and can intelligently route requests from users to the closest backend
instance group or NEG, based on proximity.
• External HTTP(S) Load Balancing supports the following backend types:
• Instance groups
• Zonal network endpoint groups (NEGs)
• Serverless NEGs: One or more App Engine, Cloud Run, or Cloud Functions services
• Internet NEGs, for endpoints that are outside of Google Cloud (also known as
custom origins)
• Buckets in Cloud Storage
• HTTP(S) Load Balancing supports content-based load balancing using URL maps to select a
backend service based on the requested host name, request path, or both. For example,
you can use a set of instance groups or NEGs to handle your video content and another
set to handle everything else.
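• For example (a sketch; the map, host, and backend service names are placeholders), routing /video/* to a separate backend service via a URL map:
○ gcloud compute url-maps create web-map --default-service=web-backend
○ gcloud compute url-maps add-path-matcher web-map --path-matcher-name=video-paths --default-service=web-backend --path-rules="/video/*=video-backend" --new-hosts=example.com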
• Internal TCP/UDP Load Balancing distributes traffic among VM instances in the same region in a
Virtual Private Cloud (VPC) network by using an internal IP address.
• Google Cloud Internal TCP/UDP Load Balancing is a regional load balancer that enables
you to run and scale your services behind an internal load balancing IP address that is
accessible only to your internal virtual machine (VM) instances.
• You can access an internal TCP/UDP load balancer in your VPC network from a connected
network by using the following:
• VPC Network Peering
• Cloud VPN and Cloud Interconnect
• Unlike a proxy load balancer, an internal TCP/UDP load balancer doesn't terminate
connections from clients and then open new connections to backends. Instead, an
internal TCP/UDP load balancer routes original connections directly from clients to the
healthy backends, without any interruption.
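• For example (a sketch reusing testvpc/subnet2/us-west1 from earlier; the backend service name is a placeholder), the internal forwarding rule for such a load balancer:
○ gcloud compute forwarding-rules create ilb-forwarding-rule --region=us-west1 --load-balancing-scheme=INTERNAL --network=testvpc --subnet=subnet2 --ip-protocol=TCP --ports=80 --backend-service=ilb-backend-service --backend-service-region=us-west1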
• SSL Proxy and TCP Proxy Load Balancing support the following well-known ports: 25, 43, 110,
143, 195, 443, 465, 587, 700, 993, 995, 1883, 3389, 5222, 5432, 5671, 5672, 5900, 5901, 6379,
8085, 8099, 9092, 9200, and 9300. When you use Google-managed SSL certificates with SSL
Proxy Load Balancing, the frontend port for traffic must be 443 to enable the Google-managed
SSL certificates to be provisioned and renewed.
• By default, the original client IP address and port information is not preserved. You can
preserve this information by using the PROXY protocol.
• Where components must be defined when load balancers are used with Shared VPC:
○ Network Load Balancing (target pool-based): the external IP address must be defined in
the same project as the instances being load balanced; the forwarding rule must be
defined in the same project as the instances in the target pool (the service project); the
target pool must be defined in the same project and in the same region where the
instances in the target pool exist, and health checks associated with the target pool must
be defined in that same project as well.
○ Internal HTTP(S) Load Balancing: an internal IP address must be defined in the same
project as the load balancer; an internal forwarding rule must be defined in the same
project as the backend instances (the service project); a regional target HTTP(S) proxy and
associated regional URL map must be defined in the same project as the backend
instances; a regional backend service must be defined in the same project as the backend
instances, and health checks associated with backend services must be defined in the
same project as well.
○ External HTTP(S) Load Balancing: an external IP address must be defined in the same
project as the instances being load balanced (the service project); the external forwarding
rule must be defined in the same project as the backend instances (the service project);
the target HTTP proxy or target HTTPS proxy and associated URL map must be defined in
the same project as the backend instances; a global backend service must be defined in
the same project as the backend instances, which must be in instance groups attached to
the backend service as backends, and health checks associated with backend services
must be defined in the same project as the backend service as well.
○ SSL Proxy Load Balancing: the target SSL proxy must be defined in the same project as the
backend instances.
○ TCP Proxy Load Balancing: the target TCP proxy must be defined in the same project as
the backend instances.
The following summarizes some of the load balancers (traffic type, client IP preservation, scope,
scheme, ports, and proxy vs. pass-through):
• TCP Proxy: TCP without SSL offload; does not preserve client IPs; Global*; EXTERNAL; ports 25,
43, 110, 143, 195, 443, 465, 587, 700, 993, 995, 1883, 3389, 5222, 5432, 5671, 5672, 5900,
5901, 6379, 8085, 8099, 9092, 9200, and 9300; proxy.
• External TCP/UDP Network: TCP or UDP; preserves client IPs; Regional; EXTERNAL; any port;
pass-through.
• Internal TCP/UDP: TCP or UDP; preserves client IPs; Regional backends with regional frontends
(global access supported); INTERNAL; any port; pass-through.
*Global in Premium Tier. Regional in Standard Tier.
Backends
• Frontend traffic types: HTTP/1.1, HTTP/2, or HTTPS (includes QUIC) for the HTTP(S) load
balancers; one of TCP or UDP for the network and internal TCP/UDP load balancers; SSL or TCP
for the proxy load balancers.
• For external HTTP(S) Load Balancing and SSL/TCP Proxy Load Balancing, backends can be in
multiple regions in Premium Tier but must be in one region in Standard Tier.
• External HTTP(S) Load Balancing additionally supports Cloud Storage backend buckets,
external endpoints in internet NEGs as custom origins for Cloud CDN (Premium Tier), and
serverless NEGs (Cloud Run fully managed, App Engine, Cloud Functions).
• The internal and external HTTP(S) load balancers can have multiple backend services and a
URL map; self-managed Kubernetes/GKE and Compute Engine VM backends are supported
across the load balancer types.
• Zonal NEGs: use GCE_VM_IP_PORT type endpoints with GKE (standalone zonal NEGs, or
Ingress for internal/external HTTP(S) Load Balancing); use GCE_VM_IP type endpoints with
standalone zonal NEGs for Internal TCP/UDP Load Balancing.
Health checks
For links to reference information, see Health checks.
• Health checks are configurable for port, check intervals, timeouts, and healthy/unhealthy
thresholds; HTTP-based checks also allow a configurable request path and expected response
string.
• Backend service-based network load balancers (currently in Preview) support these
configurable health checks; target pool-based network load balancers only support legacy
HTTP health checks.
IP addresses
For links to reference information, see Addresses.
• Client source IP preservation: via the X-Forwarded-For header for the internal and external
HTTP(S) load balancers, via the PROXY protocol header for TCP Proxy (and SSL Proxy), and
natively for the pass-through TCP/UDP load balancers.
• The internal load balancers use an internal IP address accessible only in your Virtual Private
Cloud (VPC) network; the external load balancers are internet accessible (including by clients
that are in Google Cloud and have internet access).
• IPv6 termination support varies by load balancer.
Failover
• The load balancers automatically fail over to healthy backends within the same region;
automatic failover to healthy backends in other regions is available for the global load
balancers in Premium Tier.
• Behavior when all backends are unhealthy: internal HTTP(S) returns HTTP 503; external
HTTP(S) returns HTTP 502; internal TCP/UDP has configurable behavior; external TCP/UDP
Network Load Balancing distributes traffic among all backends; SSL Proxy and TCP Proxy drop
the traffic.
• The TCP/UDP load balancers support configurable standby backends (failover backends) and
connection draining on failover/failback; for Network Load Balancing this applies to backend
service-based load balancers (currently in Preview), while target pool-based network load
balancers use backup pools for failover and do not support configurable connection draining
on failover/failback.
Logging and monitoring
• Byte count metrics are available for all of the load balancer types; packet count metrics are
available only for a subset of them.
• Session affinity: options include HTTP cookie and generated cookie, client IP address +
protocol (3-tuple hash of the packet's source IP address, destination IP address, and protocol;
TCP only), client IP address + port + protocol (TCP only), and none (5-tuple hash).
• Other distribution controls include a configurable maximum capacity per backend instance
group or NEG, percent-of-traffic / weight-based splitting, and (in Premium Tier) preferring the
region closest to the client on the internet.
For internal HTTP(S) load balancers, see the following links:
• Traffic management overview for internal HTTP(S) load balancers
• Setting up traffic management for internal HTTP(S) load balancers
For external HTTP(S) load balancers, see the following links:
• Traffic management overview for external HTTP(S) load balancers
• Setting up traffic management for external HTTP(S) load balancers
• HTTP/Layer 7 request routing (suffix, prefix, and exact matches), request/response header
transformations, and traffic splitting are available on the internal and external HTTP(S) load
balancers.
• Backend service-based network load balancers (currently in Preview) support autoscaling and
autohealing; target pool-based network load balancers do not support connection draining.
Security
• Google-managed certificates: external HTTP(S) Load Balancing and SSL Proxy Load Balancing
(not TCP Proxy).
• CORS: external HTTP(S) Load Balancing.
• SSL offload: SSL Proxy Load Balancing (not TCP Proxy).
• SSL policies (TLS versions and cipher suites): external HTTP(S) Load Balancing and SSL Proxy
Load Balancing (not TCP Proxy).
Special features
• Cloud CDN, and external endpoints in internet NEGs as custom origins for Cloud CDN: external
HTTP(S) Load Balancing, Premium Tier.
• Internal DNS names: the internal load balancers.
Storage services and additional disk options
Monday, January 6, 2020 7:28 PM
• Persistent disk: flexible, block-based, network-attached storage; every GCE instance boots from one. Slower than local SSD but more
durable: data persists even when the instance is shut down. Can be resized up to 64 TB, even while in use. Snapshots of the disk can be
stored for full backups. Mostly zonal resources. You can mount one disk on multiple instances only if all of them mount it read-only.
• When you configure a zonal or regional persistent disk, you can select one of the following disk types.
• Standard persistent disks (pd-standard) are backed by standard hard disk drives (HDD).
• Balanced persistent disks (pd-balanced) are backed by solid-state drives (SSD). They are an alternative to SSD persistent disks
that balance performance and cost.
• SSD persistent disks (pd-ssd) are backed by solid-state drives (SSD).
• If you create a disk in the Cloud Console, the default disk type is pd-balanced. If you create a disk using the gcloud tool or the
Compute Engine API, the default disk type is pd-standard.
• Zonal persistent disk: Efficient, reliable block storage.
○ You can create zonal disks from existing pd, images or snapshots
○ You can only resize a zonal persistent disk to increase its size. You cannot reduce the size of a zonal persistent disk.
○ The zone, region, and disk type of the clone must be the same as that of the source disk.
○ You cannot create a zonal disk clone from a regional disk. You cannot create a regional disk clone from a zonal disk.
○ You can create at most 1000 total disk clones of a given source disk. Exceeding this limit returns an internalError.
• Regional persistent disk:
○ Regional block storage replicated in two zones.
○ You cannot create regional disks from images
○ Regional persistent disks can't be used as boot disks.
○ When resizing a regional persistent disk, you can only increase its size. You cannot reduce the size of a persistent disk.
○ To convert your existing zonal persistent disks to regional persistent disks, snapshot the current persistent disk and create a
regional persistent disk from the snapshot.
○ The minimum size of a regional standard persistent disk is 200 GB.
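○ For example (a sketch; disk names, zones, and sizes are placeholders), creating, resizing, snapshotting, and converting a disk to a
regional persistent disk:
▪ gcloud compute disks create data-disk --zone=us-central1-a --size=500GB --type=pd-ssd
▪ gcloud compute disks resize data-disk --zone=us-central1-a --size=1TB
▪ gcloud compute disks snapshot data-disk --zone=us-central1-a --snapshot-names=data-disk-snap
▪ gcloud compute disks create regional-data-disk --region=us-central1 --replica-zones=us-central1-a,us-central1-b --source-snapshot=data-disk-snap --size=200GB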
• Cloud Filestore: shared file storage, so multiple clients can use the same files. Fully managed file serving, but backups are not managed. Comparable to a NAS. Is a zonal resource.
• Google Cloud Filestore provides managed NFS file servers as a fully managed service on GCP. It is meant to provide high-
performance file storage capabilities to applications running on Compute Engine and Kubernetes Engine instances.
• Data storage service: a service where you physically ship your stored data to Google, and they upload it directly to GCS. Useful for very
large amounts of data (~400 TB).
• Cloud storage transfer: Storage Transfer Service allows you to quickly import online data (ex: AWS data) into Cloud Storage. You can also
set up a repeating schedule for transferring data, as well as transfer data within Cloud Storage, from one bucket to another. You can use a
signed URL to give people temporary read/write access to your bucket; the URL is only valid for the duration you specify.
• Cloud Transfer appliance: Transfer Appliance is a hardware appliance you can use to securely migrate large volumes of data (from
hundreds of terabytes up to 1 petabyte) to Google Cloud Platform without disrupting business operations.
• To get help with general Transfer Appliance billing questions, email Transfer Appliance support.
• Cloud storage (GCS): Infinitely scalable, managed, versioned and durable object storage.
• Google Cloud Storage has some specific features that differentiate it from a proper file system:
• It doesn't actually provide directories/folders; it only implements buckets and objects, i.e. there is no concept of folders, nested
directories, etc. (the "/" in object names only simulates a folder hierarchy).
• It doesn't implement file modification. When you upload an update to an existing object, it actually replaces that object
altogether with the new version (versioning is available though).
• Has integrated site hosting and CDN functionality too. Can be transitioned from region to multi-regional etc.
• The gsutil utility can also automatically use object composition to perform uploads in parallel for large, local files that you want to
upload to Cloud Storage. It splits a large file into component pieces, uploads them in parallel and then recomposes them once
they're in the cloud (and deletes the temporary components it created locally).
○ gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp ./localbigfile gs://your-bucket
Where "localbigfile" is a file larger than 150MB. This divides up your data into chunks ~150MB and uploads them in
parallel, increasing upload performance.
• You can host static (http) webpages on GCS for high-availability but limited budget.
• To host a static site in Cloud Storage, you need to create a Cloud Storage bucket, upload the content, and test your new site.
• You can serve your data directly from storage.googleapis.com, or you can verify that you own your domain and use your domain
name. Either way, you'll get consistent, fast delivery from global edge caches.
• You can create your static web pages however you choose. For example, you could hand-author pages by using HTML and CSS. You
can use a static-site generator, such as Jekyll, Ghost, or Hugo, to create the content. Static-site generators make it easier for you to
create a static website by letting you author in markdown, and providing templates and tools. Site generators generally provide a
local web server that you can use to preview your content.
• After your static site is working, you can update the static pages by using any process you like. That process could be as
straightforward as hand-copying an updated page to the bucket. You might choose to use a more automated approach, such as
storing your content on GitHub and then using a webhook to run a script that updates the bucket. An even more advanced system
might use a continuous-integration/continuous-delivery (CI/CD) tool, such as Jenkins, to update the content in the bucket. Jenkins
has a Cloud Storage plugin that provides a Google Cloud Storage Uploader post-build step to publish build artifacts to Cloud
Storage. If you have a web application that needs to serve static content or user-uploaded static media, using Cloud Storage can be
a cost-effective and efficient way to host and serve this content, while reducing the amount of dynamic requests to your web
application.
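• For example (a sketch; the bucket/domain name and file names are placeholders), creating a public static-site bucket and setting the index and error pages:
○ gsutil mb -l us-central1 gs://www.example.com
○ gsutil cp -r ./site/* gs://www.example.com
○ gsutil web set -m index.html -e 404.html gs://www.example.com
○ gsutil iam ch allUsers:objectViewer gs://www.example.com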
• Signed URLs: Use to give time-limited resource access to anyone in possession of the URL, regardless of whether they have a Google
account.
○ Signed URLs can only be used to access resources in Cloud Storage through XML API endpoints
○ Creating a signed URL to download an object:
▪ Generate a new private key, or use an existing private key for a service account. The key can be in either JSON or PKCS12
format.
▪ gsutil signurl -d 10m Desktop/private-key.json gs://example-bucket/cat.jpeg
○ Creating a signed URL to upload an object
▪ Generate a new private key, or use an existing private key for a service account. The key can be in either JSON or PKCS12
format.
▪ Use gcloud auth activate-service-account to authenticate with the service account:
• gcloud auth activate-service-account --key-file KEY_FILE_LOCATION/KEY_FILE_NAME
▪ gsutil signurl -m PUT -d 1h -c CONTENT_TYPE -u gs://BUCKET_NAME/OBJECT_NAME
-c: specifies the content type for which the signed URL is valid, e.g. -c text/plain
• Cloud object Metadata: Objects stored in Cloud Storage have metadata associated with them.
○ Metadata identifies properties of the object, as well as specifies how the object should be handled when it's accessed.
Metadata exists as key:value pairs. For example, the storage class of an object is represented by the metadata entry
storageClass:STANDARD.
○ The mutability of metadata varies: some metadata you can edit at any time, some metadata you can only set at the time the
object is created, and some metadata you can only view. For example, you can edit the value of the Cache-Control metadata
at any time, but you can only assign the storageClass metadata when the object is created or rewritten, and you cannot
directly edit the value for the generation metadata, though the generation value changes when the object is replaced.
○ Content-Disposition: The Content-Disposition metadata specifies presentation information about the data being transmitted.
Setting Content-Disposition allows you to control presentation style of the content, for example determining whether an
attachment should be automatically displayed or whether some form of action from the user should be required to open it
○ Content-Type: The most commonly set metadata is Content-Type (also known as media type), which lets browsers render the
object properly. All objects have a value specified in their Content-Type metadata, but this value does not have to match the
underlying type of the object. For example, if the Content-Type is not specified by the uploader and cannot be determined, it
is set to application/octet-stream or application/x-www-form-urlencoded, depending on how you uploaded the object.
○ Object holds: Use metadata flags to place object holds, which prevent objects from being deleted or replaced.
• Best practices:
○ If you are concerned that your application software or users might erroneously delete or replace objects at some point, Cloud
Storage has features that help you protect your data:
○ A retention policy that specifies a retention period can be placed on a bucket. An object in the bucket cannot be deleted or
replaced until it reaches the specified age.
○ An object hold can be placed on individual objects to prevent anyone from deleting or replacing the object until the hold is
removed.
○ Object versioning can be enabled on a bucket in order to retain older versions of objects. When the live version of an object is
deleted or replaced, it becomes noncurrent if versioning is enabled on the bucket. If you accidentally delete a live object
version, you can copy the noncurrent version of it back to the live version.
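○ For example (a sketch; the bucket, object, and generation number are placeholders), setting a retention policy, placing a temporary
hold on an object, and restoring a noncurrent version:
▪ gsutil retention set 30d gs://my-bucket
▪ gsutil retention temp set gs://my-bucket/report.csv
▪ gsutil ls -a gs://my-bucket/report.csv (lists versions with their generation numbers)
▪ gsutil cp gs://my-bucket/report.csv#1612345678901234 gs://my-bucket/report.csv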
○ If you want to bulk delete a hundred thousand or more objects, avoid using gsutil, as the process takes a long time to
complete. Instead, use one of the following options:
▪ The Cloud Console can bulk delete up to several million objects and does so in the background. The Cloud Console can
also be used to bulk delete only those objects that share a common prefix, which appear as part of a folder when using
the Cloud Console.
○ Object Lifecycle Management can bulk delete any number of objects. To bulk delete objects in your bucket, set a lifecycle
configuration rule on your bucket where the condition has Age set to 0 days, and the action is set to Delete. Be aware that
during the deletion process, object listing for the affected bucket may be impacted.
The following summarizes the primary storage classes offered by Cloud Storage (name for APIs and gsutil, minimum storage duration,
typical monthly availability). See the class descriptions for a complete discussion.
• Standard Storage (STANDARD): no minimum storage duration; typical monthly availability >99.99% in multi-regions and dual-regions,
99.99% in regions.
• Nearline Storage (NEARLINE): 30-day minimum storage duration; 99.95% in multi-regions and dual-regions, 99.9% in regions.
• Coldline Storage (COLDLINE): 90-day minimum storage duration; 99.95% in multi-regions and dual-regions, 99.9% in regions.
• Archive Storage (ARCHIVE): 365-day minimum storage duration; 99.95% in multi-regions and dual-regions, 99.9% in regions.
Cloud storage and gsutil
Saturday, February 6, 2021 12:09 AM
• The gsutil config command applies to users who have installed gsutil as a standalone tool.
○ If you installed gsutil via the Cloud SDK, gsutil config fails unless you are specifically using the -a flag or
have configured gcloud to not pass its managed credentials to gsutil (via the command gcloud config set
pass_credentials_to_gsutil false). For all other use cases, Cloud SDK users should use the gcloud auth
group of commands instead, which configures OAuth2 credentials that gcloud implicitly passes to gsutil
at runtime. To check if you are using gsutil from the Cloud SDK or as a stand-alone, use gsutil version -l
and in the output look for "using cloud sdk".
○ The gsutil config command obtains access credentials for Cloud Storage and writes a boto/gsutil
configuration file containing the obtained credentials along with a number of other configuration-controllable values.
○ Unless specified otherwise (see OPTIONS), the configuration file is written to ~/.boto (i.e., the file .boto
under the user's home directory). If the default file already exists, an attempt is made to rename the
existing file to ~/.boto.bak; if that attempt fails the command exits. A different destination file can be
specified with the -o option (see OPTIONS).
○ Because the boto configuration file contains your credentials you should keep its file permissions set so
no one but you has read access. (The file is created read-only when you run gsutil config.)
• gsutil mb [-b (on|off)] [-c <class>] [-l <location>] [-p <proj_id>] [--retention
<time>] gs://<bucket_name>...
○ The mb command creates a new bucket. Cloud Storage has a single namespace, so you are not allowed to create
a bucket with a name already in use by another user.
○ The -c and -l options specify the storage class and location, respectively, for the bucket. Once a bucket is created
in a given location and with a given storage class, it cannot be moved to a different location, and the storage
class cannot be changed. Instead, you would need to create a new bucket and move the data over and then
delete the original bucket.
▪ Use gsutil rewrite -s nearline gs://bucket/foo, or set a new default storage class
on the new bucket, to change the storage class of objects.
○ The --retention option specifies the retention period for the bucket.
You can specify retention period in one of the following formats:
▪ --retention <number>s
▪ Specifies retention period of <number> seconds for objects in this bucket.
▪ --retention <number>d
▪ Specifies retention period of <number> days for objects in this bucket.
▪ --retention <number>m
▪ Specifies retention period of <number> months for objects in this bucket.
▪ --retention <number>y
▪ Specifies retention period of <number> years for objects in this bucket.
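○ For example (a sketch; the bucket name is a placeholder), creating a Nearline bucket in asia-south1 with uniform bucket-level
access and a one-year retention period:
▪ gsutil mb -c nearline -l asia-south1 -b on --retention 1y gs://my-archive-bucket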
• rb - Remove buckets
○ gsutil rb [-f] gs://<bucket_name>...
○ The rb command deletes a bucket. Buckets must be empty before you can delete them.
○ Be certain you want to delete a bucket before you do so, as once it is deleted the name becomes available and
another user may create a bucket with that name.
• Access to buckets: Cloud Storage offers two systems for granting users permission to access your buckets and objects:
IAM and Access Control Lists (ACLs). These systems act in parallel - in order for a user to access a Cloud Storage
resource, only one of the systems needs to grant the user permission. IAM is used throughout Google Cloud and
allows you to grant a variety of permissions at the bucket and project levels. ACLs are used only by Cloud Storage and
have limited permission options, but they allow you to grant permissions on a per-object basis
In order to support a uniform permissioning system, Cloud Storage has uniform bucket-level access. Using this feature
disables ACLs for all Cloud Storage resources: access to Cloud Storage resources then is granted exclusively through
IAM. After you enable uniform bucket-level access, you can reverse your decision only within 90 days.
○ iam - Get, set, or change bucket and/or object IAM permissions.
▪ Cloud Identity and Access Management (Cloud IAM) allows you to control who has access to the resources
in your Google Cloud project.
▪ The iam command has three sub-commands:
▪ Get: The iam get command gets the Cloud IAM policy for a bucket or object, which you can save and edit
for use with the iam set command. The output is in json similar to the iam-policy-binding output
▪ The following examples save the bucket or object's Cloud IAM policy to a text file:
• gsutil iam get gs://example > bucket_iam.txt
▪ Set: The iam set command sets a Cloud IAM policy on one or more buckets or objects, replacing the
existing policy on those buckets or objects.
• gsutil -m iam set -r iam.txt gs://dogs
▪ The set sub-command has the following options:
• -R, -r: Performs iam set recursively on all objects under the specified bucket. This flag can only be set
if the policy exclusively uses roles/storage.legacyObjectReader or roles/storage.legacyObjectOwner.
This flag cannot be used if the bucket is configured for uniform bucket -level access.
• -a: Performs iam set on all object versions.
• -e <etag>: Performs the precondition check on each object with the specified etag before setting the
policy. You can retrieve the policy's etag using iam get.
• -f The default gsutil error-handling mode is fail-fast. This flag changes the request to fail-silent
mode. This option is implicitly set when you use the gsutil -m option.
▪ Ch: The iam ch command incrementally updates Cloud IAM policies. You can specify multiple access grants
or removals in a single command. The access changes are applied as a batch to each url in the order in
which they appear in the command line arguments. Each access change specifies a member and a role that
is either granted or revoked.
▪ You can use gsutil -m to handle object-level operations in parallel.
▪ The ch sub-command has the following options:
• -d Removes roles granted to the specified member.
• -R, -r: Performs iam ch recursively to all objects under the specified bucket.
• This flag can only be set if the policy exclusively uses roles/storage.legacyObjectReader or
roles/storage.legacyObjectOwner. This flag cannot be used if the bucket is configured for uniform
bucket-level access.
• -f : The default gsutil error-handling mode is fail-fast. This flag changes the request to fail-silent
mode. This is implicitly set when you invoke the gsutil -m option.
○ acl - Get, set, or change bucket and/or object ACLs
▪ The acl command has three sub-commands:
• gsutil acl set [-f] [-r] [-a] <file-or-canned_acl_name> url...
○ R: READ
W: WRITE
O: OWNER
• gsutil acl get url
• Use gsutil acl ch to change access sequentially of some members:
• gsutil acl ch [-f] [-r] <grant>... url...
where each <grant> is one of the following forms:
○ -u <id>|<email>:<permission> ex: -u AllUsers:R i.e. grant all users read-only permission
○ -g <id>|<email>|<domain>|All|AllAuth:<perm> ex: -g [email protected]:O gs://example-bucket/**.jpg i.e. grant the admin
group OWNER access to all jpeg files
○ -p (viewers|editors|owners)-<project number>:<perm> ex: -p owners-project12445:W gs://example-bucket i.e. grant all
project owners write access
○ -d <id>|<email>|<domain>|All|AllAuth|(viewers|editors|owners)-<project number> ex: -d viewers-12345 gs://example-bucket
i.e. remove all viewers of project number 12345
▪ Note that you can set an ACL on multiple buckets or objects at once. For example, to set ACLs on all .jpg
files found in a bucket:
• gsutil acl set acl.txt gs://bucket/**.jpg
▪ If you have a large number of ACLs to update you might want to use the gsutil -m option, to perform a
parallel (multi-threaded/multi-processing) update:
• gsutil -m acl set acl.txt gs://bucket/**.jpg
• One strategy for uploading large files is called parallel composite uploads. In such an upload, a single
file is divided into up to 32 chunks, the chunks are uploaded in parallel to temporary objects, the final
object is recreated using the temporary objects, and the temporary objects are deleted.
• Parallel composite uploads can be significantly faster if network and disk speed are not limiting
factors; however, the final object stored in your bucket is a composite object, which only has a crc32c
hash and not an MD5 hash. As a result, you must use crcmod to perform integrity checks when
downloading the object with gsutil or other Python applications.
▪ Note that multi-threading/multi-processing is only done when the named URLs refer to objects, which
happens either if you name specific objects or if you enumerate objects by using an object wildcard or
specifying the acl -r flag.
○ bucketpolicyonly - Configure uniform bucket-level access
▪ When you enable uniform bucket-level access on a bucket, Access Control Lists (ACLs) are disabled, and
only bucket-level Identity and Access Management (IAM) permissions grant access to that bucket and the
objects it contains. You revoke all access granted by object ACLs and the ability to administrate permissions
using bucket ACLs.
• You might not want to use uniform bucket-level access and instead retain fine-grained ACLs if you
want to control access to specific objects in a bucket via legacy ACLs.
▪ gsutil bucketpolicyonly set (on|off) gs://<bucket_name>...
▪ gsutil bucketpolicyonly get gs://<bucket_name>...
▪ The bucketpolicyonly command is used to retrieve or configure the uniform bucket-level access setting of
Cloud Storage buckets. This command has two sub-commands, get and set.
▪ The bucketpolicyonly get command shows whether uniform bucket-level access is enabled for the specified
Cloud Storage bucket.
▪ The bucketpolicyonly set command enables or disables the uniform bucket-level access feature on Cloud
Storage buckets.
▪ The Bucket Policy Only feature is now known as uniform bucket-level access. The bucketpolicyonly
command is still supported, but we recommend using the equivalent ubla command.
○ defacl - Get, set, or change default ACL on buckets
▪ gsutil defacl set <file-or-canned_acl_name> gs://<bucket_name>...
Allows 3 categories:
▪ Set: The "defacl set" command sets default object ACLs for the specified buckets. If you specify a default
object ACL for a certain bucket, Cloud Storage applies the default object ACL to all new objects uploaded to
that bucket, unless an ACL for that object is separately specified during upload.
▪ Similar to the "acl set" command, the file-or-canned_acl_name names either a canned ACL or the path to a
file that contains ACL text.
▪ Setting a default object ACL on a bucket provides a convenient way to ensure newly uploaded objects have
a specific ACL. If you don't set the bucket's default object ACL, it will default to project-private. If you then
upload objects that need a different ACL, you will need to perform a separate ACL update operation for
each object. Depending on how many objects require updates, this could be very time-consuming.
▪ Get:Gets the default ACL text for a bucket, which you can save and edit for use with the "defacl set"
command.
▪ Ch:
• The "defacl ch" (or "defacl change") command updates the default object access control list for a
bucket. The syntax is shared with the "acl ch" command, so see the "CH" section of gsutil help acl for
the full help description.
• Grant anyone on the internet READ access by default to any object created in the bucket
example-bucket:
• gsutil defacl ch -u AllUsers:R gs://example-bucket
• Ch Options:
• The "ch" sub-command has the following options
○ -d Remove all roles associated with the matching entity.
○ -f Normally gsutil stops at the first error. The -f option causes it to continue when it
encounters errors. With this option the gsutil exit status will be 0 even if some ACLs couldn't be
changed.
○ -g Add or modify a group entity's role.
○ -p Add or modify a project viewers/editors/owners role.
○ -u Add or modify a user entity's role.
• compose - Concatenate a sequence of objects into a new composite object.
○ gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
○ The compose command creates a new object whose content is the concatenation of a given sequence of source
objects under the same bucket. gsutil uses the content type of the first source object to determine the
destination object's content type.
• cp command - Allows you to copy data between your local file system and the cloud, within the cloud, and between
cloud storage providers.
○ gsutil cp [OPTION]... src_url dst_url
○ For example, to upload all text files from the local directory to a bucket, you can run:
▪ gsutil cp *.txt gs://my-bucket
○ You can also download text files from a bucket:
▪ gsutil cp gs://my-bucket/*.txt .
○ Use the -r option to copy an entire directory tree. For example, to upload the directory tree dir:
▪ gsutil cp -r dir gs://my-bucket
○ If you have a large number of small files to transfer, you can perform a parallel multi-threaded/multi-processing
copy using the top-level gsutil -m option:
▪ gsutil -m cp -r dir gs://my-bucket
○ You can use the -I option with stdin to specify a list of URLs to copy, one per line. This allows you to use gsutil in
a pipeline to upload or download objects as generated by a program:
▪ cat filelist | gsutil -m cp -I gs://my-bucket
or:
▪ cat filelist | gsutil -m cp -I ./download_dir
○ where the output of cat filelist is a list of files, cloud URLs, and wildcards of files and cloud URLs.
• defstorageclass - Get or set the default storage class on buckets
○ gsutil defstorageclass set <storage-class> gs://<bucket_name>...
○ gsutil defstorageclass get gs://<bucket_name>...
○ The defstorageclass command has two sub-commands:
○ Set: The "defstorageclass set" command sets the default storage class for the specified bucket(s). If you specify a
default storage class for a certain bucket, Cloud Storage applies the default storage class to all new objects
uploaded to that bucket, except when the storage class is overridden by individual upload requests.
▪ Setting a default storage class on a bucket provides a convenient way to ensure newly uploaded objects
have a specific storage class. If you don't set the bucket's default storage class, it will default to Standard.
○ Get: Gets the default storage class for a bucket.
• du - Display object size usage
○ gsutil du url...
○ The du command displays the amount of space in bytes used up by the objects in a bucket, subdirectory, or
project.
• label - Get, set, or change the label configuration of a bucket.
○ gsutil label set <label-json-file> gs://<bucket_name>...
○ gsutil label get gs://<bucket_name>
○ gsutil label ch <label_modifier>... gs://<bucket_name>...
▪ where each <label_modifier> is one of the following forms:
▪ -l <key>:<value> : To add/change a label
▪ -d <key> : To delete a label with key <key>
• lifecycle - Get or set lifecycle configuration for a bucket
○ gsutil lifecycle get gs://<bucket_name>
○ gsutil lifecycle set <config-json-file> gs://<bucket_name>..
▪ To delete object lifecycle rules, either use console and delete them, or create an empty lifecycle config and
set it:
• {
"lifecycle": {
"rule": []
}
}
○ The following lifecycle configuration defines three rules. Note that the second and third rules are applicable only
when using Object Versioning on the bucket:
▪ Delete live versions (isLive:True) of objects older than 30 days.
• If the bucket uses Object Versioning, such objects become noncurrent and are subject to the other
two rules.
• If the bucket does not use Object Versioning, such objects are permanently deleted and cannot be
recovered.
▪ Delete noncurrent versions of objects if there are 2 newer versions (numNewerVersions) of the object in
the bucket. Objects subject to this rule are permanently deleted and cannot be recovered.
▪ Delete noncurrent versions (isLive:False) of objects older than 35 days. Objects subject to this rule are
permanently deleted and cannot be recovered.
{
"lifecycle":
{
"rule": [
{
"action": {"type": "Delete"},
"condition": {
"age": 30,
"isLive": true
}
},
{
"action": {"type": "Delete"},
"condition": {
"numNewerVersions": 2
}
},
{
"action": {"type": "Delete"},
"condition": {
"age": 35,
"isLive": false
}
}
]
}
}
• logging - Configure or retrieve logging on buckets
○ Cloud Storage offers usage logs and storage data in the form of CSV files that you can download and view.
○ The logs and storage data files are automatically created as new objects in a bucket that you specify, in 24 hour
intervals.
○ Usage logs provide information for all of the requests made on a specified bucket in the last 24 hours, while the
storage logs provide information about the storage consumption of that bucket for the last 24 hour period. The
logs and storage data files are automatically created as new objects in a bucket that you specify, in 24 hour
intervals.
○ The logging command has two sub-commands:
○ Set: The set sub-command has two sub-commands:
▪ The "gsutil logging set on" command will enable usage logging of the buckets named by the specified
URLs, outputting log files in the specified logging_bucket.
▪ logging_bucket must already exist, and all URLs must name buckets (e.g., gs://bucket). The required bucket
parameter specifies the bucket to which the logs are written, and the optional log_object_prefix parameter
specifies the prefix for log object names. The default prefix is the bucket name. For example, the
command:
▪ gsutil logging set on -b gs://my_logging_bucket -o UsageLog \ gs://my_bucket1 gs://my_bucket2
• will cause all read and write activity to objects in gs://mybucket1 and gs://mybucket2 to be logged to
objects prefixed with the name "UsageLog", with those log objects written to the bucket
gs://my_logging_bucket.
○ In addition to enabling logging on your bucket(s), you will also need to grant
[email protected] write access to the log bucket, using this command:
▪ gsutil acl ch -g [email protected]:W gs://my_logging_bucket
▪ Note that log data may contain sensitive information, so you should make sure to set an appropriate
default bucket ACL to protect that data.
○ Off: This command will disable usage logging of the buckets named by the specified URLs. All URLs must name
buckets (e.g., gs://bucket).
▪ No logging data is removed from the log buckets when you disable logging, but Cloud Storage will stop
delivering new logs once you have run this command.
○ Get: If logging is enabled for the specified bucket url, the server responds with a JSON document that looks
something like this:
{
"logBucket": "my_logging_bucket",
"logObjectPrefix": "UsageLog"
}
▪ You can download log data from your log bucket using the gsutil cp command.
• rsync - Synchronize content of two buckets/directories
○ gsutil rsync [OPTION]... src_url dst_url
○ The gsutil rsync command makes the contents under dst_url the same as the contents under src_url, by copying
any missing files/objects (or those whose data has changed), and (if the -d option is specified) deleting any extra
files/objects. src_url must specify a directory, bucket, or bucket subdirectory
• setmeta - Set metadata on already uploaded objects
○ gsutil setmeta -h [header:value|header] ... url...
○ The gsutil setmeta command allows you to set or remove the metadata on one or more objects. It takes one or
more header arguments followed by one or more URLs, where each header argument is in one of two forms:
○ If you specify header:value, it sets the provided value for the given header on all applicable objects.
○ If you specify header (with no value), it removes the given header from all applicable objects.
○ For example, the following command sets the Content-Type and Cache-Control headers while also removing the
Content-Disposition header on the specified objects:
▪ gsutil setmeta -h "Content-Type:text/html" -h "Cache-Control:public, max-age=3600" -h "Content-Disposition" gs://bucket/*.html
○ You can also use the setmeta command to set custom metadata on an object:
▪ gsutil setmeta -h "x-goog-meta-icecreamflavor:vanilla" gs://bucket/object
▪ Custom metadata is always prefixed in gsutil with x-goog-meta-. This distinguishes it from standard request
headers. Other tools that send and receive object metadata by using the request body do not use this
prefix.
○ Stat : Can be used to display current metadata
• versioning - Enable or suspend versioning for one or more buckets
▪ gsutil versioning set (on|off) gs://<bucket_name>...
▪ gsutil versioning get gs://<bucket_name>...
○ The Versioning Configuration feature enables you to configure a Cloud Storage bucket to keep old versions of
objects.
○ Caution: Object Versioning does not protect your data if you delete the entire bucket.
○ The gsutil versioning command has two sub-commands:
○ Set: The "set" sub-command requires an additional sub-command, either "on" or "off", which, respectively, will
enable or disable versioning for the specified bucket(s).
○ Get: The "get" sub-command gets the versioning configuration for a bucket and displays whether or not it is
enabled.
• signurl - Create a signed url
○ gsutil signurl
▪ [-c <content_type for which valid>]
▪ [-d <duration>] : Max is 7 days
▪ [-m <http_method>] : Specifies the HTTP method to be authorized for use with the signed URL; the default is
GET (download). PUT (upload) can also be used, or RESUMABLE to allow a resumable upload.
▪ [-p <password>] : Specify the private key password instead of prompting.
▪ [-r <region>]
▪ [-b <project>] : Allows you to specify a user project that will be billed for requests that use the signed URL.
▪ (-u | <private-key-file>) : Use service account credentials instead of a private key file to sign the url.
▪ (gs://<bucket_name> | gs://<bucket_name>/<object_name>)...
○ The signurl command will generate a signed URL that embeds authentication data so the URL can be used by
someone who does not have a Google account.
○ Multiple gs:// urls may be provided and may contain wildcards. A signed url will be produced for each provided
url, authorized for the specified HTTP method and valid for the given duration.
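○ A minimal sketch (key file, duration and object names are placeholders):
▪ # sign with a service account private key file, valid for 10 minutes (GET/download by default)
▪ gsutil signurl -d 10m /path/to/sa-key.json gs://my-bucket/report.pdf
▪ # or sign with the active service account credentials instead of a key file
▪ gsutil signurl -d 10m -u gs://my-bucket/report.pdf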
Cloud storage: Object lifecycle management
Saturday, February 6, 2021 11:16 PM
• To support common use cases like setting a Time to Live (TTL) for objects, retaining
noncurrent versions of objects, or "downgrading" storage classes of objects to help
manage costs, Cloud Storage offers the Object Lifecycle Management feature.
• You can assign a lifecycle management configuration to a bucket. The configuration
contains a set of rules which apply to current and future objects in the bucket. When an
object meets the criteria of one of the rules, Cloud Storage automatically performs a
specified action on the object. Here are some example use cases:
○ Downgrade the storage class of objects older than 365 days to Coldline Storage.
○ Delete objects created before January 1, 2013.
○ Keep only the 3 most recent versions of each object in a bucket with versioning
enabled.
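○ A minimal sketch of a lifecycle configuration file (lifecycle.json; bucket name is a placeholder) covering two of the example use cases above:
  {
    "rule": [
      {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"}, "condition": {"age": 365}},
      {"action": {"type": "Delete"}, "condition": {"numNewerVersions": 3}}
    ]
  }
○ Apply and verify it with:
▪ gsutil lifecycle set lifecycle.json gs://my-bucket
▪ gsutil lifecycle get gs://my-bucket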
• Lifecycle configuration
○ Each lifecycle management configuration contains a set of rules. When defining a
rule, you can specify any set of conditions for any action. If you specify multiple
conditions in a rule, an object has to match all of the conditions for the action to be
taken. If you specify multiple rules that contain the same action, the action is taken
when an object matches the condition(s) in any of the rules. Each rule should
contain only one action.
○ If multiple rules have their conditions satisfied simultaneously for a single object,
Cloud Storage performs the action associated with only one of the rules, based on
the following considerations:
▪ The Delete action takes precedence over any SetStorageClass action.
▪ The SetStorageClass action that switches the object to the storage class with
the lowest at-rest storage pricing takes precedence.
▪ Once an action occurs, the object is re-evaluated before any additional actions
are taken. So, for example, if you have one rule that deletes an object and
another rule that changes the object's storage class, but both rules use the
exact same condition, the delete action always occurs when the condition is
met. If you have one rule that changes the object's class to Nearline Storage
and another rule that changes the object's class to Coldline Storage, but both
rules use the exact same condition, the object's class always changes to
Coldline Storage when the condition is met.
○ Delete: The Delete action deletes an object when the object meets all conditions
specified in the lifecycle rule.
▪ Exception: In buckets with Object Versioning enabled, deleting the live version
of an object causes it to become a noncurrent version, while deleting a
noncurrent version deletes that version permanently.
○ SetStorageClass: The SetStorageClass action changes the storage class of an object
when the object meets all conditions specified in the lifecycle rule.
○ SetStorageClass supports the following storage class transitions:
▪ Durable Reduced Availability (DRA) Storage → Nearline Storage, Coldline Storage, Archive Storage, or Multi-Regional Storage/Regional Storage¹
▪ Standard Storage, Multi-Regional Storage, or Regional Storage → Nearline Storage, Coldline Storage, or Archive Storage
▪ Nearline Storage → Coldline Storage or Archive Storage
▪ Coldline Storage → Archive Storage
○ ¹ For buckets in a region, the new storage class cannot be Multi-Regional Storage. For buckets in a multi-region or dual-region, the new storage class cannot be Regional Storage.
• Lifecycle conditions: A lifecycle rule includes conditions which an object must meet before
the action defined in the rule occurs on the object. Lifecycle rules support the following
conditions:
○ Age
○ CreatedBefore
○ CustomTimeBefore
○ DaysSinceCustomTime
○ DaysSinceNoncurrentTime
○ IsLive
○ MatchesStorageClass
○ NoncurrentTimeBefore
○ NumberOfNewerVersions
• Options for tracking Lifecycle actions: To track the lifecycle management actions that
Cloud Storage takes, use one of the following options:
○ Use Cloud Storage usage logs. This feature logs both the action and who performed
the action. A value of GCS Lifecycle Management in the cs_user_agent field of the
log entry indicates the action was taken by Cloud Storage in accordance with a
lifecycle configuration.
○ Enable Pub/Sub Notifications for Cloud Storage for your bucket. This feature sends
notifications to a Pub/Sub topic of your choice when specified actions occur
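○ A minimal sketch (topic and bucket names are placeholders): send a notification to a Pub/Sub topic whenever an object is deleted (which is what a lifecycle Delete action does):
▪ gsutil notification create -t lifecycle-events -f json -e OBJECT_DELETE gs://my-bucket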
Databases
Monday, January 6, 2020 7:40 PM
ACID DB properties:
• Atomicity: Many queries execute as a single transaction; if one fails, the entire transaction fails and is rolled back.
• Consistency: Data that doesn't follow the DB's rules should not be added.
• Isolation: Concurrent transactions shouldn't affect each other.
• Durability: Once a transaction is committed it stays committed, surviving crashes and failures (typically backed by redundancy).
Cloud SQL: Managed MySQL, PostgreSQL and SQL Server databases. Automatic replication, failover and backups, but manual scaling. Can run automated DB backups when traffic is low, or backups can be triggered manually.
• Failover replicas: Are used to provide high-availability for your db. Automated backups
and point-in-time recovery must be enabled for high availability (point-in-time recovery
uses binary logging). This can include multiple read replicas.
○ The high-availability feature is not a scaling solution for read-only scenarios; you
cannot use a standby replica to serve read traffic.
• Read replicas: You use a read replica to offload work from a Cloud SQL instance. The read
replica is an exact copy of the primary instance. Data and other changes on the primary
instance are updated in almost real time on the read replica. Read replicas are read-only;
you cannot write to them.
• To support read replicas, you must enable point-in-time recovery on the primary instance, because it enables binary logging, which replicas require. Binary logging is also supported on read replica instances themselves (MySQL 5.7 and 8.0 only); you enable it on a replica with the same API commands as on the primary, using the replica's instance name instead of the primary's instance name.
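• A minimal gcloud sketch (instance names and region are placeholders):
○ # enable automated backups + point-in-time recovery (binary logging) on the primary
○ gcloud sql instances patch my-primary --backup-start-time=02:00 --enable-bin-log
○ # create a read replica of the primary
○ gcloud sql instances create my-replica --master-instance-name=my-primary --region=us-central1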
Cloud spanner: The first horizontally scalable, strongly consistent RDBMS. Scales from 1 to 1000s of nodes. Provides stronger global consistency guarantees than Cloud SQL. It is not based on failover; any node can reply to any query. It is a transactional, relational DB. Basically it's like Cloud SQL but more modern: you don't have to worry about scaling a SQL DB, as it spins up more nodes based on your requirements.
BigQuery external tables:
• An external data source (also known as a federated data source) is a data source that you
can query directly even though the data is not stored in BigQuery. Instead of loading or
streaming the data, you create a table that references the external data source.
• BigQuery offers support for querying data directly from:
○ Cloud Bigtable
○ Cloud Storage
○ Google Drive
○ Cloud SQL
• Supported formats are:
○ Avro, CSV, JSON (newline delimited only), ORC, Parquet
• Use cases for external data sources include:
○ Loading and cleaning your data in one pass by querying the data from an external
data source (a location external to BigQuery) and writing the cleaned result into
BigQuery storage.
○ Having a small amount of frequently changing data that you join with other tables.
As an external data source, the frequently changing data does not need to be
reloaded every time it is updated.
• Query performance for external data sources may not be as high as querying data in a
native BigQuery table. If query speed is a priority, load the data into BigQuery instead of
setting up an external data source.
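• A minimal sketch with the bq CLI (dataset, table and bucket names are placeholders): query CSV files sitting in Cloud Storage without loading them:
○ bq mkdef --autodetect --source_format=CSV "gs://my-bucket/sales_*.csv" > sales_def.json
○ bq mk --external_table_definition=sales_def.json mydataset.external_sales
○ bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM mydataset.external_sales'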
Cloud Bigtable: Low latency, high throughput NoSQL db for large operational and analytical apps. Storage scales automatically, but processing nodes must be scaled manually. Cloud Bigtable is the most
performant storage option to work with IoT and time series data. Google Cloud Bigtable is a
fast, fully managed, highly-scalable NoSQL database service. It is designed for the collection and
retention of data from 1TB to hundreds of PB.
• Bigtable does not autoscale its nodes, and it does not store its data in GCS. Bigtable is not an ideal storage option for state management; it is designed for fast lookups over very large datasets.
Cloud datastore (now called cloud firestore): Similar to cloud bigtable (is NoSql), except it
provides a SQL like syntax to work with data. Must be indexed. Has built-in indexes for simple
filtering and sorting. Pay for IO operations (reads, writes and deletes) and not data stored. Cloud
Datastore is not the most performant product for frequent writes or timestamp-based queries.
Datastore has much better functionality around transactions and queries (since secondary
indexes exist). Is ACID compliant
• Datastore can autoscale. Hence is preferred db for app engine
• It also provides a SQL-like query language.
Datastore emulator:
• The Datastore emulator provides local emulation of the production Datastore
environment. You can use the emulator to develop and test your application locally. In
addition, the emulator can help you generate indexes for your production Firestore in
Datastore mode instance and delete unneeded indexes.
• The Datastore emulator is a component of the Google Cloud SDK's gcloud tool. Use the
gcloud components install command to install the Datastore emulator:
○ gcloud components install cloud-datastore-emulator
• Start the emulator by executing datastore start from a command prompt:
○ gcloud beta emulators datastore start [flags]
○ where [flags] are optional command-line arguments supplied to the gcloud tool. For
example:
--data-dir=[DATA_DIR] changes the emulator's data directory. The emulator creates
the /WEB-INF/appengine-generated/local_db.bin file inside [DATA_DIR] or, if
available, uses an existing file.
--no-store-on-disk configures the emulator not to persist any data to disk for the
emulator session.
• After you start the emulator, you need to set environment variables so that your
application connects to the emulator instead of your production Datastore mode
database. Set these environment variables on the same machine that you use to run your
application.
○ You need to set the environment variables each time you start the emulator. The
environment variables depend on dynamically assigned port numbers that could
change when you restart the emulator.
○ $(gcloud beta emulators datastore env-init)
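○ A rough sketch of what env-init prints (a set of export statements that the $(...) substitution evaluates in your shell); the host/port and project are assigned locally and may differ:
▪ export DATASTORE_EMULATOR_HOST=localhost:8081
▪ export DATASTORE_PROJECT_ID=my-project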
• By running your application using the emulator, you can generate indexes for your
production Datastore mode database, as well as delete unneeded indexes.
• If your application and the emulator run on the same machine, you can remove the
environment variables automatically:
○ Run env-unset using command substitution:
○ $(gcloud beta emulators datastore env-unset)
○ Your application will now connect to your production Datastore mode database.
Firebase db: NoSQL doc store with real-time client updates via managed websockets. Revolves
around a single JSON doc located in central US. Mostly created for app development. Provides a
login interface for new and existing clients to login to the db. Data is synced across each client
and remains available even if the app is offline. Using the Blaze plan you can scale across
multiple locations.
Cloud memorystore: Fully managed in-memory data store to build app caches and sub-
millisecond data access. You can use redis or memcache. Access is limited mostly to apps in the
same location/subnet.
Both Redis and Memcached are powerful solutions, but there are a few things Redis does that Memcached doesn't, and vice versa:
• Redis only: snapshots, replication, transactions, pub/sub, advanced data structures
• Memcached only: multi-threaded architecture and is easier to use
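A minimal gcloud sketch (instance name, size and region are placeholders):
• # create a 1 GB Redis instance in the region closest to the app that will use it
• gcloud redis instances create my-cache --size=1 --region=us-central1
• # look up the host/port to point the application's Redis client at
• gcloud redis instances describe my-cache --region=us-central1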
Differences:
App Engine
• App Engine standard does not support C++ applications; such an application needs to be dockerized and run on the flexible environment. A major advantage of using the App Engine flexible environment is the ability to customize the runtime.
• Note: Gradual traffic migration between versions running in the flexible environment is not supported. You must migrate traffic immediately to versions that are running in the flexible environment.
• Google App Engine - Standard is like a read-only folder in which you upload your code. Read-only means there is a fixed set of libraries installed for you and you cannot deploy third-party libraries at all. DNS / sub-domains etc. are much easier to map.
• Google App Engine - Flexible is like a real file-system where you have more control as compared to the Standard App engine, you have
write permissions, but less as compared to GCP Compute Engine. In Flexible App Engine, you can use whatever library your app depends
on.
• Google App Engine Standard cannot directly use Cloud VPN but Flexible can.
App Engine is regional, which means the infrastructure that runs your apps is located in a specific region and is managed by Google to
be redundantly available across all the zones within that region.
Meeting your latency, availability, or durability requirements are primary factors for selecting the region where your apps are run.
You can generally select the region nearest to your app's users but you should consider the location of the other Google Cloud
products and services that are used by your app. Using services across multiple locations can affect your app's latency as well as
pricing.
App Engine is available in many regions, for example:
• northamerica-northeast1 (Montréal)
• us-central (Iowa)
• When you deploy to App Engine, the dependencies specified in the requirements.txt file will be installed automatically with
your deployed app.
• By default, App Engine caches fetched dependencies to reduce build times, to install uncached use 'gcloud beta app deploy --no-
cache'
• To import private dependencies, you need to add a separate module alongside your app
• You can use pip install -t lib priv_module to copy it to the lib dir
• Add an empty __init__.py file to the lib dir
• Keep the lib directory alongside your app
Splitting Traffic
You can use traffic splitting to specify a percentage distribution of traffic across two or more of the versions within a service. Splitting
traffic allows you to conduct A/B testing between your versions and provides control over the pace when rolling out features.
Traffic splitting is applied to URLs that do not explicitly target a version, i.e. URLs that address the service as a whole rather than a specific version (see the example gcloud commands at the end of this section).
IP address splitting
If you choose to split traffic to your application by IP address, when the application receives a request, it hashes the IP address to a
value between 0–999, and uses that number to route the request.
1) IP addresses are reasonably sticky, but are not permanent. Users connecting from cell phones might have a shifting IP address throughout a single session. Similarly, a user on a laptop moving from home to a cafe to work will also shift through IP addresses. As a result, the user might have an inconsistent experience with your app as their IP address changes.
2) Because IP addresses are independently assigned to versions, the resulting traffic split will differ somewhat from what you specify; as your application receives more traffic, the actual split gets closer to your target. For example, if you ask for 5% of traffic to be delivered to an alternate version, the initial percentage that actually reaches that version may deviate noticeably from 5%.
Cookie splitting
If you choose to split traffic to your application by cookies, the application looks in the HTTP request header for a cookie named
GOOGAPPUID, which contains a value between 0–999:
Using cookies to split traffic makes it easier to accurately assign users to versions. The precision of traffic routing can reach as close as 0.1% to the target split. However, cookie splitting has the following limitations:
1) If you are writing a mobile app or running a desktop client, it needs to manage the GOOGAPPUID cookies. For example, when a
Set-Cookie response header is used, you must store the cookie and include it with each subsequent request. Browser-based
apps already manage cookies in this way automatically.
2) Splitting internal requests requires extra work. All user requests that are sent from within Google's cloud infrastructure, require
that you forward the user's cookie with each request. For example, you must forward the user's cookie in requests sent from
your app to another app, or to itself. Note that it is not recommended to send internal requests if those requests don't originate
from a user.
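A minimal sketch of the gcloud commands (service and version names are placeholders):
# split traffic 90/10 between two versions of the default service, keyed by cookie
gcloud app services set-traffic default --splits=v1=0.9,v2=0.1 --split-by=cookie
# or move all traffic to one version, migrating it gradually (standard environment only)
gcloud app services set-traffic default --splits=v2=1 --migrate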
• Cloud Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark, Apache Hadoop, HIVE, Pig clusters
in a simpler, more cost-efficient way. While Dataproc is very efficient at processing ETL and Big Data pipelines, it is not as
suitable for running a ruby application that runs tests each day.
• Dataflow/Beam provides a clear separation between processing logic and the underlying execution engine. This helps with
portability across different execution engines that support the Beam runtime, i.e. the same pipeline code can run seamlessly on
either Dataflow, Spark or Flink.
• Endpoints is API management gateway which helps you develop, deploy, and manage APIs on
any Google Cloud backend. It runs on GCP and leverages a lot of Google's underlying
infrastructure. It also has native hooks to integrate with other products in the GCP suite. If you have applications running on GCP, then Endpoints can enable you to quickly build APIs around them.
○ Apigee on the other hand is a comprehensive API management platform built for
Enterprises, with deployment options on cloud, on-premises or hybrid. The feature set
includes a full-fledged API gateway, customizable portal for on-boarding partners and
developers, monetization, and deep analytics around your APIs. It also out of the box
configurations to support traffic management, mediation, security, packaging APIs,
developer key management etc. You can use Apigee for any http/https backend, no
matter where they are running (on-premises, any public cloud etc.).
○ If you are using GCP, then there are use cases for both in your architecture. But if your backend is not on GCP, you would use only Apigee.
• Basically, both products do the same thing, but they are very different.
○ Endpoints is integrated with App Engine and can also be deployed elsewhere, e.g. on Cloud Run. Endpoints has the basic features of an API proxy: authentication, API key validation, JSON to gRPC transcoding, API monitoring, tracing and logging. Endpoints is free (or you pay only for Cloud Run when you deploy on it).
○ Apigee does the same things, but with more advanced features, like quotas, billing, request pre- and post-processing, etc. In addition, it can front APIs other than REST and gRPC, and thus can be integrated with a legacy application and allow it to expose an API even if it wasn't designed for one. Apigee is EXPENSIVE, but POWERFUL!
• https://cloud.google.com/endpoints/docs/openapi/about-cloud-endpoints
• https://cloud.google.com/apigee/docs/api-platform/get-started/what-apigee
Google Cloud Marketplace lets you quickly deploy functional software packages that run on Google Cloud Platform.
Even if you are unfamiliar with services like Compute Engine or Cloud Storage, you can easily start up a familiar
software package without having to manually configure the software, virtual machine instances, storage, or network
settings. Deploy a software package now, and scale that deployment later when your applications require additional
capacity. Google Cloud Platform updates the images for these software packages to fix critical issues and vulnerabilities,
but doesn't update software that you have already deployed.
Search for a package and select one that meets your business needs. When you launch the deployment, you can use
the default configuration or customize the configuration to use more virtual CPUs or storage resources. Some packages
allow you to specify the number of virtual machine instances to use in a cluster.
Deployment guide:
• With the gcloud command-line tool, use the deployments create command:
○ gcloud deployment-manager deployments create my-first-deployment \ --config vm.yaml
○ The --config flag is a relative path to your YAML configuration file.
• By default, if your configuration includes resources that are already in your project, those resources are acquired
by the deployment, and can be managed using the deployment. If you don't want to acquire a resource, you must
use the --create-policy option, as in the following gcloud beta command:
○ gcloud beta deployment-manager deployments create my-first-deployment \ --config vm.yaml --create-
policy CREATE
• If your deployment is successfully created, you can get a description of the deployment:
○ gcloud deployment-manager deployments describe my-first-deployment
• You can use the following policies for creating your resources:
○ CREATE - Deployment Manager creates resources that do not exist. If any of the resources in your
configuration already exist in the project, the deployment fails.
○ ACQUIRE - Deployment Manager acquires resources that already exist, using the same criteria as
CREATE_OR_ACQUIRE.
○ Use the ACQUIRE policy if you have a number of resources already in your project, and want to manage
them together, as a single deployment
○ The default policy for removing resources is DELETE.
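• A rough sketch of what the vm.yaml referenced above might contain (project, zone, machine type and image are placeholders, modeled on the Deployment Manager quickstart):
  resources:
  - name: my-first-vm
    type: compute.v1.instance
    properties:
      zone: us-central1-a
      machineType: https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a/machineTypes/f1-micro
      disks:
      - deviceName: boot
        type: PERSISTENT
        boot: true
        autoDelete: true
        initializeParams:
          sourceImage: https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/family/debian-11
      networkInterfaces:
      - network: https://www.googleapis.com/compute/v1/projects/my-project/global/networks/default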
• Key replacement
• Use the following guidelines when replacing the key you use to encrypt Cloud
Storage objects with a new key:
a. Check your buckets to see which use the key as their default encryption key.
For these buckets, replace the old key with a new key.
b. This ensures that all objects written to the bucket use the new key going
forward.
c. Inspect your source code to understand which requests use the key in
ongoing operations, such as setting bucket configurations and uploading,
copying, or rewriting objects. Update these instances to use the new key.
d. Check for objects, in all of your buckets, encrypted with the old key. Use the
Rewrite Object method to re-encrypt each object with the new key.
e. Disable all versions of the old key. After disabling old key versions, monitor
client and service logs for operations that fail due to a version becoming
unavailable.
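• A minimal gsutil sketch of steps a and d (bucket and key names are placeholders):
○ # set the new Cloud KMS key as the bucket's default encryption key (step a)
○ gsutil kms encryption -k projects/my-proj/locations/us/keyRings/my-ring/cryptoKeys/new-key gs://my-bucket
○ # re-encrypt existing objects with the new key via the Rewrite method (step d); -r walks the whole bucket
○ gsutil rewrite -k -r gs://my-bucket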
Cloud ML engine: Enable apps to use Tensor flow on datasets of any size.
Cloud natural language api: Analyzes sentiment, intent, and content classification, and extracts info. Can extract tokens, parts of speech, and dependency trees. Has 700+ content categories.
Cloud Iot Core: Manage, connect and ingest data from devices globally.
Cloud pub/sub: Scalable for msg ingestion and decoupling.
Cloud data prep: Visually explore, create and prepare data for analysis. For business analytics, cleans data and finds missing
data using ML.
Data proc: Batch map reduce via spark and hadoop clusters. Best for moving existing spark and hadoop clusters to gcp. Use
cloud data flow if starting fresh in gcp.
Data flow: Smart autoscaled managed batch or stream map reduce like processing. Not completely similar to map reduce.
Cloud data lab: Tool for data exploration, analysis and visualization and ML. Uses jupyter notebooks.
Cloud data studio: Big data visualization tool for dashboards and reporting.
Cloud genomics: Store and process genomes. Similar to big query for large research experiments. Can process many
experiments in parallel.
For pricing purposes, all units such as MB and GB represent binary measures. For example, 1 MB is 2^20 bytes and 1 GB is 2^30 bytes. These binary units are also known as mebibyte (MiB) and gibibyte (GiB), respectively. Note also that MB and MiB, and GB and GiB, are used interchangeably.
Cloud run: Cloud Run (fully managed) charges you only for the resources you use, rounded up to the
nearest 100 milliseconds.
Free tier (per month): the first 180,000 vCPU-seconds of CPU, first 360,000 GiB-seconds of memory, 2 million requests, and 1 GiB of egress within North America are free.
Tier 1 (beyond the free quota): $0.00002400 per vCPU-second and $0.00000250 per GiB-second of memory ($0.00000250 each if idle*), $0.40 per million requests, and networking billed at Google Cloud Network Premium tier pricing.
BigQuery:
BigQuery pricing has two main components:
• Analysis pricing is the cost to process queries, including SQL queries, user-defined functions,
scripts, and certain data manipulation language (DML) and data definition language (DDL)
statements that scan tables.
○ BigQuery offers a choice of two pricing models for running queries:
• On-demand pricing. With this pricing model, you are charged for the number of bytes
processed by each query.
○ The first 1 TB of query data processed per month is free.
○ $5 per TB after that (a bq dry run, shown after this pricing section, reports how many bytes a query would process).
• Flat-rate pricing. With this pricing model, you purchase slots, which are virtual CPUs.
When you buy slots, you are buying dedicated processing capacity that you can use to
run queries. Slots are available in the following commitment plans:
• Flex slots: You commit to an initial 60 seconds.
• Monthly: You commit to an initial 30 days.
• Annual: You commit to 365 days.
With monthly and annual plans, you receive a lower price in exchange for a longer-
term capacity commitment.
• Storage pricing is the cost to store data that you load into BigQuery.
○ Active storage (tables modified within the last 90 days): $0.020 per GB; the first 10 GB is free each month.
○ Long-term storage (tables not modified for 90 consecutive days): $0.010 per GB; the first 10 GB is free each month.
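• As mentioned under on-demand pricing above, you can estimate a query's cost before running it with a dry run, which reports the bytes that would be processed (dataset/table names are placeholders):
○ bq query --use_legacy_sql=false --dry_run 'SELECT name FROM mydataset.mytable'
○ # at $5/TB, estimated cost ≈ reported_bytes / 2^40 * $5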
Compute Engine:
• All vCPUs, GPUs, and GB of memory are charged a minimum of 1 minute. For example, if you run
your virtual machine for 30 seconds, you will be billed for 1 minute of usage. After 1 minute,
instances are charged in 1 second increments.
• vCPU and memory usage for each machine type can receive one of the following discounts:
○ Sustained use discounts
○ Committed use discounts
○ Discounts for preemptible VM instances
• N1 machine types can receive a sustained use discount up to 30%.
• N2 machine types can receive a sustained use discount up to 20%.
• E2 machine types do not offer sustained use discounts but provide larger savings directly through
the on-demand and committed-use prices.
Dataproc pricing:
Dataproc pricing is based on the size of Dataproc clusters and the duration of time that they run. The
size of a cluster is based on the aggregate number of virtual CPUs (vCPUs) across the entire cluster,
including the master and worker nodes. The duration of a cluster is the length of time between cluster
creation and cluster deletion.
The Dataproc pricing formula is: $0.010 * # of vCPUs * hourly duration.
Although the pricing formula is expressed as an hourly rate, Dataproc is billed by the second, and all
Dataproc clusters are billed in one-second clock-time increments, subject to a 1-minute minimum billing.
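For example (a hedged illustration): a cluster with one master and two workers, each with 4 vCPUs (12 vCPUs total), running for 2 hours costs 0.010 * 12 * 2 = $0.24 in Dataproc fees, on top of the underlying Compute Engine and disk charges.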
Dataflow: Each Dataflow job uses at least one Dataflow worker. The Dataflow service provides two
worker types: batch and streaming. There are separate service charges for batch and streaming workers.
Dataflow workers consume the following resources, each billed on a per second basis.
Cloud SQL: Pricing for Cloud SQL depends on your instance type:
• SQL Server: Charged for compute and memory, storage, network and licensing (Microsoft).
• MySQL: Charges for Instances and Networking. Cloud SQL charges per second for instances and
rounds-off to the nearest integer (0.49s->0s). Egress network charges depend on the destination.
• PostgreSQL: Charged for compute and memory, storage, network and instances.
Cloud Spanner: Charges for the number of nodes in your instance, amount of storage, amount of backup storage and network bandwidth.
Cloud functions: Cloud Functions are priced according to how long your function runs, how many times
it's invoked and how many resources you provision for the function. If your function makes an outbound
network request, there are also additional data transfer fees.
• Invocations charges:
Invocations per Month Price/Million
First 2 million Free
Beyond 2 million $0.40
• Compute resources used: charged from the time the function receives a request until it completes.
Memory  CPU      Price/100ms (Tier 1 price)
128MB   200MHz   $0.000000231
256MB   400MHz   $0.000000463
512MB   800MHz   $0.000000925
1024MB  1.4 GHz  $0.000001650
2048MB  2.4 GHz  $0.000002900
4096MB  4.8 GHz  $0.000005800
• Networking: Outbound data transfer (that is, data transferred from your function out to
somewhere else) is measured in GB and charged at a flat rate. Outbound data to other Google
APIs in the same region is free, as is inbound data. Google APIs that are global (i.e. not region-
specific) are considered to be the same region.
○ Outbound data (egress): $0.12 per GB; the first 5 GB per month is free.
Hands-on practice
Get extra practice with Google Cloud through self-paced exercises covering a single topic or theme
offered via Qwiklabs.
Complete the recommended quests and labs:
• Quest: Getting Started: Create & Manage Resources
• Quest: Perform Foundational Infrastructure Tasks in Google Cloud
• Quest: Setup and Configure a Cloud Environment in Google Cloud
• Quest: Build and Secure Networks in Google Cloud
• Quest: Deploy to Kubernetes in Google Cloud
• Hands-on lab: Cloud Run - Hello Cloud Run
• The binary log contains “events” that describe database changes such as table creation
operations or changes to table data. It also contains events for statements that potentially
could have made changes (for example, a DELETE which matched no rows), unless row-based
logging is used. The binary log also contains information about how long each statement took
that updated data. The binary log is not used for statements such as SELECT or SHOW that do
not modify data.