Lecture Manual
VMware vSAN™
Copyright © 2022 VMware, Inc. All rights reserved. This manual and its accompanying materials are
protected by U.S. and international copyright and intellectual property laws. VMware products are covered
by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or
trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names
mentioned herein may be trademarks of their respective companies. VMware vSphere® with VMware
Tanzu™, VMware vSphere® vMotion®, VMware vSphere® Web Client, VMware vSphere® Virtual Volumes™,
VMware vSphere® Syslog Collector, VMware vSphere® Storage vMotion®, VMware vSphere® Replication™,
VMware vSphere® Lifecycle Manager™, VMware vSphere® High Availability, VMware vSphere® Fault
Tolerance, VMware vSphere® ESXi™ Shell, VMware vSphere® ESXi™ Dump Collector, VMware vSphere®
Distributed Switch™, VMware vSphere® Distributed Resource Scheduler™, VMware vSphere® Distributed
Power Management™, VMware vSphere® Client™, VMware vSphere® Add-on for Kubernetes, VMware
vSphere® API for Storage Awareness™, VMware vSphere® 2015, VMware vSphere®, VMware vSAN™
Enterprise Plus, VMware vSAN™, VMware vRealize® Operations™ Enterprise, VMware vRealize®
Operations™, VMware vRealize® Operations™ Standard, VMware vRealize® Operations™ Advanced,
VMware vCloud® Air™ Network, VMware vCenter® Server Appliance™, VMware vCenter Server®, VMware
Virtual SAN™, VMware View®, VMware Horizon® View™, VMware Verify™, VMware Skyline™ Health,
VMware Horizon® 7, VMware Horizon® 7 on VMware Cloud™ on AWS, VMware HCI
Mesh™, VMware Customer Connect™, VMware vSphere® VMFS, Stretched Clusters for VMware Cloud™ on
AWS, VMware vSphere® Storage I/O Control, VMware Skyline Collector™, VMware Skyline Advisor™,
VMware Site Recovery Manager™, VMware PowerCLI™, VMware Platform Services Controller™, VMware
Photon™, VMware vSphere® Network I/O Control, VMware Lab Connect™, VMware Pivotal Labs® Health
Check™, VMware Go™, VMware vSphere® Flash Read Cache™, Enhanced vMotion™ Compatibility, VMware
ESXi™, VMware ESX®, VMware vSphere® Distributed Resource Scheduler™, and VMware ACE™ are
registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions.
The training material is provided "as is," and all express or implied conditions, representations, and warranties,
including any implied warranty of merchantability, fitness for a particular purpose or noninfringement, are
disclaimed, even if VMware, Inc., has been advised of the possibility of such claims. This material is designed
to be used for reference purposes in conjunction with a training course.
The training material is not a standalone training tool. Use of the training material for self-study without class
attendance is not recommended. These materials and the computer programs to which they relate are the
property of, and embody trade secrets and confidential information proprietary to, VMware, Inc., and may
not be reproduced, copied, disclosed, transferred, adapted or modified without the express written approval
of VMware, Inc.
www.vmware.com/education
Contents
x
7-11 Compression-Only Mode (1) ........................................................................................................... 259
7-12 Compression-Only Mode (2) ..........................................................................................................260
7-13 Configuring Space Efficiency .......................................................................................................... 261
7-14 Verifying Space Efficiency Savings ............................................................................................. 262
7-15 Using RAID 5 or RAID 6 Erasure Coding (1) ............................................................................ 263
7-16 Using RAID 5 or RAID 6 Erasure Coding (2) ........................................................................... 263
7-17 Using RAID 5 or RAID 6 Erasure Coding (3) ........................................................................... 264
7-18 Reclaiming Space Using TRIM/UNMAP (1) ............................................................................... 265
7-19 Reclaiming Space Using TRIM/UNMAP (2) .............................................................. 265
7-20 Reclaiming Space Using TRIM/UNMAP (3) .............................................................................. 265
7-21 Enabling TRIM/UNMAP Support .................................................................................. 266
7-22 Monitoring TRIM/UNMAP ................................................................................................................ 267
7-23 Lab 6: Configuring vSAN Space Efficiency .............................................................................. 268
7-24 Review of Learner Objectives ....................................................................................... 269
7-25 Key Points .............................................................................................................................................. 269
xv
10-45 Replacing Capacity Tier Disks ......................................................................................................... 419
10-46 Replacing Cache Tier Disks ............................................................................................................ 420
10-47 Removing Disk Groups ...................................................................................................................... 421
10-48 Replacing vSAN Nodes .................................................................................................................... 422
10-49 Decommissioning vSAN Nodes .................................................................................................... 423
10-50 Lab 13: Decommissioning the vSAN Cluster ............................................................................ 424
10-51 Lab 14: Scaling Out the vSAN Cluster ........................................................................................ 424
10-52 Review of Learner Objectives ....................................................................................................... 425
10-53 Lesson 3: Upgrading and Updating vSAN ................................................................................ 426
10-54 Learner Objectives .............................................................................................................................426
10-55 vSAN Upgrades ................................................................................................................................... 427
10-56 vSAN Upgrade Process ................................................................................................................... 428
10-57 Preparing to Upgrade vSAN .......................................................................................................... 429
10-58 vSAN Upgrade Phases .................................................................................................................... 430
10-59 Supported Upgrade Paths ............................................................................................................... 431
10-60 About the vSAN Disk Format ....................................................................................................... 432
10-61 vSAN Disk Format Upgrade Prechecks .................................................................................... 433
10-62 Verifying vSAN Disk Format Upgrades .....................................................................................434
10-63 vSAN Build Recommendations ..................................................................................................... 435
10-64 vSAN System Baselines ................................................................................................................... 436
10-65 Review of Learner Objectives ....................................................................................... 437
10-66 Key Points .............................................................................................................................................. 437
xx
13-50 Key Points ..............................................................................................................................................584
xxiv
Module 1
Course Introduction
1-3 Importance
vSAN is a policy-driven software-defined storage solution that is integrated with vSphere. vSAN
simplifies storage provisioning and management in the software-defined enterprise.
1
1-5 Learner Objectives (2)
• Detail vSAN File Service architecture and configuration
• Describe the use of vSphere Lifecycle Manager to automate driver and firmware
installations
• Explain how to use proactive tests to check the integrity of a vSAN cluster
• Apply a structured approach to troubleshoot vSAN cluster configuration and operational
problems
2
1-6 Course Outline
1. Course Introduction
2. Introduction to vSAN
3
1-7 Typographical Conventions
The following typographical conventions are used in this course.
Monospace: Identifies command names, command options, parameters, code fragments,
error messages, filenames, folder names, directory names, and path names.
4
1-8 References
Title Location
5
1-9 VMware Online Resources
• Start a discussion.
• Access communities.
6
1-10 VMware Learning Overview
You can access the following Education Services:
Help you find the course that you need based on the product, your role, and your level
of experience
• VMware Customer Connect Learning, which is the official source of digital training, includes
the following options:
On Demand Courses: Self-paced learning that combines lecture modules with hands-on
practice labs
VMware Lab Connect: Self-paced, technical lab environment where you can practice
skills learned during instructor-led training
7
1-11 VMware Certification Overview
VMware certifications validate your expertise and recognize your technical knowledge and skills
with VMware technology.
[Figure: VMware certification levels and roles — including VCAP (VMware Certified Advanced Professional) Design and Deploy paths, spanning roles from Administrator, Developer, and Operator through Senior Administrator, Solution Architect, and Enterprise Architect — across the Application Modernization, Data Center Virtualization, Cloud Management and Automation, Network Virtualization, Security, and End-User Computing tracks]
VMware certification sets the standards for IT professionals who work with VMware technology.
Certifications are grouped into technology tracks. Each track offers one or more levels of
certification (up to four levels).
For the complete list of certifications and details about how to attain these certifications, see
https://vmware.com/certification.
8
1-12 VMware Credentials Overview
VMware badges are digital emblems of skills and achievements. Career certifications align to job
roles and validate expertise across a solution domain. Certifications can cover multiple products
in the same certification.
Specialist certifications and skills badges align to products and verticals and show expanded
expertise.
• Easy to share on social media (LinkedIn, Twitter, Facebook, blogs, and so on)
9
Module 2
Introduction to vSAN
2-2 Importance
Understanding the logical architecture and relationships between vSAN elements provides the
necessary foundation to build a software-defined data center.
Objects and components are the building blocks of vSAN data storage. Understanding how
objects are created and distributed across multiple components is important for planning a
datastore that retains performance as objects and components are managed.
11
2-4 Lesson 1: Introduction to vSAN
12
2-6 About vSAN
vSAN is a software-defined storage solution that provides shared storage for VMs.
vSAN virtualizes local physical storage in the form of HDD or SSD devices on ESXi hosts in a
cluster, turning them into a unified datastore.
vSAN is a vSphere cluster feature that you can enable on an existing cluster or when creating a
cluster, similar to how you enable the vSphere HA and vSphere DRS features.
[Figure: A vSAN cluster of three ESXi hosts, each contributing local SSD devices, aggregated into a single vSAN Datastore over the vSAN Network]
vSAN provides enterprise-class storage that is robust, flexible, powerful, and easy to use. It
aggregates locally attached storage devices to create a storage solution that can run at the
edge, core, or cloud, all easily managed by vCenter Server. vSAN is integrated directly into
the hypervisor.
13
2-7 vSAN Node Minimum Requirements
vSAN nodes must have the following minimum hardware resources available:
• Network: 1 Gb NIC for hybrid mode; 10 Gb NIC for all-flash mode
You must verify that the ESXi hosts in your organization meet the vSAN hardware requirements.
All capacity devices, drivers, and firmware versions in your vSAN configuration must be certified
and listed in the vSAN section of the VMware Compatibility Guide.
14
2-8 About the vSAN Datastore
A datastore is the basic unit of storage in virtualized environments. When you enable vSAN on a
cluster, a vSAN datastore is created automatically.
Only one vSAN datastore is created, regardless of the number of storage devices and hosts in
the cluster.
The vSAN datastore appears as another datastore in the list of datastores that might be
available, including vSphere Virtual Volumes, VMFS, and NFS.
[Figure: A single vSAN Datastore spanning all hosts and their SSD devices in the vSAN Cluster]
The size of the vSAN datastore depends on the number of capacity devices per ESXi host and
the number of ESXi hosts in the cluster. For example, if a host has seven 2-TB capacity devices
and the cluster includes eight hosts, the approximate storage capacity is 7 x 2 TB x 8 = 112 TB.
When using the all-flash configuration, flash devices are used for capacity. For hybrid
configuration, magnetic disks are used for capacity.
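The sizing arithmetic above can be sketched as a small helper. This is a minimal illustration of the raw-capacity formula only (it ignores cache devices and policy overhead); the function name is hypothetical:

```python
def raw_vsan_capacity_tb(hosts: int, capacity_devices_per_host: int,
                         device_size_tb: float) -> float:
    """Approximate raw vSAN datastore capacity; cache devices are excluded."""
    return hosts * capacity_devices_per_host * device_size_tb

# The example from the text: 8 hosts, each with seven 2-TB capacity devices.
print(raw_vsan_capacity_tb(hosts=8, capacity_devices_per_host=7,
                           device_size_tb=2.0))  # 112.0
```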
15
2-9 vSAN Datastore Characteristics
The vSAN datastore has the following characteristics:
• vSAN provides a single vSAN datastore accessible to all hosts in the cluster.
• A single vSAN datastore can provide different service levels for each VM or each virtual
disk.
• The capacity of the cache devices does not affect the size of the vSAN datastore.
vSAN works best when all ESXi hosts in the cluster share similar or identical configurations,
including storage configurations, across all cluster members.
A consistent configuration balances VM storage components across all devices and hosts in the
cluster.
You can increase the vSAN datastore capacity by adding capacity devices or hosts with
capacity devices to the vSAN cluster.
16
2-10 vSAN Disk Groups
A disk group is a unit of physical storage capacity on a host and a group of physical devices that
provide performance and capacity to the vSAN cluster. On each ESXi host that contributes its
local devices to a vSAN cluster, devices are organized into disk groups.
Hosts can include a maximum of five disk groups, each of which must have one flash cache
device and one or more capacity devices (a maximum of seven). In vSAN, you can configure a
disk group with either all-flash or hybrid configurations.
[Figure: Four disk groups, each with one SSD cache tier device and multiple capacity tier devices]
The devices used for caching cannot be shared across disk groups and cannot be used for other
purposes. A single caching device must be dedicated to a single disk group. In hybrid clusters,
flash devices are used for the cache layer and magnetic disks are used for the storage capacity
layer. In an all-flash cluster, flash devices are used for both cache and capacity.
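The disk group limits above can be expressed as a small validation sketch. This is an illustration of the rules stated in this lesson (at most five disk groups per host; each group has exactly one cache device and one to seven capacity devices); the function and field names are hypothetical:

```python
MAX_DISK_GROUPS_PER_HOST = 5
MAX_CAPACITY_DEVICES_PER_GROUP = 7

def validate_disk_groups(disk_groups: list) -> list:
    """Return rule violations for a host's proposed disk group layout."""
    errors = []
    if len(disk_groups) > MAX_DISK_GROUPS_PER_HOST:
        errors.append("too many disk groups: %d (max %d)"
                      % (len(disk_groups), MAX_DISK_GROUPS_PER_HOST))
    for i, dg in enumerate(disk_groups):
        if dg["cache_devices"] != 1:
            errors.append("disk group %d: needs exactly one cache device" % i)
        if not 1 <= dg["capacity_devices"] <= MAX_CAPACITY_DEVICES_PER_GROUP:
            errors.append("disk group %d: capacity devices must be 1-%d"
                          % (i, MAX_CAPACITY_DEVICES_PER_GROUP))
    return errors

# A valid layout produces no violations.
print(validate_disk_groups([{"cache_devices": 1, "capacity_devices": 7}]))  # []
```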
17
2-11 Hybrid Disk Groups
The vSAN hybrid disk group configurations include one flash device for cache and between one
and seven magnetic devices for capacity. Cache devices are used for performance.
The cache device should be sized at a minimum of 10% of the disk group capacity:
[Figure: Hybrid disk groups — one SSD cache tier device per group with magnetic capacity tier devices]
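The 10% cache sizing guideline above can be sketched as follows. This is a minimal illustration of the rule as stated in this lesson; the device sizes and function name are hypothetical:

```python
def min_cache_size_gb(capacity_devices: int, device_size_gb: float,
                      cache_fraction: float = 0.10) -> float:
    """Minimum cache device size: 10% of the disk group's raw capacity."""
    return capacity_devices * device_size_gb * cache_fraction

# Hypothetical hybrid disk group: seven 1200-GB magnetic disks.
print(min_cache_size_gb(7, 1200.0))  # 840.0
```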
18
2-12 All-Flash Disk Groups
The vSAN all-flash disk group configurations include one flash device for cache and between
one and seven capacity flash devices.
Flash devices are used in a two-tier format for caching and capacity, and 100% of the available
cache is used for write buffering. The administrator decides which flash devices to use for the
capacity tier and the cache tier.
[Figure: Five all-flash disk groups — one SSD cache tier device per group with SSD capacity tier devices]
19
2-13 vSAN Storage Policies
vSAN storage policies define VM storage requirements for performance and availability.
Storage policies also define the placement of VM objects and components across the vSAN
cluster.
The number of component replicas and copies that are created is based on the VM storage
policy.
After a storage policy is assigned, its requirements are pushed to the vSAN layer during VM
creation. Stored files, such as VMDKs, are distributed across the vSAN datastore to meet the
required levels of protection and performance per VM.
[Figure: VM objects distributed as components across SSDs in multiple disk groups according to the assigned storage policy]
20
2-14 vSAN RAID Types
vSAN supports the following common RAID types:
• RAID 1, Mirrored: Good performance, full redundancy with 200% capacity usage
• RAID 10, Mirrored plus striped: Best performance, redundancy with 200% capacity usage
• RAID 5, Striped plus parity: Good performance with redundancy that has slower drive writes
because of parity calculations
• RAID 6, Striped plus double parity: Good performance with redundancy that has the slowest
drive writes because it has twice the parity calculations as RAID 5
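The capacity cost of these RAID types can be sketched numerically. This assumes vSAN's common erasure coding layouts (RAID 5 as 3 data + 1 parity, RAID 6 as 4 data + 2 parity), which is how the 200% figure for mirroring compares with roughly 133% and 150% for erasure coding; the function name is hypothetical:

```python
# Raw capacity consumed per GB of usable data for each RAID type.
RAID_CAPACITY_MULTIPLIER = {
    "RAID1": 2.0,    # full mirror: 200% capacity usage (as stated above)
    "RAID10": 2.0,   # mirror plus stripe: also 200%
    "RAID5": 4 / 3,  # 3 data + 1 parity segments
    "RAID6": 6 / 4,  # 4 data + 2 parity segments
}

def raw_gb_needed(usable_gb: float, raid: str) -> float:
    """Raw datastore capacity consumed to store usable_gb under a RAID type."""
    return usable_gb * RAID_CAPACITY_MULTIPLIER[raid]

print(raw_gb_needed(100, "RAID1"))            # 200.0
print(round(raw_gb_needed(100, "RAID5"), 1))  # 133.3
```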
21
2-15 Multiple Storage Policies
Different vSAN storage policies can be applied to different objects in the same VM. For
example, if a VM has two virtual disks, each drive can be assigned different storage policies.
[Figure: One VM with two policies — a Boot Policy (Availability A, Performance A, Capacity A) applied to the boot vmdk, and an Application Policy (Availability B, Performance B, Capacity B) applied to the application vmdk]
22
2-16 vSAN Storage Policy Resilience
When configuring a VM storage policy, you can select a RAID configuration that is optimized
with suitable availability, performance, and capacity for your VM deployments.
23
2-17 Integrating vSAN with vSphere HA
You can enable both vSAN and vSphere HA on the same cluster.
When enabling vSAN and vSphere HA for the same cluster, the vSphere HA agent traffic, such
as heartbeats and election packets, flows over the vSAN network rather than the management
network.
vSphere HA uses the management network only when vSAN is disabled. vCenter Server
chooses the appropriate network based on the order in which the two services are enabled and
disabled.
24
2-18 Integrating vSAN with VMware Products
(1)
vSAN combined with vSphere and the VMware ecosystem makes the ideal storage platform for
the VMware Horizon virtual desktop infrastructure (VDI).
vSAN provides scalable storage in a VMware Horizon environment. You can scale up by adding
disk drives in each host or scale out by adding hosts to the cluster. vSAN supports both all-flash
and hybrid storage configurations for the VMware Horizon 7 VDI.
[Figure: Horizon 7 running on vSphere and vSAN]
25
2-19 Integrating vSAN with VMware Products
(2)
vSAN 7 supports using native file services as persistent volumes for Tanzu clusters.
When used with vSphere with Tanzu, persistent volumes can support the use of encryption and
snapshots. vSAN also enables vSphere Add-on for Kubernetes so that stateful containerized
workloads can be deployed on vSAN datastores.
[Figure: Kubernetes persistent volumes backed by a vSAN storage class through the CNS Control Plane]
26
2-20 vSAN Use Cases
Some of the most common vSAN use cases include:
• VDI: Use vSAN as a VDI storage solution for VMs and user data.
• Remote and branch offices: Use vSAN as a storage solution to increase local IT
performance, start with a small physical footprint, and control costs with flexible licensing
models.
• Disaster recovery: Use vSAN as a disaster recovery solution to lower disaster recovery
costs, manage a disaster recovery from a unified UI, and orchestrate and automate
recovery.
27
2-21 vSAN Licensing
When planning your vSAN cluster, you must configure a license. vSAN licenses have per-CPU
capacity.
When you assign a vSAN license to a cluster, the amount of license capacity used is equal to the
total number of CPUs on the hosts that participate in the cluster.
The vSAN cluster must be assigned a license key before its evaluation period expires or before
its currently assigned license expires. If you upgrade, combine, or divide vSAN licenses, you
must re-assign the licenses to vSAN clusters.
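The per-CPU licensing model above can be sketched as a short calculation. This is an illustration only (socket counts are hypothetical, and CPU-core-based licensing terms are not modeled):

```python
def license_capacity_needed(cpus_per_host: list) -> int:
    """vSAN licenses are per CPU: capacity used equals the total CPU count
    across all hosts participating in the cluster."""
    return sum(cpus_per_host)

# Hypothetical four-node cluster with two CPUs (sockets) per host.
print(license_capacity_needed([2, 2, 2, 2]))  # 8
```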
vSAN license editions include Standard, Advanced, Enterprise, and Enterprise Plus packaging.
For more information about vSAN licensing, see the vSAN licensing guide at
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-vsan-licensing-guide.pdf.
28
2-22 vSAN Licensing Differences (1)
All vSAN licenses include the following features:
• Rack Awareness
• All-Flash Hardware
• QoS (IOPS Limit)
• Shared Witness
29
2-23 vSAN Licensing Differences (2)
The remaining licenses enable specific functionality.
File Services
HCI Mesh
30
2-24 Review of Learner Objectives
• Describe the vSAN architecture
31
2-25 Lesson 2: vSAN Objects and
Components
32
2-27 vSAN and Object-Based Storage
vSAN is an object-based datastore.
[Figure: A VM's objects stored across disk groups on four ESXi hosts connected by the vSAN Network]
VMs include the following objects:
• VM disks (VMDK)
• VM swap object
• VM snapshots
• VM memory (vmem) object
33
vSAN stores and manages data as flexible data containers called objects. Each object on the
datastore includes data, part of the metadata, and a unique ID. One of the most common objects
in the datastore is the virtual machine disk (VMDK) object, which contains VM data. Using a
unique ID, the object can be globally addressed by more than the filename and path. The use of
objects enables a detailed level of configuration at the object level, for example, RAID type or
drive usage at a level higher than the physical drive blocks.
In a block-level file system, blocks are arranged in a RAID set or a logical drive first. You create
the file system on top of the RAID set. The file system includes the metadata or file allocation
table that defines filenames, paths, and data location. In this environment, the system places file
blocks on the drive according to the file system structure and bases the data protection on the
logical drive or RAID set.
Consider the following use case. The VMBeans company evaluates what sort of storage to use
for the content generated by their media development team. The company's current storage is
terabytes in size and needs the ability to grow. Their professional services vendor recommends
using object-based storage to store the media files. The vendor explains the benefits of object-
based storage for datastores that reach into the petabyte range and upwards. VMBeans agrees
that object-based storage seems ideal for this use.
34
2-28 About vSAN Storage Policies
vSAN storage policies define storage requirements for VMs:
• They can be constructed from capabilities advertised to vCenter Server through vSphere
API for Storage Awareness.
35
2-29 Default vSAN Storage Policy
vSAN has a default VM storage policy:
• It cannot be deleted.
• It can be modified.
[Screenshot: The VM Storage Policies view showing the vSAN Default Storage Policy]
Storage policies define VM storage requirements, such as performance and availability, in the
form of a policy. vSAN requires that VMs deployed to a vSAN datastore are assigned at least
one VM storage policy. If a storage policy is not explicitly assigned to a VM, the default storage
policy of the datastore is applied to the VM. If a custom policy has not been applied to the vSAN
datastore, the vSAN default storage policy is used.
36
2-30 About Objects and Components
Objects are made up of components.
If objects are replicated, multiple copies of the data are located in replica components.
In this example, vSAN creates two replicas of the object data. Each of these replicas is a
component of the object.
[Figure: An object's two replica components placed on separate hosts across the vSAN Network]
Each object is composed of a set of components, determined by storage policies. In this
example, with the fault tolerance set to 1, vSAN places protection components, such as replicas,
on separate hosts in the vSAN cluster, where each replica is an object component.
For example, a storage policy that tolerates a failure creates a copy of the VMDK data in
another location of the vSAN datastore.
The VMDK is the object and each copy is a component of that object.
37
2-31 Component Replicas and Copies
The number of component replicas and copies that are created is based on the VM storage
policy.
[Screenshot: A vSAN storage policy rule set (Failures to tolerate: 1 failure — RAID 1 (Mirroring); Number of disk stripes per object: 1; IOPS limit for object: 0; Object space reservation: Thin provisioning; Flash read cache reservation: 0; Disable object checksum: No; Force provisioning: No) and the resulting placement — RAID 1 components on sa-esx-01.vclass.local and sa-esx-02.vclass.local, with a witness on sa-esx-03.vclass.local]
The replica object's purpose is to allow the VM to continue to run if a failure occurs in the vSAN
physical infrastructure.
The number of replica objects created is determined by the setting specified in the vSAN
storage policy used to set the resiliency of the storage objects for vSAN.
38
2-32 Object Accessibility
Object and component accessibility can be viewed in the following ways:
vSAN objects are available when more than 50 percent of the components comprising an object
are accessible. Quorum, an asymmetrical voting system to decide the availability of objects,
determines accessibility.
Each component can have one or more votes. In the case of a tie, a witness component is
provisioned to achieve quorum. Ties are primarily caused by a network partition or some other
split-brain scenario.
• When the DOM owner is not properly established and an I/O error is returned
39
2-33 About Witnesses
When needed, vSAN automatically creates witness components.
• Quorum is achieved when more than 50 percent of the votes are available.
[Figure: A VM with storage policy FTT = 1, RAID 1 — two replica components of the vmdk object on separate disk groups, a witness on a third host, and one host isolated from the vSAN Network]
Like many clustering solutions, having an even number of replicas creates a partition risk, also
called split-brain scenario. A split-brain scenario occurs when hosts containing an even number of
replicas for a specific VM cannot communicate with other hosts over the network. To resolve
this issue, vSAN typically creates another component called a witness.
The witness component is small (only a few megabytes) and contains only metadata, no
application data. The purpose of the witness is to serve as a tiebreaker when a partition occurs.
40
vSAN supports a quorum-based system in which each component might have more than one
vote to decide the availability of VMs. A minimum of 50 percent of the votes that make up a VM
storage object must be accessible at all times. When fewer than 50 percent of the votes are
accessible to all hosts, the object is not available to the vSAN datastore.
The default storage policy states that any object can sustain at least one component failure. This
illustration represents RAID 1 with two replicas on two separate capacity disks in two hosts. A
witness can be created and placed on a third host. If the system loses a component or access to
a component, the system uses the witness. The component that can communicate with the
witness is declared to have integrity and is used for all data read/write operations until the
broken component is repaired.
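The quorum rule above — an object stays available only while more than 50 percent of its votes are reachable — can be sketched as follows. This is a simplified illustration of the voting arithmetic only; the function name is hypothetical:

```python
def object_accessible(votes_available: list, total_votes: int) -> bool:
    """An object is accessible while more than 50% of its votes are reachable."""
    return sum(votes_available) * 2 > total_votes

# RAID 1 with two replicas (1 vote each) plus a witness (1 vote): 3 votes total.
print(object_accessible([1, 1], 3))  # True: a replica and the witness remain
print(object_accessible([1], 3))     # False: only one vote of three remains
```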
41
2-34 Example: Witness
vSAN controls witness configurations in the background, transparent to the user.
• Primary
• Secondary
• Tiebreaker
This example shows a three-way mirror across five nodes.
[Figure: A three-way mirror across five nodes — each vmdk component and witness holds one vote (Votes = 1)]
A primary witness is the first witness that is deployed for any object.
Secondary witnesses are created to ensure that every cluster node has equal voting power
toward a quorum.
If an even number of total components exists after adding primary and secondary witnesses, a
tiebreaker witness is added to make the total component count an odd number.
In the example, each component is given one vote. Two witnesses are used to guarantee an
adequate quorum if a component failure occurs. The witness count is dependent on how the
system places components and data on the nodes in the cluster.
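The tiebreaker rule above can be sketched as a small check. This illustrates only the stated parity logic (an even total component count after primary and secondary witnesses triggers one tiebreaker witness); the function name is hypothetical:

```python
def tiebreaker_needed(data_components: int, witnesses: int) -> bool:
    """If the total component count is even after primary and secondary
    witnesses are placed, a tiebreaker witness is added to make it odd."""
    return (data_components + witnesses) % 2 == 0

print(tiebreaker_needed(data_components=3, witnesses=2))  # False: 5 is already odd
print(tiebreaker_needed(data_components=3, witnesses=1))  # True: 4 is even
```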
For more detailed examples of the witness architecture and logic, see VMware Virtual SAN:
Witness Component Deployment Logic at
https://blogs.vmware.com/vsphere/2014/04/vmware-virtual-san-witness-component-deployment-logic.html.
42
2-35 Large vSAN Objects
When a VM disk exceeds 255 GB, the object is automatically split into multiple components.
vSAN 7.0 uses concatenation in this case. If the VM has a disk of size 300 GB, the first
component would be 255 GB, and the second component would be 45 GB.
When planning the vSAN datastore, consider the size of VMDKs and other objects planned for
the datastore.
[Screenshot: A large virtual disk object split into multiple components under RAID 1]
When planning vSAN disk group architecture, you must plan for enough individual devices so
that vSAN can split objects exceeding limits, in addition to any striping planned through policies.
vSAN divides any object larger than 255 GB on a vSAN datastore, regardless of whether stripes
are defined in the storage policy applied to an object.
Any object, such as a VMDK, can be up to 62 TB. The 62 TB limitation is the same for VMFS and
NFS so that VMs can be cloned and migrated using vSphere vMotion between vSAN and other
datastores.
43
2-36 Review of Learner Objectives
• Define vSAN objects
44
2-37 Lesson 3: vSAN Software Underlying
Architecture
45
2-39 vSAN Architectural Components
The main vSAN architecture components are:
46
2-40 vSAN Architecture Analogy: Building a
House
47
2-41 CLOM and Its Role: Architect
The CLOM process runs on every vSAN node:
• It validates that objects can be created based on policies and available resources.
You manage the CLOM process with the /etc/init.d/clomd {status|stop|start|restart} command.
48
2-42 DOM and Its Role: Contractor (1)
The DOM runs on each host in a vSAN cluster. The DOM process includes:
• Receiving instructions from the CLOM and the DOMs running on other hosts in the vSAN
cluster
DOM services on the hosts of a vSAN cluster communicate to coordinate the creation
of components.
49
2-43 DOM and Its Role: Contractor (2)
Each object has a DOM owner, a DOM client, and a DOM component manager.
DOM client:
DOM owner:
• Replicates I/O based on the object's RAID layout and determines in which components the
data block resides
• Forwards the I/O to the DOM component manager where the components reside
50
2-44 LSOM and Its Role: Worker
The LSOM performs the following functions:
• Performs solid-state drive log recovery when the vSAN node boots
51
2-45 CMMDS and Its Role: Project Manager
The CMMDS performs the following functions:
• Provides topology and object configuration information to the CLOM and the DOM as
requested
The backup host accelerates the process of convergence if the master host fails.
Communication between the master host and other hosts occurs every second.
52
2-46 RDT and Its Role: Delivery Truck
RDT is a network protocol for the transmission of vSAN traffic.
53
2-47 Activity: vSAN Component Layer
Which vSAN layer performs I/O retries on failing devices?
54
2-48 Activity: vSAN Component Layer
Solution
Which vSAN layer performs I/O retries on failing devices?
The LSOM performs I/O retries on failing devices.
55
2-49 Component Interaction: Architect and
Contractor
When it receives a request to create an object, the CLOM determines whether the object can
be created with the selected VM storage policy.
If the object can be created, the CLOM instructs the DOM to create the components.
56
2-50 Component Interaction: Contractor and
Worker
During component creation, the DOM, CLOM, and LSOM have the following interactions:
• The DOM instructs the LSOM to create the local components. The LSOM interacts at the
drive layer and provides persistent storage. The DOM interacts with the LSOM across the
local hosts.
• If components are required on other nodes, the DOM interacts with the DOM instance on
the remote node.
57
2-51 Component Interaction: Architect,
Contractor, and Project Manager
The DOM and the CLOM consult the CMMDS:
58
2-52 Activity: Drive Status Reporting
One of the capacity drives failed in a disk group. Which process flags the drive status?
1. CLOM
2. DOM
3. LSOM
4. RDT
59
2-53 Activity: Drive Status Reporting Solution
One of the capacity drives failed in a disk group. Which process flags the drive status?
1. CLOM
2. DOM
3. LSOM (correct)
4. RDT
60
2-54 Activity: Physical Space
Which vSAN architectural component displays the amount of physical space that is being used
on the disk?
1. CLOM
2. DOM
3. LSOM
4. CMMDS
61
2-55 Activity: Physical Space Solution
Which vSAN architectural component displays the amount of physical space that is being used
on the disk?
1. CLOM
2. DOM
3. LSOM (correct)
4. CMMDS
62
2-56 Step-by-Step VM Creation (1)
The process of creating a VM shows how vSAN software components interact.
1. A new VM is defined.
2. vCenter service daemon (vpxd) receives the request.
[Diagram: Create New VM > Select Host > VM Created]
63
2-57 Step-by-Step VM Creation (2)
6. Hostd starts the VM and VMDK file creation.
7. CLOM checks the VM storage policy to determine how many components should be
created.
8. CLOM checks resources available to satisfy the request.
9. CLOM decides initial placement of the components.
[Diagram: Create New VM > Select Host > VPXA > LSOM > VM Created]
64
2-58 Step-by-Step VM Creation (3)
65
2-59 Beginning-to-End VM Creation
[Diagram: Create New VM > Select Host > Hostd > VM Created]
66
2-60 Activity: VM Creation
A user is trying to create a VM, but the creation fails shortly after the user clicks Finish.
67
2-61 Activity: VM Creation Solution
A user is trying to create a VM, but the creation fails shortly after the user clicks Finish.
68
2-62 Lab 1: Reviewing the Lab Environment
Review information to become familiar with the lab environment:
69
2-63 Review of Learner Objectives
• Describe the CLOM, DOM, LSOM, CMMDS, and RDT vSAN software components
• vSAN virtualizes local physical storage resources of the ESXi host, turning them into object-
based storage.
• Only one vSAN datastore is created, regardless of the number of storage devices and
hosts in the cluster.
• Disk groups contain a maximum of one flash cache device and seven capacity devices, and
a host can include a maximum of five disk groups.
• With vSAN, you can configure a disk group with either all-flash or hybrid configurations.
• vSAN storage policies define VM storage requirements for performance and availability.
Questions?
70
Module 3
Planning a vSAN Cluster
3-2 Importance
You must understand how to plan for server hardware, storage capacity, and network
configuration requirements for a successful vSAN cluster deployment.
71
3-4 Lesson 1: vSAN Requirements
72
3-6 vSAN Cluster Requirements
When planning a vSAN cluster deployment, you must verify that all elements of the cluster meet
the minimum requirements for vSAN.
All devices, drivers, and firmware versions in your vSAN configuration must be certified and
listed in the vSAN section of the VMware Compatibility Guide.
A standard vSAN cluster must contain a minimum of three hosts that contribute to the capacity
of the cluster.
73
3-7 vSAN Configuration Minimums and
Maximums
Familiarity with vSAN minimum and maximum configurations is useful during the initial planning
phase of your deployment.
• ESXi hosts per cluster: minimum 3, maximum 64
74
3-8 vSAN Host CPU Requirements
When determining CPU requirements for hosts in the vSAN cluster, consider the following
information:
Additional CPUs must be considered for vSAN operational overhead if vSAN deduplication,
compression, and encryption capabilities are enabled.
The vSAN ReadyNode Sizer tool is useful for determining CPU requirements for vSAN hosts.
75
3-9 vSAN Host Memory Requirements
When determining memory requirements for hosts in the vSAN cluster, consider the following
information:
• Memory per VM
The memory requirements for vSAN hosts depend on the number of disk groups and devices
that the ESXi hypervisor must manage.
Consider at least 32 GB of memory for a fully operational vSAN node with five disk groups and
seven capacity devices per disk group.
For more information about calculating vSAN memory consumption, see VMware knowledge
base article 2113954 at https:/ /kb.vmware.com/s/article/2113954.
76
3-10 vSAN Host Network Requirements
When configuring your network for vSAN hosts, consider the following recommendations:
If a network adapter is shared with other traffic types, use VLANs to isolate traffic types.
Consider configuring Network I/O Control on a vSphere distributed switch to ensure that
sufficient bandwidth is guaranteed to vSAN.
77
3-11 vSAN Host Storage Controllers
Storage controller recommendations:
• Use controllers that support pass-through mode to present disks directly to a host.
• Use multiple storage controllers to improve performance and to isolate a potential controller
failure to only a subset of disk groups.
• Consider the storage controller pass-through mode support for easy hot-plugging or the
replacement of magnetic disks and flash capacity devices on a host.
Configure controllers that do not support pass-through to present each drive as a RAID 0 LUN
with caching disabled or set to 100% Read. If a controller works in RAID 0 mode, you must
perform additional steps before the host can discover the new drive.
The controller must be configured identically for all disks connected to the controller including
those not used by vSAN. Do not mix the controller mode for vSAN disks and disks not used by
vSAN to avoid handling the disks inconsistently, which can negatively affect vSAN operation.
In RAID 0 mode, each drive must be used to create one RAID 0 volume that only contains one
drive. If you have 12 drives, you create 12 RAID 0 volumes each with one of the drives in it. RAID
0 mode introduces additional complexity for a disk replacement.
78
3-12 vSAN Host Boot Device Requirements
You can boot vSAN hosts from a local disk, a USB device, SD cards, and SATADOM devices.
If you choose to boot vSAN hosts from a local disk, using separate storage controllers for boot
disks and vSAN disks is the best practice.
When booting vSAN hosts from a USB device, an SD card, or SATADOM devices, log
information and stack traces are lost on host reboot. They are lost because the scratch partition
is on a RAM drive. Therefore, a best practice is to use persistent storage for logs, stack traces,
and memory dumps.
Consider configuring the vSphere ESXi Dump Collector and vSphere Syslog Collector.
During installation, the ESXi installer creates a core dump partition on the boot device. The
default size of the core dump partition satisfies most installation requirements.
If the ESXi host has 512 GB of memory or less, you can boot the host from a USB, SD, or
SATADOM device. When booting a vSAN host from a USB device or SD card, the size of the
boot device must be at least 4 GB.
If the ESXi host has more than 512 GB of memory, you can boot the host from a SATADOM or
disk device with a minimum size of 16 GB. When you use a SATADOM device, use a single-level
cell (SLC) device.
79
3-13 About Hard Disk Drives
Hard disk drives (HDDs), also called magnetic or mechanical drives, include single or multiple
platters rotating at a specific speed to provide data access. The HDD provides larger storage
capacity at a lower cost.
The HDD has an arm with several heads or transducers that read and write data on the disk. The
arm moves the heads across the surface of the disk to access different data.
• 7,200 RPM
• 10,000 RPM
• 15,000 RPM
In vSAN, HDDs are used in the capacity tier for hybrid configurations.
80
3-14 Solid-State Devices
In vSAN, SSDs are used in cache tiers to improve performance. They can also be used in both
the cache tier and the capacity tier, which is called a vSAN all-flash configuration.
SSDs have a limited number of write cycles before the cell fails, which is called its write
endurance rating. Every time the drive writes or erases, the flash memory cell's oxide layer
deteriorates. The type of cell affects the number of write cycles before failure.
Solid-state devices (SSDs) include a collection of memory semiconductors, and all data is stored
in integrated circuit cells. SSDs are more expensive than HDDs per amount of storage space
available. SSDs offer up to ten times faster read and write speeds than a midrange HDD.
81
3-15 vSAN Limitations
When planning to deploy vSAN, you should stay within the limits of what is supported by vSAN.
• SEsparse disks, which are a default format for all delta disks on VMFS6 datastores
82
3-16 Review of Learner Objectives
• Identify requirements and planning considerations for vSAN clusters
83
3-17 Lesson 2: Planning Capacity for vSAN
Clusters
84
3-19 Capacity-Sizing Guidelines
When planning for the storage capacity of the vSAN datastore, you must consider the following
factors:
• Anticipated growth
• Failures to tolerate
When planning to use advanced vSAN features such as software checksum or deduplication and
compression, reserve additional storage capacity to manage the operational overhead.
Plan for additional storage capacity to handle any potential failure or replacement of capacity
devices, disk groups, and hosts. Reserve additional storage capacity for vSAN to recover
components after a host failure or when a host enters maintenance mode.
Keep at least 30% of storage consumption unused to prevent vSAN from rebalancing the
storage load. vSAN rebalances the components across the cluster whenever the consumption
on a single capacity device reaches 80% or more. The rebalance operation might affect the
performance of applications.
Plan extra capacity to handle any potential failure or replacement of capacity devices, disk
groups, and hosts. When a capacity device is not reachable, vSAN recovers the components
from another device in the cluster. When a flash cache device fails or is removed, vSAN
recovers the components from the entire disk group.
Provide enough temporary storage space for changes in the vSAN VM storage policy. When
you dynamically change a VM storage policy, vSAN might create a new RAID tree layout of the
object. When vSAN instantiates and synchronizes a new layout, the object may consume extra
space temporarily. Keep some temporary storage space in the cluster to handle such changes.
Enabling deduplication and compression with software checksum features requires additional
storage space overhead, approximately 6.2 percent capacity per device.
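As a rough illustration of these guidelines, the following sketch combines the 30% free-space recommendation with the approximate 6.2% deduplication, compression, and checksum overhead. It is a simplification for planning intuition only, with an invented helper name, and is not the logic of the official vSAN sizing tools:

```python
# Simplified capacity-planning sketch (assumption: overhead and free-space
# percentages apply to the whole raw pool; real sizing uses the vSAN Sizer).
def usable_capacity_gb(raw_gb, dedup_enabled=True):
    overhead = 0.062 if dedup_enabled else 0.0  # dedup/compression + checksum metadata
    free_fraction = 0.30                         # keep >= 30% unused to avoid rebalancing
    return raw_gb * (1 - overhead) * (1 - free_fraction)

print(round(usable_capacity_gb(10000)))  # 10 TB raw -> about 6566 GB usable
```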
85
3-20 vSAN Reserved Capacity
vSAN requires free space set aside for operations such as host maintenance mode data
evacuation, component rebuilds, and rebalancing.
This free space also accounts for the capacity needed for host outages. Activities such as
rebuilds and rebalancing can temporarily consume additional raw capacity.
The free space required for these operations is called vSAN reserved capacity and it comprises
the following elements:
• Operations reserve
• Host rebuild reserve
[Figure: vSAN reserved capacity — operations reserve (% fixed) and host rebuild reserve (% varies)]
• Operations reserve: Reserves storage space for internal vSAN operations, such as object
rebuild or repair.
• Host rebuild reserve: Reserves storage space to ensure that all objects can be rebuilt if host
failure occurs in the cluster.
In all vSAN versions earlier than vSAN 7 U1, the free space required for these transient
operations was called slack space. The limitations of vSAN in versions earlier than vSAN 7 U1 led
to a generalized recommendation of free space as a percentage of the cluster (25-30%),
regardless of cluster size.
86
When sizing a new vSAN cluster, use the vSAN Sizer tool, which has the vSAN reserved
capacity calculation logic built in. As a best practice, do not use manually created spreadsheets
or calculators because they do not accurately calculate free capacity requirements for vSAN
environments using version 7 U1 or later.
87
3-21 Planning for Failures to Tolerate
When planning the storage capacity of the vSAN datastore, you must consider the failures to
tolerate (FTT) levels and the failure tolerance method (FTM) attributes of the VM storage
policies for the cluster.
For example, a VM configured with FTT set to 1 and FTM set to RAID 1 Mirroring can consume
twice the storage space on the vSAN datastore to support the configured availability.
Similarly, a VM configured with FTT set to 1 and FTM set to RAID 5/6 Erasure Coding can
consume 33% additional storage space on the vSAN datastore to support the configured
availability.
A Failures To Tolerate policy with RAID 1 can significantly affect the space consumption of a
vSAN datastore.
Consider another example in which you expect to have 100 VMs in your vSAN environment. On
average, each VM has 2 objects, each object has an average size of 40 GB. The total number of
VMDK objects is 200.
If you uniformly apply a Failures To Tolerate policy value of 1 to all objects, each object will have
a provisioned space of 40 GB x 2 = 80 GB. Therefore, the total provisioned space of 100 VMs is
expected to be 200 x 80 GB = 16 TB.
If you increase the Failures To Tolerate value to 3, the average space consumed by an object
will be 40 GB x 4 = 160 GB. Therefore, the total provisioned space of all 200 objects will be 200
x 160 GB = 32 TB.
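The arithmetic above can be captured in a small helper (illustrative only; with RAID 1 mirroring, provisioned space is the object size multiplied by FTT + 1):

```python
# RAID 1 mirroring space math from the example above: copies = FTT + 1.
def raid1_provisioned_gb(object_gb, ftt):
    return object_gb * (ftt + 1)

objects = 200        # 100 VMs x 2 objects each
object_size_gb = 40

total_ftt1 = objects * raid1_provisioned_gb(object_size_gb, 1)
total_ftt3 = objects * raid1_provisioned_gb(object_size_gb, 3)
print(total_ftt1 // 1000, "TB")  # 16 TB with FTT=1
print(total_ftt3 // 1000, "TB")  # 32 TB with FTT=3
```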
88
3-22 Planning Capacity for VMs
When planning the storage capacity of the vSAN datastore, consider the space required for the
following VM objects:
• VM VMDK object
• VM snapshot object
• VM swap object
The VM snapshot object inherits the storage policy settings from the VM's base VMDK file. You
must plan extra space according to the expected size and number of snapshots required.
The VM swap object inherits the storage policy settings from the VM home namespace object.
You must also consider enabling thin provisioning for the VM swap object if your environment is
not overcommitted for memory.
Because VM VMDK files are thin-provisioned by default, prepare for future capacity growth.
The VM VMDK object holds the user data. Its size depends on the size of the virtual disk, defined
by the user. However, the actual space required by a VMDK object for storage depends on the
applied VM storage policy.
For example, if the size of the VMDK is 40 GB and the FTT is set to 3, the actual space
consumption can be up to four times the VMDK size (40 GB x 4).
The size of a VM swap object depends on the memory configured on a VM. Because vSAN
applies the Failures To Tolerate policy of 1 to a VM swap object, the actual space consumption
can be twice as much as the configured VM memory.
For example, if the memory configured on a VM is 8 GB, the actual storage space consumption
will be 16 GB for the VM swap object.
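Combining these rules, a rough per-VM estimate might look like the following sketch (a hypothetical helper, assuming RAID 1 mirroring for the VMDK and the default FTT=1 mirroring for the swap object):

```python
# Per-VM provisioned-space sketch based on the sizing rules above.
def vm_provisioned_gb(vmdk_gb, vm_memory_gb, vmdk_ftt):
    vmdk_space = vmdk_gb * (vmdk_ftt + 1)  # RAID 1 copies of the VMDK
    swap_space = vm_memory_gb * 2          # swap object mirrored with FTT=1
    return vmdk_space + swap_space

# 40 GB VMDK at FTT=3 (160 GB) plus an 8 GB memory VM's swap (16 GB).
print(vm_provisioned_gb(40, 8, 3))  # 176
```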
89
3-23 Plan and Design Consideration: VM Home
Namespace Objects
A home namespace object does not contain user data. It is a container object that contains
various files (such as VMX and log files) which, compared to other objects, occupies much less
space.
• Failures To Tolerate
• Force Provisioning
90
3-24 Plan and Design Consideration: VMDK
and Snapshot Objects
VMDK and snapshot objects hold user data, and their size depends on the size of the VMDK file
defined by the user. The space required by a VMDK object depends on the user-applied policy.
Because VMDKs are thin-provisioned by default, you must prepare for future growth in capacity.
The VM snapshot delta object inherits the storage policy settings of the VM's base VMDK file, so
plan extra space according to the expected size and number of snapshots required.
The space required by a VMDK object depends on the user-applied policy. For example, if the
size of the VMDK is 40 GB and the Failures To Tolerate is set to 3, the actual space
consumption can be up to 4 times the VMDK file size (40 GB x 4).
In the example, if the object space reservation is also set to 100%, the entire 160 GB will be
reserved when applying the policy.
91
3-25 Plan and Design Consideration: VM Swap
Object
The size of a VM swap object depends on the memory that is configured on a VM.
The VM swap object inherits the storage policy settings from the VM home namespace object
and is thin-provisioned by default.
92
3-26 vSAN Cache Tiers
A vSAN cache tier must meet the following requirements:
• For hybrid clusters, a flash caching device must provide at least 10% of the anticipated
capacity tier storage space.
[Figure: disk group — cache tier (SSD) in front of capacity tier SSDs]
For best performance, consider a PCIe flash device, which is faster than an SSD.
In vSAN all-flash configurations, the cache tier is not used for reading operations. You can use a
small-capacity, high write endurance flash device for the cache tier.
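For the hybrid case, the 10% guideline above reduces to a one-line calculation (illustrative helper only; the function name is invented):

```python
# Hybrid cache-tier sizing sketch: flash cache >= 10% of anticipated
# capacity tier storage, per the guideline above.
def min_hybrid_cache_gb(anticipated_capacity_gb):
    return anticipated_capacity_gb / 10

print(min_hybrid_cache_gb(4000))  # 4 TB capacity tier -> 400.0 GB flash cache
```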
93
3-27 vSAN Capacity Tiers
A vSAN capacity tier must meet the following requirements:
Environments that are planned and deployed according to requirements and best practices have
a better chance of avoiding failures and workflow interruptions.
[Figure: disk group — cache tier SSD in front of capacity tier (SSD or HDD) devices]
94
3-28 Magnetic Devices for Capacity Tiers
When planning the size and number of magnetic disks for capacity in hybrid configurations,
follow the requirements for storage space and performance.
Use magnetic devices according to requirements for performance, capacity, and cost of the
vSAN storage. SAS and NL-SAS magnetic devices have faster performance.
Plan the configuration of magnetic capacity devices according to the following guidelines:
• For better vSAN performance, use many magnetic disks that have smaller capacity.
• For balanced performance and predictable behavior, use the same type and model of
magnetic disks in a vSAN datastore.
Plan for enough magnetic disks to provide adequate aggregated performance; using more small
devices provides better performance than using fewer large devices. Using multiple magnetic
disk spindles can speed up read and write operations.
95
3-29 Flash Devices for Capacity Tiers
Plan the configuration of flash capacity devices for vSAN all-flash clusters to provide high
performance and the required storage space, and to accommodate future growth.
Choose SSD flash devices according to requirements for performance, capacity, write
endurance, and cost of vSAN storage:
• For capacity: Use flash devices that are less expensive and have lower write endurance.
• For balanced performance and predictable behavior: Use the same type and model of flash
capacity devices.
96
3-30 Multiple vSAN Disk Groups
An entire disk group can fail if a flash cache device or a storage controller stops responding.
vSAN rebuilds all components for a failed disk group from another location in the cluster.
Using multiple disk groups, with each providing less capacity, has benefits and disadvantages.
Benefits:
• Improved performance:
• The datastore has more aggregated cache, and I/O operations are faster.
Disadvantages:
• Costs are increased because two or more caching devices are required.
• Multiple storage controllers are required to reduce the risk of a single point of failure.
97
3-31 About vSAN Cluster Scaling
vSAN scales up and scales out if you need more compute or storage resources in the cluster.
Scaling out adds nodes to the cluster for compute and storage capacity.
[Figure: scaling up adds devices and disk groups to existing nodes; scaling out adds nodes to the cluster]
98
3-32 Planning for Scaling Up
Scaling up a vSAN cluster refers to increasing the storage capacity by adding disks to the
existing vSAN node.
Always increase capacity uniformly on all cluster nodes to avoid uneven data distribution, which
can lead to uneven resource utilization.
[Figure: capacity devices added uniformly to disk groups on all cluster nodes]
99
3-33 Planning for Scaling Out
Scaling out a cluster adds storage and compute resources to the cluster.
You can also add compute-only hosts to the cluster, which add only CPU and memory
resources, not storage resources. If you have diskless servers or unused servers in inventory,
you can add them to a vSAN cluster.
100
3-34 Using the VMware Compatibility Guide
Using the VMware Compatibility Guide, you can verify that ESXi hosts in your organization meet
vSAN hardware requirements.
For vSAN, the guide provides access to a sizing tool and a configuration guide.
The vSAN ReadyNode Sizer tool is not limited to storage. It also factors in CPU and memory
sizing. This tool incorporates sizing overheads for swap, deduplication, and compression
metadata, as well as disk formatting and CPU for vSAN.
You can use the step-by-step guide to select the version, platform, model, and vendor for your
vSAN ReadyNode.
101
3-35 Review of Learner Objectives
• Discuss how to plan storage consumption by considering data growth and failure tolerance
102
3-36 Lesson 3: Designing a vSAN Network
103
3-38 vSAN Networking Overview
A vSAN cluster requires a network channel for communication between hosts.
Hosts use a VMkernel adapter to access the vSAN network. You can create a vSAN network
with standard or distributed switches.
• If using a vSphere distributed switch, a single VMkernel port group attaches to the hosts
that are enabled with vSAN.
• If using standard switches, each host has its own standard switch configuration for the
vSAN network.
[Figure: a vSphere vSAN cluster — three ESXi hosts, each with SSD devices, forming one vSAN datastore over the vSAN network]
104
3-39 Designing a vSAN Network
When planning the network for the vSAN cluster, consider the following networking features
that vSAN supports to provide availability, security, and guaranteed bandwidth:
• Unicast support
• Network I/O Control
• Jumbo frames
105
3-40 NIC Teaming and Failover
vSAN uses the NIC teaming and failover policy configured on the virtual switch for network
redundancy only. vSAN does not use the second NIC for load-balancing purposes.
Consider configuring Link Aggregation Control Protocol (LACP) or EtherChannel for improved
redundancy and bandwidth use.
[Screenshot: Teaming and failover policy — Load balancing: Route based on originating virtual port; Network failure detection: Link status only; Notify switches: Yes; Active uplinks: Uplink 1; Standby uplinks: Uplink 2]
106
3-41 Unicast Support
Unicast is the supported protocol for a vSAN network. Multicast is no longer required on the
physical switches that support vSAN clusters.
• To simplify network requirements for vSAN cluster communications for Cluster Membership,
Monitoring, and Directory Services (CMMDS) and VM I/O traffic
If hosts in your vSAN cluster are running earlier versions of ESXi, a multicast network is still
required.
107
3-42 Network I/O Control
Network I/O Control is available on vSphere distributed switches and provides the following
bandwidth controls:
• Controls the proportion of bandwidth allocated to each traffic type during congestion
If you plan to use a shared 10 Gb Ethernet network adapter, place the vSAN traffic on a
distributed switch and configure Network I/O Control to guarantee sufficient bandwidth for
vSAN traffic.
[Figure: traffic types, including Fault Tolerance, sharing physical uplinks under Network I/O Control]
108
3-43 Priority Tagging and Isolating vSAN
Traffic
Priority tagging is a mechanism to indicate to the connected network devices that vSAN traffic
has high quality-of-service (QoS) demands.
You can assign vSAN traffic to a certain class and mark the traffic accordingly with a class-of-
service (CoS) value from 0 (low priority) to 7 (high priority).
Use the vSphere Distributed Switch traffic filtering and marking policy to configure priority levels.
For example, a tag of 7 for vSAN traffic indicates that vSAN traffic has high QoS demands.
Consider isolating vSAN traffic by segmenting it in a VLAN for enhanced security and
performance, especially if the backing physical adapter capacity is shared among several other
traffic types.
109
3-44 Jumbo Frames
Jumbo frames can transmit and receive up to six times more data per frame than the default of
1,500 bytes. This feature reduces the load on host CPUs when transmitting and receiving
network traffic.
You should verify that jumbo frames are enabled on all network devices and hosts in a cluster.
By default, the TCP Segmentation Offload (TSO) and Large Receive Offload (LRO) features are
enabled on ESXi. These features offload TCP/IP packet processing work onto the NICs. If not
offloaded, the host CPU must perform this work.
110
3-45 vSAN Network Requirements
The network infrastructure and configuration on the ESXi hosts must meet the minimum
networking requirements for vSAN.
• Connection between hosts: Each host in the vSAN cluster must have a VMkernel adapter
for vSAN traffic exchange.
• Network latency: The maximum latency is 1 ms RTT for standard (nonstretched) vSAN
clusters between all hosts in the cluster.
• IPv4 and IPv6 support: The vSAN network supports both IPv4 and IPv6.
• vSAN all-flash cluster: A vSAN all-flash cluster requires 10 Gb or faster with latency
<1 ms RTT; 1 Gb is not supported.
111
3-46 vSAN Communication Ports
The ports listed are used for vSAN communication.
112
3-47 vSAN Network Best Practices
When determining network requirements, the following practices can help improve performance,
throughput, and availability:
• vSAN traffic is independent of host management traffic and is better when isolated.
• If you plan to use a shared 10 Gb Ethernet network adapter, place vSAN traffic on a
distributed switch and configure Network I/O Control to guarantee bandwidth to vSAN
traffic.
• Use jumbo frames for vSAN networks in data centers where jumbo frames are already
enabled in the network infrastructure.
113
3-48 Review of Learner Objectives
• Identify vSAN networking features and requirements
• Using separate storage controllers for host boot disks and vSAN disks is a best practice.
• The vSAN ReadyNode Sizer tool is not limited to storage. It also factors in CPU and
memory sizing.
• When planning network configuration for a vSAN cluster, consider availability, security, and
bandwidth requirements.
Questions?
114
Module 4
Deploying a vSAN Cluster
4-2 Importance
Because vSAN can run extremely I/O-intensive workloads, your vSAN node hardware must
meet VMware compatibility requirements. Performance characteristics and hardware stability
cannot be guaranteed if you do not thoroughly test devices, firmware, and driver combinations.
Understanding how to configure vSAN cluster settings that meet the requirements of your
environment is important. Failure to properly configure these settings can affect performance
and availability.
115
4-4 Lesson 1: Preparing ESXi Hosts for a
vSAN Cluster
• Describe the use of vSphere Lifecycle Manager to automate driver and firmware
installations
116
4-6 Verifying Hardware Compatibility
The VMware Compatibility Guide has a dedicated section for vSAN.
See the VMware Compatibility Guide before upgrading to ensure that new drivers and firmware
have been certified and tested for use with vSAN.
Ensure that the hardware devices are compatible with the versions of vSphere and vSAN.
• High-write endurance value and disk firmware details of the cache tier devices
117
4-7 Configuring Storage Controllers
Storage controllers play a key role in the I/O path for vSAN operations to ensure optimum
performance.
vSAN supports pass-through and RAID 0 modes. Pass-through is the preferred vSAN operating
mode.
See the VMware Compatibility Guide to verify the mode that is supported for your storage
controller.
[Figure: storage controller modes — pass-through presents SSDs directly to the host; RAID 0 presents each SSD as a single-drive RAID 0 volume]
118
4-8 Considering Multiple Storage Controllers
You might need to install additional storage controllers in certain scenarios:
• You want to reduce the impact of a potential controller failure by placing disk groups on
separate controllers.
• The queue depth of the current single controller is inadequate to meet the workload and
physical disk configuration.
• The business wants better performance, which typically requires multiple controllers.
119
4-9 Configuring BIOS for High Performance
The frequency of the physical CPU is controlled either by the BIOS or by the ESXi hypervisor
(OS controlled).
System BIOS
120
4-10 CPU Power Management
The following CPU power management policies can be selected to manage energy consumption
and performance.
If you do not select a policy, ESXi uses the Balanced Power policy by default. For vSAN nodes,
adjust the power policy to High Performance.
When CPU is idle, ESXi can apply deep halt states, also known as C-states, to reduce power
consumption.
• High performance: Do not use any power management features
• Balanced: Reduce energy consumption with minimal performance compromise
• Low power: Reduce energy consumption at the risk of lower performance
• Custom: User-defined power management policy
121
4-11 Verifying OS Controlled Mode
You use the vSphere Client to verify OS controlled mode.
Select the ESXi host and select Configure > Hardware > Overview > Power Management.
If ACPI C-states or ACPI P-states appears in the Technology text box, your power
management settings are set to OS controlled.
[Screenshot: the host's Power Management panel showing Technology: ACPI C-states, ACPI P-states and Active policy: Balanced, with an Edit Power Policy button]
122
4-12 Using VMware Skyline Health to Verify
Hardware Compatibility
VMware Skyline Health ensures that installed devices, drivers, and firmware are compatible with
the installed vSAN release.
The Hardware Compatibility section shows warnings if the controller, firmware, or drivers are not
listed as compatible in the VMware Compatibility Guide.
To view details, select vSAN Cluster and select Monitor > vSAN > Skyline Health > Hardware
compatibility.
[Screenshot: Skyline Health page, last checked 09/04/2020 6:02:18 PM, with a Retest button]
123
4-13 vSAN Hardware Compatibility List
Database
The vSAN Hardware Compatibility List Database (HCL DB) is used for the compatibility checks.
If vCenter Server has Internet connectivity, the HCL DB downloads automatically and regularly.
If automatic download is not possible, the HCL DB can be updated manually using an offline
JSON file.
To view details, select vSAN Cluster and select Monitor > vSAN > Skyline Health > Hardware
compatibility > vSAN HCL DB up to date.
For information about updating the vSAN HCL database manually, see VMware knowledge base
article 2145116 at https://kb.vmware.com/s/article/2145116.
124
4-14 Manually Updating Drivers and Firmware
You can download drivers and firmware from a vendor's website or from the VMware
Compatibility Guide website.
To install the downloaded drivers manually, copy the VMware installation bundle (VIB) to a
datastore or file system accessible to your host and use esxcli commands to install the
drivers.
[root@tese-01:~] esxcli software vib install -d /tmp/ -ESX-7.0.1-nhpsa-2.0.14-offline_bundle-5036227.zip
Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Reboot Required: true
   VIBs Installed: Microsemi_bootbank_nhpsa_2.0.14-1OEM.701.0.0.4598673
   VIBs Removed: Microsemi_bootbank_nhpsa_2.0.10-1OEM.701.0.0.4240417
   VIBs Skipped:
Because manually installing drivers on individual hosts in a larger infrastructure becomes complex
to manage, use vSphere Lifecycle Manager.
125
4-15 Automating Drivers and Firmware
Installation
vSphere Lifecycle Manager centralizes automated patch and version management by supporting
the following activities:
126
4-16 About vSphere Lifecycle Manager
vSphere Lifecycle Manager is a unified software and firmware management utility that uses the
desired-state model for all life cycle operations:
vSphere Lifecycle Manager has a modular framework to support vendor firmware plug-ins.
[Diagram: vSphere Lifecycle Manager combines the ESXi base image from vmware.com, a vendor add-on, a firmware and drivers add-on (BIOS, I/O controllers, storage devices, NICs, BMC), and vendor plug-ins (HP, Dell) into a desired image that is applied across the cluster to remediate drift]
127
4-17 vSphere Lifecycle Manager Desired
Image Feature
The vSphere Lifecycle Manager Desired Image feature merges hypervisor and host life cycle
management activities.
An image is created locally from desired state criteria comprising the hypervisor base image
and vendor drivers and firmware.
The Hardware Support Manager (HSM) vendor plug-in enables connectivity for the vendor
catalog and host management.
[Diagram: a desired image per cluster is built from user input — ESXi version, vendor add-ons, firmware and drivers add-on, and components — together with host credentials, the cluster repository, and the vendor's firmware and driver repositories]
To start using vSphere Lifecycle Manager Desired Image, the cluster must meet the following
requirements:
If a host has a version of vSphere earlier than 7.0, you must first use an upgrade baseline to
upgrade the host and then you can start using images.
128
4-18 Elements of vSphere Lifecycle Manager
Desired Image
vSphere Lifecycle Manager Desired Image defines the exact software stack to run on all ESXi
hosts in a cluster. It includes the following elements:
To maintain consistency, you apply a single ESXi image to all hosts in a cluster.
129
4-19 Configuring vSphere Lifecycle Manager
Desired Image
To configure an image:
[Screenshot: the cluster's Updates tab, where the image is defined — the Firmware and Drivers Addon selection is optional, with Save and Validate buttons]
130
4-20 Setting Up vSphere Lifecycle Manager
Desired Image for New Clusters
When creating a cluster, you can also create a corresponding image for the cluster:
2. Select the Manage all hosts in the cluster with a single image check box.
[Screenshot: New Cluster dialog in SA-Datacenter with vSphere DRS, vSphere HA, and vSAN services and an image setup option; these services have default settings that can be changed later in the Cluster Quickstart workflow]
131
4-21 Remediating Clusters
When you remediate a cluster, vSphere Lifecycle Manager applies the following elements to the
hosts:
• Vendor add-ons
• User-specified components
Remediation makes the selected hosts compliant with the desired image.
You can remediate the entire cluster or precheck hosts without updating them.
132
4-22 Review of Learner Objectives
• Explain the importance of hardware compatibility
• Describe the use of vSphere Lifecycle Manager to automate driver and firmware
installations
133
4-23 Lesson 2: Deploying a vSAN Cluster
134
4-25 vSAN Cluster Configuration Types
Because vSAN is a cluster-based solution, creating a cluster is the first logical step in the
deployment of the solution.
Single-site vSAN clusters are configured on one site to run production workloads. All ESXi hosts
run on that single site.
vSAN stretched clusters span three sites: two data sites and a witness site. You typically
deploy vSAN stretched clusters in environments where avoiding disasters and downtime is a
key requirement.
135
4-26 Configuring a vSAN Cluster
You add hosts to the newly created cluster and configure vSAN.
• Cluster Quickstart
• Manual configuration
136
4-27 About Cluster Quickstart
Cluster Quickstart groups common tasks and consolidates the workflow. You can configure a
new vSAN cluster that uses recommended default settings for functions such as networking,
storage, and services.
137
4-28 Comparing Cluster Quickstart and Manual
Configuration
Cluster Quickstart:
• You can use Cluster Quickstart only if hosts have ESXi 6.0 Update 2 or later.
• ESXi hosts should have similar configurations.
• Cluster Quickstart helps configure a vSAN cluster per recommendations.
• Cluster Quickstart is available only through the HTML5-based vSphere Client.
Manual configuration:
• A cluster can always be configured manually, regardless of the ESXi version and hardware configuration.
• This method offers more flexibility while configuring a new or existing cluster.
• This method provides detailed control over every aspect of cluster configuration.
• This method is available through any version of the vSphere Client.
138
4-29 Creating vSAN Clusters
To create a vSAN cluster, you must first create the vSphere cluster:
[Screenshot: New Cluster dialog — name SA-vSAN-01, location SA-Datacenter, toggles for vSphere DRS, vSphere HA, and vSAN, and the option to manage all hosts in the cluster with a single image]
139
4-30 Adding Hosts Using Cluster Quickstart (1)
To start Cluster Quickstart, click the existing cluster and select Configure > Configuration >
Quickstart.
140
4-31 Adding Hosts Using Cluster Quickstart
(2)
1. On the Add hosts page, enter information for new hosts or click Existing hosts to select
from a list of hosts in the inventory.
[Screenshot: Add hosts page listing hosts 10.198.26.7 through 10.198.26.12 with root credentials]
The selected hosts are placed in maintenance mode and added to the cluster. When you
complete the Cluster Quickstart configuration, the hosts exit maintenance mode.
If running vCenter Server on a host in the cluster, you do not need to place the host in
maintenance mode as you add it to a cluster using the Cluster Quickstart workflow. The host
that contains the vCenter Server VM must be running ESXi 6.5 or later. The same host can also
be running Platform Services Controller. All other VMs on the host must be powered off.
141
4-32 Verifying vSAN Health Checks
After the hosts are added to the cluster, the vSAN health checks verify that the host has the
necessary drivers and firmware.
142
4-33 vSAN Cluster Configuration (1)
To configure the vSAN cluster:
1. Configure the networking settings, including vSphere distributed switches, port groups, and
physical adapters.
4. Claim disks on each host for the cache and capacity tier.
5. (Optional) Create fault domains for hosts that can fail together.
6. On the Ready to complete page, verify the cluster settings and click Finish.
[Wizard steps: Distributed switches, Storage traffic, Advanced options, Claim disks, Review]
On the Configure distributed switches page, enter networking settings, including distributed
switches, port groups, and physical adapters. Network I/O Control is automatically enabled on all
switches that are created. Make sure to upgrade existing switches if using a brownfield vSphere
distributed switch.
In the port groups section, select a distributed switch to use for VMware vSphere vMotion and a
distributed switch to use for the vSAN network.
In the physical adapters section, select a distributed switch for each physical network adapter.
You must assign each distributed switch to at least one physical adapter. This mapping of
physical network interface cards (NICs) to the distributed switches is applied to all hosts in the
cluster.
On the vMotion and vSAN traffic page, you are strongly encouraged to provide dedicated VLANs
and broadcast domains for added security and isolation of these traffic classes.
143
4-34 vSAN Cluster Configuration (2)
The Cluster Quickstart setup is the ideal time to configure the required vSAN options.
Configuring the required vSAN options such as encryption, compression, and fault domains
during the Cluster Quickstart setup eliminates the need to enable them later and reduces the risk
of moving or disrupting data availability in the cluster.
144
4-35 Scaling vSAN Clusters Using Cluster
Quickstart
You can also use Cluster Quickstart to scale the vSAN cluster by adding more hosts to an
existing cluster. All existing cluster configurations are automatically applied to the new hosts.
[Screenshot: Cluster Quickstart page showing selected services (vSphere DRS, vSphere HA, vSAN) and passing health checks, including vSAN HCL DB up-to-date, controller and firmware certification for the ESXi release, controller disk group mode certification, vSAN firmware version recommendation, time synchronization across hosts and vCenter Server, host physical memory compliance, and software version compatibility, with Add, Re-validate, and Configure buttons]
145
4-36 Skipping the Cluster Quickstart Workflow
Advanced users can skip the Cluster Quickstart workflow and configure the vSAN cluster
manually. However, you cannot later return to the workflow.
[Screenshot: Cluster Quickstart page with the Skip Quickstart button]
146
4-37 Manually Configuring a vSAN Cluster
vSAN can be manually configured on a new or existing vSphere cluster. All hosts must have a
VMkernel network adapter for vSAN traffic configured, and vSphere HA must be temporarily
disabled.
The cluster configuration wizard takes you through the vSAN cluster configuration, vSAN services,
disk claim, and fault domain setup.
[Screenshot: the My-vSAN-Cluster Configure > vSAN > Services page showing vSAN is turned off, with a Configure button and a data-at-rest encryption option]
147
4-38 Manually Creating a vSAN Disk Group
Disks are assigned to disk groups for either cache or capacity purposes. Each drive can be used
in only one vSAN disk group.
[Screenshot: Claim disks page — HDDs marked Do not claim, one flash disk per host claimed for the cache tier, and the remaining flash disks claimed for the capacity tier]
When you create a disk group, consider the ratio of flash cache to consumed capacity. The ratio
depends on the requirements and workload of the cluster. For a hybrid cluster, use a flash cache
to consumed capacity ratio of at least 10%.
In a hybrid disk group configuration, the cache device is used by vSAN as both a read cache
(70%) and a write buffer (30%). In an all-flash disk group configuration, 100% of the cache
device is dedicated as a write buffer.
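The cache sizing guidance above can be sketched as a quick calculation. This is an illustration only: the 10% ratio and the 70/30 hybrid split come from the text, but the function name and the 4 TB consumed-capacity figure are made up for this example.

```python
# Hypothetical helper illustrating the hybrid cache sizing guideline:
# cache tier >= 10% of consumed capacity, split 70% read cache / 30% write buffer.

def hybrid_cache_split(consumed_capacity_gb, cache_ratio=0.10):
    """Return (total cache, read cache, write buffer) sizes in GB."""
    cache_gb = consumed_capacity_gb * cache_ratio
    read_cache_gb = cache_gb * 0.70    # hybrid disk groups: 70% read cache
    write_buffer_gb = cache_gb * 0.30  # hybrid disk groups: 30% write buffer
    return cache_gb, read_cache_gb, write_buffer_gb

# Example: 4 TB of consumed capacity suggests a 400 GB cache tier.
print(hybrid_cache_split(4000))  # (400.0, 280.0, 120.0)
```

For an all-flash disk group, the 70/30 split does not apply because the entire cache device acts as a write buffer.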
148
4-39 vSAN Fault Domains
vSAN fault domains can spread component redundancy across servers in separate computing
racks. By doing so, you can protect the environment from a rack-level failure, such as power and
network connectivity loss.
vSAN requires a minimum of three fault domains. At least one additional fault domain is
recommended to ease data resynchronization in the event of unplanned downtime or planned
downtime, such as host maintenance or upgrades.
If fault domains are enabled, vSAN applies the active VM storage policy to the fault domains
instead of to the individual hosts.
149
4-40 Implicit Fault Domains
Each host in a vSAN cluster is an implicit fault domain by default.
vSAN distributes data across fault domains (hosts) to provide resilience against drive and host
failure. This approach provides the correct combination of resilience and flexibility for data
placement in a cluster in most environments.
[Diagram: an object with a RAID-1, FTT=1 policy placed as two replica components (C1, C2) and a vSAN witness across hosts, each host acting as an implicit fault domain in the vSAN cluster]
150
4-41 Explicit Fault Domains
vSAN includes the ability to configure explicit fault domains that include multiple hosts.
vSAN distributes data across these fault domains to provide resilience against entire server rack
failure resulting from rack power supply and top-of-rack networking switch failure.
Explicit fault domains increase availability and ensure that component redundancy of the same
object does not exist in the same server rack.
[Diagram: an object with a RAID-1, FTT=1 policy distributed across explicit fault domains that each contain multiple hosts]
In this example, you should configure four fault domains, one for each rack, to help maintain
access to data if an entire server rack failure occurs.
Standard vSAN clusters using the explicit fault domains feature offer great levels of flexibility to
meet the levels of resilience required by an organization.
151
4-42 vSAN Fault Domains: Best Practices
For a balanced storage load and fault tolerance when using fault domains, consider the following
guidelines:
• Configure a minimum of three fault domains in the vSAN cluster. For best results, configure
four or more fault domains.
• Dedicate one additional fault domain with available capacity for rebuilding data after a
failure.
• A host not included in an explicit fault domain is considered its own fault domain.
• You do not need to assign every vSAN host to a fault domain. If you decide to use fault
domains to protect the vSAN environment, consider creating equal-sized fault domains.
152
4-43 vSphere HA on vSAN Clusters
vSAN, in conjunction with vSphere HA, provides a high availability solution for VM workloads. If a
host that is running VMs fails, the VMs are restarted on other available hosts in the cluster.
• vSphere HA does not use the vSAN datastore as a datastore heartbeating location.
External datastores can still be used with this functionality if they exist.
153
4-44 Enabling vSphere HA on a vSAN Cluster
To use vSphere HA to provide high availability to the VMs that run on the vSAN cluster, the
following requirements must be met:
154
4-45 vSphere HA Networking Differences with
vSAN
When enabling vSphere HA and vSAN on the same cluster, consider the following points to
ensure that vSphere HA sees the same network topology as vSAN:
• vSphere HA interagent traffic traverses the vSAN network rather than the management
network.
[Diagram: a vSphere Distributed Switch with separate management and vSAN networks — vSphere HA traffic traverses the vSAN network]
155
4-46 Recommended vSphere HA Settings for
vSAN Clusters
When configuring vSphere HA on a vSAN cluster, use the following recommended values.
156
4-47 Enabling vSAN Reserved Capacity
vSAN 7 U1 includes a reserve capacity workflow to simplify storage capacity management for
vSAN backend operations and maintenance.
• Operations reserve capacity is used for internal vSAN operations, such as object rebuild or
repair.
• Host rebuild reserve capacity is used to ensure that all objects can be rebuilt if any hosts fail.
[Screenshot: vSAN Services capacity reserve settings showing actually written capacity of 18.56 GB (9.28%), with enable and edit controls for each reserve]
To enable the host rebuild reserve, you must have a minimum of four hosts in a vSAN cluster.
To enable the capacity reserve, select vSAN Cluster > Configure > vSAN > Services.
When enabled, the operations reserve and host rebuild reserve options are available.
When vSAN reserved capacity is enabled and the cluster storage capacity usage reaches the
limit, new workloads fail to deploy.
157
4-48 Reserving vSAN Storage Capacity for
Maintenance Activities
In earlier versions, 30% of the total capacity was used as slack space. In vSAN 7.0 U1, instead of
slack space, reserved capacity is used.
You can reserve vSAN storage capacity for the following maintenance activities:
• Operations reserve
[Diagram: in vSAN 7, slack space is left unreserved on the datastore; in vSAN 7 U1, reserve capacity is explicitly set aside]
158
4-49 Planning for Capacity Reserve
• The operations reserve reserves capacity for internal vSAN operations, such as object rebuild
or repair.
• The host rebuild reserve reserves capacity to ensure that all objects can be rebuilt if host
failure occurs.
The host rebuild reserve is based on N+1. To calculate the host capacity reserve, divide 100 by
the number of hosts in the cluster. The result is the percentage reserved on the hosts:
• For example, in a 20-host cluster, the host rebuild reserve reserves 5% capacity on each
host to ensure sufficient rebuild capacity.
One caveat is that vSAN calculates the amount of capacity to reserve using the host with the
highest capacity in clusters that have hosts contributing differing capacities:
• For example, in a 20-host cluster with 9 hosts contributing 75 GB, 10 hosts contributing 100
GB, and 1 host contributing 200 GB, vSAN reserves 10 GB (5% of 200 GB) on all hosts, no
matter how much capacity they contribute.
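The N+1 arithmetic above can be sketched as a short calculation. This is only an illustration of the math described in the text, not a vSAN API; the cluster layout is the 20-host example from the text.

```python
# Sketch of the host rebuild reserve math: reserve 100/N percent per host,
# sized from the host with the highest capacity in the cluster.

def host_rebuild_reserve(host_capacities_gb):
    """Return (reserve percentage, per-host reserve in GB)."""
    n = len(host_capacities_gb)
    reserve_pct = 100 / n
    reserve_gb = max(host_capacities_gb) * reserve_pct / 100
    return reserve_pct, reserve_gb

# 20-host cluster: 9 hosts x 75 GB, 10 hosts x 100 GB, 1 host x 200 GB.
print(host_rebuild_reserve([75] * 9 + [100] * 10 + [200]))  # (5.0, 10.0)
```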
159
4-50 VMware Skyline Health
VMware Skyline Health is the primary and most convenient way to monitor vSAN health.
VMware Skyline Health provides an end-to-end approach to monitoring and managing the
environment. It also helps ensure optimal configuration and operation of your vSAN environment
to provide the highest levels of availability and performance.
• Configuration inconsistency
• Failure conditions
The ideal methodology to resolve a health check is to correct the underlying situation. You must
determine the root cause and fix the issue for all transient conditions.
Health check alerts that flag anomalies for intended conditions can be suppressed.
160
4-51 vSAN Logs and Traces
vSAN support logs are contained in the ESXi host support bundle in the form of vSAN traces.
vSAN support logs are collected automatically by gathering the ESXi support bundle of all hosts.
Because vSAN is distributed across multiple ESXi hosts, you should gather the ESXi support logs
from all the hosts configured for vSAN in a cluster.
By default, vSAN traces are saved to the /var/log/vsantraces ESXi host system
partition path. The traces can also be accessed from a symbolic link in /vsantraces.
VMware does not support storing logs and traces on the vSAN datastore.
When USB and SD card devices are used as boot devices, the logs and traces reside in RAM
disks, which are not persistent during reboots.
Consider redirecting logging and traces to other, persistent storage when these devices are
used as boot devices.
For more information about redirecting vSAN logs and traces, see VMware knowledge base
article 1033696 at https://kb.vmware.com/s/article/1033696.
161
4-52 Backup Methodology
Regardless of the storage system, backup and restore operations are fundamental to achieving
your organization's recovery point objectives:
For more information about backup solutions supported by VMware vSAN, see VMware
knowledge base article 56975 at https://kb.vmware.com/s/article/56975.
162
4-53 Lab 2: Configuring a Second vSAN
Cluster
Manually configure the vSAN cluster and verify cluster information using the command line:
1. Create a Cluster
2. Verify VM Compliance
163
4-55 Review of Learner Objectives
• Deploy and configure a vSAN cluster using Cluster Quickstart
• vSphere Lifecycle Manager can automate driver and firmware updates for supported
controllers.
• vSAN is a cluster-based solution. Creating clusters is the first logical step in the deployment
of this solution.
• The new, streamlined method to configure vSAN clusters is to use Cluster Quickstart.
• The Cluster Quickstart setup is the ideal time to configure required vSAN services.
• Advanced users can skip the Cluster Quickstart workflow and configure vSAN clusters
manually.
Questions?
164
Module 5
vSAN Storage Policies
5-2 Importance
Storage policies are the logical rule sets that define how vSAN distributes objects and
components across a datastore. These rules are how vSAN meets specific business needs for
redundancy, performance, high availability, and other benefits.
165
5-4 Lesson 1: vSAN Storage Policies
166
5-6 Storage Policy-Based Management
Storage Policy-Based Management (SPBM) helps you ensure that VMs use storage that
guarantees a specified level of capacity, performance, availability, redundancy, and so on.
Knowing approximately how many objects and components are needed and their capacity
consumption guides the planning decisions for the datastore. Storage policies define these
numbers by applying storage requirements to objects, which determines how many components
any particular object has. It is critical to plan both your most commonly applied policy and the
other edge-case policies that you might need, and to know which systems those policies are
applied to.
VM storage policies are used during the provisioning of a VM to ensure that VM objects and
components are placed on the datastore that is best for its requirements. Ideally, you want to
create the best match of predefined VM storage requirements with available physical storage
properties.
167
5-7 Defining Storage Policies: vSAN Rule
Sets
vSAN rule sets:
• Include advanced policy rules that allow for additional storage requirements
When you configure a VM storage policy, the vSphere Client and vSphere Web Client display
the datastores that are compatible with the capabilities of the policy.
When a VM storage policy is assigned to a VM, datastores are grouped into compatible and
incompatible categories. Assigning the VM to a datastore incompatible with the storage policy
puts the VM in a noncompliant state.
By using VM storage policies, you can easily see which storage is compatible or incompatible.
You can eliminate the need to ask the SAN administrator or refer to a spreadsheet of NAA IDs
each time you deploy a VM.
168
5-8 Storage Policy Naming Considerations
You use a standardized storage policy naming structure to ensure that policies are applied
appropriately.
• MultiCluster-ProductionVMs-BasicProtectionEnhancedPerf
• MultiCluster-ProductionVMs
• App-SharePointSQLBackEnd-EnhancedProtectionEnhancedPerf
169
5-9 Monitoring Storage Policy-Based
Management
A storage policy defines a set of capability requirements for VMs:
vSAN monitors and reports policy compliance during the VM life cycle. If a policy becomes
noncompliant, vSAN takes remedial actions. vSAN reconfigures the data of the affected VMs
and optimizes the use of resources across the cluster.
Planning for these operations is not necessary. Standard daily operations, such as
reconfiguration processes, occur with minimal effect on the regular workload and are accounted
for in VMware best practices.
170
5-10 VM Storage Policy Capabilities for vSAN
Aside from the objects on the datastore themselves, storage policies are highly influential on
vSAN datastore planning.
Because the vSAN storage policies are critical to ascertaining the object and component needs of
the datastore, the policy that is applied to the majority of objects is an important consideration
when planning the deployment and can have a significant impact on a datastore's architecture.
The default vSAN storage policy is created and implemented when vSAN is enabled on a
cluster. This policy contains a rule set with all rules defined at their default values. A VM with the
default policy applied supports a single failure and is striped across a single drive.
171
5-11 About Failures to Tolerate
The Failures to Tolerate configuration has a significant effect on datastore planning.
The number of failures to tolerate and the method used are important in determining how many
components are deployed to the datastore, how much capacity is consumed, and how the data
is distributed.
The number of failures to tolerate sets a requirement on the storage object to remain available
after a specified number of failures corresponding to the number of host or drive failures in the
cluster occurs. This value specifies that configurations must contain at least a number of failures
to tolerate + 1 replica.
Witnesses ensure that the object data is available even if the specified number of host failures
occur. If the Number of Failures to Tolerate is configured to 1, the object cannot persist if it is
affected by both a simultaneous drive failure on one host and a network failure on a second
host.
Consider the following use case: VMBeans approaches their professional services vendor for
advice on a policy configuration for their vSAN datastore. The customer is using a four-host
cluster and wants to incorporate fault tolerance on their systems. The vendor recommends
that a storage policy of Failures to Tolerate of 1 is attainable using their current configuration.
172
5-12 Level of Failures to Tolerate
The number of failures tolerated by an object has a direct relationship with the number of vSAN
objects and components created.
• For n failures that are tolerated, n+1 copies of the object are created.
• For n failures that are tolerated, 2n+1 hosts contributing storage are required.
In the example, the VMDK object tolerates 1 failure (FTT=1) and uses RAID 1 (mirroring) to
protect from that one failure, which means that 1 object is represented by 3 components on the
datastore. Because vSAN provides this protection using mirroring, two full copies of the data
exist so that one copy remains in place if the other becomes inaccessible.
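The two rules above can be expressed as a small calculation. This is a sketch for illustration; the witness comment simplifies real witness placement, which vSAN determines automatically.

```python
# RAID 1 mirroring requirements derived from the n+1 / 2n+1 rules.

def raid1_requirements(ftt):
    """Return (replica copies, minimum hosts contributing storage) for a given FTT."""
    replicas = ftt + 1        # n + 1 full copies of the object
    min_hosts = 2 * ftt + 1   # 2n + 1 hosts contributing storage
    return replicas, min_hosts

print(raid1_requirements(1))  # (2, 3): 2 replicas + 1 witness = 3 components
print(raid1_requirements(2))  # (3, 5)
```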
173
5-13 vSAN Data Protection Space
Consumption
Erasure coding provides significant space savings over mirroring. RAID 5 (erasure coding)
consumes 33% less capacity than mirroring, and RAID 6 (erasure coding) consumes 50% less
capacity than mirroring.
174
5-14 Comparing RAID 1 Mirroring and RAID
5/6 Erasure Coding
The number of failures to tolerate and the method used to tolerate those failures have a direct
effect on the architecture of the datastore, including how many hosts to use, how many disk
groups are on each host, and the overall size of the datastore.
Erasure coding provides significant capacity savings over mirroring, but erasure coding incurs
additional overhead in IOPS and network bandwidth. Erasure coding is only supported in all-flash
vSAN configurations.
While mirroring techniques excel in workloads where performance is a critical factor, they are
expensive regarding the amount of capacity that is required. RAID 5/6 (erasure coding) can be
configured to help ensure the same levels of component availability while consuming less
capacity than RAID 1.
The use of erasure coding results in a smaller capacity consumption increase for the same
number of failures to tolerate, but at a cost of additional host requirements and write overhead,
in comparison to mirroring. This additional overhead is not unique to vSAN and is common
among current storage platforms.
The space savings of erasure coding is guaranteed. For example, assigning a policy with a RAID
5 erasure coding rule reduces capacity consumption by 33 percent, compared to the same level
of availability (FTT=1) with RAID 1 mirroring.
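The consumption difference can be sketched numerically. The multipliers below follow the savings percentages stated above (RAID 1 with FTT=1 doubles capacity, RAID 5 adds one parity segment to three data segments, and so on); the dictionary keys are made-up labels for this example.

```python
# Capacity consumed by an object under each protection scheme (illustrative).

MULTIPLIERS = {
    "RAID1-FTT1": 2.0,    # full mirror
    "RAID5-FTT1": 4 / 3,  # 3 data + 1 parity segments
    "RAID1-FTT2": 3.0,    # two additional mirrors
    "RAID6-FTT2": 1.5,    # 4 data + 2 parity segments
}

def consumed_gb(object_size_gb, scheme):
    """Raw datastore capacity consumed by an object of the given size."""
    return object_size_gb * MULTIPLIERS[scheme]

# A 100 GB object: RAID 5 consumes ~133 GB instead of 200 GB with RAID 1 (33% savings).
print(consumed_gb(100, "RAID1-FTT1"), round(consumed_gb(100, "RAID5-FTT1"), 1))  # 200.0 133.3
```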
Objects with RAID 5 or 6 applied are considered to have additional stripe properties not defined
by the number of stripes rule:
175
For more information about differences between vSAN RAID 5 and traditional hardware-based
RAID 5, see Use of Erasure Coding in VMware Virtual SAN 6.7 at
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-
AD408FA8-5898-4541-9F82-FE72E6CD6227.html.
Consider the following use case: VMBeans is planning to use the default storage policy with a
fault tolerance of 1. They have 1 cluster with 4 hosts, which is sufficient to support the failure
level. However, upon investigation, VMBeans notes that the datastore will likely have capacity
challenges with the mirroring configuration. VMBeans determines that changing from RAID 1 to
RAID 5 erasure coding will regain some of the used space and still provide fault tolerance of 1.
176
5-15 Number of Disk Stripes per Object
When a cluster is being planned, stripes are a direct contributor to the number of components
that make up a vSAN object and the increased need for capacity devices and disk groups.
[Figure: A VM storage policy with FTT 1 - RAID 1 and Stripes 2. Each RAID 1 replica is striped across capacity devices in multiple disk groups.]
Overusing stripes has the potential to affect the ability of vSAN to balance components across
the datastore. This effect can be cascading, depending on how it interplays with other policy
settings. For example, an object mirrored once would create a single component per replica.
With a stripe width of two, those two replica components become four stripe components.
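The multiplication in this example can be sketched as follows. This is a simple illustration under the stated assumptions: RAID 1 data components only, with witness components excluded because their count depends on placement.

```python
# Sketch: how stripe width multiplies the data components of a
# RAID 1 mirrored object. Witness components are not counted.

def raid1_data_components(failures_to_tolerate: int, stripe_width: int) -> int:
    replicas = failures_to_tolerate + 1  # one full copy per replica
    return replicas * stripe_width

print(raid1_data_components(1, 1))  # 2: one component per replica
print(raid1_data_components(1, 2))  # 4: each replica split into two stripes
```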
To understand the effect of the number of stripes, examine the context of write operations and
read operations. All writes go to the cache device and the value of an increased number of
stripes might not improve performance. The new stripes might use a different disk group or be in
the same disk group and therefore use the same cache device.
177
From a read perspective, an increased number of stripes helps when you experience many
cache misses. For example, if a VM consumes 4,000 read IOPS and experiences a hit rate of 90
percent, 400 read operations must be serviced directly from magnetic drives. A single hard
drive might not be able to service those read operations, so an increase in number of stripes
might help.
In general, the default number of stripes of 1 should meet the requirements of most, if not all, VM
workloads. Any performance improvements are dependent on the workload.
Consider the following use case. VMBeans has another location with a four-host hybrid cluster
and is concerned that a number of their more I/O-intensive application servers are performing
less than optimally. After investigating the performance statistics for the cluster, VMBeans
determines that there is a higher number of cache read misses for these servers than for others.
The administrator adds a stripe value of 2 to the policy that applies to those servers. The
increased stripe value spreads the data across more disk groups, ensuring more I/O paths, and
the cache read miss problems are resolved.
178
5-16 Planning Considerations: Stripe Width
The number of disk stripes per object value determines the number of capacity devices across
which each storage object copy is striped.
For example:
• The minimum number of drives is 6, which is the total of data copies multiplied by the
number of stripes.
For data resiliency, stripes from different mirrors of the same object are never placed on the
same host.
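A minimal sketch of this arithmetic follows. The example values of three copies and two stripes are an assumption chosen to reproduce the slide's figure of 6; witness components are not counted.

```python
# Sketch of the slide's arithmetic: minimum capacity devices equals
# the number of data copies multiplied by the stripe width.

def min_capacity_devices(data_copies: int, stripe_width: int) -> int:
    return data_copies * stripe_width

# e.g. three mirror copies (FTT=2, RAID 1), each striped across two devices:
print(min_capacity_devices(3, 2))  # 6
```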
179
5-17 Flash Read Cache Reservation
The flash read cache rule is used only for hybrid configurations that allocate a portion of the
cache device to one or more VMs.
• When the flash read cache is allocated, the rule uses the full allocated storage of the object,
not the currently consumed storage.
[Figure: A hybrid disk group with a 40 GB cache device and 400 GB of capacity. 5 GB of cache is reserved for a single object.]
When planning a vSAN hybrid cluster, consider whether any application servers might benefit
from this sort of reservation and then size the cache device accordingly.
By default, a disk group cache device shares the read cache equally, based on demand between
the objects stored on the disk group. When a flash read cache reservation is applied, a specified
portion of the cache device is reserved for a specific vSAN object.
Reserved read cache is specified as a percentage of the logical size of the object:
180
• Default value: 0 percent.
• This setting has no effect on all-flash architecture and is intended for hybrid configurations.
As a best practice, avoid applying any flash read cache reservation to the VM home object
because it will not benefit from this policy.
This value specifies the logical size of the storage object in percentage up to the ten-thousandth
place (four decimal places). This specific unit size is needed so that administrators can express
appropriate sizes as the SSD capacity increases. For example, in a 1 TB drive, if an administrator
is limited to 1 percent increments, these increments are equivalent to cache reservations in
increments of 10 GB. This value is too large for a single VM in most cases. Ideally, the read cache
should match the working set of the virtual disk to maximize the read cache hit rate. The
reservation should be set to 0, except to solve specific use cases regarding read-intensive VMs.
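The granularity argument above can be sketched numerically. Per the rule's definition, the percentage applies to the object's logical size; the 1 TB example is taken from the text.

```python
# Sketch: why the flash read cache reservation accepts four decimal
# places. On a 1 TB (1000 GB) logical size, whole-percent steps
# reserve 10 GB at a time, which is too coarse for most working sets.

def reserved_cache_gb(object_logical_gb: float, reservation_pct: float) -> float:
    return object_logical_gb * reservation_pct / 100

print(reserved_cache_gb(1000, 1))       # 10.0 GB per whole-percent step
print(reserved_cache_gb(1000, 0.0512))  # 0.512 GB with finer granularity
```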
Consider the following use case. VMBeans is experiencing a problem with one of their high-
traffic VMs. Occasionally, the VM fails to service requests for data in a timely fashion. After
investigating, IT finds that the cache device on the disk group does not have sufficient resources
to service the critical VM with the other requests that come in. The best solution would be to
move from a hybrid architecture to an all-flash vSAN architecture. Until they migrate to an all-
flash architecture, VMBeans applies a read cache reservation to the VMDK of the VM. This policy
ensures that a portion of the cache device is always available to that VM object. The critical VM
now has the necessary resources available for when the read requests are high.
181
5-18 Force Provisioning
Force provisioning allows an object to be created despite not having sufficient resources in the
cluster. vSAN makes the object compliant when additional resources are added.
Force provisioning carries additional considerations to be addressed during the planning of the
datastore:
• Placing hosts in maintenance mode could affect the accessibility of a VM with no failure
tolerance.
• Placing a host in maintenance mode could affect the performance of the cluster if a large
number of force-provisioned machines must be moved.
• Consider the resources consumed by noncompliant VMs when adding resources to the
cluster.
[Figure: A VM with a storage policy of FTT 2 - RAID 1 whose vmdk is force provisioned on a vSphere vSAN cluster, with a single replica placed across the available disk groups.]
The diagram shows a VM provisioned by using a policy in which the number of failures to
tolerate is set to 2. Using the 2n+1 equation for the number of failures, the policy requires at least
five hosts in the cluster. Using force provisioning, the VM is deployed to tolerate 0 failures and 1
stripe. When additional resources are available, vSAN makes the VM compliant with its policy.
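The 2n+1 host rule cited above can be sketched directly; this illustration applies to RAID 1 mirroring, where n is the number of failures to tolerate.

```python
# Sketch of the 2n+1 rule: minimum hosts for RAID 1 mirroring
# with n failures to tolerate.

def min_hosts_raid1(failures_to_tolerate: int) -> int:
    return 2 * failures_to_tolerate + 1

print(min_hosts_raid1(1))  # 3 hosts
print(min_hosts_raid1(2))  # 5 hosts, as in the diagram's FTT=2 policy
```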
182
vSAN prioritizes creating data components on the datastore so that, after the required
resources become available, data components are created first to secure the data as early as
possible.
Force provisioning overrides other policy rules to provision VMs to a datastore that is unable to
meet the policy requirements.
Consider the following use case. VMBeans is deploying a vSAN cluster at a new location and is
experiencing a delay in the shipment of some of the hardware that is required to fully implement
their solution. Some of the new VMs need more storage for fault tolerance than the cluster can
provide without the delayed hardware.
The new location has a hard deadline for operational reasons, so the Operations team must
immediately build and configure the VMs. The team crafts a policy using force provisioning to
apply to the VMs. The VMs deploy even though there are insufficient resources for the policy
because force provisioning is enabled. When the additional resources are added to the cluster,
vSAN reallocates the objects to meet all policy needs.
183
5-19 Object Space Reservation
When planning a vSAN datastore, you must consider whether objects are provisioned thin, thick,
or somewhere in between. VMs are thin-provisioned by default.
The level of reservation dictates real storage versus logical storage consumption and interacts
with deduplication and compression:
• If deduplication and compression are disabled, objects can be provisioned with 25%, 50%,
75%, and thick configurations.
• If deduplication and compression are enabled, space reservation is limited to thick and thin
configurations.
[Figure: A replica placed on disk groups in a vSphere vSAN cluster.]
This capability defines the percentage of the logical size of the storage object that is reserved
during initialization. Reserved storage is thick provisioned (lazy zero) to the value of the setting
and the remainder is thin provisioned. Lazy zero provisioning is used in calculations for total
capacity but does not consume the space. The value is the minimum amount of capacity to be
reserved.
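The reservation split described above can be sketched as follows; the percentages are example values only.

```python
# Sketch: capacity reserved up front by an object space reservation
# rule. The reserved portion is thick provisioned (lazy zero); the
# remainder stays thin provisioned.

def reserved_split_gb(logical_gb: float, osr_pct: float) -> tuple:
    reserved = logical_gb * osr_pct / 100
    return reserved, logical_gb - reserved

print(reserved_split_gb(100, 25))   # (25.0, 75.0): 25 GB reserved, 75 GB thin
print(reserved_split_gb(100, 100))  # (100.0, 0.0): fully thick provisioned
```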
As a best practice, avoid applying any object space reservation to the VM home object because
it will not benefit from this policy.
184
When used with deduplication and compression, object space reservation results in reduced
capacity availability because space must be reserved in case the blocks become non-unique.
This does not affect performance.
Consider the following use case. The Operations team is building a new VM for a business unit
and one of the requirements is that its drives are thick provisioned from the beginning. The team
deploys the VM using a policy that reserves the space that is required for the VMDK.
185
5-20 IOPS Limits for Objects
IOPS limits for objects is a quality-of-service feature that limits the number of IOPS that an
object can consume.
The IOPS limits for objects have two primary use cases:
• To prevent workloads from affecting other workloads by consuming too many IOPS
• To create artificial standards of service as part of a tiered service offering, using the same
pool of resources
Limiting the IOPS of one or more VMs might be advantageous. In environments with a mix of
both low and high utilization, a VM with low utilization during normal operations can change its
pattern and consume larger amounts of resources, preventing other VMs from operating
properly.
Consider the following use case. One of the VMBeans servers that was deployed to a vendor-
maintained vSAN datastore has performance issues when usage is at its highest. The vendor
investigates and informs VMBeans that they are exceeding their IOPS for the server during peak
times. The vendor suggests that VMBeans invest in a higher tier of service for that VM to
prevent this from affecting their operations.
186
5-21 Disabling Object Checksums
Software checksums are used to detect data corruption that might be caused by hardware or
software components.
The checksums feature can be disabled if this functionality is already included in an application
such as Oracle RAC.
Software checksums can detect corruption that might be caused by hardware or software
components during read or write operations.
• Latent sector errors: Are typically the result of a physical drive malfunction.
• Silent data corruption: Can lead to lost or inaccurate data and significant downtime. No
effective means of detection exists without end-to-end integrity checking.
During read/write operations, vSAN checks the validity of the data based on the checksum.
If the data is not valid, vSAN takes the necessary steps to either correct the data or report it to
the user to take action.
vSAN has a drive-scrubbing mechanism that periodically checks the data on drives for errors. By
default, the data is checked once a year but this period can be modified with the
VSAN.ObjectScrubsPerYear advanced ESXi host setting.
187
5-22 Assigning vSAN Storage Policies (1)
The vSAN datastore has a default storage policy configured with standard parameters to
protect vSAN data. However, you can create user-defined custom vSAN storage policies.
Default policy:
• Failures = 1
• Stripes = 1
• Reserves 0 storage
A datastore's default storage policy should have a rule set that applies to the widest range of
VMs that are to be hosted on the datastore. Individual VMs should have a custom storage policy
applied that overrides the default policy for the datastore, as needed. When most VMs use the
default datastore policy, the overhead of policy administration and compliance is minimized.
188
5-23 Assigning vSAN Storage Policies (2)
The VM home directory, virtual disks, and the VM swap object can have user-defined custom
vSAN storage policies that override the default storage policy.
[Screenshot: The Hard disk 1 and VM home objects, both Healthy, assigned the vSAN Default Storage Policy.]
189
5-24 Storage Policies and the VM Home
Object
The VM home object does not apply storage policies in the same way as other objects. It honors
only the following rules:
• Tolerate failures
• Force provisioning
Other policy rules, such as stripes, do not affect the VM home object.
[Screenshot: The Physical Placement view showing 2 objects with components grouped by host placement.]
The VM in the example has a storage policy that tolerates 1 failure and 3 stripes. The VM home
object is mirrored to tolerate 1 failure but is not striped across multiple drives. In contrast, the
hard disk has 1 object that is mirrored and each mirror is striped across 3 drives.
The VM home object is the location where VM configuration files, such as .vmx, .log, digest files,
and memory snapshots are stored.
190
The VM home object ignores storage policy rules, such as the number of disk stripes, that do not apply to it.
191
5-25 Viewing Object and Component
Placement (1)
You can examine each VM to see where its components are physically located.
When vSAN creates an object for a virtual disk and determines how to distribute the object in
the cluster, it considers the following factors:
• vSAN verifies that the virtual disk requirements are applied according to the specified VM
storage policy settings.
• vSAN verifies that the correct cluster resources are used at the time of provisioning. For
example, based on the protection policy, vSAN determines how many replicas to create.
The performance policy determines the amount of Flash Read Cache allocated for each
replica, how many stripes to create for each replica, and where to place them in the
cluster.
• vSAN continually monitors and reports the policy compliance status of the virtual disk. If you
find any non-compliant policy status, you must troubleshoot and resolve the underlying
problem.
192
5-26 Viewing Object and Component
Placement (2)
To view the component layout, select the objects that make up a VM and click VIEW
PLACEMENT DETAILS.
[Screenshot: The VIEW PLACEMENT DETAILS button and the virtual objects of SA-Payload-01, including Hard disk 1 (No Redundancy) and Hard disk 2 (vSAN Default Storage Policy).]
In the vSphere Client, the administrator can view the location of each object and component.
The Capacity Disk Name column provides the physical drive name to which a particular object is
deployed.
In the example, the VM hard drive is protected by a policy to tolerate 1 failure through mirroring.
Two replicas and one tiebreaker witness component exist, each on a separate host.
193
5-27 Viewing Object and Component
Placement (3)
The component layout shows where the data is located on the physical datastore components,
down to the specific disk.
• Object size
• Redundancy
• Stripes
[Screenshot: Active components placed on local VMware disks across hosts inf-esxi-01, inf-esxi-02, and inf-esxi-03.]
194
5-28 Verifying Individual vSAN Object
Compliance Status
The compliance status shows whether an object is compliant with its assigned storage policy:
[Screenshot: The VM Storage Policies view for sa-vm-01.vclass.local, showing the VM storage policy, compliance status, and last checked time for 2 items.]
195
5-29 Verifying Individual vSAN Component
States
To verify the individual vSAN component state, select a VM and select Monitor > vSAN >
Physical Disk Placement.
If the vSAN component state is not Active, but specifically Absent or Degraded, the object is
noncompliant with the assigned storage policy.
[Screenshot: Hard disk 1 (RAID 5) and VM home (RAID 5), each with four components in the Active state.]
196
4-47 Enabling vSAN Reserved Capacity
vSAN 7 U1 includes a reserve capacity workflow to simplify storage capacity management for
vSAN backend operations and maintenance.
• Operations reserve capacity is used for internal vSAN operations, such as object rebuild or
repair.
• Host rebuild reserve capacity is used to ensure that all objects can be rebuilt if any hosts fail.
[Screenshot: The vSAN Services pane, showing each capacity reserve service as Enabled or Disabled and an actually written capacity of 18.56 GB (9.28%).]
To enable the Host Rebuild reserve, you must have a minimum of four hosts in a vSAN cluster.
To enable, select vSAN Cluster > Configure > vSAN > Services to enable capacity reserve.
When enabled, the operations reserve and host rebuild reserve options are available.
When vSAN Reserved Capacity reservation is enabled, and if the cluster storage capacity usage
reaches the limit, new workloads will fail to deploy.
157
5-31 Review of Learner Objectives
• Explain how storage policies work with vSAN
198
5-32 Lesson 2: Analyzing vSAN Objects and
Components Placement
199
5-34 About Storage Policy Changes
When the default policy is changed, the number of components created depends on the policy
variables.
[Figure: A 100 GB VMDK with a policy of FTT=1, RAID 1, SW=1, resulting in two 100 GB vSAN replicas.]
200
5-35 Activity: Object Count (1)
Based on the image, answer the following questions.
[Screenshot: The physical disk placement view for a VM, showing active components for the hard disk, the VM home (RAID 1), and the virtual machine swap object.]
201
5-36 Activity: Object Count (1) Solution
Based on the image, answer the following questions.
- Hard disk 1
- VM home namespace
- VM swap object
Hard disk 1 is the VMDK object. It comprises two replica components and one witness
component.
202
5-37 Activity: Object Count (2)
Based on the details provided, answer the questions.
Failures to Tolerate = 0
Stripes = 1
[Figure: A 100 GB object on a vSAN datastore.]
203
5-38 Activity: Object Count (2) Solution
Based on the details provided, answer the questions.
Failures to Tolerate = 0
Stripes = 1
One object.
One component.
One drive.
[Figure: A 100 GB vSAN object with FTT=0, RAID-0, Stripes=1, consisting of a single component.]
204
5-39 Activity: Object Count (3)
Based on the details provided, answer the questions.
Failures to Tolerate = 1
RAID 1 (Mirroring)
Stripes = 1
205
5-40 Activity: Object Count (3) Solution
Based on the details provided, answer the questions.
Failures to Tolerate = 1
RAID 1 (Mirroring)
Stripes = 1
One object.
Two components.
Two hosts.
[Figure: A 100 GB vSAN object with FTT=1, RAID-1 (Mirroring), Stripes=1, consisting of two components on separate hosts.]
206
5-41 Activity: Object Count ( 4)
Based on the details provided, answer the questions.
Failures to Tolerate = 1
Stripes = 1
RAID 5 erasure coding is a space efficiency feature optimized for all-flash configurations. Erasure
coding provides the same levels of redundancy as mirroring but with a reduced capacity
requirement.
Erasure coding guarantees capacity reduction over a mirroring data protection method at the
same failure tolerance level. As an example, consider a 100 GB virtual disk. Surviving one disk or
host failure requires two copies of data at twice the capacity, that is, 200 GB. If RAID 5 erasure
coding is used to protect the object, the 100 GB virtual disk consumes 133 GB of raw capacity, a
33% reduction in consumed capacity versus RAID 1 mirroring.
For more information about Erasure Coding for RAID 5 and RAID 6, see Erasure Coding (RAID-
5/6) at https://storagehub.vmware.com/t/vsan-6-7-update-1-technical-overview/erasure-coding-raid-5-6-3/.
207
5-42 Activity: Object Count ( 4) Solution
Based on the details provided, answer the questions.
Failures to Tolerate = 1
Stripes = 1
One object.
Four components (three data components and one distributed parity component)
Four hosts.
[Figure: A 100 GB vSAN object with FTT=1, RAID-5 (Erasure Coding), Stripes=1, consisting of four components distributed across four hosts.]
208
RAID 5 erasure coding requires a minimum of four hosts. When a policy containing a RAID 5
erasure coding rule is assigned to this object, three data components and one parity component
are created. To survive the loss of a disk or host (FTT=1), these components are distributed
across four hosts in the cluster.
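The layouts described in these activities can be summarized in a small sketch. This is a plain illustration of the counts stated in the text, not a VMware API.

```python
# Sketch of the vSAN erasure coding component layouts described
# in this section, for each supported FTT value.

def erasure_coding_layout(failures_to_tolerate: int) -> dict:
    if failures_to_tolerate == 1:  # RAID 5
        return {"data": 3, "parity": 1, "min_hosts": 4}
    if failures_to_tolerate == 2:  # RAID 6
        return {"data": 4, "parity": 2, "min_hosts": 6}
    raise ValueError("vSAN erasure coding supports FTT=1 or FTT=2 only")

print(erasure_coding_layout(1))  # {'data': 3, 'parity': 1, 'min_hosts': 4}
print(erasure_coding_layout(2))  # {'data': 4, 'parity': 2, 'min_hosts': 6}
```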
209
5-43 Activity: Object Count (5)
Based on the details provided, answer the questions.
Failures to Tolerate = 2
RAID 1 (Mirroring)
Stripes = 1
210
5-44 Activity: Object Count (5) Solution
Based on the details provided, answer the questions.
Failures to Tolerate = 2
RAID 1 (Mirroring)
Stripes = 1
One object.
Five components (three replicas and two witness components).
Five hosts.
Five drives.
[Figure: A 100 GB vSAN object with FTT=2, RAID-1 (Mirroring), Stripes=1, consisting of three replicas and two witness components across five hosts.]
211
5-45 Activity: Object Count (6)
Based on the details provided, answer the questions.
Failures to Tolerate = 2
Stripes = 1
Like RAID 5 erasure coding, RAID 6 erasure coding is a space efficiency feature optimized for
all-flash configurations. Erasure coding provides the same levels of redundancy as mirroring but
with a reduced capacity requirement. In general, erasure coding is a method of taking data,
breaking it into multiple pieces, and spreading it across multiple devices, while adding parity data
so that it can be recreated if one of the pieces is corrupted or lost.
212
5-46 Activity: Object Count (6) Solution
Based on the details provided, answer the questions.
Failures to Tolerate = 2
Stripes = 1
One object.
Six components (four data components and two distributed parity components).
Six hosts.
Six drives.
[Figure: A 100 GB vSAN object with FTT=2, RAID-6 (Erasure Coding), Stripes=1, consisting of six components distributed across six hosts.]
213
RAID 6 erasure coding requires a minimum of six hosts. Using our previous example of a 100 GB
virtual disk, the RAID 6 erasure coding rule creates four data components and two parity
components. This configuration can survive the loss of two disks or hosts simultaneously
(FTT=2).
214
5-47 Activity: Object Count (7)
Based on the details provided, answer the questions.
Failures to Tolerate = 3
RAID 1 (Mirroring)
Stripes = 1
215
5-48 Activity: Object Count (7) Solution
Based on the details provided, answer the questions.
Failures to Tolerate = 3
RAID 1 (Mirroring)
Stripes = 1
One object.
Seven components (four replicas and three witness components).
Seven hosts.
Seven drives.
[Figure: A 100 GB vSAN object with FTT=3, RAID-1 (Mirroring), Stripes=1, distributed across seven hosts.]
216
5-49 Activity: Object Count (8)
Based on the details provided, answer the questions.
Failures to Tolerate = 1
RAID 1
Stripes = 1
[Figure: A 400 GB vSAN object with FTT=1, RAID-1, Stripes=1.]
217
5-50 Activity: Object Count (8) Solution
Based on the details provided, answer the questions.
Failures to Tolerate = 1
RAID 1
Stripes = 1
One object.
Two drives.
[Figure: A 400 GB vSAN object with FTT=1, RAID-1, Stripes=1, placed as Mirror 1 and Mirror 2.]
218
5-51 Activity: Object Count (9)
Based on the details provided, answer the questions.
219
5-52 Activity: Object Count (9) Solution
Based on the details provided, answer the questions.
Yes, double the storage is required while the storage policy update operation takes place.
[Figure: A 100 GB object transitioning from a Two-Way Mirror policy with Stripes=2 to a Two-Way Mirror policy with Stripes=3.]
1. New RAID 1 mirrored components are created with the new stripe width.
2. The I/O is directed simultaneously to both RAID 1 component sets during switchover.
3. The original RAID 1 component set is removed from the object and deleted.
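A minimal sketch of the transient capacity implication of these steps: while the change is in flight, the old and new component sets coexist, so peak raw usage is roughly double the steady state. Illustrative only.

```python
# Sketch: peak raw capacity during a stripe-width policy change,
# when the old and new RAID 1 component sets exist simultaneously.

def peak_raw_gb(logical_gb: float, mirror_copies: int) -> float:
    steady_state = logical_gb * mirror_copies
    return steady_state * 2  # old set + new set coexist during switchover

print(peak_raw_gb(100, 2))  # 400.0: double the steady-state 200 GB
```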
220
5-53 Activity: Objects and Witnesses
Based on the image, answer the following questions.
[Screenshot: Physical disk placement showing a component on sb-esxi-04.vclass.local and witness components on sb-esxi-01.vclass.local and sb-esxi-02.vclass.local.]
221
5-54 Activity: Objects and Witnesses Solution
Based on the image, answer the following questions.
Three objects.
222
5-55 Activity: VMs and Failures
Based on the image, answer the following questions.
[Screenshot: sa-vm-02.vclass.local shown as Healthy.]
223
5-56 Activity: VMs and Failures Solution
Based on the image, answer the following questions.
sa-vm-02 has the vSAN Default Storage policy, so it supports one failure by default.
[Screenshot: sa-vm-01.vclass.local shown as Healthy.]
Additional information: sa-vm-01 has a RAID 5 policy applied. RAID 5 can protect against one
failure. sa-vm-03 has a RAID 0 policy. RAID 0 does not create replicas of the data. Hence, it does
not protect against any failure.
224
5-57 Activity: Failures and Witnesses
Based on the image, answer the following questions.
225
5-58 Activity: Failures and Witnesses Solution
Based on the image, answer the following questions.
The hard disk 1 object of the VM has the RAID 1 policy applied. As shown in the image, it has
a total of four components. Each of these components represents a replica of the data,
which implies that the Failures to Tolerate policy value is 3. This means that the VM can
tolerate a maximum of three concurrent failures.
[Screenshot: Physical disk placement showing components on sa-esxi-02.vclass.local and sa-esxi-03.vclass.local and a witness on sb-esxi-03.vclass.local.]
226
5-59 Activity: RAID Levels and Stripes
Based on the image, answer the following questions.
227
5-60 Activity: RAID Levels and Stripes
Solution
Based on the image, answer the following questions.
RAID 10.
Under each RAID 0 tree, two components are created, so a stripe width of 2 was used.
[Screenshot: Physical disk placement for sa-vm-05.vclass.local, showing RAID 0 trees with components on sa-esxi-04.vclass.local and sb-esxi-02.vclass.local.]
228
5-61 Activity: Failures and Snapshots
Based on the image, answer the following questions.
[Screenshot: Physical disk placement showing components on sb-esxi-02.vclass.local, sa-esxi-03.vclass.local, and sa-esxi-02.vclass.local.]
229
5-62 Activity: Failures and Snapshots Solution
Based on the image, answer the following questions.
1. How many failures can any of the objects tolerate and why?
The objects are using RAID 6. RAID 6 supports a Failures to Tolerate value of 2, so each of
the objects can tolerate up to 2 failures.
230
5-63 Lab 4: Analyzing the Impact of Storage
Policy Changes
Analyze the impact of storage policy changes on VMs:
231
5-65 Review of Learner Objectives
• Verify the VM storage policy compliance status
• Storage policies help guide decisions regarding datastore architecture from the very
beginning.
• Policy-based storage enables you to ensure that performance and availability requirements
for VMs are met.
• Policy-based storage enables you to create and update many VM storage requirements
without downtime and maintenance windows.
Questions?
232
Module 6
vSAN Resilience and Data Availability
6-2 Importance
Maintaining a fault-tolerant vSAN environment strengthens the resiliency of the environment and
minimizes downtime.
233
6-3 Lesson 1: vSAN Resilience and Data
Availability
234
6-5 About Failure Handling
Failure is handled differently in traditional storage arrays and vSAN environments.
Traditional storage array:
• Hot spare disks are either set aside to immediately replace failed disks or installed in the
system.
vSAN cluster:
• During a failure, components such as stripes or mirrors of objects are distributed to other
resources.
235
6-6 About vSAN Component States
vSAN components can exist in different states:
• Active: In sync and operating normally
• Reconfiguring: Being rebuilt or moved in response to a policy change
• Absent: Failed, but the failure might be transient
• Degraded: Failed permanently
• Stale: No longer in sync with other components of the same vSAN object
236
6-7 About the vSAN Object Repair Timer
vSAN waits before rebuilding a disk object after a host is either in a failed state or in
maintenance mode. Because vSAN is uncertain if the failure is transient or permanent, the repair
delay value is set to 60 minutes by default.
To reconfigure the Object Repair Timer, select the vSAN cluster and select Configure > vSAN >
Services > Advanced Options > EDIT.
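The waiting logic can be sketched as a simple timer check (a conceptual model only, not vSAN's implementation; the 60-minute constant mirrors the default repair delay described above):

```python
from dataclasses import dataclass

REPAIR_DELAY_MINUTES = 60  # vSAN's default object repair timer

@dataclass
class Component:
    state: str             # "active", "absent", or "degraded"
    minutes_absent: int = 0

def should_rebuild(c: Component) -> bool:
    """Degraded components rebuild immediately; absent components
    wait out the repair delay in case the failure is transient."""
    if c.state == "degraded":
        return True
    if c.state == "absent":
        return c.minutes_absent >= REPAIR_DELAY_MINUTES
    return False
```

Raising or lowering `REPAIR_DELAY_MINUTES` corresponds to editing the Object Repair Timer in the advanced options.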
237
6-8 Overriding the Object Repair Timer
The vSAN object health test includes functionality to rebuild components immediately, rather than waiting as specified by the Object Repair Timer.
To repair objects immediately, select the vSAN cluster and select Monitor > vSAN > Skyline Health > Data > vSAN object health > REPAIR OBJECTS IMMEDIATELY.
(Screenshot: vSAN object health listing healthy and inaccessible objects with their UUIDs.)
238
6-9 Resynchronizing Components
The resynchronization of components can be initiated in the following ways:
• Host failure
• Policy change
• User-triggered reconfiguration
239
6-10 Failure Handling Scenario (1)
When restoring I/O flow:
• Failure is detected, and the failed components are removed from the active set.
• Assuming most object components are available, the I/O flow is restored.
(Diagram: a VM's vmdk protected by RAID 1 with replicas across hosts esxi-01 through esxi-05; the failed host's component is removed from the active set.)
240
6-11 Failure Handling Scenario (2)
When rebuilding components to reestablish protection:
• If the component state is absent, wait 60 minutes before initiating a rebuild.
• Start rebuilding.
(Diagram: a new replica component is rebuilt on a surviving host after the 60-minute wait.)
241
6-12 Failure Handling Scenario (3)
When a cache device failure causes degraded components, an instant mirror copy is created if
the component is affected.
(Diagram: a cache device failure on one host degrades its components; a new mirror copy is created immediately on another host.)
242
6-13 Failure Handling Scenario (4)
When a capacity device fails with error and causes degraded components, an instant mirror copy is created if the component is affected.
(Diagram: a capacity device failure with error triggers an immediate new mirror copy on another host.)
243
6-14 Failure Handling Scenario (5)
When a capacity device fails without error and causes absent components, a new mirror copy is
created after 60 minutes.
(Diagram: a capacity device failure without error leaves the component absent; a new mirror copy is created after 60 minutes.)
244
6-15 Failure Handling Scenario (6)
When a storage controller fails and causes degraded components, resynchronizing begins
immediately.
(Diagram: a storage controller failure degrades all components on the host; a new mirror copy is created immediately.)
245
6-16 Failure Handling Scenario (7)
When a host failure causes absent components, vSAN waits 60 minutes before rebuilding
absent components.
If the host returns within 60 minutes, vSAN synchronizes the stale components.
(Diagram: a host failure marks its components absent; a new mirror copy is created after 60 minutes.)
246
6-17 Failure Handling Scenario (8)
When host isolation resulting from a network failure causes absent components, vSAN waits 60 minutes before rebuilding absent components.
If the network connection is restored within 60 minutes, vSAN synchronizes the stale components.
(Diagram: a host isolated by a vSAN network failure; its components are marked absent.)
247
6-18 Failure Handling Scenario (9)
The network partition isolates the esxi-01 and esxi-04 hosts.
vSphere HA restarts the affected VMs on either the esxi-02 or the esxi-03 host. These hosts are still in communication and own more than 50% of the VM components.
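The greater-than-50% ownership rule can be sketched as a vote count (a conceptual model; vSAN assigns votes to components and witnesses internally):

```python
def object_accessible(available_votes: int, total_votes: int) -> bool:
    """An object stays accessible only while the reachable components
    (including any witness) hold more than 50% of the total votes."""
    return 2 * available_votes > total_votes

# Two replicas plus one witness: three votes in total.
# esxi-02 and esxi-03 holding two of three votes keeps the object online.
```

An exact 50/50 split is not a majority, which is why vSAN places a witness to break ties.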
(Diagram: esxi-01 and esxi-04 are isolated; Replica 1, Replica 2, and the witness are spread across esxi-02, esxi-03, and esxi-04.)
248
6-19 Review of Learner Objectives
• Describe how to configure the Object Repair Timer
Questions?
249
Module 7
Configuring vSAN Storage Space Efficiency
7-2 Importance
As the number of virtual machines in the vSAN cluster increases, consider using the vSAN storage space efficiency techniques to reduce the amount of space required to store data and to lower storage costs.
251
7-3 Configuring vSAN Storage Space
Efficiency
252
7-5 About vSAN Storage Space Efficiency
vSAN storage space efficiency techniques reduce the total storage capacity required to meet your workload needs.
Enable deduplication and compression on a vSAN cluster to eliminate duplicate data and reduce the amount of space required to store data.
You can set the VMs to use RAID 5 or RAID 6 erasure coding, which can protect your data while using less storage space than the default RAID 1 mirroring storage policy.
You can use TRIM/UNMAP to reclaim storage space, for example, when files are deleted within a virtual disk.
253
7-6 Using Deduplication and Compression (1)
Enabling deduplication and compression can reduce the amount of physical storage consumed
by as much as seven times.
Environments with highly redundant data, such as full-clone virtual desktops and homogeneous
server operating systems, naturally benefit the most from deduplication.
Likewise, compression offers more favorable results with data that compresses well, such as
text, bitmap, and program files.
You can enable deduplication and compression when you create a vSAN all-flash cluster or
when you edit an existing vSAN all-flash cluster.
Deduplication and compression are enabled as a clusterwide setting, but they are applied on a
disk group basis.
vSAN performs deduplication and compression at the block level to save storage space.
Deduplication and compression might not be effective for encrypted VMs because VM
encryption encrypts data on the host before it is written to storage.
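Block-level deduplication can be illustrated with a toy content-addressed store (an illustration of the idea only; the actual algorithm, block handling, and hashing are internal to vSAN):

```python
import hashlib

BLOCK_SIZE = 4096  # deduplication works on fixed-size blocks

def dedup_blocks(data: bytes):
    """Split data into fixed-size blocks and store each unique block
    once, keyed by its hash. The refs list is the allocation map that
    records which stored block each logical block points to."""
    store = {}   # hash -> block bytes (physical storage, one copy each)
    refs = []    # logical block index -> hash (allocation map)
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        refs.append(digest)
    return store, refs
```

Highly redundant data (many identical blocks) collapses to a few stored blocks plus a map, which is why full-clone desktops benefit the most.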
254
7-7 Using Deduplication and Compression (2)
When you enable or disable deduplication and compression, vSAN performs a rolling reformat of every disk group on every host, which requires all data to be evacuated.
Depending on the data stored on the vSAN datastore, this process might take a long time to
complete.
255
7-8 Using Deduplication and Compression (3)
Deduplication and compression occur inline when data is written back from the cache tier to the
capacity tier.
The deduplication algorithm uses a fixed block size and is applied within each disk group.
The compression algorithm is applied after deduplication but before the data is written to the
capacity tier.
Given the additional compute resource and allocation map overhead of compression, vSAN
stores compressed data only if a unique 4K block can be reduced to 2K or less. Otherwise, the
block is written uncompressed.
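The 4K-to-2K rule can be sketched as follows (a conceptual model using zlib; vSAN's actual compression algorithm is internal):

```python
import zlib

BLOCK = 4096
THRESHOLD = 2048  # store compressed only if a 4K block shrinks to 2K or less

def maybe_compress(block: bytes):
    """Return (payload, compressed_flag). Mirrors the rule above:
    if compression cannot at least halve the block, the block is
    written uncompressed to avoid the allocation map overhead."""
    packed = zlib.compress(block)
    if len(packed) <= THRESHOLD:
        return packed, True
    return block, False
```

A block of text or zeros easily passes the threshold; already-compressed or random data does not and is stored as-is.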
256
7-9 Disk Management
Consider the following guidelines when managing disks and disk groups in a cluster with deduplication and compression enabled:
• Consider adding additional disk groups to increase the cluster storage capacity.
(Diagram: multiple disk groups, each with one cache device and several capacity SSDs.)
257
7-10 Design Considerations
Consider the following guidelines when you configure deduplication and compression in a vSAN
cluster:
• VM storage policies must have either 0 percent or 100 percent object space reservations.
• The processes of deduplication and compression incur compute overhead and potentially
impact performance in terms of latency and maximum IOPS.
• However, the extreme performance and low latency of flash devices easily outweigh the
additional compute resource requirements of deduplication and compression in vSAN.
• The space consumed by the deduplication and compression metadata is relative to the size
of the vSAN datastore capacity.
258
7-11 Compression-Only Mode (1)
You can enable compression-only mode on an all-flash vSAN cluster to provide storage space
efficiency without the overhead of deduplication.
• Compression-only mode can reduce the amount of physical storage consumed by as much as two times.
• Compression-only mode reduces the failure domain from the entire disk group to only one
disk.
If a vSAN cluster is enabled for deduplication and compression, any disk failure affects the
entire disk group operation.
• You can scale up a disk group without unmounting it from the vSAN cluster.
259
7-12 Compression-Only Mode (2)
The compression-only mode algorithm moves data from the cache tier to individual capacity disks, which also ensures better parallelism and throughput.
260
7-13 Configuring Space Efficiency
To configure space efficiency, select a vSAN cluster and select Configure > vSAN > Services >
Space Efficiency > Edit.
Select the Compression only or Deduplication and compression mode and click APPLY.
(Screenshot: the Space efficiency drop-down menu with None, Compression only, and Deduplication and compression options.)
261
7-14 Verifying Space Efficiency Savings
To verify the storage space savings information, select a vSAN clust er and select Monitor >
vSAN > Capacity > CAPACITY USAGE.
262
7-15 Using RAID 5 or RAID 6 Erasure Coding
(1)
You can use RAID 5 or RAID 6 erasure coding to protect against data loss and also increase the
storage efficiency.
Erasure coding can provide the same data protection as RAID 1 with Failures to Tolerate = 1 while using less storage capacity. For example, a VM protected with RAID 1 requires twice the virtual disk size, but with RAID 5 it requires only 1.33 times the virtual disk size.
You can configure RAID 5 on all-flash vSAN clusters with four or more nodes and RAID 6 on clusters with six or more nodes.
RAID 5 or RAID 6 erasure coding does not support a Failures to Tolerate value of 3.
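The capacity overhead of each scheme can be computed directly (the RAID 1 FTT=1 and RAID 5 multipliers come from the text above; the RAID 1 FTT=2 and RAID 6 values are the standard 3x and 1.5x figures):

```python
def raw_capacity_needed(usable_gb: float, scheme: str) -> float:
    """Raw vSAN capacity consumed to provide a given usable size
    under each protection scheme."""
    multipliers = {
        "raid1_ftt1": 2.0,   # two full mirrors, tolerates 1 failure
        "raid1_ftt2": 3.0,   # three full mirrors, tolerates 2 failures
        "raid5": 4 / 3,      # 3 data + 1 parity, tolerates 1 failure
        "raid6": 1.5,        # 4 data + 2 parity, tolerates 2 failures
    }
    return usable_gb * multipliers[scheme]
```

For a 100 GB disk, RAID 1 consumes 200 GB of raw capacity while RAID 5 consumes about 133 GB for the same single-failure protection.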
263
7-17 Using RAID 5 or RAID 6 Erasure Coding
(3)
RAID 5 or RAID 6 erasure coding is a storage policy attribute that you can apply to VM components.
264
7-18 Reclaiming Space Using TRIM/UNMAP (1)
vSAN supports SCSI UNMAP commands directly from a guest OS to reclaim storage space.
The guest operating systems can use TRIM/UNMAP commands to reclaim space that is no
longer used.
A TRIM/UNMAP command sent from the guest OS can reclaim the previously allocated storage as free space. This opportunistic space efficiency feature can deliver much better storage capacity utilization in vSAN environments.
• Faster repair
Because reclaimed blocks do not need to be rebalanced or remirrored if a device fails, repairs are much faster.
Removal of dirty cache pages from the write buffer reduces the number of blocks that are copied to the capacity tier.
On Linux operating systems, offline unmaps are performed with the fstrim command. Inline unmaps are performed when the file system is mounted with the discard option (mount -o discard).
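For example, in a Linux guest the two approaches look like this (the device and mount point are placeholders; fstrim requires root privileges):

```shell
# Offline (on-demand) reclaim of unused blocks on a mounted file system:
fstrim -v /mnt/data

# Inline reclaim: mount with the discard option so deletions issue
# TRIM/UNMAP automatically. Example /etc/fstab entry:
# /dev/sdb1  /mnt/data  ext4  defaults,discard  0  2
mount -o discard /dev/sdb1 /mnt/data
```

Many distributions ship a periodic fstrim systemd timer, which is often preferred over inline discard for performance reasons.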
265
7-21 Enabling TRIM/UNMAP Support
TRIM/UNMAP support is disabled by default. You can enable TRIM/UNMAP support in the
following ways:
• PowerCLI:
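The elided PowerCLI command is likely along these lines (a hypothetical sketch; verify the cmdlet and parameter names against your PowerCLI version's vSAN module, and note that the cluster name is an example):

```powershell
# Assumes VMware PowerCLI with the vSAN cmdlets and an existing
# Connect-VIServer session. "vSAN-Cluster" is a placeholder name.
Get-VsanClusterConfiguration -Cluster "vSAN-Cluster" |
    Set-VsanClusterConfiguration -GuestTrimUnmap:$true
```

TRIM/UNMAP can also be toggled per cluster from the vSAN Services pane in newer vSphere Client releases.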
266
7-22 Monitoring TRIM/UNMAP
To monitor TRIM/UNMAP statistics, select a host in the vSAN cluster and select Monitor > vSAN > Performance > BACKEND.
Unmap Throughput measures UNMAP commands that are being processed by the disk groups of a host.
(Screenshot: Backend performance charts showing IOPS and Trim/Unmap Throughput over time.)
267
7-23 Lab 6: Configuring vSAN Space
Efficiency
Configure vSAN space efficiency features:
268
7-24 Review of Learner Objectives
• Describe vSAN storage space efficiency
Questions?
269
Module 8
vSAN Security Operations
8-2 Importance
Maintaining data security is critical in any organization to meet enterprise security compliance.
vSAN offers the data-in-transit and data-at-rest encryption methods to ensure that data remains
secure.
Encrypting your vSAN datastore requires you to set up a key management server cluster.
When you enable encryption, vSAN encrypts everything in the vSAN datastore.
vSAN data-in-transit encryption secures the traffic exchanged between vSAN nodes.
271
8-3 Lesson 1: vSAN Security Operations
272
8-5 vSAN Encryption
vSAN encryption is a native HCI encryption solution. It is built in to the vSAN layer:
273
8-6 Design Considerations for vSAN
Encryption
Consider the following points when working with vSAN encryption:
• Do not deploy your key provider on the same vSAN datastore that you plan to encrypt.
• The witness host in a two-node or stretched cluster does not participate in vSAN
encryption.
• vSAN data-at-rest encryption and vSAN data-in-transit encryption are independent of each
other. These features can be enabled and configured independently.
274
8-7 About Permissions
In secure environments, only authorized users should be able to perform cryptographic operations:
• You should consider assigning this role to a subset of administrators when enabling vSAN encryption.
• You should review and audit role assignments regularly to ensure that access is limited only to authorized users.
(Screenshot: the vSphere roles editor showing cryptographic operations and related privileges.)
275
8-8 Setting Up Key Providers
Use a supported key provider to distribute the keys to be used with vSAN encryption.
To support encryption, add the KMS to vCenter Server and establish the trust. vCenter Server
requests encryption keys from the key provider.
The KMS must support the Key Management Interoperability Protocol (KMIP) 1.1 standard.
For more information about KMS vendor solutions supported by VMware, see
https://www.vmware.com/resources/compatibility/pdf/vi_kms_guide.pdf.
(Diagram: vCenter Server requests encryption keys from the KMS.)
276
8-9 KMS Server Cluster
Set up the KMS cluster for high availability to avoid a single point of failure.
• The KMS cluster is a group of KMIP-based key management servers that replicate keys to one another.
• The KMS cluster must have key management servers from the same vendor.
(Diagram: a KMS cluster of key management servers that replicate keys to one another, connected to vCenter Server.)
277
8-10 Adding a KMS to vCenter Server (1)
You add a KMS to your vCenter Server system from the vSphere Client.
vCenter Server defines a KMS cluster when you add the first KMS instance and sets this cluster as the default.
You can add KMS instances from the same vendor to the cluster and configure them to
synchronize keys among each other.
If your environment requires KMS solutions from different vendors, you can create KMS clusters
for each vendor.
278
8-11 Adding a KMS to vCenter Server (2)
To set up communication between the KMS and vCenter Server, trust must be established.
vCenter Server uses KMIP to communicate with the KMS over SSL or TLS.
(Diagram: vCenter Server communicates with the KMS cluster using the KMIP protocol over SSL or TLS.)
279
8-12 KMIP Client Certificates
The type of certificate used by vCenter Server (KMIP client) depends on the KMS vendor.
Always check with the KMS vendor for their certificate requirements to establish the trust.
(Screenshot: option to download the vCenter certificate and upload it to the KMS.)
280
8-13 vSAN Data-at-Rest Encryption (1)
When vSAN data-at-rest encryption is enabled on a vSAN cluster:
1. vCenter Server requests a key encryption key (KEK) from the KMS.
3. Hosts use the KEK ID to request the KEK from the KMS.
4. Hosts create a unique data encryption key (DEK) for each drive.
6. vCenter Server requests a host encryption key (HEK) from the KMS, which is used to encrypt core dumps.
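The key hierarchy can be modeled as envelope encryption (a toy sketch using an XOR keystream purely to show the relationships between KEK and DEK; vSAN uses real ciphers such as AES-256):

```python
import hashlib
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream derived from SHA-256: illustration only, NOT real crypto."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR with the keystream; applying it twice decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

# The KMS-held KEK wraps each per-device DEK; only the wrapped DEK
# is persisted on the host.
kek = os.urandom(32)                       # key encryption key (from the KMS)
dek = os.urandom(32)                       # data encryption key (per device)
nonce = os.urandom(16)
wrapped_dek = xor_crypt(kek, nonce, dek)   # what the host stores on disk
recovered = xor_crypt(kek, nonce, wrapped_dek)  # unwrap after fetching the KEK
```

Because only the wrapped DEK is stored, a host that reboots must re-request the KEK from the KMS before it can read its own disks.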
(Diagram: the KEK wraps per-device DEKs in each disk group; the HEK encrypts core dumps.)
281
8-14 vSAN Data-at-Rest Encryption (2)
When data-at-rest encryption is enabled on a new vSAN cluster, disks are encrypted immediately on the creation of disk groups.
If data-at-rest encryption is enabled on an existing vSAN cluster, a rolling disk format change is performed. Each disk group is evacuated in turn, and the cache and capacity devices are reformatted using the DEKs.
(Diagram: a rolling reformat encrypts each disk group in the vSAN cluster in turn.)
282
8-15 Operational Impact When Enabling
Encryption
On new vSAN clusters, the appropriate disk format is selected automatically, but an existing vSAN cluster requires a disk format change (DFC).
You must consider the following points before enabling encryption on an existing vSAN cluster:
283
8-16 Enabling vSAN Data-at-Rest Encryption
To enable vSAN data-at-rest encryption, select the vSAN cluster and select Configure > vSAN > Services > Data-At-Rest Encryption > EDIT.
Enable Data-at-Rest encryption. Optionally, select Wipe residual data and Allow reduced
redundancy. Click APPLY.
284
8-17 Wiping Residual Data
Select the Wipe residual data check box to erase any existing data from devices before you enable data-at-rest encryption on an existing cluster.
This setting is not necessary for enabling data-at-rest encryption on new vSAN clusters.
285
8-18 Allowing Reduced Redundancy
If your vSAN cluster has a limited number of fault domains or hosts, select the Allow reduced redundancy check box.
If you allow reduced redundancy, your data might be at risk during the disk reformat operation.
286
8-19 Writing Data to an Encrypted vSAN
Datastore
Data is written to the cache tier:
2. Checksum is created.
3. Encryption is performed.
5. Decryption is performed.
8. Encryption is performed.
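The ordering of the numbered steps matters: the checksum is computed on plaintext before encryption on write, and verified after decryption on read. A minimal sketch of that ordering (toy XOR cipher for illustration only; vSAN uses real encryption):

```python
import zlib

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy cipher for illustration only; applying it twice decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def write_block(plaintext: bytes, dek: bytes):
    """Write path: checksum the plaintext first, then encrypt
    before the block lands on the storage tier."""
    checksum = zlib.crc32(plaintext)
    ciphertext = xor_bytes(plaintext, dek)
    return ciphertext, checksum

def read_block(ciphertext: bytes, checksum: int, dek: bytes) -> bytes:
    """Read path: decrypt first, then verify the checksum to detect
    corruption of the stored block."""
    plaintext = xor_bytes(ciphertext, dek)
    if zlib.crc32(plaintext) != checksum:
        raise IOError("checksum mismatch after decryption")
    return plaintext
```

Checksumming the plaintext means corruption introduced anywhere in the encrypted path is caught when the block is decrypted and verified.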
(Diagram: a write to an encrypted vSAN datastore with RAID 1 replicas on ESXi-01, ESXi-02, and ESXi-03.)
287
8-20 Scaling Out a Data-at-Rest Encrypted
vSAN Cluster
When you add a host to the data-at-rest encrypted vSAN cluster, the new host receives the KEK and the HEK from the KMS.
A DEK is generated for each cache and capacity device, and disk groups are created using the correct format.
(Diagram: a newly added host receives the KEK from the KMS and creates encrypted disk groups.)
288
8-21 Performing Rekey Operations (1)
As part of security auditing, regularly generate new encryption keys to maintain enterprise security compliance.
vSAN supports shallow rekey and deep rekey operations in its two-layer encryption model to keep data secure.
(Screenshot: vSAN Services pane showing Data-At-Rest Encryption enabled with key provider SA-KMS-Cluster-01, disk wiping disabled, and Data-In-Transit Encryption enabled.)
289
8-22 Performing Rekey Operations (2)
A shallow rekey operation replaces only the KEK, and the data does not require re-encryption.
A deep rekey operation replaces both the KEK and the DEK and requires a full DFC.
Performing a deep rekey is time-consuming and might temporarily decrease the performance of the cluster.
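The difference between the two operations can be sketched with a toy XOR model that shows which keys and data get touched (illustration only, not vSAN's actual cryptography):

```python
def xor(data: bytes, key: bytes) -> bytes:
    """Toy cipher for illustration only; applying it twice decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def shallow_rekey(wrapped_dek: bytes, old_kek: bytes, new_kek: bytes) -> bytes:
    """Replace only the KEK: unwrap the DEK and rewrap it under the
    new KEK. Data encrypted with the DEK is untouched."""
    dek = xor(wrapped_dek, old_kek)
    return xor(dek, new_kek)

def deep_rekey(ciphertext: bytes, old_dek: bytes, new_dek: bytes) -> bytes:
    """Replace the DEK as well: every block must be decrypted and
    re-encrypted, which is why a deep rekey needs a full disk
    format change and takes far longer."""
    return xor(xor(ciphertext, old_dek), new_dek)
```

A shallow rekey therefore touches only a few bytes of wrapped key material per device, while a deep rekey rewrites all stored data.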
(Screenshot: Generate New Encryption Keys dialog with options to also re-encrypt all data on the storage using the new keys and to allow reduced redundancy.)
290
8-23 Rotating KMIP Client Certificates
As part of enterprise security auditing, you might be required to periodically rotate the KMIP
client certificate on vCenter Server:
• When the client certificate is replaced, you must reconfigure the KMS to trust the new client
certificate.
(Diagram: the KMS trusts the vCenter Server KMIP client certificate, and vCenter Server trusts the KMS certificate.)
291
8-24 Changing the Key Provider
You can change the key provider. The process of changing the key provider is essentially a
shallow rekey operation:
(Screenshot: changing the key provider from SA-KMS-Cluster-01 to SA-KMS-Cluster-02.)
292
8-25 Verifying Bidirectional Trust
After you change the key provider, you verify that the KMS connection is operational.
Communication between the KMS and the KMIP client is temporarily interrupted until bidirectional trust is established.
Key Providers
293
8-26 About Encrypted vSAN Node Core
Dumps
Core dumps for vSAN nodes in crypto-safe mode are always encrypted using the HEK. Set a password to decrypt encrypted core dumps.
(Screenshot: password prompt for encrypted core dumps; files can be uploaded via Administration > Support > Upload File to Service Request.)
A core dump is a state of memory that is saved at the time when a system stops respond ing
with a purple error screen.
Core dumps are used by VMware Support representatives for diagnostic and technical support
purposes.
The ESXi host creates a VMFS-L based ESX-OSData volume and configures a core dump file.
294
8-27 vSAN Data-in-Transit Encryption (1)
vSAN data-in-transit encryption encrypts vSAN traffic exchanged between vSAN nodes.
vSAN uses a message authentication code to ensure authentication and integrity of the vSAN
traffic.
vSAN data-at-rest and vSAN data-in-transit encryption features are independent of each other.
They can be enabled and configured independently.
vSAN data-in-transit encryption does not rely on the KMS cluster for encrypting vSAN traffic
between vSAN nodes.
295
8-29 vSAN Data-in-Transit Encryption
Workflow
vSAN enforces encryption on vSAN traffic exchanged between vSAN nodes only when data-in-
transit encryption is enabled:
1. vSAN creates a TLS link between vSAN nodes intended to exchange the traffic.
2. vSAN nodes create a shared secret and attach it to the current session.
3. vSAN uses the shared secret to establish an authenticated encryption session between
vSAN nodes.
296
8-30 vSAN Data-in-Transit Encryption Rekey
As part of security compliance audits, vSAN initiates the rekey process to generate new keys at the scheduled interval.
By default, the rekey interval is set to one day. Depending on enterprise security compliance requirements, the rekey interval can be adjusted as needed.
(Available rekey intervals include 6 hours, 12 hours, and 1 day.)
1. Select the vSAN cluster and select Configure > vSAN > Services > Data-In-Transit
Encryption > EDIT.
3. From the Rekey interval drop-down menu, select the required interval.
4. Click APPLY.
297
8-31 vSAN Data-in-Transit Encryption Health
Check
Individual vSAN node readiness for data-in-transit encryption is verified, and inconsistent configuration can be remediated.
To view the status of the vSAN data-in-transit encryption health check, select the vSAN cluster and select Monitor > vSAN > Skyline Health > Data-in-transit-encryption > Configuration check.
298
8-32 Scaling Out Data-in-Transit Encrypted
vSAN Clusters
When you add a host to the data-in-transit encrypted vSAN cluster, the new host configuration is automatically remediated.
If you add a host running an older version of ESXi that does not support data-in-transit encryption, the host is partitioned because it cannot communicate with other hosts in the cluster.
(Diagram: hosts in a data-in-transit encrypted vSAN cluster.)
299
8-33 Lab 7: Managing vSAN Security
Operations
Configure vSAN cluster encryption:
300
8-35 Review of Learner Objectives
• Explain how vSAN encryption works
• Rotating keys and certificates is a best practice to maintain high levels of security.
Questions?
301
Module 9
Introduction to Advanced vSAN
Configurations
9-2 Importance
Administrators must know how to enable and configure advanced vSAN features to benefit
their business.
Advanced features include using vSAN File Service to create file shares in the vSAN datastore, scaling storage and compute independently with VMware HCI Mesh, using vSAN Direct to manage local disks, and using the vSAN iSCSI target service to provide block storage to physical workloads.
303
9-4 Lesson 1: vSAN File Service
304
9-6 vSAN File Service
With vSAN File Service, you can provision NFS and SMB file shares on your existing vSAN
datastore. You can access these file shares from supported clients, such as VMs and physical
workstations or servers.
You can also create a container file volume that can be accessed from Kubernetes pods.
vSAN File Service is based on the vSAN Distributed File System (VDFS), which provides the
underlying scalable file system by aggregating vSAN objects.
The VDFS provides resilient file server endpoints and a control plane for deployment,
management, and monitoring.
305
9-7 vSAN File Shares
vSAN file shares are integrated into the existing vSAN Storage Policy-Based Management (SPBM) on a per-share basis for resilience.
vSAN File Service uses a set of file service VMs (FSVMs) to provide network access points to
the file shares.
FSVMs run a containerized NFS service, and each file share is distributed between one or more containers.
(Diagram: a vSAN File Service domain with file shares served by Docker containers in FSVM-1 through FSVM-3.)
306
9-8 vSAN Distributed File System
When you configure vSAN File Service, a VDFS is created to manage the following activities:
(Diagram: scale-out of NFS services across hosts backed by the vSAN datastore.)
307
9-9 File Service VMs
FSVMs are preconfigured Photon Linux-based VMs.
Up to 32 FSVMs can be deployed per vSAN cluster to provide multiple access points and failover capability for vSAN file shares.
One FSVM per host is deployed when vSAN File Service is enabled. ESX Agent Manager is used to provision FSVMs and pin them to a host (affinity).
(Diagram: vSAN File Server agent virtual machines FSVM-1 through FSVM-3, each running Docker, in the vSAN File Service cluster.)
308
9-10 Provisioning File Service Agent Machines
When you configure vSAN File Service, ESX Agent Manager performs the following tasks:
If vSAN File Service is disabled, the solution sends a Destroy Agent signal to remove all the
FSVMs.
309
9-11 File Service Agent Machines Storage
Policy
The FSVM is configured with a custom vSAN storage policy called FSVM_Profile_DO_NOT_MODIFY.
This custom storage policy offers no data redundancy and pins the FSVM to a host.
(Screenshot: the FSVM_Profile_DO_NOT_MODIFY policy: no data redundancy with host affinity, one disk stripe per object, and force provisioning enabled.)
310
9-12 Enabling vSAN File Service
vSAN File Service is disabled by default. You can enable it from vSAN > Services. Select a
vSAN cluster and select Configure > vSAN > Services > File Service > Enable.
(Screenshot: vSAN Services pane with File Service disabled and an ENABLE button.)
311
9-13 vSAN File Service Configuration
vSAN File Service maintains a per-cluster configuration. You can easily configure it through the guided workflow in vCenter Server.
(Screenshot: File Service configuration showing domain vclass.local, network VM Network, gateway 172.20.10.10, and the OVF file version.)
312
9-14 vSAN File Service Domain Configuration
A vSAN File Service domain is a namespace for a group of file shares with a common
networking and security configuration.
You must enter a unique namespace for the vSAN File Service domain you are creating.
[Screenshot: File Service domain settings: Domain: vclass.local, Network: VM Network, Gateway: 172.20.10.10]
313
9-15 vSAN File Service Network Configuration
In this release, vSAN File Service supports only IPv4 for file share access.
Select a network port group for FSVMs to provide access to file shares.
You should select a distributed port group to ensure consistency across all hosts in a vSAN cluster.
Domain vclass.local
Number or shares 0
I Net\vork I ~ VM Network
Gatew ay 172.20.10.10
v ersion Last upgrade: 09/04/2020, 12:16:4 3 AM; OVF file version: 7.0.1.1000·16596215
314
9-16 FSVM IP Address Configuration
You must provide a pool of static IP addresses for FSVMs.
Provide the same number of IP addresses as the number of ESXi hosts present in the vSAN
cluster during setup.
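The sizing rule above can be expressed as a small validation check. The following Python sketch is illustrative only (the function name and checks are our own, not a VMware API): it verifies that a candidate pool has at least one unique address per ESXi host and that every address belongs to the file service network.

```python
import ipaddress

def validate_fsvm_ip_pool(pool, host_count, network_cidr):
    """Check a static IP pool for vSAN File Service FSVMs.

    The text above expects one address per ESXi host in the cluster;
    all addresses should also be unique and on the file service network.
    """
    net = ipaddress.ip_network(network_cidr, strict=False)
    addrs = [ipaddress.ip_address(a) for a in pool]
    problems = []
    if len(addrs) < host_count:
        problems.append(f"need {host_count} addresses, got {len(addrs)}")
    if len(set(addrs)) != len(addrs):
        problems.append("duplicate addresses in pool")
    outside = [str(a) for a in addrs if a not in net]
    if outside:
        problems.append(f"addresses outside {net}: {outside}")
    return problems  # an empty list means the pool looks valid

print(validate_fsvm_ip_pool(
    ["172.20.10.100", "172.20.10.101", "172.20.10.102"],
    host_count=3,
    network_cidr="172.20.10.0/24",
))  # → []
```

For the three-host lab cluster shown on this slide, the pool of three addresses in 172.20.10.0/24 passes every check.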
[Screenshot: static IP pool: 172.20.10.100 (primary) fsvm-01.vclass.local, 172.20.10.101 fsvm-02.vclass.local, 172.20.10.102 fsvm-03.vclass.local]
315
9-17 Viewing ESX Agent Deployment
As part of the vSAN File Service configuration, ESX Agent Manager deploys vSAN File Service nodes.
[Screenshot: vCenter Server inventory listing vSAN File Service Node (1) through (4), all powered on and reporting Normal status]
316
9-18 Creating a vSAN File Share
After vSAN File Service is enabled, you can create a file share to access from NFS clients and a container file volume to access from Kubernetes pods.
To create a file share, select a vSAN cluster and select Configure > vSAN > Services > File
Shares > ADD.
317
9-19 Configuring a vSAN File Share
Choose a suitable name for the file share.
You can select either the NFS or the SMB protocol to access the file share. You also select the required protocol version.
A file share supports both the AUTH_SYS and Kerberos security modes. Based on the selected protocol version, select the supported security mode.
The vSAN default storage policy is assigned for file shares. You can select a policy based on
your availability and performance requirements.
Define the file share quota to limit the capacity that the file share can consume on the vSAN datastore. Include a warning threshold.
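The relationship between the warning threshold and the hard quota can be modeled with a few lines of Python. This is an illustrative sketch of the behavior described above (the function and threshold default are our assumptions, not VMware code): usage beyond the warning threshold flags the share, and usage at the hard quota blocks further writes.

```python
def share_quota_status(used_gb, hard_quota_gb, warn_threshold_pct=75):
    """Classify file share usage against its hard quota and a soft
    warning threshold expressed as a percentage of the hard quota."""
    if used_gb >= hard_quota_gb:
        return "over-quota"   # writes beyond this point fail
    if used_gb >= hard_quota_gb * warn_threshold_pct / 100:
        return "warning"      # still writable, but flagged
    return "ok"

print(share_quota_status(50, 100))    # → ok
print(share_quota_status(80, 100))    # → warning
print(share_quota_status(100, 100))   # → over-quota
```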
Labels are key-value pairs that can be used to identify file shares. Labels are useful when assigning file shares to Kubernetes pods.
[Screenshot: General settings: Name: prod-fs, Protocol: NFS. A note states that Active Directory must be configured in the File Service configuration before using the SMB protocol.]
318
9-20 Configuring Network Access Control
You use network access cont rol to limit which clients can access a file share.
A vSAN file share can allow access from any NFS client IP address or from a specific list of client IP addresses.
The rules are honored from top to bottom, and top rules override bottom ones, so put more general rules below the specific ones. You can use '*' to denote any other IP addresses not mentioned above.
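The top-to-bottom, first-match behavior can be sketched as a short evaluator. This Python model is illustrative only (the rule format and function name are our assumptions, not the product's implementation): specific rules go first, broader subnets follow, and '*' catches everything else.

```python
import ipaddress

def evaluate_access(rules, client_ip):
    """First-match evaluation of ordered access rules.

    `rules` is a list of (source, permission) pairs, where source is
    an IP address, a CIDR subnet, or '*' for any other client.
    """
    ip = ipaddress.ip_address(client_ip)
    for source, permission in rules:
        if source == "*":
            return permission
        if ip in ipaddress.ip_network(source, strict=False):
            return permission
    return "no-access"  # no rule matched

rules = [
    ("172.20.10.50", "read-write"),   # specific client first
    ("172.20.10.0/24", "read-only"),  # then the broader subnet
    ("*", "no-access"),               # everything else last
]
print(evaluate_access(rules, "172.20.10.50"))   # → read-write
print(evaluate_access(rules, "172.20.10.99"))   # → read-only
print(evaluate_access(rules, "10.0.0.1"))       # → no-access
```

Note how 172.20.10.50 gets read-write access even though the subnet rule below it grants only read-only: the earlier, more specific rule wins.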
319
9-21 Viewing vSAN File Share Properties
After a file share is created, you can record the file share mount path details to mount from NFS
clients.
You can also modify the file share storage quota properties by editing the file share.
320
9-22 Monitoring vSAN File Share Performance Metrics
You can monitor throughput, IOPS, and latency-related information per file share.
To monitor performance metrics, select a vSAN cluster and select Monitor > vSAN >
Performance > FILE SHARE.
321
9-23 Viewing VMware Skyline Health Details for vSAN File Service
VMware Skyline Health provides detailed information about vSAN File Service infrastructure
health, file server health, and share health.
To view the health details, select a vSAN cluster and select Monitor > vSAN > Skyline Health > File Service.
322
9-24 vSAN File Service Considerations
Consider the following points when selecting vSAN File Service:
• Only IPv4 is supported, and static IP addresses are required for FSVMs.
For more about vSAN File Service, see the FAQ page at https://storagehub.vmware.com/t/vsan-frequently-asked-questions-faq/file-service-7/
323
9-25 Lab 9: Configuring vSAN File Service
Configure vSAN File Service:
324
9-26 Review of Learner Objectives
• Describe vSAN File Service
325
9-27 Lesson 2: VMware HCI Mesh Using Remote vSAN Datastores
326
9-29 About VMware HCI Mesh
VMware HCI Mesh is a technology for the disaggregation of compute and storage resources in
vSAN.
With the VMware HCI Mesh architecture, you can remotely mount datastores from other vSAN clusters (server clusters) to one or more vSAN clusters (client clusters). The client and server clusters must be within the same vCenter Server inventory. This approach maintains the simplicity of the existing HCI model without requiring specialized hardware.
327
9-30 Previous vSAN Challenges
In traditional vSAN clusters, there was no way to use storage from cluster 1 in cluster 2 without
physically relocating storage devices.
[Figure: vSAN Cluster 1 running VMs at 40% capacity (capacity underused) and vSAN Cluster 2 running VMs at 80% capacity (capacity overused)]
One cluster is underutilized while the other is running out of capacity.
328
9-31 VMware HCI Mesh Advantages
VMware HCI Mesh provides the following advantages over traditional vSAN:
• License optimization: Storage and compute can now be separated. You can use the appropriate licenses as required by the clusters and save costs.
• Heterogeneous storage classes: Different types of storage classes provide better efficiency in hyperscale deployments.
[Figure: vSAN cluster federation: an existing cluster with underused capacity, an existing cluster with overused capacity, and a new storage-only cluster with no VMs running]
329
9-32 VMware HCI Mesh: Use Cases (1)
Example VMware HCI Mesh architecture use cases:
• Balancing capacity:
• vSphere Storage vMotion can move data (not compute) to other vSAN clusters based on capacity usage and the configured threshold.
• Hardware maintenance:
• VMware HCI Mesh can be used to move data during patching or maintenance windows.
• Storage as a service:
• Cloud providers can provide a managed pool of storage for multiple tenant consumers
by using storage-only cluster topology, which is easily scalable. In this model, the cloud
provider owns and manages the storage-only cluster. The tenant cluster mounts the
datastore remotely.
During hardware maintenance, the cluster might not have sufficient space to move data. In such
cases, some VMs can be migrated to another vSAN datastore using VMware HCI Mesh.
330
9-33 VMware HCI Mesh: Use Cases (2)
• Scaling compute without adding storage:
• Traditional vSAN scales by adding hosts, which results in inefficient storage use.
• You might have more storage than the workloads on that cluster need.
• You can use VMware HCI Mesh to scale storage and compute independently of each
other.
• License optimization:
• You can use the appropriate licenses as required by the clusters and save costs.
Oracle requires the licensing of every socket in the cluster where the compute is located. If the
workload requires more storage, the customer must procure not only compute and VMware
licensing but also Oracle licensing.
With VMware HCI Mesh, smaller vSAN clusters can be configured to run the Oracle compute
and still use capacity from participating clusters.
331
9-34 Stranded Capacity Issues
A fixed compute-to-storage ratio can lead to a stranded capacity problem in an HCI environment.
Stranded capacity:
• Compute-intensive workloads are likely to run out of CPU and memory before storage.
• Storage-intensive workloads run out of storage, either capacity or IOPS, before CPU and
memory.
Because of its flexible architecture, VMware HCI Mesh can be used with different topologies to address such issues.
332
9-35 Comparing Homogeneous and Heterogeneous Storage
A vSAN cluster exposes storage policies that are homogeneous across all hosts in the cluster.
• A database, such as Oracle or SQL Server, requires the low latency offered by an all-flash NVMe configuration.
• In this case, a web server in the same cluster is placed on a faster and more expensive storage tier than it needs.
With VMware HCI Mesh using remote vSAN datastores, you can place VMs on heterogeneous
hardware.
333
9-36 VMware HCI Mesh Architecture: Example Setup
The diagram shows a typical VMware HCI Mesh architecture example setup.
[Figure: example HCI Mesh setup: VMs running on one vSphere/vSAN cluster while consuming storage mounted from another vSAN cluster]
334
9-37 VMware HCI Mesh: Terminology (1)
The following key terms are referenced in the VMware HCI Mesh architecture.
Local cluster:
Server cluster:
• The cluster where the storage is locally hosted. This cluster provides storage resources to
other clusters.
• From the server cluster viewpoint, the vSAN datastore is considered a local datastore.
• From the client cluster viewpoint, the vSAN datastore is considered a remote datastore.
Cross-cluster vMotion:
• The remote vSAN datastore is mounted on both the client and the server clusters, which
means that cross-cluster vSphere vMotion migration is possible.
335
9-39 VMware HCI Mesh: Common Topologies
Different VMware HCI Mesh topologies can be used, based on the infrastructure requirement.
• Cross-cluster topology
336
9-40 Storage-Only Cluster Topology
In this topology, one vSAN cluster is used only to provide storage to other clusters.
The storage-only cluster topology has the following use cases:
[Figure: storage-only cluster topology: multiple compute clusters running VMs, all mounting the datastore of one storage-only vSAN cluster]
337
9-41 Cross-Cluster Topology
The cross-cluster topology has the following features and use cases:
• Can be bidirectional.
The diagram shows an example of cross-cluster VMware HCI Mesh. You can use variations of this setup.
[Figure: cross-cluster HCI Mesh: two vSAN clusters remotely mounting each other's datastores]
338
9-42 VMware HCI Mesh: Network Requirements and Recommendations
Network connectivity requirements:
• A leaf-spine topology is preferred for core redundancy and reduced latency.
• HCI Mesh has the same latency and bandwidth requirements as local vSAN.
339
9-43 Example Network Architecture
The diagram shows an example of a good network architecture to support VMware HCI Mesh.
[Figure: example architecture: each host's vSAN vmknic connects through redundant uplinks (vmnic1 and vmnic2 port groups) to separate switches, which connect to a core/spine switch/router]
340
9-44 VMware HCI Mesh: Scalability Limits
As multiple vSAN clusters participate in a mesh topology, consider the following scalability
limitations:
• A single vSAN datastore can be mounted on a maximum of 64 hosts, which include both
server and client hosts.
• A server cluster can export its datastore to a maximum of five client clusters.
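The two limits above can be combined into a simple topology check. The following Python sketch is purely illustrative (the function name and error strings are ours); it flags a proposed mesh that exceeds either the 64-host datastore limit or the five-client-cluster limit.

```python
# Limits taken from the text above: a shared vSAN datastore spans at
# most 64 hosts (server plus client), and a server cluster can export
# its datastore to at most five client clusters.
MAX_HOSTS_PER_DATASTORE = 64
MAX_CLIENT_CLUSTERS = 5

def check_mesh_limits(server_hosts, client_host_counts):
    """client_host_counts holds the host count of each client cluster
    mounting the server cluster's datastore."""
    errors = []
    if len(client_host_counts) > MAX_CLIENT_CLUSTERS:
        errors.append("too many client clusters "
                      f"({len(client_host_counts)} > {MAX_CLIENT_CLUSTERS})")
    total_hosts = server_hosts + sum(client_host_counts)
    if total_hosts > MAX_HOSTS_PER_DATASTORE:
        errors.append("too many hosts sharing the datastore "
                      f"({total_hosts} > {MAX_HOSTS_PER_DATASTORE})")
    return errors

print(check_mesh_limits(8, [8, 8, 8]))      # → [] (within limits)
print(check_mesh_limits(32, [16, 16, 16]))  # exceeds the 64-host limit
```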
341
9-45 VMware HCI Mesh: Mounting the Remote vSAN Datastore
Consider the following when mounting the vSAN datastore of one cluster to another vSAN cluster:
• You can mount vSAN datastores only from the same vCenter Server instance.
• Two-node clusters and stretched clusters are not compatible with VMware HCI Mesh.
342
9-46 Mounting Remote Datastores (1)
To mount a remote datastore on the client vSAN cluster, select vSAN > Datastore Sharing on
the Configure tab and click the MOUNT REMOTE DATASTORE link.
[Screenshot: Datastore Sharing view listing (Local) vsanDatastore-Cluster01 on SA-Cluster-01]
343
9-48 Mounting Remote Datastores (3)
A compatibility check is performed in the Check Compatibility section.
The wizard verifies the following conditions:
• The remote datastore type is vSAN.
• The server and client clusters are from the same data center.
• The remote vSAN datastore can provision objects with its default policy.
• The server cluster is healthy.
• The server and client clusters have no connectivity issues.
• Latency between client and server hosts is below 5000 microseconds.
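The wizard's pass/fail logic can be modeled as a small function. This Python sketch is a simplified illustration (the check names and function signature are our assumptions); only the 5000-microsecond latency threshold comes from the slide.

```python
def mount_precheck(server_cluster_healthy, same_datacenter, latency_us,
                   max_latency_us=5000):
    """Simplified model of the mount wizard's compatibility checks.

    Returns (ok, failed_checks); the mount proceeds only when every
    check passes.
    """
    checks = {
        "server cluster is healthy": server_cluster_healthy,
        "clusters are in the same data center": same_datacenter,
        f"latency is below {max_latency_us} us": latency_us < max_latency_us,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return (len(failed) == 0, failed)

print(mount_precheck(True, True, 1200))  # → (True, [])
print(mount_precheck(True, True, 8000))  # fails the latency check
```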
344
9-49 Client Datastore View
The Datastore Sharing view provides information about shared vSAN datastores and displays
client and server cluster information. Local datastores are identified with the (Local) prefix.
345
9-50 Server Datastore View
The screenshot shows the datastore view of the server cluster.
[Screenshot: server cluster view: (Local) vsanDatastore-Cluster02 on SA-Cluster-02, with SA-Cluster-01 listed as a client cluster]
346
9-51 Hosts: Access Status
In the Datastore view, select the remote datastore and click the Hosts tab.
The hosts from both the client and the server clusters display a Connected status.
347
9-52 VM Creation Test
A quick way to verify the remote vSAN mount is to perform the vSAN VM creation test.
Select Monitor > vSAN > Proactive Tests > VM Creation Test and click Run.
[Screenshot: VM Creation Test results: the test creates a tiny VM on every host, and each host reports Success. The page also recommends HCIBench for storage performance testing.]
348
9-53 Remote Accessible Objects
To see the remote accessible objects that are placed on the remote vSAN datastore, select Client cluster > Monitor > vSAN > Skyline Health > vSAN object health.
Remote accessible objects are the objects on the server cluster that are accessed from the client cluster.
349
9-54 Server Cluster Partition Health Check
To see the client and server hosts listed, select Monitor > vSAN > Skyline Health > Network >
Server Cluster Partition.
350
9-55 Remote VM Performance
To see remote vSAN VM performance metrics, select Monitor > vSAN > Performance > Remote VM.
[Screenshot: Remote VM performance charts showing read/write IOPS and latency for target cluster SB-VSAN-01]
351
9-56 Physical Disk Placement
To see physical disk placement, select the VM and select Monitor > vSAN > Physical disk placement > Remote Objects.
[Screenshot: Remote objects view showing Hard disk 2, VM home, and the virtual machine swap object as Remote-accessible, with a note that the VM is placed on a remote datastore managed by SB-VSAN-01]
The highlighted text indicates that this VM resides on a remote datastore.
352
9-57 VMware HCI Mesh Interoperability: VM Component Protection
With VMware HCI Mesh using remote vSAN datastores, you can have VM compute resources
allocated from one cluster and storage space allocated from another cluster.
When the cross-cluster communication fails, an all paths down (APD) state is declared after 60
seconds of isolation.
The APD response is triggered to automatically restart the affected VMs after 180 seconds.
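The two timers above can be laid out as a timeline. This Python sketch is an illustrative model only (it assumes the 180-second response timer starts when APD is declared, and the state strings are ours, not product output).

```python
APD_DECLARE_S = 60     # isolation time before APD is declared
APD_RESPONSE_S = 180   # additional delay before the APD response fires

def apd_state(seconds_since_isolation):
    """Illustrative timeline of HCI Mesh APD handling."""
    if seconds_since_isolation < APD_DECLARE_S:
        return "connectivity lost, APD not yet declared"
    if seconds_since_isolation < APD_DECLARE_S + APD_RESPONSE_S:
        return "APD declared, VMs still running"
    return "APD response triggered, affected VMs restarted"

for t in (30, 90, 300):
    print(f"t={t}s: {apd_state(t)}")
```

At 30 seconds the outage is still transient, at 90 seconds APD has been declared but VMs keep running, and past 240 seconds the affected VMs are restarted.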
353
9-58 VMware HCI Mesh Interoperability: SPBM Integration
A single vSAN VASA provider acts on behalf of all vSAN clusters managed by the same
vCenter Server instance:
• The vSAN VASA provider dispatches all policy requests targeting a vSAN datastore to one
of the hosts in the corresponding vSAN cluster.
• The vSAN VASA provider maintains an up-to-date list of hosts capable of satisfying VASA
provider API calls to a vSAN datastore, using the datastore property collector:
• For local vSAN, the list of hosts comprises all hosts mounting the vSAN datastore.
• For VMware HCI Mesh, the list also includes hosts from client clusters remotely
mounting the same datastore.
[Figure: SPBM integration: vCenter Server (VPXD, SPBM, and the datastore property collector) communicates through the VASA API with the vSAN VASA provider and the hosts, including client cluster hosts with remote mounts]
354
9-59 VMware HCI Mesh Interoperability: vSphere vMotion and vSphere Storage vMotion
VMware HCI Mesh is compatible with both vSphere vMotion and vSphere Storage vMotion.
• VMs can be migrated using vSphere vMotion within the client cluster, regardless of whether they reside on a local or remote vSAN datastore.
• VMs are allowed to migrate with vSphere vMotion across client or server clusters, as long as all VM objects reside on a mutually shared remote vSAN datastore.
• Migration between a local vSAN datastore and a remotely mounted vSAN datastore
• Migration between a remotely mounted vSAN datastore and a local vSAN datastore
355
9-60 VMware HCI Mesh Interoperability: vSphere DRS
VMware HCI Mesh is compatible with vSphere DRS.
vSphere DRS rules are applicable on the client cluster in the following cases:
In this context, vSphere DRS rules include standard rules on the client cluster, such as affinity and anti-affinity rules.
356
9-61 Lab 10: Managing Remote vSAN Datastore Operations
Migrate and run VMs from remote vSAN datastores:
357
9-62 Review of Learner Objectives
• Describe VMware HCI Mesh technology
358
9-63 Lesson 3: vSAN Direct
• Describe how vSAN Direct datastores work with vSphere native Kubernetes
359
9-65 About the vSAN Direct Datastore
vSAN Direct enables users to create a vSAN Direct datastore on a single blank hard drive on the ESXi host. With vSAN Direct, users can manage local disks by using vSAN management capabilities.
• vSAN Direct manages and monitors disks formatted with VMFS and provides insights into
the health, performance, and capacity of these disks.
• vSAN Direct enables users to define placement policies and quotas for the local disks.
360
9-66 vSAN Direct Use Cases
vSAN Direct is an excellent fit for cloud-native applications using Kubernetes. Various tiers of cloud-native applications benefit from vSAN Direct.
• Applications with high-end storage requirements that must support up to a million IOPS, such as high-end Cassandra and MongoDB
• Applications with midrange storage requirements that require greater IOPS density, such as Kafka (as a storage buffer)
• Applications with low-end storage requirements that must have a minimum total cost of ownership
361
9-67 vSAN Direct Architecture
vSAN Direct supports cloud-native applications running in a vSphere 7.x supervisor Kubernetes cluster.
These datastores form the vSAN Direct storage pool. This pool can be claimed by Kubernetes in the form of persistent volumes (PVs).
[Figure: cloud-native applications (for example, TensorFlow and Microsoft SQL Server) running as virtual machines, applications, and Kubernetes services on the vSAN Direct storage pool]
362
9-68 vSAN Direct with Kubernetes
Key concepts for using vSAN Direct with Kubernetes:
363
9-69 Cloud-Native Operations Workflow
The diagram shows the steps taken by the vSphere administrator and the DevOps administrator
to use vSAN Direct:
1. The vSphere administrator provisions vSAN Direct datastores, using unformatted drives on
the ESXi hosts.
2. The vSphere administrator creates policies using tags to map the vSAN Direct datastores.
3. The DevOps administrator claims the PVs from the vSAN Direct storage pools identified by
the tag-based storage policies.
4. The DevOps administrator creates Kubernetes applications that consume the PVs residing
on vSAN Direct.
"----
vSphere
''' - - - -
Create namespace and assign
vSAN Direct storage policy.
Administrator
vCenter
Server
, ,
Consuming PV in
persistent service + -----' DevOps
Administrator
K8s
364
9-70 Claiming Disks for vSAN Direct
Key points about claiming vSAN Direct disks:
• You can use the vSAN claim disk wizard to claim disks for vSAN Direct.
[Screenshot: Claim Unused Disks wizard: a local 15 GB disk on each host is claimed for vSAN Direct]
365
9-71 After Claiming Disks for vSAN Direct
After claiming disks for vSAN Direct:
366
9-72 Default Tags
A set of default vSAN Direct tags and categories is available. You can create your own tags to
use with your storage policies.
[Screenshot: a vSAN Direct datastore with the default vSANDirect tag in the vSANDirectStorage category]
367
9-73 Tag-Based Policies
vSAN Direct supports tag-based storage policies.
1. Tag the vSAN Direct datastores with the appropriate tags. A default vSAN Direct tag is already assigned.
368
9-74 Storage Compatibility
The supervisor Kubernetes cluster requires a storage policy to identify datastores to store PVs:
• After you create the vSAN Direct storage policy, matching datastores with the vSAN Direct tag are available as compatible storage.
• These vSAN Direct datastores can be used by cloud-native applications as a common pool for persistent data storage.
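The tag-matching idea can be sketched in a few lines. This Python model is illustrative only (the function, datastore names, and tag sets are our assumptions, not SPBM internals): a datastore is compatible when it carries every tag the storage policy requires.

```python
def compatible_datastores(datastores, policy_tags):
    """Toy version of tag-based matching: a datastore is compatible
    when it carries every tag that the storage policy requires."""
    required = set(policy_tags)
    return [name for name, tags in datastores.items()
            if required <= set(tags)]

datastores = {
    "vsandirect-ds-01": {"vSANDirect"},
    "vsandirect-ds-02": {"vSANDirect"},
    "vsan-ds":          {"vSAN"},
}
print(compatible_datastores(datastores, ["vSANDirect"]))
# → ['vsandirect-ds-01', 'vsandirect-ds-02']
```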
369
9-75 Capacity Reporting
You can monitor vSAN Direct capacity independently of vSAN capacity.
Select Monitor > vSAN > Capacity > CAPACITY USAGE and click vSAN Direct.
[Screenshot: Capacity view: vSAN usage 3.58 GB of 59.97 GB, vSAN Direct usage 5.62 GB of 59.00 GB]
9-76 Review of Learner Objectives
• Describe vSAN Direct as a new type of vSAN datastore
• Describe how vSAN Direct datastores work with vSphere native Kubernetes
371
9-77 Lesson 4: vSAN iSCSI Target Service
372
9-79 About the vSAN iSCSI Target Service
With the vSAN iSCSI target service, a remote host with an iSCSI initiator can transport block-level data to an iSCSI target in the vSAN cluster. You can configure one or more iSCSI targets in your vSAN cluster to provide block storage to legacy servers.
You can add one or more iSCSI targets that provide storage objects as logical unit numbers
(LUNs). Each iSCSI target is identified by its own unique iSCSI qualified name.
[Figure: a legacy server's iSCSI initiators connect over the iSCSI network to iSCSI targets in the vSAN cluster; each target presents LUNs backed by iSCSI objects on the vSAN datastore]
Use the vSAN iSCSI target service to enable hosts and physical workloads that reside outside
the vSAN cluster to access the vSAN datastore.
After configuring the vSAN iSCSI target service, you can discover the vSAN iSCSI targets from
a remote host. To discover vSAN iSCSI targets, use the IP address of any host in the vSAN cluster and the TCP port of the iSCSI target.
373
9-80 vSAN iSCSI Target Service Networking
Before enabling the vSAN iSCSI target service, you must configure your ESXi hosts with VMkernel ports and NICs that are connected to the iSCSI network.
[Figure: an ESXi host with a vmk2 VMkernel port connecting its vSAN iSCSI targets to the iSCSI network in the storage network infrastructure]
iSCSI storage traffic is transmitted in an unencrypted format across the LAN. Therefore, a best practice is to use iSCSI on trusted networks only and to isolate the traffic on separate physical switches or a dedicated LAN.
To ensure high availability of the vSAN iSCSI target, configure multipath support for your iSCSI application. You can use the IP addresses of two or more hosts to configure the multipath.
374
9-81 Enabling and Using the vSAN iSCSI
Target Service
To enable and use the vSAN iSCSI target service:
375
9-82 vSAN iSCSI LUN Objects
vSAN iSCSI objects can appear as unassociated in vSAN storage reports because they are not mounted directly into a VM as a VMDK. vSAN iSCSI objects are mounted through a VM's guest OS iSCSI initiator.
Be aware that the vSAN iSCSI LUN object is not directly mounted into VMs and will be listed as
unassociated. This designation does not mean that the vSAN iSCSI LUN object is unused or safe
to be deleted.
376
9-83 Lab 11: Configuring a vSAN iSCSI Target
Configure an iSCSI LUN and connect to it from the student desktop:
377
9-84 Review of Learner Objectives
• Describe the vSAN iSCSI target service
• With vSAN File Service, you can provision NFS and SMB file shares on your existing vSAN datastore.
• Using VMware HCI Mesh, you can remotely mount datastores from other vSAN clusters.
• A vSAN cluster can act as a storage-only cluster using VMware HCI Mesh.
• vSAN Direct is most suitable for cloud-native application persistent storage.
• You can configure vSAN iSCSI targets to provide block storage to legacy servers.
Questions?
378
Module 10
vSAN Cluster Maintenance
10-2 Importance
To maximize the availability of services in your environment, the maintenance of compute,
network, and storage resources on production systems must be achieved without causing
downtime.
379
10-4 Lesson 1: vSAN Cluster Maintenance
Operations
380
10-6 Maintenance Mode Options
ESXi hosts in vSAN clusters provide storage resources in addition to compute resources.
You must use appropriate maintenance mode options to maintain data accessibility.
When placing the host into maintenance mode, you can select one of the following vSAN data
migration options:
• Ensure accessibility
• No data migration
381
10-7 About the Data Migration Precheck
You run the data migration precheck before placing a host into maintenance mode. This
precheck determines whether the operation can succeed and reports the state of the cluster
after the host enters maintenance mode.
You can select a host and the type of vSAN data migration to test.
To start the data migration precheck, select the vSAN cluster and select Monitor > vSAN >
Data Migration Pre-Check.
[Screenshot: Data Migration Pre-check: select a host, disk group, or disk, and check the impact on the cluster if the object is removed or placed into maintenance mode. A message notes that no valid test is available for the selected entity and vSAN data migration option.]
382
10-8 About the Ensure Accessibility Option
The Ensure accessibility option ensures that VMs with FTT = 0 remain accessible.
Unprotected components on the host, such as objects with FTT = 0, are migrated to other hosts. Components of objects with FTT > 0 are not migrated. If sufficient components to maintain quorum are active on other hosts in the cluster, the objects remain available. However, the objects are noncompliant while the host is in maintenance mode.
[Figure: before and after a host enters maintenance mode: components C1 and C2 are distributed across ESXi1 through ESXi4, and the component on the host entering maintenance is marked as absent]
383
10-9 Ensure Accessibility: Assessing Impact
T he dat a migration precheck provides a list of VMs and objects that might become
noncom pliant.
[Screenshot: Data Migration Pre-check results listing Hard disk 1 of sa-vm-01, and Hard disk 1, the VM home folder, and the virtual machine swap object of sa-vm-02, all flagged Non-compliant against the vSAN Default Storage Policy.]
384
10-10 Ensure Accessibility: Delta Component
If an additional host with sufficient capacity is available in the vSAN cluster, a temporary delta
component is created to capture the new I/O.
The delta component contains only the new data generated after the original component is
marked absent.
After the absent component is back online, it syncs with the delta component. The delta
component is then discarded after the sync operation is complete.
The delta component significantly reduces the overall time required to take a component from
Active-Stale to Active state.
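The delta-component behavior can be sketched with a small model. The classes and field names here are illustrative, not VMware's implementation; the point is that only writes issued while the original is absent need to be merged back, which is why the resync is fast.

```python
# Illustrative model of a vSAN delta component (hypothetical classes,
# not VMware's implementation).

class Component:
    def __init__(self):
        self.blocks = {}          # logical block address -> data
        self.absent = False

    def write(self, lba, data):
        self.blocks[lba] = data

class DeltaComponent(Component):
    """Captures only writes issued while the original is absent."""

def write_object(original, delta, lba, data):
    if original.absent:
        delta.write(lba, data)    # new I/O goes to the delta only
    else:
        original.write(lba, data)

def resync(original, delta):
    # Merge just the delta's blocks instead of rebuilding the full component.
    original.blocks.update(delta.blocks)
    original.absent = False
    delta.blocks.clear()          # delta is discarded after the sync

orig, delta = Component(), DeltaComponent()
write_object(orig, delta, 0, b"before")
orig.absent = True                       # host enters maintenance mode
write_object(orig, delta, 1, b"after")   # captured by the delta
resync(orig, delta)
print(sorted(orig.blocks))               # -> [0, 1]
```

The resync cost is proportional to the delta's size, not the component's size, matching the time saving described above.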
[Slide diagram: while the VM2 component on one host is absent, a delta component on another host captures the new writes (hosts ESXi1 to ESXi4).]
385
10-11 Delta Components in the vSphere Client
The delta component and the original absent component are linked in a special RAID-type
structure called RAID_D. You can view the component layout in the vSphere Client.
[Screenshot: component layout in the vSphere Client showing the absent component and the active delta component linked under RAID_D.]
386
10-12 Ensure Accessibility: Time Considerations
If the host does not return, absent components are rebuilt after 60 minutes.
[Slide diagram: 60 minutes after the VM2 component becomes absent, vSAN rebuilds it on another host in the cluster (ESXi1 to ESXi4, vSAN witness).]
387
10-13 Object Repair Timer Considerations
You might need to increase the Object Repair Timer value when planned maintenance is likely to
take more than 60 minutes but you want to avoid rebuild operations.
• Rebuild operations are designed to restore redundancy. The higher the Object Repair Timer
value, the longer your data is vulnerable to additional failures.
• You should reset the Object Repair Timer value to the default value when maintenance is
complete.
388
10-14 Object Inaccessibility: Example (1)
vSAN host sa-esxi-01 is put into maintenance mode with Ensure data accessibility from other
hosts selected.
In a healthy vSAN environment, hosts are placed into maintenance mode for software updates
or patching. In this scenario, host sa-esxi-01 is placed into maintenance mode.
389
10-15 Object Inaccessibility: Example (2)
While vSAN host sa-esxi-01 is in maintenance mode, host sa-esxi-03 unexpectedly becomes
unavailable.
[Screenshot: a component shown as Absent.]
While host sa-esxi-01 was in maintenance mode, a failure occurred making host sa-esxi-03
unavailable. Because two of the three components are offline, the VM becomes disconnected.
After host sa-esxi-01 is taken out of maintenance mode and comes back online, the component
becomes Active. However, the sequence number of the component on host sa-esxi-01 is
outdated and the VM remains disconnected. In other words, the component on host sa-esxi-01
is missing the most recent changes, which vSAN is aware of because of the difference in
sequence numbers.
390
10-16 Object Inaccessibility: Example (3)
Host sa-esxi-01 is taken out of maintenance mode to resume the operations of the VM.
[Screenshot: the component on sa-esxi-01.vclass.local shown as Absent.]
Host sa-esxi-01 is taken out of maintenance mode and comes back online, and the component
becomes Active. However, the sequence number of the component on host sa-esxi-01 is
outdated and the witness is still offline. In other words, the component on host sa-esxi-01 is
missing the most recent changes, which vSAN is aware of because of the difference in sequence
numbers.
Even though two of the three components that make up the object are active, vSAN keeps the
object inaccessible to avoid data loss or corruption. The VM remains disconnected until the
object on host sa-esxi-03 is online with the most recent data. vSAN then synchronizes the stale
component with the component that contains the latest data and enables access to the object.
If a component is Active but its sequence number is different from or older than the current
sequence number for the object, the component is marked as Stale. This behavior usually occurs
when the components of an object go offline and come back online at different times.
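The staleness rule can be sketched as follows. This is a simplified model: the function names and the quorum check are illustrative, not vSAN's actual logic, but it captures why two online components are not enough when one of them is stale.

```python
# Simplified model of the staleness rule (illustrative, not the real
# on-disk format): a component whose sequence number trails the object's
# current sequence number is marked Stale, and the object stays
# inaccessible until a component holding the latest data is back online.

def component_state(component_seq, object_seq, online):
    if not online:
        return "Absent"
    return "Active" if component_seq >= object_seq else "Stale"

def object_accessible(components, object_seq):
    # A quorum of up-to-date components is required; stale copies do not
    # count, which prevents vSAN from serving or overwriting old data.
    active = sum(1 for seq, online in components
                 if component_state(seq, object_seq, online) == "Active")
    return active > len(components) / 2

# sa-esxi-01 returned with an outdated copy (seq 41); sa-esxi-03, which
# holds the latest data (seq 42), is still offline, as is the witness.
components = [(41, True), (42, False), (42, False)]
print(object_accessible(components, 42))   # -> False: object stays offline
```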
391
10-17 About the Full Data Migration Option
The Full data migration option evacuates all components from the disk groups of the host
entering maintenance mode onto other available ESXi hosts.
You use this option only when the host is being decommissioned, permanently removed, or put
into maintenance mode for an extended period.
The remaining hosts in the vSAN cluster must be able to satisfy the policy requirements of the
objects being evacuated.
392
10-18 Full Data Migration: Component
Placement
If you select the Full data migration option, vSAN determines the placement of each component.
[Slide diagram: with Full data migration, components C1 and C2 and the witness are evacuated from the host entering maintenance mode and re-placed on the remaining hosts (ESXi1 to ESXi4).]
393
10-19 Full Data Migration: Cluster Size
Considerations
To use the Full data migration option, you must have additional ESXi hosts available in the vSAN
cluster.
[Slide diagram: a three-host cluster has no spare host to receive an evacuated replica; a four-host cluster can re-place Replica 1, Replica 2, and the witness during a full data migration.]
394
10-20 Full Data Migration: Assessing Impact
Before you begin a full data migration, run the data migration precheck to understand the
potential impact on the cluster.
[Screenshot: Data Migration Pre-check result from 08/11/2020, 8:35:01 AM: "The host can enter maintenance mode. 1.02 GB of data will be moved." Host sa-esxi-01.vclass.local shows 1.86 GB of 49.99 GB (4%) in use.]
395
10-21 About the No Data Migration Option
When you select the No data migration option, vSAN does not evacuate any data from the
host. However, some VM objects might become inaccessible.
The No data migration option is useful when you want to shut down all hosts in a vSAN cluster
for maintenance or when data on the hosts is not required.
396
10-22 No Data Migration: Assessing Impact
Before selecting No data migration, you run the data migration precheck to understand the
potential impact on the vSAN objects in the cluster.
Select a host, disk group, or disk, and check the impact on the cluster if the object is removed or placed into maintenance mode.
397
10-23 Changing the Default Maintenance Mode
The default vSAN maintenance mode is Ensure accessibility, which can be changed through an
advanced host-level setting. This setting must be identical on all hosts in the cluster.
In the vSphere Client, select the ESXi host and select Configure > Advanced System Settings.
The available options are ensureAccessibility, evacuateAllData, and noAction.
[Screenshot: Advanced System Settings showing VSAN.DefaultHostDecommissionMode set to ensureAccessibility, described as "Default host decommission mode for a given node".]
398
10-24 Planned Maintenance
When performing maintenance, you must plan your tasks to avoid failures and consider the
following recommendations:
• Unless Full data migration is selected, components on a host become absent when the host
enters maintenance mode, which counts as a failure.
• Data loss can occur if too many unrecoverable failures occur and no backups exist.
• Never reboot, disconnect, or disable more hosts than the FTT values allow.
• Never start another maintenance activity before all resyncs are completed.
• Never put a host into maintenance mode if another failure exists in the cluster.
399
10-25 About vSAN Disk Balance
The vSAN Disk Balance health check helps to monitor the balance state among disks.
By default, automatic rebalance is disabled. The status of this check turns yellow if the imbalance
exceeds a system-determined threshold.
[Screenshot: Skyline Health > vSAN Disk Balance showing Average Disk Usage 3%, Maximum Disk Usage 3%, Maximum Load Variance 1%, and Average Load Variance 0%, with a Configure Automatic Rebalance button.]
400
10-26 About Automatic Rebalance
When automatic rebalance is enabled, vSAN automatically rebalances the cluster to keep the
disk balance status green.
Rebalancing can wait up to 30 minutes to start, giving time for high-priority tasks, such as
entering maintenance mode or object repair, to use resources before rebalancing.
The rebalancing threshold determines when the background rebalancing starts in the system.
For example, rebalancing begins if any two disks in the cluster differ in usage by the defined
variance. Rebalancing continues until it is turned off or the variance between disks is less than
half of the rebalancing threshold.
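The trigger logic can be sketched in a few lines. The 30% default comes from the text; the function names are illustrative, not part of any vSAN API.

```python
# Sketch of the rebalance trigger described above. Rebalancing starts
# when any two disks differ in usage by at least the threshold, and
# stops once the variance drops below half of the threshold.

def should_start_rebalance(disk_usage_pct, threshold=30):
    return max(disk_usage_pct) - min(disk_usage_pct) >= threshold

def should_stop_rebalance(disk_usage_pct, threshold=30):
    return max(disk_usage_pct) - min(disk_usage_pct) < threshold / 2

print(should_start_rebalance([80, 45, 50]))  # -> True: 35% variance >= 30%
print(should_stop_rebalance([52, 45, 50]))   # -> True: 7% variance < 15%
```

Note the hysteresis: stopping at half the start threshold prevents rebalancing from toggling on and off around a single boundary.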
401
10-27 Enabling Automatic Rebalance
To enable automatic rebalance and set a rebalancing threshold, you select the vSAN cluster and
select Configure > vSAN > Services > Advanced Options > Edit.
[Screenshot: Advanced Options dialog with Thin swap and Automatic rebalance toggles and a Rebalancing threshold of 30%.]
402
10-28 Reserving vSAN Storage Capacity (1)
You can reserve vSAN storage capacity for the following maintenance activities:
• Operations reserve: Reserves capacity for internal vSAN operations, such as object rebuild
or repair.
• Host rebuild reserve: Reserves capacity to ensure that all objects can be rebuilt if a host
failure occurs. To enable this reserve, the cluster must have a minimum of four hosts.
When reservation is enabled and capacity usage reaches the limit, new workloads fail to deploy.
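The admission check can be sketched as follows. This is a hedged model: the reserve sizing, in particular modeling the host rebuild reserve as one host's share of raw capacity and the 5% operations reserve, is an assumption for illustration, not vSAN's documented formula.

```python
# Hedged sketch of the capacity-reserve admission check described above.
# Reserve sizing here is illustrative: the host rebuild reserve is
# modeled as one host's share (1/N) of raw capacity, and the operations
# reserve as a flat percentage.

def new_workload_allowed(used_gb, request_gb, total_gb, n_hosts,
                         ops_reserve_pct=5):
    if n_hosts < 4:
        raise ValueError("host rebuild reserve needs at least 4 hosts")
    ops_reserve = total_gb * ops_reserve_pct / 100
    host_rebuild_reserve = total_gb / n_hosts
    usable = total_gb - ops_reserve - host_rebuild_reserve
    # When usage would exceed the unreserved capacity, deployment fails.
    return used_gb + request_gb <= usable

# 4 hosts x 10 TB = 40 TB raw; 2 TB ops reserve + 10 TB rebuild reserve
# leaves 28 TB usable for workloads.
print(new_workload_allowed(used_gb=25_000, request_gb=2_000,
                           total_gb=40_000, n_hosts=4))  # -> True
print(new_workload_allowed(used_gb=25_000, request_gb=4_000,
                           total_gb=40_000, n_hosts=4))  # -> False
```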
403
10-29 Reserving vSAN Storage Capacity (2)
To enable vSAN capacity reserve for internal operations and host rebuild, select the vSAN
cluster and select Monitor > vSAN > Capacity > Capacity Usage > Configure.
404
10-30 Shutting Down and Restarting vSAN
Clusters
To safely shut down a vSAN cluster, you must power off all VMs and put all hosts into
maintenance mode:
[Screenshot: cluster shutdown wizard with the option "Move powered-off and suspended virtual machines to other hosts in the cluster" and a Go To Pre-Check button.]
405
10-31 Rebooting vSAN Clusters Without
Downtime
When rebooting a vSAN cluster, you must reboot one host at a time so that the VMs do not
incur downtime:
1. Select the Ensure accessibility data migration option when placing hosts into maintenance
mode.
406
10-32 Moving vSAN Clusters to Other vCenter
Server Instances
You might be required to move the vSAN cluster from the existing vCenter Server instance to
another:
1. Build a new vCenter Server instance using the same or a later version.
2. Ensure that networking is configured correctly on the new vCenter Server instance.
4. Configure other vSAN data services to match the original cluster.
5. Create vSAN storage policies to match the vSAN policies of the original cluster.
6. Disconnect and remove all hosts from the inventory in the original vCenter Server instance.
7. Add hosts to the cluster enabled with vSAN in the new vCenter Server instance.
407
10-33 vSAN Logs and Traces
vSAN support logs are contained in the ESXi host support bundle in the form of vSAN traces.
The vSAN support logs are collected automatically by gathering the ESXi support bundle of all
hosts.
Because vSAN is distributed across multiple ESXi hosts, you should gather the ESXi support logs
from all the hosts configured for vSAN in a cluster.
VMware does not support storing logs and traces on the vSAN datastore.
By default, vSAN traces are saved to the /var/log/vsantraces path on the ESXi host system
partition.
408
10-34 Redirecting vSAN Logs and Traces
When USB and SD card devices are used as boot devices, the logs and traces reside in RAM
disks, which are not persistent during reboots.
Consider redirecting logging and traces to other persistent storage when these devices are used
as boot devices.
To redirect vSAN traces to a persistent datastore, use the esxcli vsan trace set
command.
For more information about redirecting vSAN logs and traces, see VMware knowledge base
article 1033696 at https://kb.vmware.com/s/article/1033696.
409
10-35 Configuring Syslog Servers
It is good practice to configure a remote Syslog server to capture all logs from ESXi hosts.
To configure a Syslog server in the vSphere Client, select the ESXi host and select Configure >
Advanced System Settings > Syslog.global.logHost.
[Screenshot: Syslog.global.logHost setting: "The remote host to output logs to. Reset to default on null. Multiple hosts are supported and must be separated with comma (,). Example: udp://hostName1:514, hostName2, ssl://hostName3:514".]
410
10-36 Lab 12: Verifying the vSAN Cluster Data
Migration Precheck
Examine data migration options and their effect on components:
411
10-37 Review of Learner Objectives
• Describe vSAN maintenance mode and data evacuation options
412
10-38 Lesson 2: vSAN Cluster Scaling and
Hardware Replacement
• Detail the removal and replacement of disk and disk groups in a vSAN cluster
• Describe how to add a disk and a disk group to scale up a vSAN cluster
413
10-40 About vSAN Cluster Scaling
vSAN scales up and scales out if you need more compute or storage resources in the cluster.
[Slide diagram: scaling up adds capacity devices (SSDs) to existing disk groups on each host; scaling out adds hosts, each with its own cache and capacity disk groups.]
414
10-41 Increasing Capacity by Scaling Up
To increase storage capacity in a vSAN cluster:
Before replacing disks, ensure that the vSAN cluster has sufficient capacity to migrate your data
from the existing capacity devices.
415
10-42 Adding New Hosts to vSAN Clusters
You can add an ESXi host to a running vSAN cluster without disrupting any ongoing operations:
• Use the vSAN Disk Balance health check to rebalance the disks.
[Slide diagram: a new ESXi host with its own disk group (cache and capacity SSDs) joins the existing vSAN cluster.]
416
10-43 Adding New Capacity Devices to Disk
Groups
You can expand the capacity of a disk group by adding disks:
• Add devices with the same performance characteristics as the existing disks.
417
10-44 About Disk Claim Management
vSAN has a uniform workflow for claiming disks in any scenario. Available disks are grouped
either by model and size or by host.
• Select the vSAN cluster, select Configure > vSAN > Disk Management, and click Claim
Unused Disks.
• Select the vSAN cluster, select Configure > vSAN > Disk Management, select a host, and
click Create disk group.
418
10-45 Replacing Capacity Tier Disks
If you detect a failure, replace a capacity device:
1. Select the disk group and remove the capacity disk from the disk group.
[Slide diagram: disk groups on three hosts, each with a cache SSD and several capacity SSDs; a failed capacity device is removed from its disk group.]
If deduplication and compression is enabled on the cluster, a capacity device failure affects the
entire disk group. If you must replace a capacity device in a disk group enabled for deduplication
and compression, you must remove the entire disk group.
419
10-46 Replacing Cache Tier Disks
When decommissioning a cache tier device, you must take the entire disk group offline:
420
10-47 Removing Disk Groups
When you are removing a disk group from a vSAN cluster, the vSphere Client describes the
impact of a disk group evacuation.
Running a precheck before removing a disk group is a best practice. Prechecks determine if the
operation will be successful and report the state of the cluster after the disk group is removed.
421
10-48 Replacing vSAN Nodes
When you are replacing a host, the replacement should have the same hardware configuration,
whenever possible.
422
10-49 Decommissioning vSAN Nodes
To permanently decommission a vSAN node, you must follow the correct procedure:
2. Place the host in maintenance mode and select Full data migration.
3. Wait for the data migration to complete and the host to enter maintenance mode.
4. Delete disk groups that reside on the host that you want to decommission.
5. Use the vSphere Client to move the ESXi host from the cluster to disassociate it from the
vSAN cluster.
423
10-50 Lab 13: Decommissioning the vSAN
Cluster
Evacuate and delete the vSAN cluster:
424
10-52 Review of Learner Objectives
• Describe vSAN cluster scaling
• Detail the removal and replacement of disk and disk groups in a vSAN cluster
• Describe how to add a disk and a disk group to scale up a vSAN cluster
425
10-53 Lesson 3: Upgrading and Updating
vSAN
426
10-55 vSAN Upgrades
The vSAN upgrade process includes several stages.
Depending on the vSAN and disk format versions that you are running, an object and disk
format conversion might be required.
If you upgrade the disk format, you cannot roll back software on the hosts or add incompatible
hosts to the cluster.
427
10-56 vSAN Upgrade Process
Before attempting a vSAN upgrade, review the complete vSphere upgrade process to ensure a
smooth, uninterrupted, and successful upgrade:
428
10-57 Preparing to Upgrade vSAN
Before upgrading to the latest version of vSAN, always verify your current environment.
Review the VMware Compatibility Guide to verify support for the following items:
• Device drivers
429
10-58 vSAN Upgrade Phases
vSAN is upgraded in two phases:
430
10-59 Supported Upgrade Paths
VMware supports a range of upgrade paths for vSphere, which includes vSAN.
[Screenshot: VMware Product Interoperability Matrices for VMware vSAN, with a legend of Compatible, Incompatible, and Not supported, showing the supported upgrade paths among vSAN versions 6.5 through 7.0 U1.]
431
10-60 About the vSAN Disk Format
The disk format upgrade is optional. Your vSAN cluster continues to run smoothly even if you
use a previous disk format version.
For best results, upgrade disks to use the latest disk format version, which provides the new
vSAN feature set.
After you upgrade the on-disk format, you cannot roll back software on the hosts or add certain
older hosts to the cluster.
Disk format upgrade is an optional final step when you upgrade a vSAN cluster. You might
choose not to upgrade the disk format if you want to maintain backward compatibility with hosts
on an earlier version of vSAN. For example, you might want to retain the ability to add hosts to
the cluster with vSAN 7.0 GA to provide burst capacity.
Disk format upgrades from v3.0 (vSAN 6.2) to a later version only update disk metadata and do
not require data evacuation. For more information, see VMware knowledge base article
2148493 at https://kb.vmware.com/s/article/2148493.
432
10-61 vSAN Disk Format Upgrade Prechecks
When you initiate an upgrade precheck of the on-disk format, vSAN verifies the following
conditions:
vSAN also verifies that no outstanding issues exist that might prevent upgrade completion.
433
10-62 Verifying vSAN Disk Format Upgrades
After you complete the disk format upgrade, you must verify whether the vSAN cluster is using
the new on-disk format.
Select the vSAN cluster and select Configure > vSAN > Disk Management.
[Screenshot: Disk Management pane reporting "All 8 disks on version 13.0", with disk groups on sa-esxi-01.vclass.local and sa-esxi-03.vclass.local connected and Healthy.]
434
10-63 vSAN Build Recommendations
vSAN build recommendations include patch and applicable driver updates. To update the
firmware on vSAN 7.0 clusters, you must use an image through vSphere Lifecycle Manager.
435
10-64 vSAN System Baselines
vSAN build recommendations are provided through vSAN system baselines for vSphere
Lifecycle Manager:
• vSAN system baselines are listed in the baselines pane of the vSphere Lifecycle Manager.
• vSAN system baselines can include custom ISO images provided by certified vendors.
• vSphere Lifecycle Manager automatically scans each vSAN cluster to verify compliance
against the baseline group.
To upgrade your cluster, you must manually remediate the system baseline through vSphere
Lifecycle Manager.
436
10-65 Review of Learner Objectives
• Describe the stages in the vSAN upgrade process
• The Full data migration maintenance mode option migrates all data and can be used when
the host is being decommissioned or removed permanently from the cluster.
• The No data migration maintenance mode option is used when the entire cluster must
be shut down.
• A running vSAN cluster can be scaled up and scaled out without disrupting any ongoing
operations.
• vSAN on-disk format conversion enables new data services whose impact on your
environment must be considered before upgrading.
• Before upgrading to the latest version of vSAN, always verify your current environment.
Questions?
437
Module 11
vSAN Stretched and Two-Node Clusters
11-2 Importance
The vSAN stretched cluster is a solution that is implemented in cases where disaster avoidance
or swift disaster recovery is important.
439
11-4 Lesson 1: vSAN Stretched Clusters
• Explain how read and write I/O management works in vSAN stretched clusters
440
11-6 About vSAN Stretched Clusters
A standard vSAN cluster is limited to one site. The data availability and fault tolerance provided
by VM storage policies are limited to a single site.
A vSAN stretched cluster spans three sites to protect against site-level failure. If one site goes
down, the VMs can be powered on at the other site with minimal downtime.
A vSAN stretched cluster extends the concept of fault domains so that each site represents a
fault domain. The distance between the sites is limited, such as in metropolitan or campus
environments.
[Slide diagram: a vSAN stretched cluster running VMs 01 through 04 across two data sites, with a witness host at a third site.]
vSAN stretched clusters can be used in environments where disaster and downtime avoidance
is a key requirement.
Stretched clusters protect VMs across data centers, not only racks.
441
11-7 vSAN Stretched Cluster Use Cases (1)
vSAN stretched clusters have the following use cases:
• Automated recovery
With stretched clusters, you can perform planned maintenance of one site without any service
downtime. You can use vSphere DRS affinity rules to run VMs on a specific data site.
You can migrate VMs to the other site before an impending service outage, such as a power failure, to prevent a production outage.
442
11-8 vSAN Stretched Cluster Use Cases (2)
vSAN stretched clusters can be used with vSphere Replication and Site Recovery Manager.
Replication between vSAN datastores enables a recovery point objective (RPO) as low as 5
minutes.
[Slide diagram: a vSAN stretched cluster (Data Site 1 and Data Site 2) replicating to a separate Recovery Site through vSphere Replication and Site Recovery Manager.]
443
11-9 Design of vSAN Stretched Clusters
A vSAN stretched cluster spans three sites to protect against site-level failure.
• Witness site
Only the preferred and secondary data sites contribute to the compute and storage resources.
Preferred and secondary sites can have a maximum of 15 ESXi hosts each, so a stretched
cluster can have a maximum of 15+15+1 hosts.
VMs deployed on a vSAN stretched cluster have one copy of their data on site A, the second
copy of their data on site B, and any witness components placed on the witness host in site C.
This configuration is achieved through fault domains alongside hosts, VM groups, and affinity
rules.
The witness site contains a single virtualized witness host that stores only witness components.
The purpose of the witness site is to provide a mechanism to break the tie if a split-brain
scenario occurs.
444
If a complete site failure occurs, a full copy of the VM data and more than 50% of the
components will be available. This enables the VM to remain available on the vSAN datastore. If
VMs need to be restarted on another site, you can configure vSphere HA to manage this task.
445
11-10 About Preferred Sites
The preferred site is the data site that remains active when a network partition occurs between
the two data sites.
If a failure occurs, VMs on the secondary site are powered off and vSphere HA restarts them on
the preferred site.
[Slide diagram: a failure isolates the secondary site; the preferred site and the witness remain active, and the affected VMs are restarted on the preferred site.]
446
11-11 About Witness Hosts
A vSAN stretched cluster requires a witness host to store the witness components for VM
objects:
• The witness host stores only witness components to provide a cluster quorum.
The witness host can be deployed as either a physical ESXi host or a vSAN witness appliance. If
a vSAN witness appliance is used for the witness host, it will not consume any of the customer's
vSphere licenses. A physical ESXi host that is used as a witness host must be licensed
accordingly.
447
11-12 Sizing Witness Hosts
When deploying a vSAN witness appliance, you must estimate how many VMs are required for
the business and the number of components that make up a VM. This estimate depends on the
number of virtual disks, policy settings, and snapshot requirements.
[Table excerpt: the Tiny witness appliance size supports 750 witness components and 10 VMs.]
448
11-13 vSAN Stretched Cluster Heartbeats
vSAN designates a master node on the preferred site and a backup node on the secondary site
to send and receive heartbeats.
If communication is lost between the witness host and one of the data sites for five consecutive
heartbeats, the witness is considered down until the heartbeats resume.
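The heartbeat rule can be sketched as follows. This is an illustrative model; the text states only the five-consecutive-heartbeat rule, so the function name and input format are assumptions.

```python
# Illustrative heartbeat-loss detection: five consecutive missed
# heartbeats mark the witness as down until heartbeats resume.

def witness_state(heartbeats, max_missed=5):
    """heartbeats: chronological list of booleans, True = received."""
    missed = 0
    state = "up"
    for received in heartbeats:
        missed = 0 if received else missed + 1
        if missed >= max_missed:
            state = "down"
        elif received:
            state = "up"      # witness recovers when heartbeats resume
    return state

print(witness_state([True] * 3 + [False] * 5))           # -> down
print(witness_state([True] * 3 + [False] * 5 + [True]))  # -> up
```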
[Slide diagram: the master node at the preferred site and the backup node at the secondary site exchange heartbeats with each other and with the witness host.]
449
11-14 Managing Read and Write Operations
vSAN stretched clusters use a read locality algorithm to read 100% from the data copy on the
local site. Read locality reduces the latency incurred during read operations.
[Slide diagram: reads are served 100% from the replica on the local site, while writes go to the replicas on both the preferred and secondary sites; the witness resides at a third site.]
In vSAN stretched clusters, the mirrors are located on different sites. The distance increases the
latency. As a result, reading the data from the remote site is not efficient because it can affect
the performance of applications. However, the writes must be sent to all the available mirrors on
both the preferred and secondary sites.
450
11-15 Stretched Cluster Networking
A stretched cluster has the following network requirements:
• Connectivity to the management network and the vSAN network on all three sites
Both data sites must be connected to a vSphere vMotion network for VM migration.
[Slide diagram: the witness host connects to both data sites over a Layer 3 network.]
451
11-16 Network Requirements: Between Data
Sites
A vSAN stretched cluster network requires connectivity across all three sites. It must have
independent routing and connectivity between the data sites and the witness host.
Bandwidth between sites hosting VM objects and the witness node is dependent on how many
objects reside on the vSAN cluster. You must appropriately size the data site to the witness
bandwidth for both availability and growth.
In a vSAN stretched configuration, you size the write I/O according to the intersite bandwidth
requirements. By default, the read traffic is handled by the site on which the VM resides.
vSAN stretched cluster (between data sites) 10 Gb or faster with latency <5 ms RTT.
The required bandwidth between two data sit es (B) is equal to write bandwidth (Wb) * data
multiplier (md) * resynchronization mult iplier (mr): B = Wb * md * mr.
The dat a mult iplier comprises overhead for vSAN metadata traffic and miscellaneous re lated
operations.
Using a data multiplier of 1.4 is a best practice. The resynchronization multiplier is included t o
account for resynchronizing event s. To make room for resynchronization traffic, another best
practice is to allocate 25% addit ional bandwidth capacity to the required bandwidth capacity for
resync hronizat ion events.
• A workload of 10,000 writes per second on vSAN with a typical 4 KB write size would
require 40 MBps, or 320 Mbps of bandwidth.
• Including the vSAN network requirements, the required bandwidth would be 560 Mbps.
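The worked example above can be reproduced with simple shell arithmetic. This is a hedged sketch, not an official sizing tool; the 1.4 data multiplier and 25% resynchronization headroom are the best-practice values stated above, scaled to integers for shell math:

```shell
# Intersite bandwidth sizing: B = Wb * md * mr
writes_per_sec=10000      # 10,000 write IOPS
write_size_kb=4           # typical 4 KB writes
wb_mbps=$(( writes_per_sec * write_size_kb * 8 / 1000 ))  # write bandwidth: 320 Mbps
md_x10=14                 # data multiplier 1.4, scaled by 10
mr_x100=125               # resynchronization multiplier 1.25, scaled by 100
b_mbps=$(( wb_mbps * md_x10 * mr_x100 / 1000 ))           # required bandwidth: 560 Mbps
echo "Write bandwidth: ${wb_mbps} Mbps; required intersite bandwidth: ${b_mbps} Mbps"
```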
452
11-17 Network Requirements: Between the
Data Sites and the Witness Site
The network bandwidth required between the data sites and the witness site is calculated
differently from the intersite bandwidth required for data sites.
Witness sites do not maintain VM data. They contain only component metadata.
Between the data sites and the witness host: 2 Mbps per 1,000 vSAN components, with
< 500 ms latency RTT (1 host per site).
The bandwidth required between the witness and each data site is approximately 1138 B *
number of components / 5 seconds.
• 166 VMs require the witness to contain 996 components (166 VMs * 3 components/VM * 2
(FTT+1) * 1 (stripe width)).
To satisfy the witness bandwidth requirements for a total of 1,000 components on vSAN, use
the following calculation:
• A best practice is to add a 10% safety margin and round up.
• With the 10% buffer included, 2 Mbps is generally appropriate for every 1,000 components.
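The witness bandwidth math can be sketched the same way (hedged; the 1138 B per component every 5 seconds figure, the 996-component example, and the 10% margin come from the text above):

```shell
# Component count for the 166-VM example: VMs * components/VM * (FTT+1) * stripe width
components=$(( 166 * 3 * 2 * 1 ))        # = 996, rounded up to 1,000 for sizing
sized=1000
bytes_per_sec=$(( 1138 * sized / 5 ))    # 227,600 B/s
bps=$(( bytes_per_sec * 8 ))             # 1,820,800 bps
bps_margin=$(( bps * 110 / 100 ))        # +10% safety margin: ~2 Mbps
echo "${components} components; witness link needs ${bps_margin} bps (~2 Mbps per 1,000 components)"
```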
453
11-18 Static Routes for vSAN Traffic
By default, vSphere uses a single default gateway. All routed traffic tries to reach its destination
through this common gateway.
You might need to create static routes in your environment to override the default gateway for
vSAN traffic in certain situations:
• If the stretched cluster deployment has both data sites and the witness host on different
networks
You can create a static route to override the default gateway by using the following esxcli
command:
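A hedged example of such a static route follows; the gateway and witness network addresses are placeholders, not values from this course, so substitute your own:

```shell
# Add a static route so vSAN traffic reaches the witness network
# through a dedicated gateway instead of the default gateway.
esxcli network ip route ipv4 add --gateway 10.0.0.1 --network 192.168.110.0/24

# Verify the routing table.
esxcli network ip route ipv4 list
```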
454
11-19 Planning for High Availability
For high availability, a best practice is to run at 50% of resource consumption across the vSAN
stretched cluster. If a complete site failure occurs, all VMs can be run on the surviving site.
Some customers might prefer to use more than 50% of the available resources. However, if a
failure occurs, not all VMs will be restarted on the surviving site.
455
11-20 Configuring Stretched Clusters
To configure a stretched cluster, select the vSAN cluster and select Configure > vSAN > Fault
Domains > Enable Stretched Cluster.
Group hosts into preferred and secondary fault domains, select a witness host, and create a disk
group on the witness host.
456
11-21 Replacing a Witness Host
If the witness host fails, a new witness host can easily be added to the stretched configuration.
457
11-22 Stretched Clusters and Maintenance
Mode
In a stretched cluster, you can use maintenance mode on data site hosts and on the witness
host:
• For a data site host, select the required vSAN data migration option.
• For the witness host, data migration does not occur.
A host in maintenance mode does not perform any activities on virtual machines,
including virtual machine provisioning. The host configuration is still enabled. The
Enter Maintenance Mode task does not complete until the above state is
completed. You might need to either power off or migrate the virtual machines
from the host manually. You can cancel the Enter Maintenance Mode task at any
time.
458
11-23 Monitoring Stretched Clusters
VMware Skyline Health provides a range of tests to verify the health status of stretched
clusters.
[Screenshot: Skyline Health stretched cluster checks, including unicast agent configuration, preferred fault domain on the witness host, witness host within the vCenter cluster, witness host fault domain configuration, and no disk claimed on the witness host.]
459
11-24 Review of Learner Objectives
• Identify characteristics of vSAN stretched clusters
• Explain how read and write I/O management works in vSAN stretched clusters
460
11-25 Lesson 2: vSAN Stretched Cluster
Failure Handling
461
11-27 vSAN Stretched Cluster Failure Handling
(1)
Each site in a stretched cluster resides in a separate fault domain. A vSAN stretched cluster can
tolerate one link failure at a time without data becoming unavailable.
The witness host serves as a tiebreaker when a decision must be made regarding the availability
of datastore components and the network connection between the two data sites is lost. In this
case, the witness host typically forms a vSAN cluster with the preferred site.
[Diagram: VMs 01-04 run across the two data sites; a failure breaks the network between the sites, and the witness forms a cluster with the preferred site.]
462
11-28 vSAN Stretched Cluster Failure Handling
(2)
If the preferred site becomes isolated from the secondary site and the witness, the witness host
forms a cluster using the secondary site. When the preferred site is online again, data is
resynchronized to ensure that both sites have the latest copies of all data. If the witness host
fails, all corresponding objects become noncompliant but are fully accessible.
[Diagram: the preferred site is isolated; the witness host forms a cluster with the secondary site.]
463
11-29 vSAN Stretched Cluster Site Disaster
Tolerance
Use stretched cluster-aware storage policies to tolerate a site failure.
If one data site fails, the replica on the other site is available to continue VM operations.
Use dual site mirroring (stretched cluster) VM storage policy rules to determine the fault
tolerance.
The site disaster tolerance governs the failures to tolerate across sites. If a data site goes down,
the replica on the remaining site and the witness component remain available to continue
operations.
The failures to tolerate governs the failure tolerance within each site. This setting ensures that
the object can survive a failure within the sites.
464
11-30 Site Disaster Tolerance: Dual Site
Mirroring
The dual site mirroring policy in a stretched cluster maintains one replica on each data site.
If one data site goes down, the replica on the remaining site and the witness component remain
available to continue operations.
When choosing this policy, you must ensure that both data sites have sufficient storage capacity
to each accommodate a replica.
Consider the number of objects and their space requirements when applying a dual site mirroring
policy.
465
11-31 Dual Site Mirroring with RAID 1
Dual site mirroring with RAID 1 ensures that the object remains accessible in the event of a site
failure, in addition to a node failure on the remaining site.
You must ensure that the number of hosts and drives available on each site can satisfy the
Failures To Tolerate and Stripe Width policy settings.
[Diagram: a witness component at the witness site; RAID 1 mirrored components on the preferred site (Data Site 1) and the non-preferred site (Data Site 2).]
466
11-32 Dual Site Mirroring with RAID 5/6
vSAN stretched clusters also support RAID 5/6 erasure coding within the two data sites. Within
each site, four or six hosts are required for RAID 5 and RAID 6, respectively.
In the example, the vSAN object is mirrored between the sites. If a single host failure occurs
within a site, the object can tolerate it by using RAID 5.
[Diagram: a witness component at the witness site; RAID 5 component stripes within the preferred site (Data Site 1) and the non-preferred site (Data Site 2), mirrored between the sites.]
467
11-33 Keeping Data on a Single Site
You can use the following VM storage policy options to place the components of an object on a
single site within the stretched cluster:
vSphere Fault Tolerance for VMs is supported for VMs that are restricted to a single site.
468
11-34 Symmetrical and Asymmetrical
Configuration
vSAN 6.1 or later supports symmetrical configurations, where site 1 and site 2 contain the same
number of ESXi hosts and the witness host resides in a third site.
With an asymmetrical configuration, some workloads would be available only in site 1 (using
PFTT=0/Site Affinity=Preferred), and others would be available in both site 1 and site 2.
469
11-35 Activity 1
Scenario factors:
[Diagram: VMs 01-04 run across the preferred and secondary sites, with the witness host at a third site.]
470
11-36 Activity 1 Solution
How does the cluster respond to the outage?
vSphere HA restarts the VMs on the preferred site using the replica component that is available
on the preferred site and the witness component.
[Diagram: the secondary site has failed; vSphere HA restarts VMs 03 and 04 on the preferred site.]
471
11-37 Activity 2
In this scenario, a host in the preferred site failed.
Scenario factors:
How does the system respond to the outage of one host in the preferred site?
How does the system respond to the outage of multiple hosts in the preferred site?
[Diagram: one host in the preferred site has failed while VMs 01-04 continue to run.]
472
11-38 Activity 2 Solution
How does the system respond to the outage of one host in the preferred site?
How does the system respond to the outage of multiple hosts in the preferred site?
[Diagram: vSphere HA restarts the affected VMs on surviving hosts in the cluster.]
473
11-39 Activity 3
In this scenario, the witness host stopped responding to the data sites, but both data sites remain
connected to each other.
Scenario factors:
[Diagram: the witness host is unreachable from both data sites; VMs 01-04 continue to run.]
474
11-40 Activity 3 Solution
How does the cluster respond to the outage?
The VMs' witness components are marked absent, and the VMs continue to run without
interruption. Because the components on the two data sites constitute a quorum, the object
remains available.
[Diagram: the witness host has failed; VMs 01-04 continue to run on both data sites.]
475
11-41 Activity 4
In this scenario, a network outage occurs between the witness and the preferred site.
Scenario factors:
[Diagram: a network failure occurs between the witness host and the preferred site; VMs 01-04 continue to run.]
476
11-42 Activity 4 Solution
How does the cluster respond to the outage?
VMs continue to run without interruption. The witness is placed in a network partition until the
communication is re-established.
The data sites maintain a quorum for the VM data. Because the witness host does not have
connectivity to all hosts, it is placed in its own network partition to prevent conflicts.
[Diagram: the witness host is partitioned from the preferred site; VMs 01-04 continue to run on the data sites.]
477
11-43 Activity 5
In this scenario, a network outage occurs between the preferred and the secondary sites.
Scenario factors:
[Diagram: a network failure occurs between the preferred and secondary sites while the witness remains connected to both; VMs 01-04 are running.]
478
11-44 Activity 5 Solution
How does the cluster respond to the outage?
All VMs running on the preferred site continue to run uninterrupted. All VMs on the secondary
site are automatically powered off, and vSphere HA restarts them on the preferred site.
After the outage is resolved, vSphere DRS migrates VMs based on the defined affinity rules in
place.
[Diagram: the intersite link has failed; the witness forms a cluster with the preferred site, where all VMs now run.]
479
11-45 Activity 6
In t his scenario, the preferred site has failed and vCenter Server also resides on the failed site.
Scenario factors:
[Diagram: the preferred site, which hosts vCenter Server, has failed; the witness and the secondary site remain.]
480
11-46 Activity 6 Solution
How does the cluster respond to the outage?
vSphere HA restarts all the VMs from the failed site to the other data site.
[Diagram: vSphere HA restarts the VMs from the failed preferred site on the secondary site.]
481
11-47 Lab 15: Configuring the vSAN Stretched
Cluster
Configure the vSAN stretched cluster:
482
11-48 Review of Learner Objectives
• Explain how vSAN stretched clusters handle failures
483
11-49 Lesson 3: Two-Node vSAN Clusters
484
11-51 About Two-Node vSAN Clusters
vSAN two-node clusters are implemented with two ESXi hosts and a witness host.
[Diagram: a two-node cluster running vSphere and vSAN.]
485
11-52 Two-Node vSAN Cluster Use Cases
The two-node architecture is ideal for remote office/branch office (ROBO) use cases:
Sharing a witness host between a two-node cluster and a stretched cluster or between multiple
stretched clusters is not supported.
[Diagram: ROBO sites 1, 2, and 3 each run a two-node vSAN cluster and share witness appliances hosted in a centralized data center.]
486
11-53 Two-Node Direct Connect vSAN
Clusters
Two-node direct connect vSAN clusters are intended for small site configurations without a
physical network switch at the remote site. You can connect both data nodes of the remote site
cluster using a direct network cable.
A two-node direct connect configuration further reduces the cost of deploying a two-node
cluster. It eliminates the need to configure a physical switch at each remote site for local
connectivity between the hosts of a two-node cluster. It also eliminates physical networking
configuration overhead between the two hosts.
You can use the following command to define a VMkernel port that can be used for witness
traffic:
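For example (a hedged sketch; vmk1 is a placeholder VMkernel interface, so substitute the adapter that faces the witness):

```shell
# Tag a VMkernel port for vSAN witness traffic so witness communication
# can use a different uplink than the direct-connect vSAN data link.
esxcli vsan network ip add -i vmk1 -T=witness

# Confirm the traffic type assigned to each vSAN VMkernel interface.
esxcli vsan network list
```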
[Diagram: the two ESXi hosts at site 1 carry vSAN traffic over a direct 10 Gb cable; the witness ESXi host resides at site 2.]
To set up a two-node direct connect vSAN cluster:
1. Connect both data nodes of the remote site cluster using a direct network cable.
2. Configure a separate network adapter to communicate with the witness node over the WAN.
3. Configure static routes to allow communication between data nodes and the witness node.
487
11-54 Shared vSAN Witness Nodes
Multiple remote or branch office sites can share a common vSAN witness host to store the
witness components for their vSAN objects.
A single witness host can support up to 64 two-node vSAN clusters. The number of two-node
vSAN clusters supported by a shared witness host is based on the host memory.
[Diagram: ROBO sites 1, 2, and 3 each run a two-node vSAN cluster; shared witness appliance nodes run in a centralized data center.]
488
11-55 Witness Node Locations
You can run a vSAN witness node at the following locations:
• On a vSphere hypervisor (free) installation using any supported storage (VMFS datastore or
NFS datastore)
489
11-56 Shared vSAN Witness Node Memory
Requirements
The number of witness components that a shared vSAN witness node can support depends on
the memory allocated during the appliance deployment. A best practice is to allocate 16 GB or
32 GB of memory for sharing between multiple two-node vSAN clusters.
Witness Memory   Max Witness Components   Max Two-Node Clusters
>=32 GB          64,000                   64
16 GB            32,000                   32
8 GB             750                      1
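The sizing limits above can be expressed as a simple lookup (a hedged sketch mirroring the table, not an official sizing tool):

```shell
# Map allocated witness memory (GB) to the supported limits from the table above.
witness_mem_gb=16
if [ "$witness_mem_gb" -ge 32 ]; then
  max_components=64000; max_clusters=64
elif [ "$witness_mem_gb" -ge 16 ]; then
  max_components=32000; max_clusters=32
else
  max_components=750; max_clusters=1
fi
echo "${witness_mem_gb} GB witness: up to ${max_components} components across ${max_clusters} two-node clusters"
```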
490
11-57 Shared vSAN Witness Node for a Mixed
Environment
A shared vSAN witness node can be shared between multiple two-node vSAN clusters running
different versions of ESXi.
If the two-node vSAN cluster version is later than the version of the shared vSAN witness node,
the witness node cannot participate in that cluster.
[Diagram: ROBO site 1 runs vSAN v7 and ROBO site 2 runs vSAN v6; both share witness nodes in a centralized data center.]
491
11-58 Configuring a Two-Node vSAN Cluster
You can use the Cluster Quickstart guide to configure a two-node vSAN cluster. However, the
witness host must be added to the data center, not to the cluster, before starting the wizard.
492
11-59 Review of Learner Objectives
• Explain the two-node vSAN cluster architecture and use cases
• Only data site hosts contribute to the cluster compute and storage resources.
• The witness host stores only the witness components for vSAN stretched cluster VM
objects.
• vSAN stretched cluster-aware VM storage policy rules are available to determine fault
tolerance.
• Two-node vSAN clusters are suitable for running a small number of workloads that require
high availability.
• Shared vSAN witness nodes can participate in multiple two-node vSAN clusters.
Questions?
493
494
Module 12
vSAN Cluster Monitoring
12-2 Importance
You must regularly monitor the health and performance of a vSAN environment. Doing so
enables you to make timely decisions and to avoid any unwanted events that might affect the
performance of your virtual infrastructure.
vSphere Client provides several tools to enable you to monitor the health and performance of
the vSAN cluster.
495
12-4 Lesson 1: vSAN Health Monitoring
496
12-6 About CEIP
CEIP helps VMware improve its products and services by regularly collecting anonymized,
technical information about VMware products from your organization.
The technical information that is collected includes any or all of the following types of data:
• Configuration
• Feature use
• Performance
• Product logs
497
12-7 Joining CEIP
To enable CEIP from the vSphere Client, select Menu > Administration > Deployment > Customer
Experience Improvement Program > Join.
[Screenshot: the Customer Experience Improvement Program page in the vSphere Client, showing the program status and the Join/Leave Program options. Data collection can be enabled or disabled at any time and uses port 443 for communication.]
498
12-8 Running Proactive Tests
You can use proactive tests to check the integrity of your vSAN cluster. These tests are useful
to verify that your vSAN cluster is working properly before you place it in production.
[Screenshot: the Proactive Tests pane showing a passed VM Creation Test.]
499
12-9 VMware Skyline Health
VMware Skyline Health is the primary and most convenient way to monitor vSAN health.
VMware Skyline Health provides you with findings and recommendations to resolve problems,
reducing the time spent on resolving issues.
500
12-10 Online Health
Online health includes vSAN Support Insight and Skyline Advisor.
vSAN Support Insight helps vSAN users maintain a reliable and consistent compute, storage, and
network environment. This feature is available when you join CEIP.
Skyline Advisor, included with your Premier Support contract, enhances your proactive support
experience with additional features and functionality, including automatic support log bundle
transfer with Log Assist.
501
12-11 VMware Skyline Health: vSAN Cluster
Partition
To ensure the proper operation of vSAN, all hosts must be able to communicate over the vSAN
network. If they cannot, the vSAN cluster splits into multiple partitions.
vSAN objects might become unavailable until the network misconfiguration is resolved.
[Screenshot: Skyline Health reporting a vSAN cluster partition, with hosts listed by partition.]
502
12-12 VMware Skyline Health: Network Latency
Check
The network latency check looks at vSAN hosts and reports warnings based on a threshold of 5
milliseconds.
If this check fails, check VMKNICs, uplinks, VLANs, physical switches, and associated settings to
locate the network issue.
503
12-13 VMware Skyline Health: vSAN Object
Health
This check summarizes the health state of all objects in the cluster.
You can immediately initiate a repair object action to override the default absent component
rebuild delay of 60 minutes.
[Screenshot: the vSAN object health check showing healthy objects.]
504
12-14 VMware Skyline Health: Time
Synchronization
This check looks at time differences between vCenter Server and hosts. A difference greater
than 60 seconds leads this check to fail.
If this check fails, you should review the NTP server configuration on vCenter Server and the
ESXi hosts.
[Screenshot: the time synchronization check showing each host's time difference with vCenter Server and the NTP service status.]
12-15 VMware Skyline Health: vSAN Disk
Balance
This check monitors the balance state among disks. By default, automatic rebalance is disabled.
When automatic rebalance is enabled, vSAN automatically rebalances disks if a difference
greater than 30% usage is found between capacity devices.
Rebalance can wait up to 30 minutes to start, providing time for high-priority tasks such as Enter
maintenance mode and object repair to complete.
506
12-16 VMware Skyline Health: Disk Format
Version
This check examines the disk format version. For disks with a format version lower than the
expected version, a vSAN on-disk format upgrade is recommended to support the latest vSAN
features.
vSAN 7.0 U1 introduces on-disk format version 13, which is the highest version supported by any
host in the cluster.
507
12-17 VMware Skyline Health: vSAN Extended
Configuration
This check verifies the default settings for the Object Repair Timer, site read locality, customized
swap object, and large-scale cluster support.
For hosts with inconsistent extended configurations, vSAN cluster remediation is recommended.
The default clusterwide setting for the Object Repair Timer is 60 minutes. The site read locality
is enabled, customized swap object is enabled, and large-scale cluster support is disabled.
508
12-18 VMware Skyline Health: vSAN
Component Utilization
This check examines component utilization for the entire cluster and each host. It displays a
warning or error if the utilization exceeds 80% for the cluster or 90% for any host.
The deployment of new VMs and rebuild operations are not allowed if the component limit is
reached.
509
12-19 VMware Skyline Health: What if the Most
Consumed Host Fails
This check simulates a failure of the host with the most resources consumed and then displays
the resulting cluster resource consumption.
[Screenshot: the "What if the most consumed host fails" check showing the resulting resource reservations.]
510
12-20 Review of Learner Objectives
• Describe how the Customer Experience Improvement Program (CEIP) enables VMware to
improve products and services
511
12-21 Lesson 2: vSAN Performance
Monitoring
• Explain how the writing of data generates I/O traffic and affects vSAN performance
512
12-23 vSAN Online Performance Diagnostics
The vSAN online performance diagnostics tool collects performance data and sends it to
VMware for diagnostic and benchmarking purposes. VMware analyzes the performance data
and provides recommendations.
Performance diagnostics analyzes previously executed benchmarks. It detects issues, suggests
remediation steps, and provides supporting performance graphs for further insight. Select a
desired benchmark goal and a time range during which the benchmark ran. The analysis might
take some time, depending on the cluster size and the time range chosen. This feature is not
intended for general evaluation of performance on a production vSAN cluster.
513
12-24 vSAN Performance Service
The vSAN performance service monitors performance-based metrics at the cluster, host, VM,
and virtual disk levels.
The performance service is enabled by default. The performance history database is created
and stored as the StatsDB object in the vSAN datastore.
[Screenshot: the performance service settings showing a healthy stats object, its UUID, the vSAN Default Storage Policy assignment, a compliant status, and verbose mode disabled.]
Be aware that the vSAN performance service object is not directly mounted into VMs and is
listed as unassociated. This does not mean that it is unused or safe to delete.
514
12-25 About I/O Impact on Performance
When analyzing performance, you must consider the sources of I/O traffic.
[Diagram: VM I/O traffic and vSAN back-end storage traffic flow through the hosts' SSDs into the vSAN datastore.]
515
12-26 About vSAN Cluster Metrics
In addition to standard cluster performance metrics, vSAN clusters record storage I/O metrics
for both VM and vSAN back-end traffic.
• Cluster
• Specific VMs
516
12-27 Cluster-Level Metrics for VMs
The chart displays cluster-level metrics from the perspective of VM I/O traffic.
[Charts: cluster-level VM IOPS, throughput, latency, congestion, and outstanding I/O over time]
517
12-28 Back-End Cluster-Level Metrics
The chart displays cluster-level metrics from the perspective of the vSAN back end.
[Charts: back-end IOPS, throughput, latency, congestion, and outstanding I/O, including resync read and recovery write series]
518
12-29 Throughput Comparison
Compare the charts for the throughput metric of VMs and the back end.
Back-end throughput shows higher values when compared with VM throughput because I/O is
generated for writing data to mirror copies and for object repair or rebuild traffic.
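As a back-of-the-envelope check (a hedged sketch with hypothetical numbers, assuming FTT=1 with RAID-1 mirroring), the expected back-end write throughput is roughly the VM write throughput multiplied by the number of mirror copies, plus any resync traffic:

```shell
# Hedged sketch: estimate back-end write throughput from VM write throughput.
# All values are hypothetical; FTT=1 with RAID-1 writes each block twice.
vm_write_mbps=100        # VM-level write throughput (example)
mirror_copies=2          # RAID-1 with FTT=1
resync_mbps=25           # repair/rebuild traffic (example)

backend_write_mbps=$((vm_write_mbps * mirror_copies + resync_mbps))
echo "estimated back-end write throughput: ${backend_write_mbps} MB/s"
```

If the measured back-end throughput is far above such an estimate, resync or rebalance activity is a likely cause.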
[Charts: VM throughput compared with back-end throughput over the same time range]
519
12-30 IOInsight
IOInsight captures I/O traces from ESXi and generates metrics that represent the storage I/O
behavior at the VMDK level.
To start IOInsight, select the vSAN cluster and select Monitor > vSAN > Performance >
IOINSIGHT > NEW INSTANCE.
520
12-31 Preparing an IOInsight Instance
You select a VM or host to monitor all VMDKs associated with it.
Name the IOInsight instance, and select the duration to run (the default is 10 minutes).
The system limits IOInsight monitoring overhead of CPU and memory to less than 1%.
521
12-32 Viewing IOInsight Instance Metrics
After the IOInsight instance completes the collection, you can view detailed disk-related metrics.
[Screenshot: a completed IOInsight instance with View Metrics, Rename, Rerun, and Delete options]
522
12-33 Host-Level Metrics for Disks
The DISKS tab displays performance metrics at disk and disk group levels.
You can use the Disk Group drop-down menus to select individual cache or capacity disks, or
the entire disk group.
523
12-34 Host-Level Metrics for the Cache Tier
The Write Buffer Free Percentage chart indicates the amount of free capacity on the cache tier.
As the buffer starts to fill up, the destaging rate increases. This increase is shown in the Cache
Disk De-stage Rate chart.
If the write buffer free percentage is less than 20%, artificial latency (congestion) is introduced
to slow down the incoming data rate.
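The 20% threshold can be sketched as a simple check (a hedged sketch with hypothetical sample values, not taken from a live host):

```shell
# Hedged sketch: flag when the write buffer free percentage drops below the
# 20% threshold at which vSAN introduces artificial latency (congestion).
buffer_size_gb=600       # cache tier write buffer size (example)
buffer_used_gb=510       # currently used buffer (example)

free_pct=$(( (buffer_size_gb - buffer_used_gb) * 100 / buffer_size_gb ))
if [ "$free_pct" -lt 20 ]; then
  echo "write buffer free ${free_pct}%: congestion likely"
else
  echo "write buffer free ${free_pct}%: healthy"
fi
```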
524
12-35 Host-Level Metrics for Resync Operations
Use the Resync IOPS, Resync Throughput, and Resync Latency charts to observe the impact of
resync operations on a disk group.
The charts display metrics for the following resync operation types:
• Policy change
• Evacuation
• Rebalance
• Repair
[Charts: Resync IOPS, Resync Throughput, and Resync Latency, with series for policy change, evacuation, rebalance, and repair reads and writes]
525
12-36 Host-Level Metrics for Network Performance
Network throughput is important to the overall health of the vSAN cluster.
The PHYSICAL ADAPTERS and HOST NETWORK tabs enable you to monitor physical NICs
and VMkernel adapters, respectively.
The performance statistics count all network I/Os processed in the network adapters used by vSAN.
526
12-37 VM Metrics
The VM tab shows the IOPS, throughput, and latency statistics of individual VMs.
The VIRTUAL DISKS tab shows metrics for each individual disk on the selected VM.
The Virtual Disk drop-down menu lists all the disks that you can select from.
527
12-38 Review of Learner Objectives
• Use performance views to access metrics for monitoring vSAN clusters, hosts, and VMs
• Explain how the writing of data generates I/O traffic and affects vSAN performance
528
12-39 Lesson: vSAN Capacity Monitoring
529
12-41 Capacity Usage Overview
[Screenshot: Capacity overview showing Used 8.15 GB of 199.97 GB (4.07%), with 191.82 GB free space on disks]
530
12-42 Capacity Usage with Space Efficiency
Deduplication and compression savings provide an overview of the space savings achieved.
531
12-43 Usable Capacity Analysis
The Usable capacity analysis panel enables you to select a different storage policy and see how
this policy affects the available free space on the datastore.
The effective free space is half of the free space on disks when the vSAN default storage policy
is selected.
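The halving with the default policy follows from RAID-1 mirroring; a minimal sketch (assuming FTT=1 and rounded, hypothetical capacity numbers):

```shell
# Hedged sketch: with the vSAN default storage policy (FTT=1, RAID-1),
# every object is stored twice, so effective free space is about half
# the raw free space on disks. Values are rounded examples.
raw_free_gb=191          # raw free space on disks (example)
copies=2                 # RAID-1 mirror with FTT=1

effective_free_gb=$((raw_free_gb / copies))
echo "effective free space: ${effective_free_gb} GB"
```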
Use this panel to estimate the effective free space for a new workload with the selected
storage policy (not considering deduplication and compression).
532
12-44 Capacity Usage Breakdown
The capacity usage breakdown section provides detailed information about the type of objects
or data that are consuming the vSAN storage capacity.
533
12-45 Capacity History
The CAPACITY HISTORY tab displays changes in the used capacity over a selectable date
range.
You can use the tab to extrapolate future growth rates and capacity requirements.
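A simple linear extrapolation (a hedged sketch; all numbers are hypothetical, not read from the chart) shows how two history points translate into a days-until-full estimate:

```shell
# Hedged sketch: linear extrapolation of capacity growth from two history
# points to estimate days until the datastore fills. Example numbers only.
total_gb=200
used_day0_gb=8
used_day30_gb=38

growth_per_day_gb=$(( (used_day30_gb - used_day0_gb) / 30 ))
days_until_full=$(( (total_gb - used_day30_gb) / growth_per_day_gb ))
echo "growth ${growth_per_day_gb} GB/day; ~${days_until_full} days until full"
```

Real growth is rarely linear; treat the estimate as a trigger for closer monitoring rather than a forecast.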
534
12-46 vSAN Capacity Reserve
You can enable vSAN capacity reserve for the following use cases:
• Operations reserve
Enabling operations reserve for vSAN helps ensure enough space in the cluster for internal
operations to complete successfully.
• Host rebuild reserve
Enabling host rebuild reserve allows vSAN to tolerate one host failure.
When reservations are enabled, and if capacity usage reaches the limit, new workloads cannot
be deployed.
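As a rough sizing sketch (an assumption of N identical hosts; the real reservation is computed by vSAN itself), the host rebuild reserve is about one host's share of raw capacity:

```shell
# Hedged sketch: rough sizing of the host rebuild reserve, which sets aside
# enough capacity to re-protect data after one host failure. Assumes N
# identical hosts, so the reserve is about 1/N of raw capacity.
hosts=4
raw_capacity_gb=2000

rebuild_reserve_gb=$((raw_capacity_gb / hosts))
echo "host rebuild reserve: ~${rebuild_reserve_gb} GB"
```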
[Screenshot: Capacity page with the message "You can enable operations and host rebuild reserve" and a CONFIGURE button]
535
12-47 Lab 16: Monitoring vSAN Performance and Capacity
Monitor the vSAN cluster performance and capacity details.
536
12-48 Review of Learner Objectives
• Use vSphere Client to monitor the vSAN capacity utilization
537
12-49 Key Points
• VMware Skyline Health actively tests and monitors the vSAN environment.
• You must regularly analyze performance charts re lated to vSAN clusters, hosts, and virtual
disks.
Questions?
538
Module 13
Troubleshooting Methodology
13-3 Importance
The process of troubleshooting is key to restoring application functionality and performance, as
well as objects that have become inaccessible.
539
13-5 PNOMA Troubleshooting Framework
VMware Global Support uses the PNOMA troubleshooting framework when troubleshooting
vSAN issues.
[Diagram: PNOMA layers over the vSAN hosts, each host with disk groups of SSDs]
The following vSAN components are listed in the PNOMA troubleshooting framework:
• Physical layer:
LLOG: The logical log. LSOM uses LLOG for log recovery on reboot.
The LLOG and the physical log (PLOG) share the write buffer.
When a data block arrives in the write buffer, a corresponding entry for the data block
is kept in the LLOG for log recovery on reboot.
However, after the data block is in the write buffer, vSAN must calculate where to
place this block of data on the magnetic disk when destaging in hybrid configurations.
To calculate, it consults the filesystem on the magnetic disk. This placement process
could cause the filesystem to generate its own metadata updates, for example, the
logical block address to physical location mapping. The I/O is intercepted and buffered
on the SSD, and a record is kept, the PLOG. After the physical locations for the data
blocks from the filesystem are obtained, vSAN stores the location in the PLOG. At this
point, the LLOG entry is no longer kept.
540
For more information about the LLOG and PLOG, see vSAN Monitoring and
Troubleshooting at https://docs.vmware.com/en/VMware-vSphere/7.0/vsan-701-
monitoring-troubleshooting-guide.pdf.
• Network layer:
RDT: The Reliable Datagram Transport is the communication mechanism within vSAN. It
uses TCP at the transport layer. It also is responsible for creating and destroying TCP
connections (sockets) on demand.
• Object layer:
CMMDS: The Cluster Membership, Monitoring, and Directory Services is the vSAN
record keeper.
• Management layer:
DISKLIB: The DISKLIB invokes object creation. vSAN objects created are vdisk,
namespace, vswap, or vmem.
OSFSD: The OSFSD, also called the OSFS-Daemon, is responsible for the object
creation and query tasks within the vSAN filesystem.
VSANVPD: vSAN uses the vSANVPD or vSAN VASA Provider Daemon to expose
SPBM, RAID, fault-tolerance, object space reservation, and striping operations to
vCenter Server over port 8080. If the vSANVPD services are down, you cannot create
VMs or change policies for existing VMs.
• Application layer:
VPXD: If the VPXD, also known as the vCenter Server daemon service, is stopped, you
are unable to connect to vCenter Server through the vSphere Client.
VPXA: The VPXA, also called the vCenter Server agent, acts as an intermediary
between vCenter Server and hostd, allowing communication between the vCenter
Server and ESXi hosts. VPXA is the communication conduit to hostd, which in turn
communicates with the ESXi kernel.
HOSTD: HOSTD, also called the vmware-hostd management service, is the main
communication channel between ESXi hosts and the VMkernel. HOSTD runs in the service
console and is responsible for managing most ESXi operations. It has visibility to VMs
registered on that host, the LUNs, and VMFS volumes visible to the host.
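Because vSANVPD exposes policy operations to vCenter Server over port 8080, a quick reachability check can help when VM creation or policy changes fail. This is a hedged sketch: the host name is a placeholder, and when nc is unavailable the script only prints the command it would run:

```shell
# Hedged sketch: check whether the vSANVPD port (8080) on an ESXi host is
# reachable from the vCenter Server side. HOST is a placeholder name.
HOST=sa-esxi-01.vclass.local
PORT=8080

if command -v nc >/dev/null 2>&1; then
  if nc -z -w 2 "$HOST" "$PORT" 2>/dev/null; then
    echo "vSANVPD port ${PORT} on ${HOST}: reachable"
  else
    echo "vSANVPD port ${PORT} on ${HOST}: not reachable"
  fi
else
  echo "would run: nc -z -w 2 ${HOST} ${PORT}"
fi
```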
541
13-6 PNOMA vSAN Physical Layer
• Host hardware
• Storage hardware
• Physical disks
The LSOM independently runs on each host and manages data placement, access, and disk
health.
You can find the following information on the vSAN physical layer:
• Whether hardware listed in the VMware Compatibility Guide has been used
• Whether validated and correct versions of drivers and firmware are being used
542
13-7 Activity: vSAN Physical Layer
543
13-8 Activity: vSAN Physical Layer Solution
• Host failure
• Offline controllers
• Offline disks
You can find information about the first three layers of PNOMA in VMware Skyline Health:
• Host failure: You can determine if all hosts are in the vSAN cluster and contributing.
• Networking: You can validate that all vSAN hosts are correctly communicating throughout
the environment.
• Physical disk: You can validate that vSAN storage devices are healthy and available. You
can also determine if the physical disks have reached capacity or if storage space is
available.
• Storage limits: You can view if your vSAN environment is full at the disk or the cluster level.
You can also view the information in VMware Skyline Health by using the following commands:
• You can verify whether all hosts are in the cluster from the command line by running the
esxcli vsan cluster get command.
• You can verify if all disks are in the cluster and healthy by running the esxcli vsan
debug disk summary get command.
• You can verify if any of the disks in the cluster are full or if limits, such as Max Components,
Used Disk Space, or Reserved Read Cache Size, are reached by running the esxcli
vsan debug limit get command.
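The three checks above can be grouped into one triage script. This is a hedged sketch: it is meant for an ESXi shell, and outside ESXi (no esxcli in PATH) it only prints what it would run:

```shell
# Hedged sketch: vSAN host triage using the three esxcli checks from this
# section. Degrades to printing the commands when esxcli is unavailable.
run_check() {
  if command -v esxcli >/dev/null 2>&1; then
    esxcli vsan "$@"
  else
    echo "would run: esxcli vsan $*"
  fi
}

run_check cluster get               # are all hosts in the cluster?
run_check debug disk summary get    # are all disks present and healthy?
run_check debug limit get           # are disks full or limits reached?
```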
544
13-9 PNOMA: vSAN Network Layer
• Jumbo frames
• NIC teaming
• Unicast transmission
• Data flow
You can find the following information on the vSAN network layer:
The network layer manages how host_1 talks to host_2, and network configuration. You can
use the esxcli vsan network list command to list the VMkernel port being used for
vSAN.
You verify traffic in both directions to eliminate issues caused by a firewall, either by design or in
error.
To validate bidirectional traffic, use the vmkping -I command to specify the vmk0 interface
and enter the IP address from one of the destination nodes. For example, vmkping -I
vmk0 10.21.21.181. The command response shows you if the communication path is
established.
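The per-node checks can be looped over all peers. This is a hedged sketch: the peer IPs are placeholders, it assumes an ESXi shell (where vmkping exists), and elsewhere it only prints the commands. Repeat the test from each peer in the reverse direction to rule out one-way firewall blocks:

```shell
# Hedged sketch: validate vSAN VMkernel connectivity from this host to each
# peer. Peer IPs are placeholders; run in an ESXi shell for real results.
peers="10.21.21.181 10.21.21.182 10.21.21.183"

for ip in $peers; do
  if command -v vmkping >/dev/null 2>&1; then
    vmkping -I vmk0 -c 3 "$ip" || echo "no path to $ip via vmk0"
  else
    echo "would run: vmkping -I vmk0 -c 3 $ip"
  fi
done
```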
For vSAN versions 6.5 and earlier, or environments using multicast, additional troubleshooting is
necessary.
545
13-10 Activity: vSAN Network Layer
546
13-11 Activity: vSAN Network Layer Solution
• Upstream networking issues such as physical hardware failure, power events, or routing
misconfigurations
• Network inconsistencies such as CRC errors, dropped packets, and buffer overrun or
underrun
Issues that are seen in the network layer typically happen because of configurations. When
troubleshooting at the network layer, test network configurations in the following order:
2. The VMkernel
4. The uplinks
You have a four-node environment with two uplinks configured for each ESXi host for
redundancy. One port is configured as an access port, which allows tagging for the one VLAN.
The second port is configured as a trunk port with no native VLAN. A situation occurred, causing
the ports to flip from the access port to the trunk port and resulting in a vSAN outage because
communication to vSAN was now broken. The correction required that the ports be returned to
their original configuration, resulting in the recovery of vSAN.
547
13-12 PNOMA: vSAN Object Layer
The vSAN object layer includes:
• CMMDS: Tracking of vSAN hosts, disk groups, objects, network configurations, and policies
You can find the following information on the vSAN object layer:
• Whether the objects are on the same disk, disk group, host, or spread out in the vSAN
environment
• The type of policy assigned to the object, for example FTT=1, FTT=2, or FTT=3
• The DOM process handles the initial I/O requests that come from vSAN.
Example:
1. A VM must read or write to or from a file. The host recognizes that the VM is on vSAN
and passes the request to the DOM.
2. The DOM receives the request and walks through the path that allows the VM to talk to
the vSAN disks.
3. The DOM takes the read or write request and passes it to the correct party.
• The CLOM is responsible for ensuring that all objects understand their storage policy and
master storage policies. The CLOM ensures and validates that objects match with their
assigned policies, including faults to tolerate, policy compliance, and RAID levels.
• The CMMDS manages everything in the vSAN environment, including hosts, disk groups,
disks, objects, network configurations, policy configurations, and many other things. If the
object does not exist in the CMMDS, it does not exist in vSAN.
548
13-13 Activity: vSAN Object Layer
What can go wrong at the vSAN object layer?
549
13-14 Activity: vSAN Object Layer Solution
What can go wrong at the vSAN object layer?
• VM object is inaccessible
• Performance issues
The most common causes of failure and performance issues seen in vSAN environments are:
For example, an administrator places an ESXi host into maintenance mode. During this period, a
disk fails on a different host within the vSAN cluster. These separate actions caused inaccessible
objects. The ESXi host, currently in maintenance mode, must be taken out of maintenance mode
to restore the inaccessible objects.
In a similar scenario where both hosts have disk or hardware failures, a double fault would occur,
which might result in data loss.
550
13-15 PNOMA: vSAN Management Layer
[Diagram: Management layer with DISKLIB, OSFSD, and VSANVPD on each host]
The vSAN management layer includes:
• Object creation, such as vdisk and namespace objects through DISKLIB
You can find the following information on the vSAN management layer:
• VM power state
The management layer sees all storage types in the same way.
For example:
• Applications do not see vSAN; they simply read from or write to an object, without requiring
an understanding of vSAN.
• VMs simply use a simulated folder structure. Data stored on vSAN does not resemble what
can be seen in the datastore browser.
• vSphere API for Storage Awareness functions more at the application layer. vSphere
API for Storage Awareness APIs are used by vCenter Server to talk to storage appliances.
551
13-16 Activity: vSAN Management Layer
What can go wrong at the vSAN management layer?
552
13-17 Activity: vSAN Management Layer
Solution
What can go wrong at the vSAN management layer?
Storage policies are created in vCenter Server using the vSphere Client. Storage policies
created for objects on vSphere API for Storage Awareness appliances are passed from vCenter
Server to the ESXi host using vSphere API for Storage Awareness.
Various issues can prevent the creation of storage policies for objects on vSphere API for
Storage Awareness appliances. The most common of these issues are:
• Services associated with vSphere API for Storage Awareness did not start or are not
running.
The Performance service is an object. If the Performance service does not start, validate the
Performance service object by mounting the object to verify that the object is accessible.
If you are unable to create a VM, consider that VM creation also involves the PNOMA application
layer. Consider validating communications from VPXA to VPXD that occur between vCenter
Server and the ESXi server that will host the VM.
553
13-18 PNOMA: vSAN Application Layer
When troubleshooting at the vSAN application layer, consider the following points:
• When hosts are nonresponsive, verify if the host services have been restarted.
• Verify if the vSphere API for Storage Awareness providers are registered.
The latest copy of the vSAN release catalog provides up-to-date information for VMware
Skyline Health checks, and for the drivers and firmware for the vSAN controllers.
554
13-19 Activity: vSAN Application Layer
555
13-20 Activity: vSAN Application Layer Solution
If a host is not responding in vCenter Server, verify that hosted VMs can still access their
associated storage. When a host is not responding to vCenter Server, it does not mean that it is
absent from the vSAN cluster. vCenter Server is not required for vSAN to work. The benefit of
vCenter Server for vSAN is the ease of operation. Communication issues from vSphere to
vCenter Server do not mean that the host is not working from a vSAN perspective. If this
scenario occurs, do not cold-reboot the ESXi host. Powering off the host can cause a second
outage or a double fault.
VMware Skyline Health runs at the application layer but shows you alarms and alerts from the
first three layers. Troubleshoot issues from the physical, network, and object layers as far as you
can go using the health checks.
vSAN objects left behind after delete operations are random issues that are caused by
applications trying to bypass vCenter Server and vSAN. Occurrences of this type of problem
have been reduced with newer applications that understand how to talk with vCenter Server.
In a healthy vSAN environment, vCenter Server should see the vSphere API for Storage
Awareness provider's status as Online and Active.
556
13-21 vSAN Layers: Creating the vSAN Cluster
[Diagram: vSAN layers — Object (CMMDS), Network (Reliable Datagram Transport), and Physical (LSOM with LLOG and PLOG over disk groups of SSDs)]
The vSAN physical, network, and object layers are involved and interact during the creation of
the vSAN cluster.
557
13-22 Troubleshooting by Layer and Importance
• vCenter Server:
VMware Skyline Health identifies risks and monitors, troubleshoots, and diagnoses
cluster component problems.
• ESXi:
ESXi connectivity issues can be diagnosed on the source, such as vmnic or vmk.
Use esxtop to find high loads on certain layers, disks, CPU, or memory.
• VMs:
Lower layer issues, such as CPU, memory, storage, and network, affect VMs.
558
13-23 Troubleshooting Process: Defining the
Problem
Troubleshooting is a systematic approach to identify the root cause of a problem and the best
solution to resolve the problem.
In this methodology, the troubleshooting process has three distinct phases. Defining the problem
is the first phase.
559
13-24 Defining the Problem (1)
A system problem is a fault in a system, or in one of its components, that negatively affects your
vSAN environment.
• Configuration problems
• Resource contention
• Hardware failures
Examine the effects of the problem and get an idea of how many systems are affected.
Through examination, audits, and logs, verify that your configuration has not changed.
Consult references such as release notes, VMware knowledge base articles, and
community forums to determine whether the problem is documented.
560
13-26 Defining the Problem (3)
For problems with VMs that are inaccessible, fail to start, or fail to respond, use a standard
approach by addressing the following questions:
• Does the cluster show aberrant behavior, aside from the affected systems?
Most VMware Support cases are seen in the physical or network layers.
561
13-28 Defining the Problem (5)
Most support cases arise from suboptimal configurations. Review some of the more common
types of incidents:
• Ensure that the hardware used for the vSAN cluster is listed as compliant with the VMware
Compatibility Guide:
VMware provides best-effort support for noncompliant hardware, including driver and
firmware versions.
For storage controllers, this support also includes such things as the mode of the
controller, its queue depth, and other variables.
• Network configuration:
• Build versions:
Your vSAN cluster should not include hosts with different versions of ESXi.
562
13-29 Activity: Defining the Problem
Your instructor leads a discussion to answer questions based on the screenshot of the problem.
[Screenshot: Virtual Objects view — a VM's Hard disk 1, VM home, and VM swap object all show "Reduced availability with no rebuild"]
563
13-30 Activity: Defining the Problem Solution
Your instructor leads a discussion to answer questions based on the screenshot of the problem.
1. Is this problem specific to a cluster or a host? The problem is specific to host sa-esxi-02
but also affects the vSAN cluster.
[Screenshot: host sa-esxi-02.vclass.local is shown as Not responding, and its VM objects show "Reduced availability with no rebuild"]
564
13-31 Troubleshooting Process: Identifying the
Root Cause of the Problem
Identifying the root cause of the problem is the second phase of the troubleshooting process.
565
13-32 Identifying the Root Cause
The vSphere Client contains powerful troubleshooting and informational tools for determining
the following items:
• Network
• Disk
• Object data
• Cluster capacity
The vSAN health service performs proactive health checks and the following tests:
• VM creation
• Storage performance
566
13-33 Identifying the Root Cause: Health
Checks
Because health check tests are weighted, a single problem can cause multiple health check
failures. Fixing the problem that causes the first failed test often resolves other failed tests.
On the Health page, click the selected alarm or alert. Click Info for more information about the
health item.
For more information, click Ask VMware to open a VMware knowledge base article about the
selected alert or error.
[Screenshot: Skyline Health alert "vSAN Build Recommendation Engine Health" with its Info pop-up]
567
13-34 Identifying the Root Cause: Questions to
Consider (1)
Consider the following questions when you try to identify the root cause:
568
13-35 Identifying the Root Cause: Questions to
Consider (2)
• Is the host located in the cluster?
569
13-36 Identifying the Root Cause: Questions to
Consider (3)
• What are the drive states for the hosts?
570
13-37 Identifying the Root Cause: Questions to
Consider (4)
• Are the associated objects of the VM active?
[Screenshot: VM home (RAID 1) components — one Active on sa-esxi-01.vclass.local and one Absent on sa-esxi-02.vclass.local]
571
13-38 Identifying the Root Cause: Questions to
Consider (5)
• Is aberrant behavior seen in VMware Skyline Health for the cluster?
[Screenshot: Skyline Health Network checks — vSAN MTU check (ping with large packet size), vMotion basic unicast connectivity check, and vMotion MTU check]
572
13-39 Troubleshooting Process: Resolving the Problem
Resolving the problem is the third phase of the troubleshooting process.
573
13-40 Avoiding and Resolving Common Problems (1)
Only a handful of the problems reported to VMware Support occur frequently enough to merit a
discussion on how to avoid or resolve them.
574
13-41 Avoiding and Resolving Common Problems (2)
Problem:
• VMs become unavailable as or after the ESXi host enters maintenance mode.
Avoidance:
• Verify the status of the vSAN cluster and the health of its objects before you start
maintenance tasks.
• The best practice is to work on only one host at a time. Ensure that no resync activity exists
before you begin maintenance work on other hosts.
Resolution:
• Taking the host out of maintenance mode typically enables unavailable VMs and objects to
regain quorum and become available.
575
13-42 Avoiding and Resolving Common Problems (3)
Problem:
Avoidance:
• Ensure that all VM components, such as the namespace and the VMDK objects, are
available.
Resolution:
• Remove hosts from maintenance mode, if applicable. Whether this operation is helpful in
solving the issue depends on the policies in place, as well as the evacuation mode selected.
576
13-43 Avoiding and Resolving Common Problems (4)
Problem:
Avoidance:
• Verify the status of the vSAN cluster and the health of its objects before you begin any
maintenance task. Inaccessible objects can block maintenance mode operations.
• Use the vSphere Client to examine the VM storage policy that is used for your VMs. Modify
them, if necessary, to allow the host to enter maintenance mode.
Resolution:
• If the problem persists after you try these steps, contact VMware Support.
577
13-44 Avoiding and Resolving Common Problems (5)
Problem:
• Erratic performance is seen, including disks or disk groups intermittently going offline.
Avoidance:
• Ensure that the HBA driver and firmware, solid-state drives, and capacity disks are at
supported levels according to the vSAN Compatibility Guide.
• Verify new third-party updates and hardware against the vSAN Compatibility Guide.
Resolution:
• For a controller or disk issue, use maintenance mode to try to isolate the problem host.
• Events, such as disk failure or vSAN congestion, might contribute to the problem.
Congestion on the vSAN network is not inherently bad but can be an indicator of a problem
when it is prolonged.
• Contact VMware Support if you are unsure about the appropriate resolution.
578
13-45 Avoiding and Resolving Common Problems (6)
Problem:
Avoidance:
• Do not manually delete or create disk groups during the upgrade, because these actions
disrupt the on-disk upgrade workflow.
Resolution:
• During the upgrade, objects and the disk format are upgraded in a sequential process.
Failures that occur during the upgrade result in unique scenarios. Contact VMware Support
for corrective steps.
579
13-46 Avoiding and Resolving Common Problems (7)
Problem:
Avoidance:
• Verify correct network settings and connectivity between all hosts in the cluster before
enabling vSAN.
Resolution:
580
13-47 Avoiding and Resolving Common Problems (8)
Problem:
Avoidance:
Resolution:
• VMware Skyline Health check failures often correlate with specific vSAN activities.
• Review the VMware Skyline Health check details to understand whether a corrective action
is necessary.
• The VMware knowledge base has articles on each VMware Skyline Health check and what
failures indicate.
For links to these articles, see VMware knowledge base article 2114803 at
https://kb.vmware.com/s/article/2114803.
581
13-48 Avoiding and Resolving Common Problems (9)
Problem:
Avoidance:
• Plan for the fact that VM storage policies affect the allocated space in the vSAN datastore.
Policies that include multiple failures to tolerate are especially storage-intensive.
• Editing storage policies in use can affect the datastore well beyond the estimated final
consumption as the changes are committed. New objects must sometimes be created
before old ones can be removed to free up space.
Resolution:
582
13-49 Review of Learner Objectives
• Use a structured approach to solve configuration and operational problems
583
13-50 Key Points
• The vSphere Client is the main tool for monitoring the health and performance of your
vSAN cluster.
• The health service actively tests and monitors the vSAN environment.
• You can use ESXCLI commands to view information about your vSAN environment,
including information not available in the vSphere Client.
Questions?
584
Module 14
Troubleshooting Tools
14-2 Importance
Although vSAN is primarily configured and managed through the vSphere Client, vSAN has
additional troubleshooting tools.
585
14-4 Lesson 1: VMware Skyline Health
• Describe the use of VMware Skyline Health to identify and correct problems in vSAN
586
14-6 About VMware Skyline Health
All customers with active support for vSAN 6.7 and later are entitled to VMware Skyline Health.
• Configuration
• Patches
• Upgrades
• Security
Skyline Health
Key Capabilities
• vSphere and vSAN findings
• Available in vSphere Client
• Supports vSAN 6.7 and up
VMware Skyline Health for vSAN provides findings based on VMware Skyline Health data from
thousands of vSAN deployments. It does not require the Skyline Collector, which means that no
data needs to be sent to VMware to receive the benefits of VMware Skyline Health.
VMware Skyline Health findings include rules based on VMware knowledge base articles and
best practices. To get the most out of VMware Skyline Health, vCenter Server must be
connected online and enrolled in the Customer Experience Improvement Program (CEIP).
However, customers can still receive some health checks offline.
VMware Skyline Health offline tests run hourly. Offline tests do not require active support or
CEIP enrollment.
VMware Skyline Health replaces Health in the vSAN UI and contains both VMware Skyline Health
findings and the vSAN Health summary. VMware Skyline Health is available with vSphere
6.7 P01 (or vSAN 6.7 U3a) and later.
587
14-7 Accessing VMware Skyline Health
You access VMware Skyline Health using the vSphere Client by selecting Skyline Health under
vSAN for a selected vSAN cluster.
[Screenshot: the Skyline Health page for cluster SA-vSAN-01, showing Online health categories
such as Network, Physical disk, Data, Cluster, Capacity utilization, and Hardware compatibility,
with a Retest button.]
VMware Skyline Health provides proactive findings and recommendations to avoid problems
before they occur, reducing the time spent on resolving support requests.
588
14-8 VMware Skyline Health Check Categories
VMware Skyline Health checks are sorted into categories that contain individual health checks.
• Online Health: Monitors vSAN cluster health and sends failed health checks to the
VMware analytics back-end system for advanced analysis.
• Capacity Utilization: Monitors vSAN free disk space to ensure that capacity use does
not exceed the threshold setting.
• Hardware Compatibility: Monitors the cluster components to ensure that they are using
supported hardware, software, and drivers.
VMware Skyline Health includes several health check categories. Many checks have
preconfigured health check tests that run every hour to monitor, troubleshoot, and diagnose the
cause of cluster component problems, identify issues in the environment, and avoid problems
before they occur.
589
14-9 VMware Skyline Health for vSAN
VMware Skyline Health includes preconfigured health check tests to monitor, troubleshoot, and
diagnose the cause of vSAN cluster problems. It also identifies potential risks.
On the Skyline Health page, click vSAN Health Details, or the alarm or alert, and then click Info
for more information about the health item.
For more information, click Ask VMware to open a VMware knowledge base article about the
selected alert or error.
[Screenshot: triggered vSAN health alarms for the vSAN cluster partition and vSAN vmknic
configuration checks, with the Info panel open for the vSAN cluster partition check.]
VMware Skyline Health checks all aspects of a vSAN cluster. VMware Skyline Health performs
checks on several items, such as hardware compatibility, network connectivity, storage device
health, and cluster health.
Using VMware Skyline Health, vSAN administrators can ensure that the vSAN deployment is
fully supported, functional, and operational. Administrators can also receive immediate
indications of a root cause if a failure occurs.
3. Under vSAN, select Skyline Health to review the different vSAN health check categories.
4. If the Test Result column displays the Warning (yellow) or Failed (red) icon, expand the
category to review the results of individual health checks.
In the Info section, you can click Ask VMware to open a VMware knowledge base article
that describes the health check and provides information about how to resolve the issue.
590
14-10 Online Health Checks
vSAN 7.0 has a built-in online health check capability that monitors vSAN cluster health and
sends the collected data to the VMware analytics back-end system for advanced analysis.
[Screenshot: the Online health category expanded on the Skyline Health page, showing checks
such as Advisor, vSAN Support Insight, Physical network adapter link speed consistency, Patch
available for critical vSAN issue for All-Flash clusters with deduplication enabled, and vSAN max
component size.]
You must participate in the Customer Experience Improvement Program (CEIP) to use online
health checks.
The online health check feature improves as data about customer implementations is collected
and issues are documented. The cloud checks provide a link to a related VMware knowledge
base article for the issue detected so that customers can solve the issue without the need to
contact VMware Technical Support.
591
14-11 vSAN Release Catalog Up-to-Date
Health Check
The vSAN release catalog up-to-date health check object verifies the age of the vSAN release
catalog.
The vSAN release catalog is used for vSAN build recommendations. The catalog shows
warnings or errors when it is older than 90 or 180 days. The vSAN release catalog is updated
with new releases or critical patches.
[Screenshot: the vSAN build recommendation health check in the vSphere Client, with options
to update the release catalog from a file or silence the alert.]
From the Health and Performance pane, you can easily update the information in the vSAN
release catalog. If the environment has Internet connectivity, updates can be obtained directly
from VMware. Otherwise, updates can be downloaded as a file to enable offline updates.
For more information about the vSAN release catalog up-to-date health check object, including
the vSAN release catalog download link, see VMware knowledge base article 58891 at
https://kb.vmware.com/s/article/58891.
592
14-12 Scenario: Troubleshooting Network
Health Issues (1)
In this example, the Skyline Health check network category has two tests that failed.
[Screenshot: the Monitor tab of cluster SA-vSAN-01, where the Network category of Skyline
Health shows the failed check Hosts with connectivity issues.]
Monitor the Health pane regularly. Expand the test category to view the individual tests. Select a
test to view detailed test results. You can click Retest to manually run all tests. Otherwise, the
tests are run every 60 minutes.
Identify the test categories that do not have a status of Passed. If an issue is detected in the
environment, a result of Failed or Warning appears next to the test category in the Health pane.
Expanding the test category displays the specific tests that failed or produced a warning.
The example shows the Monitor tab of a vSAN cluster. The vSphere Client reports that two
vSAN health alarms were triggered.
593
14-13 Scenario: Troubleshooting Network
Health Issues (2)
In this example, the network category has two tests that failed:
For more information about a failed health check test, select the individual test.
Clicking a specific VMware Skyline Health check test provides more details about why the test
failed or produced a warning.
594
14-14 Scenario: Troubleshooting Network
Health Issues (3)
[Screenshot: the failed checks Hosts with connectivity issues, Hosts with no vSAN vmknic
present, and vSAN cluster partition, with the Info panel and Ask VMware button.]
You can view details of each test that did not pass in the Info panel. The Info panel explains the
health check to help you identify a possible root cause of the issue. For a more detailed
explanation, click Ask VMware.
595
14-15 Scenario: Troubleshooting Network
Health Issues (4)
Clicking Ask VMware takes you to a VMware reference.
In this example, you are taken to VMware knowledge base article 2108062.
Additional related articles are available on the right side of the page.
[Screenshot: VMware knowledge base article 2108062, vSAN Health Service - Network Health -
All hosts have a vSAN vmknic configured, with related resources listed alongside.]
Clicking Ask VMware takes you to a VMware knowledge base article that describes the test and
probable causes, and offers advice on how to resolve the issue.
596
14-16 vSAN Capacity Check
The CAPACITY USAGE and CAPACITY HISTORY panels display the Capacity usage, Usable
capacity analysis, and Capacity history views at the vSAN cluster level.
[Screenshot: the CAPACITY USAGE and CAPACITY HISTORY panels for the vSAN cluster,
showing actually written capacity, deduplication and compression savings, the usable capacity
analysis, and a date-range selector for capacity history.]
For more information about capacity check usage categories, see Monitor vSAN Capacity at
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan-
monitoring.doc/GUID-6F7F134E-A6F7-4459-8C31-C021FF281F54.html.
597
14-17 Performance Service Charts
The performance service monitors performance-based metrics at the cluster, host, VM, and
virtual disk levels.
Point to a performance chart for a specific metric from the selected time.
[Screenshot: performance charts for the selected vSAN cluster in the vSphere Client.]
You use the performance charts to monitor and troubleshoot performance problems like IOPS,
throughput, congestion, and latency concerns.
598
14-18 vSAN Performance Checks: VM
The VM panel for the selected vSAN cluster displays an overview of vSAN performance
statistics in a graphical format at the vSAN host and vSAN cluster levels.
[Screenshot: VM performance charts showing IOPS, throughput, latency, and congestion.]
You can use the vSAN performance charts to monitor the workload in your cluster and
determine the root cause of problems.
When the Performance service is running, the cluster summary displays an overview of vSAN
performance statistics in a graphical format, including vSAN IOPS, throughput, and latency. At
the cluster level, you can view detailed statistical charts for all VM consumption and the vSAN
back end.
To view iSCSI performance charts, all hosts in the vSAN cluster must be running ESXi 6.5 or
later, and the iSCSI Target service must be enabled.
vSAN cluster performance charts are viewed using the following steps:
2. Select the VM tab and select a time range for your query.
vSAN displays performance charts for VMs running on the cluster, including IOPS,
throughput, latency, congestion, and outstanding I/O. The statistics on these charts are
aggregated from the hosts within the cluster.
vSAN displays performance charts for the cluster back-end operations, including IOPS,
throughput, latency, congestion, and outstanding I/O. The statistics on these charts are
aggregated from the hosts within the cluster.
4. Select iSCSI and select an iSCSI target or LUN and select a time range for your query.
vSAN displays performance charts for iSCSI targets or LUNs, including IOPS, bandwidth,
latency, and outstanding I/O.
For more information about vSAN performance views and graphs, see VMware knowledge base
article 2144493 at https://kb.vmware.com/s/article/2144493.
599
14-19 vSAN Performance Checks: Disks
The DISKS panel displays performance metrics for the disk group and individual disks at the disk
group and vSAN host level.
You can view performance metrics for entire disk groups or individual disks.
[Screenshot: the DISKS performance charts for host sb-esxi-02.vclass.local, showing frontend
(Guest) IOPS for a disk group and physical/firmware-layer metrics for an individual disk.]
The examples show the delayed I/O throughput for the entire disk group and the
physical/firmware layer latency for a local physical drive within the disk group.
vSAN requires at least one disk group and can be configured with up to five disk groups per
host.
Using the Disk Group pane, you can distinguish how each disk group is performing, independent
of any other disk groups on the same host.
The Disk Group pane has several graphs with which you can monitor the performance of the
cache tier and vSAN internal queues. This pane also has a graph that shows the disk group
capacity and usage.
600
14-20 vSAN Performance Check: Physical
Adapter
The PHYSICAL ADAPTER panel displays the pNIC Throughput, pNIC Packet Per Second, pNIC
Packet Error Rate, pNIC vSwitch Port Drop Rate, and pNIC Flow Control views.
The performance statistics count all network I/Os processed on the network adapters used by
vSAN. The counted network I/Os are not limited to vSAN traffic only.
[Screenshot: the PHYSICAL ADAPTER panel for vmnic3, showing the pNIC Throughput chart.]
If a host's physical network adapter used by vSAN is slow, compared with other hosts,
performance issues might be introduced. You might notice symptoms such as high network
latency, congestion, and so on.
For more information about the vSAN health service and the physical network adapter link
speed consistency check, see VMware knowledge base article 50387 at
https://kb.vmware.com/s/article/50387.
For more information about how the health test checks the network latency between vSAN
hosts and displays network latency in real time, see VMware knowledge base article 2149511 at
https://kb.vmware.com/s/article/2149511.
601
14-21 vSAN Performance Check: Host
Network
The HOST NETWORK panel displays the metrics for the host network and the vSAN VMkernel
network adapter.
[Screenshot: the HOST NETWORK panel with VMkernel network adapter vmk2 selected,
showing the VMkernel Network Adapter Throughput chart.]
By selecting Host Network from the Network drop-down menu, the HOST NETWORK health
check table lists the physical network adapter used by vSAN on each host. To determine which
host has a link speed that is inconsistent with the others, examine the last column in the table,
which shows the link speed.
Selecting the VMkernel, in this example vmk2, the HOST NETWORK health check table lists
information about network traffic for the VMkernel.
For more information about the vSAN Health service and the physical network adapter link
speed consistency check, see VMware knowledge base article 50387 at
https://kb.vmware.com/s/article/50387.
602
14-22 vSAN Performance Check: VM Virtual
Disks
The VIRTUAL DISKS pane shows metrics for each individual disk (VMDK) on the selected VM.
[Screenshot: the VIRTUAL DISKS pane, showing normalized IOPS and delayed normalized IOPS
charts for the selected virtual disk.]
A VM can be configured with one or more virtual disks. The VIRTUAL DISKS pane shows IOPS,
throughput, and latency for the selected virtual disk.
Each virtual disk can be assigned a different storage policy. The storage policy settings can
contribute to the performance characteristics of the virtual disk. For example, you might create a
storage policy that sets an IOPS limit for the VM object.
For more information about VM Consumption graphs, see VMware knowledge base article
2144493 at https://kb.vmware.com/s/article/2144493#VMConsumptionGraph.
603
14-23 Running Proactive Tests
Proactive tests are useful for verifying that your vSAN cluster is working properly before you
put it into production.
For storage performance testing, use HCIBench. HCIBench is a storage performance testing
automation tool that simplifies and accelerates customer Proof of Concept (POC) performance
testing in a consistent and controlled way. The VMware vSAN Community Forum provides
support for HCIBench.
[Screenshot: the Proactive Tests pane for the vSAN cluster, listing the available tests with their
last run result and last run time.]
vSAN includes two additional tests that you can run proactively, instead of reactively, on a
vSAN cluster. You can use these tests to verify that the vSAN environment is functioning
correctly and performing as expected.
• VM creation test: This test typically takes 20-40 seconds, and at most 180 seconds if
timeouts occur. One VM create and one VM delete task are spawned per host. These tasks
are displayed in the Recent Tasks pane.
• Network Performance Test: This test assesses if connectivity issues occur and if the
network bandwidth between hosts can satisfy vSAN requirements.
604
14-24 Exporting Support Bundles: Local Files
When a serious error occurs, VMware Technical Support might ask you to generate a vm-
support package. The package includes log files and other information, including core dumps.
In this example, local log files are uploaded and sent to the support representative by clicking
Administration > Support > Upload File to Service Request.
[Screenshot: the Upload File to Service Request dialog in the vSphere Client, with fields for the
service request ID and the file to upload.]
An administrator can select Upload File to Service Request to enter an existing support request
number and upload the necessary logs to VMware Support.
The data collected in a host support bundle includes the name of the affected ESXi host, logs,
VM descriptions (but never the contents of virtual disks or snapshot files), information about the
state of the affected VM, and, if present, core dumps.
VMware Technical Support routinely requests diagnostic information from you when a support
request is addressed. Data collected in a host support bundle might be considered sensitive.
Also, as of vSphere 6.5, support bundles can include encrypted information from an ESXi host.
For more information about data collected when gathering diagnostic information from vSphere
products, see VMware knowledge base article 2147388 at
https://kb.vmware.com/s/article/2147388.
605
14-25 Exporting Support Bundles: vCenter
Server
The vm-support package and core dumps can be exported from different objects in the
vSphere Client. In this example, the system logs are exported from vCenter Server.
[Screenshot: the Export System Logs wizard in the vSphere Client, launched from vCenter
Server sa-vcsa-01.vclass.local, with pages for selecting hosts and the log types to include in the
support bundle.]
If you deployed vCenter Server or vCenter Server Appliance, you can export a support bundle
containing log files for the node that you select in the vSphere Client.
For more information about exporting support bundles for vCenter Server, see Export a
Support Bundle at https://docs.vmware.com/en/VMware-
vSphere/6.7/com.vmware.vsphere.vcsa.doc/GUID-C54CA3F8-BD74-4339-A2A5-
AE89F1C55175.html.
606
14-26 Exporting Support Bundles: ESXi Host
In this example, the system logs are exported from an ESXi host.
[Screenshot: the Export Logs dialog for an ESXi host, with selectable log categories such as
System and CoreDumps, and a note that logs can be uploaded to VMware by using
Administration > Support > Upload File to Service Request.]
VMware Technical Support routinely requests the diagnostic information from you when a
support request is addressed. Data collected in a host support bundle might be considered
sensitive.
Furthermore, as of vSphere 6.5, support bundles can include encrypted information from an
ESXi host. You can make that password available to your support representative on a secure
channel. If only some hosts in your environment use encryption, some files in the package are
encrypted.
For more information on what information is included in the support bundles, see VMware
knowledge base article 2147388 at https://kb.vmware.com/s/article/2147388.
607
14-27 Review of Learner Objectives
• Discuss VMware Skyline Health and the associated service
• Describe the use of VMware Skyline Health to identify and correct problems in vSAN
608
14-28 Lesson 2: Commands for vSAN
• Discuss how to run commands from the vCenter Server and ESXi command lines
609
14-30 About vSphere ESXi Shell
You can use vSphere ESXi Shell to obtain command-line access to an ESXi host.
• ESXCLI commands
610
14-31 Accessing vSphere ESXi Shell
You can access vSphere ESXi Shell in the following ways:
• Local access through the host's Direct Console User Interface (DCUI):
Enable the vSphere ESXi Shell service, either in the DCUI or the vSphere Client.
Swap between the DCUI and local ESXi Shell by pressing Alt+F2 and Alt+F1,
respectively.
Log out of the ESXi Shell by pressing Ctrl+C or entering the exit command.
Disable the vSphere ESXi Shell service when not using it.
Enable the SSH service, either in the DCUI or the vSphere Client.
Disable the SSH service when you are not using it.
611
14-32 Examining the vsantop Utility
The vsantop utility focuses on monitoring vSAN performance metrics at an individual host
level.
[root@sa-esxi-01:~] vsantop
8:13:38pm | entity type: host-domclient
Information and metrics presented for the host sa-esxi-01 host-domclient entity are:
• IOPS, nodeId, throughput, latencyAvg, latencyStd, ioCount, congestion, and ioD
vSAN includes a CLI called vsantop that provides this data. The vsantop utility is built with an
awareness of vSAN architecture to retrieve focused metrics at a detailed interval.
This command executes vsantop in batch mode by capturing a snapshot of the metrics every
10 seconds for 6 iterations and stores the results in the specified location. As a result, the output
file has one minute of statistical data.
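The batch-mode invocation itself is not shown in this text. The sketch below illustrates the interval arithmetic described above; the commented-out command line is a hypothetical esxtop-style invocation (the -b, -d, and -n flags and the output path are assumptions, and it requires an ESXi host):

```shell
# Hypothetical batch-mode invocation (assumption; requires an ESXi host,
# so it is shown only as a comment):
#   vsantop -b -d 10 -n 6 > /tmp/vsantop-batch.csv
# The arithmetic from the text: a 10-second sampling interval repeated
# for 6 iterations yields one minute of statistical data.
interval=10
iterations=6
duration=$((interval * iterations))
echo "captures ${duration} seconds of statistics"
```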
612
14-33 Navigating vsantop
You can view or switch between entity types by entering the E command and choosing any of
the supported entity types.
[root@sa-esxi-01:~] vsantop
9:00:22pm | entity type: host-domclient
Enter the chosen entity type number to change the viewed entity type.
vSAN architecture comprises multiple layers of software and hardware entities. An entity type
can denote a host, drive, or a software component and has a unique identifier. Each instance of
vsantop can accommodate one entity type and up to nine columns of associated metrics. This
categorization helps you to understand usage patterns and correct or optimize the appropriate
entity.
Each entity type can have up to nine metric fields that can be displayed at any instance. You can
add or remove the relevant metric fields by using the f command. This command also displays a
list of metrics associated with the entity.
For more information about navigating the vsantop utility, see Getting Started with vsantop at
https://core.vmware.com/resource/getting-started-vsantop.
613
14-34 Examples of vsantop Entity Outputs
Information and metrics presented for the vsan-host-net entity are nodeId, rx and tx
throughput, and tcp tx packets.
[root@sa-esxi-01:~] vsantop
10:20:47pm | entity type: vsan-host-net
614
14-35 ESXCLI Commands
ESXCLI commands offer options in the following namespaces:
• esxcli namespace
• esxcli device namespace
• esxcli elxnet namespace
• esxcli fcoe namespace
• esxcli graphics namespace
• esxcli hardware namespace
• esxcli iscsi namespace
• esxcli network namespace
• esxcli nvme namespace
• esxcli rdma namespace
• esxcli sched namespace
• esxcli software namespace
• esxcli storage namespace
• esxcli system namespace
• esxcli vm namespace
• esxcli vsan namespace
615
14-36 Viewing vSphere Storage Information (1)
You use the esxcli storage command to display storage information, including
multipathing configuration, LUN specifics, and datastore settings.
[root@sa-esxi-01:~] esxcli storage
Usage: esxcli storage {cmd} [cmd options]
Available Namespaces:
  core        VMware core storage commands.
  hpp         VMware High Performance Plugin (HPP).
  nfs         Operations to create, manage, and remove Network Attached
              Storage filesystems.
  nfs41       Operations to create, manage, and remove NFS v4.1
              filesystems.
  nmp         VMware Native Multipath Plugin (NMP). This is the VMware
              default implementation of the Pluggable Storage
              Architecture.
  san         IO device management operations to the SAN devices on the
              system.
  vflash      virtual flash Management Operations on the system.
  vmfs        VMFS operations.
  vvol        Operations pertaining to Virtual Volumes.
  filesystem  Operations pertaining to filesystems, also known as
              datastores, on the ESX host.
  iofilter    IOFilter related commands.
616
14-37 Viewing vSphere Storage Information (2)
You use the esxcli storage core device l ist command to display storage
device-related information.
[root@sa-esxi-01:~] esxcli storage core device list
mpx.vmhba0:C0:T3:L0
   Display Name: Local VMware Disk (mpx.vmhba0:C0:T3:L0)
   Has Settable Display Name: false
   Size: 30720
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/mpx.vmhba0:C0:T3:L0
   Vendor: VMware
   Model: Virtual disk
   Revision: 2.0
   SCSI Level: 6
   Is Pseudo: false
   Status: on
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.0000000000766d686261303a333a30
   Is Shared Clusterwide: false
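When scripting against this output, the key: value lines are easy to filter with standard tools. A minimal sketch, using a small embedded sample of the output above so it can run without an ESXi host:

```shell
# Sample fragment of `esxcli storage core device list` output, embedded
# here so the filtering can be demonstrated without an ESXi host.
sample='mpx.vmhba0:C0:T3:L0
   Size: 30720
   Is Local: true
   Is SSD: true'
# Extract the SSD flag, as you might when checking vSAN disk eligibility:
is_ssd=$(printf '%s\n' "$sample" | awk -F': ' '/Is SSD/ {print $2}')
echo "Is SSD: $is_ssd"
```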
617
14-38 Viewing vSphere Network Information (1)
You use the esxcli network command to display physical and v irtual network information.
[root@sa-esxi-01:~] esxcli network
Usage: esxcli network {cmd} [cmd options]
Available Namespaces:
  ens        Commands to list and manipulate Enhanced Networking Stack (ENS)
             feature on virtual switch.
  firewall   A set of commands for firewall related operations.
  ip         Operations that can be performed on vmknics.
  multicast  Operations having to do with multicast.
  nic        Operations having to do with the configuration of Network
             Interface Card and getting and updating the NIC settings.
  port       Commands to get information about a port.
  sriovnic   Operations having to do with the configuration of SRIOV enabled
             Network Interface Card and getting and updating the NIC
             settings.
  vm         A set of commands for VM related operations.
  vswitch    Commands to list and manipulate Virtual Switches on an ESX host.
  diag       Operations pertaining to network diagnostics.
You use the esxcli network nic list command to display vmnic information.
618
14-40 Listing Available Subcommands (1)
You use the esxcli esxcli command list command to display all available
subcommands.
You can include grep command filters to more easily find the command you need. For
example, you can include grep debug to filter your search to the debug-related commands.
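As a minimal sketch, the filtering is an ordinary shell pipeline. The three command names below are sample data standing in for live output; on a host you would run `esxcli esxcli command list | grep debug` directly:

```shell
# Pipe a command listing through grep to keep only debug-related entries.
# The printf lines are hypothetical sample data, not real esxcli output.
printf '%s\n' \
  'vsan.debug.disk      list' \
  'vsan.health.cluster  get' \
  'vsan.debug.object    overview' |
grep debug
```

Only the two lines containing the string debug survive the filter.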
619
14-42 Other Useful Commands in vSphere ESXi
Shell (1)
In addition to ESXCLI commands, vSphere ESXi Shell provides other useful commands.
[root@sa-esxi-01:~] ls
BootModuleConfig.sh        host_shutdown.sh   scp
VmfsLatencyStats.py        hostd              sdrsinjector
Xorg                       hostd-probe        secpolicytools
[                          hostd-probe.sh     sed
[[                         hostdCgiServer     sensord
amldump                    hostname           seq
apiForwarder               hwclock            services.sh
apply-host-profiles        indcfg             setsid
applyHostProfile           inetd              sfcbd
applyHostProfileWrapper    init               sh
620
14-44 Other Useful Commands in vSphere ESXi
Shell (3)
Use the vdq command to see if the disks can be used in a vSAN cluster.
[root@sa-esxi-01:~] vdq -iq
[
   {
      "Name"            : "mpx.vmhba0:C0:T3:L0",
      "VSANUUID"        : "5292a16e-3e2b-8aa1-4fc7-7a1d5a3863b1",
      "State"           : "In-use for VSAN",
      "Reason"          : "None",
      "IsSSD"           : "1",
      "IsCapacityFlash" : "1",
      "IsPDL"           : "0",
      "Size(MB)"        : "30720",
      "FormatType"      : "512n",
   },
   {
      "Name"            : "mpx.vmhba0:C0:T2:L0",
      "VSANUUID"        : "52c0ba5a-c9bf-3a7d-49d5-8c3a1b3abde8",
      "State"           : "In-use for VSAN",
      "Reason"          : "None",
      "IsSSD"           : "1",
      "IsCapacityFlash" : "1",
      "IsPDL"           : "0",
      "Size(MB)"        : "30720",
      "FormatType"      : "512n",
   },
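When checking many disks, it can help to pull only the device names out of the vdq output. This is a sketch on sample text in the same shape as the output above (hypothetical file path; on a host you would pipe `vdq -iq` itself):

```shell
# Extract the device names from vdq-style output.
# The heredoc is abbreviated sample data mimicking `vdq -iq` output.
cat <<'EOF' > /tmp/vdq_sample.out
   "Name"  : "mpx.vmhba0:C0:T3:L0",
   "State" : "In-use for VSAN",
   "Name"  : "mpx.vmhba0:C0:T2:L0",
   "State" : "In-use for VSAN",
EOF
# Keep the Name lines and strip the key, quotes, and trailing comma.
grep '"Name"' /tmp/vdq_sample.out | sed 's/.*: "\(.*\)",/\1/'
```

The pipeline prints one bare device name per line.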
621
14-45 Python Scripts for Testing Systems
Python scripts are useful when introducing faults for testing purposes. Several scripts are
applicable to vSAN.
[root@sa-esxi-01:/usr/lib/vmware/vsan/bin] ls
VSANDeviceMonitor.py       upgrade-vsanmgmtd-config.pyc
VsanSystemCmd              vitRecoveryTool.pyc
__pycache__                vitd
clom-tool                  vitsafehd
clomd                      vsan-config.py
cmmdsAnalyzer.py           vsan-health-status.pyc
cmmdsTimeMachine.py        vsan-perfsvc-collector.py
cmmdsd                     vsanDiskFaultInjection.pyc
configVsanRP               vsanobserver
dbobjtool                  vsanobserver.sh
ddecomd                    vsanObserverObfuscated.sh
epd                        vsansparseRealign
fixDescriptors.py          vsanTraceCollector.pyc
iperf3                     vsanTraceReader
iperf3.copy                vsanTraceReader.py
killInaccessibleVms.py     vsanUpdateUuid.py
obfuscatecmmdsDump.py      vsandf.pyc
obfuscateLog.pyc           vsandpd
reboot_helper.py           vsandpd-support.py
rpd                        vsanmgmtd
slotfstool                 vsansvcctl.py
tokenBucket.py
622
14-46 Using Python to Inject Errors
You can use the vsanDiskFaultInjection.pyc script to introduce a hot-unplug failure
state into a storage device on a host.
623
14-47 About PowerCLI
PowerCLI delivers a set of cmdlets for managing vSAN.
The list shown does not include all cmdlets available for vSAN.
624
14-48 PowerCLI Commands: Example 1
An example of the Get-VsanDisk PowerCLI command is shown.
625
14-49 PowerCLI Commands: Example 2
An example of the Get-VsanDiskGroup PowerCLI command is shown.
626
14-50 ESXCLI Namespaces in vSAN
In vSAN 7, the esxcli vsan command offers the following namespaces and ESXCLI
functions.
# vmware -l
VMware ESXi 7.0 GA
# esxcli vsan
Usage: esxcli vsan {cmd} [cmd options]
Available Namespaces:
  cluster     Commands for vSAN host cluster configuration
  cmmds       Commands for vSAN CMMDS (Cluster monitoring, membership, and directory service)
  datastore   Commands for vSAN datastore configuration
  debug       Commands for vSAN debugging
  encryption  Commands for vSAN Encryption
  health      Commands for vSAN Health
  iscsi       Commands for vSAN iSCSI target configuration
  network     Commands for vSAN host network configuration
  perf        Commands for vSAN performance service configuration
627
14-51 Using the esxcli vsan network Command
You use the esxcli vsan network command to gather information about the vSAN
network and other network-related information.
Available Namespaces:
  ip       Commands for configuring IP network for vSAN.
  ipv4     Compatibility alias for "ip"
Available Commands:
  clear    Clear the vSAN network configuration.
  list     List the network configuration currently in use by vSAN.
  remove   Remove an interface from the vSAN network configuration.
  restore  Restore the persisted vSAN network configuration.
628
14-52 Using the esxcli vsan network list
Command
You use the esxcli vsan network list command to verify if the VMkernel port is
used by vSAN.
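For scripted checks, the interface name can be pulled out of the listing with awk. The heredoc below is abbreviated sample output in the shape of `esxcli vsan network list` (hypothetical file path; on a host you would pipe the live command):

```shell
# Sample output mimicking `esxcli vsan network list`.
cat <<'EOF' > /tmp/vsan_net_sample.out
Interface
   VmkNic Name: vmk2
   Traffic Type: vsan
EOF
# Print the VMkernel interface name that vSAN is using.
awk -F': ' '/VmkNic Name/ {print $2}' /tmp/vsan_net_sample.out
```

An empty result would mean no VMkernel port is configured for vSAN on that host.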
629
14-53 Activity: Using the esxcli vsan network
Command
The esxcli vsan network command is used to list network details, and to add and
remove the VMkernel adapter (vmknic) that provides vSAN network connectivity.
Does running the remove command on a host in a vSAN cluster create a network partition?
630
14-54 Activity: Using the esxcli vsan network
Command Solution
The esxcli vsan network command is used to list network details, and to add and
remove the VMkernel adapter (vmknic) that provides vSAN network connectivity.
Does running the remove command on a host in a vSAN cluster create a network partition?
Yes.
[root@sa-esxi-01:~] esxcli vsan network remove -i vmk2
[root@sa-esxi-01:~] esxcli vsan network list
[root@sa-esxi-01:~]
[root@sa-esxi-01:~] esxcli vsan network ipv4 add -i vmk2
[root@sa-esxi-01:~]
[root@sa-esxi-01:~] esxcli vsan network list
Interface
   VmkNic Name: vmk2
   IP Protocol: IP
   Interface UUID: 20fadb59-0ac3-4ba7-98a5-005056013df7
   Agent Group Multicast Address: 224.2.3.4
   Agent Group IPv6 Multicast Address: ff19::2:3:4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group IPv6 Multicast Address: ff19::1:2:3
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Multicast TTL: 5
   Traffic Type: vsan
[root@sa-esxi-01:~]
631
14-55 Using the ESXCLI Debug Namespace
You can use the debug namespace to troubleshoot vSAN.
Available Namespaces:
  disk        Debug commands for vSAN physical disks
  object      Debug commands for vSAN objects
  resync      Debug commands for vSAN resyncing objects
  advcfg      Debug commands for vSAN advanced configuration options.
  controller  Debug commands for vSAN disk controllers
  evacuation  Debug commands for simulating host, disk or disk group evacuation in various modes and their impact on objects in vSAN cluster
  limit       Debug commands for vSAN limits
  memory      Debug commands for vSAN memory consumption.
  mob         Debug commands for vSAN Managed Object Browser Service.
  vmdk        Debug commands for vSAN VMDKs
632
14-56 Activity: Using the esxcli vsan debug
Command
See the screenshot to answer questions about the esxcli vsan debug namespace.
• Which namespace is used to view the state of the virtual disks?
Available Namespaces:
  disk        Debug commands for vSAN physical disks
  object      Debug commands for vSAN objects
  resync      Debug commands for vSAN resyncing objects
  advcfg      Debug commands for vSAN advanced configuration options.
  controller  Debug commands for vSAN disk controllers
  evacuation  Debug commands for simulating host, disk or disk group evacuation in various modes and their impact on objects in vSAN cluster
  limit       Debug commands for vSAN limits
  memory      Debug commands for vSAN memory consumption.
  mob         Debug commands for vSAN Managed Object Browser Service.
  vmdk        Debug commands for vSAN VMDKs
633
14-57 Activity: Using the esxcli vsan debug
Command Solution
See the screenshot to answer questions about the esxcli vsan debug namespace.
• Which namespace is used to view the state of the virtual disks? The vmdk namespace.
Available Namespaces:
  disk        Debug commands for vSAN physical disks
  object      Debug commands for vSAN objects
  resync      Debug commands for vSAN resyncing objects
  advcfg      Debug commands for vSAN advanced configuration options.
  controller  Debug commands for vSAN disk controllers
  evacuation  Debug commands for simulating host, disk or disk group evacuation in various modes and their impact on objects in vSAN cluster
  limit       Debug commands for vSAN limits
  memory      Debug commands for vSAN memory consumption.
  mob         Debug commands for vSAN Managed Object Browser Service.
  vmdk        Debug commands for vSAN VMDKs
634
14-58 Using ESXCLI to Investigate Object
Health (1)
You use the esxcli vsan debug object overview command to display the overall
health of individual objects.
635
14-59 Using ESXCLI to Investigate Object
Health (2)
You use the esxcli vsan debug object health summary get command in
troubleshooting vSAN.
636
14-60 Using ESXCLI to Investigate VMDK Files
You use the esxcli vsan debug vmdk list command to display the health of
individual VMDK objects.
[root@sa-esxi-01:~] esxcli vsan debug vmdk list
Object: 14e4a15e-2898-70b1-9790-00505602b80e
   Health: reduced-availability-with-no-rebuild
   Type: vdisk
   Path: /vmfs/volumes/vsan:52d17529cc48e68f-81ac938c97a5941b/0de4a15e-f9e9-b5f4-54eb-00505602b80e/New Virtual Machine.vmdk
   Directory Name: N/A

Object: 0de4a15e-f9e9-b5f4-54eb-00505602b80e
   Health: reduced-availability-with-no-rebuild
   Type: vmnamespace
   Path: /vmfs/volumes/vsan:52d17529cc48e68f-81ac938c97a5941b/New Virtual Machine
   Directory Name: New Virtual Machine

Object: 6f813a5e-fa21-8792-3f80-00505602b80e
   Health: reduced-availability-with-no-rebuild
   Type: vmnamespace
   Path: /vmfs/volumes/vsan:52d17529cc48e68f-81ac938c97a5941b/.vsan.stats
   Directory Name: .vsan.stats
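On a large datastore, counting the objects in a degraded state can be quicker than reading the whole listing. A sketch on abbreviated sample output in the shape of the listing above (hypothetical file path; on a host you would pipe the live command):

```shell
# Sample output mimicking `esxcli vsan debug vmdk list`.
cat <<'EOF' > /tmp/vmdk_sample.out
Object: 14e4a15e-2898-70b1-9790-00505602b80e
   Health: reduced-availability-with-no-rebuild
Object: 0de4a15e-f9e9-b5f4-54eb-00505602b80e
   Health: healthy
EOF
# Count objects reporting reduced availability.
grep -c 'Health: reduced' /tmp/vmdk_sample.out
```

A count of zero means every object in the listing reports healthy.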
637
14-61 Activity: Using the esxcli vsan debug
vmdk list Command
The esxcli vsan debug vmdk list command can show other objects on the vSAN
datastore.
638
14-62 Activity: Using the esxcli vsan debug
vmdk list Command Solution
The esxcli vsan debug vmdk list command can show other objects on the vSAN
datastore.
639
14-63 vSAN Health Check Results: Overall
State
The esxcli vsan health cluster list command displays the overall state of the
cluster.
# esxcli vsan health cluster list
Health Test Name                                    Status
--------------------------------------------------  ----------
Overall health                                      green (OK)
Cluster                                             green
  ESXi vSAN Health service installation             green
  vSAN Health Service up-to-date                    green
  Advanced vSAN configuration in sync               green
  vSAN CLOMD liveness                               green
  vSAN Disk Balance                                 green
  Resync operations throttling                      green
  Software version compatibility                    green
  Disk format version                               green
Network                                             green
  Hosts disconnected from VC                        green
  Hosts with connectivity issues                    green
  vSAN cluster partition                            green
  All hosts have a vSAN vmknic configured           green
  vSAN: Basic (unicast) connectivity check          green
  vSAN: MTU check (ping with large packet size)     green
  vMotion: Basic (unicast) connectivity check       green
  vMotion: MTU check (ping with large packet size)  green
  Network latency check                             green
Data                                                green
  vSAN object health                                green
Limits                                              green
  Current cluster situation                         green
  After 1 additional host failure                   green
  Host component limit                              green
Physical disk                                       green
  Operation health                                  green
  Disk capacity                                     green
  Congestion                                        green
  Component limit health                            green
  Component metadata health                         green
  Memory pools (heaps)                              green
  Memory pools (slabs)                              green
Performance service                                 green
  Stats DB object                                   green
  Stats master election                             green
  Performance data collection                       green
  All hosts contributing stats                      green
  Stats DB object conflicts                         green
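When scanning a long health report, a simple filter that drops the green rows makes failures stand out. The heredoc is hypothetical sample output in the shape of `esxcli vsan health cluster list`:

```shell
# Sample output mimicking an unhealthy `esxcli vsan health cluster list`.
cat <<'EOF' > /tmp/health_sample.out
Overall health                 red (Network misconfiguration)
Network                        red
Hosts with connectivity issues red
vSAN Disk Balance              green
EOF
# Show only the checks that are not green.
grep -v 'green$' /tmp/health_sample.out
```

On a healthy cluster the filter prints nothing.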
640
14-64 Using ESXCLI to Investigate Health
Check Results
You use the esxcli vsan health cluster list command to display the most
recent health check results.
[root@sa-esxi-01:~] esxcli vsan health cluster list
(Two screenshots are shown side by side: one from a healthy cluster, where Overall health
and every health test report green (OK), and one from a cluster with a network
misconfiguration, where Overall health reports red along with Network, Hosts with
connectivity issues, Data, vSAN object health, Performance service, and Stats DB object,
while the remaining tests report green.)
641
14-65 vSAN Health Check Results: Query Failed
Tests
The esxcli vsan health cluster get -t "name of test" command can
query any failed test and show vSAN disks as absent.
[root@sa-esxi-01:~] esxcli vsan health cluster get -t "vSAN cluster partition"
vSAN cluster partition red

Partition list
Host  Partition  Host UUID
642
14-66 Activity: Using the esxcli vsan health
cluster get -t Command
The esxcli vsan health cluster get -t "name of test" command returns
the reason for the test result.
[root@sa-esxi-01:~] esxcli vsan health cluster get -t "vSAN cluster partition"
vSAN cluster partition red

Partition list
Host  Partition  Host UUID
643
14-67 Activity: Using the esxcli vsan health
cluster get -t Command Solution
The esxcli vsan health cluster get -t "name of test" command returns
the reason for the test result.
• Why does a warning appear? A vSAN cluster partition has been identified.
• What can you do to address the cluster partition? Verify vSAN host network connectivity.
[root@sa-esxi-01:~] esxcli vsan health cluster get -t "vSAN cluster partition"
vSAN cluster partition red

Partition list
Host  Partition  Host UUID
644
14-68 Using ESXCLI to Investigate vSAN
Controllers
You use the esxcli vsan debug controller list command to query the
controller for its statistics.
645
14-69 Activity: Using the esxcli vsan debug
controller list Command
You refer to the VMware Compatibility Guide and determine that the minimum I/O controller
queue depth for your environment is 512.
646
14-70 Activity: Using the esxcli vsan debug
controller list Command Solution
You refer to the VMware Compatibility Guide and determine that the minimum I/O controller
queue depth for your environment is 512.
Queue depth is important, as problems arise with controllers that have small queue depths.
Controllers with queue depths of less than 256 can affect VM I/O performance when vSAN is
rebuilding components, either because of a failure or when a host enters maintenance mode.
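The 256 threshold translates into a trivial check once the queue depth has been read from the controller listing. A sketch with a hypothetical sample value (on a host you would parse the value out of `esxcli vsan debug controller list`):

```shell
# Hypothetical queue depth value for illustration; a real check would
# read this from the controller listing on the host.
qd=128
# Flag controllers below the 256 queue-depth threshold.
if [ "$qd" -lt 256 ]; then
  echo "WARNING: queue depth $qd is below 256"
else
  echo "queue depth $qd is adequate"
fi
```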
647
14-71 Using ESXCLI to Investigate Fault
Domains
The esxcli vsan faultdomain get command retrieves details about the fault domain
membership of the hosts.
[root@sb-esxi-01:~] esxcli vsan faultdomain get
   Fault Domain Id: a054ccb4-ff68-4c73-cbc2-d272d45e32df
   Fault Domain Name: Preferred
[root@sb-esxi-01:~]
648
14-72 Activity: Using the esxcli vsan
faultdomain Command
The esxcli vsan faultdomain get command shows if a host is a member of a fault
domain.
649
14-73 Activity: Using the esxcli vsan
faultdomain Command Solution
The esxcli vsan faultdomain get command shows if a host is a member of a fault
domain.
650
14-74 Using ESXCLI to Investigate Drive Type
and Tier
You use the esxcli vsan storage list command to display information about all the
drives (also called disks) on a host.
[root@sb-esxi-03:~] esxcli vsan storage list
mpx.vmhba0:C0:T2:L0
   Device: mpx.vmhba0:C0:T2:L0
   Display Name: mpx.vmhba0:C0:T2:L0
   Is SSD: true
   VSAN UUID: 524e3cb7-d353-7519-7de0-cb0df5912c18
   VSAN Disk Group UUID: 529e4a13-76f6-f62c-09d6-4be300fd65c4
   VSAN Disk Group Name: mpx.vmhba0:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 11
   Deduplication: true
   Compression: true
   Checksum: 5086114152554878937
   Checksum OK: true
   Is Capacity Tier: true
   Encryption Metadata Checksum OK: true
   Encryption: false
   DiskKeyLoaded: false
   Is Mounted: true
   Creation Time: Wed Feb  5 09:18:47 2020

mpx.vmhba0:C0:T1:L0
   Device: mpx.vmhba0:C0:T1:L0
   Display Name: mpx.vmhba0:C0:T1:L0
   Is SSD: true
   VSAN UUID: 529e4a13-76f6-f62c-09d6-4be300fd65c4
   VSAN Disk Group UUID: 529e4a13-76f6-f62c-09d6-4be300fd65c4
   VSAN Disk Group Name: mpx.vmhba0:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 11
   Deduplication: true
   Compression: true
   Checksum: 408489704197683578
   Checksum OK: true
   Is Capacity Tier: false
   Encryption Metadata Checksum OK: true
   Encryption: false
   DiskKeyLoaded: false
   Is Mounted: true
   Creation Time: Wed Feb  5 09:18:47 2020
651
14-75 Activity: Using the esxcli vsan storage list
Command
The esxcli vsan storage list command displays details about each storage device
attached to the host.
mpx.vmhba1:C0:T1:L0
   Device: mpx.vmhba1:C0:T1:L0
   Display Name: mpx.vmhba1:C0:T1:L0
   Is SSD: true
   VSAN UUID: 52e4397e-cc90-71b3-c08d-c1e0e51dde49
   VSAN Disk Group UUID: 52e4397e-cc90-71b3-c08d-c1
   VSAN Disk Group Name: mpx.vmhba1:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 5
   Deduplication: false
   Compression: false
652
14-76 Activity: Using the esxcli vsan storage list
Command Solution
The esxcli vsan storage list command displays details about each storage device
attached to the host.
mpx.vmhba1:C0:T2:L0
   Device: mpx.vmhba1:C0:T2:L0
   Display Name: mpx.vmhba1:C0:T2:L0
   Is SSD: true
   VSAN UUID: 5286dd02-4a63-52aa-63f3-8aaee4139eaa
   VSAN Disk Group UUID: 52e4397e-cc90-71b3-c08d-c1
   VSAN Disk Group Name: mpx.vmhba1:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 5
   Deduplication: false
   Compression: false
   Checksum: 2162719562245289367
   Checksum OK: true
   Is Capacity Tier: true
   Encryption: false
   DiskKeyLoaded: false

mpx.vmhba1:C0:T1:L0
   Device: mpx.vmhba1:C0:T1:L0
   Display Name: mpx.vmhba1:C0:T1:L0
   Is SSD: true
   VSAN UUID: 52e4397e-cc90-71b3-c08d-c1e0e51dde49
   VSAN Disk Group UUID: 52e4397e-cc90-71b3-c08d-c1
   VSAN Disk Group Name: mpx.vmhba1:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 5
   Deduplication: false
   Compression: false
653
14-77 Using ESXCLI to Investigate iSCSI
Information
You use the esxcli vsan iscsi status get command to display iSCSI information
for a host.
654
14-78 Activity: Using the esxcli vsan iscsi
Command
The esxcli vsan iscsi status get command displays details about the vSAN iSCSI
service.
• Is iSCSI enabled?
655
14-79 Activity: Using the esxcli vsan iscsi
Command Solution
The esxcli vsan iscsi status get command displays details about the vSAN iSCSI
service.
656
14-80 Using ESXCLI to Investigate Cluster
Details
The esxcli vsan cluster command has several subcommands that enable the
administrator to manage the host's membership in a vSAN cluster.
[root@sa-esxi-04:~] esxcli vsan cluster
Usage: esxcli vsan cluster {cmd} [cmd options]
Available Namespaces:
  preferredfaultdomain  Commands for configuring a preferred fault domain for vSAN.
  unicastagent          Commands for configuring unicast agents for vSAN.
Available Commands:
  get      Get information about the vSAN cluster that this host is joined to.
  join     Join the host to a vSAN cluster.
  leave    Leave the vSAN cluster the host is currently joined to.
  new      Create a vSAN cluster with current host joined. A random sub-cluster UUID will be generated.
  restore  Restore the persisted vSAN cluster configuration.
657
14-81 Activity: Using the esxcli vsan cluster get
Command
The esxcli vsan cluster get command displays detailed information for the vSAN
cluster of which this host is a member.
658
14-82 Activity: Using the esxcli vsan cluster get
Command Solution
The esxcli vsan cluster get command displays detailed information for the vSAN
cluster of which this host is a member.
659
14-83 About Ruby vSphere Console
Ruby vSphere Console is a Linux console UI for vSphere on vCenter Server and is used for
managing and troubleshooting vSAN environments.
660
14-84 Logging In to the Ruby vSphere Console
(1)
To log in to the Ruby vSphere Console, you first connect to vCenter Server Appliance, log in as
root, and run the shell command to access the Bash shell.
You are prompted for a user@host user account. You must use a user who has administrator
privileges on vCenter, vSAN data center, and vSAN clusters, for example, the
administrator@vsphere.local user.
661
14-86 Navigating the vSphere and vSAN
Infrastructure
Ruby vSphere Console includes commands such as ls and cd to navigate the vSphere
infrastructure hierarchy. Press Ctrl+L to clear the screen.
> cd 1
/localhost> ls
0 SA-DC-01 (datacenter)
/localhost> cd 0
/localhost/SA-DC-01> ls
0 storage/
1 computers [host]/
2 networks [network]/
3 datastores [datastore]/
4 vms [vm]/
/localhost/SA-DC-01> cd 1
/localhost/SA-DC-01/computers> ls
0 SA-vSAN-01 (cluster): cpu 33 GHz, memory 3 GB
1 SB-vSAN-01 (cluster): cpu 5 GHz, memory 3 GB
2 Hot-Spare/
3 Witness-Nodes/
/localhost/SA-DC-01/computers> cd 0
/localhost/SA-DC-01/computers/SA-vSAN-01> ls
0 hosts/
1 resourcePool [Resources]: cpu 33.82/33.82/normal, mem 3.38/3.38/normal
/localhost/SA-DC-01/computers/SA-vSAN-01> ls 0
0 sa-esxi-02.vclass.local (host): cpu 1*4*2.80 GHz, memory 8.00 GB
1 sa-esxi-03.vclass.local (host): cpu 1*4*2.80 GHz, memory 8.00 GB
2 sa-esxi-04.vclass.local (host): cpu 1*4*2.80 GHz, memory 8.00 GB
3 sa-esxi-01.vclass.local (host): cpu 1*4*2.80 GHz, memory 8.00 GB
662
14-87 Using Ruby vSphere Console Help
Run the help vsan command to get a list of all available RVC commands related to vSAN
administration and management.
Commands:
apply_license_to_cluster: Apply license to VSAN
check_limits: Gathers (and checks) counters against limits
check_state: Checks state of VMs and VSAN objects
clear_disks_cache: Clear cached disks information
cluster_change_autoclaim: Enable/Disable autoclaim on a VSAN cluster
cluster_change_checksum: Enable/Disable VSAN checksum enforcement on a cluster
cluster_info: Print VSAN config info about a cluster or hosts
cluster_set_default_policy: Set default policy on a cluster
cmmds_find: CMMDS Find
disable_vsan_on_cluster: Disable VSAN on a cluster
disk_object_info: Fetch information about all VSAN objects on a given physical disk
disks_info: Print physical disk info about a host
disks_stats: Show stats on all disks in VSAN
enable_vsan_on_cluster: Enable VSAN on a cluster
enter_maintenance_mode: Put hosts into maintenance mode
  Choices for vsan-mode: ensureObjectAccessibility, evacuateAllData, noAction
663
14-88 Using the Ruby vSphere Console to List
vSAN Commands
Enter vsan. and press Tab twice to list all the available vSAN Ruby vSphere Console
commands and namespaces.
> vsan.
vsan.apply_license_to_cluster        vsan.lldpnetmap
vsan.check_limits                    vsan.obj_status_report
vsan.check_state                     vsan.object_info
vsan.clear_disks_cache               vsan.object_reconfigure
vsan.cluster_change_autoclaim        vsan.observer
vsan.cluster_change_checksum         vsan.observer_process_statsfile
vsan.cluster_info                    vsan.perf.
vsan.cluster_set_default_policy      vsan.proactive_rebalance
vsan.cmmds_find                      vsan.proactive_rebalance_info
vsan.disable_vsan_on_cluster         vsan.purge_inaccessible_vswp_objects
vsan.disk_object_info                vsan.reapply_vsan_vmknic_config
vsan.disks_info                      vsan.recover_spbm
vsan.disks_stats                     vsan.resync_dashboard
vsan.enable_vsan_on_cluster          vsan.scrubber_info
vsan.enter_maintenance_mode          vsan.sizing.
vsan.fix_renamed_vms                 vsan.stretchedcluster.
vsan.health.                         vsan.support_information
vsan.host_claim_disks_differently    vsan.v2_ondisk_upgrade
vsan.host_consume_disks              vsan.vm_object_info
vsan.host_evacuate_data              vsan.vm_perf_stats
vsan.host_exit_evacuation            vsan.vmdk_stats
vsan.host_info                       vsan.vsanmgmt.
vsan.host_wipe_non_vsan_disk         vsan.whatif_host_failures
vsan.host_wipe_vsan_disks
>
664
14-89 Viewing Host-Specific Information
The vsan.host_info command displays information about hosts participating in the vSAN
cluster.
665
14-90 Viewing Host-Specific Disk Information
The vsan.disks_info command displays disk information for a specific host.
/localhost/SA-DC-01/computers/SA-vSAN-01/hosts> vsan.disks_info 0
2020-06-02 19:19:21 +0000: Gathering disk information for host sa-esxi-02.vclass.local
2020-06-02 19:19:22 +0000: Done gathering disk information
Disks on host sa-esxi-02.vclass.local:
+-----------------------------------------+-------+-------+--------------------------+
| DisplayName                             | isSSD | Size  | State                    |
+-----------------------------------------+-------+-------+--------------------------+
| Local VMware Disk (mpx.vmhba0:C0:T3:L0) | SSD   | 20 GB | inUse                    |
| VMware Virtual disk                     |       |       | vSAN Format Version: v11 |
+-----------------------------------------+-------+-------+--------------------------+
| Local VMware Disk (mpx.vmhba0:C0:T2:L0) | SSD   | 20 GB | inUse                    |
| VMware Virtual disk                     |       |       | vSAN Format Version: v11 |
+-----------------------------------------+-------+-------+--------------------------+
| Local VMware Disk (mpx.vmhba0:C0:T1:L0) | SSD   | 10 GB | inUse                    |
| VMware Virtual disk                     |       |       | vSAN Format Version: v11 |
+-----------------------------------------+-------+-------+--------------------------+
666
14-91 Using the Ruby vSphere Console to
Investigate VM Objects
You use the vsan.check_state command to display the status of any invalid or
inaccessible VM or vSAN objects.
2020-06-02 19:20:03 +0000: Step 3: Check for VMs for which VC/hostd/vmx are out of sync
Did not find VMs for which VC/hostd/vmx are out of sync
Do not delete unassociated objects without further investigation, including the stats database or
iSCSI LUN objects.
667
14-92 About Unassociated Objects
An unassociated object is a vSAN object that has no association w ith a valid entity, such as a
VM.
Some objects are unassociated by definition, such as the perf stats db or iSCSI LUNs.
• The object consumes vSAN resources and must be rebalanced periodically, accommodated
for during maintenance, and so on.
• The resources that vSAN puts into managing these unassociated objects are wasted if the
unassociated objects are useless.
668
14-93 Signs of Unassociated Objects
Various signs indicate the existence of unassociated objects:
• Unexplained folders observed in the datastore explorer view indicate the existence of
objects that might be unassociated.
• A failure to create a VM on a vSAN datastore w ith sufficient available space often indicates
that vSAN has reached the component limit.
• All data is migrated off a disk group, but some space is still consumed.
• The calculated consumed space does not match the consumed space on the datastore.
669
14-94 Using the Ruby vSphere Console to
Investigate Unassociated Objects
You use the vsan.obj_status_report -t -u command to display the status of any
unassociated VM or vSAN objects.
/localhost/SA-DC-01/computers/SA-vSAN-01/hosts> vsan.obj_status_report 0 -t -u
2020-06-02 19:20:51 +0000: Querying all VMs on vSAN ...
2020-06-02 19:20:51 +0000: Querying DOM_OBJECT in the system from sa-esxi-02.vclass.local ...
2020-06-02 19:20:51 +0000: Querying all disks in the system from sa-esxi-02.vclass.local ...
2020-06-02 19:20:51 +0000: Querying LSOM_OBJECT in the system from sa-esxi-02.vclass.local ...
2020-06-02 19:20:51 +0000: Querying all object versions in the system ...
2020-06-02 19:20:52 +0000: Got all the info, computing table ...
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 4/4 (OK)                            | 9                            |
| 1/1 (OK)                            | 8                            |
| 3/3 (OK)                            | 1                            |
+-------------------------------------+------------------------------+
Total non-orphans: 18
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+
Total orphans: 0
670
14-95 Creation of Unassociated Objects (1)
Unassociated objects might exist for various reasons:
• The objects were associated with a vSphere Replication job that failed.
• The objects were created manually, outside vCenter Server or the hostd management
agent.
• Uploading files such as ISO images not associated with a VM creates unassociated
namespaces.
• The objects are remnants of VMs that were removed from inventory but not deleted from
disk.
• The objects are remnants from external software that does not delete vSAN objects
correctly. This condition is common when using third-party replication and application
virtualization software.
• The objects are associated with advanced vSAN configurations such as iSCSI or vSAN
performance metrics.
671
14-97 Using the Ruby vSphere Console to
Investigate a VM
You use the vsan.object_info command to gather details about VM objects.
672
14-98 Activity: Using the vsan.object_info
Command
The vsan.object_info command, followed by an object's UUID, displays details for
specific objects.
673
14-99 Activity: Using the vsan.object_info
Command Solution
The vsan.object_info command, followed by an object's UUID, displays details for
specific objects.
• What is preventing the VM from starting? Not enough components exist for a quorum.
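The quorum answer follows from simple vote arithmetic: an object is accessible only while more than half of its votes remain available. A simplified sketch with hypothetical numbers (real vSAN voting also includes witness components, which this simplification ignores):

```shell
# Hypothetical vote counts for illustration of the quorum rule.
total_votes=3
available_votes=1
# A strict majority is required: more than half of all votes.
needed=$(( total_votes / 2 + 1 ))
if [ "$available_votes" -ge "$needed" ]; then
  echo "quorum: object accessible"
else
  echo "no quorum: object inaccessible ($available_votes of $needed votes)"
fi
```

With one of three votes available the object cannot reach the required majority of two, matching the "not enough components for a quorum" result.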
674
14-100 Using the Ruby vSphere Console to
Investigate Swap Objects
You use the vsan.purge_inaccessible_vswp_objects command to purge any
inaccessible VSWP objects.
The feature to purge inaccessible VSWP objects is also available in the vSAN Object Health
health check in the vSphere Client.
675
14-101 Using the Ruby vSphere Console to
Investigate Object Status
You use the vsan.obj_status_report command to display a status report for all vSAN
objects.
676
14-102 Activity: Using the
vsan.obj_status_report Command
The vsan.obj_status_report command displays a summary of information for all vSAN
objects.
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 3/3 (OK)                            | 8                            |
| 4/4 (OK)                            | 2                            |
| 1/1 (OK)                            | 2                            |
+-------------------------------------+------------------------------+
Total non-orphans: 12
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+
Total orphans: 0
Total v1 objects: 0
Total v2 objects: 0
Total v2.5 objects: 0
Total v3 objects: 0
Total v5 objects: 12
677
14-103 Activity: Using the vsan.obj_status_report Command Solution
The vsan.obj_status_report command displays a summary of information for all vSAN
objects.
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 3/3 (OK)                            | 8                            |
| 4/4 (OK)                            | 2                            |
| 1/1 (OK)                            | 2                            |
+-------------------------------------+------------------------------+
Total non-orphans: 12
+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+
Total orphans: 0
Total v1 objects: 0
Total v2 objects: 0
Total v2.5 objects: 0
Total v3 objects: 0
Total v5 objects: 12
678
14-104 Using the Ruby vSphere Console to
Predict Failures
The vsan.whatif_host_failures command runs a simulation on the cluster to predict
storage usage if a host failure occurs.
+-----------------+------------------------------+-----------------------------------+
| Resource        | Usage right now              | Usage after failure/re-protection |
+-----------------+------------------------------+-----------------------------------+
| HDD capacity    | 11% used (71.33 GB free)     | 14% used (51.34 GB free)          |
| Components      | 1% used (2966 available)     | 2% used (2216 available)          |
| RC reservations | 0% used (0.00 GB free)       | 0% used (0.00 GB free)            |
+-----------------+------------------------------+-----------------------------------+
You can also view predictive storage usage information in the Skyline Health > Limits - After
one additional host failure check.
679
14-105 Review of Learner Objectives
• Use vsantop to view vSAN performance metrics
• Discuss how to run commands from the vCenter Server and ESXi command lines
680
14-106 Lesson 3: Useful Log Files
681
14-108 Log Files for vSAN
Several logs are useful when troubleshooting vSAN 7.0.
Log files located in the ESXi host /var/log directory are:
• boot.gz
• clomd.log
• hostd.log
• vmkernel.log
• vmkwarning.log
• vobd.log
• vsanmgmt.log
• vsantraces.gz
• vsanvpd.log
Log files located in the vCenter Server /var/log/vmware and
/var/log/vmware/vsan-health/ directories are:
• sps.log
• vsanvcmgmtd.log
• vmware-vsan-health-service.log
682
14-109 Examining boot.gz
The boot.gz log captures everything that happens during the boot process.
Firmware validation:
[root@sb-esxi-04:/var/log] zcat boot.gz | grep -i firmware
2020-03-03T23:07:29.543Z cpu0:266071)lsi_mr3: mfiGetAdapterInfo:1606: firmware version 25.5.4.0006
Validating the successful mounting of vSAN disks during system boot up:
[root@sb-esxi-04:/var/log] zcat boot.gz | grep PLOGAnnounceSSD
2020-03-03T23:06:25.143Z cpu0:262477)PLOG: PLOGAnnounceSSD:8123: Trace task started for device 527df3bd-c551-5da4-4afb-ba4932ed9a45
2020-03-03T23:06:25.143Z cpu0:262477)PLOG: PLOGAnnounceSSD:8136: Successfully added VSAN SSD (mpx.vmhba0:C0:T1:L0:2) with UUID 527df3bd-c551-5da4-4afb-ba4932ed9a45. kt 1, en 0, enc 0.
2020-03-03T23:06:43.467Z cpu1:262477)PLOG: PLOGAnnounceSSD:8136: Successfully added VSAN SSD (mpx.vmhba0:C0:T1:L0:2) with UUID 527df3bd-c551-5da4-4afb-ba4932ed9a45. kt 1, en 0, enc 0.
The boot.gz log is a compressed log file that captures everything that happens during the
boot process. Everything that happens from the time that the host is started is captured in the
boot.gz log. Because this log is compressed, you can extract the file to read or use the zcat
tool to view it from the host.
• To find the firmware of the controller: Not all controllers present their firmware to the ESXi
host after the boot process, but most do present their firmware during the boot process.
As a caution, verify the time stamp for the firmware. If the controller's firmware has been
updated but the host has not been rebooted, the firmware version number might not be
what is running on the ESXi host.
• If local storage devices do not mount during the ESXi host boot cycle, look for errors and
issues related to disk mounting during the boot process in the boot.gz file.
After a successful boot, you will see a success message and disk UUID listed for all vSAN
disks. Seeing the success message and disk UUIDs means that these disks can be
successfully read by the LLOG and PLOG.
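The zcat workflow described above can be sketched on any Linux shell. The sample entries below are fabricated stand-ins for real boot.gz content, and the /tmp path is a placeholder:

```shell
# Build a small compressed sample resembling boot.gz entries (fabricated data,
# not from a live host), then search it without extracting it to disk.
printf '%s\n' \
  'cpu0:266071)lsi_mr3: mfiGetAdapterInfo:1606: firmware version 25.5.4.0006' \
  'cpu0:262477)PLOG: PLOGAnnounceSSD:8136: Successfully added VSAN SSD' \
  | gzip > /tmp/boot.gz

# Controller firmware reported during boot:
zcat /tmp/boot.gz | grep -i firmware

# Confirm that vSAN disks were announced successfully during boot:
zcat /tmp/boot.gz | grep PLOGAnnounceSSD
```

On a real host, you would run the same zcat pipelines against /var/log/boot.gz, as shown on the slide.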
683
14-110 Examining clomd.log
The clomd.log log is the cluster-level object manager daemon log.
• The DECOM_STATE related to the host's maintenance mode evacuation option settings
2020-05-20T15:47:36.481Z 263597 info clomd(30263987712) (Originator@6876) CLOM_ProcessDecomUpdate: Node 5e234bf7-7b37-1d9d-e55a-0050560154ed state change. Old: DECOM_STATE_NONE New: DECOM_STATE_ACTIVE Mode:1 JobUuid:00000000-0000-0000-0000-000000000000
... CLOM_ProcessDecomUpdate: Node 5e234bf7-7b37-1d9d-e55a-0050560154ed state change. Old: DECOM_STATE_ACTIVE New: DECOM_STATE_AUDIT Mode:1 JobUuid:4d1a3a46-9ff3-b799-416c-390fbc3b12f9
The clomd.log log is the cluster-level object manager daemon log. CLOM ensures that all
objects are compliant with their storage policies.
You can see which maintenance mode evacuation option is applied, and at what time any state
change took place.
Every hour, the clomd.log log reports how many magnetic disks are in the environment.
In this context, the term magnetic disks (MDs) refers to capacity devices, whether they are
spinning disks or SSDs, and SSDs refers to cache devices.
In the second screenshot, you begin by looking at the time stamp. You can see that, at 15:48:16,
eight magnetic or capacity disks and four SSD or cache disks are available in the vSAN
environment. If the number of available storage devices later is less than eight and four, you can
see at what time the loss of disks occurred and compare the recorded time of occurrence with
other logs.
Also, the clomd.log log is helpful for detecting object state changes, for example, healthy or
absent.
Every 24 hours, the clomd.log log reports how many ESXi hosts are in the vSAN cluster.
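Extracting the state transitions from clomd.log can be sketched with grep. The entries below are fabricated clomd.log-style lines, and the /tmp path is a placeholder:

```shell
# Fabricated clomd.log-style entries showing a maintenance mode decommission
# state change; grep -o extracts just the transitions, in order.
cat > /tmp/clomd.sample <<'EOF'
2020-05-20T15:47:36.481Z info clomd CLOM_ProcessDecomUpdate: Old: DECOM_STATE_NONE New: DECOM_STATE_ACTIVE
2020-05-20T15:47:37.002Z info clomd CLOM_ProcessDecomUpdate: Old: DECOM_STATE_ACTIVE New: DECOM_STATE_AUDIT
EOF
grep -o 'Old: [A-Z_]* New: [A-Z_]*' /tmp/clomd.sample
```

The ordered transitions show exactly when each maintenance mode state change took place, which you can then correlate with timestamps in other logs.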
684
14-111 Examining hostd.log
The hostd.log log tracks anything related to VM activities:
• Power On
• Power Off
• Resets
• Reconfiguration
2020-03-03T21:34:02.960Z: [netCorrelator] 38903692us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
2020-03-03T21:34:02.960Z: [scsiCorrelator] 38977221us: [vob.scsi.scsipath.add] Add path: vmhba0:C0:T0:L0
2020-03-03T21:34:02.960Z: [netCorrelator] 39833699us: [vob.net.vmnic.linkstate.down] vmnic vmnic5 linkstate down
2020-03-03T21:34:02.960Z: [netCorrelator] 39833728us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2020-03-03T21:34:03.960Z: [netCorrelator] 39833699us: [esx.problem.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
The hostd.log log is important to vSAN in that it records everything related to VM
operations. In situations where VMs cannot access storage, you should see an entry in the
hostd.log log indicating that the VM is having an issue talking with its storage. For example, it
cannot find its component or it cannot write to storage.
The first screenshot says VM_STATE_RECONFIGURING and indicates that a change was
made to the VM. You can see exactly when the change occurred and then use the
vmware.log log to find more information about what happened. The vmware.log log is
inside the namespace for the VM.
In the second screenshot, the hostd.log log reports the link-state of vmnic0 as down. A
vmnic or physical NIC with a link-state of down indicates that a host is disconnected from the
vSAN network, which prevents VMs on that host from accessing their objects. The use of NIC
teaming can prevent this scenario.
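Isolating the VM state-change entries from hostd.log can be sketched the same way. The entries below are fabricated hostd.log-style lines with placeholder times and paths:

```shell
# Fabricated hostd.log-style entries. Filtering the state-change lines gives
# the timestamps to match against the VM's own vmware.log.
cat > /tmp/hostd.sample <<'EOF'
2020-03-03T21:33:10Z info hostd[2099653] VM_STATE_ON -> VM_STATE_RECONFIGURING
2020-03-03T21:33:12Z info hostd[2099653] VM_STATE_RECONFIGURING -> VM_STATE_ON
2020-03-03T21:34:02Z info hostd[2099653] unrelated entry
EOF
grep RECONFIGURING /tmp/hostd.sample
```

With the timestamps of the reconfiguration in hand, you can open the vmware.log in the VM's namespace and read what happened at that moment.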
685
14-112 Activity: Mounting vSAN Disks Issues
You cannot add a disk group to vSAN. Answer the questions based on the screenshot of the
hostd.log log.
686
14-113 Activity: Mounting vSAN Disks Issues
Solution
You cannot add a disk group to vSAN. Answer the questions based on the screenshot of the
hostd.log log.
• What is the issue detected by LSOM? Corrupt Redolog.
Corrupt Redolog indicates some sort of corrupt record in LLOG and PLOG. If a corrupt LLOG
or PLOG record is preventing a disk group from coming back online, the disk group must be
deleted and recreated. vSAN automatically rebuilds lost components. If the components cannot
be rebuilt, for example, if no other RAID-1 mirrors or insufficient RAID-5 or RAID-6 stripes exist,
then the VM data must be restored from backup.
687
14-114 Examining vmkernel.log (1)
The vmkernel.log log records activities related to VMs and ESXi. Useful records when
troubleshooting are:
• Timestamps
• Heartbeat timeouts
CMMDS: MasterSendHeartbeatRequest:1474: Sending a reliable heartbeat request to 5a1fe47d-e4ef-ba4b-a1c3-d094660509fd
CMMDS: CMMDSHeartbeatRequestHBWork:844: Request heartbeat: Retry the operation.
CMMDS: CMMDSHeartbeatCheckHBLogWork:726: Check node returned Failure for node 5a1fe47d-e4ef-ba4b-a1c3-d094660509fd count 5
CMMDS: CMMDSStateDestroyNode:676: Destroying node 5a1fe47d-e4ef-ba4b-a1c3-d094660509fd: Heartbeat timeout
CMMDS: MasterSendHeartbeatRequest:1474: Sending a reliable heartbeat request to 5a1fe491-6612-be51-ae48-d0946604e641
CMMDS: CMMDSHeartbeatRequestHBWork:844: Request heartbeat: Success.
CMMDS: MasterRxHeartbeatRequest:2184: Replied to a reliable heartbeat request. Last msg sent: 238 ms back
CMMDS: RejoinRxMasterHeartbeat:1941: Saw self listed in master heartbeat
688
14-115 Examining vmkernel.log (2)
For vSAN, important records in the vmkernel.log log are SCSI read and write errors.
WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.ATA___SATA_SDD_8450075052400104305" state in doubt; requested fast path state update
ScsiDeviceIO: 2927: Cmd(0x439d41008d00) 0x28, CmdSN 0x1b1252 from world 67389 to dev "t10.ATA___SATA_SDD_8450075052400104305" failed H:0x2 D:0x0 P:0x0 Invalid
Events within the ESXi host are important to record. Timestamps are a critical piece of
information because they tell you when the issue occurred. Using the date and time provided by
the time stamp, you can look at all other logs to see what occurred at the time of the issue.
Additional information recorded in the vmkernel.log file includes issues with reads and
writes with the local disks.
If vSAN sees a 0x2A error associated with a disk UUID, vSAN takes that disk offline.
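Surfacing the failed SCSI commands from vmkernel.log can be sketched with grep. The entries below are fabricated vmkernel.log-style lines with a placeholder device name:

```shell
# Fabricated vmkernel.log-style SCSI entries. The grep surfaces failed
# commands; on a real host the sense data (for example, 0x2A) would appear
# in the same lines.
cat > /tmp/vmkernel.sample <<'EOF'
ScsiDeviceIO: 2927: Cmd(0x439d41008d00) 0x28 to dev "t10.ATA_SAMPLE" failed H:0x2 D:0x0 P:0x0
VSCSI: 1234: handle 8194 completed ok
EOF
grep 'failed H:0x' /tmp/vmkernel.sample
```

The timestamp on each matching line tells you when the read or write error occurred, so you can check the other logs for the same moment.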
689
14-116 Examining vmkwarning.log
The vmkwarning.log log records activities related to VMs. You can use the grep and
tail commands to search vmkwarning.log for indications of the problem.
Use the grep command to search the vmkwarning.log file for unregistered devices.
# grep unregistered vmkwarning.log
2017-05-30T14:11:46.145Z cpu0:67559 opID=a4e71359)WARNING: NMP: nmpUnclaimPath:1579: Physical path "vmhba0:C0:T2:L0" is the last path to NMP device "Unregistered Device". The device has been unregistered.
2017-05-30T14:11:52.830Z cpu0:67527 opID=bc1f970d)WARNING: NMP: nmpUnclaimPath:1579: Physical path "vmhba0:C0:T1:L0" is the last path to NMP device "Unregistered Device". The device has been unregistered.
2017-05-30T14:12:18.636Z cpu0:67559 opID=4a68e3cf)WARNING: NMP: nmpUnclaimPath:1579: Physical path "vmhba0:C0:T2:L0" is the last path to NMP device "Unregistered Device". The device has been unregistered.
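The grep-and-tail technique can be sketched against a sample file. The content below is a fabricated vmkwarning.log-style line, and the /tmp path is a placeholder:

```shell
# Fabricated vmkwarning.log-style content. tail limits the search to recent
# entries; grep -c -i counts matching lines case-insensitively.
cat > /tmp/vmkwarning.sample <<'EOF'
WARNING: NMP: nmpUnclaimPath:1579: Physical path "vmhba0:C0:T2:L0" is the last path to NMP device "Unregistered Device". The device has been unregistered.
WARNING: ScsiDeviceIO: unrelated warning
EOF
tail -n 100 /tmp/vmkwarning.sample | grep -c -i unregistered
```

On a real host, the same pipeline against /var/log/vmkwarning.log quickly shows whether, and how often, a device has been unregistered.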
690
14-117 Examining vobd.log
The vobd.log log records storage and network-related activities:
• Disk latency
2020-05-07T00:32:17.816Z: [GenericCorrelator] 22024925476us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-05-07T00:32:17.816Z: [UserLevelCorrelator] 22024925476us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-05-07T00:32:17.816Z: [UserLevelCorrelator] 22024925932us: [esx.audit.maintenancemode.entering] The host has begun entering maintenance mode.
2020-05-07T00:32:19.846Z: [UserLevelCorrelator] 22026955146us: [vob.user.maintenancemode.entered] The host has entered maintenance mode
2020-05-07T00:32:19.846Z: [UserLevelCorrelator] 22026955745us: [esx.audit.maintenancemode.entered] The host has entered maintenance mode.
2020-05-07T00:32:19.846Z: [GenericCorrelator] 22026955146us: [vob.user.maintenancemode.entered] The host has entered maintenance mode
2020-05-07T00:41:00.927Z: [UserLevelCorrelator] 22548035986us: [vob.user.maintenancemode.exited] The host has exited maintenance mode
2020-05-07T00:41:00.927Z: [GenericCorrelator] 22548035986us: [vob.user.maintenancemode.exited] The host has exited maintenance mode
2020-05-07T00:41:00.927Z: [UserLevelCorrelator] 22548037288us: [esx.audit.maintenancemode.exited] The host has exited maintenance mode.
2020-05-06T18:25:37.528Z: [netCorrelator] 27298830us: [vob.net.vmnic.linkstate.up] vmnic vmnic3 linkstate up
2020-05-06T18:25:37.544Z: [netCorrelator] 27315774us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2020-05-06T18:25:37.559Z: [netCorrelator] 27329213us: [vob.net.vmnic.linkstate.up] vmnic vmnic1 linkstate up
2020-05-06T18:25:37.576Z: [netCorrelator] 27347089us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up
The vobd.log file records latency information about the individual disks, such as latency
issues.
After an individual disk is recorded as having latency issues, you can verify if it is healthy and
when the latency is happening, such as during intensive read or write operations. You can look
deeper to see if the disk is a capacity disk or a cache disk. If the issue is with a cache disk in a
hybrid environment, both reads and writes can be affected.
A caveat with maintenance mode recording in the vobd.log file is that the selected maintenance
mode evacuation option is not recorded, but the timestamp is present and useful for
cross-referencing other log files.
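Pulling just the maintenance mode timestamps out of vobd.log for cross-referencing can be sketched as follows. The entries below are fabricated vobd.log-style lines with a placeholder path:

```shell
# Fabricated vobd.log-style maintenance mode entries. awk prints only the
# timestamps, ready to compare against the same moments in other host logs.
cat > /tmp/vobd.sample <<'EOF'
2020-05-07T00:32:17.816Z: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-05-07T00:32:19.846Z: [vob.user.maintenancemode.entered] The host has entered maintenance mode
2020-05-07T00:41:00.927Z: [vob.user.maintenancemode.exited] The host has exited maintenance mode
EOF
grep maintenancemode /tmp/vobd.sample | awk '{print $1}'
```

The extracted timestamps mark when the host entered and exited maintenance mode, which compensates for the evacuation option itself not being recorded.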
691
14-118 vobd.log: Device Repaired
After the error condition is cleared, the vobd.log log shows the device coming back online.
Device Coming Online
692
14-119 Examining vsanmgmt.log
The vsanmgmt.log log records connections from vCenter Server to vSAN for the
performance service and health check.
Ping Test
2020-05-06T20:00:48.402Z info vsand[263976] [opID-03703ece VsanHealthPing::PingTest] Ready to send ping with id= 28568, seq= 1
2020-05-06T20:00:48.407Z info vsand[263976] [opID-03703ece VsanHealthPing::_parseRecvPacket] Pinger: all host response come back, ping done Seq:1, size:64
2020-05-06T20:00:48.411Z info vsand[263976] [opID-03703ece VsanHealthPing::Ping] Run ping test for the hosts ['172.20.12.52', '172.20.12.51'] from local 172.20.12.53
2020-05-06T20:00:48.411Z info vsand[263976] [opID-03703ece VsanHealthPing::PingTest] Ready to send ping with id= 28568, seq= 2
2020-05-06T20:00:48.415Z info vsand[263976] [opID-03703ece VsanHealthPing::_parseRecvPacket] Pinger: all host response come back, ping done Seq:2, size:64
2020-05-06T20:00:48.417Z info vsand[263976] [opID-03703ece VsanHealthPing::Ping] Run ping test for the hosts ['172.20.12.52', '172.20.12.51'] from local 172.20.12.53
2020-05-06T20:00:48.417Z info vsand[263976] [opID-03703ece VsanHealthPing::PingTest] Ready to send ping with id= 28568, seq= 3
2020-05-06T20:00:48.422Z info vsand[263976] [opID-03703ece VsanHealthPing::_parseRecvPacket] Pinger: all host response come back, ping done Seq:3, size:64
2020-05-06T20:00:48.426Z info vsand[263976] [opID-03703ece VsanHealthPing::Ping] Pinger: ping target number: 2
2020-05-06T20:00:48.426Z info vsand[263976] [opID-03703ece VsanHealthPing::Ping] Run ping test for the hosts ['172.20.12.52', '172.20.12.51'] from local 172.20.12.53
693
14-120 Examining vmware-vsan-health-service.log
The vmware-vsan-health-service.log log on vCenter Server records health issues
useful for troubleshooting vSAN issues.
The vmware-vsan-health-service.log file on vCenter Server in the
/var/log/vmware/vsan-health directory is useful for troubleshooting File Service
enablement and ESX Agent deployment issues.
694
14-121 Lab 17: Reviewing the Troubleshooting
Lab Environment
Review information to become familiar with the troubleshooting lab environment:
695
14-124 Lab 20: Troubleshooting the Two-Node
vSAN Cluster Configuration Issue
Diagnose and fix the vSAN cluster problem:
696
14-127 Lab 23: Troubleshooting the vSAN
Cluster Configuration Issue (1)
Diagnose and fix the vSAN cluster problem:
697
14-130 Lab 26: Troubleshooting the vSAN Cluster Configuration Issue (4)
Diagnose and fix the vSAN cluster problem:
698
14-132 Review of Learner Objectives
• Explain which log files are useful for vSAN troubleshooting
• With ESXCLI commands and the vsantop utility, administrators can investigate problems
with specific hosts from the command line.
• You can use the RVC to view information about your vSAN environment when that
information is unavailable through ESXCLI commands.
• Including log files as a troubleshooting tool gives administrators an in-depth view of vSAN
operations.
Questions?
699
700