
VMware vSAN: Fast Track V7
Lecture Manual

VMware® Education Services


VMware, Inc.
www.vmware.com/education


VMware vSAN: Fast Track [V7]

Lecture Manual

VMware vSAN™

Part Number EDU-EN-VSANFT7-LEC (02-SEP-2022)

Copyright © 2022 VMware, Inc. All rights reserved. This manual and its accompanying materials are
protected by U.S. and international copyright and intellectual property laws. VMware products are covered
by one or more patents listed at http://www.vmware.com/go/patents. VMware is a registered trademark or
trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names
mentioned herein may be trademarks of their respective companies. VMware vSphere® with VMware
Tanzu™, VMware vSphere® vMotion®, VMware vSphere® Web Client, VMware vSphere® Virtual Volumes™,
VMware vSphere® Syslog Collector, VMware vSphere® Storage vMotion®, VMware vSphere® Replication™,
VMware vSphere® Lifecycle Manager™, VMware vSphere® High Availability, VMware vSphere® Fault
Tolerance, VMware vSphere® ESXi™ Shell, VMware vSphere® ESXi™ Dump Collector, VMware vSphere®
Distributed Switch™, VMware vSphere® Distributed Resource Scheduler™, VMware vSphere® Distributed
Power Management™, VMware vSphere® Client™, VMware vSphere® Add-on for Kubernetes, VMware
vSphere® API for Storage Awareness™, VMware vSphere® 2015, VMware vSphere®, VMware vSAN™
Enterprise Plus, VMware vSAN™, VMware vRealize® Operations™ Enterprise, VMware vRealize®
Operations™, VMware vRealize® Operations™ Standard, VMware vRealize® Operations™ Advanced,
VMware vCloud® Air™ Network, VMware vCenter® Server Appliance™, VMware vCenter Server®, VMware
Virtual SAN™, VMware View®, VMware Horizon® View™, VMware Verify™, VMware Skyline™ Health,
VMware Horizon® 7, VMware Horizon® 7, VMware Horizon® 7 on VMware Cloud™ on AWS, VMware HCI
Mesh™, VMware Customer Connect™, VMware vSphere® VMFS, Stretched Clusters for VMware Cloud™ on
AWS, VMware vSphere® Storage I/O Control, VMware Skyline Collector™, VMware Skyline Advisor™,
VMware Site Recovery Manager™, VMware PowerCLI™, VMware Platform Services Controller™, VMware
Photon™, VMware vSphere® Network I/O Control, VMware Lab Connect™, VMware Pivotal Labs® Health
Check™, VMware Go™, VMware vSphere® Flash Read Cache™, Enhanced vMotion™ Compatibility, VMware
ESXi™, VMware ESX®, VMware vSphere® Distributed Resource Scheduler™, and VMware ACE™ are
registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions.

The training material is provided "as is," and all express or implied conditions, representations, and warranties,
including any implied warranty of merchantability, fitness for a particular purpose or noninfringement, are
disclaimed, even if VMware, Inc., has been advised of the possibility of such claims. This material is designed
to be used for reference purposes in conjunction with a training course.

The training material is not a standalone training tool. Use of the training material for self-study without class
attendance is not recommended. These materials and the computer programs to which they relate are the
property of, and embody trade secrets and confidential information proprietary to, VMware, Inc., and may
not be reproduced, copied, disclosed, transferred, adapted, or modified without the express written approval
of VMware, Inc.

www.vmware.com/education

Contents

Module 1 Course Introduction

1-2 Course Introduction
1-3 Importance
1-4 Learner Objectives (1)
1-5 Learner Objectives (2)
1-6 Course Outline
1-7 Typographical Conventions
1-8 References
1-9 VMware Online Resources
1-10 VMware Learning Overview
1-11 VMware Certification Overview
1-12 VMware Credentials Overview

Module 2 Introduction to vSAN

2-2 Importance
2-3 Module Lessons
2-4 Lesson 1: Introduction to vSAN
2-5 Learner Objectives
2-6 About vSAN
2-7 vSAN Node Minimum Requirements
2-8 About the vSAN Datastore
2-9 vSAN Datastore Characteristics
2-10 vSAN Disk Groups
2-11 Hybrid Disk Groups
2-12 All-Flash Disk Groups
2-13 vSAN Storage Policies
2-14 vSAN RAID Types
2-15 Multiple Storage Policies
2-16 vSAN Storage Policy Resilience
2-17 Integrating vSAN with vSphere HA
2-18 Integrating vSAN with VMware Products (1)
2-19 Integrating vSAN with VMware Products (2)
2-20 vSAN Use Cases
2-21 vSAN Licensing
2-22 vSAN Licensing Differences (1)
2-23 vSAN Licensing Differences (2)
2-24 Review of Learner Objectives
2-25 Lesson 2: vSAN Objects and Components
2-26 Learner Objectives
2-27 vSAN and Object-Based Storage
2-28 About vSAN Storage Policies
2-29 Default vSAN Storage Policy
2-30 About Objects and Components
2-31 Component Replicas and Copies
2-32 Object Accessibility
2-33 About Witnesses
2-34 Example: Witness
2-35 Large vSAN Objects
2-36 Review of Learner Objectives
2-37 Lesson 3: vSAN Software Underlying Architecture
2-38 Learner Objectives
2-39 vSAN Architectural Components
2-40 vSAN Architecture Analogy: Building a House
2-41 CLOM and Its Role: Architect
2-42 DOM and Its Role: Contractor (1)
2-43 DOM and Its Role: Contractor (2)
2-44 LSOM and Its Role: Worker
2-45 CMMDS and Its Role: Project Manager
2-46 RDT and Its Role: Delivery Truck
2-47 Activity: vSAN Component Layer
2-48 Activity: vSAN Component Layer Solution
2-49 Component Interaction: Architect and Contractor
2-50 Component Interaction: Contractor and Worker
2-51 Component Interaction: Architect, Contractor, and Project Manager
2-52 Activity: Drive Status Reporting
2-53 Activity: Drive Status Reporting Solution
2-54 Activity: Physical Space
2-55 Activity: Physical Space Solution
2-56 Step-by-Step VM Creation (1)
2-57 Step-by-Step VM Creation (2)
2-58 Step-by-Step VM Creation (3)
2-59 Beginning-to-End VM Creation
2-60 Activity: VM Creation
2-61 Activity: VM Creation Solution
2-62 Lab 1: Reviewing the Lab Environment
2-63 Review of Learner Objectives
2-64 Key Points

Module 3 Planning a vSAN Cluster

3-2 Importance
3-3 Module Lessons
3-4 Lesson 1: vSAN Requirements
3-5 Learner Objectives
3-6 vSAN Cluster Requirements
3-7 vSAN Configuration Minimums and Maximums
3-8 vSAN Host CPU Requirements
3-9 vSAN Host Memory Requirements
3-10 vSAN Host Network Requirements
3-11 vSAN Host Storage Controllers
3-12 vSAN Host Boot Device Requirements
3-13 About Hard Disk Drives
3-14 Solid-State Devices
3-15 vSAN Limitations
3-16 Review of Learner Objectives
3-17 Lesson 2: Planning Capacity for vSAN Clusters
3-18 Learner Objectives
3-19 Capacity-Sizing Guidelines
3-20 vSAN Reserved Capacity
3-21 Planning for Failures to Tolerate
3-22 Planning Capacity for VMs
3-23 Plan and Design Consideration: VM Home Namespace Objects
3-24 Plan and Design Consideration: VMDK and Snapshot Objects
3-25 Plan and Design Consideration: VM Swap Object
3-26 vSAN Cache Tiers
3-27 vSAN Capacity Tiers
3-28 Magnetic Devices for Capacity Tiers
3-29 Flash Devices for Capacity Tiers
3-30 Multiple vSAN Disk Groups
3-31 About vSAN Cluster Scaling
3-32 Planning for Scaling Up
3-33 Planning for Scaling Out
3-34 Using the VMware Compatibility Guide
3-35 Review of Learner Objectives
3-36 Lesson 3: Designing a vSAN Network
3-37 Learner Objectives
3-38 vSAN Networking Overview
3-39 Designing a vSAN Network
3-40 NIC Teaming and Failover
3-41 Unicast Support
3-42 Network I/O Control
3-43 Priority Tagging and Isolating vSAN Traffic
3-44 Jumbo Frames
3-45 vSAN Network Requirements
3-46 vSAN Communication Ports
3-47 vSAN Network Best Practices
3-48 Review of Learner Objectives
3-49 Key Points

Module 4 Deploying a vSAN Cluster

4-2 Importance
4-3 Module Lessons
4-4 Lesson 1: Preparing ESXi Hosts for a vSAN Cluster
4-5 Learner Objectives
4-6 Verifying Hardware Compatibility
4-7 Configuring Storage Controllers
4-8 Considering Multiple Storage Controllers
4-9 Configuring BIOS for High Performance
4-10 CPU Power Management
4-11 Verifying OS Controlled Mode
4-12 Using VMware Skyline Health to Verify Hardware Compatibility
4-13 vSAN Hardware Compatibility List Database
4-14 Manually Updating Drivers and Firmware
4-15 Automating Drivers and Firmware Installation
4-16 About vSphere Lifecycle Manager
4-17 vSphere Lifecycle Manager Desired Image Feature
4-18 Elements of vSphere Lifecycle Manager Desired Image
4-19 Configuring vSphere Lifecycle Manager Desired Image
4-20 Setting Up vSphere Lifecycle Manager Desired Image for New Clusters
4-21 Remediating Clusters
4-22 Review of Learner Objectives
4-23 Lesson 2: Deploying a vSAN Cluster
4-24 Learner Objectives
4-25 vSAN Cluster Configuration Types
4-26 Configuring a vSAN Cluster
4-27 About Cluster Quickstart
4-28 Comparing Cluster Quickstart and Manual Configuration
4-29 Creating vSAN Clusters
4-30 Adding Hosts Using Cluster Quickstart (1)
4-31 Adding Hosts Using Cluster Quickstart (2)
4-32 Verifying vSAN Health Checks
4-33 vSAN Cluster Configuration (1)
4-34 vSAN Cluster Configuration (2)
4-35 Scaling vSAN Clusters Using Cluster Quickstart
4-36 Skipping the Cluster Quickstart Workflow
4-37 Manually Configuring a vSAN Cluster
4-38 Manually Creating a vSAN Disk Group
4-39 vSAN Fault Domains
4-40 Implicit Fault Domains
4-41 Explicit Fault Domains
4-42 vSAN Fault Domains: Best Practices
4-43 vSphere HA on vSAN Clusters
4-44 Enabling vSphere HA on a vSAN Cluster
4-45 vSphere HA Networking Differences with vSAN
4-46 Recommended vSphere HA Settings for vSAN Clusters
4-47 Enabling vSAN Reserved Capacity
4-48 Reserving vSAN Storage Capacity for Maintenance Activities
4-49 Planning for Capacity Reserve
4-50 VMware Skyline Health
4-51 vSAN Logs and Traces
4-52 Backup Methodology
4-53 Lab 2: Configuring a Second vSAN Cluster
4-54 Lab 3: Working with vSAN Fault Domains
4-55 Review of Learner Objectives
4-56 Key Points

Module 5 vSAN Storage Policies ................................................................................. 165


5-2 lmportance .............................................................................................................................................. 165
5-3 Module Lessons .................................................................................................................................... 165
5-4 Lesson 1: vSAN Storage Policies ................................................................................................... 166
5-5 Learner Objectives ..............................................................................................................................166
5-6 Storage Policy-Based Management ............................................................................................. 167
5-7 Defining Storage Policies: vSAN Rule Sets ............................................................................... 168
5-8 Storage Policy Naming Considerations ....................................................................................... 169
5-9 Monitoring Storage Policy-Based Management ......................................................................170
5-10 VM Storage Policy Capabilities for vSAN ................................................................................... 171
5-11 About Failures t o Tolerate ............................................................................................................... 172
5-12 Level of Failures to Tolerate ........................................................................................................... 173
5-13 vSAN Data Protection Space Consumption ............................................................................ 17 4
5-14 Comparing RAID 1 Mirroring and RAID 5/6 Erasure Coding .............................................. 175
5-15 Number of Disk Stripes per Object .............................................................................................. 177
5-16 Planning Considerations: Stripe Width ........................................................................................ 179
5-17 Flash Read Cache Reservation ......................................................................................................180
5-18 Force Provisioning ............................................................................................................................... 182
5-19 Object Space Reservation ............................................................................................................... 184
5-20 IOPS Limits for Objects ..................................................................................................................... 186
5-21 Disabling Object Checksums ........................................................................................................... 187
5-22 Assigning vSAN Storage Policies (1) ............................................................................................ 188
5-23 Assigning vSAN Storage Policies (2) ........................................................................................... 189
5-24 Storage Policies and the VM Home Object ..............................................................................190
5-25 Viewing Object and Component Placement (1) ...................................................................... 192
5-26 Viewing Object and Component Placement (2) ..................................................................... 193
5-27 Viewing Object and Component Placement (3) ..................................................................... 194
5-28 Verifying Individual vSAN Object Compliance Status ........................................................... 195
5-29 Verifying Individual vSAN Component States ......................................................................... 196
5-30 Minimizing the Impact of Policy Changes on Clusters ........................................................... 197
5-31 Review of Learner Objectives ........................................................................................................ 198
5-32 Lesson 2: Analyzing vSAN Objects and Components Placement .................................. 199
5-33 Learner Objectives ..............................................................................................................................199
5-34 About Storage Policy Changes .................................................................................................... 200
5-35 Activity: Object Count (1) ................................................................................................................. 201
5-36 Activity: Object Count (1) Solution .............................................................................................. 202
5-37 Activity: Object Count (2) ...............................................................................................................203
5-38 Activity: Object Count (2) Solution ............................................................................................ 204
5-39 Activity: Object Count (3) ...............................................................................................................205
5-40 Activity: Object Count (3) Solution .............................................................................................206
5-41 Activity: Object Count (4) ............................................................................................................... 207
5-42 Activity: Object Count (4) Solution .............................................................................................208
5-43 Activity: Object Count (5) ................................................................................................................ 210
5-44 Activity: Object Count (5) Solution ............................................................................................... 211
5-45 Activity: Object Count (6) ................................................................................................................ 212
5-46 Activity: Object Count (6) Solution .............................................................................................. 213
5-47 Activity: Object Count (7) ................................................................................................................ 215
5-48 Activity: Object Count (7) Solution ............................................................................................... 216
5-49 Activity: Object Count (8) ................................................................................................................ 217
5-50 Activity: Object Count (8) Solution .............................................................................................. 218
5-51 Activity: Object Count (9) ................................................................................................................ 219
5-52 Activity: Object Count (9) Solution ............................................................................................. 220
5-53 Activity: Objects and Witnesses ................................................................................................... 221
5-54 Activity: Objects and Witnesses Solution ................................................................................. 222
5-55 Activity: VMs and Failures ............................................................................................................... 223
5-56 Activity: VMs and Failures Solution ............................................................................................. 224
5-57 Activity: Failures and Witnesses ................................................................................................... 225
5-58 Activity: Failures and Witnesses Solution ................................................................................. 226
5-59 Activity: RAID Levels and Stripes ................................................................................................ 227
5-60 Activity: RAID Levels and Stripes Solution .............................................................................. 228
5-61 Activity: Failures and Snapshots ................................................................................................... 229
5-62 Activity: Failures and Snapshots Solution .................................................................................230
5-63 Lab 4: Analyzing the Impact of Storage Policy Changes .................................................... 231
5-64 Lab 5: Identifying Objects with Reduced Availability ............................................................ 231
5-65 Review of Learner Objectives ....................................................................................................... 232
5-66 Key Points .............................................................................................................................................. 232

Module 6 vSAN Resilience and Data Availability ................................................. 233


6-2 Importance ............................................................................................................................................. 233
6-3 Lesson 1: vSAN Resilience and Data Availability .................................................................... 234
6-4 Learner Objectives ............................................................................................................................. 234
6-5 About Failure Handling ...................................................................................................................... 235
6-6 About vSAN Component States ................................................................................................. 236
6-7 About the vSAN Object Repair Timer ....................................................................................... 237
6-8 Overriding the Object Repair Timer ............................................................................................ 238
6-9 Resynchronizing Components ....................................................................................................... 239
6-10 Failure Handling Scenario (1) .......................................................................................................... 240
6-11 Failure Handling Scenario (2) ........................................................................................................... 241
6-12 Failure Handling Scenario (3) .......................................................................................................... 242
6-13 Failure Handling Scenario (4) .......................................................................................................... 243
6-14 Failure Handling Scenario (5) .......................................................................................................... 244
6-15 Failure Handling Scenario (6) .......................................................................................................... 245
6-16 Failure Handling Scenario (7) .......................................................................................................... 246
6-17 Failure Handling Scenario (8) .......................................................................................................... 247
6-18 Failure Handling Scenario (9) .......................................................................................................... 248
6-19 Review of Learner Objectives ....................................................................................................... 249
6-20 Key Points ..............................................................................................................................................249

Module 7 Configuring vSAN Storage Space Efficiency .....................................251


7-2 Importance .............................................................................................................................................. 251
7-3 Configuring vSAN Storage Space Efficiency .......................................................................... 252
7-4 Learner Objectives ............................................................................................................................. 252
7-5 About vSAN Storage Space Efficiency ..................................................................................... 253
7-6 Using Deduplication and Compression (1) ................................................................................. 254
7-7 Using Deduplication and Compression (2) ................................................................................ 255
7-8 Using Deduplication and Compression (3) ................................................................................ 256
7-9 Disk Management ................................................................................................................................ 257
7-10 Design Considerations ...................................................................................................................... 258
7-11 Compression-Only Mode (1) ........................................................................................................... 259
7-12 Compression-Only Mode (2) ..........................................................................................................260
7-13 Configuring Space Efficiency .......................................................................................................... 261
7-14 Verifying Space Efficiency Savings ............................................................................................. 262
7-15 Using RAID 5 or RAID 6 Erasure Coding (1) ............................................................................ 263
7-16 Using RAID 5 or RAID 6 Erasure Coding (2) ........................................................................... 263
7-17 Using RAID 5 or RAID 6 Erasure Coding (3) ........................................................................... 264
7-18 Reclaiming Space Using TRIM/UNMAP (1) ............................................................................... 265
7-19 Reclaiming Space Using TRIM/UNMAP (2) .............................................................................. 265
7-20 Reclaiming Space Using TRIM/UNMAP (3) .............................................................................. 265
7-21 Enabling TRIM/UNMAP Support .................................................................................................. 266
7-22 Monitoring TRIM/UNMAP ................................................................................................................ 267
7-23 Lab 6: Configuring vSAN Space Efficiency .............................................................................. 268
7-24 Review of Learner Objectives ....................................................................................................... 269
7-25 Key Points .............................................................................................................................................. 269

Module 8 vSAN Security Operations .......................................................................... 271


8-2 Importance .............................................................................................................................................. 271
8-3 Lesson 1: vSAN Security Operations .......................................................................................... 272
8-4 Learner Objectives ............................................................................................................................. 272
8-5 vSAN Encryption ................................................................................................................................. 273
8-6 Design Considerations for vSAN Encryption .......................................................................... 274
8-7 About Permissions .............................................................................................................................. 275
8-8 Setting Up Key Providers ................................................................................................................ 276
8-9 KMS Server Cluster ............................................................................................................................ 277
8-10 Adding a KMS to vCenter Server (1) .......................................................................................... 278
8-11 Adding a KMS to vCenter Server (2) ......................................................................................... 279
8-12 KMIP Client Certificates ....................................................................................................................280
8-13 vSAN Data-at-Rest Encryption (1) ................................................................................................ 281
8-14 vSAN Data-at-Rest Encryption (2) .............................................................................................. 282
8-15 Operational Impact When Enabling Encryption ...................................................................... 283
8-16 Enabling vSAN Data-at-Rest Encryption .................................................................................. 284
8-17 Wiping Residual Data ......................................................................................................................... 285
8-18 Allowing Reduced Redundancy .................................................................................................... 286
8-19 Writing Data to an Encrypted vSAN Datastore .................................................................... 287
8-20 Scaling Out a Data-at-Rest Encrypted vSAN Cluster ......................................................... 288
8-21 Performing Rekey Operations (1) ................................................................................................. 289
8-22 Performing Rekey Operations (2) ................................................................................................290
8-23 Rotating KMIP Client Certificates .................................................................................................. 291
8-24 Changing the Key Provider ............................................................................................................. 292
8-25 Verifying Bidirectional Trust ............................................................................................................ 293
8-26 About Encrypted vSAN Node Core Dumps ........................................................................... 294
8-27 vSAN Data-in-Transit Encryption (1) ........................................................................................... 295
8-28 vSAN Data-in-Transit Encryption (2) .......................................................................................... 295
8-29 vSAN Data-in-Transit Encryption Workflow ........................................................................... 296
8-30 vSAN Data-in-Transit Encryption Rekey .................................................................................. 297
8-31 vSAN Data-in-Transit Encryption Health Check .................................................................... 298
8-32 Scaling Out Data-in-Transit Encrypted vSAN Clusters ....................................................... 299
8-33 Lab 7: Managing vSAN Security Operations .......................................................................... 300
8-34 Lab 8: Encryption Rekey Operations ......................................................................................... 300
8-35 Review of Learner Objectives ........................................................................................................ 301
8-36 Key Points ...............................................................................................................................................301

Module 9 Introduction to Advanced vSAN Configurations ............................ 303


9-2 Importance .............................................................................................................................................303
9-3 Module Lessons ...................................................................................................................................303
9-4 Lesson 1: vSAN File Service .......................................................................................................... 304
9-5 Learner Objectives ............................................................................................................................ 304
9-6 vSAN File Service ...............................................................................................................................305
9-7 vSAN File Shares ............................................................................................................................... 306
9-8 vSAN Distributed File System ....................................................................................................... 307
9-9 File Service VMs .................................................................................................................................. 308
9-10 Provisioning File Service Agent Machines ............................................................................... 309
9-11 File Service Agent Machines Storage Policy ............................................................................ 310
9-12 Enabling vSAN File Service ............................................................................................................... 311
9-13 vSAN File Service Configuration ................................................................................................... 312
9-14 vSAN File Service Domain Configuration .................................................................................. 313
9-15 vSAN File Service Network Configuration ................................................................................ 314
9-16 FSVM IP Address Configuration .................................................................................................... 315
9-17 Viewing ESX Agent Deployment .................................................................................................. 316
9-18 Creating a vSAN File Share ............................................................................................................. 317
9-19 Configuring a vSAN File Share ....................................................................................................... 318
9-20 Configuring Network Access Control ......................................................................................... 319
9-21 Viewing vSAN File Share Properties ..........................................................................................320
9-22 Monitoring vSAN File Share Performance Metrics ................................................................ 321
9-23 Viewing VMware Skyline Health Details for vSAN File Service ....................................... 322
9-24 vSAN File Service Considerations ............................................................................................... 323
9-25 Lab 9: Configuring vSAN File Service ........................................................................................ 324
9-26 Review of Learner Objectives ....................................................................................................... 325
9-27 Lesson 2: VMware HCI Mesh Using Remote vSAN Datastores ...................................... 326
9-28 Learner Objectives ............................................................................................................................. 326
9-29 About VMware HCI Mesh ................................................................................................................ 327
9-30 Previous vSAN Challenges ............................................................................................................. 328
9-31 VMware HCI Mesh Advantages .................................................................................................... 329
9-32 VMware HCI Mesh: Use Cases (1) ................................................................................................330
9-33 VMware HCI Mesh: Use Cases (2) ................................................................................................ 331
9-34 Stranded Capacity Issues ................................................................................................................ 332
9-35 Comparing Homogeneous and Heterogeneous Storage................................................... 333
9-36 VMware HCI Mesh Architecture: Example Setup .................................................................. 334
9-37 VMware HCI Mesh: Terminology (1) ............................................................................................ 335
9-38 VMware HCI Mesh: Terminology (2) ........................................................................................... 335
9-39 VMware HCI Mesh: Common Topologies ................................................................................. 336
9-40 Storage-Only Cluster Topology ................................................................................................... 337
9-41 Cross-Cluster Topology ................................................................................................................... 338
9-42 VMware HCI Mesh: Network Requirements and Recommendations ............................ 339
9-43 Example Network Architecture ................................................................................................... 340
9-44 VMware HCI Mesh: Scalability Limits ............................................................................................. 341
9-45 VMware HCI Mesh: Mounting the Remote vSAN Datastore ............................................ 342
9-46 Mounting Remote Datastores (1) .................................................................................................. 343
9-47 Mounting Remote Datastores (2) ................................................................................................. 343
9-48 Mounting Remote Datastores (3) .................................................................................................344
9-49 Client Datastore View ....................................................................................................................... 345
9-50 Server Datastore View ..................................................................................................................... 346
9-51 Hosts: Access Status ......................................................................................................................... 347
9-52 VM Creation Test ................................................................................................................................ 348
9-53 Remote Accessible Objects ........................................................................................................... 349
9-54 Server Cluster Partition Health Check .......................................................................................350
9-55 Remote VM Performance ................................................................................................................ 351
9-56 Physical Disk Placement ................................................................................................................... 352
9-57 VMware HCI Mesh Interoperability: VM Component Protection .................................... 353
9-58 VMware HCI Mesh Interoperability: SPBM Integration ........................................................ 354
9-59 VMware HCI Mesh Interoperability: vSphere vMotion and vSphere Storage vMotion ........ 355
9-60 VMware HCI Mesh Interoperability: vSphere DRS ................................................................ 356
9-61 Lab 10: Managing Remote vSAN Datastore Operations .................................................... 357
9-62 Review of Learner Objectives ....................................................................................................... 358
9-63 Lesson 3: vSAN Direct ..................................................................................................................... 359
9-64 Learner Objectives ............................................................................................................................. 359
9-65 About the vSAN Direct Datastore ............................................................................................. 360
9-66 vSAN Direct Use Cases .................................................................................................................... 361
9-67 vSAN Direct Architecture ............................................................................................................... 362
9-68 vSAN Direct with Kubernetes ....................................................................................................... 363
9-69 Cloud-Native Operations Workflow ........................................................................................... 364
9-70 Claiming Disks for vSAN Direct ..................................................................................................... 365
9-71 After Claiming Disks for vSAN Direct ........................................................................................ 366
9-72 Default Tags .......................................................................................................................................... 367
9-73 Tag-Based Policies ............................................................................................................................. 368
9-74 Storage Compatibility ........................................................................................................................ 369
9-75 Capacity Reporting .............................................................................................................................370
9-76 Review of Learner Objectives ........................................................................................................ 371
9-77 Lesson 4: vSAN iSCSI Target Service ....................................................................................... 372
9-78 Learner Objectives ............................................................................................................................. 372
9-79 About the vSAN iSCSI Target Service ...................................................................................... 373
9-80 vSAN iSCSI Target Service Networking ................................................................................... 374
9-81 Enabling and Using the vSAN iSCSI Target Service ............................................................. 375
9-82 vSAN iSCSI LUN Objects ................................................................................................................ 376
9-83 Lab 11: Configuring a vSAN iSCSI Target .................................................................................. 377
9-84 Review of Learner Objectives ....................................................................................................... 378
9-85 Key Points .............................................................................................................................................. 378

Module 10 vSAN Cluster Maintenance ..................................................................... 379


10-2 Importance ............................................................................................................................................. 379
10-3 Module Lessons ................................................................................................................................... 379
10-4 Lesson 1: vSAN Cluster Maintenance Operations .................................................................380
10-5 Learner Objectives ............................................................................................................................. 380
10-6 Maintenance Mode Options ............................................................................................................. 381
10-7 About the Data Migration Precheck ........................................................................................... 382
10-8 About the Ensure Accessibility Option ...................................................................................... 383
10-9 Ensure Accessibility: Assessing Impact ...................................................................................... 384
10-10 Ensure Accessibility: Delta Component ..................................................................................... 385
10-11 Delta Components in the vSphere Client .................................................................................. 386
10-12 Ensure Accessibility: Time Considerations ............................................................................... 387
10-13 Object Repair Timer Considerations ........................................................................................... 388
10-14 Object Inaccessibility: Example (1) ................................................................................................ 389
10-15 Object Inaccessibility: Example (2) .............................................................................................. 390
10-16 Object Inaccessibility: Example (3) ................................................................................................ 391
10-17 About the Full Data Migration Option ........................................................................................ 392
10-18 Full Data Migration: Component Placement ............................................................................. 393
10-19 Full Data Migration: Cluster Size Considerations .................................................................... 394
10-20 Full Data Migration: Assessing Impact ........................................................................................ 395
10-21 About the No Data Migration Option ......................................................................................... 396
10-22 No Data Migration: Assessing Impact ......................................................................................... 397
10-23 Changing the Default Maintenance Mode ................................................................................. 398
10-24 Planned Maintenance ......................................................................................................................... 399
10-25 About vSAN Disk Balance ............................................................................................................. 400
10-26 About Automatic Rebalance .......................................................................................................... 401
10-27 Enabling Automatic Rebalance ..................................................................................... 402
10-28 Reserving vSAN Storage Capacity (1) ...................................................................................... 403
10-29 Reserving vSAN Storage Capacity (2) ..................................................................................... 404
10-30 Shutting Down and Restarting vSAN Clusters ...................................................................... 405
10-31 Rebooting vSAN Clusters Without Downtime ...................................................................... 406
10-32 Moving vSAN Clusters to Other vCenter Server Instances ............................................ 407
10-33 vSAN Logs and Traces ................................................................................................................... 408
10-34 Redirecting vSAN Logs and Traces ........................................................................................... 409
10-35 Configuring Syslog Servers ............................................................................................................ 410
10-36 Lab 12: Verifying the vSAN Cluster Data Migration Precheck ........................................... 411
10-37 Review of Learner Objectives ........................................................................................................ 412
10-38 Lesson 2: vSAN Cluster Scaling and Hardware Replacement .......................................... 413
10-39 Learner Objectives .............................................................................................................................. 413
10-40 About vSAN Cluster Scaling ........................................................................................................... 414
10-41 Increasing Capacity by Scaling Up ................................................................................................ 415
10-42 Adding New Hosts to vSAN Clusters ......................................................................................... 416
10-43 Adding New Capacity Devices to Disk Groups ....................................................................... 417
10-44 About Disk Claim Management ...................................................................................................... 418
10-45 Replacing Capacity Tier Disks ......................................................................................................... 419
10-46 Replacing Cache Tier Disks ............................................................................................................ 420
10-47 Removing Disk Groups ...................................................................................................................... 421
10-48 Replacing vSAN Nodes .................................................................................................................... 422
10-49 Decommissioning vSAN Nodes .................................................................................................... 423
10-50 Lab 13: Decommissioning the vSAN Cluster ............................................................................ 424
10-51 Lab 14: Scaling Out the vSAN Cluster ........................................................................................ 424
10-52 Review of Learner Objectives ....................................................................................................... 425
10-53 Lesson 3: Upgrading and Updating vSAN ................................................................................ 426
10-54 Learner Objectives .............................................................................................................................426
10-55 vSAN Upgrades ................................................................................................................................... 427
10-56 vSAN Upgrade Process ................................................................................................................... 428
10-57 Preparing to Upgrade vSAN .......................................................................................................... 429
10-58 vSAN Upgrade Phases .................................................................................................................... 430
10-59 Supported Upgrade Paths ............................................................................................................... 431
10-60 About the vSAN Disk Format ....................................................................................................... 432
10-61 vSAN Disk Format Upgrade Prechecks .................................................................................... 433
10-62 Verifying vSAN Disk Format Upgrades .....................................................................................434
10-63 vSAN Build Recommendations ..................................................................................................... 435
10-64 vSAN System Baselines ................................................................................................................... 436
10-65 Review of Learner Objectives ....................................................................................................... 437
10-66 Key Points .............................................................................................................................................. 437

Module 11 vSAN Stretched and Two-Node Clusters ......................................... 439


11-2 Importance .............................................................................................................................................439
11-3 Module Lessons ...................................................................................................................................439
11-4 Lesson 1: vSAN Stretched Clusters ............................................................................................ 440
11-5 Learner Objectives ............................................................................................................................ 440
11-6 About vSAN Stretched Clusters ................................................................................................... 441
11-7 vSAN Stretched Cluster Use Cases (1) ...................................................................................... 442
11-8 vSAN Stretched Cluster Use Cases (2) .....................................................................................443
11-9 Design of vSAN Stretched Clusters ........................................................................................... 444
11-10 About Preferred Sites .......................................................................................................................446
11-11 About Witness Hosts ........................................................................................................................ 447
11-12 Sizing Witness Hosts .........................................................................................................................448
11-13 vSAN Stretched Cluster Heartbeats ..........................................................................................449
11-14 Managing Read and Write Operations ...................................................................................... 450
11-15 Stretched Cluster Networking ........................................................................................................ 451
11-16 Network Requirements: Between Data Sites ......................................................................... 452
11-17 Network Requirements: Between the Data Sites and the Witness Site ...................... 453
11-18 Static Routes for vSAN Traffic .....................................................................................................454
11-19 Planning for High Availability .......................................................................................................... 455
11-20 Configuring Stretched Clusters ..................................................................................................... 456
11-21 Replacing a Witness Host ................................................................................................................ 457
11-22 Stretched Clusters and Maintenance Mode ............................................................................. 458
11-23 Monitoring Stretched Clusters ....................................................................................................... 459
11-24 Review of Learner Objectives ...................................................................................................... 460
11-25 Lesson 2: vSAN Stretched Cluster Failure Handling ............................................................. 461
11-26 Learner Objectives ..............................................................................................................................461
11-27 vSAN Stretched Cluster Failure Handling (1) ........................................................................... 462
11-28 vSAN Stretched Cluster Failure Handling (2) .......................................................................... 463
11-29 vSAN Stretched Cluster Site Disaster Tolerance ..................................................................464
11-30 Site Disaster Tolerance: Dual Site Mirroring ............................................................................. 465
11-31 Dual Site Mirroring with RAID 1...................................................................................................... 466
11-32 Dual Site Mirroring with RAID 5/6 ................................................................................................ 467
11-33 Keeping Data on a Single Site ........................................................................................................ 468
11-34 Symmetrical and Asymmetrical Configuration ........................................................................ 469
11-35 Activity 1................................................................................................................................................ 470
11-36 Activity 1 Solution ................................................................................................................................ 471
11-37 Activity 2 ................................................................................................................................................ 472
11-38 Activity 2 Solution .............................................................................................................................. 473
11-39 Activity 3 ................................................................................................................................................474
11-40 Activity 3 Solution .............................................................................................................................. 475
11-41 Activity 4 ................................................................................................................................................476
11-42 Activity 4 Solution .............................................................................................................................. 477
11-43 Activity 5 ................................................................................................................................................ 478
11-44 Activity 5 Solution ..............................................................................................................................479
11-45 Activity 6 ............................................................................................................................................... 480
11-46 Activity 6 Solution ...............................................................................................................................481
11-47 Lab 15: Configuring the vSAN Stretched Cluster .................................................................. 482
11-48 Review of Learner Objectives ....................................................................................................... 483
11-49 Lesson 3: Two-Node vSAN Clusters ..........................................................................................484
11-50 Learner Objectives .............................................................................................................................484
11-51 About Two-Node vSAN Clusters ................................................................................................ 485
11-52 Two-Node vSAN Cluster Use Cases .......................................................................................... 486
11-53 Two-Node Direct Connect vSAN Clusters .............................................................................. 487
11-54 Shared vSAN Witness Nodes ....................................................................................................... 488
11-55 Witness Node Locations .................................................................................................................. 489
11-56 Shared vSAN Witness Node Memory Requirements ......................................................... 490
11-57 Shared vSAN Witness Node for a Mixed Environment ....................................................... 491
11-58 Configuring a Two-Node vSAN Cluster .................................................................................... 492
11-59 Review of Learner Objectives ....................................................................................................... 493
11-60 Key Points ..............................................................................................................................................493

Module 12 vSAN Cluster Monitoring .......................................................................... 495


12-2 Importance .............................................................................................................................................495
12-3 Module Lessons ...................................................................................................................................495
12-4 Lesson 1: vSAN Health Monitoring ............................................................................................... 496
12-5 Learner Objectives .............................................................................................................................496
12-6 About CEIP ............................................................................................................................................497
12-7 Joining CEIP ..........................................................................................................................................498
12-8 Running Proactive Tests .................................................................................................................. 499
12-9 VMware Skyline Health .................................................................................................................... 500
12-10 Online Health ..........................................................................................................................................501
12-11 VMware Skyline Health: vSAN Cluster Partition ....................................................................502
12-12 VMware Skyline Health: Network Latency Check ................................................................503
12-13 VMware Skyline Health: vSAN Object Health ........................................................................ 504
12-14 VMware Skyline Health: Time Synchronization .......................................................................505
12-15 VMware Skyline Health: vSAN Disk Balance .......................................................................... 506
12-16 VMware Skyline Health: Disk Format Version ......................................................................... 507
12-17 VMware Skyline Health: vSAN Extended Configuration ....................................................508
12-18 VMware Skyline Health: vSAN Component Utilization ....................................................... 509
12-19 VMware Skyline Health: What if the Most Consumed Host Fails .................................... 510
12-20 Review of Learner Objectives ......................................................................................................... 511
12-21 Lesson 2: vSAN Performance Monitoring ................................................................................. 512
12-22 Learner Objectives .............................................................................................................................. 512
12-23 vSAN Online Performance Diagnostics ...................................................................................... 513
12-24 vSAN Performance Service ............................................................................................................ 514
12-25 About I/O Impact on Performance .............................................................................................. 515
12-26 About vSAN Cluster Metrics ........................................................................................................... 516
12-27 Cluster-Level Metrics for VMs ........................................................................................................ 517
12-28 Back-End Cluster-Level Metrics ..................................................................................................... 518
12-29 Throughput Comparison ................................................................................................................... 519
12-30 IOInsight ..................................................................................................................................................520
12-31 Preparing an IOInsight Instance ...................................................................................................... 521
12-32 Viewing IOInsight Instance Metrics .............................................................................................. 522
12-33 Host-Level Metrics for Disks .......................................................................................................... 523
12-34 Host-Level Metrics for the Cache Tier ....................................................................................... 524
12-35 Host-Level Metrics for Resync Operations .............................................................................. 525
12-36 Host-Level Metrics for Network Performance ....................................................................... 526
12-37 VM Metrics ............................................................................................................................................. 527
12-38 Review of Learner Objectives ....................................................................................................... 528
12-39 Lesson 3: vSAN Capacity Monitoring ............................................................................................. 529
12-40 Learner Objectives ............................................................................................................................. 529
12-41 Capacity Usage Overview ..............................................................................................................530
12-42 Capacity Usage with Space Efficiency ........................................................................................ 531
12-43 Usable Capacity Analysis ................................................................................................................. 532
12-44 Capacity Usage Breakdown ........................................................................................................... 533
12-45 Capacity History ..................................................................................................................................534
12-46 vSAN Capacity Reserve .................................................................................................................. 535
12-47 Lab 16: Monitoring vSAN Performance and Capacity ......................................................... 536
12-48 Review of Learner Objectives ....................................................................................................... 537
12-49 Key Points .............................................................................................................................................. 538

Module 13 Troubleshooting Methodology .............................................................. 539


13-2 Lesson 1: Troubleshooting Methodology .................................................................................. 539
13-3 Importance ............................................................................................................................................. 539
13-4 Learner Objectives ............................................................................................................................. 539
13-5 PNOMA Troubleshooting Framework ...................................................................................... 540
13-6 PNOMA vSAN Physical Layer ...................................................................................................... 542
13-7 Activity: vSAN Physical Layer ....................................................................................................... 543
13-8 Activity: vSAN Physical Layer Solution .....................................................................................544
13-9 PNOMA: vSAN Network Layer .................................................................................................... 545
13-10 Activity: vSAN Network Layer ..................................................................................................... 546
13-11 Activity: vSAN Network Layer Solution .................................................................................... 547
13-12 PNOMA: vSAN Object Layer ........................................................................................................ 548
13-13 Activity: vSAN Object Layer ......................................................................................................... 549
13-14 Activity: vSAN Object Layer Solution ........................................................................................550
13-15 PNOMA: vSAN Management Layer ............................................................................................ 551
13-16 Activity: vSAN Management Layer ............................................................................................. 552
13-17 Activity: vSAN Management Layer Solution ........................................................................... 553
13-18 PNOMA: vSAN Application Layer ............................................................................................... 554
13-19 Activity: vSAN Application Layer ................................................................................................ 555
13-20 Activity: vSAN Application Layer Solution ............................................................................... 556
13-21 vSAN Layers: Creating the vSAN Cluster ................................................................................ 557
13-22 Troubleshooting by Layer and Importance .............................................................................. 558
13-23 Troubleshooting Process: Defining the Problem ................................................................... 559
13-24 Defining the Problem (1) .................................................................................................................. 560
13-25 Defining the Problem (2) ................................................................................................................. 560
13-26 Defining the Problem (3) ................................................................................................................... 561
13-27 Defining the Problem (4) ................................................................................................................... 561
13-28 Defining the Problem (5) .................................................................................................................. 562
13-29 Activity: Defining the Problem ....................................................................................................... 563
13-30 Activity: Defining the Problem Solution .....................................................................................564
13-31 Troubleshooting Process: Identifying the Root Cause of the Problem ........................ 565
13-32 Identifying the Root Cause ............................................................................................................. 566
13-33 Identifying the Root Cause: Health Checks .............................................................................. 567
13-34 Identifying the Root Cause: Questions to Consider (1) ....................................................... 568
13-35 Identifying the Root Cause: Questions to Consider (2) ...................................................... 569
13-36 Identifying the Root Cause: Questions to Consider (3) ...................................................... 570
13-37 Identifying the Root Cause: Questions to Consider (4) ....................................................... 571
13-38 Identifying the Root Cause: Questions to Consider (5) ...................................................... 572
13-39 Troubleshooting Process: Resolving the Problem ................................................................ 573
13-40 Avoiding and Resolving Common Problems (1) ..................................................................... 574
13-41 Avoiding and Resolving Common Problems (2) .................................................................... 575
13-42 Avoiding and Resolving Common Problems (3) .................................................................... 576
13-43 Avoiding and Resolving Common Problems (4) .................................................................... 577
13-44 Avoiding and Resolving Common Problems (5) .................................................................... 578
13-45 Avoiding and Resolving Common Problems (6) .................................................................... 579
13-46 Avoiding and Resolving Common Problems (7) ....................................................................580
13-47 Avoiding and Resolving Common Problems (8) ..................................................................... 581
13-48 Avoiding and Resolving Common Problems (9) .................................................................... 582
13-49 Review of Learner Objectives ....................................................................................................... 583
13-50 Key Points ..............................................................................................................................................584

Module 14 Troubleshooting Tools .............................................................................. 585


14-2 Importance ............................................................................................................................................. 585
14-3 Module Lessons ................................................................................................................................... 585
14-4 Lesson 1: VMware Skyline Health ................................................................................................. 586
14-5 Learner Objectives ............................................................................................................................. 586
14-6 About VMware Skyline Health ...................................................................................................... 587
14-7 Accessing VMware Skyline Health ............................................................................................... 588
14-8 VMware Skyline Health Check Categories ............................................................................... 589
14-9 VMware Skyline Health for vSAN ............................................................................................... 590
14-10 Online Health Checks .......................................................................................................................... 591
14-11 vSAN Release Catalog Up-to-Date Health Check ................................................................ 592
14-12 Scenario: Troubleshooting Network Health Issues (1) ......................................................... 593
14-13 Scenario: Troubleshooting Network Health Issues (2) ........................................................ 594
14-14 Scenario: Troubleshooting Network Health Issues (3) ........................................................ 595
14-15 Scenario: Troubleshooting Network Health Issues (4) ........................................................ 596
14-16 vSAN Capacity Check ...................................................................................................................... 597
14-17 Performance Service Charts .......................................................................................................... 598
14-18 vSAN Performance Checks: VM .................................................................................................. 599
14-19 vSAN Performance Checks: Disks ............................................................................................. 600
14-20 vSAN Performance Check: Physical Adapter .........................................................................601
14-21 vSAN Performance Check: Host Network ..............................................................................602
14-22 vSAN Performance Check: VM Virtual Disks ......................................................................... 603
14-23 Running Proactive Tests ................................................................................................................. 604
14-24 Exporting Support Bundles: Local Files .................................................................................... 605
14-25 Exporting Support Bundles: vCenter Server .......................................................................... 606
14-26 Exporting Support Bundles: ESXi Host ......................................................................................607
14-27 Review of Learner Objectives ...................................................................................................... 608
14-28 Lesson 2: Commands for vSAN .................................................................................................. 609
14-29 Learner Objectives ............................................................................................................................ 609
14-30 About vSphere ESXi Shell ................................................................................................................610
14-31 Accessing vSphere ESXi Shell ......................................................................................................... 611
14-32 Examining the vsantop Utility .......................................................................................................... 612
14-33 Navigating vsantop.............................................................................................................................. 613
14-34 Examples of vsantop Entity Outputs ........................................................................................... 614
14-35 ESXCLI Commands ............................................................................................................................ 615

XXI
14-36 Viewing vSphere St orage Inf ormation (1) .................................................................................. 616
14-37 Viewing vSphere Storage Inf ormation (2) ................................................................................. 617
14-38 Viewing vSphere Network Information (1) ................................................................................ 618
14-39 Viewing vSphere Network Information (2) ............................................................................... 618
14-40 Listing Available Subcommands (1) .............................................................................................. 619
14-41 Listing Available Subcommands (2) ............................................................................................. 619
14-42 Other Useful Commands in vSphere ESXi Shell (1) ...............................................................620
14-43 Other Useful Commands in vSphere ESXi Shell (2) ..............................................................620
14-44 Other Useful Commands in vSphere ESXi Shell (3) ............................................................... 621
14-45 Python Scripts for Testing Systems ........................................................................................... 622
14-46 Using Python to Inject Errors ......................................................................................................... 623
14-47 About PowerCLI ................................................................................................................................. 624
14-48 PowerCLI Commands: Example 1 ................................................................................................ 625
14-49 PowerCLI Commands: Example 2 ............................................................................................... 626
14-50 ESXC LI Namespaces in vSAN ....................................................................................................... 627
14-51 Using the esxcli vsan network Command ................................................................................. 628
14-52 Using the esxcli vsan network list Command .......................................................................... 629
14-53 Activity: Using the esxcli vsan network Command .............................................................. 630
14-54 Activity: Using the esxcli vsan network Command Solution .............................................. 631
14-55 Using the ESX C LI Debug Namespace ....................................................................................... 632
14-56 Activity: Using the esxcli vsan debug Command ................................................................... 633
14-57 Activity: Using the esxcli vsan debug Command Solution ................................................. 634
14-58 Using ESXCLI to Investigate Object Health (1) ....................................................................... 635
14-59 Using ESX CLI to Investigate Object Health (2) ...................................................................... 636
14-60 Using ESXCLI to Investigate VMDK Files ................................................................................. 637
14-61 Activity: Using the esxcli vsan debug vmdk list Command ............................................... 638
14-62 Activity: Using the esxcli vsan debug vmdk list Command Solution .............................. 639
14-63 vSAN Health Check Results: Overall State ............................................................................. 640
14-64 Using ESX CLI to Investigate Health Check Results ............................................................... 641
14-65 vSAN Health Check Results: Query Failed Tests .................................................................. 642
14-66 Activity: Using the esxcli vsan health cluster get -t Command ....................................... 643
14-67 Activity: Using the esxcli vsan health cluster get -t Command Solution ..................... 644
14-68 Using ESX CLI to Investigate vSAN Controllers ..................................................................... 645
14-69 Activity: Using the esxcli vsan debug controller list Command ....................................... 646
14-70 Activity: Using the esxcli vsan debug controller list Command Solution ...................... 64 7
14-71 Using ESX CLI to Investigate Fault Domains ............................................................................ 648
••
XXll
14-72 Activity: Using t he esxcli vsan f ault domain Command ........................................................ 649
14-73 Activity: Using t he esxcli vsan f ault domain Command Solution ..................................... 650
14-74 Using ESXCLI to Investigate Drive Type and Tier ................................................................. 651
14-75 Activity: Using the esxcli vsan storage list Command ......................................................... 652
14-76 Activity: Using the esxcli vsan storage list Command Solution ........................................ 653
14-77 Using ESXCLI to Investigate iSCSI lnformation ...................................................................... 654
14-78 Activity: Using the esxcli vsan iscsi Command ........................................................................ 655
14-79 Activity: Using the esxcli vsan iscsi Command Solution ...................................................... 656
14-80 Using ESXCLI to Investigate Cluster Details ............................................................................ 657
14-81 Activity: Using the esxcli vsan cluster get Command .......................................................... 658
14-82 Activity: Using the esxcli vsan cluster get Command Solution ........................................ 659
14-83 About Ruby vSphere Console ...................................................................................................... 660
14-84 Logging In to the Ruby vSphere Console (1) ............................................................................ 661
14-85 Logging In to the Ruby vSphere Console (2) ........................................................................... 661
14-86 Navigating the vSphere and vSAN Infrastructure ................................................................. 662
14-87 Using Ruby vSphere Console Help .............................................................................................. 663
14-88 Using the Ruby vSphere Console to List vSAN Commands ............................................ 664
14-89 Viewing Host-Specific Information ............................................................................................... 665
14-90 Viewing Host-Specific Disk Information ..................................................................................... 666
14-91 Using the Ruby vSphere Console to Investigate VM Objects .......................................... 667
14-92 About Unassociated Objects ......................................................................................................... 668
14-93 Signs of Unassociated Objects ...................................................................................................... 669
14-94 Using the Ruby vSphere Console to Investigate Unassociated Objects ..................... 670
14-95 Creation of Unassociated Objects (1) .......................................................................................... 671
14-96 Creation of Unassociated Objects (2) ......................................................................................... 671
14-97 Using the Ruby vSphere Console to Investigate a VM ....................................................... 672
14-98 Activity: Using the vsan.object_inf o Command ..................................................................... 673
14-99 Activity: Using the vsan.object_info Command Solution ................................................... 67 4
14-100 Using the Ruby vSphere Console to Investigate Swap Objects ..................................... 67 5
14-101 Using the Ruby vSphere Console to Investigate Object Status ..................................... 676
14-102 Activity: Using the vsan.obj_status_report Command ....................................................... 677
14-103 Activity: Using the vsan.obj_status_report Command Solution ..................................... 678
14-104 Using the Ruby vSphere Console to Predict Failures .......................................................... 679
14-105 Review of Learner Objectives ...................................................................................................... 680
14-106 Lesson 3: Useful Log Files ................................................................................................................ 681
14-107 Learner Objectives .............................................................................................................................. 681
•••
XXlll
14-108 Log Files for vSAN ............................................................................................................................. 682
14-109 Examining boot.gz............................................................................................................................... 683
14-110 Examining clomd.log ........................................................................................................................... 684
14-111 Examining hostd.log ........................................................................................................................... 685
14-112 Activity: Mounting vSAN Disks Issues ........................................................................................ 686
14-113 Activity: Mounting vSAN Disks Issues Solut ion ...................................................................... 687
14-114 Examining vmkernel. log (1) .............................................................................................................. 688
14-115 Examining vmkernel. log (2) ............................................................................................................. 689
14-116 Examining vmkwarning.log ............................................................................................................. 690
14-117 Examining vobd.log ............................................................................................................................. 691
14-118 vobd.log: Device Repaired .............................................................................................................. 692
14-119 Examining vsanmgmt.log .................................................................................................................. 693
14-120 Examining vmware-vsan-health-service.log .............................................................................694
14-121 Lab 17: Reviewing the Troubleshooting Lab Environment ................................................. 695
14-122 Lab 18: Troubleshooting the Maint enance Mode Issue ....................................................... 695
14-123 Lab 19: Troubleshooting the vSAN Datastore Capacity Increasing Issue ................... 695
14-124 Lab 20: Troub leshoot ing the Two- Node vSAN Clust er Configurat ion Issue ............ 696
14-125 Lab 21: Troubleshooting the vSAN Cluster Issue .................................................................. 696
14-126 Lab 22: Troubleshooting the vSAN Node Configuration Issue ........................................ 696
14-127 Lab 23: Troubleshooting the vSAN Cluster Configuration Issue (1) ............................... 697
14-128 Lab 24: Troubleshooting t he vSA N Cluster Configuration Issue (2) ............................. 697
14-129 Lab 25: Troubleshooting the vSA N Cluster Configuration Issue (3) ............................. 697
14-130 Lab 26: T roubleshooting the vSAN Cluster Configuration Issue ( 4) ............................. 698
14-131 Lab 27: T roubleshoot ing the vSAN Clust er Datastore Capacity Reporting Issue ... 698
14-132 Review of Learner Object ives ....................................................................................................... 699
14-133 Key Points .............................................................................................................................................. 699


XXIV
Module 1
Course Introduction

1-2 Course Introduction

1-3 Importance
vSAN is a policy-driven software-defined storage solution that is integrated with vSphere. vSAN
simplifies storage provisioning and management in the software-defined enterprise.

1-4 Learner Objectives (1)


• Describe vSAN concepts

• Detail the underlying vSAN architecture and components

• Explain the key features and use cases for vSAN

• Identify requirements and planning considerations for vSAN clusters

• Describe the vSAN deployment options

• Explain how to configure vSAN fault domains

• Detail how to define and create a VM storage policy

• Discuss the impact of vSAN storage policy changes

• Describe vSAN storage space efficiency

• Explain how vSAN encryption works

• Identify requirements to configure vSAN iSCSI targets

• Detail VMware HCI Mesh technology and architecture

1-5 Learner Objectives (2)
• Detail vSAN File Service architecture and configuration

• Explain the use cases of vSAN Direct

• Describe how to set up stretched and two-node vSAN clusters

• Explain the importance of vSAN node hardware compatibility

• Describe the use of vSphere Lifecycle Manager to automate driver and firmware
installations

• Detail vSAN resilience and data availability

• Describe vSAN maintenance mode and data evacuation options

• Explain how to use proactive tests to check the integrity of a vSAN cluster

• Use VMware Skyline Health for monitoring vSAN health

• Apply a structured approach to troubleshoot vSAN cluster configuration and operational
problems

1-6 Course Outline

1. Course Introduction

2. Introduction to vSAN

3. Planning a vSAN Cluster

4. Deploying a vSAN Cluster

5. vSAN Storage Policies

6. vSAN Resilience and Data Availability

7. Configuring vSAN Storage Space Efficiency

8. vSAN Security Operations

9. Introduction to Advanced vSAN Configurations

10. vSAN Cluster Maintenance

11. vSAN Stretched and Two-Node Clusters

12. vSAN Cluster Monitoring

13. Troubleshooting Methodology

14. Troubleshooting Tools

1-7 Typographical Conventions
The following typographical conventions are used in this course.

Monospace: Identifies command names, command options, parameters, code fragments,
error messages, filenames, folder names, directory names, and path names:

• Run the esxtop command.

• ... found in the /var/log/messages file.

Monospace Bold: Identifies user inputs:

• Enter ipconfig /release.

Boldface: Identifies user interface controls:

• Click the Configuration tab.

Italic: Identifies book titles:

• vSphere Virtual Machine Administration

<>: Indicates placeholder variables:

• <ESXi host name>

• ... the Settings/<Your Name>.txt file

1-8 References

VMware vSphere Documentation: https://docs.vmware.com/en/VMware-vSphere/index.html

VMware vSAN Design Guide: https://storagehub.vmware.com/t/vmware-r-vsan-tm-design-and-sizing-guide-2/

Administering VMware vSAN: https://docs.vmware.com/en/VMware-vSphere/7.0/vsan-70-administration-guide.pdf

VMware vSAN Operations and Management: https://storagehub.vmware.com/t/operations-and-management-1/

Monitoring the VMware vSAN Cluster: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan-monitoring.doc/GUID-6100546C-1A3F-46E7-B795-C793B21C1C61.html

Handling Failures and Troubleshooting vSAN: https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan-monitoring.doc/GUID-0F3C4D3F-9B86-4879-9C60-D6A977523112.html

1-9 VMware Online Resources

Documentation for vSphere: https://docs.vmware.com/en/VMware-vSphere/index.html

VMware Communities: http://communities.vmware.com

• Start a discussion.

• Access the knowledge base.

• Access documentation, technical papers, and compatibility guides.

• Access communities.

• Access user groups.

VMware Support: http://www.vmware.com/support

VMware Hands-on Labs: http://hol.vmware.com

VMware Learning: http://www.vmware.com/learning

• Access course catalog.

• Access worldwide course schedule.

1-10 VMware Learning Overview
You can access the following Education Services:

• VMware Learning Paths:

Help you find the course that you need based on the product, your role, and your level
of experience

Can be accessed at https://vmware.com/learning

• VMware Customer Connect Learning, which is the official source of digital training, includes
the following options:

On Demand Courses: Self-paced learning that combines lecture modules with hands-on
practice labs

VMware Lab Connect: Self-paced, technical lab environment where you can practice
skills learned during instructor-led training

Certification Exam Prep: Comprehensive video-based reviews of exam topics and
objectives to help you take your certification exam

• For more information, see https://vmware.com/learning/connect-learning.

1-11 VMware Certification Overview
VMware certifications validate your expertise and recognize your technical knowledge and skills
with VMware technology.

[Figure: VMware certification levels and roles (Operator, Administrator, Developer, Senior Administrator, Solution Architect, Enterprise Architect, and VMware Certified Advanced Professional (VCAP) in Design and Deploy) across the Application Modernization, Data Center Virtualization, Cloud Management and Automation, Network Virtualization, Security, and End-User Computing tracks.]

VMware certification sets the standards for IT professionals who work with VMware technology.
Certifications are grouped into technology tracks. Each track offers one or more levels of
certification (up to four levels).

For the complete list of certifications and details about how to attain these certifications, see
https://vmware.com/certification.

1-12 VMware Credentials Overview
VMware badges are digital emblems of skills and achievements. Career certifications align to job
roles and validate expertise across a solution domain. Certifications can cover multiple products
in the same certification.


Specialist certifications and skills badges align to products and verticals and show expanded
expertise.


Digital badges have the following features:

• Easy to share in social media (LinkedIn, Twitter, Facebook, blogs, and so on)

• Validate and verify achievement

• Contain metadata with skill tags and accomplishments

• Based on Mozilla's Open Badges standard

For the complete list of digital badges, see http://www.pearsonvue.com/vmware/badging.

Module 2
Introduction to vSAN

2-2 Importance
Understanding the logical architecture and relationships between vSAN elements provides the
necessary foundation to build a software-defined data center.

Objects and components are the building blocks of vSAN data storage. Understanding how
objects are created and distributed across multiple components is important for planning a
datastore that retains performance as objects and components are managed.

2-3 Module Lessons


1. Introduction to vSAN

2. vSAN Objects and Components

3. vSAN Software Underlying Architecture

2-4 Lesson 1: Introduction to vSAN

2-5 Learner Objectives


• Describe the vSAN architecture

• Identify vSAN objects and components

• Describe the advantages of object-based storage

• List the differences between all-flash and hybrid vSAN configurations

• Explain the key features and use cases for vSAN

• Discuss vSAN integration and compatibility with other VMware technologies

2-6 About vSAN
vSAN is a software-defined storage solution that provides shared storage for VMs.

vSAN virtualizes local physical storage in the form of HDD or SSD devices on ESXi hosts in a
cluster, turning them into a unified datastore.

vSAN is a vSphere cluster feature that you can enable on an existing cluster or when creating a
cluster, similar to how you enable the vSphere HA and vSphere DRS features.

[Figure: Three ESXi hosts, each with local SSDs, connected by the vSAN network and pooled into a single vSAN datastore.]
vSAN provides enterprise-class storage that is robust, flexible, powerful, and easy to use. It
aggregates locally attached storage devices to create a storage solution that can run at the
edge, in the core, or in the cloud, all easily managed by vCenter Server. vSAN is integrated
directly into the hypervisor.

2-7 vSAN Node Minimum Requirements
vSAN nodes must have the following minimum hardware resources available:

• Server: Listed in the VMware Compatibility Guide

• Cache: 1 SSD for caching

• Capacity: 1 SSD for capacity (or HDD for hybrid mode)

• Network: 10 Gb NIC for all-flash or hybrid mode; 1 Gb NIC for hybrid mode

• Storage controllers: SAS/SATA controllers. RAID controllers must work in passthrough or
RAID 0 mode.

• Memory: 8-32 GB of RAM, depending on the number of drives and disk groups. Memory
requirements might differ depending on workload needs.

You must verify that the ESXi hosts in your organization meet the vSAN hardware requirements.
All capacity devices, drivers, and firmware versions in your vSAN configuration must be certified
and listed in the vSAN section of the VMware Compatibility Guide.
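The minimum requirements above can be expressed as a simple pre-check. The following Python sketch is illustrative only: the HostSpec record, its field names, and check_host are hypothetical and not part of any VMware tool, and the thresholds are taken directly from the list above. It does not replace checking the actual hardware, drivers, and firmware against the VMware Compatibility Guide.

```python
from dataclasses import dataclass

# Minimums taken from the requirements list above.
MIN_CACHE_SSDS = 1
MIN_CAPACITY_DEVICES = 1
MIN_RAM_GB = 8  # 8-32 GB, depending on the number of drives and disk groups


@dataclass
class HostSpec:
    """Hypothetical inventory record for one candidate vSAN node."""
    name: str
    cache_ssds: int
    capacity_devices: int
    all_flash: bool
    nic_gbps: int
    ram_gb: int
    controller_mode: str  # "passthrough" or "raid0" are acceptable
    on_compatibility_guide: bool


def check_host(host: HostSpec) -> list[str]:
    """Return a list of problems; an empty list means the host meets the minimums."""
    problems = []
    if not host.on_compatibility_guide:
        problems.append("server not listed in the VMware Compatibility Guide")
    if host.cache_ssds < MIN_CACHE_SSDS:
        problems.append("needs at least 1 SSD for caching")
    if host.capacity_devices < MIN_CAPACITY_DEVICES:
        problems.append("needs at least 1 capacity device")
    # All-flash needs a 10 Gb NIC; hybrid mode can use 1 Gb.
    min_nic = 10 if host.all_flash else 1
    if host.nic_gbps < min_nic:
        problems.append(f"NIC must be at least {min_nic} Gb")
    if host.controller_mode not in ("passthrough", "raid0"):
        problems.append("controller must be in passthrough or RAID 0 mode")
    if host.ram_gb < MIN_RAM_GB:
        problems.append(f"needs at least {MIN_RAM_GB} GB of RAM")
    return problems


host = HostSpec("esxi-01", 1, 3, True, 10, 32, "passthrough", True)
print(check_host(host))
```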

2-8 About the vSAN Datastore
A datastore is the basic unit of storage in virtualized environments. When you enable vSAN on a
cluster, a vSAN datastore is created automatically.

Only one vSAN datastore is created, regardless of the number of storage devices and hosts in
the cluster.

The vSAN datastore appears as another datastore in the list of datastores that might be
available, including vSphere Virtual Volumes, VMFS, and NFS.

[Figure: A vSAN cluster of three hosts, each contributing a disk group of SSDs, presented as a single vSAN datastore.]

The size of the vSAN datastore depends on the number of capacity devices per ESXi host and
the number of ESXi hosts in the cluster. For example, if a host has seven 2-TB capacity devices
and the cluster includes eight hosts, the approximate storage capacity is 7 x 2 TB x 8 = 112 TB.

When using the all-flash configuration, flash devices are used for capacity. For hybrid
configuration, magnetic disks are used for capacity.
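The sizing example above is a straight multiplication. This minimal Python sketch assumes that every host has the same number and size of capacity devices; the function name is hypothetical. Cache devices are deliberately left out because they do not add to the datastore size, and the result is raw capacity, before the overhead that storage policies add for redundancy.

```python
def raw_vsan_capacity_tb(capacity_devices_per_host: int,
                         device_size_tb: float,
                         hosts: int) -> float:
    """Approximate raw vSAN datastore capacity in TB.

    Only capacity devices are counted; cache devices are excluded.
    """
    return capacity_devices_per_host * device_size_tb * hosts


# The example from the text: seven 2-TB capacity devices on each of eight hosts.
print(raw_vsan_capacity_tb(7, 2, 8))  # 112
```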

2-9 vSAN Datastore Characteristics
The vSAN datastore has the following characteristics:

• vSAN provides a single vSAN datastore accessible to all hosts in the cluster.

• A single vSAN datastore can provide different service levels for each VM or each virtual
disk.

• Only capacity devices contribute to datastore capacity.

• The capacity of the cache devices does not affect the size of the vSAN datastore.

vSAN works best when all ESXi hosts in the cluster share similar or identical configurations,
including storage configurations, across all cluster members.

A consistent configuration balances VM storage components across all devices and hosts in the
cluster.

You can increase the vSAN datastore capacity by adding capacity devices or hosts with
capacity devices to the vSAN cluster.

2-10 vSAN Disk Groups
A disk group is a unit of physical storage capacity on a host and a group of physical devices that
provide performance and capacity to the vSAN cluster. On each ESXi host that contributes its
local devices to a vSAN cluster, devices are organized into disk groups.

Hosts can include a maximum of five disk groups, each of which must have one flash cache
device and one or more capacity devices (a maximum of seven). In vSAN, you can configure a
disk group with either all-flash or hybrid configurations.

[Figure: Flash disk groups and hybrid disk groups. Each disk group has one cache-tier device and capacity-tier devices (SSDs in flash disk groups, magnetic disks in hybrid disk groups).]

The devices used for caching cannot be shared across disk groups and cannot be used for other
purposes. A single caching device must be dedicated to a single disk group. In hybrid clusters,
flash devices are used for the cache layer and magnetic disks are used for the storage capacity
layer. In an all-flash cluster, flash devices are used for both cache and capacity.
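The disk group rules above (at most five disk groups per host, one cache device per group, one to seven capacity devices per group, and no device shared between groups or used twice) can be checked with a short Python sketch. The dictionary layout and device names are hypothetical placeholders; real device identifiers come from the host.

```python
MAX_DISK_GROUPS_PER_HOST = 5
MAX_CAPACITY_DEVICES_PER_GROUP = 7


def validate_disk_groups(groups: list[dict]) -> None:
    """Raise ValueError if a host's disk-group layout breaks the vSAN limits.

    Each group is a dict with exactly one cache device, for example:
    {"cache": "ssd-0", "capacity": ["hdd-0", "hdd-1"]}  (hypothetical names)
    """
    if len(groups) > MAX_DISK_GROUPS_PER_HOST:
        raise ValueError(
            f"{len(groups)} disk groups; the maximum is {MAX_DISK_GROUPS_PER_HOST}")
    seen = set()
    for i, group in enumerate(groups):
        capacity = group["capacity"]
        if not 1 <= len(capacity) <= MAX_CAPACITY_DEVICES_PER_GROUP:
            raise ValueError(
                f"disk group {i} has {len(capacity)} capacity devices; allowed 1-7")
        # A device (cache or capacity) may belong to only one disk group.
        for device in [group["cache"], *capacity]:
            if device in seen:
                raise ValueError(f"device {device} is used more than once")
            seen.add(device)


validate_disk_groups([
    {"cache": "ssd-0", "capacity": ["hdd-0", "hdd-1"]},
    {"cache": "ssd-1", "capacity": ["hdd-2"]},
])
```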

2-11 Hybrid Disk Groups
The vSAN hybrid disk group configurations include one flash device for cache and between one
and seven magnetic devices for capacity. Cache devices are used for performance.

The cache device should be sized at a minimum of 10% of the disk group capacity:

• 70% of the available cache is used for frequently read drive blocks.

• 30% of the available cache is used for write buffering.
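Applying the hybrid sizing rules to one disk group is simple arithmetic. In this Python sketch (the function name is hypothetical), a disk group with 4,000 GB of capacity needs at least 400 GB of cache, of which 280 GB holds frequently read blocks and 120 GB buffers writes. Integer-style division by 10 keeps the results exact for round inputs.

```python
def hybrid_cache_plan(disk_group_capacity_gb: float) -> dict:
    """Apply the hybrid cache rules to one disk group (all sizes in GB)."""
    # Minimum cache size: 10% of the disk group capacity.
    cache_gb = disk_group_capacity_gb / 10
    return {
        "min_cache_gb": cache_gb,
        # 70% of the cache holds frequently read drive blocks.
        "read_cache_gb": cache_gb * 7 / 10,
        # 30% of the cache is used for write buffering.
        "write_buffer_gb": cache_gb * 3 / 10,
    }


plan = hybrid_cache_plan(4000)
print(plan)
```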

[Figure: Five hybrid disk groups, each with one SSD in the cache tier and magnetic disks in the capacity tier.]

2-12 All-Flash Disk Groups
The vSAN all-flash disk group configurations include one flash device for cache and between
one and seven capacity flash devices.

Flash devices are used in a two-tier format for caching and capacity, and 100% of the available
cache is used for write buffering. The administrator decides which flash devices to use for the
capacity tier and the cache tier.

[Figure: Five all-flash disk groups, each with one SSD in the cache tier and multiple SSDs in the capacity tier.]

2-13 vSAN Storage Policies
vSAN storage policies define VM storage requirements for performance and availability.

Storage policies also define the placement of VM objects and components across the vSAN
cluster.

The number of component replicas and copies that are created is based on the VM storage
policy.

After a storage policy is assigned, its requirements are pushed to the vSAN layer during VM
creation. Stored files, such as VMDKs, are distributed across the vSAN datastore to meet the
required levels of protection and performance per VM.

[Figure: A VM storage policy (capacity, availability, performance) applied through vSphere to a VM whose objects are placed on disk groups across the vSAN datastore.]

2-14 vSAN RAID Types
vSAN supports the following common RAID types:

• RAID 0, Striped: Fastest performance, no redundancy

• RAID 1, Mirrored: Good performance, full redundancy with 200% capacity usage

• RAID 10, Mirrored plus striped: Best performance, redundancy with 200% capacity usage

• RAID 5, Striped plus parity: Good performance with redundancy that has slower drive writes
because of parity calculations

• RAID 6, Striped plus double parity: Good performance with redundancy that has the slowest
drive writes because it has twice the parity calculations as RAID 5
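To compare the capacity cost of these RAID types, the following Python sketch multiplies the amount of VM data by a per-type factor. It assumes one tolerated failure for RAID 1 and RAID 10, and the 3 data + 1 parity (RAID 5) and 4 data + 2 parity (RAID 6) erasure-coding layouts that vSAN uses. The table and function names are illustrative only.

```python
from fractions import Fraction

# Raw capacity consumed per unit of VM data.
# Assumes FTT=1 for mirroring and vSAN's 3+1 / 4+2 erasure-coding layouts.
CAPACITY_MULTIPLIER = {
    "RAID 0": Fraction(1),      # striping only, no redundancy
    "RAID 1": Fraction(2),      # two full copies (200%)
    "RAID 10": Fraction(2),     # mirrored stripes (200%)
    "RAID 5": Fraction(4, 3),   # 3 data + 1 parity (about 133%)
    "RAID 6": Fraction(6, 4),   # 4 data + 2 parity (150%)
}


def raw_usage_gb(vm_data_gb: float, raid_type: str) -> float:
    """Raw vSAN capacity needed to store vm_data_gb with the given RAID type."""
    return float(vm_data_gb * CAPACITY_MULTIPLIER[raid_type])


for raid in CAPACITY_MULTIPLIER:
    print(raid, raw_usage_gb(300, raid))
```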

2-15 Multiple Storage Policies
Different vSAN storage policies can be applied to different objects in the same VM. For
example, if a VM has two virtual disks, each disk can be assigned a different storage policy.

Use cases for multiple storage policies:

• To support VMs that require storage reservations for different VMDKs

• To provide increased redundancy to mission-critical data, such as database files, separately
from the OS
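The boot and application policy split can be modeled with two small records. This Python sketch is a simplified, hypothetical model, not the vSphere API: StoragePolicy keeps only a name and a failures-to-tolerate value, and falling back to the VM-level policy for a disk without its own assignment is a modeling choice made here for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical, simplified VM storage policy."""
    name: str
    failures_to_tolerate: int


@dataclass
class VirtualMachine:
    """A VM whose VMDKs can each carry a different storage policy."""
    name: str
    vm_policy: StoragePolicy
    # VMDK file name -> policy assigned to that disk only
    disk_policies: dict = field(default_factory=dict)

    def policy_for(self, vmdk: str) -> StoragePolicy:
        """Return the disk's own policy, or the VM-level policy if none is set."""
        return self.disk_policies.get(vmdk, self.vm_policy)


# The boot disk uses the VM-level policy; the database disk gets more redundancy.
boot = StoragePolicy("Boot Policy", failures_to_tolerate=1)
app = StoragePolicy("Application Policy", failures_to_tolerate=2)
vm = VirtualMachine("db01", vm_policy=boot, disk_policies={"db01_1.vmdk": app})
print(vm.policy_for("db01.vmdk").name, vm.policy_for("db01_1.vmdk").name)
```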

[Figure: A VM with two VMDKs. The boot VMDK uses the Boot Policy (Availability A, Performance A, Capacity A), and the application VMDK uses the Application Policy (Availability B, Performance B, Capacity B).]

2-16 vSAN Storage Policy Resilience
When configuring a VM storage policy, you can select a RAID configuration that is optimized
with suitable availability, performance, and capacity for your VM deployments.

[Figure: Edit VM Storage Policy dialog, Availability tab. Site disaster tolerance: None - standard cluster. Failures to tolerate (defines the number of failures tolerated by an object) offers: No data redundancy; 1 failure - RAID-1 (Mirroring); 1 failure - RAID-5 (Erasure Coding); 2 failures - RAID-1 (Mirroring); 2 failures - RAID-6 (Erasure Coding); 3 failures - RAID-1 (Mirroring).]
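Each Failures to tolerate option also implies a minimum cluster size. The following Python sketch encodes the commonly documented vSAN 7 minimums: 2n + 1 hosts to tolerate n failures with RAID-1 mirroring, four hosts for RAID-5, and six hosts for RAID-6. The function name is hypothetical; confirm the limits for your release in the vSAN documentation.

```python
def min_hosts(failures_to_tolerate: int, method: str) -> int:
    """Minimum number of hosts for a Failures to tolerate setting."""
    if failures_to_tolerate == 0:
        return 1  # No data redundancy: a single copy of the data
    if method == "RAID-1" and 1 <= failures_to_tolerate <= 3:
        # Mirroring: 2n + 1 hosts, so that replicas and witness
        # components can still form a quorum after n failures.
        return 2 * failures_to_tolerate + 1
    if method == "RAID-5" and failures_to_tolerate == 1:
        return 4  # 3 data + 1 parity components on separate hosts
    if method == "RAID-6" and failures_to_tolerate == 2:
        return 6  # 4 data + 2 parity components on separate hosts
    raise ValueError(
        f"unsupported combination: {failures_to_tolerate} failures with {method}")


for ftt, method in [(1, "RAID-1"), (1, "RAID-5"), (2, "RAID-1"), (2, "RAID-6"), (3, "RAID-1")]:
    print(ftt, method, min_hosts(ftt, method))
```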

2-17 Integrating vSAN with vSphere HA
You can enable both vSAN and vSphere HA on the same cluster.

If vSphere HA is already enabled on a cluster, you must temporarily disable it before enabling vSAN. After vSAN is enabled, you can re-enable vSphere HA.

vSphere HA provides the same level of protection for VMs on a vSAN datastore as it does for VMs on a traditional VMFS or NFS datastore.

When vSAN and vSphere HA are enabled on the same cluster, vSphere HA agent traffic, such as heartbeats and election packets, flows over the vSAN network rather than the management network.

vSphere HA uses the management network only when vSAN is disabled. vCenter Server chooses the appropriate network based on the order in which the two services are enabled and disabled.
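The enablement order described above can be modeled as a small state machine. The following Python sketch is hypothetical: the Cluster class and the enable_vsan_on_ha_cluster function are illustrative names, not a vSphere API, and the sketch captures only the ordering rule and which network carries vSphere HA agent traffic.

```python
"""Hypothetical model of the vSAN and vSphere HA enablement order (illustration only)."""
from dataclasses import dataclass


@dataclass
class Cluster:
    ha_enabled: bool = False
    vsan_enabled: bool = False

    def enable_vsan(self) -> None:
        # vSphere HA must be disabled while vSAN is being enabled.
        if self.ha_enabled:
            raise RuntimeError("disable vSphere HA before enabling vSAN")
        self.vsan_enabled = True

    def ha_network(self) -> str:
        # HA agent traffic (heartbeats, election packets) uses the vSAN network when vSAN is on.
        return "vSAN network" if self.vsan_enabled else "management network"


def enable_vsan_on_ha_cluster(cluster: Cluster) -> None:
    """Temporarily disable HA, enable vSAN, then restore HA to its previous state."""
    ha_was_enabled = cluster.ha_enabled
    cluster.ha_enabled = False
    cluster.enable_vsan()
    cluster.ha_enabled = ha_was_enabled
```

Calling enable_vsan() directly on a cluster with HA enabled raises an error, while enable_vsan_on_ha_cluster() performs the disable, enable, and re-enable sequence, after which ha_network() reports the vSAN network.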

2-18 Integrating vSAN with VMware Products (1)

vSAN, combined with vSphere and the VMware ecosystem, makes the ideal storage platform for VMware Horizon virtual desktop infrastructure (VDI).

vSAN provides scalable storage in a VMware Horizon environment. You can scale up by adding
disk drives in each host or scale out by adding hosts to the cluster. vSAN supports both all-flash
and hybrid storage configurations for the VMware Horizon 7 VDI.

Figure: Horizon 7 running on vSphere and vSAN across a cluster of hosts.

2-19 Integrating vSAN with VMware Products (2)

vSAN 7 supports using native file services as persistent volumes for Tanzu clusters.

When used with vSphere with Tanzu, persistent volumes can support encryption and snapshots. vSAN also enables the vSphere Add-on for Kubernetes so that stateful containerized workloads can be deployed on vSAN datastores.

Figure: Kubernetes persistent volumes and storage classes are served through the CNS control plane and SPBM by vSAN (VMDKs and vSAN File Service files), VMFS, and vSphere Virtual Volumes.
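As an illustration of how a Kubernetes storage class can point at vSAN storage through SPBM, the following Python sketch builds a StorageClass manifest as JSON. It assumes the vSphere CSI driver is installed, with the provisioner name csi.vsphere.vmware.com and its storagepolicyname parameter; the class name vsan-default is hypothetical.

```python
"""Build a Kubernetes StorageClass that maps to a vSAN storage policy (sketch)."""
import json


def vsan_storage_class(name: str, policy_name: str) -> dict:
    # Assumes the vSphere CSI driver: it selects compatible datastores through SPBM,
    # using the storage policy named in the "storagepolicyname" parameter.
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "csi.vsphere.vmware.com",
        "parameters": {"storagepolicyname": policy_name},
    }


if __name__ == "__main__":
    manifest = vsan_storage_class("vsan-default", "vSAN Default Storage Policy")
    # kubectl accepts JSON manifests, for example: kubectl apply -f storageclass.json
    print(json.dumps(manifest, indent=2))
```

Persistent volume claims that reference this class would then be provisioned on storage that complies with the named vSAN policy.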

2-20 vSAN Use Cases
Some of the most common vSAN use cases include:

• Hyperconverged infrastructure (HCI): Use vSAN as part of a software-defined, unified system that combines storage, compute, and network virtualization with advanced management capabilities.

• Business-critical applications: Use vSAN as a solution for storing business-critical applications with specific storage needs.

• VDI: Use vSAN as a VDI storage solution for VMs and user data.

• Remote and branch offices: Use vSAN as a storage solution to increase local IT performance, start with a small physical footprint, and control costs with flexible licensing models.

• Disaster recovery: Use vSAN as a disaster recovery solution to lower disaster recovery costs, manage disaster recovery from a unified UI, and orchestrate and automate recovery.

2-21 vSAN Licensing
When planning your vSAN cluster, you must configure a license. vSAN licenses have per-CPU
capacity.

When you assign a vSAN license to a cluster, the amount of license capacity used is equal to the
total number of CPUs on the hosts that participate in the cluster.
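The license accounting rule above reduces to a sum over hosts. The following Python sketch assumes each host is described only by its physical CPU count; the Host class and the host names in the example are hypothetical.

```python
"""Count the vSAN license capacity consumed by a cluster (per-CPU licensing sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cpus: int  # physical CPUs (sockets) in the host


def license_cpus_used(hosts: list[Host]) -> int:
    """License capacity used equals the total CPUs on all participating hosts."""
    return sum(host.cpus for host in hosts)


if __name__ == "__main__":
    cluster = [Host("esxi-01", 2), Host("esxi-02", 2), Host("esxi-03", 2)]
    print(license_cpus_used(cluster))
```

Under this rule, a cluster of three two-CPU hosts consumes six CPUs of license capacity.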

The vSAN cluster must be assigned a license key before its evaluation period expires or before its currently assigned license expires. If you upgrade, combine, or divide vSAN licenses, you must reassign the licenses to vSAN clusters.

vSAN license editions include Standard, Advanced, Enterprise, and Enterprise Plus packaging.

For more information about vSAN licensing, see the vSAN licensing guide at
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-vsan-licensing-guide.pdf.

2-22 vSAN Licensing Differences (1)
All vSAN licenses include the following features:

• Storage Policy Based Management

• Virtual Distributed Switch

• Rack Awareness

• Software Checksum

• All-Flash Hardware

• iSCSI Target Service

• QoS - IOPS Limit

• Cloud Native Storage (CNS) Control Plane

• vSphere Container Storage Interface (CSI) Driver

• Shared Witness

2-23 vSAN Licensing Differences (2)
The remaining licenses enable specific functionality.

License editions: Standard, Advanced, Enterprise, and Enterprise Plus. Features that vary by edition:

• Deduplication & Compression

• RAID 5/6 Erasure Coding

• vRealize Operations within vCenter

• Data-at-Rest and Data-in-Transit Encryption

• Stretched Cluster with Local Failure Protection

• File Services

• HCI Mesh

• Data Persistence Platform for Modern Stateful Services

• vRealize Operations 8 Advanced

2-24 Review of Learner Objectives
• Describe the vSAN architecture

• Identify vSAN objects and components

• Describe the advantages of object-based storage

• List the differences between all-flash and hybrid vSAN configurations

• Explain the key features and use cases for vSAN

• Discuss vSAN integration and compatibility with other VMware technologies

2-25 Lesson 2: vSAN Objects and Components

2-26 Learner Objectives


• Define vSAN objects

• Describe how objects are split into components

• Explain the purpose of witness components

• Describe how vSAN stores large objects

• Explain how to view object and component placement on a vSAN datastore

2-27 vSAN and Object-Based Storage
vSAN is an object-based datastore.

VMs stored in the vSAN datastore comprise a series of objects.

Figure: A VM stored as objects on the vSAN datastore, which is built from the disk groups (an NVMe cache device and SSD capacity devices) of four hosts connected by the vSAN network.

VMs include the following objects:

• The VM home namespace

• VM disks (VMDK)

• VM swap object

• VM snapshots

• Vmem object

vSAN also stores other types of objects, including:

• vSAN performance service object

• vSAN iSCSI and file services objects

vSAN stores and manages data as flexible data containers called objects. Each object on the datastore includes data, part of the metadata, and a unique ID. One of the most common objects in the datastore is the virtual machine disk (VMDK) object, which contains VM data. Because each object has a unique ID, it can be addressed globally, not only by its filename and path. Objects also allow configuration at the object level, such as the RAID type or drive usage, rather than at the level of physical drive blocks.
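The following Python sketch is a hypothetical in-memory model, not the actual vSAN metadata format. It illustrates the two properties described above: each object carries a unique ID, so it can be looked up without a path, and each object carries its own policy. The sizes in the example are made up; the policy names echo the Boot Policy and Application Policy from the multiple storage policies example.

```python
"""Hypothetical in-memory model of vSAN objects (not vSAN's real metadata format)."""
import uuid
from dataclasses import dataclass, field


@dataclass
class VsanObject:
    kind: str       # for example "VMDK", "VM home namespace", or "VM swap object"
    policy: str     # storage policy applied to this object alone
    size_gb: int
    object_id: uuid.UUID = field(default_factory=uuid.uuid4)  # globally unique ID


datastore: dict[uuid.UUID, VsanObject] = {}


def store(obj: VsanObject) -> uuid.UUID:
    """Register an object; it is addressed by its ID, not by a file path."""
    datastore[obj.object_id] = obj
    return obj.object_id


if __name__ == "__main__":
    boot_id = store(VsanObject("VMDK", "Boot Policy", 40))
    app_id = store(VsanObject("VMDK", "Application Policy", 200))
    print(datastore[app_id].policy)
```

Two disks of the same VM end up as two independent objects, each with its own ID and policy.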

In a block-level file system, blocks are arranged in a RAID set or a logical drive first. You create
the file system on top of the RAID set. The file system includes the metadata or file allocation
table that defines filenames, paths, and data location. In this environment, the system places file
blocks on the drive according to the file system structure and bases the data protection on the
logical drive or RAID set.

Consider the following use case. The VMBeans company evaluates what sort of storage to use
for the content generated by their media development team. The company's current storage is
terabytes in size and needs the ability to grow. Their professional services vendor recommends
using object-based storage to store the media files. The vendor explains the benefits of object-
based storage for datastores that reach into the petabyte range and upwards. VMBeans agrees
that object-based storage seems ideal for this use.

2-28 About vSAN Storage Policies
vSAN storage policies define storage requirements for VMs:

• They determine how storage objects are provisioned.

• They guarantee the required level of service.

• They can be constructed from capabilities advertised to vCenter Server through vSphere
API for Storage Awareness.

Figure: vSAN storage policy rules in the vSphere Client.

Availability tab:

• Site disaster tolerance: None - standard cluster

• Failures to tolerate: 1 failure - RAID-1 (Mirroring). Consumed storage space for a 100 GB VM disk would be 200 GB.

Advanced Policy Rules tab:

• Number of disk stripes per object: 1

• IOPS limit for object: 0

• Object space reservation: Thin provisioning. Initially reserved storage space for a 100 GB VM disk would be 0 B.

• Flash read cache reservation (%): 0. Reserved cache space for a 100 GB VM disk would be 0 B.

• Disable object checksum (toggle)

• Force provisioning (toggle)

2-29 Default vSAN Storage Policy
vSAN has a default VM storage policy:

• It uses mirroring to make data redundant.

• It cannot be deleted.

• It can be modified.


Storage policies define VM storage requirements, such as performance and availability. vSAN requires that each VM deployed to a vSAN datastore is assigned at least one VM storage policy. If a storage policy is not explicitly assigned to a VM, the default storage policy of the datastore is applied to the VM. If no custom default policy has been set on the vSAN datastore, the vSAN Default Storage Policy is used.
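The fallback order described above can be sketched as a three-step lookup. The function name and the policy names Gold and Silver are hypothetical; only the final fallback name, vSAN Default Storage Policy, comes from the vSphere Client.

```python
"""Resolve which storage policy applies to a VM on a vSAN datastore (sketch)."""
from typing import Optional

VSAN_DEFAULT_POLICY = "vSAN Default Storage Policy"


def effective_policy(vm_policy: Optional[str], datastore_default: Optional[str]) -> str:
    # 1. A policy assigned explicitly to the VM always wins.
    if vm_policy:
        return vm_policy
    # 2. Otherwise, the datastore's default policy applies.
    if datastore_default:
        return datastore_default
    # 3. With no custom datastore default, vSAN falls back to its built-in policy.
    return VSAN_DEFAULT_POLICY
```

For example, effective_policy(None, "Silver") returns "Silver", and effective_policy(None, None) returns "vSAN Default Storage Policy".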

2-30 About Objects and Components
Objects are made up of components.

If objects are replicated, multiple copies of the data are located in replica components.

In this example, vSAN creates two replicas of the object data. Each of these replicas is a
component of the object.

Figure: A VM object with two replica components placed on disk groups in different hosts, connected by the vSAN network.

Each object is composed of a set of components, determined by storage policies. In this example, with failures to tolerate set to 1, vSAN places protection components, such as replicas, on separate hosts in the vSAN cluster, where each replica is an object component.

For example, a storage policy that tolerates a failure creates a copy of the VMDK data in another location of the vSAN datastore.

The VMDK is the object and each copy is a component of that object.

2-31 Component Replicas and Copies
The number of component replicas and copies that are created is based on the VM storage
policy.

Figure: vSAN Default Storage Policy, Rules tab (Rule-set 1: VSAN):

• Placement, Storage Type: VSAN

• Site disaster tolerance: None

• Failures to tolerate: 1 failure - RAID-1 (Mirroring)

• Number of disk stripes per object: 1

• IOPS limit for object: 0

• Object space reservation: Thin provisioning

• Flash read cache reservation: 0

• Disable object checksum: No

• Force provisioning: No

Figure: The VM's Monitor > vSAN > Physical disk placement view, with components grouped by host placement. All components are Active:

• VM home (RAID 1): a component on sa-esx-01.vclass.local, a component on sa-esx-02.vclass.local, and a witness on sa-esx-03.vclass.local

• Hard disk 1 (RAID 1): a component on sa-esx-01.vclass.local, a component on sa-esx-03.vclass.local, and a witness on sa-esx-02.vclass.local

Replicas allow the VM to continue to run if a failure occurs in the vSAN physical infrastructure.

The number of replicas created is determined by the setting specified in the vSAN storage policy that sets the resiliency of the storage objects.

2-32 Object Accessibility
Object and component accessibility can be viewed in the following ways:

• Use the vSphere Client.

• Run the esxcli vsan debug object list command.

Object accessibility is determined at the object layer:

• Components are not cluster-aware.

• Components depend on object-level logic from the DOM.

Reasons why DOM objects can be offline or inaccessible include:

• The object data is not properly in sync.

• The underlying LSOM components are offline.

vSAN objects are available when more than 50 percent of the components comprising an object
are accessible. Quorum, an asymmetrical voting system to decide the availability of objects,
determines accessibility.

Each component can have one or more votes. In the case of a tie, a witness component is
provisioned to achieve quorum. Ties are primarily caused by a network partition or some other
split-brain scenario.

Contact VMware Support in the following situations:

• When the DOM owner is not properly established and an I/O error is returned

• When a DOM object is unassociated

2-33 About Witnesses
When needed, vSAN automatically creates witness components.

Witness components provide an availability mechanism to VMs by serving as a tiebreaker when a quorum does not exist in a vSAN cluster:

• The quorum and voting system is in place to preserve data integrity.

• Each component has one or more votes.

• Quorum is achieved when more than 50 percent of the votes are available.

• If a quorum exists, the object is accessible.
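The quorum rule can be expressed as a strict-majority test over votes. The following Python sketch assumes that each component's votes and reachability are known; the Component class is hypothetical. The example reproduces the FTT = 1, RAID 1 case with two replicas and one witness, each holding one vote.

```python
"""Quorum check for a vSAN object: accessible only with more than 50% of votes (sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    votes: int
    reachable: bool


def has_quorum(components: list[Component]) -> bool:
    total = sum(c.votes for c in components)
    available = sum(c.votes for c in components if c.reachable)
    # Strictly more than half is required; exactly 50 percent is not enough.
    return 2 * available > total


if __name__ == "__main__":
    # The host holding replica-b is isolated; replica-a and the witness hold 2 of 3 votes.
    obj = [
        Component("replica-a", 1, True),
        Component("replica-b", 1, False),
        Component("witness", 1, True),
    ]
    print(has_quorum(obj))
```

Without the witness, the same split leaves 1 of 2 votes, exactly 50 percent, and has_quorum returns False. That tie is what the witness exists to break.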

Figure: A VM whose vmdk uses the storage policy FTT = 1 - RAID 1. The two replica components reside on disk groups in two hosts, and one host is isolated from the vSAN network.

As in many clustering solutions, an even number of replicas creates a partition risk, also called a split-brain scenario. A split-brain scenario occurs when hosts containing an even number of replicas for a specific VM cannot communicate with other hosts over the network. To resolve this issue, vSAN typically creates another component called a witness.

The witness component is small (only a few megabytes) and contains only metadata, no
application data. The purpose of the witness is to serve as a tiebreaker when a partition occurs.

vSAN uses a quorum-based system, in which each component might have more than one vote, to decide the availability of VMs. More than 50 percent of the votes that make up a VM storage object must be accessible for the object to be available. When 50 percent or fewer of the votes are accessible, the object is not available on the vSAN datastore.

The default storage policy specifies that each object can tolerate one failure. This illustration represents RAID 1 with two replicas on two separate capacity disks in two hosts. A witness can be created and placed on a third host. If the system loses a component or access to a component, the system uses the witness. The component that can communicate with the witness is declared to have integrity and is used for all data read/write operations until the broken component is repaired.

2-34 Example: Witness
vSAN controls witness configurations in the background, transparent to the user.

Witnesses are of the following types:

• Primary

• Secondary

• Tiebreaker
This example shows a three-way mirror across five nodes.
Figure: Five nodes, each holding one component with Votes = 1: three vmdk replica components and two witnesses. Using a single witness, four votes would exist, so a secondary witness is added to avoid an even number of components.

A primary witness is the first witness that is deployed for any object.

Secondary witnesses are created to ensure that every cluster node has equal voting power
toward a quorum.

If an even number of total components exists after adding primary and secondary witnesses, a
tiebreaker witness is added to make the total component count an odd number.

In the example, each component is given one vote. Two witnesses are used to guarantee an
adequate quorum if a component failure occurs. The witness count is dependent on how the
system places components and data on the nodes in the cluster.
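The following Python sketch reproduces the arithmetic of the five-node example under simplifying assumptions: one component per host, one vote per component, and no striping or fault domains. It is not the actual vSAN witness placement algorithm, and the function names are hypothetical.

```python
"""Simplified witness arithmetic for RAID-1 objects (not vSAN's exact placement logic).

Assumes one component per host and one vote per component; striping,
vote weighting, and fault domains are ignored.
"""


def raid1_witnesses(ftt: int) -> int:
    """Witnesses needed so that FTT + 1 replicas plus witnesses span 2 * FTT + 1 hosts."""
    replicas = ftt + 1
    return (2 * ftt + 1) - replicas


def tiebreaker_needed(votes: list[int]) -> bool:
    """A tiebreaker witness is added when the total vote count is even."""
    return sum(votes) % 2 == 0


if __name__ == "__main__":
    # Three-way mirror (FTT = 2): 3 replicas + 2 witnesses on five nodes.
    votes = [1, 1, 1] + [1] * raid1_witnesses(2)
    print(len(votes), "components,", sum(votes), "votes, tiebreaker needed:", tiebreaker_needed(votes))
```

With only one witness, the vote list [1, 1, 1, 1] has an even total and tiebreaker_needed returns True, which matches the figure's note that a second witness is added.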

For more detailed examples of the witness architecture and logic, see VMware Virtual SAN: Witness Component Deployment Logic at
https://blogs.vmware.com/vsphere/2014/04/vmware-virtual-san-witness-component-deployment-logic.html.

2-35 Large vSAN Objects
When a VM disk exceeds 255 GB, the object is automatically split into multiple components.
vSAN 7.0 uses concatenation in this case. If the VM has a disk of size 300 GB, the first
component would be 255 GB, and the second component would be 45 GB.
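The splitting rule can be sketched as integer division by the 255 GB component size. The following Python sketch computes the chunk sizes for one copy of the data. Under a RAID 1 policy each chunk is then mirrored, as the placement view for Hard disk 1 shows. The 62 TB limit is converted assuming 1 TB = 1024 GB, and the function name is hypothetical.

```python
"""Split one copy of a large vSAN object into components of at most 255 GB (sketch)."""

MAX_COMPONENT_GB = 255       # component size limit described in this section
MAX_OBJECT_GB = 62 * 1024    # 62 TB object limit, assuming 1 TB = 1024 GB


def concatenated_components(size_gb: int) -> list[int]:
    """Return the sizes of the concatenated components for an object of size_gb."""
    if not 0 < size_gb <= MAX_OBJECT_GB:
        raise ValueError("object size must be greater than 0 and at most 62 TB")
    full, remainder = divmod(size_gb, MAX_COMPONENT_GB)
    return [MAX_COMPONENT_GB] * full + ([remainder] if remainder else [])
```

For the 300 GB disk in this example, concatenated_components(300) returns [255, 45].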

When planning the vSAN datastore, consider the size of VMDKs and other objects planned for
the datastore.

Figure: Physical placement of Hard disk 1, whose VMDK is 300 GB. The object is a concatenation of two RAID 1 sets, and each set has two Active components on local VMware disks in two different hosts.

When planning vSAN disk group architecture, you must plan for enough individual devices so
that vSAN can split objects exceeding limits, in addition to any striping planned through policies.

vSAN divides any object larger than 255 GB on a vSAN datastore, regardless of whether stripes
are defined in the storage policy applied to an object.

Any object, such as a VMDK, can be up to 62 TB. The 62 TB limitation is the same for VMFS and
NFS so that VMs can be cloned and migrated using vSphere vMotion between vSAN and other
datastores.

2-36 Review of Learner Objectives
• Define vSAN objects

• Describe how objects are split into components

• Explain the purpose of witness components

• Describe how vSAN stores large objects

• Explain how to view object and component placement on a vSAN datastore

2-37 Lesson 3: vSAN Software Underlying Architecture

2-38 Learner Objectives


• Describe the CLOM, DOM, LSOM, CMMDS, and RDT vSAN software components

• Explain the relationships between the vSAN software components

2-39 vSAN Architectural Components
The main vSAN architecture components are:

• Cluster Level Object Manager (CLOM)

• Distributed Object Manager (DOM)

• Local Log Structured Object Manager (LSOM)

• Cluster Membership, Monitoring, and Directory Services (CMMDS)

• Reliable Datagram Transport (RDT)

2-40 vSAN Architecture Analogy: Building a House

2-41 CLOM and Its Role: Architect
The CLOM process runs on every vSAN node:

• It validates that objects can be created based on policies and available resources.

• It is responsible for object compliance.

• It defines the creation and migration of objects.

• It distributes loads evenly between the vSAN nodes.

• It is responsible for proactive and reactive rebalancing.

You manage the CLOM process with the /etc/init.d/clomd command, using the status, stop, start, or restart argument.

2-42 DOM and Its Role: Contractor (1)
The DOM runs on each host in a vSAN cluster. The DOM process includes:

• Managing object availability and initial I/O requests

• Replicating and coordinating I/O to hosts where components reside

• Determining object accessibility

• Receiving instructions from the CLOM and the DOMs running on other hosts in the vSAN
cluster

• Instructing the LSOM to create local components of an object:

DOM services on the hosts of a vSAN cluster communicate to coordinate the creation
of components.

All DOMs in a vSAN cluster resynchronize objects during a recovery.

2-43 DOM and Its Role: Contractor (2)
Each object has a DOM owner, a DOM client, and a DOM component manager.

DOM client:

• Processes the I/O generated by a VM

• Runs on every node that contains VMs

• Forwards I/O to the DOM owner

DOM owner:

• Receives I/O requests from the DOM client

• Determines object accessibility

• Replicates I/O based on the object's RAID layout and determines in which components the data block resides

• Forwards the I/O to the DOM component manager where the components reside

DOM component manager:

• Interacts with the LSOM locally to commit the I/O to disk
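The DOM client, owner, and component manager roles form a simple forwarding chain. The following Python sketch is a hypothetical illustration of that write path for a RAID 1 object. Real DOM I/O is asynchronous, checks object accessibility, and runs inside ESXi, none of which is modeled here. The class names and host names are assumptions.

```python
"""Hypothetical model of the DOM write path for a RAID-1 object (illustration only)."""
from dataclasses import dataclass, field


@dataclass
class ComponentManager:
    """DOM component manager on one host; hands I/O to the local LSOM."""
    host: str
    committed: list[bytes] = field(default_factory=list)

    def commit(self, block: bytes) -> None:
        self.committed.append(block)  # stands in for the LSOM write to disk


@dataclass
class DomOwner:
    """Owns one object and replicates each write to every replica's host."""
    replicas: list[ComponentManager]

    def write(self, block: bytes) -> int:
        for manager in self.replicas:
            manager.commit(block)
        return len(self.replicas)  # number of replicas that committed the write


class DomClient:
    """Runs where the VM runs and forwards the VM's I/O to the object's owner."""

    def __init__(self, owner: DomOwner) -> None:
        self.owner = owner

    def vm_write(self, block: bytes) -> int:
        return self.owner.write(block)
```

With two component managers on esxi-01 and esxi-02, one vm_write call commits the same block on both hosts, mirroring the RAID 1 layout.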

2-44 LSOM and Its Role: Worker
The LSOM performs the following functions:

• Creates the local components as instructed by the DOM

• Provides read and write buffering

• Performs the encryption process for the vSAN datastore

• Reports unhealthy storage and network devices

• Performs I/O retries on failing devices

• Interacts directly with the solid-state and magnetic devices

• Performs solid-state drive log recovery when the vSAN node boots

2-45 CMMDS and Its Role: Project Manager
The CMMDS performs the following functions:

• Provides topology and object configuration information to the CLOM and the DOM as
requested

• Records the owners of the objects

• Inventories all items, such as hosts, networks, and devices

• Stores object metadata information, such as policy-related information, in an in-memory database

• Defines the master, backup, and agent cluster roles:

The backup host accelerates the process of convergence if the master host fails.

Roles are decided at the time of cluster discovery.

Communication between the master host and other hosts occurs every second.

Updates are exchanged through the RDT.

2-46 RDT and Its Role: Delivery Truck
RDT is a network protocol for the transmission of vSAN traffic.

2-47 Activity: vSAN Component Layer
Which vSAN layer performs I/O retries on failing devices?

1. The LSOM layer

2. The management layer

3. The DOM layer

4. The transaction layer

2-48 Activity: vSAN Component Layer Solution
Which vSAN layer performs I/O retries on failing devices?

1. The LSOM layer

2. The management layer

3. The DOM layer

4. The transaction layer

Answer: 1. The LSOM layer

2-49 Component Interaction: Architect and Contractor
When it receives a request to create an object, the CLOM determines whether the object can
be created with the selected VM storage policy.

If the object can be created, the CLOM instructs the DOM to create the components.

2-50 Component Interaction: Contractor and Worker
During component creation, the DOM, CLOM, and LSOM have the following interactions:

• The DOM decides what components are created locally.

• The DOM instructs the LSOM to create the local components. The LSOM interacts at the drive layer and provides persistent storage. The DOM interacts with the LSOM on the local host.

• If components are required on other nodes, the DOM interacts with the DOM instance on the remote node.

2-51 Component Interaction: Architect, Contractor, and Project Manager
The DOM and the CLOM consult the CMMDS:

• To get information about available resources in the cluster

• To learn about the topology and the object configuration

2-52 Activity: Drive Status Reporting
One of the capacity drives failed in a disk group. Which process flags the drive status?

1. CLOM

2. DOM

3. LSOM

4. RDT

2-53 Activity: Drive Status Reporting Solution
One of the capacity drives failed in a disk group. Which process flags the drive status?

1. CLOM

2. DOM

3. LSOM

4. RDT

Answer: 3. LSOM

2-54 Activity: Physical Space
Which vSAN architectural component displays the amount of physical space that is being used
on the disk?

1. CLOM

2. DOM

3. LSOM

4. CMMDS

2-55 Activity: Physical Space Solution
Which vSAN architectural component displays the amount of physical space that is being used
on the disk?

1. CLOM

2. DOM

3. LSOM

4. CMMDS

2-56 Step-by-Step VM Creation (1)
The process of creating a VM shows how vSAN software components interact.

1. A new VM is defined.

2. The vCenter service daemon (vpxd) receives the request.

3. The request is received by the selected host.

4. The vCenter service agent (vpxa) receives the request.

5. CMMDS creates the requested policy.

Figure: VM creation flow from Create New VM to Select Host to VM Created.

2-57 Step-by-Step VM Creation (2)
6. Hostd starts the VM and VMDK file creation.

7. CLOM checks the VM storage policy to determine how many components should be created.

8. CLOM checks the resources available to satisfy the request.

9. CLOM decides the initial placement of the components.

10. CMMDS updates the information received from CLOM.

Figure: VM creation flow from Create New VM to Select Host to VM Created.

2-58 Step-by-Step VM Creation (3)

11. DOM receives the request from CLOM to create the objects.

12. DOM instructs LSOM to create the components.

13. LSOM provides persistent storage to the components.

Figure: VM creation flow from Create New VM to Select Host to VM Created.

2-59 Beginning-to-End VM Creation

Figure: The complete VM creation flow, from Create New VM to Select Host to VM Created:

1. A new VM is defined.

2. The vCenter service daemon (vpxd) receives the request.

3. The request is received by the selected host.

4. The vCenter service agent (vpxa) receives the request.

5. CMMDS creates the requested policy.

6. Hostd starts the VM and VMDK file creation.

7. CLOM checks the VM storage policy to determine how many components should be created.

8. CLOM checks the resources available to satisfy the request.

9. CLOM decides the initial placement of the components.

10. CMMDS updates the information received from CLOM.

11. DOM receives the request from CLOM to create the objects.

12. DOM instructs LSOM to create the components.

13. LSOM provides persistent storage to the components.

2-60 Activity: VM Creation
A user is trying to create a VM, but the creation fails shortly after the user clicks Finish.

Which service should you investigate first?

1. The CLOM service

2. The LSOM service

3. The DOM service

4. The CMMDS service

2-61 Activity: VM Creation Solution
A user is trying to create a VM, but the creation fails shortly after the user clicks Finish.

Which service should you investigate first?

1. The CLOM service

2. The LSOM service

3. The DOM service

4. The CMMDS service

Answer: 1. The CLOM service

2-62 Lab 1: Reviewing the Lab Environment
Review information to become familiar with the lab environment:

1. Access the Lab Environment

2. Examine the Existing vSAN Cluster Details

3. Examine the ESXi Host Configuration Details

4. Verify the vSAN Cluster Licensing

2-63 Review of Learner Objectives
• Describe the CLOM, DOM, LSOM, CMMDS, and RDT vSAN software components

• Explain the relationships between the vSAN software components

2-64 Key Points


• vSAN is a software-defined storage solution providing shared storage for VMs.

• vSAN virtualizes local physical storage resources of the ESXi host, turning them into object-
based storage.

• Only one vSAN datastore is created, regardless of the number of storage devices and
hosts in the cluster.

• Each disk group contains one flash cache device and a maximum of seven capacity devices, and a host can include a maximum of five disk groups.

• With vSAN, you can configure a disk group with either all-flash or hybrid configurations.

• vSAN is object-based storage, and an object is a logical volume composed of a set of components.

• vSAN storage policies define VM storage requirements for performance and availability.

Questions?

Module 3
Planning a vSAN Cluster

3-2 Importance
You must understand how to plan for server hardware, storage capacity, and network
configuration requirements for a successful vSAN cluster deployment.

3-3 Module Lessons


1. vSAN Requirements

2. Planning Capacity for vSAN Clusters

3. Designing a vSAN Network

3-4 Lesson 1: vSAN Requirements

3-5 Learner Objectives


• Identify requirements and planning considerations for vSAN clusters

• Discuss vSAN cluster planning and deployment best practices

3-6 vSAN Cluster Requirements
When planning a vSAN cluster deployment, you must verify that all elements of the cluster meet the minimum requirements for vSAN.

All devices, drivers, and firmware versions in your vSAN configuration must be certified and listed in the vSAN section of the VMware Compatibility Guide.

A standard vSAN cluster must contain a minimum of three hosts that contribute to the capacity of the cluster.

As a best practice, consider designing clusters with a minimum of four nodes.

See the VMware Compatibility Guide at
https://www.vmware.com/resources/compatibility/search.php?deviceCategory=san.

3-7 vSAN Configuration Minimums and Maximums
Familiarity with vSAN minimum and maximum configurations is useful during the initial planning phase of your deployment.

vSAN 7 supports a wide range of values for vSAN cluster configuration.

Feature or component: minimum / maximum

• ESXi hosts: 3 / 64

• VMs: none / 200 VMs per host (8,000 per vSphere HA protected cluster)

• Disk groups: 1 per host / 5 per host

• Cache tier disks: 1 per host / 5 per host

• Capacity tier disks: 1 per host / 35 per host
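The limits above, together with the rule of up to seven capacity devices per disk group from the module 2 key points, can be checked mechanically during planning. The following Python sketch is hypothetical: HostPlan and validate are illustrative names, and the constants are the vSAN 7 values listed in this section. Because each disk group has exactly one cache disk, checking disk groups also covers the cache tier limit, and 5 groups of 7 capacity disks gives the 35-disk maximum.

```python
"""Check a planned vSAN 7 cluster against the configuration limits above (sketch)."""
from dataclasses import dataclass

MIN_HOSTS, MAX_HOSTS = 3, 64
MAX_DISK_GROUPS_PER_HOST = 5        # one cache disk per group, so also 5 cache disks
MAX_CAPACITY_DISKS_PER_GROUP = 7    # 5 groups x 7 disks = 35 capacity disks per host
MAX_VMS_PER_HOST = 200


@dataclass
class HostPlan:
    name: str
    capacity_disks_per_group: list[int]  # one entry per disk group
    vms: int = 0


def validate(hosts: list[HostPlan]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the plan fits."""
    problems = []
    if not MIN_HOSTS <= len(hosts) <= MAX_HOSTS:
        problems.append(f"cluster has {len(hosts)} hosts; need {MIN_HOSTS}-{MAX_HOSTS}")
    for h in hosts:
        groups = h.capacity_disks_per_group
        if not 1 <= len(groups) <= MAX_DISK_GROUPS_PER_HOST:
            problems.append(f"{h.name}: {len(groups)} disk groups")
        for i, n in enumerate(groups):
            if not 1 <= n <= MAX_CAPACITY_DISKS_PER_GROUP:
                problems.append(f"{h.name}: disk group {i} has {n} capacity disks")
        if h.vms > MAX_VMS_PER_HOST:
            problems.append(f"{h.name}: {h.vms} VMs exceeds {MAX_VMS_PER_HOST}")
    return problems
```

A plan with three hosts, each with five disk groups of seven capacity disks, returns an empty list; adding a host with an eight-disk group reports that group.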

3-8 vSAN Host CPU Requirements
When determining CPU requirements for hosts in the vSAN cluster, consider the following
information:

• Number of virtual CPUs required for virtual machines (VMs)

• Virtual CPU to physical CPU core ratio

• Cores per socket

• Sockets per host

• ESXi hypervisor CPU overhead

• vSAN operational overhead (10%)

Plan for additional CPU capacity for vSAN operational overhead if vSAN deduplication, compression, and encryption capabilities are enabled.

The vSAN ReadyNode Sizer tool is useful for determining CPU requirements for vSAN hosts.
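The CPU factors above can be combined into a rough estimate. The following Python sketch applies the 10% vSAN operational overhead to the cores implied by a vCPU-to-core ratio. The esxi_overhead parameter is an assumption you set from your own data, and the result is no substitute for the vSAN ReadyNode Sizer. Fractions are used so that floating-point rounding cannot push a result up by one core.

```python
"""Rough cluster-wide CPU core estimate for vSAN hosts (sketch, not the ReadyNode Sizer)."""
import math
from fractions import Fraction

VSAN_OVERHEAD = Fraction(10, 100)  # vSAN operational overhead cited in this section


def cores_required(total_vcpus: int, vcpus_per_core: int,
                   esxi_overhead: Fraction = Fraction(0)) -> int:
    """Physical cores needed for the VMs plus hypervisor and vSAN overhead.

    esxi_overhead is an assumed fraction, for example Fraction(5, 100).
    """
    base_cores = Fraction(total_vcpus, vcpus_per_core)
    return math.ceil(base_cores * (1 + esxi_overhead + VSAN_OVERHEAD))


def hosts_required(cores: int, sockets_per_host: int, cores_per_socket: int) -> int:
    """Hosts needed to supply the cores, rounded up."""
    cores_per_host = sockets_per_host * cores_per_socket
    return -(-cores // cores_per_host)
```

For 400 vCPUs at a 4:1 vCPU-to-core ratio, cores_required(400, 4) returns 110, and hosts_required(110, 2, 16) returns 4 two-socket, 16-core hosts, before adding headroom for host failures or for deduplication, compression, and encryption.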

3-9 vSAN Host Memory Requirements
When determining memory requirements for hosts in the vSAN cluster, consider the following information:

• Memory per VM

• ESXi hypervisor memory overhead

• vSAN operational overhead

The memory requirements for vSAN hosts depend on the number of disk groups and devices that the ESXi hypervisor must manage.

Consider at least 32 GB of memory for a fully operational vSAN node with five disk groups and seven capacity devices per disk group.

For more information about calculating vSAN memory consumption, see VMware knowledge base article 2113954 at https://kb.vmware.com/s/article/2113954.

3-10 vSAN Host Network Requirements
When configuring your network for vSAN hosts, consider the following recommendations:

• 1 GbE adapters must be dedicated to hybrid vSAN traffic.

• 10 GbE adapters can be shared with other network traffic types.

If a network adapter is shared with other traffic types, use VLANs to isolate traffic types.

Consider configuring Network I/O Control on a vSphere distributed switch to guarantee sufficient bandwidth to vSAN.

3-11 vSAN Host Storage Controllers
Storage controller recommendations:

• Use controllers that support pass-through mode to present disks directly to a host.

• Use multiple storage controllers to improve performance and to isolate a potential controller failure to only a subset of disk groups.

• Consider the storage controller pass-through mode support for easy hot-plugging or the
replacement of magnetic disks and flash capacity devices on a host.

Figure: A controller without pass-through support presents each SSD to the host as its own RAID 0 LUN.

Configure controllers that do not support pass-through to present each drive as a RAID 0 LUN
with caching disabled or set to 100% Read. If a controller works in RAID 0 mode, you must
perform additional steps before the host can discover the new drive.

The controller must be configured identically for all disks connected to the controller including
those not used by vSAN. Do not mix the controller mode for vSAN disks and disks not used by
vSAN to avoid handling the disks inconsistently, which can negatively affect vSAN operation.

In RAID 0 mode, each drive must be used to create one RAID 0 volume that only contains one
drive. If you have 12 drives, you create 12 RAID 0 volumes each with one of the drives in it. RAID
0 mode introduces additional complexity for a disk replacement.

3-12 vSAN Host Boot Device Requirements
You can boot vSAN hosts from a local disk, a USB device, SD cards, and SAT ADOM devices.

If you choose to boot vSAN hosts from a local disk, using separate storage controllers for boot
disks and vSAN disks is the best practice.

When booting vSAN hosts from a USB device, an SD card, or SAT ADOM devices, log
information and stack traces are lost on host reboot. They are lost because the scratch partition
is on a RAM drive. Therefore, a best practice is to use persistent storage for logs, stack traces,
and memory dumps.

Consider configuring the vSphere ESXi Dump Collector and vSphere Syslog Collector.

During installation, the ESXi installer creates a core dump partition on the boot device. The
default size of the core dump partition satisfies most installation requirements.

If the ESXi host has 512 GB of memory or less, you can boot the host from a USB, SD, or SATADOM device. When booting a vSAN host from a USB device or SD card, the boot device must be at least 4 GB.

If the ESXi host has more than 512 GB of memory, you can boot the host from a SATADOM or disk device with a minimum size of 16 GB. When you use a SATADOM device, use a single-level cell (SLC) device.
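The boot-device rules above can be encoded as a simple checker. This is a minimal sketch: the device labels "usb", "sd", "satadom", and "disk" and the function name boot_device_ok are hypothetical (not part of any VMware tool), and only the memory, size, and SLC conditions stated in this section are checked.

```python
# Hypothetical checker for the vSAN boot-device guidance in this section.
MEMORY_THRESHOLD_GB = 512     # hosts above this need SATADOM or disk boot devices
MIN_USB_SD_GB = 4             # minimum USB/SD boot device size
MIN_LARGE_HOST_BOOT_GB = 16   # minimum SATADOM/disk size for hosts above 512 GB


def boot_device_ok(host_memory_gb: int, device_type: str, device_size_gb: int,
                   satadom_is_slc: bool = True) -> bool:
    """Return True if the boot device matches the guidance for this host."""
    if host_memory_gb <= MEMORY_THRESHOLD_GB:
        if device_type in ("usb", "sd"):
            return device_size_gb >= MIN_USB_SD_GB
        # Local disks and SATADOM devices are also valid boot options.
        return device_type in ("satadom", "disk")
    # More than 512 GB of memory: only SATADOM or disk, at least 16 GB.
    if device_type not in ("satadom", "disk"):
        return False
    if device_size_gb < MIN_LARGE_HOST_BOOT_GB:
        return False
    # A SATADOM boot device should be single-level cell (SLC).
    return device_type == "disk" or satadom_is_slc


print(boot_device_ok(768, "usb", 32))      # False: USB is not an option above 512 GB
print(boot_device_ok(768, "satadom", 32))  # True
```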

3-13 About Hard Disk Drives
Hard disk drives (HDDs), also called magnetic or mechanical drives, include single or multiple
platters rotating at a specific speed to provide data access. The HDD provides larger storage
capacity at a lower cost.

The HDD has an arm with several heads, or transducers, that read and write data on the disk. The arm moves the heads across the surface of the disk to access different data.

Common HDD rotational speeds include:

• 7,200 RPM

• 10,000 RPM

• 15,000 RPM

In vSAN, HDDs are used in the capacity tier for hybrid configurations.

3-14 Solid-State Devices
In vSAN, SSDs are used in the cache tier to improve performance. They can also be used in both the cache tier and the capacity tier, which is called a vSAN all-flash configuration.

SSDs have a limited number of write cycles before a cell fails, which is called the write endurance rating. Every time the drive writes or erases, the oxide layer of the flash memory cell deteriorates. The type of cell affects the number of write cycles before failure.

SSD Category                 Life Span of SSD
Single-level cell (SLC)      100,000 write cycles
Multilevel cell (MLC)        3,000 to 10,000 write cycles
Enterprise MLC (eMLC)        20,000 to 30,000 write cycles
Triple-level cell (TLC)      1,000 write cycles

Solid-state devices (SSDs) consist of a collection of memory semiconductors, and all data is stored in integrated circuit cells. SSDs are more expensive than HDDs per unit of storage space, but an SSD offers up to ten times faster read and write speeds than a midrange HDD.
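To show what the write-cycle ratings in the table mean in practice, the following sketch turns a rating into a rough lifetime estimate. It uses the common approximation lifetime = capacity x rated cycles / (write amplification x daily host writes). The write-amplification factor and daily write volume are assumed inputs rather than vSAN figures, and ranges in the table are taken at their low end.

```python
# Rough SSD lifetime estimate from rated write cycles (illustrative approximation).
RATED_CYCLES = {      # low end of each range in the table above
    "SLC": 100_000,
    "MLC": 3_000,
    "eMLC": 20_000,
    "TLC": 1_000,
}


def estimated_lifetime_years(category: str, capacity_gb: float,
                             host_writes_gb_per_day: float,
                             write_amplification: float = 1.0) -> float:
    """Years until the rated write cycles are used up at a steady write rate.

    write_amplification > 1 models extra internal writes by the drive
    (an assumption that depends on the drive and the workload).
    """
    total_writable_gb = capacity_gb * RATED_CYCLES[category] / write_amplification
    return total_writable_gb / host_writes_gb_per_day / 365


# A 400 GB device absorbing 400 GB of writes per day (one drive write per day):
for cell in ("TLC", "MLC", "eMLC", "SLC"):
    print(cell, round(estimated_lifetime_years(cell, 400, 400), 1))
```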

3-15 vSAN Limitations
When planning to deploy vSAN, stay within the limits of what vSAN supports.

For example, vSAN does not support:

• Hosts that participate in multiple vSAN clusters

• vSphere DPM and Storage I/O Control

• SEsparse disks, which are the default format for all delta disks on VMFS6 datastores

• Raw device mappings (RDMs) and diagnostic partitions

3-16 Review of Learner Objectives
• Identify requirements and planning considerations for vSAN clusters

• Discuss vSAN cluster planning and deployment best practices

3-17 Lesson 2: Planning Capacity for vSAN Clusters

3-18 Learner Objectives


• Discuss how to plan storage consumption by considering data growth and failure tolerance

• Explain how to design vSAN hosts for operational needs

3-19 Capacity-Sizing Guidelines
When planning for the storage capacity of the vSAN datastore, you must consider the following
factors:

• Storage space required for VMs

• Anticipated growth

• Failures to tolerate

• vSAN operational overhead

When planning to use advanced vSAN features such as software checksum or deduplication and compression, reserve additional storage capacity to manage the operational overhead.

Plan for additional storage capacity to handle any potential failure or replacement of capacity
devices, disk groups, and hosts. Reserve additional storage capacity for vSAN to recover
components after a host failure or when a host enters maintenance mode.

Keep at least 30% of the storage capacity unused to prevent vSAN from rebalancing the storage load. vSAN rebalances components across the cluster whenever consumption on a single capacity device reaches 80% or more. The rebalance operation might affect application performance.

When a capacity device is not reachable, vSAN recovers its components from another device in the cluster. When a flash cache device fails or is removed, vSAN must recover the components of the entire disk group.

Provide enough temporary storage space for changes in the vSAN VM storage policy. When
you dynamically change a VM storage policy, vSAN might create a new RAID tree layout of the
object. When vSAN instantiates and synchronizes a new layout, the object may consume extra
space temporarily. Keep some temporary storage space in the cluster to handle such changes.

Enabling deduplication and compression with software checksum features requires additional
storage space overhead, approximately 6.2 percent capacity per device.
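The two thresholds in this section, the 80% per-device rebalance trigger and the 30% free-space guideline, can be monitored with a few lines of code. This is a hypothetical monitoring sketch: the device names and (used, capacity) tuples are illustrative inputs, and it is not a sizing tool (for vSAN 7 U1 and later, the reserved-capacity guidance in this lesson points to the vSAN Sizer).

```python
REBALANCE_THRESHOLD = 0.80  # per-device consumption that triggers rebalancing
MIN_FREE_FRACTION = 0.30    # keep at least 30% of storage unused


def devices_over_threshold(devices: dict[str, tuple[float, float]]) -> list[str]:
    """Names of capacity devices whose used/capacity ratio is 80% or more."""
    return [name for name, (used, capacity) in devices.items()
            if used / capacity >= REBALANCE_THRESHOLD]


def meets_free_space_guideline(devices: dict[str, tuple[float, float]]) -> bool:
    """True if at least 30% of the combined raw capacity is unused."""
    used = sum(u for u, _ in devices.values())
    capacity = sum(c for _, c in devices.values())
    return (capacity - used) / capacity >= MIN_FREE_FRACTION


# Hypothetical capacity devices: name -> (used GB, capacity GB)
cluster = {"host1-cap0": (820, 1000), "host2-cap0": (400, 1000), "host3-cap0": (500, 1000)}
print(devices_over_threshold(cluster))      # ['host1-cap0']
print(meets_free_space_guideline(cluster))  # True
```

The two checks are independent: two devices each at 750 GB of 1000 GB stay below the 80% trigger, yet only 25% of the combined capacity is free, which misses the 30% guideline.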

3-20 vSAN Reserved Capacity
vSAN requires free space set aside for operations such as host maintenance mode data
evacuation, component rebuilds, and rebalancing.

This free space also accounts for the capacity needed for host outages. Activities such as
rebuilds and rebalancing can temporarily consume additional raw capacity.

The free space required for these operations is called vSAN reserved capacity, and it comprises the following elements:

• Operations reserve

• Host rebuild reserve

[Figure: vSAN 7 compared with vSAN 7 U1. In vSAN 7, reserve capacity is a fixed percentage; in vSAN 7 U1, the reserve capacity percentage varies. The remainder is usable capacity.]

vSAN reserved capacity comprises:

• Operations reserve: Reserves storage space for internal vSAN operations, such as object
rebuild or repair.

• Host rebuild reserve: Reserves storage space to ensure that all objects can be rebuilt if host
failure occurs in the cluster.

In all vSAN versions earlier than vSAN 7 U1, the free space required for these transient
operations was called slack space. The limitations of vSAN in versions earlier than vSAN 7 U1 led
to a generalized recommendation of free space as a percentage of the cluster (25-30%),
regardless of cluster size.

When sizing a new vSAN cluster, use the vSAN Sizer tool, which has the vSAN reserved capacity calculation logic built in. As a best practice, do not use manually created spreadsheets or calculators, because they do not accurately calculate free capacity requirements for vSAN environments running version 7 U1 or later.

3-21 Planning for Failures to Tolerate
When planning the storage capacity of the vSAN datastore, you must consider the failures to
tolerate (FTT) levels and the failure tolerance method (FTM) attributes of the VM storage
policies for the cluster.

Storage space consumption varies, depending on VM availability requirements.

For example, a VM configured with FTT set to 1 and FTM set to RAID 1 Mirroring can consume
twice the storage space on the vSAN datastore to support the configured availability.

Similarly, a VM configured with FTT set to 1 and FTM set to RAID 5/6 Erasure Coding can consume 33% additional storage space on the vSAN datastore to support the configured availability.

A Failures To Tolerate policy with RAID 1 can significantly affect the space consumption of a
vSAN datastore.

Consider another example in which you expect to have 100 VMs in your vSAN environment. On average, each VM has 2 objects and each object is 40 GB, so the total number of VMDK objects is 200.

If you uniformly apply a Failures To Tolerate policy value of 1 to all objects, each object will have a provisioned space of 40 GB x 2 = 80 GB. Therefore, the total provisioned space of 100 VMs is expected to be 200 x 80 GB = 16 TB.

If you increase the Failures To Tolerate value to 3, the average space consumed by an object
will be 40 GB x 4 = 160 GB. Therefore, the total provisioned space of all 200 objects will be 200
x 160 GB = 32 TB.
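The arithmetic in these examples can be reproduced with a small calculator. In this sketch (function names are illustrative), RAID 1 mirroring stores FTT + 1 full copies. For erasure coding, it assumes the vSAN layouts of 3 data + 1 parity components for RAID 5 (FTT = 1, the 33% overhead mentioned above) and 4 data + 2 parity for RAID 6 (FTT = 2). Fractions keep the results exact.

```python
from fractions import Fraction


def capacity_multiplier(ftt: int, ftm: str = "RAID-1") -> Fraction:
    """Raw space consumed per unit of object size for a given FTT and FTM."""
    if ftm == "RAID-1":
        return Fraction(ftt + 1)      # original plus FTT mirror copies
    if ftm == "RAID-5/6":
        if ftt == 1:
            return Fraction(4, 3)     # RAID 5: 3 data + 1 parity
        if ftt == 2:
            return Fraction(3, 2)     # RAID 6: 4 data + 2 parity
        raise ValueError("RAID-5/6 erasure coding supports FTT=1 or FTT=2 only")
    raise ValueError(f"unknown failure tolerance method: {ftm}")


def total_provisioned_gb(vms: int, objects_per_vm: int, object_gb: int,
                         ftt: int, ftm: str = "RAID-1") -> Fraction:
    """Provisioned raw space for uniformly sized objects under one policy."""
    return vms * objects_per_vm * object_gb * capacity_multiplier(ftt, ftm)


print(total_provisioned_gb(100, 2, 40, ftt=1))                 # 16000 GB = 16 TB
print(total_provisioned_gb(100, 2, 40, ftt=3))                 # 32000 GB = 32 TB
print(float(total_provisioned_gb(100, 2, 40, 1, "RAID-5/6")))  # about 10667 GB
```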

3-22 Planning Capacity for VMs
When planning the storage capacity of the vSAN datastore, consider the space required for the following VM objects:

• VM home namespace object

• VM VMDK object

• VM snapshot object

• VM swap object

The VM snapshot object inherits the storage policy settings from the VM's base VMDK file. You
must plan extra space according to the expected size and number of snapshots required.

The VM swap object inherits the storage policy settings from the VM home namespace object.
You must also consider enabling thin provisioning for the VM swap object if your environment is
not overcommitted for memory.

Because VM VMDK files are thin-provisioned by default, prepare for future capacity growth.

The VM VMDK object holds the user data. Its size depends on the size of the virtual disk defined by the user. However, the actual space required by a VMDK object for storage depends on the applied VM storage policy.

For example, if the size of the VMDK is 40 GB and the FTT is set to 3, the actual space
consumption can be up to four times the VMDK size (40 GB x 4).

The size of a VM swap object depends on the memory configured on a VM. Because vSAN
applies the Failures To Tolerate policy of 1 to a VM swap object, the actual space consumption
can be twice as much as the configured VM memory.

For example, if the memory configured on a VM is 8 GB, the actual storage space consumption
will be 16 GB for the VM swap object.
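Combining the VMDK and swap rules gives a worst-case footprint per VM. This sketch assumes RAID 1 mirroring for the VMDK and no memory reservation (so the swap object equals the configured memory), and it ignores the small VM home namespace object. vm_footprint_gb is a hypothetical helper name.

```python
def vm_footprint_gb(vmdk_gb: float, vmdk_ftt: int, memory_gb: float) -> dict[str, float]:
    """Worst-case raw space for one VM's VMDK and swap objects."""
    vmdk_raw = vmdk_gb * (vmdk_ftt + 1)  # RAID 1: FTT mirror copies plus the original
    swap_raw = memory_gb * 2             # swap object uses FTT = 1, so two copies
    return {"vmdk": vmdk_raw, "swap": swap_raw, "total": vmdk_raw + swap_raw}


# The examples from the text: a 40 GB VMDK with FTT = 3 and 8 GB of VM memory.
print(vm_footprint_gb(40, 3, 8))  # {'vmdk': 160, 'swap': 16, 'total': 176}
```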

3-23 Plan and Design Consideration: VM Home Namespace Objects
A home namespace object does not contain user data. It is a container object for various files, such as VMX and log files, and it occupies much less space than other objects.

VM home namespace objects accept only the following policies:

• Failures To Tolerate

• Force Provisioning

3-24 Plan and Design Consideration: VMDK and Snapshot Objects
VMDK and snapshot objects hold user data, and their size depends on the size of the VMDK file
defined by the user. The space required by a VMDK object depends on the user-applied policy.

Because VMDKs are thin-provisioned by default, you must prepare for future growth in capacity.

The VM snapshot delta object inherits the storage policy settings of the VM's base VMDK file, so
plan extra space according to the expected size and number of snapshots required.

For example, if the size of the VMDK is 40 GB and Failures To Tolerate is set to 3, the actual space consumption can be up to four times the VMDK file size (40 GB x 4).

In this example, if the object space reservation is also set to 100%, the entire 160 GB is reserved when the policy is applied.
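Object space reservation (OSR) determines how much of that worst case is reserved up front. The sketch below assumes RAID 1 mirroring and that a partial OSR reserves the same percentage of every copy; reserved_gb is an illustrative name.

```python
def reserved_gb(vmdk_gb: float, ftt: int, osr_percent: float) -> float:
    """Space reserved when a RAID 1 policy with the given OSR is applied."""
    if not 0 <= osr_percent <= 100:
        raise ValueError("object space reservation must be between 0 and 100 percent")
    return vmdk_gb * (ftt + 1) * osr_percent / 100


print(reserved_gb(40, 3, 100))  # 160.0, the full worst case from the example
print(reserved_gb(40, 3, 0))    # 0.0, nothing reserved up front
```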

3-25 Plan and Design Consideration: VM Swap Object
The size of a VM swap object depends on the memory that is configured on a VM.

The VM swap object inherits the storage policy settings from the VM home namespace object and is thin-provisioned by default.

3-26 vSAN Cache Tiers
A vSAN cache tier must meet the following requirements:

• A flash device (SSD) must be available for the cache tier.

• A higher cache-to-capacity ratio can be considered to allow future capacity growth.

• For hybrid clusters, the flash caching device must provide at least 10% of the anticipated capacity tier storage space.

[Figure: A disk group with one SSD in the cache tier and several SSD capacity devices]

For best performance, consider a PCIe flash device, which is typically faster than a SAS or SATA SSD.

In vSAN all-flash configurations, the cache tier is not used for read caching; it serves as a write buffer. You can therefore use a small-capacity flash device with high write endurance for the cache tier.
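The 10% hybrid cache guideline from the requirements list can be checked with simple arithmetic. This is a minimal sketch; min_hybrid_cache_gb and cache_is_sufficient are hypothetical names, and the anticipated capacity is the capacity-tier size you plan to grow into.

```python
CACHE_PERCENT = 10  # hybrid clusters: cache >= 10% of anticipated capacity tier


def min_hybrid_cache_gb(anticipated_capacity_gb: float) -> float:
    """Smallest flash cache that meets the 10% guideline."""
    return anticipated_capacity_gb * CACHE_PERCENT / 100


def cache_is_sufficient(cache_gb: float, anticipated_capacity_gb: float) -> bool:
    """True if the cache device meets the guideline for the planned capacity."""
    return cache_gb >= min_hybrid_cache_gb(anticipated_capacity_gb)


# Disk group expected to grow to 4 x 1.2 TB capacity disks (4800 GB):
print(min_hybrid_cache_gb(4800))       # 480.0
print(cache_is_sufficient(400, 4800))  # False: a 400 GB cache is undersized
```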

3-27 vSAN Capacity Tiers
A vSAN capacity tier must meet the following requirements:

• For vSAN hybrid configurations, at least one HDD must be available.

• For vSAN all-flash configurations, at least one SSD must be available.

Storage requirements are an important part of a vSAN deployment plan.

Environments that are planned and deployed according to requirements and best practices have
a better chance of avoiding failures and workflow interruptions.

[Figure: A disk group with one cache SSD and a capacity tier of SSD or HDD devices]

3-28 Magnetic Devices for Capacity Tiers
When planning the size and number of magnetic disks for capacity in hybrid configurations,
follow the requirements for storage space and performance.

Use magnetic devices according to requirements for performance, capacity, and cost of the
vSAN storage. SAS and NL-SAS magnetic devices have faster performance.

Plan the configuration of magnetic capacity devices according to the following guidelines:

• For better vSAN performance, use many magnetic disks that have smaller capacity.

• For balanced performance and predictable behavior, use the same type and model of
magnetic disks in a vSAN datastore.

Plan for enough magnetic disks to provide adequate aggregated performance. Using more small devices provides better performance than using fewer large devices, and using multiple magnetic disk spindles can speed up the destaging process (moving data from the cache tier to the magnetic disks).
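The advantage of many smaller disks comes down to spindle count: with the same raw capacity, more spindles deliver more combined I/O. The per-disk IOPS figure below is an assumed ballpark value for a 7,200 RPM disk, not a vendor specification, and both layouts are hypothetical.

```python
ASSUMED_IOPS_PER_7200_RPM_DISK = 80  # illustrative assumption only


def layout_summary(disk_count: int, disk_tb: int, iops_per_disk: int) -> tuple[int, int]:
    """Return (raw capacity in TB, aggregate IOPS) for a set of identical disks."""
    return disk_count * disk_tb, disk_count * iops_per_disk


for name, (count, size_tb) in {"6 x 2 TB": (6, 2), "3 x 4 TB": (3, 4)}.items():
    capacity, iops = layout_summary(count, size_tb, ASSUMED_IOPS_PER_7200_RPM_DISK)
    print(f"{name}: {capacity} TB raw, about {iops} IOPS")
```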

3-29 Flash Devices for Capacity Tiers
Plan the configuration of flash capacity devices for vSAN all-flash clusters to provide high performance and the required storage space, and to accommodate future growth.

Choose SSD flash devices according to requirements for performance, capacity, write
endurance, and cost of vSAN storage:

• For capacity: Flash devices with lower write endurance are less expensive and can be used in the capacity tier.

• For balanced performance and predictable behavior: Use the same type and model of flash capacity devices.

3-30 Multiple vSAN Disk Groups
An entire disk group can fail if a flash cache device or a storage controller stops responding. vSAN rebuilds all components of a failed disk group from another location in the cluster.

Using multiple disk groups, with each providing less capacity, has benefits and disadvantages.

Benefits:

• Improved performance:

• The datastore has more aggregated cache, and I/O operations are faster.

• If a disk group fails, vSAN rebuilds fewer components.

• Risk of failure is spread among multiple disk groups.

Disadvantages:

• Costs are increased because two or more caching devices are required.

• A vSAN host requires additional memory to manage more disk groups.

• Multiple storage controllers are required to reduce the risk of a single point of failure.
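The rebuild benefit of multiple disk groups can be estimated with simple division. This sketch assumes that a host's consumed capacity is spread evenly across its disk groups (real component placement is rarely that uniform); data_to_rebuild_gb is a hypothetical helper.

```python
def data_to_rebuild_gb(host_used_gb: float, disk_groups: int) -> float:
    """Approximate data vSAN must rebuild elsewhere after one disk group fails."""
    if disk_groups < 1:
        raise ValueError("a vSAN host needs at least one disk group")
    return host_used_gb / disk_groups


for groups in (1, 2, 3):
    print(groups, "disk group(s):", data_to_rebuild_gb(6000, groups), "GB to rebuild")
```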

3-31 About vSAN Cluster Scaling
vSAN scales up and scales out if you need more compute or storage resources in the cluster.

Scaling up adds resources to an existing host:

• Capacity disks for storage space

• Caching tier devices for performance

Scaling out adds nodes to the cluster for compute and storage capacity.

[Figure: Scaling Up (adding disk groups, cache devices, and capacity devices to existing hosts) compared with Scaling Out (adding hosts with their own disk groups to the cluster)]

3-32 Planning for Scaling Up
Scaling up a vSAN cluster refers to increasing the storage capacity by adding disks to the existing vSAN nodes.

Always increase capacity uniformly on all cluster nodes to avoid uneven data distribution, which
can lead to uneven resource utilization.
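A quick way to spot uneven scale-up is to compare capacity across nodes. In this sketch, the node names and capacities are hypothetical, and imbalance is defined (as an illustrative choice) as the gap between the largest and smallest node relative to the largest.

```python
def capacity_imbalance(node_capacity_tb: dict[str, float]) -> float:
    """Gap between the largest and smallest node, as a fraction of the largest."""
    largest = max(node_capacity_tb.values())
    smallest = min(node_capacity_tb.values())
    return (largest - smallest) / largest


def nodes_to_scale_up(node_capacity_tb: dict[str, float]) -> list[str]:
    """Nodes with less capacity than the largest node, in sorted order."""
    largest = max(node_capacity_tb.values())
    return sorted(n for n, cap in node_capacity_tb.items() if cap < largest)


nodes = {"esxi-01": 20.0, "esxi-02": 20.0, "esxi-03": 16.0, "esxi-04": 20.0}
print(capacity_imbalance(nodes))  # 0.2
print(nodes_to_scale_up(nodes))   # ['esxi-03']
```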

Ways to scale up:

• Create new disk groups.

• Add new disks to existing disk groups.

• Replace existing cache and capacity disks with higher-capacity drives.

Reasons to scale up:

• Poor performance because of undersized cache disk

• To satisfy stripe width policy compliance

• To increase vSAN datastore capacity

[Figure: Scaling up by adding disk groups and devices to existing hosts]

3-33 Planning for Scaling Out
Scaling out a cluster adds storage and compute resources to the cluster.

Reasons to add more nodes to a vSAN cluster:

• To increase storage and compute resources to a cluster

• To increase the number of fault domains to meet FTT compliance

• To resolve a cluster-full situation

You can also add compute-only hosts to the cluster, which add only CPU and memory
resources, not storage resources. If you have diskless servers or unused servers in inventory,
you can add them to a vSAN cluster.
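How many hosts or fault domains a policy needs follows from the failure tolerance method. The rules encoded here are standard vSAN placement requirements that this section does not spell out: RAID 1 needs 2 x FTT + 1 fault domains (data copies plus witness components), RAID 5 needs 4, and RAID 6 needs 6. Function names are illustrative.

```python
def min_fault_domains(ftt: int, ftm: str = "RAID-1") -> int:
    """Minimum hosts or fault domains required to satisfy a storage policy."""
    if ftm == "RAID-1":
        return 2 * ftt + 1
    if ftm == "RAID-5/6":
        if ftt == 1:
            return 4  # RAID 5
        if ftt == 2:
            return 6  # RAID 6
        raise ValueError("RAID-5/6 erasure coding supports FTT=1 or FTT=2 only")
    raise ValueError(f"unknown failure tolerance method: {ftm}")


def hosts_to_add(current_fault_domains: int, ftt: int, ftm: str = "RAID-1") -> int:
    """Fault domains to add by scaling out before the policy can be compliant."""
    return max(0, min_fault_domains(ftt, ftm) - current_fault_domains)


print(hosts_to_add(4, ftt=2, ftm="RAID-5/6"))  # 2: RAID 6 needs 6 fault domains
print(hosts_to_add(4, ftt=1))                  # 0: RAID 1 with FTT=1 needs 3
```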

[Figure: Scaling out by adding hosts, each with its own disk group, to the cluster]

3-34 Using the VMware Compatibility Guide
Using the VMware Compatibility Guide, you can verify that ESXi hosts in your organization meet
vSAN hardware requirements.

For vSAN, the guide provides access to a sizing tool and a configuration guide.

The vSAN ReadyNode Sizer tool is not limited to storage. It also factors in CPU and memory sizing. The tool incorporates sizing overheads for swap, deduplication and compression metadata, disk formatting, and vSAN CPU usage.

You can use the step-by-step guide to select the version, platform, model, and vendor for your
vSAN ReadyNode.

You can access the vSAN ReadyNode Sizer tool at http://vsanreadynode.vmware.com/RN/RN.

For the stepwise procedure, see How to Configure a vSAN ReadyNode at https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vmware-how-to-configure-vsan-ready-node.pdf.

3-35 Review of Learner Objectives
• Discuss how to plan storage consumption by considering data growth and failure tolerance

• Explain how to design vSAN hosts for operational needs

3-36 Lesson 3: Designing a vSAN Network

3-37 Learner Objectives


• Identify vSAN networking features and requirements

• Describe ways of controlling traffic in a vSAN environment

• List best practices for vSAN network configurations

3-38 vSAN Networking Overview
A vSAN cluster requires a network channel for communication between hosts.

Hosts use a VMkernel adapter to access the vSAN network. You can create a vSAN network
with standard or distributed switches.

• If you use a vSphere distributed switch, a single VMkernel port group attaches to all hosts that are enabled with vSAN.

• If you use standard switches, each host has its own standard switch configuration for the vSAN network.

[Figure: A vSphere vSAN cluster of three ESXi hosts whose local SSDs form a shared vSAN datastore, with the hosts connected through the vSAN network]

3-39 Designing a vSAN Network
When planning the network for the vSAN cluster, consider the following networking features that vSAN supports to provide availability, security, and guaranteed bandwidth:

• Distributed or standard switches

• NIC teaming and failover

• Unicast support

• Network I/O Control

• Priority tagging and isolating vSAN traffic

• Jumbo frames

3-40 NIC Teaming and Failover
vSAN uses the NIC teaming and failover policy configured on the virtual switch for network redundancy only. vSAN does not use the second NIC for load balancing.

vSAN does not support multiple VMkernel adapters on the same subnet.

Consider configuring Link Aggregation Control Protocol (LACP) or EtherChannel for improved redundancy and bandwidth use.

[Figure: vSAN-pg - Edit Settings, Teaming and failover page]
Load balancing: Route based on originating virtual port
Network failure detection: Link status only
Notify switches: Yes
Failover order: Uplink 1 active, Uplink 2 standby (during a failover, standby uplinks activate in the order specified)

3-41 Unicast Support
Unicast is the supported protocol for a vSAN network. Multicast is no longer required on the
physical switches that support vSAN clusters.

Reasons vSAN unicast mode was introduced in vSAN 6.6:

• To simplify network requirements for vSAN cluster communications, such as Cluster Monitoring, Membership, and Directory Service (CMMDS) and VM I/O traffic

• To verify cluster participation

If hosts in your vSAN cluster are running earlier versions of ESXi, a multicast network is still
required.


3-42 Network I/O Control
Network I/O Control is available on vSphere distributed switches and provides the following bandwidth controls:

• Guarantees a minimum amount of bandwidth for each traffic type

• Limits the bandwidth that each traffic type can consume

• Controls the proportion of bandwidth allocated to each traffic type during congestion

If you plan to use a shared 10 Gb Ethernet network adapter, place the vSAN traffic on a distributed switch and configure Network I/O Control to guarantee sufficient bandwidth for vSAN traffic.
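Share-based allocation during congestion can be modeled as a proportional split of the saturated uplink. This simplified sketch ignores reservations, limits, and idle traffic types, and the share values are hypothetical examples rather than vSphere defaults.

```python
def bandwidth_under_contention(link_gbps: float, shares: dict[str, int]) -> dict[str, float]:
    """Split a saturated uplink among active traffic types by their shares.

    Simplified model: reservations, limits, and idle traffic types are ignored.
    """
    total_shares = sum(shares.values())
    return {traffic: link_gbps * value / total_shares for traffic, value in shares.items()}


# Hypothetical shares on a shared 10 Gb adapter
print(bandwidth_under_contention(10, {"vSAN": 100, "vMotion": 50, "Virtual machine": 50}))
# {'vSAN': 5.0, 'vMotion': 2.5, 'Virtual machine': 2.5}
```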

[Figure: Traffic types, such as Fault Tolerance, sharing physical uplinks through the teaming policy of a vSphere Distributed Switch]

3-43 Priority Tagging and Isolating vSAN Traffic
Priority tagging is a mechanism to indicate to the connected network devices that vSAN traffic
has high quality-of-service (QoS) demands.

You can assign vSAN traffic to a certain class and mark the traffic accordingly with a class-of-service (CoS) value from 0 (low priority) to 7 (high priority).

Use the vSphere Distributed Switch traffic filtering and marking policy to configure priority levels.

For example, a tag of 7 for vSAN traffic indicates that vSAN traffic has high QoS demands.

Consider isolating vSAN traffic by segmenting it in a VLAN for enhanced security and
performance, especially if the backing physical adapter capacity is shared among several other
traffic types.
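On the wire, the CoS value is carried in the 3-bit Priority Code Point (PCP) field of the IEEE 802.1Q VLAN tag, alongside the VLAN ID used for isolation. The sketch below packs and unpacks the 16-bit Tag Control Information field; VLAN ID 100 is a hypothetical example.

```python
def vlan_tci(pcp: int, vlan_id: int, dei: int = 0) -> int:
    """Pack an 802.1Q Tag Control Information field: PCP(3) | DEI(1) | VID(12)."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP (CoS) must be 0-7")
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    return (pcp << 13) | (dei << 12) | vlan_id


def decode_tci(tci: int) -> tuple[int, int, int]:
    """Return (pcp, dei, vlan_id) from a 16-bit TCI value."""
    return tci >> 13, (tci >> 12) & 0x1, tci & 0xFFF


tci = vlan_tci(pcp=7, vlan_id=100)  # vSAN traffic tagged CoS 7 in VLAN 100
print(hex(tci))         # 0xe064
print(decode_tci(tci))  # (7, 0, 100)
```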

3-44 Jumbo Frames
Jumbo frames (typically a 9,000-byte MTU) can transmit and receive up to six times more data per frame than the default MTU of 1,500 bytes. This reduces the load on host CPUs when transmitting and receiving network traffic.

You should verify that jumbo frames are enabled on all network devices and hosts in a cluster.

By default, the TCP Segmentation Offload (TSO) and Large Receive Offload (LRO) features are enabled on ESXi. These features offload TCP/IP packet processing work onto the NICs; if the work is not offloaded, the host CPU must perform it.
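The "six times" figure comes from comparing a 9,000-byte jumbo MTU with the standard 1,500-byte MTU. The sketch below counts the frames needed to carry 1 MiB of TCP payload at each MTU, assuming 40 bytes of IPv4 and TCP headers per frame with no options; fewer frames means less per-packet work for the host.

```python
IP_TCP_HEADER_BYTES = 40  # assumption: 20-byte IPv4 + 20-byte TCP header, no options


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Frames needed when each frame carries the MTU minus the headers."""
    per_frame = mtu - IP_TCP_HEADER_BYTES
    return -(-payload_bytes // per_frame)  # ceiling division with integers


one_mib = 1024 * 1024
standard, jumbo = frames_needed(one_mib, 1500), frames_needed(one_mib, 9000)
print(standard, jumbo)  # 719 118
```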

3-45 vSAN Network Requirements
The network infrastructure and configuration on the ESXi hosts must meet the minimum networking requirements for vSAN.

Network Component          Requirements
Connection between hosts   Each host in the vSAN cluster must have a VMkernel adapter for vSAN traffic exchange.
Host network               All hosts in the vSAN cluster must be connected to a vSAN layer 2 or layer 3 network.
Network latency            The maximum latency is 1 ms RTT between all hosts in a standard (nonstretched) vSAN cluster.
IPv4 and IPv6 support      The vSAN network supports both IPv4 and IPv6.
vSAN hybrid cluster        Use 10 Gb or faster; 1 Gb is supported with latency < 1 ms RTT.
vSAN all-flash cluster     10 Gb or faster with latency < 1 ms RTT is required; 1 Gb is not supported.
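The link-speed and latency rows of the table can be turned into a pre-deployment check. This sketch covers only standard (nonstretched) clusters because the table gives no stretched-cluster figures; network_issues and its message strings are hypothetical.

```python
MAX_RTT_MS = 1.0  # standard (nonstretched) vSAN clusters


def network_issues(cluster_type: str, link_gbps: float, rtt_ms: float) -> list[str]:
    """Compare a planned vSAN network against the requirements table."""
    if cluster_type not in ("hybrid", "all-flash"):
        raise ValueError("cluster_type must be 'hybrid' or 'all-flash'")
    issues = []
    if cluster_type == "all-flash" and link_gbps < 10:
        issues.append("error: all-flash clusters require 10 Gb or faster")
    elif cluster_type == "hybrid" and link_gbps < 1:
        issues.append("error: hybrid clusters require at least 1 Gb")
    elif cluster_type == "hybrid" and link_gbps < 10:
        issues.append("warning: 10 Gb or faster is recommended for hybrid clusters")
    if rtt_ms > MAX_RTT_MS:
        issues.append(f"error: RTT of {rtt_ms} ms exceeds the 1 ms maximum")
    return issues


print(network_issues("all-flash", 10, 0.4))  # []
print(network_issues("hybrid", 1, 1.5))
```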

3-46 vSAN Communication Ports
The ports listed are used for vSAN communication.

Port   Protocol   vSAN Service
2233   TCP        vSAN transport for storage I/O
8080   TCP        vSAN management service
9080   TCP        vSAN storage I/O filter
3260   TCP        vSAN iSCSI target port
5001   UDP        vSAN network health test
8010   TCP        vSAN observer for live statistics
80     TCP        vSAN performance service
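When troubleshooting, the port table can double as a reachability checklist. This sketch uses Python's standard socket.create_connection to test whether a TCP port accepts connections. UDP port 5001 cannot be verified this way because UDP is connectionless.

```python
import socket

VSAN_PORTS = {
    2233: ("TCP", "vSAN transport for storage I/O"),
    8080: ("TCP", "vSAN management service"),
    9080: ("TCP", "vSAN storage I/O filter"),
    3260: ("TCP", "vSAN iSCSI target port"),
    5001: ("UDP", "vSAN network health test"),
    8010: ("TCP", "vSAN observer for live statistics"),
    80: ("TCP", "vSAN performance service"),
}


def tcp_ports() -> list[int]:
    """TCP ports from the table, in ascending order."""
    return sorted(port for port, (proto, _) in VSAN_PORTS.items() if proto == "TCP")


def tcp_port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, [p for p in tcp_ports() if not tcp_port_reachable("esxi-01.example.com", p)] lists the closed TCP ports on a hypothetical host. Not every service listens on every host (the iSCSI target port matters only when that service is enabled), so a closed port is a prompt to investigate rather than proof of a fault.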

3-47 vSAN Network Best Practices
When determining network requirements, the following practices can help improve performance,
throughput, and availability:

• Do not share a VMkernel adapter for multiple traffic types.

• Isolate vSAN traffic from host management traffic. vSAN traffic is independent of management traffic and performs better when isolated.

• Dedicate an Ethernet physical network of at least 1 Gb for vSAN hybrid configuration.

• Dedicate an Ethernet physical network of at least 10 Gb for vSAN all-flash configuration.

• Provision an additional physical NIC for failover.

• If you plan to use a shared 10 Gb Ethernet network adapter, place vSAN traffic on a
distributed switch and configure Network I/ 0 Control to guarantee bandwidth to vSAN
traffic.

• Use jumbo frames for vSAN networks in data centers where jumbo frames are already enabled in the network infrastructure.

3-48 Review of Learner Objectives
• Identify vSAN networking features and requirements

• Describe ways of controlling traffic in a vSAN environment

• List best practices for vSAN network configurations

3-49 Key Points


• When planning a vSAN cluster deployment, you must verify that all cluster elements meet
the minimum requirements for vSAN.

• Using separate storage controllers for host boot disks and vSAN disks is a best practice.

• Planning additional storage capacity to handle any potential failure or replacement of capacity devices, disk groups, and hosts is also a best practice.

• The vSAN ReadyNode Sizer tool is not limited to storage. It also factors in CPU and memory sizing.

• When planning network configuration for a vSAN cluster, consider availability, security, and
bandwidth requirements.

Questions?

Module 4
Deploying a vSAN Cluster

4-2 Importance
Because vSAN can run extremely I/O-intensive workloads, your vSAN node hardware must meet VMware compatibility requirements. Performance characteristics and hardware stability cannot be guaranteed if you do not thoroughly test combinations of devices, firmware, and drivers.

Understanding how to configure vSAN cluster settings that meet the requirements of your environment is important. Failure to configure these settings properly can affect performance and availability.

4-3 Module Lessons


1. Preparing ESXi Hosts for a vSAN Cluster

2. Deploying a vSAN Cluster

4-4 Lesson 1: Preparing ESXi Hosts for a vSAN Cluster

4-5 Learner Objectives


• Explain the importance of hardware compatibility

• Describe the application of host hardware settings for optimum performance

• Discuss the importance of driver and firmware version compatibility

• Describe the use of vSphere Lifecycle Manager to automate driver and firmware installations

4-6 Verifying Hardware Compatibility
The VMware Compatibility Guide has a dedicated section for vSAN.

See the VMware Compatibility Guide before upgrading to ensure that new drivers and firmware
have been certified and tested for use with vSAN.

Ensure that the hardware devices are compatible with the versions of vSphere and vSAN.

Verify the compatibility of the following hardware components:

• Driver and firmware version of the storage controller

• High-write endurance value and disk firmware details of the cache tier devices

• Firmware details of the capacity tier devices

See the VMware Compatibility Guide at https://www.vmware.com/resources/compatibility/search.php.

4-7 Configuring Storage Controllers
Storage controllers play a key role in the I/O path for vSAN operations and are essential to optimum performance.

vSAN supports pass-through and RAID 0 modes. Pass-through is the preferred vSAN operating
mode.

See the VMware Compatibility Guide to verify the mode that is supported for your storage
controller.

[Figure: Storage controllers in pass-through mode compared with RAID 0 mode, each with attached SSDs]

4-8 Considering Multiple Storage Controllers
You might need to install additional storage controllers in certain scenarios:

• You want to reduce the impact of a potential controller failure by placing disk groups on
separate controllers.

• A recent scale-up with additional disks requires additional controllers.

• The queue depth of the current single controller is inadequate to meet the workload and
physical disk configuration.

• The business wants better performance, which typically requires multiple controllers.

4-9 Configuring BIOS for High Performance
The frequency of the physical CPU is controlled by either the BIOS or by the ESXi hypervisor
(OS controlled).

For high performance, consider the following configurations:

• Select the OS controlled CPU power management mode.

• Enable Turbo Boost mode.

• Expose BIOS C-states to the hypervisor to enable high performance, as needed.

[Figure: System BIOS > System Profile Settings, showing CPU Power Management set to OS DBPM, Memory Frequency set to Maximum Performance, and options for Turbo Boost, C1E, and C States]

4-10 CPU Power Management
The following CPU power management policies can be selected to manage energy consumption
and performance.

If you do not select a policy, ESXi uses the Balanced power policy by default. For vSAN nodes, set the power policy to High Performance.

When a CPU is idle, ESXi can apply deep halt states, also known as C-states, to reduce power consumption.

[Figure: Edit Power Policy Settings dialog for sa-esxi-01.vclass.local, with High performance selected]

• High performance: Do not use any power management features

• Balanced: Reduce energy consumption with minimal performance compromise

• Low power: Reduce energy consumption at the risk of lower performance

• Custom: User-defined power management policy

4-11 Verifying OS Controlled Mode
You use the vSphere Client to verify OS controlled mode.

Select the ESXi host and select Configure > Hardware > Overview > Power Management.

If ACPI C-states or ACPI P-states appears in the Technology text box, your power management settings are set to OS controlled.

[Figure: vSphere Client, sa-esxi-01.vclass.local > Configure > Hardware > Overview, showing Power Management with Technology set to ACPI C-states, ACPI P-states and Active policy set to Balanced]

4-12 Using VMware Skyline Health to Verify Hardware Compatibility
VMware Skyline Health verifies that installed devices, drivers, and firmware are compatible with the installed vSAN release.

The Hardware Compatibility section shows warnings if the controller, firmware, or drivers are not
listed as compatible in the VMware Compatibility Guide.

To view details, select vSAN Cluster and select Monitor > vSAN > Skyline Health > Hardware
compatibility.

[Figure: Skyline Health (last checked 09/04/2020, 6:02:18 PM), Hardware compatibility checks: vSAN HCL DB Auto Update; Controller is VMware certified for ESXi release; Controller driver is VMware certified; Controller firmware is VMware certified; Controller disk group mode is VMware certified; vSAN firmware version recommendation]
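The Hardware compatibility checks compare what is installed on each host against what is listed for the vSAN release. The following Python sketch illustrates only that comparison. It assumes a hypothetical, simplified in-memory catalog: the real vSAN HCL DB is a JSON file with its own schema, which this sketch does not reproduce, and the controller model and version strings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstalledController:
    """Storage controller details reported by a host (illustrative fields)."""
    model: str
    driver: str
    firmware: str


# Hypothetical catalog: controller model -> certified (driver, firmware) pairs.
# Model names and versions are invented; this is not the vSAN HCL DB format.
HYPOTHETICAL_HCL = {
    "ExampleRAID-9000": {("nhpsa 2.0.14", "5.00"), ("nhpsa 2.0.10", "4.68")},
}


def check_controller(ctrl: InstalledController, hcl: dict) -> list:
    """Return warnings similar in spirit to the Hardware compatibility checks."""
    if ctrl.model not in hcl:
        return [f"Controller {ctrl.model} is not listed as compatible"]
    certified = hcl[ctrl.model]
    certified_drivers = {driver for driver, _ in certified}
    if ctrl.driver not in certified_drivers:
        return [f"Driver {ctrl.driver} is not certified for {ctrl.model}"]
    if (ctrl.driver, ctrl.firmware) not in certified:
        return [f"Firmware {ctrl.firmware} is not certified with driver {ctrl.driver}"]
    return []


host_ctrl = InstalledController("ExampleRAID-9000", "nhpsa 2.0.14", "4.68")
print(check_controller(host_ctrl, HYPOTHETICAL_HCL))
# prints ['Firmware 4.68 is not certified with driver nhpsa 2.0.14']
```

The checks are ordered from coarse to fine (controller, then driver, then driver and firmware pairing), which mirrors the order of the checks listed in the Skyline Health view.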
4-13 vSAN Hardware Compatibility List Database
The vSAN Hardware Compatibility List Database (HCL DB) is used for the compatibility checks.

If vCenter Server has Internet connectivity, the HCL DB is downloaded automatically at regular intervals. If automatic download is not possible, you can update the HCL DB manually using an offline JSON file.

To view details, select vSAN Cluster and select Monitor > vSAN > Skyline Health > Hardware
compatibility > vSAN HCL DB up to date.

[Figure: vSAN HCL DB up-to-date check, HCL DB info tab with UPDATE FROM FILE and GET LATEST VERSION ONLINE buttons; Current time: 09/04/2020, 6:59:39 PM; Local HCL DB copy last updated: 09/04/2020, 6:06:00 AM]

For information about updating the vSAN HCL database manually, see VMware knowledge base article 2145116 at https://kb.vmware.com/s/article/2145116.
4-14 Manually Updating Drivers and Firmware
You can download drivers and firmware from a vendor's website or from the VMware
Compatibility Guide website.

To install the downloaded drivers manually, copy the VMware installation bundle (VIB) or offline bundle to a datastore or file system accessible to your host and use esxcli commands to install the drivers.

[root@tese-01:~] esxcli software vib install -d /tmp/ -ESX-7.0.1-nhpsa-2.0.14-offline_bundle-5036227.zip
Installation Result
Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
Reboot Required: true
VIBs Installed: Microsemi_bootbank_nhpsa_2.0.14-1OEM.701.0.0.4598673
VIBs Removed: Microsemi_bootbank_nhpsa_2.0.10-1OEM.701.0.0.4240417
VIBs Skipped:
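When the same bundle is installed on many hosts by script, it helps to check the Reboot Required field of each installation result before you continue. The following is a minimal Python sketch. It assumes the esxcli output has already been captured as text in the Key: value layout shown in the transcript. It does not run esxcli itself, and the function names are illustrative.

```python
def parse_install_result(output: str) -> dict:
    """Parse 'Key: value' lines from an esxcli software vib install result."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon, such as 'Installation Result', are skipped
            fields[key.strip()] = value.strip()
    return fields


def needs_reboot(output: str) -> bool:
    """True if the result reports 'Reboot Required: true'."""
    return parse_install_result(output).get("Reboot Required", "").lower() == "true"


sample = """Installation Result
   Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
   Reboot Required: true
   VIBs Installed: Microsemi_bootbank_nhpsa_2.0.14-1OEM.701.0.0.4598673
   VIBs Removed: Microsemi_bootbank_nhpsa_2.0.10-1OEM.701.0.0.4240417
   VIBs Skipped:
"""
print(needs_reboot(sample))  # prints True
```

Splitting on the first colon only keeps the Message text intact even when it contains punctuation.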

In a larger infrastructure, manually installing drivers on individual hosts becomes complex to manage, so use vSphere Lifecycle Manager instead.

For firmware updates, follow the vendor recommendations.
4-15 Automating Drivers and Firmware Installation
vSphere Lifecycle Manager centralizes automated patch and version management by supporting the following activities:

• Upgrading and patching ESXi hosts

• Installing and updating third-party software on ESXi hosts

• Standardizing ESXi images across hosts in a cluster

• Installing and updating ESXi drivers and firmware

• Managing VMware Tools and VM hardware upgrades

4-16 About vSphere Lifecycle Manager
vSphere Lifecycle Manager is a unified software and firmware management utility that uses the desired-state model for all life cycle operations:

• Monitors compliance (drift)

• Remediates back to the desired state

vSphere Lifecycle Manager has a modular framework to support vendor firmware plug-ins.

[Figure: vSphere Lifecycle Manager desired-state model. A desired image composed of the base image (vmware.com), vendor add-on, firmware and drivers add-on, and vendor plug-ins (for example, HP and DELL), covering ESXi, drivers, BIOS, I/O controllers, storage devices, NICs, and BMC, is applied across the cluster, and drift is remediated.]
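The desired-state model can be pictured as a comparison between a declared image and what each host actually runs. The following Python sketch is a conceptual illustration only, not the vSphere Lifecycle Manager API. It uses hypothetical component names and versions and treats any missing, extra, or different component as drift.

```python
def detect_drift(desired: dict, installed: dict) -> dict:
    """Return {component: (desired_version, installed_version)} for every mismatch.

    None on either side means the component is absent on that side.
    """
    drift = {}
    for name in sorted(desired.keys() | installed.keys()):
        want, have = desired.get(name), installed.get(name)
        if want != have:
            drift[name] = (want, have)
    return drift


def cluster_compliance(desired: dict, hosts: dict) -> dict:
    """Map each host name to 'Compliant' or 'Non-compliant'."""
    return {
        host: "Compliant" if not detect_drift(desired, installed) else "Non-compliant"
        for host, installed in hosts.items()
    }


# Hypothetical desired image and host inventories
desired_image = {"esxi-base": "7.0 U1", "nhpsa": "2.0.14"}
hosts = {
    "esxi-01": {"esxi-base": "7.0 U1", "nhpsa": "2.0.14"},
    "esxi-02": {"esxi-base": "7.0 U1", "nhpsa": "2.0.10"},
}
print(cluster_compliance(desired_image, hosts))
# prints {'esxi-01': 'Compliant', 'esxi-02': 'Non-compliant'}
print(detect_drift(desired_image, hosts["esxi-02"]))
# prints {'nhpsa': ('2.0.14', '2.0.10')}
```

In this model, remediation means making each non-compliant host's installed map equal to the desired image again.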
4-17 vSphere Lifecycle Manager Desired Image Feature
The vSphere Lifecycle Manager Desired Image feature merges hypervisor and host life cycle management activities.

An image is created locally from desired-state criteria composed of the hypervisor base image and vendor drivers and firmware.

The Hardware Support Manager (HSM) vendor plug-in enables connectivity for the vendor
catalog and host management.

[Figure: Desired image (per cluster) built from user input: ESXi version, vendor add-ons, and components from vmware.com releases and the vendor catalog, plus a firmware and drivers add-on from the vendor's firmware and driver repositories using host credentials; drift detection runs on the selected servers.]

To start using vSphere Lifecycle Manager Desired Image, the cluster must meet the following requirements:

• All ESXi hosts must run vSphere 7.0 or later.

• All ESXi hosts must have a stateful installation.

• All ESXi hosts in the cluster must be from the same vendor.

If a host runs a vSphere version earlier than 7.0, you must first use an upgrade baseline to upgrade the host. You can then start using images.
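The three requirements above can be expressed as a pre-check. The following is a minimal Python sketch using a hypothetical Host record. In practice, vSphere Lifecycle Manager performs its own checks, and the field names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical inventory record for one ESXi host."""
    name: str
    version: tuple  # (major, minor), for example (7, 0)
    stateful: bool
    vendor: str


def image_blockers(hosts: list) -> list:
    """List reasons why the cluster cannot yet be managed with a single image."""
    problems = []
    for h in hosts:
        if h.version < (7, 0):
            problems.append(f"{h.name}: upgrade to 7.0 or later with an upgrade baseline first")
        if not h.stateful:
            problems.append(f"{h.name}: installation is not stateful")
    vendors = sorted({h.vendor for h in hosts})
    if len(vendors) > 1:
        problems.append(f"hosts are from multiple vendors: {', '.join(vendors)}")
    return problems


cluster = [
    Host("esxi-01", (7, 0), True, "VendorA"),
    Host("esxi-02", (6, 7), True, "VendorA"),
]
print(image_blockers(cluster))
# prints ['esxi-02: upgrade to 7.0 or later with an upgrade baseline first']
```

Versions are compared as (major, minor) tuples so that, for example, (6, 7) sorts below (7, 0).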

4-18 Elements of vSphere Lifecycle Manager Desired Image
vSphere Lifecycle Manager Desired Image defines the exact software stack to run on all ESXi hosts in a cluster. It includes the following elements:

• ESXi hypervisor base image containing software fixes and enhancements

• Components (a logical grouping of VIBs)

• Vendor add-ons (sets of OEM-bundled components)

• Firmware and driver add-ons

To maintain consistency, you apply a single ESXi image to all hosts in a cluster.

4-19 Configuring vSphere Lifecycle Manager Desired Image
To configure an image:

1. Select the base ESXi version.

2. Select a vendor add-on.

3. Select a firmware and driver add-on.

4. Include additional components.

[Figure: vSAN-Cluster > Updates > Hosts > Image, Convert to an Image, Step 1: Define Image: ESXi Version 7.0 GA - 15843807 (released 03/16/2020); Vendor Addon (optional); Firmware and Drivers Addon (optional); Components: No additional components; SAVE and VALIDATE buttons]
4-20 Setting Up vSphere Lifecycle Manager Desired Image for New Clusters
When creating a cluster, you can also create a corresponding image for the cluster:

1. Create a cluster.

2. Select the Manage all hosts in the cluster with a single image check box.

3. Define the ESXi version for your cluster image.

4. (Optional) Select a vendor add-on for the hosts.

[Figure: New Cluster dialog in SA-Datacenter with vSphere DRS, vSphere HA, and vSAN options and Manage all hosts in the cluster with a single image selected; Image setup: ESXi Version 7.0b - 16324942, Vendor Addon: None (optional). The cluster image can be further customized later.]
4-21 Remediating Clusters
When you remediate a cluster, vSphere Lifecycle Manager applies the following elements to the
hosts:

• ESXi base image

• Vendor add-ons

• Firmware and driver add-ons

• User-specified components

Remediation makes the selected hosts compliant with the desired image.

You can remediate the entire cluster or precheck hosts without updating them.

4-22 Review of Learner Objectives
• Explain the importance of hardware compatibility

• Describe the application of host hardware settings for optimum performance

• Discuss the importance of driver and firmware version compatibility

• Describe the use of vSphere Lifecycle Manager to automate driver and firmware
installations

4-23 Lesson 2: Deploying a vSAN Cluster

4-24 Learner Objectives


• Deploy and configure a vSAN cluster using Cluster Quickstart

• Manually configure a vSAN cluster using the vSphere Client

• Explain how to use VMware Skyline Health

• Describe vSAN cluster backup methodology

4-25 vSAN Cluster Configuration Types
Because vSAN is a cluster-based solution, creating a cluster is the first logical step in the
deployment of the solution.

vSAN clusters can be configured in the following ways:

• Single-site vSAN cluster

• vSAN stretched cluster

• Two-node vSAN cluster

Single-site vSAN clusters are configured on one site to run production workloads. All ESXi hosts
run on that single site.

vSAN stretched clusters span three sites: two data sites and a witness site. You typically deploy vSAN stretched clusters in environments where avoiding disasters and downtime is a key requirement.

A two-node vSAN cluster is a specific configuration for environments where a minimal configuration is a key requirement. These environments typically run a small number of workloads that require high availability.
4-26 Configuring a vSAN Cluster
You add hosts to the newly created cluster and configure vSAN.

You have two ways to configure a vSAN cluster:

• Cluster Quickstart

• Manual configuration

4-27 About Cluster Quickstart
Cluster Quickstart groups common tasks and consolidates the workflow. You can configure a
new vSAN cluster that uses recommended default settings for functions such as networking,
storage, and services.

4-28 Comparing Cluster Quickstart and Manual Configuration

Cluster Quickstart:

• You can use Cluster Quickstart only if hosts have ESXi 6.0 Update 2 or later.

• ESXi hosts should have similar configurations.

• Cluster Quickstart helps configure a vSAN cluster according to recommendations.

• Cluster Quickstart is available only through the HTML5-based vSphere Client.

Manual configuration:

• A cluster can always be configured manually, regardless of the ESXi version and hardware configuration.

• This method offers more flexibility when configuring a new or existing cluster.

• This method provides detailed control over every aspect of cluster configuration.

• This method is available through any version of the vSphere Client.
4-29 Creating vSAN Clusters
To create a vSAN cluster, you must first create the vSphere cluster:

• Right-click a data center and select New Cluster.

• Enter a name for the cluster in the Name text box.

• Select whether to enable DRS, vSphere HA, and vSAN.

[Figure: New Cluster dialog in SA-Datacenter: Name SA-vSAN-01, with vSphere DRS, vSphere HA, and vSAN options. These services have default settings that can be changed later in the Cluster Quickstart workflow.]
4-30 Adding Hosts Using Cluster Quickstart (1)
To start Cluster Quickstart, click the existing cluster and select Configure > Configuration >
Quickstart.

[Figure: SA-vSAN-01 > Configure > Configuration > Quickstart, showing the Cluster quickstart workflow: 1. Cluster basics (selected services: vSphere DRS, vSphere HA, vSAN), 2. Add hosts (add new and existing hosts to your cluster), 3. Configure cluster (configure network settings for vMotion and vSAN traffic, review and customize cluster services, and set up a vSAN datastore), with a SKIP QUICKSTART option]
4-31 Adding Hosts Using Cluster Quickstart (2)

You can include hosts in the Add hosts wizard:

1. On the Add hosts page, enter information for new hosts or click Existing hosts to select
from a list of hosts in the inventory.

2. On the Host summary page, verify the host settings.

3. On the Ready to complete page, click Finish.

[Figure: Add hosts wizard with six new hosts, 10.198.26.7 to 10.198.26.12, each entered with user root and a password]

The selected hosts are placed in maintenance mode and added to the cluster. When you
complete the Cluster Quickstart configuration, the hosts exit maintenance mode.

If vCenter Server runs on a host in the cluster, you do not need to place that host in maintenance mode when you add it to the cluster using the Cluster Quickstart workflow. The host that contains the vCenter Server VM must be running ESXi 6.5 or later. The same host can also run Platform Services Controller. All other VMs on the host must be powered off.
4-32 Verifying vSAN Health Checks
After the hosts are added to the cluster, the vSAN health checks verify that the hosts have the necessary drivers and firmware.

[Figure: Add hosts step with 6 hosts not yet configured and passed health checks: Advanced vSAN configuration in sync; Time is synchronized across hosts and VC; All required hosts are in maintenance mode; Host physical memory compliance check; Software version compatibility; vSAN HCL DB up-to-date; vSAN HCL DB Auto Update]
4-33 vSAN Cluster Configuration (1)
To configure the vSAN cluster:

1. Configure the networking settings, including vSphere distributed switches, port groups, and
physical adapters.

2. Set up VMkernel ports for vSphere vMotion and vSAN traffic.

3. Configure DRS, vSphere HA, vSAN, and Enhanced vMotion Compatibility.

4. Claim disks on each host for the cache and capacity tier.

5. (Optional) Create fault domains for hosts that can fail together.

6. On the Ready to complete page, verify the cluster settings and click Finish.

[Figure: Configure cluster wizard pages: Distributed switches, Storage traffic, Advanced options, Claim disks, and Review]

On the Configure distributed switches page, enter networking settings, including distributed switches, port groups, and physical adapters. Network I/O Control is automatically enabled on all switches that the wizard creates. If you use an existing (brownfield) vSphere distributed switch, make sure to upgrade it.

In the port groups section, select a distributed switch to use for VMware vSphere vMotion and a
distributed switch to use for the vSAN network.

In the physical adapters section, select a distributed switch for each physical network adapter.
You must assign each distributed switch to at least one physical adapter. This mapping of
physical network interface cards (NICs) to the distributed switches is applied to all hosts in the
cluster.

On the vMotion and vSAN traffic page, you are strongly encouraged to provide dedicated VLANs and broadcast domains for these traffic classes, for added security and isolation.
4-34 vSAN Cluster Configuration (2)
The Cluster Quickstart setup is the ideal time to configure the required vSAN options.

[Figure: Configure cluster > Advanced options page with vSphere HA and vSphere DRS sections and vSAN Options: Deployment type Single site cluster, Data-At-Rest encryption, Data-In-Transit encryption, Space efficiency None, Fault domains, and Large scale cluster support; followed by Host Options and Enhanced vMotion Compatibility sections]

Configuring the required vSAN options, such as encryption, compression, and fault domains, during the Cluster Quickstart setup eliminates the need to enable them later. It also reduces the risk of moving data or disrupting data availability in the cluster.
4-35 Scaling vSAN Clusters Using Cluster Quickstart
You can also use Cluster Quickstart to scale the vSAN cluster by adding more hosts to an existing cluster. All existing cluster configurations are automatically applied to the new hosts.

[Figure: Cluster quickstart workflow for an existing cluster with one host not yet configured, listing hardware compatibility and configuration health checks and the ADD, RE-VALIDATE, and CONFIGURE buttons]
4-36 Skipping the Cluster Quickstart Workflow
Advanced users can skip the Cluster Quickstart workflow and configure the vSAN cluster
manually. However, you cannot later return to the workflow.

[Figure: Confirm skip quickstart workflow dialog: This tab contains a simplified configuration workflow. If you dismiss the Quickstart workflow, you cannot restore it and must continue configuration manually. Hosts added to this cluster in the future must also be configured manually.]
4-37 Manually Configuring a vSAN Cluster
vSAN can be manually configured on a new or existing vSphere cluster. All hosts must have a
VMkernel network adapter for vSAN traffic configured, and vSphere HA must be temporarily
disabled.

The Configure vSAN wizard takes you through the vSAN cluster configuration, vSAN services, disk claiming, and fault domain setup.

[Figure: My-vSAN-Cluster > Configure > vSAN > Services with vSAN turned OFF, and the Configure vSAN dialog: Space efficiency None, Data-At-Rest encryption, Rekey interval 1 day, Large scale cluster support]
4-38 Manually Creating a vSAN Disk Group
Disks are assigned to disk groups for either cache or capacity purposes. Each drive can be used
in only one vSAN disk group.

[Figure: Claim Unused Disks for vSAN dialog, grouped by disk model and size: VMware Virtual disk devices on the hosts are set to Do not claim, Cache tier, or Capacity tier]

When you create a disk group, consider the ratio of flash cache to consumed capacity. The ratio depends on the requirements and workload of the cluster. For a hybrid cluster, consider provisioning flash cache equal to at least 10% of the consumed capacity.

In a hybrid disk group configuration, the cache device is used by vSAN as both a read cache
(70%) and a write buffer (30%). In an all-flash disk group configuration, 100% of the cache
device is dedicated as a write buffer.
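The sizing guidance above reduces to simple arithmetic. The following is a minimal Python sketch, assuming the 10% hybrid guideline and the 70/30 and 100% cache splits stated above. The function names are illustrative, and actual sizing depends on the workload.

```python
def hybrid_min_cache_gb(consumed_capacity_gb: float, ratio: float = 0.10) -> float:
    """Minimum flash cache for a hybrid disk group: at least 10% of consumed capacity."""
    return round(consumed_capacity_gb * ratio, 2)


def cache_split_gb(cache_device_gb: float, all_flash: bool) -> tuple:
    """Return (read_cache_gb, write_buffer_gb) for one cache device."""
    if all_flash:
        return (0.0, float(cache_device_gb))  # all-flash: 100% write buffer
    # hybrid: 70% read cache, 30% write buffer
    return (round(cache_device_gb * 0.70, 2), round(cache_device_gb * 0.30, 2))


cache = hybrid_min_cache_gb(2000)  # 2,000 GB consumed capacity
print(cache, cache_split_gb(cache, all_flash=False))  # prints 200.0 (140.0, 60.0)
print(cache_split_gb(400, all_flash=True))            # prints (0.0, 400.0)
```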

4-39 vSAN Fault Domains
vSAN fault domains can spread component redundancy across servers in separate computing racks. This protects the environment from a rack-level failure, such as loss of power or network connectivity.

vSAN requires a minimum of three fault domains. At least one additional fault domain is recommended to ease data resynchronization after unplanned downtime or during planned downtime, such as host maintenance or upgrades.

Each fault domain consists of one or more hosts.

If fault domains are enabled, vSAN applies the active VM storage policy to the fault domains instead of to the individual hosts.

4-40 Implicit Fault Domains
Each host in a vSAN cluster is an implicit fault domain by default.

vSAN distributes data across fault domains (hosts) to provide resilience against drive and host failure. In most environments, this approach provides sufficient resilience and flexibility for data placement in a cluster.

[Figure: An object with a RAID-1, FTT=1 policy in a vSAN cluster; components C1 and C2 and the witness are placed on different hosts]
4-41 Explicit Fault Domains
vSAN can also be configured with explicit fault domains that include multiple hosts.

vSAN distributes data across these fault domains to provide resilience against the failure of an entire server rack, for example, because of a rack power supply failure or a top-of-rack networking switch failure.

Explicit fault domains increase availability and ensure that redundant components of the same object do not reside in the same server rack.

[Figure: An object with a RAID-1, FTT=1 policy in a cluster of four racks (Rack 1 to Rack 4), each configured as a fault domain; components C1 and C2 are placed in different racks]

In this example, configure four fault domains, one for each rack, to help maintain access to data if an entire server rack fails.

Standard vSAN clusters that use explicit fault domains offer a high degree of flexibility to meet the levels of resilience required by an organization.
4-42 vSAN Fault Domains: Best Practices
For a balanced storage load and fault tolerance when using fault domains, consider the following guidelines:

• Provide sufficient fault domains to satisfy the failures to tolerate value.

• Configure a minimum of three fault domains in the vSAN cluster. For best results, configure four or more fault domains.

• Assign the same number of hosts to each fault domain.

• Use hosts with uniform configurations.

• Dedicate one additional fault domain with available capacity for rebuilding data after a failure.

• A host not included in an explicit fault domain is considered its own fault domain.

• You do not need to assign every vSAN host to a fault domain. If you decide to use fault domains to protect the vSAN environment, consider creating equal-sized fault domains.
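The guidelines above can be turned into a quick check. This Python sketch assumes RAID-1 (mirroring), where tolerating n failures requires 2n+1 fault domains so that the object keeps a quorum. With FTT=1, that gives the minimum of three fault domains stated above, plus one recommended spare. The domain and host names are hypothetical, and other RAID levels have different requirements that this sketch does not cover.

```python
def min_fault_domains_raid1(ftt: int) -> int:
    """Fault domains needed for RAID-1 with failures to tolerate (FTT) = ftt."""
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return 2 * ftt + 1


def fault_domain_warnings(domains: dict, ftt: int) -> list:
    """Check a {domain: [hosts]} layout against the guidelines above."""
    warnings = []
    needed = min_fault_domains_raid1(ftt)
    if len(domains) < needed:
        warnings.append(f"FTT={ftt} with RAID-1 needs at least {needed} fault domains")
    elif len(domains) < needed + 1:
        warnings.append("no spare fault domain for rebuilding data after a failure")
    if len({len(hosts) for hosts in domains.values()}) > 1:
        warnings.append("fault domains have different numbers of hosts")
    return warnings


racks = {
    "Rack1": ["esxi-01", "esxi-02"],
    "Rack2": ["esxi-03", "esxi-04"],
    "Rack3": ["esxi-05", "esxi-06"],
    "Rack4": ["esxi-07", "esxi-08"],
}
print(fault_domain_warnings(racks, ftt=1))  # prints []
```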

4-43 vSphere HA on vSAN Clusters
vSAN, in conjunction with vSphere HA, provides a high availability solution for VM workloads. If a host that is running VMs fails, the VMs are restarted on other available hosts in the cluster.

Requirements for vSAN to operate with vSphere HA:

• vSphere HA uses the vSAN network for communication.

• vSphere HA does not use the vSAN datastore as a datastore heartbeating location. External datastores can still be used for heartbeating if they exist.

• vSphere HA must be disabled before configuring vSAN on a cluster. vSphere HA can be enabled only after the vSAN cluster is configured.
4-44 Enabling vSphere HA on a vSAN Cluster
To use vSphere HA to provide high availability to the VMs that run on the vSAN cluster, the
following requirements must be met:

• ESXi hosts in the cluster must be version 5.5 U1 or later.

• vSAN and vSphere HA must be configured in a specific order.

To enable vSAN: disable vSphere HA, enable vSAN, reconfigure vSphere HA, and reenable vSphere HA.

To disable vSAN: disable vSphere HA, disable vSAN, reconfigure vSphere HA, and reenable vSphere HA.
4-45 vSphere HA Networking Differences with vSAN
When enabling vSphere HA and vSAN on the same cluster, consider the following points to ensure that vSphere HA sees the same network topology as vSAN:

• vSphere HA interagent traffic traverses the vSAN network rather than the management
network.

• vSphere HA traffic migrates back to the management network if vSAN is disabled.

[Figure: Management, vSphere vMotion, and vSAN networks on a virtual distributed switch; vSphere HA traffic uses the vSAN network]
4-46 Recommended vSphere HA Settings for vSAN Clusters
When configuring vSphere HA on a vSAN cluster, use the following recommended values.

vSphere HA setting: recommended value

• Host Monitoring: Enabled

• Host Hardware Monitoring - VM Component Protection: Disabled

• Virtual Machine Monitoring: Custom preference

• Host Isolation Response: Power off and restart VMs

• Datastore Heartbeats: Disable datastore heartbeats

• Host Isolation Addresses: Two isolation addresses
4-47 Enabling vSAN Reserved Capacity
vSAN 7 U1 includes a reserved capacity workflow to simplify storage capacity management for vSAN back-end operations and maintenance.

Operations reserve capacity is used for internal vSAN operations, such as object rebuild or repair.

Host rebuild reserve capacity is used to ensure that all objects can be rebuilt if a host fails.

[Figure: Enable Capacity Reserve dialog for SA-vSAN-01 with Operations reserve and Host rebuild reserve settings. Operations reserve helps ensure enough space for internal operations to complete successfully; host rebuild reserve allows vSAN to tolerate one host failure. Actually written: 18.56 GB (9.28%).]

To enable the host rebuild reserve, you must have a minimum of four hosts in the vSAN cluster.

To enable capacity reserve, select the vSAN cluster and select Configure > vSAN > Services. When capacity reserve is enabled, the operations reserve and host rebuild reserve options are available.

When vSAN reserved capacity is enabled and the cluster storage capacity usage reaches the limit, new workloads fail to deploy.
4-48 Reserving vSAN Storage Capacity for Maintenance Activities
In earlier versions, 30% of the total capacity was used as slack space. In vSAN 7.0 U1, reserved capacity is used instead of slack space.

You can reserve vSAN storage capacity for the following maintenance activities:

• Operations reserve

• Host rebuild reserve

Reserved Capacity = Operations Reserve + Host Rebuild Reserve

[Figure: Reserve capacity and usable capacity in vSAN 7 compared with vSAN 7 U1]
4-49 Planning for Capacity Reserve
The operations reserve reserves capacity for internal vSAN operations, such as object rebuild or repair.

The host rebuild reserve reserves capacity to ensure that all objects can be rebuilt if a host failure occurs.

The host rebuild reserve is based on N+1. To calculate the host rebuild reserve, divide 100 by the number of hosts in the cluster. The result is the percentage reserved on each host:

• For example, in a 20-host cluster, the host rebuild reserve reserves 5% of the capacity on each host to ensure sufficient rebuild capacity.

One caveat applies to clusters whose hosts contribute different capacities. In these clusters, vSAN calculates the amount of capacity to reserve using the host with the highest capacity:

• For example, in a 20-host cluster with 9 hosts contributing 75 GB, 10 hosts contributing 100 GB, and 1 host contributing 200 GB, vSAN reserves 10 GB (5% of 200 GB) on all hosts, no matter how much capacity they contribute.
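The calculation above can be written out directly. The following is a minimal Python sketch of the stated rule. The reserved percentage is 100 divided by the number of hosts, applied to the largest host's contribution, and the host rebuild reserve requires the four-host minimum described earlier. It models only the host rebuild reserve, not the operations reserve, and the function name is illustrative.

```python
def host_rebuild_reserve_gb(host_capacities_gb: list) -> float:
    """Per-host host rebuild reserve (GB) under the N+1 rule described above."""
    n = len(host_capacities_gb)
    if n < 4:
        raise ValueError("host rebuild reserve requires at least four hosts")
    percent = 100 / n                      # for example, 20 hosts -> 5%
    largest = max(host_capacities_gb)      # sized by the highest-capacity host
    return largest * percent / 100


# The example above: 9 x 75 GB, 10 x 100 GB, and 1 x 200 GB
capacities = [75] * 9 + [100] * 10 + [200]
per_host = host_rebuild_reserve_gb(capacities)
print(per_host)                    # prints 10.0 (GB reserved on every host)
print(per_host * len(capacities))  # prints 200.0 (GB reserved across the cluster)
```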

4-50 VMware Skyline Health
VMware Skyline Health is the primary and most convenient way to monitor vSAN health.

VMware Skyline Health provides an end-to-end approach to monitoring and managing the environment. It also helps ensure optimal configuration and operation of your vSAN environment to provide the highest levels of availability and performance.

VMware Skyline Health alerts typically stem from:

• Configuration inconsistency

• Exceeding software or hardware limits

• Hardware incompatibility

• Failure conditions

[Figure: vSAN-Cluster > Monitor > vSAN > Skyline Health (last checked 09/05/2020, 6:50:48 AM) with health check categories: Online health, Network, Physical disk, Data, Cluster, Capacity utilization, Hardware compatibility, Performance service, vSAN Build Recommendation, and Hyperconverged cluster configuration compliance]

The best way to resolve a health check alert is to correct the underlying condition. For all transient conditions, you must determine the root cause and fix the issue.

Health check alerts that flag anomalies for intended conditions can be suppressed.
4-51 vSAN Logs and Traces
vSAN support logs are contained in the ESXi host support bundle in the form of vSAN traces.
vSAN support logs are collected automatically by gathering the ESXi support bundle of all hosts.

Because vSAN is distributed across multiple ESXi hosts, gather the ESXi support logs from all the hosts configured for vSAN in a cluster.

By default, vSAN traces are saved to the /var/log/vsantraces path on the ESXi host system partition. The traces can also be accessed through the /vsantraces symbolic link.

VMware does not support storing logs and traces on the vSAN datastore.

When USB and SD card devices are used as boot devices, the logs and traces reside in RAM disks, which do not persist across reboots.

Consider redirecting logs and traces to other, persistent storage when these devices are used as boot devices.

For more information about redirecting vSAN logs and traces, see VMware knowledge base
article 1033696 at https://kb.vmware.com/s/article/1033696.

4-52 Backup Methodology
Regardless of the storage system, backup and restore operations are fundamental to achieving
your organization's recovery point objectives:

• Frequently back up VMs deployed on vSAN datastores.

• Periodically test backups.

• Consider the retention span for backups.

• Store backups in a secure place.

A range of backup products is compatible with vSAN.

For more information about backup solutions supported by VMware vSAN, see VMware
knowledge base article 56975 at https://kb.vmware.com/s/article/56975.

4-53 Lab 2: Configuring a Second vSAN Cluster
Manually configure the vSAN cluster and verify cluster information using the command line:

1. Create a Cluster

2. Configure the vSAN Cluster Using Quickstart

3. Verify vSAN Cluster Details from the Command Line

4-54 Lab 3: Working with vSAN Fault Domains


Create fault domains and examine their effect on VMs:

1. Configure vSAN Fault Domains

2. Verify VM Compliance

3. Prepare for the Next Lab

4-55 Review of Learner Objectives
• Deploy and configure a vSAN cluster using Cluster Quickstart

• Manually configure a vSAN cluster using the vSphere Client

• Explain how to use VMware Skyline Health

• Describe vSAN cluster backup methodology

4-56 Key Points


• Ensuring hardware compatibility is vital for vSAN performance.

• vSphere Lifecycle Manager can automate driver and firmware updates for supported
controllers.

• vSAN is a cluster-based solution. Creating clusters is the first logical step in the deployment
of this solution.

• The new, streamlined method to configure vSAN clusters is to use Cluster Quickstart.

• The Cluster Quickstart setup is the ideal time to configure required vSAN services.

• Advanced users can skip the Cluster Quickstart workflow and configure vSAN clusters
manually.

Questions?

Module 5
vSAN Storage Policies

5-2 Importance
Storage policies are the logical rule sets that define how vSAN distributes objects and
components across a datastore. These rules are how vSAN meets specific business needs for
redundancy, performance, high availability, and other requirements.

5-3 Module Lessons


1. vSAN Storage Policies

2. Analyzing vSAN Objects and Components Placement

5-4 Lesson 1: vSAN Storage Policies

5-5 Learner Objectives


• Explain how storage policies work with vSAN

• Detail how to define and create a VM storage policy

• Describe how to apply and modify VM storage policies

• Explain how to change VM storage policies dynamically

• Identify the VM storage policy compliance status

5-6 Storage Policy-Based Management
Storage Policy-Based Management (SPBM) helps you ensure that VMs use storage that
guarantees a specified level of capacity, performance, availability, redundancy, and so on.

Storage policies help you meet the following goals:

• Categorize storage based on certain levels of service

• Provision VM disks for optimal configuration

• Protect data through object-based fault tolerance

These storage characteristics are defined as sets of rules.

Knowing approximately how many objects and components are needed, and how much capacity they
consume, guides the planning decisions for the datastore. Storage policies define these
numbers: the storage requirements that a policy applies to an object determine how many
components that object has. It is critical to plan both your most commonly applied policy and
any edge-case policies that you might need, and to know which systems those policies are
applied to.

When storage is provisioned, vSphere uses the SPBM module to match the policy requirements
against storage capabilities and determine where the object and its components are placed.

VM storage policies are used during the provisioning of a VM to ensure that VM objects and
components are placed on the datastore that best meets their requirements. Ideally, you want to
create the best match of predefined VM storage requirements with available physical storage
properties.

5-7 Defining Storage Policies: vSAN Rule Sets
vSAN rule sets:

• Are specific to vSAN clusters

• Include placement rules that describe VM storage requirements

• Include advanced policy rules that allow for additional storage requirements

[Figure: vSAN storage policy rules at their default values. Availability: Site disaster tolerance = None - standard cluster; Failures to tolerate = 1 failure - RAID-1 (Mirroring). Advanced Policy Rules: Number of disk stripes per object = 1; IOPS limit for object = 0; Object space reservation = Thin provisioning; Flash read cache reservation (%) = 0; Disable object checksum and Force provisioning toggles. For a 100 GB VM disk, consumed storage space would be 200 GB, initially reserved storage space would be 0 B, and reserved cache space would be 0 B.]

When you configure a VM storage policy, the vSphere Client and vSphere Web Client display
the datastores that are compatible with the capabilities of the policy.

When a VM storage policy is assigned to a VM, datastores are grouped into compatible and
incompatible categories. Assigning the VM to a datastore incompatible with the storage policy
puts the VM in a noncompliant state.
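The compatible/incompatible grouping can be sketched in Python. This is a minimal illustration, not SPBM's actual matching logic. The `Datastore` class, the datastore names, and the two capabilities it checks (host count and all-flash) are hypothetical simplifications. The example requirement, a RAID 5 policy that needs at least 4 hosts on an all-flash cluster, comes from the failures-to-tolerate discussion later in this module.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    # Hypothetical, simplified view of a datastore's capabilities.
    name: str
    hosts: int        # hosts contributing storage
    all_flash: bool


def is_compatible(ds: Datastore, min_hosts: int, needs_all_flash: bool) -> bool:
    """Return True if the datastore meets every requirement of the policy."""
    if ds.hosts < min_hosts:
        return False
    if needs_all_flash and not ds.all_flash:
        return False
    return True


def group_datastores(datastores, min_hosts, needs_all_flash):
    """Split datastores into compatible and incompatible name lists."""
    compatible, incompatible = [], []
    for ds in datastores:
        target = compatible if is_compatible(ds, min_hosts, needs_all_flash) else incompatible
        target.append(ds.name)
    return compatible, incompatible


# Example policy: 1 failure with RAID 5 (erasure coding) -> at least 4 all-flash hosts.
datastores = [
    Datastore("vsanDatastore-A", hosts=4, all_flash=True),
    Datastore("vsanDatastore-B", hosts=3, all_flash=True),
    Datastore("vsanDatastore-C", hosts=6, all_flash=False),
]
compatible, incompatible = group_datastores(datastores, min_hosts=4, needs_all_flash=True)
print("Compatible:", compatible)
print("Incompatible:", incompatible)
```

Only vsanDatastore-A meets both requirements. Assigning the VM to a datastore in the incompatible list leaves the VM in a noncompliant state.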

By using VM storage policies, you can easily see which storage is compatible or incompatible.
You can eliminate the need to ask the SAN administrator or refer to a spreadsheet of NAA IDs
each time you deploy a VM.

5-8 Storage Policy Naming Considerations
You use a standardized storage policy naming structure to ensure that policies are applied
appropriately.

You can use the [ClusterName]-[Workloads]-[IntentionOfPolicy]-[OptionalSettingIndicators]
naming structure for workload VMs:
• Cluster01-ManagementVMs-BasicProtectionEnhancedPerf

• MultiCluster-ProductionVMs-BasicProtectionEnhancedPerf

• MultiCluster-ProductionVMs

You can apply the [ApplicationName]-[IntentionOfPolicy]-[OptionalSettingIndicators]
naming structure for applications:
• App-SharePointWebFarm-BasicProtectionSpaceEfficient

• App-SharePointSQLBackEnd-EnhancedProtectionEnhancedPerf
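Policy names are easier to keep consistent when they are generated rather than typed by hand. The following minimal Python sketch reproduces the example names. The function names are hypothetical. Splitting `BasicProtectionEnhancedPerf` into an intention (`BasicProtection`) and a setting indicator (`EnhancedPerf`), and treating `App` as a fixed prefix, are assumptions inferred from the examples.

```python
def join_name(*parts: str) -> str:
    """Join the non-empty parts of a policy name with hyphens."""
    return "-".join(p for p in parts if p)


def workload_policy_name(cluster: str, workloads: str,
                         intention: str = "", indicators: str = "") -> str:
    # As in the examples, the intention and the optional setting
    # indicators are written together as a single segment.
    return join_name(cluster, workloads, intention + indicators)


def app_policy_name(application: str, intention: str = "", indicators: str = "") -> str:
    # Assumption: application policies carry an "App" prefix, as in the examples.
    return join_name("App", application, intention + indicators)


print(workload_policy_name("Cluster01", "ManagementVMs", "BasicProtection", "EnhancedPerf"))
print(workload_policy_name("MultiCluster", "ProductionVMs"))
print(app_policy_name("SharePointWebFarm", "BasicProtection", "SpaceEfficient"))
```

Empty segments are skipped, so the same helper also produces short names such as MultiCluster-ProductionVMs.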

5-9 Monitoring Storage Policy-Based Management
A storage policy defines a set of capability requirements for VMs:

• Storage policies are based on vSAN capabilities.

• Storage policies cannot be deleted when in use.

• Storage policies are monitored for compliance by vSAN.

[Figure: Policies and Profiles > VM Storage Policies in the vSphere Client, with the CREATE, EDIT, CLONE, CHECK, REAPPLY, and DELETE actions and the storage policies defined on sa-vcsa-01.vclass.local, such as the vSAN Default Storage Policy, VM Encryption Policy, and No Redundancy. The VM Compliance tab shows SA-Payload-01 as Compliant, last checked Mar 1, 2021 9:59 PM.]

vSAN monitors and reports policy compliance during the VM life cycle. If an object becomes
noncompliant with its policy, vSAN takes remedial action: it reconfigures the data of the affected
VMs and optimizes the use of resources across the cluster.

Planning for these operations is not necessary. Standard daily operations, such as
reconfiguration processes, occur with minimal effect on the regular workload and are accounted
for in VMware best practices.

5-10 VM Storage Policy Capabilities for vSAN
Aside from the objects on the datastore themselves, storage policies are highly influential on
vSAN datastore planning.

Storage policies are created using one or more vSAN rules.

Storage Policy Capability          | Use Case          | Potential Planning Impact
Failures to tolerate               | Redundancy        | High
Number of disk stripes per object  | Performance       | Low to moderate
Flash Read Cache reservation (%)   | Performance       | None to low
Force provisioning                 | Override policy   | None to low
Object space reservation (%)       | Capacity planning | Moderate to high
IOPS limit for object              | Performance       | None to low
Disable object checksum            | Performance       | None

Because vSAN storage policies determine the object and component needs of the datastore, the
policy that is applied to the majority of objects is an important planning consideration and can
significantly affect the datastore's architecture.

The default vSAN storage policy is created and implemented when vSAN is enabled on a
cluster. This policy contains a rule set with all rules defined at their default values. A VM with the
default policy applied tolerates a single failure and uses one disk stripe per object.

5-11 About Failures to Tolerate
The Failures to Tolerate configuration has a significant effect on datastore planning.

The number of failures to tolerate and the method used are important in determining how many
components are deployed to the datastore, how much capacity is consumed, and how the data
is distributed.

[Figure: Failures to tolerate options in the storage policy editor, with Site disaster tolerance = None - standard cluster: No data redundancy (storage space for a 100 GB VM disk would be 100 GB); 1 failure - RAID-1 (Mirroring); 1 failure - RAID-5 (Erasure Coding); 2 failures - RAID-1 (Mirroring); 2 failures - RAID-6 (Erasure Coding); 3 failures - RAID-1 (Mirroring).]

The number of failures to tolerate requires the storage object to remain available after the
specified number of host or drive failures occurs in the cluster. With RAID 1 mirroring, this value
means that the configuration must contain at least (number of failures to tolerate + 1) replicas.

Witnesses help ensure that the object data remains available even if the specified number of host
failures occurs. If the number of failures to tolerate is set to 1, the object cannot persist if it is
affected by both a simultaneous drive failure on one host and a network failure on a second
host.

Consider the following use case: VMBeans asks its professional services vendor for advice on a
policy configuration for its vSAN datastore. The customer is using a four-host cluster and wants
to incorporate fault tolerance on its systems. The vendor advises that a storage policy with
Failures to Tolerate set to 1 is attainable with the current configuration.

The failure tolerance and method configuration:

• RAID 1 is used for objects when the number of failures to tolerate is 1, 2, or 3.

• RAID 5 is used for objects when the number of failures to tolerate is 1.

• RAID 6 is used for objects when the number of failures to tolerate is 2.

5-12 Level of Failures to Tolerate
The number of failures tolerated by an object has a direct relationship with the number of vSAN
objects and components created.

In this example, vSAN uses RAID 1 to ensure data availability:

• For n failures that are tolerated, n+1 copies of the object are created.

• For n failures that are tolerated, 2n+1 hosts contributing storage are required.

• The default number of failures tolerated is 1.

• The possible numbers of failures to tolerate are 0 through 3.
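The two RAID 1 rules can be written as a small helper. This is a minimal sketch with a hypothetical function name. It only applies the n+1 and 2n+1 formulas and the 0 through 3 range listed above.

```python
def raid1_requirements(ftt: int) -> dict:
    """Copies and storage hosts needed for an object using RAID 1 mirroring."""
    if not 0 <= ftt <= 3:
        raise ValueError("failures to tolerate must be 0 through 3")
    return {
        "copies": ftt + 1,     # n + 1 copies of the object
        "hosts": 2 * ftt + 1,  # 2n + 1 hosts contributing storage
    }


for n in range(1, 4):
    print(f"FTT={n}: {raid1_requirements(n)}")
```

For FTT 1, 2, and 3, the helper returns 3, 5, and 7 hosts. For FTT 0, the formula gives 1 host, but the space consumption table in this module lists 3 hosts as the minimum for that case. The formula describes only what the object itself needs.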

[Figure: A VM whose VMDK uses a storage policy of FTT 1 - RAID 1. Two replicas of the VMDK are placed in disk groups (NVMe cache device, SSD capacity devices) on different hosts in the vSphere and vSAN cluster.]

In the example, the VMDK object tolerates 1 failure (FTT 1) and uses RAID 1 (Mirroring) to
protect against that failure, so 1 object is represented by 3 components on the datastore: two
replicas and a witness. Because vSAN provides this protection using mirroring, two full copies of
the data exist so that one copy remains available if the other becomes inaccessible.

5-13 vSAN Data Protection Space Consumption

Tolerated Failures        | RAID Type                    | Minimum Hosts Required | Total Capacity Requirement*
Failures to Tolerate = 0  | No data redundancy (RAID 0)  | 3                      | 1x
Failures to Tolerate = 1  | RAID 1 (Mirroring)           | 3                      | 2x
Failures to Tolerate = 2  | RAID 1 (Mirroring)           | 5                      | 3x
Failures to Tolerate = 1  | RAID 5 (Erasure Coding)      | 4                      | 1.33x
Failures to Tolerate = 2  | RAID 6 (Erasure Coding)      | 6                      | 1.5x
Failures to Tolerate = 3  | RAID 1 (Mirroring)           | 7                      | 4x

* Total capacity requirement without considering deduplication or compression

Erasure coding provides significant space savings over mirroring. RAID 5 (erasure coding)
consumes 33% less capacity than mirroring, and RAID 6 (erasure coding) consumes 50% less
capacity than mirroring.
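The table lends itself to a small lookup. The following minimal Python sketch uses only the RAID types and multipliers listed in the table; the dictionary and function names are hypothetical. It computes the raw capacity a VM disk consumes and the savings of erasure coding over mirroring at the same number of failures to tolerate. Using `Fraction` keeps 1.33x exact as 4/3.

```python
from fractions import Fraction

# (failures to tolerate, method) -> (minimum hosts, capacity multiplier),
# taken from the table. 1.33x is stored exactly as 4/3.
PROTECTION = {
    (0, "none"): (3, Fraction(1)),
    (1, "raid1"): (3, Fraction(2)),
    (2, "raid1"): (5, Fraction(3)),
    (3, "raid1"): (7, Fraction(4)),
    (1, "raid5"): (4, Fraction(4, 3)),
    (2, "raid6"): (6, Fraction(3, 2)),
}


def consumed_capacity_gb(size_gb: int, ftt: int, method: str) -> Fraction:
    """Raw capacity consumed, ignoring deduplication and compression."""
    _min_hosts, multiplier = PROTECTION[(ftt, method)]
    return size_gb * multiplier


def savings_vs_mirroring(ftt: int, method: str) -> Fraction:
    """Fraction of capacity saved compared with RAID 1 at the same FTT."""
    _, coded = PROTECTION[(ftt, method)]
    _, mirrored = PROTECTION[(ftt, "raid1")]
    return 1 - coded / mirrored


print(float(consumed_capacity_gb(100, 1, "raid1")))           # 200.0
print(round(float(consumed_capacity_gb(100, 1, "raid5")), 2))  # 133.33
print(savings_vs_mirroring(1, "raid5"), savings_vs_mirroring(2, "raid6"))  # 1/3 1/2
```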

5-14 Comparing RAID 1 Mirroring and RAID 5/6 Erasure Coding
The number of failures to tolerate and the method used to tolerate those failures have a direct
effect on the architecture of the datastore, including how many hosts to use, how many disk
groups are on each host, and the overall size of the datastore.

Failures to Tolerate | RAID 1 Minimum Hosts Required | RAID 1 Total Capacity Requirement | RAID 5/6 Minimum Hosts Required | RAID 5/6 Total Capacity Requirement | RAID 5/6 Savings
0                    | 3                             | 1x                                | N/A                             | N/A                                 | N/A
1                    | 3                             | 2x                                | 4                               | 1.33x                               | 33% less
2                    | 5                             | 3x                                | 6                               | 1.5x                                | 50% less
3                    | 7                             | 4x                                | N/A                             | N/A                                 | N/A

Erasure coding provides significant capacity savings over mirroring, but erasure coding incurs
additional overhead in IOPS and network bandwidth. Erasure coding is only supported in all-flash
vSAN configurations.

While mirroring techniques excel in workloads where performance is a critical factor, they are
expensive regarding the amount of capacity that is required. RAID 5/6 (erasure coding) can be
configured to help ensure the same levels of component availability while consuming less
capacity than RAID 1.

The use of erasure coding results in a smaller capacity consumption increase for the same
number of failures to tolerate, but at a cost of additional host requirements and write overhead,
in comparison to mirroring. This additional overhead is not unique to vSAN and is common
among current storage platforms.

The space savings of erasure coding are guaranteed. For example, assigning a policy with a
RAID 5 erasure coding rule reduces capacity consumption by 33 percent, compared to the same
level of availability (FTT=1) with RAID 1 mirroring.

Objects with RAID 5 or 6 applied are considered to have additional stripe properties not defined
by the number of stripes rule:

• A RAID 5 object has a stripe width of 4.

• A RAID 6 object has a stripe width of 6.
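These stripe widths tie directly to the capacity figures in the tables. The following minimal sketch assumes the usual vSAN layouts of 3 data plus 1 parity component for RAID 5 and 4 data plus 2 parity components for RAID 6; the names are hypothetical. It derives the 1.33x and 1.5x multipliers and shows why the minimum host counts are 4 and 6.

```python
from fractions import Fraction

# Assumed vSAN layouts: RAID 5 = 3 data + 1 parity, RAID 6 = 4 data + 2 parity.
LAYOUTS = {"raid5": (3, 1), "raid6": (4, 2)}


def stripe_width(method: str) -> int:
    """Total components per stripe; each is placed on a different host."""
    data, parity = LAYOUTS[method]
    return data + parity


def capacity_multiplier(method: str) -> Fraction:
    """Raw capacity consumed per unit of logical data."""
    data, parity = LAYOUTS[method]
    return Fraction(data + parity, data)


def failures_tolerated(method: str) -> int:
    """Each parity component allows one component to be lost."""
    return LAYOUTS[method][1]


for m in LAYOUTS:
    print(m, stripe_width(m), capacity_multiplier(m), failures_tolerated(m))
```

This prints `raid5 4 4/3 1` and `raid6 6 3/2 2`. Each value matches the stripe width, capacity multiplier, and failures to tolerate given in this section.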

For more information about differences between vSAN RAID 5 and traditional hardware-based
RAID 5, see Use of Erasure Coding in VMware Virtual SAN 6.7 at
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-AD408FA8-5898-4541-9F82-FE72E6CD6227.html.

Consider the following use case. VMBeans is planning to use the default storage policy with a
failure tolerance of 1. They have 1 cluster with 4 hosts, which is sufficient to support the failure
level. However, upon investigation, VMBeans notes that the datastore will likely have capacity
challenges with the mirroring configuration. VMBeans determines that changing from RAID 1 to
RAID 5 erasure coding will regain some of the used space and still provide fault tolerance of 1.

5-15 Number of Disk Stripes per Object
When a cluster is being planned, stripes directly increase the number of components that make
up a vSAN object and therefore the number of capacity devices and disk groups needed.

The stripes per object:

• Have a default value of 1

• Have possible values of 1 through 12

• Are placed on disparate capacity devices

[Figure: A VM whose VMDK uses a storage policy of FTT 1 - RAID 1 with Stripes 2. The VMDK is mirrored with RAID 1, and each replica is split into two stripes placed on separate SSD capacity devices across four disk groups in the vSphere and vSAN cluster.]

Overusing stripes has the potential to affect the ability of vSAN to balance components across
the datastore. This effect can be cascading, depending on how it interplays with other policy
settings. For example, an object mirrored once would create a single component per replica.
With a stripe width of two, those two replica components become four stripe components.

To understand the effect of the number of stripes, examine write operations and read operations
separately. All writes go to the cache device, so an increased number of stripes might not improve
write performance: the new stripes might use a different disk group or be in the same disk group
and therefore use the same cache device.

From a read perspective, an increased number of stripes helps when you experience many
cache misses. For example, if a VM consumes 4,000 read IOPS and experiences a cache hit rate of
90 percent, 400 read operations per second must be serviced directly from magnetic drives. A
single hard drive might not be able to service those read operations, so an increase in the number
of stripes might help.
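The arithmetic in this example can be sketched as follows. The function names are hypothetical. Spreading cache misses evenly across stripes is an idealizing assumption, because the real distribution depends on the workload's access pattern.

```python
def read_misses_per_second(read_iops: int, hit_rate_pct: int) -> int:
    """Reads per second that miss the read cache and go to the capacity drives."""
    return read_iops * (100 - hit_rate_pct) // 100


def misses_per_stripe(read_iops: int, hit_rate_pct: int, stripes: int) -> float:
    # Assumption: cache misses are spread evenly across the stripes,
    # and each stripe lives on a separate capacity drive.
    return read_misses_per_second(read_iops, hit_rate_pct) / stripes


print(read_misses_per_second(4000, 90))  # 400
print(misses_per_stripe(4000, 90, 2))    # 200.0
```

Under this assumption, a stripe value of 2 halves the read load on each magnetic drive.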

In general, the default number of stripes of 1 should meet the requirements of most, if not all, VM
workloads. Any performance improvements are dependent on the workload.

Consider the following use case. VMBeans has another location with a four-host hybrid cluster
and is concerned that a number of its more I/O-intensive application servers are performing
less than optimally. After investigating the performance statistics for the cluster, VMBeans
determines that these servers have a higher number of cache read misses than others.
The administrator adds a stripe value of 2 to the policy that applies to those servers. The
increased stripe value spreads the data across more disk groups, providing more I/O paths, and
the performance problems caused by cache read misses are resolved.

5-16 Planning Considerations: Stripe Width
The number of disk stripes per object value determines the number of capacity devices across
which each storage object copy is striped.

For example:

• A storage policy supports 1 failure, so there are 2 copies of the data.

• The storage policy specifies a stripes value of 3.

• The minimum number of drives is 6, which is the total of data copies multiplied by the
number of stripes.

For data resiliency, stripes from different mirrors of the same object are never placed on the
same host.
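The stripe rules in this section and the previous one can be combined into a quick planning check. This minimal sketch uses hypothetical function and host names. It counts data components, which equals the minimum number of capacity drives. It also validates a proposed layout against two rules from the text: each stripe sits on its own capacity device, and stripes from different mirrors never share a host.

```python
def data_components(ftt: int, stripes: int) -> int:
    """Data components of a RAID 1 object: (ftt + 1) replicas x stripes.

    Witness components are not counted.
    """
    return (ftt + 1) * stripes


def placement_is_valid(replicas) -> bool:
    """Check a proposed layout for one object.

    `replicas` is a list of replicas; each replica is a list of
    (host, capacity_device) pairs, one pair per stripe.
    """
    seen_hosts = set()
    seen_devices = set()
    for replica in replicas:
        hosts = {host for host, _device in replica}
        # Stripes from different mirrors must never share a host.
        if hosts & seen_hosts:
            return False
        seen_hosts |= hosts
        for slot in replica:
            # Each stripe needs its own capacity device.
            if slot in seen_devices:
                return False
            seen_devices.add(slot)
    return True


print(data_components(1, 3))  # 6: minimum number of capacity drives
print(data_components(1, 2))  # 4: two replicas become four stripe components

good = [
    [("esxi-01", "ssd-1"), ("esxi-01", "ssd-2"), ("esxi-01", "ssd-3")],
    [("esxi-03", "ssd-1"), ("esxi-03", "ssd-2"), ("esxi-03", "ssd-3")],
]
bad = [
    [("esxi-01", "ssd-1"), ("esxi-02", "ssd-1"), ("esxi-01", "ssd-2")],
    [("esxi-02", "ssd-2"), ("esxi-03", "ssd-1"), ("esxi-03", "ssd-2")],
]
print(placement_is_valid(good), placement_is_valid(bad))  # True False
```

The bad layout fails because both mirrors have a stripe on esxi-02.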

5-17 Flash Read Cache Reservation
The flash read cache reservation rule is used only in hybrid configurations, where it allocates a
portion of the cache device to one or more VMs.

Use the flash read cache reservation with caution:

• Thin-provisioned storage is often used to overcommit storage allocations.

• When the flash read cache is allocated, the rule uses the full allocated storage of the object,
not the currently consumed storage.

[Figure: A storage policy with Flash Read Cache 25% is applied to a 20 GB VMDK stored in a disk group with a 40 GB cache device and 400 GB of capacity. 5 GB of cache is reserved for a single object.]

When planning a vSAN hybrid cluster, consider whether any application servers might benefit
from this sort of reservation and then size the cache device accordingly.

By default, a disk group cache device shares the read cache equally among the objects stored on
the disk group, based on demand. When a flash read cache reservation is applied, a specified
portion of the cache device is reserved for a specific vSAN object.

Reserved read cache is specified as a percentage of the logical size of the object:

• Reserved flash capacity cannot be used by other objects.

• Unreserved flash capacity is shared equally among all objects.

• Default value: 0 percent.

• Possible values: 0 to 100 percent.

• This setting has no effect on all-flash architecture and is intended for hybrid configurations.

As a best practice, avoid applying any flash read cache reservation to the VM home object
because it will not benefit from this policy.

This value is specified as a percentage of the logical size of the storage object, with precision up
to the ten-thousandth place (four decimal places). This fine granularity is needed so that
administrators can express appropriate sizes as SSD capacity increases. For example, on a 1 TB
drive, if an administrator is limited to 1 percent increments, these increments are equivalent to
cache reservations in increments of 10 GB. This value is too large for a single VM in most cases.
Ideally, the read cache should match the working set of the virtual disk to maximize the read
cache hit rate. The reservation should be set to 0, except to solve specific use cases regarding
read-intensive VMs.
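The percentage calculation and its four-decimal precision can be sketched with Python's `decimal` module, which avoids binary floating-point rounding. The function name is hypothetical. The sketch only computes the reservation and does not check whether the cache device has enough unreserved space.

```python
from decimal import Decimal


def read_cache_reservation_gb(object_size_gb: str, percent: str) -> Decimal:
    """Cache reserved for an object, as a percentage of its logical size."""
    pct = Decimal(percent)
    if not Decimal(0) <= pct <= Decimal(100):
        raise ValueError("reservation must be between 0 and 100 percent")
    if pct.as_tuple().exponent < -4:
        raise ValueError("at most four decimal places are allowed")
    return Decimal(object_size_gb) * pct / 100


print(read_cache_reservation_gb("20", "25"))   # 5 (20 GB VMDK at 25 percent)
print(read_cache_reservation_gb("1000", "1"))  # 10 (1 percent of 1,000 GB)
```

A value such as `"0.0125"` is accepted, while `"12.34567"` is rejected because it has five decimal places.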

Consider the following use case. VMBeans is experiencing a problem with one of its high-traffic
VMs. Occasionally, the VM fails to service requests for data in a timely fashion. After
investigating, IT finds that the cache device on the disk group does not have sufficient resources
to service the critical VM along with the other requests that come in. The best solution would be
to move from a hybrid architecture to an all-flash vSAN architecture. Until VMBeans migrates to an
all-flash architecture, it applies a read cache reservation to the VMDK of the VM. This policy
ensures that a portion of the cache device is always available to that VM object. The critical VM
now has the necessary resources available when read requests are high.

5-18 Force Provisioning
Force provisioning allows an object to be created despite not having sufficient resources in the
cluster. vSAN makes the object compliant when additional resources are added.

Force provisioning carries additional considerations to be addressed during the planning of the
datastore:

• Placing hosts in maintenance mode could affect the accessibility of a VM with no failure
tolerance.

• Placing a host in maintenance mode could affect the performance of the cluster if a large
number of force-provisioned machines must be moved.

• Consider the resources consumed by noncompliant VMs when adding resources to the
cluster.

[Figure: A VM whose VMDK uses a storage policy of FTT 2 - RAID 1, with a replica placed in disk groups (NVMe cache device, SSD capacity devices) in a vSphere and vSAN cluster.]

The diagram shows a VM provisioned by using a policy in which the number of failures to
tolerate is set to 2. Using the 2n+1 equation for the number of failures, the policy requires at least
five hosts in the cluster. Using force provisioning, the VM is deployed with 0 failures to tolerate and
1 stripe. When additional resources are available, vSAN makes the VM compliant with its policy.

vSAN prioritizes creating data components on the datastore so that, after the required
resources become available, data components are created first to secure the data as early as
possible.

Force provisioning overrides the following policy rules to provision VMs to a datastore that
cannot meet the policy requirements:

• The level of failures to tolerate is set to 0.

• The stripe width is set to 1.

• The flash read cache reservation is set to 0.
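The override behavior can be sketched as a small function. This is a simplified model with hypothetical names. It checks only the RAID 1 host requirement (2n+1) and ignores capacity, disk groups, and vSAN's later rebuild to full compliance.

```python
def raid1_min_hosts(ftt: int) -> int:
    return 2 * ftt + 1


def provisioned_rules(policy: dict, hosts: int) -> dict:
    """Return the rules an object is actually provisioned with.

    Simplification: only the host count is checked against the policy.
    """
    if hosts >= raid1_min_hosts(policy["ftt"]):
        return dict(policy)
    if not policy.get("force_provisioning", False):
        raise RuntimeError("insufficient resources and force provisioning is disabled")
    # Force provisioning falls back to the overrides listed in the text.
    return {**policy, "ftt": 0, "stripes": 1, "flash_read_cache_pct": 0}


policy = {"ftt": 2, "stripes": 1, "flash_read_cache_pct": 0, "force_provisioning": True}
print(provisioned_rules(policy, hosts=3)["ftt"])  # 0: deployed without redundancy
print(provisioned_rules(policy, hosts=5)["ftt"])  # 2: policy can be met
```

This mirrors the diagram: an FTT 2 policy needs five hosts, so on a smaller cluster the object is placed with no redundancy until resources are added.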

Consider the following use case. VMBeans is deploying a vSAN cluster at a new location and is
experiencing a delay in the shipment of some of the hardware that is required to fully implement
its solution. Some of the new VMs need more storage for fault tolerance than the cluster can
provide without the delayed hardware.

The new location has a hard deadline for operational reasons, so the Operations team must
immediately build and configure the VMs. The team crafts a policy using force provisioning to
apply to the VMs. The VMs deploy even though there are insufficient resources for the policy
because force provisioning is enabled. When the additional resources are added to the cluster,
vSAN reallocates the objects to meet all policy needs.

5-19 Object Space Reservation
When planning a vSAN datastore, you must consider whether objects are provisioned thin, thick,
or somewhere in between. VMs are thin-provisioned by default.

The level of reservation determines how much physical storage is consumed compared to logical
storage, and the available reservation levels depend on whether deduplication and compression
are enabled:

• If deduplication and compression are disabled, objects can be provisioned with 25%, 50%,
75%, and thick configurations.

• If deduplication and compression are enabled, space reservation is limited to thick and thin
configurations.

[Figure: A storage policy with Reservation 50% is applied to a VM, with a replica placed in disk groups (NVMe cache device, SSD capacity devices) in a vSphere and vSAN cluster.]

This capability defines the percentage of the logical size of the storage object that is reserved
during initialization. Reserved storage is thick provisioned (lazy zero) to the value of the setting
and the remainder is thin provisioned. Lazy zero provisioning is used in calculations for total
capacity but does not consume the space. The value is the minimum amount of capacity to be
reserved.
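The reservation arithmetic and the reservation levels listed earlier in this section can be sketched as follows. The names are hypothetical. The sketch computes the logical reservation of a single object only and ignores any additional raw capacity consumed by failures-to-tolerate protection.

```python
# Reservation levels offered per the list in this section (0 = thin, 100 = thick).
ALLOWED_OSR_PCT = {
    False: {0, 25, 50, 75, 100},  # deduplication and compression disabled
    True: {0, 100},               # deduplication and compression enabled
}


def space_reservation_gb(logical_size_gb: int, osr_pct: int, dedup_compression: bool):
    """Split an object's logical size into reserved (thick) and thin portions."""
    if osr_pct not in ALLOWED_OSR_PCT[dedup_compression]:
        raise ValueError(f"{osr_pct}% is not an available reservation level")
    reserved = logical_size_gb * osr_pct / 100
    return reserved, logical_size_gb - reserved


print(space_reservation_gb(100, 50, dedup_compression=False))  # (50.0, 50.0)
```

With deduplication and compression enabled, a request for 25 percent raises an error because only thin (0) and thick (100) are offered.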

As a best practice, avoid applying any object space reservation to the VM home object because
it will not benefit from this policy.

When used with deduplication and compression, object space reservation results in reduced
capacity availability because space must be reserved in case the blocks become non-unique.
This does not affect performance.

Consider the following use case. The Operations team is building a new VM for a business unit
and one of the requirements is that its drives are thick provisioned from the beginning. The team
deploys the VM using a policy that reserves the space that is required for the VMDK.

5-20 IOPS Limits for Objects
IOPS limits for objects is a quality-of-service feature that limits the number of IOPS that an
object can consume.

The I/O size for the IOPS limit is normalized to 32 KB.

The IOPS limits for objects have two primary use cases:

• To prevent workloads from affecting other workloads by consuming too many IOPS

• To create artificial standards of service as part of a tiered service offering, using the same
pool of resources

[Figure: Three VMs whose VMDKs share a storage policy with IOPS 500 on a vSphere and vSAN cluster.]

By default, vSAN seeks to dynamically adjust performance based on demand and provide a fair
weighting of resources available.

Limiting the IOPS of one or more VMs might be advantageous. In environments with a mix of
both low and high utilization, a VM with low utilization during normal operations can change its
pattern and consume larger amounts of resources, preventing other VMs from operating
properly.

One chunk of I/O data is 32 KB in size, so a single 64 KB I/O counts as two I/Os against the limit.
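The normalization can be sketched in a few lines. This is an illustrative model, not vSAN's scheduler. The function names are hypothetical, and rounding a partial 32 KB chunk up to a whole I/O is an assumption. The sketch checks a single one-second window rather than modeling throttling.

```python
import math

NORMALIZED_IO_KB = 32


def normalized_ios(io_size_kb: float) -> int:
    """Count one I/O per 32 KB chunk; small I/Os still count as one."""
    return max(1, math.ceil(io_size_kb / NORMALIZED_IO_KB))


def within_limit(io_sizes_kb, iops_limit: int) -> bool:
    """io_sizes_kb: sizes of the I/Os one object issues in one second."""
    return sum(normalized_ios(size) for size in io_sizes_kb) <= iops_limit


print(normalized_ios(4), normalized_ios(32), normalized_ios(64))  # 1 1 2
print(within_limit([4] * 500, 500))   # True: 500 normalized I/Os
print(within_limit([64] * 300, 500))  # False: 600 normalized I/Os
```

Under a 500 IOPS limit, an object issuing large I/Os reaches the limit with fewer actual operations than one issuing small I/Os.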

Consider the following use case. One of the VMBeans servers that was deployed to a vendor-maintained
vSAN datastore has performance issues when usage is at its highest. The vendor
investigates and informs VMBeans that the server exceeds its IOPS limit during peak
times. The vendor suggests that VMBeans invest in a higher tier of service for that VM to
prevent this from affecting its operations.

5-21 Disabling Object Checksums
Software checksums are used to detect data corruption that might be caused by hardware or
software components.

vSAN includes software checksums with the following benefits:

• Automatically detect and resolve silent drive errors

• Rebuild corrupted data from other mirrors or data/parity stripes

• Perform drive scrubbing in the background

• Enabled for all objects by default

• Disabled per object with VM storage policies

The checksums feature can be disabled if this functionality is already included in an application
such as Oracle RAC.

Software checksums can detect corruption that might be caused by hardware or software
components during read or write operations.

The following types of corruption exist for drives:

• Latent sector errors: Are typically the result of a physical drive malfunction.

• Silent data corruption: Can lead to lost or inaccurate data and significant downtime. No
effective means of detection exists without end-to-end integrity checking.

During read/write operations, vSAN checks the validity of the data based on the checksum.
If the data is not valid, vSAN takes the necessary steps to either correct the data or report it to
the user to take action.
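The read-path check can be sketched with two mirrored copies. This is a conceptual model only. Python's `zlib.crc32` stands in for vSAN's own checksum algorithm, and the dictionaries and function names are hypothetical. They do not represent vSAN's on-disk format or API.

```python
import zlib


def write_block(data: bytes) -> dict:
    """Store data together with its checksum."""
    return {"data": data, "checksum": zlib.crc32(data)}


def is_valid(copy: dict) -> bool:
    return zlib.crc32(copy["data"]) == copy["checksum"]


def read_block(copies: list) -> bytes:
    """Return verified data, repairing corrupted copies from a valid one."""
    good = next((c for c in copies if is_valid(c)), None)
    if good is None:
        # No valid copy: report the problem instead of returning bad data.
        raise OSError("checksum mismatch on every copy")
    for c in copies:
        if not is_valid(c):
            c["data"] = good["data"]  # rebuild from the valid mirror
    return good["data"]


mirrors = [write_block(b"payload"), write_block(b"payload")]
mirrors[0]["data"] = b"paylaod"  # simulate silent corruption on one mirror
print(read_block(mirrors))       # b'payload'
print(is_valid(mirrors[0]))      # True: the corrupted copy was repaired
```

The two outcomes in the text map to the two branches: correct the data when a valid copy exists, or report an error when none does.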

vSAN has a drive-scrubbing mechanism that periodically checks the data on drives for errors. By
default, the data is checked once a year, but this period can be modified with the
VSAN.ObjectScrubsPerYear advanced ESXi host setting.

5-22 Assigning vSAN Storage Policies (1)
The vSAN datastore has a default storage policy configured with standard parameters to
protect vSAN data. However, you can create user-defined custom vSAN storage policies.

[Figure: Objects on a vSAN datastore. Some objects use the Default Policy (Failures = 1, Stripes = 1), while others use User Defined policies (Failures = 2, Stripes = 1 and Failures = 1, Stripes = 3).]

When a vSAN datastore is created, a default policy is applied.

The vSAN default storage policy has the following features:

• Tolerates a single failure

• Supports a single disk stripe

• Does not force provisioning

• Reserves 0 storage

• Reserves 0 flash read cache

A datastore's default storage policy should have a rule set that applies to the widest range of
VMs that are to be hosted on the datastore. Individual VMs should have a custom storage policy
applied that overrides the default policy for the datastore, as needed. When most VMs use the
default datastore policy, the overhead of policy administration and compliance is minimized.

5-23 Assigning vSAN Storage Policies (2)
The VM home directory, virtual disks, and the VM swap object can have user-defined custom
vSAN storage policies that override the default storage policy.
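The override relationship can be sketched as a simple lookup. The object and policy names mirror the example VM shown for this slide (SA-Workload-01). The function itself is a hypothetical illustration, not a vSphere API.

```python
DEFAULT_POLICY = "vSAN Default Storage Policy"


def effective_policies(vm_objects, custom_policies):
    """Map each VM object to its storage policy.

    Objects without a user-defined policy fall back to the datastore default.
    """
    return {obj: custom_policies.get(obj, DEFAULT_POLICY) for obj in vm_objects}


objects = ["VM home", "Hard disk 1", "Hard disk 2"]
custom = {"Hard disk 2": "FTT1 - 2 Stripes"}
for obj, policy in effective_policies(objects, custom).items():
    print(f"{obj}: {policy}")
```

Only Hard disk 2 carries a custom policy. The other objects keep the default, which keeps policy administration to a minimum.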

[Figure: Objects of VM SA-Workload-01 and their storage policies. Hard disk 1 and VM home use the vSAN Default Storage Policy, and Hard disk 2 uses a custom FTT1 - 2 Stripes policy. All objects are Healthy.]


5-24 Storage Policies and the VM Home Object
The VM home object does not apply storage policies in the same way as other objects.

The VM home object can be configured to:

• Tolerate failures

• Use mirrored or RAID failure methods

• Force-provision

Other policy rules, such as stripes, do not affect the VM home object.

[Figure: Physical Placement view for SA-Payload-01 (2 objects). Hard disk 1 has a witness on sa-esxi-02.vclass.local and two RAID 0 sets of three components each, one set on sa-esxi-03.vclass.local and one on sa-esxi-01.vclass.local. VM home has components on sa-esxi-01.vclass.local and sa-esxi-02.vclass.local and a witness on sa-esxi-03.vclass.local.]

The VM in the example has a storage policy that tolerates 1 failure and uses 3 stripes. The VM home
object is mirrored to tolerate 1 failure but is not striped across multiple drives. In contrast, the
hard disk has 1 object that is mirrored, and each mirror is striped across 3 drives.

The VM home object is the location where VM configuration files, such as .vmx, .log, digest files,
and memory snapshots are stored.

The VM home object overrides the following storage policy rules:

• Stripe width does not exceed 1 stripe.

• Object space reservation does not exceed 0%.

• Read cache reservation does not exceed 0%.
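The following minimal sketch models the overrides listed above. It assumes a simplified policy record. The `PolicyRules` class and the `effective_vm_home_rules` function are hypothetical illustrations and are not part of any VMware API. The stripe width is capped at 1, and the two reservations are forced to 0%. All other rules, such as failures to tolerate and force provisioning, carry over from the VM storage policy.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PolicyRules:
    """Hypothetical, simplified subset of vSAN storage policy rules."""
    failures_to_tolerate: int
    failure_method: str               # for example "RAID-1", "RAID-5", "RAID-6"
    stripe_width: int
    object_space_reservation_pct: int
    read_cache_reservation_pct: float
    force_provisioning: bool


def effective_vm_home_rules(policy: PolicyRules) -> PolicyRules:
    """Return the rules the VM home object effectively uses.

    Stripe width is capped at 1, and object space reservation and read cache
    reservation are forced to 0%. Failure tolerance, failure method, and
    force provisioning are kept from the VM storage policy.
    """
    return replace(
        policy,
        stripe_width=min(policy.stripe_width, 1),
        object_space_reservation_pct=0,
        read_cache_reservation_pct=0.0,
    )


# Policy from the example: tolerate 1 failure, 3 stripes (reservations are illustrative).
vm_policy = PolicyRules(1, "RAID-1", 3, 50, 1.0, False)
home_rules = effective_vm_home_rules(vm_policy)
print(home_rules.stripe_width, home_rules.object_space_reservation_pct,
      home_rules.failures_to_tolerate)  # prints: 1 0 1
```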

5-25 Viewing Object and Component Placement (1)
You can examine each VM to see where its components are physically located.

[Figure: New Virtual Machine > Monitor > vSAN > Physical disk placement (6 components on 3 hosts)]
VM home (RAID 1)
    Component (Active): sa-esxi-01.vclass.local
    Component (Active): sa-esxi-02.vclass.local
    Witness (Active): sa-esxi-03.vclass.local
Hard disk 1 (RAID 1)
    Component (Active): sa-esxi-01.vclass.local
    Component (Active): sa-esxi-03.vclass.local
    Witness (Active): sa-esxi-02.vclass.local

When vSAN creates an object for a virtual disk and determines how to distribute the object in
the cluster, it considers the following factors:

• vSAN verifies that the virtual disk requirements are applied according to the specified VM
storage policy settings.

• vSAN verifies that the correct cluster resources are used at the time of provisioning. For
example, based on the protection policy, vSAN determines how many replicas to create.
The performance policy determines the amount of flash read cache allocated for each
replica, how many stripes to create for each replica, and where to place them in the
cluster.

• vSAN continually monitors and reports the policy compliance status of the virtual disk. If the
status is noncompliant, you must troubleshoot and resolve the underlying problem.
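The following minimal sketch shows how the protection and stripe rules shape a mirrored object. It assumes RAID 1 only, and it leaves out witnesses and host placement. The `mirrored_layout` function and its nested-dictionary output are hypothetical and do not represent vSAN's placement algorithm. With FTT=1 and 3 stripes, the output matches the hard disk layout in the 5-24 example: two replicas, each a RAID 0 set of three components.

```python
def mirrored_layout(ftt: int, stripe_width: int) -> dict:
    """Sketch the data layout of a RAID 1 (mirroring) object.

    Simplified assumptions, not vSAN's placement algorithm:
    - the protection rule (FTT) yields FTT + 1 replicas;
    - the stripe rule splits each replica into a RAID 0 set of
      `stripe_width` components (a single component when it is 1);
    - witnesses and host placement are left out.
    """
    if ftt < 0 or stripe_width < 1:
        raise ValueError("ftt must be >= 0 and stripe_width >= 1")

    def make_replica():
        # Build a fresh structure per replica so replicas do not share objects.
        if stripe_width == 1:
            return "Component"
        return {"RAID 0": ["Component"] * stripe_width}

    return {"RAID 1": [make_replica() for _ in range(ftt + 1)]}


layout = mirrored_layout(ftt=1, stripe_width=3)
print(layout)
# {'RAID 1': [{'RAID 0': ['Component', 'Component', 'Component']}, {'RAID 0': ['Component', 'Component', 'Component']}]}
```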

5-26 Viewing Object and Component Placement (2)
To view the component layout, select the objects that make up a VM and click VIEW
PLACEMENT DETAILS.
[Figure: VM objects list with the VIEW PLACEMENT DETAILS button. Columns: Name, Placement and Availability, Storage Policy, UUID. Under SA-Payload-01 (Healthy): Hard disk 1 (Healthy, No Redundancy), Hard disk 2 (Healthy, vSAN Default Storage Policy), and VM home (Healthy, No Redundancy).]

In the vSphere Client, the administrator can view the location of each object and component.
The Capacity Disk Name column provides the physical drive name to which a particular object is
deployed.

In the example, the VM hard disk is protected by a policy that tolerates 1 failure through mirroring.
Two replicas and one tiebreaker witness component exist, each on a separate host.
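To spot-check the separate-host placement described above, you can transcribe a placement view into a small structure and test it. This is a minimal sketch. The input format (a dictionary that maps a hypothetical group label to a list of host names) and the `replicas_on_separate_hosts` function are assumptions for illustration. Components of one striped replica may share a host. Different replicas, or a replica and its witness, must not share a host.

```python
from __future__ import annotations


def replicas_on_separate_hosts(groups: dict[str, list[str]]) -> bool:
    """Return True if no host holds components from more than one group.

    `groups` maps a label such as "replica-1" or "witness" to the hosts of
    that group's components. Components of one striped replica may share a
    host; components of different replicas (or a replica and its witness)
    must not, or one host failure could remove more than one copy.
    """
    seen: set[str] = set()
    for hosts in groups.values():
        unique = set(hosts)
        if unique & seen:
            return False
        seen |= unique
    return True


# Hard disk 1 from the 5-25 example: mirrors on sa-esxi-01 and sa-esxi-03, witness on sa-esxi-02.
example = {
    "replica-1": ["sa-esxi-01"],
    "replica-2": ["sa-esxi-03"],
    "witness": ["sa-esxi-02"],
}
print(replicas_on_separate_hosts(example))  # True
```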

5-27 Viewing Object and Component Placement (3)
The component layout shows where the data is located on the physical datastore components,
down to the specific disk.

The component layout depends on multiple fact ors, including:

• Object size

• Redundancy

• Stripes
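All three factors can be combined in a rough estimate. The following minimal sketch covers RAID 1 mirroring only and excludes witnesses and erasure coding. It uses the 255 GB maximum component size: a stripe larger than 255 GB is split into more components. The `raid1_data_components` function is a hypothetical helper that approximates vSAN's behavior. It does not reproduce vSAN's internal placement logic.

```python
import math

MAX_COMPONENT_GB = 255  # vSAN splits components that would exceed 255 GB


def raid1_data_components(size_gb: float, ftt: int, stripe_width: int) -> int:
    """Estimate data components (witnesses excluded) for a RAID 1 object.

    replicas = FTT + 1; each replica has `stripe_width` stripes, and any
    stripe larger than MAX_COMPONENT_GB is split into further components.
    """
    if size_gb <= 0 or ftt < 0 or stripe_width < 1:
        raise ValueError("invalid size, ftt, or stripe_width")
    replicas = ftt + 1
    stripe_gb = size_gb / stripe_width
    chunks_per_stripe = math.ceil(stripe_gb / MAX_COMPONENT_GB)
    return replicas * stripe_width * chunks_per_stripe


for size, ftt, sw in [(100, 0, 1), (100, 1, 1), (100, 1, 3), (400, 1, 1)]:
    print(size, ftt, sw, raid1_data_components(size, ftt, sw))
```

The last case, a 400 GB disk with FTT=1, yields four data components, because each 400 GB mirror exceeds the 255 GB limit and is split in two.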

[Figure: Physical disk placement with the Type, Component State, Host, and Capacity Disk columns]
SA-Payload-01 > Hard disk 1 (RAID 1)
    Witness (Active): inf-esxi-04.vclass.local
    Component (Active): inf-esxi-03.vclass.local
    Component (Active): inf-esxi-02.vclass.local
SA-Payload-01 > Hard disk 2 (RAID 1)
    Component (Active): inf-esxi-03.vclass.local
    Component (Active): inf-esxi-01.vclass.local
    Witness (Active): inf-esxi-02.vclass.local
The capacity disk of each component is a Local VMware Disk (mpx.vmhba device).

5-28 Verifying Individual vSAN Object Compliance Status
The compliance status shows whether an object is compliant with its assigned storage policy:

• Component failure can cause an object to become noncompliant.

• Noncompliance triggers the VM storage compliance alarm.

• If an object is noncompliant, troubleshooting is required.

[Figure: sa-vm-01.vclass.local > Configure > Policies, with the EDIT VM STORAGE POLICIES, CHECK VM STORAGE POLICY COMPLIANCE, and REAPPLY VM STORAGE POLICY buttons]
Name          VM Storage Policy             Compliance Status   Last Checked
VM home       SA-VSAN-01_VM_RAID5_Policy    Compliant           09/11/2020, 2:41:40 AM
Hard disk 1   SA-VSAN-01_VM_RAID5_Policy    Compliant           09/11/2020, 2:41:40 AM

5-29 Verifying Individual vSAN Component States
To verify the individual vSAN component state, select a VM and select Monitor > vSAN >
Physical Disk Placement.

If a vSAN component state is not Active (for example, Absent or Degraded), the object is
noncompliant with the assigned storage policy.
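The rule above can be expressed as a small check. This is a minimal sketch that assumes component states were read from the Physical Disk Placement view. The `ComponentState` enum and both helper functions are hypothetical. The check applies only the simplified rule on this slide: an object is compliant only when every component is Active.

```python
from __future__ import annotations

from enum import Enum


class ComponentState(Enum):
    ACTIVE = "Active"
    ABSENT = "Absent"
    DEGRADED = "Degraded"


def noncompliant_components(states: list[ComponentState]) -> list[int]:
    """Return the positions of components that are not Active."""
    return [i for i, s in enumerate(states) if s is not ComponentState.ACTIVE]


def object_is_compliant(states: list[ComponentState]) -> bool:
    """Simplified rule from this slide: every component must be Active."""
    return not noncompliant_components(states)


raid5_disk = [ComponentState.ACTIVE] * 4  # a RAID 5 hard disk with four components
print(object_is_compliant(raid5_disk))  # True
raid5_disk[2] = ComponentState.ABSENT
print(object_is_compliant(raid5_disk), noncompliant_components(raid5_disk))  # False [2]
```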

[Figure: Monitor > vSAN > Physical disk placement]
Hard disk 1 (RAID 5): four components, all Active
VM home (RAID 5): four components, all Active

4-47 Enabling vSAN Reserved Capacity
vSAN 7 U1 includes a reserve capacity workflow to simplify storage capacity management for
vSAN backend operations and maintenance.

Operations reserve capacity is used for internal vSAN operations, such as object rebuild or
repair.

Host rebuild reserve capacity is used to ensure that all objects can be rebuilt if any one host fails.

[Figure: vSAN Services page for cluster SA-VSAN-01 with the Enable Capacity Reserve dialog, which offers the Operations reserve and Host rebuild reserve options and the CANCEL and APPLY buttons]

To enable the Host Rebuild reserve, you must have a minimum of four hosts in a vSAN cluster.

To enable capacity reserve, select the vSAN cluster and select Configure > vSAN > Services.
When capacity reserve is enabled, the operations reserve and host rebuild reserve options are available.

When vSAN reserved capacity is enabled and the cluster storage capacity usage reaches the limit,
new workloads fail to deploy.
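The following minimal sketch illustrates the admission behavior described above. vSAN sizes the operations reserve and the host rebuild reserve itself, so the sketch treats both values as assumed inputs. All numbers in the example are hypothetical. The `ClusterCapacity` class and its helper functions are illustrations only: they encode the four-host minimum and the "new workloads fail to deploy at the limit" rule, not vSAN's actual accounting.

```python
from dataclasses import dataclass


@dataclass
class ClusterCapacity:
    hosts: int
    capacity_gb: float
    used_gb: float
    operations_reserve_gb: float    # assumed input; vSAN sizes this itself
    host_rebuild_reserve_gb: float  # assumed input; vSAN sizes this itself


def host_rebuild_reserve_allowed(c: ClusterCapacity) -> bool:
    """Host rebuild reserve needs at least four hosts in the cluster."""
    return c.hosts >= 4


def can_deploy(c: ClusterCapacity, new_gb: float, host_rebuild_enabled: bool) -> bool:
    """Return False when the new workload would push usage past the limit."""
    if host_rebuild_enabled and not host_rebuild_reserve_allowed(c):
        raise ValueError("host rebuild reserve requires a minimum of four hosts")
    limit = c.capacity_gb - c.operations_reserve_gb  # operations reserve assumed enabled
    if host_rebuild_enabled:
        limit -= c.host_rebuild_reserve_gb
    return c.used_gb + new_gb <= limit


cluster = ClusterCapacity(hosts=4, capacity_gb=2000, used_gb=1500,
                          operations_reserve_gb=100, host_rebuild_reserve_gb=300)
print(can_deploy(cluster, 50, host_rebuild_enabled=True))   # True  (1550 <= 1600)
print(can_deploy(cluster, 200, host_rebuild_enabled=True))  # False (1700 > 1600)
```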

5-31 Review of Learner Objectives
• Explain how storage policies work with vSAN

• Detail how to define and create a VM storage policy

• Describe how to apply and modify VM storage policies

• Explain how to change VM storage policies dynamically

• Identify the VM storage policy compliance status

5-32 Lesson 2: Analyzing vSAN Objects and Components Placement

5-33 Learner Objectives


• Verify the VM storage policy compliance status

• Analyze the impact of VM storage policy changes

• Identify the vSAN objects and components placement details

5-34 About Storage Policy Changes
When the default policy is changed, the number of components created depends on the policy
variables.

[Figure: A 100 GB VMDK with a policy of FTT=1, RAID 1, and SW=1 is mirrored (RAID 1) into vSAN Replica 1 (100 GB) and vSAN Replica 2 (100 GB).]

5-35 Activity: Object Count (1)
Based on the image, answer the following questions.

1 How many objects does this VM have?

2 How many replica components make up the disk object?

[Figure: Monitor > vSAN > Physical disk placement for the VM, listing Hard disk 1 (RAID 1) with its components and witness, the VM home (RAID 1) object, and the virtual machine swap object.]

5-36 Activity: Object Count (1) Solution
Based on the image, answer the following questions.

1 How many objects does this VM have?

The VM has the following three objects:

- Hard disk 1

- VM home namespace

- VM swap object

2 How many replica components make up the disk object?

Hard disk 1 is the VMDK object. It comprises two replica components and one witness
component.

[Figure: the same Physical disk placement view as in the question slide.]

5-37 Activity: Object Count (2)
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 0

Stripes = 1

1. How many objects are created?

2. How many components are created, excluding witnesses?

3. How many drives is each component written to?

[Figure: vSAN policy Availability tab for a 100 GB object: Site disaster tolerance set to None - standard cluster; Failures to tolerate set to No data redundancy.]

5-38 Activity: Object Count (2) Solution
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 0

Stripes = 1

1. How many objects are created?

One object.

2. How many components are created, excluding witnesses?

One component.

3. How many drives is each component written to?

One drive.

[Figure: A 100 GB vSAN object with FTT=0, RAID-0, and Stripes=1 consists of a single RAID-0 vSAN component.]

5-39 Activity: Object Count (3)
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 1

RAID 1 (Mirroring)

Stripes = 1

1. How many objects are created?

2. How many components are created, excluding witnesses?

3. How many hosts are required?

4. How many drives is each component written to?

[Figure: vSAN policy Availability tab: Site disaster tolerance set to None - standard cluster; Failures to tolerate set to 1 failure - RAID-1 (Mirroring). The 100 GB object consumes 200 GB.]

5-40 Activity: Object Count (3) Solution
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 1

RAID 1 (Mirroring)

Stripes = 1

1. How many objects are created?

One object.

2. How many components are created, excluding witnesses?

Two components.

3. How many hosts are required?

Three hosts (two hosts for the mirrors and one host for the witness).

4. How many drives is each component written to?

Two drives (one drive on each of the two hosts that hold the mirrors).

[Figure: A 100 GB vSAN object with FTT=1, RAID-1 (Mirroring), and Stripes=1 consists of two RAID-1 mirrors (Mirror 1 and Mirror 2), each a single vSAN component, plus one vSAN witness.]

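The activity answers for mirroring follow a simple pattern. The following minimal sketch derives the counts from failures to tolerate. It assumes a stripe width of 1 and an object smaller than 255 GB, and it uses one witness per tolerated failure, as in these activities. With more components, vSAN can balance votes differently and may use fewer witnesses, so treat the witness count as an approximation. The `mirroring_requirements` function is hypothetical. FTT=0 corresponds to the no-data-redundancy case.

```python
from __future__ import annotations


def mirroring_requirements(ftt: int) -> dict[str, int]:
    """Basic RAID 1 counts for stripe width 1 and an object under 255 GB.

    replicas = FTT + 1, witnesses = FTT, minimum hosts = 2 * FTT + 1.
    """
    if not 0 <= ftt <= 3:
        raise ValueError("mirroring supports failures to tolerate from 0 to 3")
    replicas = ftt + 1
    witnesses = ftt
    return {
        "replicas": replicas,
        "witnesses": witnesses,
        "components": replicas + witnesses,
        "min_hosts": 2 * ftt + 1,
    }


for ftt in range(4):
    print(ftt, mirroring_requirements(ftt))
```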
5-41 Activity: Object Count (4)
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 1

RAID 5 (Erasure Coding)

Stripes = 1

1. How many objects are created?

2. How many components are created?

3. How many hosts are required?

4. How many drives is each component written to?

[Figure: vSAN policy Availability tab: Site disaster tolerance set to None - standard cluster; Failures to tolerate set to 1 failure - RAID-5 (Erasure Coding); consumed storage space for a 100 GB VM disk would be 133.33 GB.]

RAID 5 erasure coding is a space efficiency feature optimized for all-flash configurations. Erasure
coding provides the same levels of redundancy as mirroring but with a reduced capacity
requirement.

Erasure coding guarantees capacity reduction over a mirroring data protection method at the
same failure tolerance level. As an example, consider a 100 GB virtual disk. Surviving one disk or
host failure requires two copies of data at twice the capacity, that is, 200 GB. If RAID 5 erasure
coding is used to protect the object, the 100 GB virtual disk consumes 133 GB of raw capacity, a
33% reduction in consumed capacity versus RAID 1 mirroring.
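The capacity arithmetic above can be reproduced with a short calculation. This is a minimal sketch that ignores metadata, witness components, and object space reservation. It uses the 3+1 layout for RAID 5 (FTT=1) and the 4+2 layout for RAID 6 (FTT=2) described in this module. The `raw_capacity_gb` function is a hypothetical helper.

```python
def raw_capacity_gb(size_gb: float, method: str, ftt: int) -> float:
    """Raw capacity consumed by one object, ignoring metadata and witnesses.

    RAID 1 stores FTT + 1 full copies, RAID 5 uses a 3+1 layout (FTT=1),
    and RAID 6 uses a 4+2 layout (FTT=2).
    """
    if method == "RAID-1":
        if not 1 <= ftt <= 3:
            raise ValueError("RAID 1 supports FTT 1 to 3")
        return size_gb * (ftt + 1)
    if method == "RAID-5":
        if ftt != 1:
            raise ValueError("RAID 5 supports FTT=1 only")
        return size_gb * 4 / 3
    if method == "RAID-6":
        if ftt != 2:
            raise ValueError("RAID 6 supports FTT=2 only")
        return size_gb * 6 / 4
    raise ValueError(f"unknown method: {method}")


mirror = raw_capacity_gb(100, "RAID-1", 1)
raid5 = raw_capacity_gb(100, "RAID-5", 1)
print(f"{mirror:.2f} {raid5:.2f} {1 - raid5 / mirror:.0%}")  # prints: 200.00 133.33 33%
```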

RAID 5/6 (erasure coding) does not support 3 failures to tolerate.

For more information about Erasure Coding for RAID 5 and RAID 6, see Erasure Coding (RAID-5/6) at
https://storagehub.vmware.com/t/vsan-6-7-update-1-technical-overview/erasure-coding-raid-5-6-3/.

5-42 Activity: Object Count (4) Solution
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 1

RAID 5 (Erasure Coding)

Stripes = 1

1. How many objects are created?

One object.

2. How many components are created?

Four components (three data components and one distributed parity component)

3. How many hosts are required?

Four hosts.

4. How many drives is each component written to?

Four drives (one drive on each of the four hosts).

[Figure: A 100 GB vSAN object with FTT=1, RAID-5 (Erasure Coding), and Stripes=1 consists of four RAID-5 components.]

RAID 5 erasure coding requires a minimum of four hosts. When a policy containing a RAID 5
erasure coding rule is assigned to this object, three data components and one parity component
are created. To survive the loss of a disk or host (FTT=1), these components are distributed
across four hosts in the cluster.

5-43 Activity: Object Count (5)
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 2

RAID 1 (Mirroring)

Stripes = 1

1. How many objects are created?

2. How many components are created?

3. How many hosts are required?

4. How many drives is each component written to?

[Figure: vSAN policy Availability tab: Site disaster tolerance set to None - standard cluster; consumed storage space for a 100 GB VM disk would be 300 GB.]

5-44 Activity: Object Count (5) Solution
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 2

RAID 1 (Mirroring)

Stripes = 1

1. How many objects are created?

One object.

2. How many components are created?

Five components (three data components and two witness components)

3. How many hosts are required?

Five hosts.

4. How many drives is each component written to?

Five drives.

[Figure: A 100 GB vSAN object with FTT=2, RAID-1 (Mirroring), and Stripes=1 consists of three RAID-1 mirror components and two vSAN witnesses.]

5-45 Activity: Object Count (6)
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 2

RAID 6 (Erasure Coding)

Stripes = 1

1. How many objects are created?

2. How many components are created?

3. How many hosts are required?

4. How many drives is each component written to?

[Figure: vSAN policy Availability tab: Site disaster tolerance set to None - standard cluster; Failures to tolerate set to 2 failures - RAID-6 (Erasure Coding).]

Like RAID 5 erasure coding, RAID 6 erasure coding is a space efficiency feature optimized for
all-flash configurations. Erasure coding provides the same levels of redundancy as mirroring but
with a reduced capacity requirement. In general, erasure coding is a method of taking data,
breaking it into multiple pieces, and spreading it across multiple devices, while adding parity data
so that it can be recreated if one of the pieces is corrupted or lost.
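The "pieces plus parity" idea above can be demonstrated with single XOR parity. This is the simplest case and is analogous to the one parity segment per stripe in RAID 5. This minimal sketch is not vSAN's on-disk format. The `encode` and `rebuild` functions are hypothetical. RAID 6 adds a second, independent parity, computed with different math, so that two lost pieces can be recreated. That second parity is not shown here.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal chunks (zero padded) plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def rebuild(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Recreate at most one lost chunk by XOR-ing all surviving chunks."""
    lost = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(lost) > 1:
        raise ValueError("single parity can recreate only one lost chunk")
    restored = list(stripe)
    if lost:
        restored[lost[0]] = reduce(xor_bytes, [c for c in stripe if c is not None])
    return restored


original = b"vSAN erasure coding demo"
stripe = encode(original, k=3)  # 3 data chunks + 1 parity chunk, like a 3+1 layout
damaged = list(stripe)
damaged[1] = None               # simulate losing one component
recovered = rebuild(damaged)
print(b"".join(recovered[:3]).rstrip(b"\0") == original)  # True
```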

5-46 Activity: Object Count (6) Solution
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 2

RAID 6 (Erasure Coding)

Stripes = 1

1. How many objects are created?

One object.

2. How many components are created?

Six components (four data components and two distributed parity components).

3. How many hosts are required?

Six hosts.

4. How many drives is each component written to?

Six drives.

[Figure: A 100 GB vSAN object with FTT=2, RAID-6 (Erasure Coding), and Stripes=1 consists of six RAID-6 components.]

RAID 6 erasure coding requires a minimum of six hosts. Using the previous example of a 100 GB
virtual disk, the RAID 6 erasure coding rule creates four data components and two parity
components. This configuration can survive the loss of two disks or hosts simultaneously
(FTT=2).

5-47 Activity: Object Count (7)
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 3

RAID 1 (Mirroring)

Stripes = 1

1. How many objects are created?

2. How many components are created?

3. How many hosts are required?

4. How many drives is each component written to?

[Figure: vSAN policy Availability tab: Site disaster tolerance set to None - standard cluster; Failures to tolerate set to 3 failures - RAID-1 (Mirroring).]

5-48 Activity: Object Count (7) Solution
Based on the details provided, answer the questions.

Object size = 100 GB

Failures to Tolerate = 3

RAID 1 (Mirroring)

Stripes = 1

1. How many objects are created?

One object.

2. How many components are created?

Seven components (four data components and three witness components).

3. How many hosts are required?

Seven hosts.

4. How many drives is each component written to?

Seven drives.

[Figure: A 100 GB vSAN object with FTT=3, RAID-1 (Mirroring), and Stripes=1 consists of four RAID-1 mirror components and three vSAN witnesses.]

5-49 Activity: Object Count (8)
Based on the details provided, answer the questions.

Object size = 400 GB

Failures to Tolerate = 1

RAID 1

Stripes = 1

1. How many objects are created?

2. How many drives is the data set written to?

3. How many components are created, excluding witnesses?

[Figure: A 400 GB vSAN object with FTT=1, RAID-1, and Stripes=1.]

5-50 Activity: Object Count (8) Solution
Based on the details provided, answer the questions.

Object size = 400 GB

Failures to Tolerate = 1

RAID 1

Stripes = 1

1. How many objects are created?

One object.

2. How many drives is the data set written to?

Two drives.

3. How many components are created, excluding witnesses?

Four components. The 400 GB object exceeds the 255 GB maximum component size, so each of the two mirrors is split into two components.

[Figure: A 400 GB vSAN object with FTT=1, RAID-1, and Stripes=1 is mirrored into vSAN Mirror 1 and vSAN Mirror 2; each mirror consists of two vSAN components.]

5-51 Activity: Object Count (9)
Based on the details provided, answer the questions.

1. How much storage does the new policy take?

2. Is additional storage required to complete the operation?

[Figure: A 100 GB object moves from a storage policy of Two-Way (Mirror), Stripes=2 to a storage policy of Two-Way (Mirror), Stripes=3. Under each policy, the object has a 100 GB Mirror 1 and a 100 GB Mirror 2.]

5-52 Activity: Object Count (9) Solution
Based on the details provided, answer the questions.

1. How much storage does the new policy consume?

The same amount as the previous policy.

2. Is additional storage required to complete the operation?

Yes. Double the storage is required while the storage policy update operation takes place.

[Figure: Under the Two-Way (Mirror), Stripes=2 policy, each 100 GB mirror consists of two 50 GB components. Under the Two-Way (Mirror), Stripes=3 policy, each 100 GB mirror consists of three 33.3 GB components.]

The operation order is:

1. New RAID 1 mirrored components are created with the new stripe width.

2. I/O is directed simultaneously to both RAID 1 component sets during switchover.

3. The original RAID 1 component set is removed from the object and deleted.
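Because both component sets exist during switchover, the peak footprint is the old copies plus the new copies. This minimal sketch models a RAID 1 policy change as `(replicas, stripe_width)` pairs. It ignores witnesses, metadata, and the 255 GB component split. The `policy_change_capacity` function is hypothetical.

```python
from __future__ import annotations


def policy_change_capacity(size_gb: float, old: tuple[int, int],
                           new: tuple[int, int]) -> dict[str, float]:
    """Capacity before, during, and after a RAID 1 policy change.

    `old` and `new` are (replicas, stripe_width) pairs. While the new
    component set is built and I/O goes to both sets, both consume space.
    """
    old_replicas, old_sw = old
    new_replicas, new_sw = new
    old_gb = size_gb * old_replicas
    new_gb = size_gb * new_replicas
    return {
        "old_gb": old_gb,
        "new_gb": new_gb,
        "peak_gb": old_gb + new_gb,
        "old_component_gb": size_gb / old_sw,
        "new_component_gb": size_gb / new_sw,
    }


# Two-way mirror, Stripes=2 -> two-way mirror, Stripes=3, for a 100 GB object.
change = policy_change_capacity(100, old=(2, 2), new=(2, 3))
print(change["old_gb"], change["new_gb"], change["peak_gb"])  # 200 200 400
print(round(change["old_component_gb"], 1), round(change["new_component_gb"], 1))  # 50.0 33.3
```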

5-53 Activity: Objects and Witnesses
Based on the image, answer the following questions.

1 How many objects does this VM have?

2 Why does the disk object have two witness components?

[Figure: sa-vm-06.vclass.local > Monitor > vSAN > Physical disk placement]
Hard disk 1 (RAID 1)
    Component (Active): sa-esxi-03.vclass.local
    Component (Active): sa-esxi-01.vclass.local
    Component (Active): sb-esxi-04.vclass.local
    Witness (Active): sb-esxi-01.vclass.local
    Witness (Active): sb-esxi-02.vclass.local
VM home (RAID 1) (collapsed)
Virtual machine swap object (RAID 1) (collapsed)

5-54 Activity: Objects and Witnesses Solution
Based on the image, answer the following questions.

1 How many objects does this VM have?

Three objects.

2 Why does the disk object have two witness components?

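It is configured as a three-way mirror (RAID 1 with Failures to Tolerate = 2), which uses two witness components in addition to the three data components.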

[Figure: the same sa-vm-06.vclass.local physical disk placement view as in the question slide.]

5-55 Activity: VMs and Failures
Based on the image, answer the following questions.

1 How many VMs can you see in the figure?

2 Which of these VMs can tolerate one failure and why?

[Figure: VM objects list with the Name, Placement and Availability, and Storage Policy columns]
sa-vm-01.vclass.local (Healthy)
    Hard disk 1: Healthy, SA-vSAN-01_VM_RAID5_Policy
    VM home: Healthy, SA-vSAN-01_VM_RAID5_Policy
    Virtual machine swap object: Healthy, SA-vSAN-01_VM_RAID5_Policy
sa-vm-02.vclass.local (Healthy)
    Hard disk 1: Healthy, SA-vSAN-01_VM_RAID5_Policy
    VM home: Healthy, SA-vSAN-01_VM_RAID5_Policy
sa-vm-03.vclass.local (Healthy)
    Hard disk 1: Healthy, Raid 0
    VM home: Healthy, Raid 0
    Virtual machine swap object: Healthy, Raid 0

5-56 Activity: VMs and Failures Solution
Based on the image, answer the following questions.

1 How many VMs can you see in the figure?

Three VMs are visible: sa-vm-01, sa-vm-02, and sa-vm-03.

2 Which of these VMs can tolerate one failure and why?

sa-vm-01 has a RAID 5 policy, so it supports one failure.

sa-vm-02 also has the SA-vSAN-01_VM_RAID5_Policy applied, so it also supports one failure.

[Figure: the same VM objects and storage policy list as in the question slide.]

Additional information: sa-vm-01 has a RAID 5 policy applied. RAID 5 protects against one
failure (Failures to Tolerate = 1). sa-vm-03 has a RAID 0 policy. RAID 0 does not create replicas
of the data, so it does not protect against any failure.
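The reasoning above maps each failure tolerance method to the number of failures it tolerates. This minimal sketch encodes that mapping. It assumes that "RAID-0" stands for a policy with no data redundancy, such as the custom "Raid 0" policy in the figure. The `failures_tolerated` function is hypothetical. The VM list includes only the two VMs named in the additional information.

```python
def failures_tolerated(method: str, mirror_ftt: int = 1) -> int:
    """Failures an object tolerates under each method covered in this module."""
    fixed = {"RAID-0": 0, "RAID-5": 1, "RAID-6": 2}
    if method in fixed:
        return fixed[method]
    if method == "RAID-1":
        if not 1 <= mirror_ftt <= 3:
            raise ValueError("RAID 1 mirroring supports FTT 1 to 3")
        return mirror_ftt
    raise ValueError(f"unknown method: {method}")


vm_methods = {"sa-vm-01": "RAID-5", "sa-vm-03": "RAID-0"}
print([vm for vm, m in vm_methods.items() if failures_tolerated(m) >= 1])  # ['sa-vm-01']
```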

5-57 Activity: Failures and Witnesses
Based on the image, answer the following questions.

1 How many failures can this VM tolerate and continue to be operational?

2 Why does this VM have three witness components?

[Figure: sa-vm-07.vclass.local > Monitor > vSAN > Physical disk placement]
Hard disk 1 (RAID 1)
    Component (Active): sb-esxi-04.vclass.local
    Component (Active): sa-esxi-01.vclass.local
    Component (Active): sa-esxi-02.vclass.local
    Component (Active): sa-esxi-03.vclass.local
    Witness (Active): sb-esxi-03.vclass.local
    Witness (Active): sb-esxi-02.vclass.local
    Witness (Active): sb-esxi-01.vclass.local

5-58 Activity: Failures and Witnesses Solution
Based on the image, answer the following questions.

1 How many failures can this VM tolerate and continue to be operational?

The hard disk 1 object of the VM has a RAID 1 policy applied. As shown in the image, it has
a total of four data components. Each of these components represents a replica of the data,
which implies that the Failures to Tolerate policy value is 3. The VM can therefore
tolerate a maximum of three concurrent failures.

2 Why does this VM have three witness components?

To support the configured Failures to Tolerate policy value of 3. A RAID 1 object with FTT=3
uses four replicas and three witnesses.

[Figure: the same sa-vm-07.vclass.local physical disk placement view as in the question slide, with all seven components listed under Hard disk 1 (RAID 1).]

5-59 Activity: RAID Levels and Stripes
Based on the image, answer the following questions.

1 What type of RAID level was used for this VM?

2 What stripe width option was used for this VM?

[Figure: sa-vm-05.vclass.local > Monitor > vSAN > Physical disk placement]
Hard disk 1 (RAID 1)
    RAID 0
        Component (Active): sb-esxi-01.vclass.local
        Component (Active): sa-esxi-04.vclass.local
    RAID 0
        Component (Active): sb-esxi-02.vclass.local
        Component (Active): sa-esxi-02.vclass.local

5-60 Activity: RAID Levels and Stripes Solution
Based on the image, answer the following questions.

1 What type of RAID level was used for this VM?

RAID 10 (RAID 1 mirroring of RAID 0 stripe sets).

2 What stripe width option was used for this VM?

Under each RAID 0 tree, two components are created, so a stripe width of 2 was used.

[Figure: the same sa-vm-05.vclass.local physical disk placement view as in the question slide.]

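Reading failures to tolerate and stripe width off a placement tree, as in the last two activities, can be sketched as follows. The nested-dictionary tree shape and the `analyze_raid1` function are hypothetical. The tree loosely mirrors the Physical disk placement view, with witnesses omitted. The sketch assumes objects smaller than 255 GB: split components of larger objects may also appear as RAID 0 children, and this sketch would then overstate the stripe width.

```python
from __future__ import annotations


def analyze_raid1(tree: dict) -> tuple[int, int]:
    """Infer (FTT, stripe width) from a RAID 1 placement tree.

    Each replica under "RAID 1" is either "Component" (stripe width 1) or
    {"RAID 0": [...]} whose child count is the stripe width.
    """
    replicas = tree["RAID 1"]
    ftt = len(replicas) - 1
    widths = {len(r["RAID 0"]) if isinstance(r, dict) else 1 for r in replicas}
    if len(widths) != 1:
        raise ValueError("replicas have different stripe widths")
    return ftt, widths.pop()


sa_vm_05 = {"RAID 1": [{"RAID 0": ["Component"] * 2}, {"RAID 0": ["Component"] * 2}]}
sa_vm_07 = {"RAID 1": ["Component"] * 4}
print(analyze_raid1(sa_vm_05), analyze_raid1(sa_vm_07))  # (1, 2) (3, 1)
```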
5-61 Activity: Failures and Snapshots
Based on the image, answer the following questions.

1 How many failures can any of the objects tolerate and why?

2 Does this VM have any snapshots?

[Figure: sa-vm-04.vclass.local > Monitor > vSAN > Physical disk placement]
Hard disk 1 (RAID 6)
    Component (Active): sa-esxi-01.vclass.local
    Component (Active): sb-esxi-01.vclass.local
    Component (Active): sb-esxi-02.vclass.local
    Component (Active): sa-esxi-03.vclass.local
    Component (Active): sa-esxi-02.vclass.local
    Component (Active): sa-esxi-04.vclass.local
VM home (RAID 6) (collapsed)
Virtual machine swap object (RAID 6) (collapsed)

5-62 Activity: Failures and Snapshots Solution
Based on the image, answer the following questions.

1 How many failures can any of the objects tolerate and why?

The objects are using RAID 6. RAID 6 erasure coding corresponds to a Failures to Tolerate
value of 2 (RAID 5 corresponds to 1), so each of the objects can tolerate up to 2 failures.

2 Does this VM have any snapshots?

The VM does not have any snapshots.

[Figure: the same sa-vm-04.vclass.local physical disk placement view as in the question slide.]

230
5-63 Lab 4: Analyzing the Impact of Storage Policy Changes
Analyze the impact of storage policy changes on VMs:

1. Determine the VM Storage Policy

2. Verify the Existing Storage Policy and Component Layout of a VM

3. Change the VM Storage Policy and Monitor Component Layout Changes

5-64 Lab 5: Identifying Objects with Reduced Availability
Identify objects with reduced availability caused by an invalid storage policy:

1. Create a Storage Policy That the Cluster Cannot Support

2. Force-Provision a VM with an Invalid Storage Policy

3. Identify vSAN Objects with Reduced Availability

4. Change the VM Storage Policy

5-65 Review of Learner Objectives
• Verify the VM storage policy compliance status

• Analyze the impact of VM storage policy changes

• Identify the vSAN objects and components placement details

5-66 Key Points


• Policy-based storage allows you to quickly respond to changes.

• Storage policies help guide decisions regarding datastore architecture from the very
beginning.

• Policy-based storage enables you to ensure that performance and availability requirements
for VMs are met.

• Policy-based storage enables you to create and update many VM storage requirements
without downtime and maintenance windows.

Questions?

Module 6
vSAN Resilience and Data Availability

6-2 Importance
Maintaining a fault-tolerant vSAN environment strengthens the resiliency of the environment and
minimizes downtime.

6-3 Lesson 1: vSAN Resilience and Data Availability

6-4 Learner Objectives


• Describe how to configure the Object Repair Timer

• Discuss hardware failure scenarios

• Explain how to plan maintenance tasks to avoid vSAN object failures

• List the reasons for resynchronizing components

6-5 About Failure Handling
Failure is handled differently in traditional storage arrays and vSAN environments.

Traditional storage environment:

• A failed physical disk must be replaced to achieve full redundancy.

• Hot spare disks are either kept on hand to replace failed disks immediately or already installed in the system.

• Disk failure handling requires an immediate 1:1 disk replacement.

vSAN cluster:

• The entire vSAN cluster is used to provide the redundancy.

• During a failure, components such as stripes or mirrors of objects are rebuilt on other resources in the cluster.

6-6 About vSAN Component States
vSAN components can exist in different states:

• Active: Healthy and functioning correctly

• Reconfiguring: In the process of applying storage policy changes

• Absent: No longer available because of a failure

• Stale: No longer in sync with other components of the same vSAN object

• Degraded: Not expected to return, because of a detected failure

6-7 About the vSAN Object Repair Timer
vSAN waits before rebuilding absent components after a host fails or enters maintenance mode. Because vSAN cannot tell whether the failure is transient or permanent, it waits for a repair delay, which is set to 60 minutes by default.
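The waiting rule can be sketched in a few lines of Python. This is a simplified model, assuming that an absent component is either resynchronized (if its host returns before the repair delay expires) or rebuilt elsewhere once the delay expires; the function name absent_component_action is hypothetical and is not part of any vSAN API.

```python
from datetime import datetime, timedelta

# Default Object Repair Timer value (60 minutes).
DEFAULT_REPAIR_DELAY = timedelta(minutes=60)


def absent_component_action(failed_at: datetime, now: datetime,
                            host_returned: bool = False,
                            repair_delay: timedelta = DEFAULT_REPAIR_DELAY) -> str:
    """Hypothetical model of how an absent component is handled at time `now`."""
    if now - failed_at < repair_delay:
        # Still inside the repair window: resync if the host is back, otherwise keep waiting.
        return "resync stale components" if host_returned else "wait"
    # The repair timer has expired: rebuild the components on other resources.
    return "rebuild on other hosts"


if __name__ == "__main__":
    failed = datetime(2022, 9, 2, 8, 0)
    print(absent_component_action(failed, failed + timedelta(minutes=30)))        # wait
    print(absent_component_action(failed, failed + timedelta(minutes=30), True))  # resync stale components
    print(absent_component_action(failed, failed + timedelta(minutes=75)))        # rebuild on other hosts
```

In this model, a longer repair delay widens the window in which a returning host needs only a resync of its stale components instead of a full rebuild.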

To reconfigure the Object Repair Timer, select the vSAN cluster and select Configure > vSAN >
Services > Advanced Options > EDIT.

[Screenshot: Configure > vSAN > Services for the vSAN cluster, with Advanced Options expanded: Object repair timer 60 minutes, Site read locality Enabled, Thin swap Enabled, Large cluster support Disabled, Automatic rebalance Disabled.]
6-8 Overriding the Object Repair Timer
The vSAN object health test includes functionality to rebuild components immediately, rather than waiting as specified by the Object Repair Timer.

To repair objects immediately, select the vSAN cluster and select Monitor > vSAN > Skyline Health > Data > vSAN object health > REPAIR OBJECTS IMMEDIATELY.

[Screenshot: vSAN object health, with the REPAIR OBJECTS IMMEDIATELY and PURGE INACCESSIBLE VM SWAP OBJECTS buttons. Object counts: Healthy 26, Reduced availability with no rebuild - delay timer 12, Inaccessible 7.]
6-9 Resynchronizing Components
Resynchronization of components can be initiated in two ways.

Failure-initiated resync:

• Cache device failure

• Capacity device failure

• Storage controller failure

• Host network communication failure

• Host failure

User-initiated resync:

• Policy change

• User-triggered reconfiguration

• User placing a host into maintenance mode

6-10 Failure Handling Scenario (1)
When restoring I/O flow:

• The failure is detected, and the failed components are removed from the active set.

• Provided that a majority of the object's components are still available, the I/O flow is restored.

[Figure: A VM's vmdk object uses RAID 1, with vSAN Replica 1 and Replica 2 on hosts among esxi-01 through esxi-05; host esxi-02 has failed.]
6-11 Failure Handling Scenario (2)
When rebuilding components to establish protection:

• If the component state is Absent, vSAN waits 60 minutes (the default Object Repair Timer value) before initiating a rebuild.

• vSAN then starts rebuilding the components.

6-12 Failure Handling Scenario (3)
When a cache device failure causes degraded components, vSAN immediately creates a new mirror copy of each affected component.

[Figure: RAID 1 object (two replicas and a witness) on hosts esxi-01 through esxi-04. After the cache device failure, a new mirror copy of the affected replica is created immediately (Immediate Change).]
6-13 Failure Handling Scenario ( 4)
When a capacity device fails with an error and causes degraded components, vSAN immediately creates a new mirror copy of each affected component.

[Figure: RAID 1 object (two replicas and a witness) on hosts esxi-01 through esxi-04. After the capacity device fails with an error, a new mirror copy is created immediately (Immediate Change).]
6-14 Failure Handling Scenario (5)
When a capacity device fails without error and causes absent components, a new mirror copy is
created after 60 minutes.

[Figure: RAID 1 object (two replicas and a witness) on hosts esxi-01 through esxi-04. The replica on the failed capacity device becomes an absent component, and a new mirror copy is created after 60 minutes.]
6-15 Failure Handling Scenario (6)
When a storage controller fails and causes degraded components, resynchronization begins immediately.

[Figure: RAID 1 object (two replicas and a witness) on hosts esxi-01 through esxi-04. After the storage controller failure, a new mirror copy is created immediately (Immediate Change).]
6-16 Failure Handling Scenario (7)
When a host failure causes absent components, vSAN waits 60 minutes before rebuilding
absent components.

If the host returns within 60 minutes, vSAN synchronizes the stale components.

[Figure: RAID 1 object (two replicas and a witness) on hosts esxi-01 through esxi-04. The replica on the failed host is absent, and a new mirror copy is created after 60 minutes.]
6-17 Failure Handling Scenario (8)
When host isolation resulting from a network failure causes absent components, vSAN waits 60 minutes before rebuilding absent components.

If the network connection is restored within 60 minutes, vSAN synchronizes the stale components.
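The failure-handling scenarios in this lesson reduce to one rule: degraded components are rebuilt immediately, and absent components wait for the Object Repair Timer. The following Python sketch encodes that summary as a lookup table. The scenario labels and the rebuild_delay_minutes function are hypothetical names chosen for illustration, not vSAN settings or APIs.

```python
# Component state produced by each failure scenario in this lesson.
FAILURE_TO_STATE = {
    "cache device failure": "degraded",
    "capacity device failure with error": "degraded",
    "capacity device failure without error": "absent",
    "storage controller failure": "degraded",
    "host failure": "absent",
    "host isolation (network failure)": "absent",
}


def rebuild_delay_minutes(failure: str, repair_timer_minutes: int = 60) -> int:
    """Minutes before vSAN starts rebuilding components (simplified model)."""
    state = FAILURE_TO_STATE[failure]
    if state == "degraded":
        # Degraded components are not expected to return, so rebuild at once.
        return 0
    # Absent components might return, so wait for the Object Repair Timer.
    return repair_timer_minutes


for failure in FAILURE_TO_STATE:
    print(f"{failure}: rebuild after {rebuild_delay_minutes(failure)} min")
```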

[Figure: RAID 1 object (two replicas and a witness) on hosts esxi-01 through esxi-04. The replica on the isolated host is absent, and a new mirror copy is created after 60 minutes.]
6-18 Failure Handling Scenario (9)
The network partition isolates the esxi-01 and esxi-04 hosts.

vSphere HA restarts the affected VMs on either the esxi-02 or the esxi-03 host. These hosts can still communicate with each other and own more than 50 percent of the VM components.
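The 50 percent rule can be sketched in Python as a strict-majority check over the hosts in each partition. This is a simplification: it assumes one vote per component (vSAN can assign different vote counts to components), and the layout below (Replica 1 on esxi-02, Replica 2 on esxi-03, witness on esxi-04) is an assumed example. The function name accessible_partitions is hypothetical.

```python
def accessible_partitions(components: dict[str, str],
                          partitions: list[set[str]]) -> list[set[str]]:
    """Return the partitions that hold more than 50% of an object's components.

    `components` maps a component name to the host that stores it.
    Simplification: every component carries exactly one vote.
    """
    total = len(components)
    result = []
    for hosts in partitions:
        votes = sum(1 for host in components.values() if host in hosts)
        if votes * 2 > total:  # strict majority, so a 50/50 split never qualifies
            result.append(hosts)
    return result


components = {"Replica 1": "esxi-02", "Replica 2": "esxi-03", "Witness": "esxi-04"}
partitions = [{"esxi-01"}, {"esxi-02", "esxi-03"}, {"esxi-04"}]
print(accessible_partitions(components, partitions))
```

The strict majority is why a two-replica RAID 1 object also has a witness component: with three components, whichever partition holds two of them keeps the object accessible.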

[Figure: Hosts esxi-01 and esxi-04 are isolated. The VM's RAID 1 object has Replica 1 on esxi-02, Replica 2 on esxi-03, and a witness on esxi-04.]
6-19 Review of Learner Objectives
• Describe how to configure the Object Repair Timer

• Discuss hardware failure scenarios

• Explain how to plan maintenance tasks to avoid vSAN object failures

• List the reasons for resynchronizing components

6-20 Key Points


• vSAN can manage failures to ensure that data is highly available.

• The entire vSAN cluster is used to provide the redundancy.

• The resynchronizing of components can be failure-initiated or user-initiated.

Questions?

Module 7
Configuring vSAN Storage Space Efficiency

7-2 Importance
As the number of virtual machines in the vSAN cluster increases, consider using vSAN storage space efficiency techniques to reduce the amount of space required to store data and to lower storage costs.

7-3 Configuring vSAN Storage Space Efficiency

7-4 Learner Objectives


• Describe vSAN storage space efficiency

• Discuss deduplication and compression overhead

• Explain how to configure compression-only mode

• Detail the use of RAID 5 or RAID 6 erasure coding

• Describe how to reclaim storage space with TRIM/UNMAP

7-5 About vSAN Storage Space Efficiency
vSAN storage space efficiency techniques reduce the total storage capacity required to meet your workload needs.

Enable deduplication and compression on a vSAN cluster to eliminate duplicate data and reduce the amount of space required to store data.

You can set VMs to use RAID 5 or RAID 6 erasure coding, which protects your data while using less storage space than the default RAID 1 mirroring storage policy.

You can use TRIM/UNMAP to reclaim storage space, for example, when files are deleted within a virtual disk.

7-6 Using Deduplication and Compression (1)
Enabling deduplication and compression can reduce the amount of physical storage consumed
by as much as seven times.

Environments with highly redundant data, such as full-clone virtual desktops and homogeneous
server operating systems, naturally benefit the most from deduplication.

Likewise, compression offers more favorable results with data that compresses well, such as
text, bitmap, and program files.

You can enable deduplication and compression when you create a vSAN all-flash cluster or
when you edit an existing vSAN all-flash cluster.

Deduplication and compression are enabled as a clusterwide setting, but they are applied on a
disk group basis.

vSAN performs deduplication and compression at the block level to save storage space.

Deduplication removes redundant data blocks, whereas compression removes additional redundant data within each data block.

Deduplication and compression might not be effective for encrypted VMs because VM
encryption encrypts data on the host before it is written to storage.

7-7 Using Deduplication and Compression (2)
When you enable or disable deduplication and compression, vSAN performs a rolling reformat of every disk group on every host, which requires evacuating all data from each disk group in turn.

Depending on the amount of data stored on the vSAN datastore, this process might take a long time to complete.

[Figure: Four hosts, each with a disk group of one cache device and several SSD capacity devices; the disk groups are reformatted in turn.]
7-8 Using Deduplication and Compression (3)
Deduplication and compression occur inline when data is written back from the cache tier to the
capacity tier.

The deduplication algorithm uses a fixed block size and is applied within each disk group.

The compression algorithm is applied after deduplication but before the data is written to the
capacity tier.

Given the additional compute resource and allocation map overhead of compression, vSAN
stores compressed data only if a unique 4K block can be reduced to 2K or less. Otherwise, the
block is written uncompressed.
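A minimal Python sketch of this write-back path within a single disk group, assuming a 4K fixed block size for deduplication and the 2K threshold for compression. SHA-1 (for block fingerprints) and zlib (for compression) are stand-ins chosen for illustration; the sketch does not reproduce vSAN's internal algorithms, metadata, or on-disk format, and destage is a hypothetical function name.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096       # fixed 4K block size
COMPRESS_LIMIT = 2048   # keep compressed data only if it fits in 2K or less


def destage(data: bytes) -> tuple[dict[str, bytes], list[str]]:
    """Deduplicate fixed-size blocks, then compress each unique block.

    Returns (store, block_map): `store` holds one copy per unique block,
    and `block_map` lists the fingerprint of every logical block in order.
    """
    store: dict[str, bytes] = {}
    block_map: list[str] = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()
        block_map.append(digest)
        if digest in store:
            continue  # duplicate block: record the reference only
        compressed = zlib.compress(block)
        # Otherwise, the block is written uncompressed.
        store[digest] = compressed if len(compressed) <= COMPRESS_LIMIT else block
    return store, block_map


# Three identical, highly compressible blocks plus one high-entropy block.
noise = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(128))
data = b"A" * BLOCK_SIZE * 3 + noise
store, block_map = destage(data)
print(len(block_map), "logical blocks,", len(store), "unique blocks stored")
print(sum(len(v) for v in store.values()), "bytes stored for", len(data), "bytes written")
```

Deduplication runs first, so duplicate blocks are never compressed a second time; the high-entropy block does not shrink to 2K and is therefore stored as-is.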

[Figure: Writes and write acknowledgments (Write ACK) between the vSAN layer and a disk group.]
7-9 Disk Management
Consider the following guidelines when managing disks and disk groups in a cluster with deduplication and compression enabled:

• You cannot remove a single disk from a disk group.

• You must remove the entire disk group to make modifications.

• A single disk failure causes the entire disk group to fail.

• Consider adding additional disk groups to increase the cluster storage capacity.

[Figure: Hosts with disk groups, each consisting of one cache device and several SSD capacity devices.]
7-10 Design Considerations
Consider the following guidelines when you configure deduplication and compression in a vSAN cluster:

• VM storage policies must have either 0 percent or 100 percent object space reservations.

• Deduplication and compression are available only on all-flash disk groups.

• The processes of deduplication and compression incur compute overhead and potentially
impact performance in terms of latency and maximum IOPS.

• However, the extreme performance and low latency of flash devices easily outweigh the
additional compute resource requirements of deduplication and compression in vSAN.

• Enabling deduplication and compression consumes up to 5 percent of the vSAN datastore capacity for metadata, such as hash, translation, and allocation maps.

• The space consumed by the deduplication and compression metadata is relative to the size
of the vSAN datastore capacity.

7-11 Compression-Only Mode (1)
You can enable compression-only mode on an all-flash vSAN cluster to provide storage space
efficiency without the overhead of deduplication.

Compression-only mode provides the following capabilities:

• Compression-only mode can reduce the amount of physical storage consumed by as much as two times.

• Compression-only mode reduces the failure domain from the entire disk group to only one
disk.

If a vSAN cluster is enabled for deduplication and compression, any disk failure affects the
entire disk group operation.

• You can scale up a disk group without unmounting it from the vSAN cluster.

7-12 Compression-Only Mode (2)
The compression-only mode algorithm moves data from the cache tier to individual capacity disks, which also ensures better parallelism and throughput.

[Figure: Writes and write acknowledgments (Write ACK) between the vSAN layer and a disk group in compression-only mode.]
7-13 Configuring Space Efficiency
To configure space efficiency, select a vSAN cluster and select Configure > vSAN > Services >
Space Efficiency > Edit.

Select the Compression only or Deduplication and compression mode and click APPLY.

[Screenshot: vSAN Services dialog for vSAN-Cluster, with the warning "These settings require all disks to be reformatted. Moving large amount of stored data might be slow and temporarily decrease the performance of the cluster." The Space efficiency drop-down menu offers None, Compression only, and Deduplication and compression. Other options: Data-At-Rest encryption (Wipe residual data, Key provider, Allow reduced redundancy) and Data-In-Transit encryption (Rekey interval: 1 day).]
7-14 Verifying Space Efficiency Savings
To verify the storage space savings information, select a vSAN cluster and select Monitor > vSAN > Capacity > CAPACITY USAGE.

[Screenshot: Capacity > CAPACITY USAGE > Capacity Overview. Used: 18.24 GB/199.99 GB (9.12%). Actually written: 18.24 GB (9.12%). Compression savings: 2.11 GB (Ratio: 1.23x).]
7-15 Using RAID 5 or RAID 6 Erasure Coding (1)
You can use RAID 5 or RAID 6 erasure coding to protect against data loss and also increase the
storage efficiency.

Erasure coding can provide the same level of data protection as RAID 1 mirroring while using less storage capacity: RAID 5 matches RAID 1 with Failures to tolerate = 1, and RAID 6 matches RAID 1 with Failures to tolerate = 2. For example, a VM protected with RAID 1 (Failures to tolerate = 1) requires twice the virtual disk size, but with RAID 5 it requires only 1.33 times the virtual disk size.

You can configure RAID 5 on all-flash vSAN clusters with four or more nodes and RAID 6 on all-flash vSAN clusters with six or more nodes.

7-16 Using RAID 5 or RAID 6 Erasure Coding (2)
The table presents a general comparison between RAID 1 and RAID 5 or RAID 6 erasure coding.

RAID 5 or RAID 6 erasure coding does not support a failures to tolerate value of 3.

Failure Tolerance Method   Failures to Tolerate   VM Disk Size   vSAN Storage Capacity Required
RAID 1                     1                      100 GB         200 GB
RAID 5                     1                      100 GB         133 GB
RAID 1                     2                      100 GB         300 GB
RAID 6                     2                      100 GB         150 GB
RAID 1                     3                      100 GB         400 GB
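The table follows from the layout of each method. RAID 1 keeps Failures to tolerate + 1 full copies. RAID 5 stripes data as three data segments plus one parity segment, and RAID 6 as four data segments plus two parity segments. The Python sketch below reproduces the table and the minimum host counts (2 × Failures to tolerate + 1 for RAID 1, and four and six hosts for RAID 5 and RAID 6 as stated above). It ignores witness components, metadata, and slack space, and the function names are hypothetical.

```python
from fractions import Fraction


def required_capacity_gb(disk_gb: int, method: str, ftt: int) -> Fraction:
    """Raw vSAN capacity consumed by one virtual disk (simplified)."""
    if method == "RAID 1" and 1 <= ftt <= 3:
        return disk_gb * Fraction(ftt + 1)   # ftt + 1 full copies
    if method == "RAID 5" and ftt == 1:
        return disk_gb * Fraction(4, 3)      # 3 data + 1 parity
    if method == "RAID 6" and ftt == 2:
        return disk_gb * Fraction(6, 4)      # 4 data + 2 parity
    raise ValueError(f"{method} does not support Failures to tolerate = {ftt}")


def minimum_hosts(method: str, ftt: int) -> int:
    """Minimum number of hosts for each failure tolerance method."""
    if method == "RAID 1":
        return 2 * ftt + 1
    return {"RAID 5": 4, "RAID 6": 6}[method]


for method, ftt in [("RAID 1", 1), ("RAID 5", 1), ("RAID 1", 2), ("RAID 6", 2), ("RAID 1", 3)]:
    gb = required_capacity_gb(100, method, ftt)
    print(f"{method}, FTT={ftt}: {float(gb):.0f} GB, at least {minimum_hosts(method, ftt)} hosts")
```

Using Fraction keeps the RAID 5 result exact (400/3 GB); the table rounds it to 133 GB.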

7-17 Using RAID 5 or RAID 6 Erasure Coding (3)

RAID 5 or RAID 6 erasure coding is a storage policy attribute that you can apply to VM components.

To use RAID 5, set Failures to tolerate to 1 failure - RAID-5 (Erasure Coding).

To use RAID 6, set Failures to tolerate to 2 failures - RAID-6 (Erasure Coding).

[Screenshot: vSAN storage policy, Availability tab. Site disaster tolerance: None - standard cluster. The Failures to tolerate drop-down menu lists No data redundancy; 1 failure - RAID-1 (Mirroring); 1 failure - RAID-5 (Erasure Coding); 2 failures - RAID-1 (Mirroring); 2 failures - RAID-6 (Erasure Coding); 3 failures - RAID-1 (Mirroring).]
7-18 Reclaiming Space Using TRIM/UNMAP (1)
vSAN supports SCSI UNMAP commands directly from a guest OS to reclaim storage space.

The guest operating systems can use TRIM/UNMAP commands to reclaim space that is no
longer used.

A TRIM/UNMAP command sent from the guest OS can reclaim the previously allocated storage as free space. This opportunistic space efficiency feature can deliver much better storage capacity utilization in vSAN environments.

7-19 Reclaiming Space Using TRIM/UNMAP (2)
In addition to freeing up storage space in the vSAN environment, TRIM/UNMAP provides the following benefits:

• Faster repair

• Removal of dirty cache pages

Because reclaimed blocks do not need to be rebalanced or remirrored if a device fails, repairs are much faster.

Removal of dirty cache pages from the write buffer reduces the number of blocks that are copied to the capacity tier.

7-20 Reclaiming Space Using TRIM/UNMAP (3)

vSAN supports offline unmaps as well as inline unmaps.

On Linux operating systems, offline unmaps are performed with the fstrim command. Inline unmaps are performed when the file system is mounted with the discard option (mount -o discard).

On Windows operating systems, NTFS performs inline unmaps by default.

7-21 Enabling TRIM/UNMAP Support
TRIM/UNMAP support is disabled by default. You can enable TRIM/UNMAP support in the following ways:

• PowerCLI:

  Get-Cluster -Name <Cluster-Name> | Set-VsanClusterConfiguration -GuestTrimUnmap:$true

• Ruby vSphere Console (RVC):

  /localhost/VSAN-DC/computers> vsan.unmap_support <vSAN-Cluster-Name> -e
  Unmap support is already disabled
  VSAN-Cluster: success
7-22 Monitoring TRIM/UNMAP
To monitor TRIM/UNMAP statistics, select a host in the vSAN cluster and select Monitor > vSAN > Performance > BACKEND.

Unmap Throughput measures UNMAP commands that are being processed by the disk groups of a host.

Recovery Unmap Throughput measures the throughput of UNMAP commands being synchronized as part of an object repair following a failure or an absent object.

[Screenshot: Host Monitor > vSAN > Performance > BACKEND tab ("Metrics about hosts in the perspective of vSAN backend"), showing an IOPS chart (Read, Write, Resync Read, Recovery Write, Unmap, and Recovery Unmap IOPS) and a Trim/Unmap Throughput chart (Unmap Throughput and Recovery Unmap Throughput).]
7-23 Lab 6: Configuring vSAN Space Efficiency
Configure vSAN space efficiency features:

1. Configure vSAN Cluster Space Efficiency

2. Verify Space Efficiency Savings
7-24 Review of Learner Objectives
• Describe vSAN storage space efficiency

• Discuss deduplication and compression overhead

• Explain how to configure compression-only mode

• Detail the use of RAID 5 or RAID 6 erasure coding

• Describe how to reclaim storage space with TRIM/UNMAP

7-25 Key Points


• Consider enabling deduplication and compression for efficient use of vSAN storage space.

• Compression-only mode can be enabled to avoid deduplication overhead.

• RAID 5 or RAID 6 erasure coding provides significant storage space savings.

• You can use TRIM/UNMAP to reclaim storage space.

Questions?

Module 8
vSAN Security Operations

8-2 Importance
Maintaining data security is critical in any organization to meet enterprise security compliance.
vSAN offers the data-in-transit and data-at-rest encryption methods to ensure that data remains
secure.

Encrypting your vSAN datastore requires you to set up a key management server cluster.
When you enable encryption, vSAN encrypts everything in the vSAN datastore.

vSAN data-in-transit encryption secures the traffic exchanged between vSAN nodes.

8-3 Lesson 1: vSAN Security Operations

8-4 Learner Objectives


• Explain how vSAN encryption works

• Discuss design considerations for vSAN encryption

• Detail how to set up key providers

• Describe how to configure vSAN data-at-rest encryption

• Identify the steps to replace key providers

• Discuss vSAN node core dump encryption

• Describe how to configure vSAN data-in-transit encryption

8-5 vSAN Encryption
vSAN encryption is a native HCI encryption solution. It is built in to the vSAN layer:

• Supports various key management server (KMS) vendor solutions

• Configured at the cluster level

• Supported on hybrid and all-flash vSAN clusters

• Compatible with other vSAN features

• Supports data-at-rest encryption, a vSAN datastore-level method

• Supports data-in-transit encryption, a vSAN network-level method

[Screenshot: vSAN Services dialog for VSAN-Cluster, with the Data-At-Rest encryption options (Wipe residual data, Key provider, Allow reduced redundancy) and the Data-In-Transit encryption option (Rekey interval: 1 day) highlighted.]
8-6 Design Considerations for vSAN
Encryption
Consider the following points when working with vSAN encryption:

• Do not deploy your key provider on the same vSAN datastore that you plan to encrypt.

• Modern processors offload encryption operations to a dedicated portion of the CPU.

• The witness host in a two-node or stretched cluster does not participate in vSAN
encryption.

• vSAN node core dumps are also encrypted.

• vSAN data-at-rest encryption and vSAN data-in-transit encryption are independent of each
other. These features can be enabled and configured independently.

8-7 About Permissions
In secure environments, only authorized users should be able to perform cryptographic operations:

• The No cryptography administrator role has the privileges of the Administrator role except for cryptographic operations, so you can use it to limit which users can perform them.

• Consider assigning this role to administrators who do not need to manage encryption when you enable vSAN encryption.

• Custom roles help implement least-privilege management.

• Review and audit role assignments regularly to ensure that access is limited to authorized users.

8-8 Setting Up Key Providers
Use a supported key provider to distribute the keys to be used with vSAN encryption.

To support encryption, add the KMS to vCenter Server and establish the trust. vCenter Server
requests encryption keys from the key provider.

The KMS must support the Key Management Interoperability Protocol (KMIP) 1.1 standard.

For more information about KMS vendor solutions supported by VMware, see
https://www.vmware.com/resources/compatibility/pdf/vi_kms_guide.pdf.

[Figure: vCenter Server connected to a key management server.]
8-9 KMS Server Cluster
Set up the KMS cluster for high availability to avoid a single point of failure.

The KMS cluster has the following characteristics:

• The KMS cluster is a group of KMIP-based key management servers that replicate keys to one another.

• The KMS cluster must have key management servers from the same vendor.
[Figure: A KMS cluster of three key management servers connected to vCenter Server.]
8-10 Adding a KMS to vCenter Server (1)
You add a KMS to your vCenter Server system from the vSphere Client.

vCenter Server defines a KMS cluster when you add the first KMS instance and sets this cluster
as a default.

You can add KMS instances from the same vendor to the cluster and configure them to
synchronize keys among each other.

If your environment requires KMS solutions from different vendors, you can create KMS clusters
for each vendor.

[Screenshot: sa-vcsa-01.vclass.local > Configure > Security > Key Providers, with the ADD STANDARD KEY PROVIDER button.]

See your KMS vendor documentation to configure the key synchronization.

8-11 Adding a KMS to vCenter Server (2)
To set up communication between the KMS and vCenter Server, trust must be established.

vCenter Server uses KMIP to communicate with the KMS over SSL or TLS.

[Figure: vCenter Server communicates with the key management servers in the KMS cluster by using the KMIP protocol over SSL or TLS.]
8-12 KMIP Client Certificates
The type of certificate used by vCenter Server (KMIP client) depends on the KMS vendor.

Always check with the KMS vendor for their certificate requirements to establish the trust.

In the Make KMS trust vCenter wizard, choose a method to make the KMS trust vCenter based on the KMS vendor's requirements. Once the trust is established, all replicas in the same KMS cluster also trust vCenter. The available methods are:

• vCenter Root CA Certificate: Download the vCenter root certificate and upload it to the KMS. All certificates signed by this root certificate will be trusted by the KMS.

• vCenter Certificate: Download the vCenter certificate and upload it to the KMS.

• KMS certificate and private key: Upload the KMS certificate and private key to vCenter.

• New Certificate Signing Request (CSR): Submit the vCenter-generated CSR to the KMS, then upload the new KMS-signed certificate to vCenter.
8-13 vSAN Data-at-Rest Encryption (1)
When vSAN data-at-rest encryption is enabled on a vSAN cluster:

1. vCenter Server requests a key encryption key (KEK) from the KMS.

2. vCenter Server sends the KEK ID to all hosts.

3. Hosts use the KEK ID to request the KEK from the KMS.

4. Hosts create a unique data encryption key (DEK) for each drive.

5. Each cache and capacity drive is encrypted using its DEK.

6. vCenter Server requests a host encryption key (HEK) from the KMS, which is used to encrypt core dumps.
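The six steps can be traced with a toy Python model. ToyKMS, Host, and enable_encryption are hypothetical names, keys are random bytes from the secrets module, and no data is actually encrypted. The point is the division of labor: vCenter Server handles only key IDs, while each host retrieves the KEK from the KMS and generates its own per-drive DEKs. (In vSAN, hosts also use the KEK to wrap the DEKs before storing them; the sketch omits that step.)

```python
import secrets


class ToyKMS:
    """Stand-in for a KMIP key server: creates keys and returns them by ID."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def create_key(self) -> str:
        key_id = secrets.token_hex(8)
        self._keys[key_id] = secrets.token_bytes(32)  # 256-bit random key
        return key_id

    def get_key(self, key_id: str) -> bytes:
        return self._keys[key_id]


class Host:
    """Toy ESXi host that holds key material in memory."""

    def __init__(self, name: str, drives: list[str]) -> None:
        self.name = name
        self.drives = drives
        self.kek = b""
        self.deks: dict[str, bytes] = {}

    def apply_kek(self, kms: ToyKMS, kek_id: str) -> None:
        self.kek = kms.get_key(kek_id)   # 3. the host requests the KEK from the KMS
        for drive in self.drives:        # 4. a unique DEK for each cache and capacity drive
            self.deks[drive] = secrets.token_bytes(32)
        # 5. Encrypting each drive with its DEK is not modeled here.


def enable_encryption(kms: ToyKMS, hosts: list[Host]) -> tuple[str, str]:
    """Model of vCenter Server's role: it only handles key IDs."""
    kek_id = kms.create_key()            # 1. vCenter Server requests a KEK
    for host in hosts:
        host.apply_kek(kms, kek_id)      # 2. vCenter Server sends the KEK ID to each host
    hek_id = kms.create_key()            # 6. HEK used to encrypt core dumps
    return kek_id, hek_id


kms = ToyKMS()
hosts = [Host("esxi-01", ["cache", "capacity-1", "capacity-2"]),
         Host("esxi-02", ["cache", "capacity-1"])]
kek_id, hek_id = enable_encryption(kms, hosts)
for host in hosts:
    print(host.name, sorted(host.deks))
```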

[Figure: Key flow among the KMS, vCenter Server, and an ESXi host. vCenter Server passes the KEK ID to the host, the host retrieves the KEK from the KMS, and each drive in the host's disk groups has its own DEK. Core dumps are also encrypted.]
8-14 vSAN Data-at-Rest Encryption (2)
When data-at-rest encryption is enabled on a new vSAN cluster, disks are encrypted immediately when the disk groups are created.

If data-at-rest encryption is enabled on an existing vSAN cluster, a rolling disk format change is performed. Each disk group is evacuated in turn. The cache and capacity devices are reformatted using the DEKs.
[Figure: Rolling encryption of the disk groups on each host in a vSAN cluster.]
8-15 Operational Impact When Enabling
Encryption

Encrypted and unencrypted vSAN datastores use different disk formats.

On new vSAN clusters, the appropriate disk format is selected automatically, but an existing vSAN cluster requires a disk format change (DFC).

You must consider the following points before enabling encryption on an existing vSAN cluster:

• Ensure that sufficient capacity is available to perform disk evacuations.

• Ensure that no resync operations are running.

• Ensure that the cluster is in a healthy state.

• Ensure that no congestion exists in the cluster.

• Preferably, schedule the DFC outside of production hours.
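The checklist above can be expressed as a simple pre-check. This Python sketch is hypothetical: DiskGroup and ready_for_dfc are illustrative names, and the capacity rule (while a disk group is evacuated, its data must fit in the free space of all the other disk groups) is a simplification that ignores storage policy placement rules, fault domains, and slack space requirements.

```python
from dataclasses import dataclass


@dataclass
class DiskGroup:
    host: str
    capacity_gb: float
    used_gb: float

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb


def ready_for_dfc(disk_groups: list[DiskGroup], healthy: bool,
                  resync_running: bool, congested: bool) -> bool:
    """Hypothetical pre-check before a rolling disk format change (DFC)."""
    if not healthy or resync_running or congested:
        return False
    total_free = sum(dg.free_gb for dg in disk_groups)
    for dg in disk_groups:
        # While this disk group is evacuated, the other disk groups must absorb its data.
        if dg.used_gb > total_free - dg.free_gb:
            return False
    return True


groups = [DiskGroup(f"esxi-0{i}", capacity_gb=1000, used_gb=400) for i in range(1, 5)]
print(ready_for_dfc(groups, healthy=True, resync_running=False, congested=False))  # True
```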

8-16 Enabling vSAN Data-at-Rest Encryption
To enable vSAN data-at-rest encryption, select the vSAN cluster and select Configure > vSAN
> Services > Data-At-Rest Encryption > EDIT.

Enable Data-at-Rest encryption. Optionally, select Wipe residual data and Allow reduced
redundancy. Click APPLY.

[Figure: vSAN Services dialog with Space efficiency, Data-At-Rest encryption (Wipe residual data, Key provider, Allow reduced redundancy), and Data-In-Transit encryption (Rekey interval: 1 day). Warning: These settings require all disks to be reformatted. Moving a large amount of stored data might be slow and temporarily decrease the performance of the cluster.]
8-17 Wiping Residual Data
Select the Wipe residual data check box to erase any existing data from devices before you
enable data-at-rest encryption on an existing cluster.

This setting is not necessary for enabling data-at-rest encryption on new vSAN clusters.

[Figure: vSAN Services dialog with the Wipe residual data check box highlighted]
8-18 Allowing Reduced Redundancy
If your vSAN cluster has a limited number of fault domains or hosts, select the Allow reduced
redundancy check box.

If you allow reduced redundancy, your data might be at risk during the disk reformat operation.

[Figure: vSAN Services dialog with the Allow reduced redundancy check box highlighted]
8-19 Writing Data to an Encrypted vSAN
Datastore
Data is written to the cache tier:

1. Write I/O is broken into chunks.

2. Checksum is created.

3. Encryption is performed.

4. Encrypted data is written to the write buffer.

Data is later destaged to the capacity tier.

5. Decryption is performed.

6. Deduplication is performed (if configured).

7. Compression is performed (if configured).

8. Encryption is performed.

9. Data is stored in the capacity tier.
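The ordering above matters because deduplication and compression only work on plaintext: identical chunks that are encrypted in the write buffer no longer look identical. The following Python sketch models steps 1 through 9 under stated assumptions:

- a 4 KB chunk size
- CRC-32 from `zlib` as the checksum
- SHA-256 fingerprints for deduplication
- `zlib` compression
- a toy HMAC-SHA256 keystream in place of a real cipher

None of these are claimed to be vSAN's actual algorithms or on-disk formats.

```python
import hashlib
import hmac
import zlib

CHUNK_SIZE = 4096  # assumed chunk size, for illustration only


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (HMAC-SHA256 keystream); the same call encrypts
    and decrypts. A stand-in for a real cipher, NOT secure for production."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


def write_to_cache(write_io: bytes, cache_dek: bytes) -> list:
    """Steps 1-4: chunk the write, checksum each chunk, encrypt, buffer it."""
    write_buffer = []
    for offset in range(0, len(write_io), CHUNK_SIZE):
        chunk = write_io[offset:offset + CHUNK_SIZE]          # step 1
        checksum = zlib.crc32(chunk)                          # step 2
        nonce = offset.to_bytes(16, "big")                    # toy per-chunk nonce
        encrypted = toy_cipher(cache_dek, nonce, chunk)       # step 3
        write_buffer.append((nonce, encrypted, checksum))     # step 4
    return write_buffer


def destage(write_buffer: list, cache_dek: bytes, capacity_dek: bytes,
            dedupe: bool = True, compress: bool = True) -> dict:
    """Steps 5-9: decrypt, deduplicate, compress, re-encrypt, store."""
    capacity_tier = {}
    for index, (nonce, encrypted, checksum) in enumerate(write_buffer):
        plain = toy_cipher(cache_dek, nonce, encrypted)       # step 5
        if zlib.crc32(plain) != checksum:
            raise ValueError(f"checksum mismatch in chunk {index}")
        # Step 6: with dedupe, identical plaintext maps to one fingerprint key.
        key = hashlib.sha256(plain).digest() if dedupe else index.to_bytes(32, "big")
        if key in capacity_tier:
            continue
        body = zlib.compress(plain) if compress else plain    # step 7
        capacity_tier[key] = toy_cipher(capacity_dek, key[-16:], body)  # steps 8-9
    return capacity_tier
```

Consider a write of three identical 4 KB chunks and one different chunk. The write buffer holds four distinct ciphertexts, but `destage` stores only two blocks in the capacity tier.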

[Figure: Write path for a VM whose RAID 1 object has components on ESXi-01, ESXi-02, and ESXi-03, each with cache and capacity devices]
8-20 Scaling Out a Data-at-Rest Encrypted
vSAN Cluster
When you add a host to the data-at-rest encrypted vSAN cluster, the new host receives the
KEK and the HEK from the KMS.

A DEK is generated for each cache and capacity device, and disk groups are created using the
correct format.
[Figure: A new host joining a data-at-rest encrypted vSAN cluster and receiving the KEK]
8-21 Performing Rekey Operations (1)
As part of security auditing, regularly generate new encryption keys to maintain enterprise
security compliance.

vSAN supports two rekey operations, shallow rekey and deep rekey, which correspond to the two
layers of its encryption model: the KEK and the DEKs.

[Figure: vSAN-Cluster > Configure > vSAN > Services page showing Data-At-Rest Encryption Enabled (Key provider: SA-KMS-Cluster-01, Disk wiping: Disabled), Data-In-Transit Encryption Enabled, Performance service Enabled, vSAN iSCSI Target Service Disabled, and File Service Disabled]
8-22 Performing Rekey Operations (2)
A shallow rekey operation replaces only the KEK, and the data does not require re-encryption.

A deep rekey operation replaces both the KEK and the DEK and requires a full DFC.

In most cases, a shallow rekey is sufficient.

Performing a deep rekey is time-consuming and might temporarily decrease the performance of
the cluster.
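The difference between the two rekey types can be sketched with the same two-layer idea. This minimal Python model uses a hypothetical `Drive` record and a toy HMAC-SHA256 keystream cipher in place of vSAN's real encryption. A shallow rekey only re-wraps the existing DEK under a new KEK. A deep rekey decrypts and re-encrypts every byte with a new DEK, which is why it costs a full rewrite of the data.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (HMAC-SHA256 keystream); NOT for production use."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


def wrap(kek: bytes, dek: bytes) -> bytes:
    nonce = os.urandom(16)
    return nonce + toy_cipher(kek, nonce, dek)


def unwrap(kek: bytes, wrapped: bytes) -> bytes:
    return toy_cipher(kek, wrapped[:16], wrapped[16:])


@dataclass
class Drive:
    wrapped_dek: bytes  # DEK encrypted under the current KEK
    data_nonce: bytes
    ciphertext: bytes   # drive contents encrypted under the DEK


def new_drive(kek: bytes, plaintext: bytes) -> Drive:
    dek, nonce = os.urandom(32), os.urandom(16)
    return Drive(wrap(kek, dek), nonce, toy_cipher(dek, nonce, plaintext))


def read(drive: Drive, kek: bytes) -> bytes:
    dek = unwrap(kek, drive.wrapped_dek)
    return toy_cipher(dek, drive.data_nonce, drive.ciphertext)


def shallow_rekey(drive: Drive, old_kek: bytes, new_kek: bytes) -> None:
    """Replace only the KEK: re-wrap the same DEK; the data is not touched."""
    drive.wrapped_dek = wrap(new_kek, unwrap(old_kek, drive.wrapped_dek))


def deep_rekey(drive: Drive, old_kek: bytes, new_kek: bytes) -> None:
    """Replace the KEK and the DEK: every byte is decrypted and re-encrypted."""
    fresh = new_drive(new_kek, read(drive, old_kek))
    drive.wrapped_dek, drive.data_nonce, drive.ciphertext = (
        fresh.wrapped_dek, fresh.data_nonce, fresh.ciphertext)
```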

[Figure: Generate New Encryption Keys dialog. All encryption keys on the key management server cluster are regenerated. Options: Also re-encrypt all data on the storage using the new keys; Allow reduced redundancy. Note: Re-encryption temporarily decreases the performance of the cluster.]
8-23 Rotating KMIP Client Certificates
As part of enterprise security auditing, you might be required to periodically rotate the KMIP
client certificate on vCenter Server:

• All changes should be performed on vCenter Server.

• When the client certificate is replaced, you must reconfigure the KMS to trust the new client
certificate.

[Figure: Bidirectional trust between vCenter Server and the KMS: the KMS trusts the KMIP client certificate, and vCenter Server trusts the KMS certificate]
8-24 Changing the Key Provider
You can change the key provider. The process of changing the key provider is essentially a
shallow rekey operation:

1. You add a key provider.

2. You select an alternate KMS cluster.

3. The new KMS configuration is pushed to the vSAN cluster.

[Figure: vSAN Services dialog with the Key provider drop-down listing SA-KMS-Cluster-01 and SA-KMS-Cluster-02]
8-25 Verifying Bidirectional Trust
After you change the key provider, verify that the KMS connection is operational.

Communication between the KMS and the KMIP client is temporarily interrupted until
bidirectional trust is established.

[Figure: Key Providers list. SA-KMS-Cluster-01 (default): Connection Status Connected, Certificates Connected. SA-KMS-Cluster-02: Connection Status Connected, Certificates Connected.]
8-26 About Encrypted vSAN Node Core
Dumps
Core dumps for vSAN nodes in crypto-safe mode are always encrypted using the HEK. When you
export system logs, set a password for the encrypted core dumps so that they can be decrypted.

[Figure: Export System Logs dialog (select hosts, select logs, gather performance data) with the Password for encrypted core dumps option and its Password and Confirm password fields]

A core dump is the state of memory that is saved at the time when a system stops responding
with a purple error screen.

Core dumps are used by VMware Support representatives for diagnostic and technical support
purposes.

An ESXi host creates a VMFS-L-based ESX-OSData volume and configures a core dump file.

8-27 vSAN Data-in-Transit Encryption (1)
vSAN data-in-transit encryption encrypts vSAN traffic exchanged between vSAN nodes.

vSAN uses a message authentication code to ensure authentication and integrity of the vSAN
traffic.

vSAN data-at-rest and vSAN data-in-transit encryption features are independent of each other.
They can be enabled and configured independently.

8-28 vSAN Data-in-Transit Encryption (2)


vSAN uses a native proprietary technique to encrypt the vSAN traffic between vSAN nodes.

vSAN data-in-transit encryption does not rely on the KMS cluster for encrypting vSAN traffic
between vSAN nodes.

[Figure: vSAN Services dialog with Data-In-Transit encryption enabled and Rekey interval set to DEFAULT (1 day)]
8-29 vSAN Data-in-Transit Encryption
Workflow
vSAN enforces encryption on vSAN traffic exchanged between vSAN nodes only when data-in-transit
encryption is enabled:

1. vSAN creates a TLS link between vSAN nodes intended to exchange the traffic.

2. vSAN nodes create a shared secret and attach it to the current session.

3. vSAN uses the shared secret to establish an authenticated encryption session between
vSAN nodes.
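Steps 2 and 3 can be illustrated with a minimal Python sketch of the integrity half of such a session. Both nodes derive the same session MAC key from the shared secret with HKDF (RFC 5869), and each message is tagged with HMAC-SHA256. vSAN's actual technique is proprietary, so the key-derivation label, the `SessionEndpoint` class, and the message format here are all hypothetical. Confidentiality would also require an authenticated cipher such as AES-GCM. That cipher is not in the Python standard library, so the sketch omits it.

```python
import hashlib
import hmac


def hkdf_sha256(secret: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF extract-and-expand (RFC 5869) with SHA-256."""
    prk = hmac.new(salt or bytes(32), secret, hashlib.sha256).digest()  # extract
    okm, block = b"", b""
    for counter in range(1, -(-length // 32) + 1):                     # expand
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
    return okm[:length]


class SessionEndpoint:
    """Hypothetical node endpoint: both nodes derive the same MAC key."""

    def __init__(self, shared_secret: bytes, session_id: bytes) -> None:
        # Hypothetical derivation label, bound to the current session.
        self.mac_key = hkdf_sha256(shared_secret, b"vsan-dit-mac:" + session_id)

    def seal(self, payload: bytes) -> bytes:
        """Append an HMAC-SHA256 tag so the receiver can check origin and integrity."""
        return payload + hmac.new(self.mac_key, payload, hashlib.sha256).digest()

    def open(self, message: bytes) -> bytes:
        """Verify the tag in constant time and return the payload."""
        payload, tag = message[:-32], message[-32:]
        expected = hmac.new(self.mac_key, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("MAC verification failed: message altered or wrong key")
        return payload
```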

8-30 vSAN Data-in-Transit Encryption Rekey
As part of the security compliance audits, vSAN initiates the rekey process to generate new
keys at the scheduled interval.

By default, the rekey interval is set to one day. Depending on enterprise security compliance,
the rekey interval can be adjusted as needed.
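The following minimal sketch computes a scheduled rekey time from the intervals offered in the Rekey interval drop-down menu (6 hours, 12 hours, and 1 day). It assumes, purely for illustration, that rekeys happen at fixed multiples of the interval from the moment encryption was enabled. vSAN's internal scheduling is not described in this module.

```python
from datetime import datetime, timedelta

# Intervals offered in the Rekey interval drop-down menu.
REKEY_INTERVALS = {
    "6 hours": timedelta(hours=6),
    "12 hours": timedelta(hours=12),
    "1 day": timedelta(days=1),  # default
}


def next_rekey(enabled_at: datetime, now: datetime, interval: str = "1 day") -> datetime:
    """Next rekey time, assuming rekeys at fixed multiples of the interval."""
    period = REKEY_INTERVALS[interval]          # KeyError for unsupported values
    completed = (now - enabled_at) // period    # full intervals already elapsed
    return enabled_at + (completed + 1) * period
```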

[Figure: vSAN Services dialog with the Rekey interval drop-down listing 6 hours, 12 hours, and 1 day]

To enable vSAN data-in-transit encryption:

1. Select the vSAN cluster and select Configure > vSAN > Services > Data-In-Transit
Encryption > EDIT.

2. Enable Data-In-Transit encryption.

3. From the Rekey interval drop-down menu, select the required interval.

4. Click APPLY.

8-31 vSAN Data-in-Transit Encryption Health
Check
Individual vSAN node readiness for data-in-transit encryption is verified, and inconsistent
configuration can be remediated.

To view the status of the vSAN data-in-transit encryption health check, select the vSAN cluster
and select Monitor > vSAN > Skyline Health > Data-in-transit-encryption > Configuration
check.

8-32 Scaling Out Data-in-Transit Encrypted
vSAN Clusters
When you add a host to the data-in-transit encrypted vSAN cluster, the new host configuration
is automatically remediated.

If you add a host running an older version of ESXi that does not support data-in-transit
encryption, the host is partitioned because it cannot communicate with other hosts in the cluster.
[Figure: Hosts in a data-in-transit encrypted vSAN cluster]
8-33 Lab 7: Managing vSAN Security
Operations
Configure vSAN cluster encryption:

1. Configure vSAN Data-at-Rest Encryption

8-34 Lab 8: Encryption Rekey Operations


Rekey encryption and replace the key provider:

1. Generate New Encryption Keys

2. Verify Connectivity to the Second KMS

3. Register the Second KMS with vCenter Server

4. Change the KMS

8-35 Review of Learner Objectives
• Explain how vSAN encryption works

• Discuss design considerations for vSAN encryption

• Detail how to set up key providers

• Describe how to configure vSAN data-at-rest encryption

• Identify the steps to replace key providers

• Discuss vSAN node core dump encryption

• Describe how to configure vSAN data-in-transit encryption

8-36 Key Points


• vSAN encryption requires ongoing operations to maintain a secure environment.

• You must use permissions to restrict access to encrypted data.

• Rotating keys and certificates is a best practice to maintain high levels of security.

Questions?

Module 9
Introduction to Advanced vSAN
Configurations

9-2 Importance
Administrators must know how to enable and configure advanced vSAN features to benefit
their business.

Advanced features include using vSAN File Service to create file shares in the vSAN datastore,
scaling storage and compute independently with VMware HCI Mesh, using vSAN Direct to
manage local disks, and using the vSAN iSCSI target service to provide block storage to physical
workloads.

9-3 Module Lessons


1. vSAN File Service

2. VMware HCI Mesh Using Remote vSAN Datastores

3. vSAN Direct

4. vSAN iSCSI Target Service

9-4 Lesson 1: vSAN File Service

9-5 Learner Objectives


• Describe vSAN File Service

• Detail the vSAN File Service architecture

• Describe vSAN File Service configuration

• Identify vSAN file share properties

• Explain how to view vSAN File Service performance metrics

• Describe how to monitor vSAN File Service health

• Explain how to view vSAN file share capacity utilization

9-6 vSAN File Service
With vSAN File Service, you can provision NFS and SMB file shares on your existing vSAN
datastore. You can access these file shares from supported clients, such as VMs and physical
workstations or servers.

You can also create a container file volume that can be accessed from Kubernetes pods.

vSAN File Service is based on the vSAN Distributed File System (VDFS), which provides the
underlying scalable file system by aggregating vSAN objects.

The VDFS provides resilient file server endpoints and a control plane for deployment,
management, and monitoring.

[Figure: Kubernetes pods, VMs, and remote physical workstations accessing file shares over NFS on a vSphere vSAN cluster]
9-7 vSAN File Shares
vSAN file shares are integrated into the existing vSAN Storage Policy-Based Management
(SPBM) on a per-share basis for resilience.

vSAN File Service uses a set of file service VMs (FSVMs) to provide network access points to
the file shares.

FSVMs run a containerized NFS service, and each file share is distributed between one or more
containers.

[Figure: A file share split into shards (SHARD-1, SHARD-2, SHARD-3), served by Docker containers on FSVM-1, FSVM-2, and FSVM-3 in the vSAN File Service domain]
9-8 vSAN Distributed File System
When you configure vSAN File Service, a VDFS is created to manage the following activities:

• File share object placement

• Scaling and load balancing

• Performance optimization

[Figure: vSAN Distributed File System providing scale-out of services, load balancing, and resilience for Docker containers on the vSAN datastore]
9-9 File Service VMs
FSVMs are preconfigured Photon Linux-based VMs.

Up to 32 FSVMs can be deployed per vSAN cluster to provide multiple access points and
failover capability for vSAN file shares.

One FSVM per host is deployed when vSAN File Service is enabled. ESX Agent Manager is used
to provision FSVMs and pin them to a host (affinity).
[Figure: vSAN File Server agent VMs (FSVM-1, FSVM-2, FSVM-3), each running Docker and pinned to its host (VM-to-host pin) in the vSAN File Service cluster]
9-10 Provisioning File Service Agent Machines
When you configure vSAN File Service, ESX Agent Manager performs the following tasks:

• Verify vSAN File Service OVF image compatibility

• Provision agent machines onto a vSAN cluster

If vSAN File Service is disabled, the solution sends a Destroy Agent signal to remove all the
FSVMs.

[Figure: ESX Agent Manager (EAM) in vCenter Server downloading the vSAN File Service node OVF package and creating, provisioning, monitoring, and destroying agents on ESXi hosts]
9-11 File Service Agent Machines Storage
Policy
The FSVM is configured with a custom vSAN storage policy called
FSVM_Profile_DO_NOT_MODIFY.

This custom storage policy offers no data redundancy and pins the FSVM to a host.

Do not modify this storage policy.

[Figure: FSVM_Profile_DO_NOT_MODIFY storage policy (Storage profile for FSVMs). vSAN placement rules: Site disaster tolerance: None - standard cluster; Failures to tolerate: No data redundancy with host affinity; Number of disk stripes per object: 1; IOPS limit for object: 0; Object space reservation: Thin provisioning; Flash read cache reservation: 0]
9-12 Enabling vSAN File Service
vSAN File Service is disabled by default. To enable it, select a vSAN cluster and select
Configure > vSAN > Services > File Service > ENABLE.

[Figure: vSAN Services page with File Service Disabled and the ENABLE button highlighted]
9-13 vSAN File Service Configuration
vSAN File Service maintains a per-cluster configuration. You can easily configure it through the
guided workflow in vCenter Server.

[Figure: vSAN File Service configuration: Domain vclass.local; DNS servers 172.20.10.10; Number of shares 0; Network VM Network; Subnet mask 255.255.255.0; Gateway 172.20.10.10; OVF file version 7.0.1.1000-16596215]
9-14 vSAN File Service Domain Configuration
A vSAN File Service domain is a namespace for a group of file shares with a common
networking and security configuration.

You must enter a unique namespace for the vSAN File Service domain you are creating.

[Figure: vSAN File Service configuration with the Domain (vclass.local) highlighted]
9-15 vSAN File Service Network Configuration
In this release, vSAN File Service supports only IPv4 for file share access.

Select a network port group for FSVMs to provide access to file shares.

You should select a distributed port group to ensure consistency across all hosts in a vSAN
cluster.

All FSVMs should reside on the same network subnet.

[Figure: vSAN File Service configuration with the Network (VM Network) highlighted]
9-16 FSVM IP Address Configuration
You must provide a pool of static IP addresses for FSVMs.

Provide the same number of IP addresses as the number of ESXi hosts present in the vSAN
cluster during setup.
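The IP pool rules in this lesson can be checked before you start the wizard:

- one static IPv4 address per host during setup
- all FSVMs on the same subnet
- at most 32 FSVMs per cluster

This Python sketch uses only the standard `ipaddress` module. The function name is hypothetical. Treating the gateway's subnet as the FSVM subnet is an assumption.

```python
import ipaddress

MAX_FSVMS = 32  # up to 32 FSVMs per vSAN cluster


def validate_fsvm_ip_pool(ips: list[str], host_count: int, gateway: str, netmask: str) -> list[str]:
    """Return a list of problems with a proposed static FSVM IP pool (empty if none)."""
    problems = []
    if len(ips) != host_count:
        problems.append(f"expected {host_count} addresses (one per host), got {len(ips)}")
    if len(ips) > MAX_FSVMS:
        problems.append(f"more than {MAX_FSVMS} FSVM addresses")
    if len(set(ips)) != len(ips):
        problems.append("duplicate addresses in pool")
    # Assumption: the FSVM subnet is the one that contains the gateway.
    subnet = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            problems.append(f"{ip}: only IPv4 is supported")
        elif addr not in subnet:
            problems.append(f"{ip}: outside subnet {subnet}")
    return problems
```

For example, a four-host cluster with the addresses 172.20.10.100 through 172.20.10.103, gateway 172.20.10.10, and mask 255.255.255.0 passes the check.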

[Figure: FSVM IP addresses: 172.20.10.100 (primary) fsvm-01.vclass.local, 172.20.10.101 fsvm-02.vclass.local, 172.20.10.102 fsvm-03.vclass.local, 172.20.10.103 fsvm-04.vclass.local]
9-17 Viewing ESX Agent Deployment
As part of the vSAN File Service configuration, ESX Agent Manager deploys vSAN File Service
nodes.

[Figure: ESX Agents folder listing vSAN File Service Node (1) through (4), all Powered On with Normal status]
9-18 Creating a vSAN File Share
After vSAN File Service is enabled, you can create a file share to access from NFS clients and a
container file volume to access from Kubernetes pods.

To create a file share, select a vSAN cluster and select Configure > vSAN > Services > File
Shares > ADD.

[Figure: File Shares page (File service domain: vclass.local) with the ADD menu offering vSAN File Share and Container File Volume]
9-19 Configuring a vSAN File Share
Choose a suitable name for the file share.

You can select either the NFS or the SMB protocol to access the file share. You also select the
required protocol version.

A file share supports both the AUTH_SYS and Kerberos security modes. Based on the selected
protocol version, select the supported security mode.

The vSAN default storage policy is assigned to file shares. You can select a policy based on
your availability and performance requirements.

Define the file share quota to limit the capacity that the file share can consume on the vSAN
datastore, and include a warning threshold.
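The following sketch shows how a warning threshold and a hard quota relate, using the 1 GB warning threshold and 5 GB hard quota from the example share. The function and its status strings are hypothetical. Treating usage equal to a limit as reaching that limit is an assumption.

```python
def share_quota_status(used_gb: float, warning_gb: float = 1.0, hard_gb: float = 5.0) -> tuple[str, float]:
    """Classify file share usage against its warning threshold and hard quota."""
    if not 0 < warning_gb < hard_gb:
        raise ValueError("the warning threshold must be positive and below the hard quota")
    usage_pct = round(100 * used_gb / hard_gb, 1)  # usage as a share of the hard quota
    if used_gb >= hard_gb:
        return "hard quota reached", usage_pct
    if used_gb >= warning_gb:
        return "warning threshold reached", usage_pct
    return "ok", usage_pct
```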

Labels are key-value pairs that can be used to identify file shares. Labels are useful when
assigning file shares to Kubernetes pods.

[Figure: General page of the file share wizard: Name prodfs; Protocol NFS; Versions NFS 4.1 and NFS 3; Security mode AUTH_SYS; Storage policy vSAN Default Storage Policy; Share warning threshold 1 GB; Share hard quota 5 GB; Labels. Notes: Enable Active Directory configuration in the File Service configuration before using the SMB protocol or Kerberos authentication.]
9-20 Configuring Network Access Control
You use network access control to limit which clients can access a file share.

Access to a vSAN file share can be allowed from any NFS client IP address or only from a specific
list of client IP addresses.

You can also define custom network access rules.
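When you customize net access, the rules are honored from top to bottom, a top rule overrides the rules below it, and "*" denotes any other IP address. This first-match evaluation can be sketched with Python's `ipaddress` module. Several details are assumptions for illustration:

- the rule format (a subnet, a single address, or a `first-last` range)
- the permission strings
- the "No access" default for unmatched clients

```python
import ipaddress


def rule_matches(target: str, client: ipaddress.IPv4Address) -> bool:
    """Match a client against '*', an inclusive 'first-last' range, or a subnet/single IP."""
    if target == "*":
        return True
    if "-" in target:
        first, last = (ipaddress.ip_address(part.strip()) for part in target.split("-", 1))
        return first <= client <= last
    return client in ipaddress.ip_network(target, strict=False)


def effective_permission(rules: list[tuple[str, str]], client_ip: str) -> str:
    """Evaluate rules top to bottom; the first matching rule wins."""
    client = ipaddress.ip_address(client_ip)
    for target, permission in rules:
        if rule_matches(target, client):
            return permission
    return "No access"  # assumption: unmatched clients get no access
```

Putting the narrow range above the broader /24 subnet is what lets the range take effect. If the order were reversed, the subnet rule would always win.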

[Figure: Net access control page with the options No access, Allow access from any IP, and Customize net access. Each custom rule specifies an IP set/subnet, a permission (for example, Read Only), and a squash option (for example, Root squash). The rules are honored from top to bottom, and top rules override bottom ones, so put more general rules below the specific ones. You can use "*" to denote any other IP addresses not mentioned above.]
9-21 Viewing vSAN File Share Properties
After a file share is created, you can record its mount path so that NFS clients can mount it.

You can view file share storage usage details.

You can also modify the file share storage quota properties by editing the file share.

[Figure: File Shares list with the COPY PATH menu and the Share path copied confirmation]
9-22 Monitoring vSAN File Share Performance
Metrics
You can monitor throughput, IOPS, and latency-related information per file share.

To monitor performance metrics, select a vSAN cluster and select Monitor > vSAN >
Performance > FILE SHARE.

[Figure: Monitor > vSAN > Performance > FILE SHARE tab for file share prodfs (Time Range: last 1 hour), with NFS performance and File system performance charts]
9-23 Viewing VMware Skyline Health Details
for vSAN File Service
VMware Skyline Health provides detailed information about vSAN File Service infrastructure
health, file server health, and share health.

To view the health details, select a vSAN cluster and select Monitor > vSAN > Skyline Health >
File Service.

[Figure: Skyline Health with the File Service category expanded to show Infrastructure Health, File Server Health, and Share Health]
9-24 vSAN File Service Considerations
Consider the following points when planning vSAN File Service:

• Only IPv4 is supported, and static IP addresses are required for FSVMs.

• Up to 32 file shares can be created.

• ESXi hosts in a vSAN cluster should have a minimum of four CPUs.

• Stretched cluster and two-node cluster configurations are not supported.

For more about vSAN File Service, see the FAQ page at
https://storagehub.vmware.com/t/vsan-frequently-asked-questions-faq/file-service-7/

9-25 Lab 9: Configuring vSAN File Service
Configure vSAN File Service:

1. Configure vSAN File Service

2. Create a File Share

3. Mount the NFS Share to the Student Desktop

4. View vSAN File Service Virtual Objects

5. Verify vSAN File Service Skyline Health Details

6. Monitor vSAN File Share Performance Metrics

9-26 Review of Learner Objectives
• Describe vSAN File Service

• Detail the vSAN File Service architecture

• Describe vSAN File Service configuration

• Identify vSAN file share properties

• Explain how to view vSAN File Service performance metrics

• Describe how to monitor vSAN File Service health

• Explain how to view vSAN file share capacity utilization

9-27 Lesson 2: VMware HCI Mesh Using
Remote vSAN Datastores

9-28 Learner Objectives


• Describe VMware HCI Mesh technology

• Discuss VMware HCI Mesh architecture and use cases

• Describe VMware HCI Mesh common topologies

• Describe the VMware HCI Mesh client and server views

• Explain how to mount a remote vSAN datastore

• Explain VMware HCI Mesh license requirements and scalability limitations

• Discuss VMware HCI Mesh interoperability with vSphere features

9-29 About VMware HCI Mesh
VMware HCI Mesh is a technology for the disaggregation of compute and storage resources in
vSAN.

With the VMware HCI Mesh architecture, you can remotely mount datastores from other vSAN
clusters (server clusters) to one or more vSAN clusters (client clusters). The client and server
clusters must be within the same vCenter Server inventory. This approach maintains the
simplicity of the existing HCI model without requiring specialized hardware.

9-30 Previous vSAN Challenges
In traditional vSAN clusters, there was no way to use storage from cluster 1 in cluster 2 without
physically relocating storage devices.

[Figure: vSAN Cluster 1 with capacity underused (40%) and vSAN Cluster 2 with capacity overused (80%)]

One cluster is underutilized and the other is running out of capacity.

9-31 VMware HCI Mesh Advantages
VMware HCI Mesh provides the following advantages over traditional vSAN:

• It improves the storage-to-compute ratio.

• License optimization: The storage and compute can now be separated. You can use the
appropriate licenses as required by the clusters and save costs.

• Heterogeneous storage classes: Different types of storage classes provide better efficiency
in hyperscale deployments.

• It has policy-based performance and availability placement.

[Figure: vSAN cluster federation providing unified HCI management of resources and licensing, cross-cluster sharing, and savings on compute: an existing cluster with capacity underused, an existing cluster with capacity overused, and a new storage-only cluster with no VMs running]
9-32 VMware HCI Mesh: Use Cases (1)
Example VMware HCI Mesh architecture use cases:

• Balancing capacity across vSAN clusters:

• VMware HCI Mesh can be used for balancing capacity. vSphere Storage vMotion can
move data (not compute) to other vSAN clusters based on capacity usage and the
configured threshold.

• Hardware maintenance:

• VMware HCI Mesh can be used to move data during patching or maintenance windows.

• Storage as a service:

• Cloud providers can provide a managed pool of storage for multiple tenant consumers
by using storage-only cluster topology, which is easily scalable. In this model, the cloud
provider owns and manages the storage-only cluster. The tenant cluster mounts the
datastore remotely.

During hardware maintenance, the cluster might not have sufficient space to move data. In such
cases, some VMs can be migrated to another vSAN datastore using VMware HCI Mesh.
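The capacity-balancing use case can be sketched as a placement choice: pick the least-utilized remote vSAN datastore that stays under a configured threshold after the VM's data moves. This Python sketch reuses the 40% and 80% utilization figures from the earlier example. The `Datastore` class, the 70% threshold, and the selection rule are hypothetical. The actual migration would be performed with vSphere Storage vMotion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Datastore:
    """Hypothetical view of a vSAN datastore's capacity."""
    name: str
    capacity_gb: float
    used_gb: float

    @property
    def utilization(self) -> float:
        return self.used_gb / self.capacity_gb


def pick_target(datastores: list, source: Datastore, vm_size_gb: float,
                threshold: float = 0.70) -> Optional[Datastore]:
    """Least-utilized other datastore that stays at or under the threshold after the move."""
    candidates = [
        ds for ds in datastores
        if ds is not source and (ds.used_gb + vm_size_gb) / ds.capacity_gb <= threshold
    ]
    return min(candidates, key=lambda ds: ds.utilization, default=None)
```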

9-33 VMware HCI Mesh: Use Cases (2)
• Scaling compute without adding storage:

• Traditional vSAN scales by adding hosts, which results in inefficient storage use.

• You might have more storage than the workloads on that cluster need.

• You can use VMware HCI Mesh to scale storage and compute independently of each
other.

• License optimization:

• Storage and compute can be separated to save license costs.

• You can use the appropriate licenses as required by the clusters and save costs.

Oracle requires the licensing of every socket in the cluster where the compute is located. If the
workload requires more storage, the customer must procure not only compute and VMware
licensing but also Oracle licensing.

With VMware HCI Mesh, smaller vSAN clusters can be configured to run the Oracle compute
and still use capacity from participating clusters.

9-34 Stranded Capacity Issues
A fixed compute-to-storage ratio can lead to a stranded capacity problem in an HCI
environment.

Stranded capacity:

• Compute-intensive workloads are likely to run out of CPU and memory before storage.

• Storage-intensive workloads run out of storage, either capacity or IOPS, before CPU and
memory.

Because of the flexible VMware HCI Mesh architecture, VMware HCI Mesh can be used with
different topologies to address such issues.
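The following Python sketch makes stranded capacity concrete. For a given per-VM demand, it works out which resource a cluster runs out of first and how much storage is still unused at that point. The host sizes and per-VM demands are hypothetical round numbers, not vSAN sizing guidance. The model also ignores overheads such as storage policy replication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_ghz: float
    memory_gb: float
    storage_tb: float


def vms_supported(host: Resources, host_count: int, per_vm: Resources) -> dict[str, float]:
    # Number of VMs each resource could support if it were the only constraint.
    return {
        "cpu": host.cpu_ghz * host_count / per_vm.cpu_ghz,
        "memory": host.memory_gb * host_count / per_vm.memory_gb,
        "storage": host.storage_tb * host_count / per_vm.storage_tb,
    }


def limiting_resource(host: Resources, host_count: int, per_vm: Resources) -> tuple[str, int]:
    # The resource that supports the fewest VMs is exhausted first.
    supported = vms_supported(host, host_count, per_vm)
    name = min(supported, key=supported.get)
    return name, int(supported[name])


def stranded_storage_fraction(host: Resources, host_count: int, per_vm: Resources) -> float:
    # Share of the cluster's storage still unused when the limiting resource runs out.
    _, vms = limiting_resource(host, host_count, per_vm)
    total_tb = host.storage_tb * host_count
    return 1 - vms * per_vm.storage_tb / total_tb


host = Resources(cpu_ghz=64, memory_gb=512, storage_tb=20)  # hypothetical host
compute_heavy = Resources(cpu_ghz=4, memory_gb=24, storage_tb=0.2)
storage_heavy = Resources(cpu_ghz=1, memory_gb=8, storage_tb=2)

print(limiting_resource(host, 4, compute_heavy))  # ('cpu', 64)
print(limiting_resource(host, 4, storage_heavy))  # ('storage', 40)
print(round(stranded_storage_fraction(host, 4, compute_heavy), 2))  # 0.84
```

In this example, the compute-intensive mix leaves about 84% of the cluster's storage stranded. Storage of that kind is what a server cluster could export to other clusters through VMware HCI Mesh.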

9-35 Comparing Homogeneous and
Heterogeneous Storage
A vSAN cluster exposes storage policies that are homogeneous across all hosts in the cluster.

Homogeneous storage classes can lead to inefficient storage resource usage:

• Workloads have diverging storage performance requirements.

• Different production VMs can have drastically varying performance needs.


For example:

• A database, such as Oracle or SQL Server, requires low latency offered by an all-flash
NVMe configuration.

• A web server is perfectly capable of running on hybrid vSAN.

• In this case, the web server is placed on a faster and more expensive storage tier than
needed.

With VMware HCI Mesh using remote vSAN datastores, you can place VMs on heterogeneous
hardware.

9-36 VMware HCI Mesh Architecture: Example
Setup
The diagram shows a typical VMware HCI Mesh architecture example setup.

[Figure: A typical VMware HCI Mesh setup with three vSphere vSAN clusters, each running VMs]

9-37 VMware HCI Mesh: Terminology (1)
The following key terms are referenced in the VMware HCI Mesh architecture.

Local cluster:

• The vSAN cluster where storage is hosted and consumed.


Each cluster has a subcluster UUID, for example,
ssssssss-ssss-ssss-ssss-ssssssss.
• All vSAN clusters that do not use VMware HCI Mesh are considered local vSAN clusters.

Server cluster:

• The cluster where the storage is locally hosted. This cluster provides storage resources to
other clusters.

9-38 VMware HCI Mesh: Terminology (2)


Local vSAN datastore:

• From the server cluster viewpoint, the vSAN datastore is considered a local datastore.

Remote vSAN datastore:

• From the client cluster viewpoint, the vSAN datastore is considered a remote datastore.

Cross-cluster vMotion:

• The remote vSAN datastore is mounted on both the client and the server clusters, which
means that cross-cluster vSphere vMotion migration is possible.

9-39 VMware HCI Mesh: Common Topologies
Different VMware HCI Mesh topologies can be used, based on the infrastructure requirements.

Common VMware HCI Mesh topologies include:

• Storage-only cluster topology

• Cross-cluster topology

9-40 Storage-Only Cluster Topology
In this topology, one vSAN cluster is used only to provide storage to other clusters.

The storage-only cluster topology has the following use cases:

• Ideal for independent scaling of compute and storage

• Ideal for greenfield environments and diskless client clusters

• Saving license costs

As a best practice, the storage-only cluster should not run workloads.

[Figure: Storage-only cluster topology. Three vSAN clusters running VMs each mount the datastore of a fourth vSAN cluster that runs no VMs.]

9-41 Cross-Cluster Topology
The cross-cluster topology has the following features and use cases:

• Provides the most flexibility by mounting multiple remote datastores.

• VM workloads can run on both client and server clusters.

• Can be bidirectional.

• Useful in addressing stranded capacity scenarios.

• Ideal for brownfield environments with multiple existing clusters.

The diagram shows an example of cross-cluster VMware HCI Mesh. You can use variations of
this setup.

[Figure: Cross-cluster topology. Five vSAN clusters, all running VMs, with remote datastore mounts between the clusters.]

9-42 VMware HCI Mesh: Network
Requirements and Recommendations
Network connectivity requirements:

• Both layer 2 and layer 3 are supported for intercluster connectivity.

Network redundancy recommendations:

• Use multiple NICs on the ESXi host.

• Use dual top-of-rack (ToR) switches on the rack.

• A leaf-spine topology is preferred for core redundancy and reduced latency.

• A single vSAN VMkernel adapter (vmknic) is required. Air-gapped vSAN network configurations are not supported.

Network performance requirements:

• HCI Mesh has the same latency and bandwidth requirements as local vSAN.

9-43 Example Network Architecture
The diagram shows an example of good network architecture to support VMware HCI Mesh.

[Figure: A core/spine switch or router connects to two pairs of ToR switches (ToR 1 and ToR 2). On each host, uplinks vmnic1 and vmnic2 attach to a DVS vSAN port group that carries the vSAN vmknic.]

9-44 VMware HCI Mesh: Scalability Limits
As multiple vSAN clusters participate in a mesh topology, consider the following scalability
limitations:

• A single vSAN datastore can be mounted on a maximum of 64 hosts, which include both
server and client hosts.

• A client cluster can mount a maximum of five remote vSAN datastores.

• A server cluster can export its datastore to a maximum of five client clusters.
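These three limits can be checked mechanically for a planned mesh. The Python sketch below uses a hypothetical `Cluster` record that holds a host count and the server clusters whose datastores the cluster mounts, and it reports any violation. It models only the limits listed on this slide, not the other checks that the mount wizard performs.

```python
from dataclasses import dataclass, field

# Limits as stated on this slide.
MAX_HOSTS_PER_DATASTORE = 64  # server hosts plus client hosts mounting one datastore
MAX_REMOTE_DATASTORES_PER_CLIENT = 5
MAX_CLIENTS_PER_SERVER = 5


@dataclass
class Cluster:
    name: str
    hosts: int
    # Names of the server clusters whose vSAN datastore this cluster mounts remotely.
    mounts: list[str] = field(default_factory=list)


def check_mesh(clusters: list[Cluster]) -> list[str]:
    problems = []
    for c in clusters:
        if len(c.mounts) > MAX_REMOTE_DATASTORES_PER_CLIENT:
            problems.append(f"{c.name}: mounts {len(c.mounts)} remote datastores "
                            f"(max {MAX_REMOTE_DATASTORES_PER_CLIENT})")
    for server in clusters:
        clients = [c for c in clusters if server.name in c.mounts]
        if len(clients) > MAX_CLIENTS_PER_SERVER:
            problems.append(f"{server.name}: exported to {len(clients)} client clusters "
                            f"(max {MAX_CLIENTS_PER_SERVER})")
        total_hosts = server.hosts + sum(c.hosts for c in clients)
        if total_hosts > MAX_HOSTS_PER_DATASTORE:
            problems.append(f"{server.name}: datastore mounted on {total_hosts} hosts "
                            f"(max {MAX_HOSTS_PER_DATASTORE})")
    return problems


# Hypothetical mesh: one 8-host storage-only cluster serving five 12-host compute clusters.
storage = Cluster("storage-01", hosts=8)
compute = [Cluster(f"compute-0{i}", hosts=12, mounts=["storage-01"]) for i in range(1, 6)]
for problem in check_mesh([storage, *compute]):
    print(problem)  # storage-01: datastore mounted on 68 hosts (max 64)
```

Five client clusters are within the export limit. However, the 8 server hosts plus 60 client hosts exceed the 64-host mount limit, so this layout would need fewer or smaller client clusters.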

9-45 VMware HCI Mesh: Mounting the Remote
vSAN Datastore
Consider the following when mounting the vSAN datastore of one cluster to another vSAN
cluster:

• You can mount vSAN datastores only from the same vCenter Server instance.

• Two-node clusters and stretched clusters are not compatible with VMware HCI Mesh.

9-46 Mounting Remote Datastores (1)
To mount a remote datastore on the client vSAN cluster, select vSAN > Datastore Sharing on
the Configure tab and click the MOUNT REMOTE DATASTORE link.

[Screenshot: SA-Cluster-01 > Configure > vSAN > Datastore Sharing ("View and manage remote vSAN datastores mounted to this cluster"), showing the MOUNT REMOTE DATASTORE link and the (Local) vsanDatastore-Cluster01 datastore (server cluster SA-Cluster-01, capacity 59.97 GB, free space 56.37 GB, VM count 1)]

9-47 Mounting Remote Datastores (2)


On the first page of the wizard, select the remote vSAN datastore to be mounted.

[Screenshot: Mount Remote Datastore wizard, Select datastore page, listing vsanDatastore-Cluster02 (server cluster SA-Cluster-02, capacity 89.98 GB, free space 82.08 GB)]

9-48 Mounting Remote Datastores (3)
A compatibility check is performed in the Check Compatibility section of the wizard.

Mount Remote Datastore wizard, Check compatibility page. Checking compatibility for remote datastore 'vsanDatastore (1)':

• The remote datastore type is vSAN.
• Server and client clusters are from the same datacenter.
• The selected server cluster is remote.
• Cluster configuration is supported.
• All hosts in server and client clusters have the required license.
• vSAN format version supports remote vSAN.
• SA-VSAN-01 is mounting no more than 5 remote vSAN datastores.
• SA-VSAN-01 is mounting from no more than 5 server clusters.
• No more than 5 client clusters are mounting SB-VSAN-01's vSAN datastore.
• The remote vSAN datastore can provision objects with its default policy.
• Server cluster is healthy.
• Server and client clusters have no connectivity issues.
• Latency between client and server hosts is below 5000 microseconds.
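A few of these checks can be mirrored in a pre-deployment script. The Python sketch below covers only a subset of them:

- the server cluster is a different (remote) cluster
- both clusters are in the same vCenter Server instance, as required earlier in this lesson
- both clusters are in the same datacenter
- neither cluster is a two-node or stretched cluster
- client-to-server latency is below 5000 microseconds

The `ClusterInfo` record and the sample latency values are hypothetical. The license, format version, health, and limit checks are left out.

```python
from dataclasses import dataclass

MAX_LATENCY_US = 5000  # client-to-server host latency must be below this value


@dataclass(frozen=True)
class ClusterInfo:
    name: str
    vcenter: str
    datacenter: str
    topology: str  # "standard", "two-node", or "stretched"


def precheck(client: ClusterInfo, server: ClusterInfo, latencies_us: list[float]) -> list[str]:
    """Return human-readable failures; an empty list means these checks passed."""
    failures = []
    if client.name == server.name:
        failures.append("the server cluster must be a remote (different) cluster")
    if client.vcenter != server.vcenter:
        failures.append("both clusters must be managed by the same vCenter Server instance")
    if client.datacenter != server.datacenter:
        failures.append("server and client clusters must be in the same datacenter")
    for cluster in (client, server):
        if cluster.topology in ("two-node", "stretched"):
            failures.append(f"{cluster.name}: {cluster.topology} clusters are not supported")
    if not latencies_us:
        failures.append("no client-to-server latency measurements")
    elif max(latencies_us) >= MAX_LATENCY_US:
        failures.append(f"worst latency {max(latencies_us)} us is not below {MAX_LATENCY_US} us")
    return failures


client = ClusterInfo("SA-VSAN-01", "sa-vcsa-01.vclass.local", "SA-Datacenter", "standard")
server = ClusterInfo("SB-VSAN-01", "sa-vcsa-01.vclass.local", "SA-Datacenter", "standard")
print(precheck(client, server, [410.0, 655.0]))  # []
```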

9-49 Client Datastore View
The Datastore Sharing view provides information about shared vSAN datastores and displays
client and server cluster information. Local datastores are identified with the (Local) prefix.

[Screenshot: SA-Cluster-01 > Configure > vSAN > Datastore Sharing, listing (Local) vsanDatastore-Cluster01 (server cluster SA-Cluster-01, capacity 59.97 GB, free space 56.37 GB) and vsanDatastore-Cluster02 (server cluster SA-Cluster-02, capacity 89.98 GB, free space 82.08 GB, client cluster SA-Cluster-01)]

9-50 Server Datastore View
The screenshot shows the datastore view of the server cluster.

[Screenshot: SA-Cluster-02 > Configure > vSAN > Datastore Sharing, listing (Local) vsanDatastore-Cluster02 (server cluster SA-Cluster-02, capacity 89.98 GB, free space 82.08 GB, VM count 1, client cluster SA-Cluster-01)]

As expected, the vSAN datastore is marked as the local datastore.

9-51 Hosts: Access Status
In the Datastore view, select the remote datastore and click the Hosts tab.

The hosts from both the client and the server clusters display a Connected status.

[Screenshot: vsanDatastore-Cluster02 > Hosts tab, listing hosts sa-esxi-01 to sa-esxi-04 (cluster SA-VSAN-01) and three sb-esxi hosts (cluster SB-VSAN-01), all with state Connected and status Normal]

9-52 VM Creation Test
A quick way to verify the remote vSAN mount is to perform the vSAN VM creation test.

Select Monitor > vSAN > Proactive Tests > VM Creation Test and click Run.

[Screenshot: SA-VSAN-01 > Monitor > vSAN > Proactive Tests, with Run on set to vsanDatastore-Cluster02 - SB-VSAN-01. The VM Creation Test result is Passed, and hosts sa-esxi-01 to sa-esxi-04 each report success. The test description explains that the test creates a very simple, tiny VM on every host. The page recommends HCIBench for storage performance testing.]

9-53 Remote Accessible Objects
To see the remote-accessible objects that are placed on the remote vSAN datastore, select
Client cluster > Monitor > vSAN > Skyline Health > vSAN object health.

[Screenshot: SA-VSAN-01 > Monitor > vSAN > Skyline Health > Data > vSAN object health (last checked 08/26/2020, 7:46:40 AM), showing 3 Healthy objects on cluster SA-VSAN-01 and 9 Remote-accessible objects on cluster SB-VSAN-01]

Remote-accessible objects are the objects on the server cluster's datastore that are being accessed from the
client cluster.

9-54 Server Cluster Partition Health Check
To see the client and server hosts listed, select Monitor > vSAN > Skyline Health > Network >
Server Cluster Partition.
[Screenshot: Skyline Health > Network > Server Cluster Partition (last checked 08/27/2020). The partition list shows server cluster SB-VSAN-01 together with the sa-esxi client hosts and the sb-esxi server hosts. The other network health checks are listed alongside, including host connectivity, vSAN cluster partition, MTU, and network latency checks.]

9-55 Remote VM Performance
To see remote vSAN VM performance metrics, select Monitor > vSAN > Performance >
Remote VM.
[Screenshot: Monitor > vSAN > Performance > Remote VM tab (time range: last 1 hour; target cluster SB-VSAN-01), showing "Metrics about clusters in the perspective of remote vSAN VM consumption" as read/write IOPS and latency charts]

9-56 Physical Disk Placement
To see physical disk placement, select the VM and select Monitor > vSAN > Physical disk
placement > Remote Objects.

[Screenshot: Ubuntu-1 > Monitor > vSAN > Physical disk placement. A banner states "This Virtual Machine is placed on a remote datastore managed by SB-VSAN-01". The Remote objects list shows Hard disk 1, Hard disk 2, VM home, and Virtual machine swap object, each with Remote-accessible status.]

Object accessibility shows a remote-accessible status.

The blue highlighted text indicates that this VM resides on a remote datastore.

9-57 VMware HCI Mesh Interoperability: VM
Component Protection
With VMware HCI Mesh using remote vSAN datastores, you can have VM compute resources
allocated from one cluster and storage space allocated from another cluster.

You can configure VM Component Protection (VMCP) to protect VMs if a cross-cluster
communication failure occurs.

When the cross-cluster communication fails, an all paths down (APD) state is declared after 60
seconds of isolation.

The APD response is triggered to automatically restart the affected VMs after 180 seconds.
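The timeline below is a worked example. It assumes that the 180 seconds are counted from the moment APD is declared. That assumption mirrors how the vSphere HA delay for VM failover after APD is counted, but the text does not state it explicitly. Under this assumption, affected VMs restart about four minutes after cross-cluster communication is lost:

```latex
t_{\text{restart}} = t_{\text{fail}}
  + \underbrace{60\,\text{s}}_{\text{APD declared}}
  + \underbrace{180\,\text{s}}_{\text{APD response delay}}
  = t_{\text{fail}} + 240\,\text{s}
```

If the 180 seconds are instead measured from the start of isolation, the restart happens at $t_{\text{fail}} + 180\,\text{s}$.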

9-58 VMware HCI Mesh Interoperability: SPBM
Integration
A single vSAN VASA provider acts on behalf of all vSAN clusters managed by the same
vCenter Server instance:

• The vSAN VASA provider dispatches all policy requests targeting a vSAN datastore to one
of the hosts in the corresponding vSAN cluster.

• The vSAN VASA provider maintains an up-to-date list of hosts capable of satisfying VASA
provider API calls to a vSAN datastore, using the datastore property collector:

• For local vSAN, the list of hosts comprises all hosts mounting the vSAN datastore.

• For VMware HCI Mesh, the list also includes hosts from client clusters remotely
mounting the same datastore.

[Figure: vCenter Server (VPXD), with SPBM and the datastore property collector, calls the vSAN VASA provider in vSAN Health through the VASA API. The provider applies a host filter and sends datastore management calls through the VASA API to the server cluster, whose datastore is remotely mounted by two client clusters.]

Only hosts in the server cluster can respond to SPBM commands.

The vSAN VASA provider ignores the client cluster hosts.
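The host-selection rule can be sketched in Python as a filter over the hosts that mount a datastore. For VMware HCI Mesh, that list includes client-cluster hosts, but only server-cluster hosts are eligible to answer SPBM calls. The following are assumptions for illustration:

- the `Host` type
- the host names (including a hypothetical third sb-esxi host)
- choosing randomly among eligible hosts

The text says only that requests are dispatched to one of the hosts in the server cluster.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cluster: str


def spbm_candidates(mounting_hosts: list[Host], server_cluster: str) -> list[Host]:
    # For HCI Mesh, the mount list also contains client-cluster hosts;
    # only server-cluster hosts may answer SPBM/VASA calls, so keep just those.
    return [h for h in mounting_hosts if h.cluster == server_cluster]


def dispatch(mounting_hosts: list[Host], server_cluster: str, rng: random.Random) -> Host:
    candidates = spbm_candidates(mounting_hosts, server_cluster)
    if not candidates:
        raise RuntimeError(f"no host in {server_cluster} can serve the request")
    return rng.choice(candidates)  # hypothetical strategy: any eligible host


mounting = (
    [Host(f"sa-esxi-0{i}.vclass.local", "SA-VSAN-01") for i in range(1, 5)]    # client cluster
    + [Host(f"sb-esxi-0{i}.vclass.local", "SB-VSAN-01") for i in range(1, 4)]  # server cluster
)
target = dispatch(mounting, "SB-VSAN-01", random.Random(42))
print(target.cluster)  # SB-VSAN-01
```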

9-59 VMware HCI Mesh Interoperability:
vSphere vMotion and vSphere Storage
vMotion
VMware HCI Mesh is compatible with both vSphere vMotion and vSphere Storage vMotion.

vSphere vMotion has the following conditions:

• VMs can be migrated using vSphere vMotion within the client cluster, regardless of whether
they reside on a local or remote vSAN datastore.

• VMs can be migrated with vSphere vMotion across clusters, client or server, as long as
all VM objects reside on a mutually shared remote vSAN datastore.

vSphere Storage vMotion supports the following migrations:

• Migration from a local vSAN datastore to a remotely mounted vSAN datastore

• Migration from a remotely mounted vSAN datastore to a local vSAN datastore

• Migration between two remotely mounted vSAN datastores
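The cross-cluster vSphere vMotion condition reduces to a set check: every datastore that holds the VM's objects must be mounted by both clusters. The Python sketch below assumes a hypothetical `Vm` record that lists those datastores. It does not model migration within the client cluster, which is allowed regardless of datastore. The cluster and datastore names follow this lesson's screenshots, and the second VM is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    datastores: set[str]  # datastores holding the VM home, disk, and swap objects


def cross_cluster_vmotion_allowed(vm: Vm, src_mounts: set[str], dst_mounts: set[str]) -> bool:
    """Compute-only vMotion between clusters: all objects must sit on shared datastores."""
    shared = src_mounts & dst_mounts
    return bool(vm.datastores) and vm.datastores <= shared


client_mounts = {"vsanDatastore-Cluster01", "vsanDatastore-Cluster02"}  # SA-Cluster-01
server_mounts = {"vsanDatastore-Cluster02"}                             # SA-Cluster-02

on_remote = Vm("Ubuntu-1", {"vsanDatastore-Cluster02"})
on_local = Vm("Ubuntu-2", {"vsanDatastore-Cluster01"})  # hypothetical VM
print(cross_cluster_vmotion_allowed(on_remote, client_mounts, server_mounts))  # True
print(cross_cluster_vmotion_allowed(on_local, client_mounts, server_mounts))   # False
```

A VM on the client's local datastore would first need vSphere Storage vMotion to the shared datastore before a compute-only cross-cluster migration could succeed.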

9-60 VMware HCI Mesh Interoperability:
vSphere DRS
VMware HCI Mesh is compatible with vSphere DRS.

vSphere DRS rules are applicable on the client cluster in the following cases:

• VMs are stored on the local vSAN datastore.

• VMs are stored on a remotely mounted vSAN datastore.

In this context, vSphere DRS rules include standard rules on the client cluster, such as affinity and
anti-affinity rules.

9-61 Lab 10: Managing Remote vSAN
Datastore Operations
Migrate and run VMs from remote vSAN datastores:

1. Mount a Remote vSAN Datastore to a vSAN Cluster

2. Verify Connectivity to a Remote vSAN Datastore

3. Migrate VM Data to the Remote vSAN Datastore

4. View VM Remote Objects

5. Prepare for the Next Lab

6. Unmount a Remote vSAN Datastore

9-62 Review of Learner Objectives
• Describe VMware HCI Mesh technology

• Discuss VMware HCI Mesh architecture and use cases

• Describe VMware HCI Mesh common topologies

• Describe the VMware HCI Mesh client and server views

• Explain how to mount a remote vSAN datastore

• Explain VMware HCI Mesh license requirements and scalability limitations

• Discuss VMware HCI Mesh interoperability with vSphere features

9-63 Lesson 3: vSAN Direct

9-64 Learner Objectives


• Describe vSAN Direct as a new type of vSAN datastore

• Explain the use cases of vSAN Direct

• Describe how vSAN Direct datastores work with vSphere native Kubernetes

• Identify the operations involved in managing vSAN Direct datastores

9-65 About the vSAN Direct Datastore
vSAN Direct enables users to create a vSAN Direct datastore on a single blank hard drive on the
ESXi host. With vSAN Direct, users can manage local disks by using vSAN management
capabilities.

vSAN Direct features:

• vSAN Direct extends the hyperconverged infrastructure (HCI) management features of
vSAN to the local disks formatted with VMFS.

• vSAN Direct manages and monitors disks formatted with VMFS and provides insights into
the health, performance, and capacity of these disks.

• vSAN Direct enables users to host persistent services on VMFS storage.

• vSAN Direct enables users to define placement policies and quotas for the local disks.

9-66 vSAN Direct Use Cases
vSAN Direct is an excellent fit for cloud-native applications using Kubernetes. Various tiers of
cloud-native applications benefit from vSAN Direct.

Example use cases:

• Applications with high-end storage requirements that must support up to a million
IOPS, such as high-end Cassandra and MongoDB

• Applications with midrange storage requirements that require greater IOPS density, such as
Kafka (as a storage buffer)

• Applications with low-end storage requirements that must have a minimum total cost of
ownership

9-67 vSAN Direct Architecture
vSAN Direct supports cloud-native applications running in a vSphere 7.x supervisor Kubernetes
cluster.

Other Kubernetes clusters are not currently supported.

Each local disk is mapped to a single vSAN Direct datastore.

These datastores form the vSAN Direct storage pool. This pool can be claimed by Kubernetes in
the form of persistent volumes (PVs).

[Figure: vSAN Direct architecture. Workloads such as TensorFlow, Cassandra, and Microsoft SQL Server run as virtual machine, application, and Kubernetes services in a supervisor Kubernetes cluster used by DevOps. They consume vSAN Direct storage, which uses the VMFS-L file system, on the ESXi and vSAN workload platform managed by the VI admin, across on-premises, hybrid cloud, and public cloud environments.]

As a vSphere administrator, you enable vSphere clusters for Kubernetes workloads by
configuring them as supervisor clusters.

9-68 vSAN Direct with Kubernetes
Key concepts for using vSAN Direct with Kubernetes:

• Namespaces are logical entities in Kubernetes.

• Kubernetes persistent storage is in the form of PVs.

• PVs communicate with vSphere storage through the intermediary cloud-native storage
control plane.

• PVs can use vSAN Direct storage by employing tag-based policies.

[Figure: Cloud-native storage control plane, comprising Storage Policy-Based Management, vSAN Management Services, First-Class Disks, and Virtual Volumes]

9-69 Cloud-Native Operations Workflow
The diagram shows the steps taken by the vSphere administrator and the DevOps administrator
to use vSAN Direct:

1. The vSphere administrator provisions vSAN Direct datastores, using unformatted drives on
the ESXi hosts.

2. The vSphere administrator creates policies using tags to map the vSAN Direct datastores.

3. The DevOps administrator claims the PVs from the vSAN Direct storage pools identified by
the tag-based storage policies.

4. The DevOps administrator creates Kubernetes applications that consume the PVs residing
on vSAN Direct.
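For step 3, a DevOps administrator typically claims a PV by submitting a PersistentVolumeClaim. The Python sketch below builds such a manifest as a dictionary and prints it as JSON, which `kubectl apply -f` also accepts. It assumes that the vSAN Direct storage policy assigned to the namespace is exposed to Kubernetes as a storage class. The storage class name `vsandirectpolicytest`, the namespace, the claim name, and the size are hypothetical.

```python
import json


def pvc_manifest(name: str, namespace: str, storage_class: str, size_gi: int) -> dict:
    """Build a PersistentVolumeClaim that requests storage from a given storage class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


# Hypothetical names: claim, namespace, and the storage class backed by the vSAN Direct policy.
manifest = pvc_manifest("cassandra-data-0", "demo-ns", "vsandirectpolicytest", 10)
print(json.dumps(manifest, indent=2))  # save as pvc.json, then: kubectl apply -f pvc.json
```

In step 4, the application's pod spec then references the claim by name (`cassandra-data-0` here) so that the persistent service writes to vSAN Direct storage.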

[Figure: The vSphere administrator uses vCenter Server to create a namespace and assign the vSAN Direct storage policy. The DevOps administrator uses Kubernetes (K8s) to claim a PV with the vSAN Direct storage pool and consume the PV in a persistent service.]

9-70 Claiming Disks for vSAN Direct
Key points about claiming vSAN Direct disks:

• Each disk claimed creates a unique vSAN Direct datastore.

• An unformatted local disk drive is eligible for vSAN Direct.

• You can use the vSAN claim disk wizard to claim disks for vSAN Direct.

[Screenshot: Claim Unused Disks dialog: "Select disks to contribute to datastores: Claim disks as cache or capacity to add them to the vSAN datastore. Claiming disks for vSAN Direct will create a new datastore for each selected disk." Total claimed: 152.00 GB (100%); unclaimed storage: 0.00 B (0%). vSAN capacity: 60.00 GB (39.47%); vSAN cache: 32.00 GB (21.05%); vSAN Direct: 60.00 GB (39.47%). Grouped by host, a 15.00 GB local VMware disk (parallel SCSI, adapter vmhba0) on each of sa-esxi-01 to sa-esxi-04 is claimed for vSAN Direct.]
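The claim arithmetic in this dialog can be reproduced with a short Python sketch. Each disk claimed for vSAN Direct yields its own datastore, and the percentages are each category's share of the total claimed capacity. The per-host layout and the device names are assumptions chosen so that the totals match the dialog: one 8 GB cache disk, one 15 GB capacity disk, and one 15 GB vSAN Direct disk on each of four hosts. The datastore naming follows the `vSANDirect_<host>_<device>` pattern visible in this lesson's screenshots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    host: str
    device: str
    size_gb: float
    claim_for: str  # "cache", "capacity", or "vsan_direct"


def plan_claims(disks: list[Disk]):
    totals = {"cache": 0.0, "capacity": 0.0, "vsan_direct": 0.0}
    direct_datastores = []
    for d in disks:
        totals[d.claim_for] += d.size_gb
        if d.claim_for == "vsan_direct":
            # Each vSAN Direct disk becomes its own datastore.
            direct_datastores.append(f"vSANDirect_{d.host}_{d.device}")
    grand_total = sum(totals.values())
    shares = {k: round(100 * v / grand_total, 2) for k, v in totals.items()}
    return totals, shares, direct_datastores


disks = []
for h in [f"sa-esxi-0{i}.vclass.local" for i in range(1, 5)]:
    disks += [
        Disk(h, "mpx.vmhba0:C0:T0:L0", 8.0, "cache"),      # hypothetical device
        Disk(h, "mpx.vmhba0:C0:T1:L0", 15.0, "capacity"),  # hypothetical device
        Disk(h, "mpx.vmhba0:C0:T2:L0", 15.0, "vsan_direct"),
    ]

totals, shares, datastores = plan_claims(disks)
print(shares)         # {'cache': 21.05, 'capacity': 39.47, 'vsan_direct': 39.47}
print(datastores[0])  # vSANDirect_sa-esxi-01.vclass.local_mpx.vmhba0:C0:T2:L0
```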

9-71 After Claiming Disks for vSAN Direct
After claiming disks for vSAN Direct:

• vSAN Direct uses one disk per datastore.

• vSAN Direct can coexist with regular vSAN disk groups.

[Screenshot: Disk Management (Show: By Disk Groups) with the CLAIM UNUSED DISKS button. Each of sa-esxi-01, sa-esxi-02, and sa-esxi-03 shows 3 of 3 disks in use, state Connected, and network partition group 1. Each host has one all-flash disk group with 2 disks (Mounted, Healthy) and 1 vSAN Direct disk.]

9-72 Default Tags
A set of default vSAN Direct tags and categories is available. You can create your own tags to
use with your storage policies.

[Screenshot: Summary tab of datastore vSANDirect_sa-esxi-01.vclass.local_mpx.vmhba0:C0:T2:L0. Type: vSAN Direct Storage; URL: ds:///vmfs/volumes/5f577be1-52b4c492-3e85-00505601d5bd/; used: 1.41 GB. Under Tags, the assigned tag vSANDirect is in category vSANDirectStorage.]

9-73 Tag-Based Policies
vSAN Direct supports tag-based storage policies.

To create a tag-based vSAN Direct storage policy:

1. Tag the vSAN Direct datastores with the appropriate tags. A default vSAN Direct tag is
already assigned.

2. Create the VM storage policy.

3. Select Enable tag based placement rules.

[Screenshot: Create VM Storage Policy wizard, Policy structure page. The Host based services section reads "Create rules for data services provided by hosts. Available data services could include encryption, I/O control, caching, etc. Host based services will be applied in addition to any datastore specific rules." and has an Enable host based rules check box. The Datastore specific rules section reads "Create rules for a specific storage type to configure data services provided by the datastores. The rules will be applied when VMs are placed on the specific storage type." and has the Enable rules for "vSAN" storage and Enable tag based placement rules check boxes.]

9-74 Storage Compatibility
The supervisor Kubernetes cluster requires a storage policy to identify datastores to store PVs:

• After you create the vSAN Direct storage policy, matching datastores with the vSAN Direct
tag are available as compatible storage.

• These vSAN Direct datastores can be used by cloud-native applications as a common pool
for persistent data storage.
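Tag-based placement amounts to filtering datastores by tag. The Python sketch below assumes a hypothetical `Datastore` record and uses the default category and tag pair from this lesson (`vSANDirectStorage`, `vSANDirect`). With four tagged 14.75 GB vSAN Direct datastores, it reports 59 GB of compatible storage and excludes the untagged vSAN datastore. That 59 GB figure matches the Storage Compatibility view. Datastore names are shortened for readability.

```python
from dataclasses import dataclass, field

# Default category/tag pair assigned to vSAN Direct datastores in this lesson.
VSAN_DIRECT_TAG = ("vSANDirectStorage", "vSANDirect")


@dataclass
class Datastore:
    name: str
    capacity_gb: float
    tags: set[tuple[str, str]] = field(default_factory=set)  # (category, tag) pairs


def compatible_storage(datastores: list[Datastore],
                       required_tag: tuple[str, str] = VSAN_DIRECT_TAG) -> tuple[list[str], float]:
    """Tag-based placement: keep only datastores that carry the required tag."""
    matches = [d for d in datastores if required_tag in d.tags]
    return [d.name for d in matches], sum(d.capacity_gb for d in matches)


datastores = [Datastore(f"vSANDirect_sa-esxi-0{i}", 14.75, {VSAN_DIRECT_TAG}) for i in range(1, 5)]
datastores.append(Datastore("vsanDatastore", 59.97))  # regular vSAN datastore, untagged
names, capacity_gb = compatible_storage(datastores)
print(len(names), capacity_gb)  # 4 59.0
```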

[Screenshot: Storage Compatibility tab of the vSANDirectPolicyTest policy on sa-vcsa-01.vclass.local. Compatible storage: 59 GB (53.38 GB free). The list shows four vSAN Direct datastores in SA-Datacenter, one per host (sa-esxi-01 to sa-esxi-04), each with 13.34 GB free of 14.75 GB capacity.]

9-75 Capacity Reporting
You can monitor vSAN Direct capacity independently of vSAN capacity.

Select Monitor > vSAN > Capacity > CAPACITY USAGE and click vSAN Direct.
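The percentages on the Capacity Usage page are used space divided by capacity. They are computed separately for vSAN and vSAN Direct, which is why the two are monitored independently. The Python sketch below reproduces the figures shown on that page. The formatting function is illustrative, not a vSphere API.

```python
def usage_line(used_gb: float, capacity_gb: float) -> str:
    """Format usage as used/capacity (percent), one datastore type at a time."""
    pct = 100 * used_gb / capacity_gb
    return f"{used_gb:.2f} GB/{capacity_gb:.2f} GB ({pct:.2f}%)"


# vSAN and vSAN Direct are reported separately; their capacities are not summed.
print("vSAN:", usage_line(3.58, 59.97))         # vSAN: 3.58 GB/59.97 GB (5.97%)
print("vSAN Direct:", usage_line(5.62, 59.00))  # vSAN Direct: 5.62 GB/59.00 GB (9.53%)
print(f"vSAN Direct free: {59.00 - 5.62:.2f} GB")  # vSAN Direct free: 53.38 GB
```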

[Screenshot: Monitor > vSAN > Capacity > CAPACITY USAGE. vSAN usage: 3.58 GB/59.97 GB (5.97%); vSAN Direct usage: 5.62 GB/59.00 GB (9.53%). The vSAN Direct capacity overview shows used space of 5.62 GB/59.00 GB (9.53%), free space on disks of 53.38 GB, and actually written data of 5.62 GB (9.53%). The usage breakdown shows system usage of 5.62 GB (100%) out of a total usage of 5.62 GB.]

9-76 Review of Learner Objectives
• Describe vSAN Direct as a new type of vSAN datastore

• Explain the use cases of vSAN Direct

• Describe how vSAN Direct datastores work with vSphere native Kubernetes

• Identify the operations involved in managing vSAN Direct datastores

9-77 Lesson 4: vSAN iSCSI Target Service

9-78 Learner Objectives


• Describe the vSAN iSCSI target service

• Discuss vSAN iSCSI target service networking

• Detail steps to configure the vSAN iSCSI target service

9-79 About the vSAN iSCSI Target Service
With the vSAN iSCSI target service, a remote host with an iSCSI initiator can transport block-level
data to an iSCSI target in the vSAN cluster. You can configure one or more iSCSI targets in
your vSAN cluster to provide block storage to legacy servers.

You can add one or more iSCSI targets that provide storage objects as logical unit numbers
(LUNs). Each iSCSI target is identified by its own unique iSCSI qualified name.
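Each target is identified by an iSCSI qualified name (IQN). The Python sketch below checks the general `iqn.` naming format from RFC 3720: `iqn.<yyyy-mm>.<reversed domain name>[:<unique string>]`. It is a simplified, lowercase-only validator that ignores the other iSCSI name types such as `eui.`. The sample names are hypothetical, not names generated by vSAN.

```python
import re

# Simplified iqn. format: "iqn." + year-month + "." + reversed domain + optional ":" suffix.
IQN_PATTERN = re.compile(r"iqn\.\d{4}-(0[1-9]|1[0-2])\.[a-z0-9-]+(\.[a-z0-9-]+)*(:\S+)?")


def is_valid_iqn(name: str) -> bool:
    return IQN_PATTERN.fullmatch(name) is not None


print(is_valid_iqn("iqn.1998-01.com.vmware:target01"))  # True
print(is_valid_iqn("iqn.98-01.com.vmware"))             # False (year must have 4 digits)
```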

[Figure: A legacy server connects over the iSCSI network to iSCSI targets in the vSAN cluster. Each target exposes iSCSI objects stored on the vSAN datastore, which spans NVMe and SSD disk groups on the vSphere vSAN hosts.]

Use the vSAN iSCSI target service to enable hosts and physical workloads that reside outside
the vSAN cluster to access the vSAN datastore.

After configuring the vSAN iSCSI target service, you can discover the vSAN iSCSI targets from
a remote host. To discover vSAN iSCSI targets, use the IP address of any host in the vSAN
cluster and the TCP port of the iSCSI target.

9-80 vSAN iSCSI Target Service Networking
Before enabling the vSAN iSCSI target service, you must configure your ESXi hosts with
VMkernel ports and NICs that are connected to the iSCSI network.

[Figure: On an ESXi host, VMkernel port vmk2 carries vSAN iSCSI traffic between the iSCSI targets and the iSCSI network in the storage network infrastructure]

iSCSI storage traffic is transmitted in an unencrypted format across the LAN. Therefore, a best
practice is to use iSCSI on trusted networks only and to isolate the traffic on separate physical
switches or a dedicated LAN.

To ensure high availability of the vSAN iSCSI target, configure multipath support for your iSCSI
application. You can use the IP addresses of two or more hosts to configure the multipath.
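The multipath requirement can be expressed as a small check: the initiator should receive portals on at least two distinct vSAN hosts. This is a hypothetical helper, not part of any VMware tool, and the 3260 default port is an assumption.

```python
from typing import List


def multipath_portals(host_ips: List[str], port: int = 3260, min_paths: int = 2) -> List[str]:
    """Return one iSCSI portal per distinct vSAN host, requiring at least min_paths hosts."""
    unique_hosts = list(dict.fromkeys(host_ips))  # drop duplicates, keep original order
    if len(unique_hosts) < min_paths:
        raise ValueError(
            f"multipath needs at least {min_paths} distinct hosts, got {len(unique_hosts)}"
        )
    return [f"{ip}:{port}" for ip in unique_hosts]
```

Rejecting duplicate addresses matters because two portals on the same host do not protect against that host failing.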

9-81 Enabling and Using the vSAN iSCSI Target Service
To enable and use the vSAN iSCSI target service:

1. Enable the iSCSI target service.

2. Create an iSCSI target.

3. Add a LUN to an iSCSI target.

4. Create an iSCSI initiator group.

5. Assign a target to an iSCSI initiator group.
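The ordering constraints in these five steps can be illustrated with a small in-memory model. The class and method names are hypothetical and do not correspond to a vSphere API; the model only shows which objects must exist before the next step can succeed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IscsiTargetService:
    """Hypothetical in-memory model of the five-step vSAN iSCSI workflow."""
    enabled: bool = False
    targets: Dict[str, List[int]] = field(default_factory=dict)  # target IQN -> LUN IDs
    groups: Dict[str, Set[str]] = field(default_factory=dict)    # group name -> target IQNs

    def enable(self) -> None:                           # step 1
        self.enabled = True

    def create_target(self, iqn: str) -> None:          # step 2
        if not self.enabled:
            raise RuntimeError("enable the iSCSI target service first")
        self.targets.setdefault(iqn, [])

    def add_lun(self, iqn: str, lun_id: int) -> None:   # step 3
        if iqn not in self.targets:
            raise KeyError(f"unknown target {iqn}")
        self.targets[iqn].append(lun_id)

    def create_group(self, name: str) -> None:          # step 4
        self.groups.setdefault(name, set())

    def assign(self, group: str, iqn: str) -> None:     # step 5
        if group not in self.groups or iqn not in self.targets:
            raise KeyError("create the initiator group and the target first")
        self.groups[group].add(iqn)
```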

9-82 vSAN iSCSI LUN Objects
vSAN iSCSI objects can appear as unassociated in vSAN storage reports because they are not
mounted directly into a VM as a VMDK. Instead, vSAN iSCSI objects are mounted through the iSCSI
initiator of a VM's guest OS.

Be aware that because a vSAN iSCSI LUN object is not directly mounted into a VM, it is listed as
unassociated. This designation does not mean that the vSAN iSCSI LUN object is unused or safe
to delete.
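When cleaning up unassociated objects, a script should treat iSCSI LUN objects as in use. The sketch below filters hypothetical object records; the `object_class` labels such as `"iscsiLun"` are assumptions for illustration, not fields returned by a vSAN API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VsanObject:
    uuid: str
    object_class: str        # hypothetical label, e.g. "vdisk", "iscsiLun", "other"
    associated_vm: str = ""  # empty when no VM owns the object


def deletion_review_list(objects: List[VsanObject]) -> List[VsanObject]:
    """Return unassociated objects that still need a manual review before deletion.

    Unassociated iSCSI LUN objects are excluded because they are mounted through
    an iSCSI initiator rather than as a VMDK, so "unassociated" does not mean "unused".
    """
    return [o for o in objects if not o.associated_vm and o.object_class != "iscsiLun"]
```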

9-83 Lab 11: Configuring a vSAN iSCSI Target
Configure an iSCSI LUN and connect to it from the student desktop:

1. Enable the vSAN iSCSI Target Service

2. Create a vSAN iSCSI Target and LUN

3. Connect to the vSAN iSCSI LUN

9-84 Review of Learner Objectives
• Describe the vSAN iSCSI target service

• Detail vSAN iSCSI target service networking

• Detail steps to configure the vSAN iSCSI target service

9-85 Key Points


• vSAN file shares can be accessed from traditional workloads and cloud-native applications.

• With vSAN File Service, you can provision NFS and SMB file shares on your existing vSAN
datastore.

• Using VMware HCI Mesh, you can remotely mount datastores from other vSAN clusters.

• A vSAN cluster can act as a storage-only cluster using VMware HCI Mesh.

• vSAN Direct creates standalone datastores on local disk drives.

• vSAN Direct is most suitable for cloud-native application persistent storage.

• You can configure vSAN iSCSI targets to provide block storage to legacy servers.

Questions?

Module 10
vSAN Cluster Maintenance

10-2 Importance
To maximize the availability of services in your environment, the maintenance of compute,
network, and storage resources on production systems must be achieved without causing
downtime.

10-3 Module Lessons


1. vSAN Cluster Maintenance Operations

2. vSAN Cluster Scaling and Hardware Replacement

3. Upgrading and Updating vSAN

10-4 Lesson 1: vSAN Cluster Maintenance Operations

10-5 Learner Objectives


• Describe vSAN maintenance mode and data evacuation options

• Detail how to assess the vSAN maintenance mode precheck reports

• Define the steps to shut down a vSAN cluster for maintenance

• Explain how to migrate a vSAN cluster to a new vCenter Server instance

• List best practices for vSAN logs and traces

10-6 Maintenance Mode Options
ESXi hosts in vSAN clusters provide storage resources in addition to compute resources.

You must use appropriate maintenance mode options to maintain data accessibility.

When placing the host into maintenance mode, you can select one of the following vSAN data
migration options:

• Ensure accessibility

• Full data migration

• No data migration

10-7 About the Data Migration Precheck
You run the data migration precheck before placing a host into maintenance mode. This
precheck determines whether the operation can succeed and reports the state of the cluster
after the host enters maintenance mode.

You can select a host and the type of vSAN data migration to test.

To start the data migration precheck, select the vSAN cluster and select Monitor > vSAN >
Data Migration Pre-Check.

[Screenshot: Data Migration Pre-check. "Select a host, disk group, or disk, and check the impact on the cluster if the object is removed or placed into maintenance mode." Pre-check data migration for: sa-esxi-01.vclass.local. vSAN data migration: Ensure accessibility. PRE-CHECK. Result: No valid test is available for the selected entity and vSAN data migration option.]

10-8 About the Ensure Accessibility Option
The Ensure accessibility option ensures that VMs with FTT = 0 remain accessible.

Unprotected components on the host, such as those of objects with FTT = 0, are migrated to other
hosts. Components of objects with FTT > 0 are not migrated. If enough components to
maintain quorum are active on other hosts in the cluster, the objects remain available. However,
the objects are noncompliant while the host is in maintenance mode.
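The quorum idea can be sketched with a simplified model of an FTT = 1 RAID-1 object (two data replicas plus a witness). The sketch assumes one vote per component and treats the object as accessible when more than half of the votes are active and at least one data replica is active; real vSAN vote assignment can be more complex, so this is illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    host: str
    is_witness: bool = False
    active: bool = True
    votes: int = 1  # simplification: real vSAN can assign more than one vote


def object_accessible(components: List[Component]) -> bool:
    """Simplified rule: a vote majority is active and at least one data replica is active."""
    total = sum(c.votes for c in components)
    active_votes = sum(c.votes for c in components if c.active)
    has_data = any(c.active and not c.is_witness for c in components)
    return active_votes * 2 > total and has_data


if __name__ == "__main__":
    obj = [Component("esxi-01"), Component("esxi-03"), Component("esxi-02", is_witness=True)]
    obj[0].active = False          # esxi-01 enters maintenance mode with Ensure accessibility
    print(object_accessible(obj))  # True: still accessible, but noncompliant
```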

[Figure: Enter maintenance mode with Ensure accessibility. VM1 and VM2 with components C1 and C2 on a four-host vSAN cluster (ESXi1 to ESXi4); after the host enters maintenance mode, its component is marked as absent.]

10-9 Ensure Accessibility: Assessing Impact
The data migration precheck provides a list of VMs and objects that might become
noncompliant.

[Screenshot: Data Migration Pre-check for sa-esxi-04.vclass.local with Ensure accessibility. Latest test result: The host can enter maintenance mode. Object Compliance and Accessibility: The following objects will be directly affected by the operation. 4 objects will become non-compliant. A rebuild operation for any non-compliant objects will be triggered in 60 minutes, unless the host is taken out of maintenance mode. You can change this timer for the cluster in the vSAN advanced settings.]

Name                            Type      Result          Storage Policy
sa-vm-01                        VM
  Hard disk 1                   Disk      Non-compliant   vSAN Default Storage Policy
sa-vm-02                        VM
  Hard disk 1                   Disk      Non-compliant   vSAN Default Storage Policy
  VM home                       Folder    Non-compliant   vSAN Default Storage Policy
  Virtual machine swap object   VM swap   Non-compliant   vSAN Default Storage Policy

10-10 Ensure Accessibility: Delta Component
If an additional host with sufficient capacity is available in the vSAN cluster, a temporary delta
component is created to capture the new I/O.

The delta component contains only the new data generated after the original component is
marked absent.

After the absent component is back online, it syncs with the delta component. The delta
component is then discarded after the sync operation is complete.

The delta component significantly reduces the overall time required to take a component from
Active-Stale to Active state.
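To show why a delta component shortens resynchronization, the sketch below models components as dictionaries of block offset to data. This is a hypothetical model, not vSAN's on-disk format: the delta records only the writes issued after the original component was marked absent, so the resync copies those blocks instead of the whole component.

```python
from typing import Dict, List, Tuple

Write = Tuple[int, bytes]  # (block offset, data)


class DeltaComponent:
    """Hypothetical model: captures only writes issued after a component is marked absent."""

    def __init__(self) -> None:
        self.writes: List[Write] = []

    def capture(self, offset: int, data: bytes) -> None:
        self.writes.append((offset, data))

    def resync_into(self, returning: Dict[int, bytes]) -> int:
        """Apply captured writes to the returning component and return the blocks copied."""
        for offset, data in self.writes:
            returning[offset] = data
        copied = len(self.writes)
        self.writes.clear()  # the delta is discarded once the sync is complete
        return copied
```

For example, if two blocks change while a host is in maintenance mode, `resync_into` copies two blocks, regardless of how large the component is.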

[Figure: VM2's component C1 is marked as absent, and a delta component on another host captures the new I/Os. C2 and the vSAN witness remain active. Hosts ESXi1 to ESXi4.]

10-11 Delta Components in the vSphere Client
The delta component and the original absent component are linked in a special RAID-type
structure called RAID_D. You can view the component layout in the vSphere Client.

Type                      Component State   Host
Hard disk 1 (RAID 1)
  Witness                 Active            sa-esxi-01.vclass.local
  RAID_D
    Component             Active            sa-esxi-04.vclass.local
    Component             Absent            sa-esxi-02.vclass.local
  Component               Active            sa-esxi-03.vclass.local

[Figure: The absent component and the active delta component are linked under RAID_D across hosts sa-esxi-01 to sa-esxi-04.vclass.local.]

10-12 Ensure Accessibility: Time Considerations

If the host does not return, absent components are rebuilt after 60 minutes.

[Figure: After 60 minutes, the absent component C1 of VM2 is rebuilt on another host, while C2 and the vSAN witness remain active. Hosts ESXi1 to ESXi4.]

10-13 Object Repair Timer Considerations
You might need to increase the Object Repair Timer value when planned maintenance is likely to
take more than 60 minutes but you want to avoid rebuild operations.

Consider the following points:

• Rebuild operations are designed to restore redundancy. The higher the Object Repair Timer
value, the longer your data is vulnerable to additional failures.

• You should reset the Object Repair Timer value to the default value when maintenance is
complete.
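The timer trade-off can be sketched as two small functions. The 60-minute default comes from this lesson; the 15-minute safety margin in `timer_for_window` is a hypothetical choice, not a VMware recommendation.

```python
DEFAULT_REPAIR_TIMER_MIN = 60  # default Object Repair Timer value, in minutes


def rebuild_triggered(minutes_absent: float, repair_timer_min: int = DEFAULT_REPAIR_TIMER_MIN) -> bool:
    """True when absent components would be rebuilt because the host did not return in time."""
    return minutes_absent >= repair_timer_min


def timer_for_window(planned_minutes: int, margin_minutes: int = 15) -> int:
    """Hypothetical helper: a temporary timer value that covers a planned maintenance window.

    Never go below the default, and reset the timer to the default after maintenance,
    because a longer timer leaves data exposed to additional failures for longer.
    """
    return max(DEFAULT_REPAIR_TIMER_MIN, planned_minutes + margin_minutes)
```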

10-14 Object Inaccessibility: Example (1)

All VM components are healthy.

Hard disk 1 (RAID 1)
  Witness     Active   sa-esxi-02.vclass.local
  Component   Active   sa-esxi-01.vclass.local
  Component   Active   sa-esxi-03.vclass.local

vSAN host sa-esxi-01 is put into maintenance mode with Ensure data accessibility from other
hosts selected.

Hard disk 1 (RAID 1)
  Witness     Active   sa-esxi-02.vclass.local
  Component   Absent   sa-esxi-01.vclass.local
  Component   Active   sa-esxi-03.vclass.local

In a healthy vSAN environment, hosts are placed into maintenance mode for software updates
or patching. In this scenario, host sa-esxi-01 is placed into maintenance mode.

10-15 Object Inaccessibility: Example (2)
While vSAN host sa-esxi-01 is in maintenance mode, host sa-esxi-03 unexpectedly becomes
unavailable.

Hard disk 1 (RAID 1)
  Component   Absent   sa-esxi-01.vclass.local
  Component   Absent
  Witness     Active   sa-esxi-02.vclass.local

Because both submirrors are absent, vm01 becomes disconnected.

Host inventory:
  sa-esxi-01.vclass.local (Maintenance Mode)
  sa-esxi-02.vclass.local
  sa-esxi-03.vclass.local (Not responding)
  vm01 (disconnected)

While host sa-esxi-01 was in maintenance mode, a failure occurred that made host sa-esxi-03
unavailable. Because two of the three components are offline, the VM becomes disconnected.

After host sa-esxi-01 is taken out of maintenance mode and comes back online, its component
becomes Active. However, the sequence number of the component on host sa-esxi-01 is
outdated, and the VM remains disconnected. In other words, the component on host sa-esxi-01
is missing the most recent changes, which vSAN is aware of because of the difference in
sequence numbers.

A component is Absent when vSAN detects a temporary component failure from which the
component, including all its data, might recover and return vSAN to its original state. This state
might occur when you restart hosts or unplug a device from a vSAN host. By default, vSAN
starts to rebuild absent components after waiting for 60 minutes.

10-16 Object Inaccessibility: Example (3)

Host sa-esxi-01 is taken out of maintenance mode to resume the operations of the VM.

The component remains absent.

Hard disk 1 (RAID 1)
  Component   Absent   sa-esxi-01.vclass.local
  Component   Absent
  Witness     Active   sa-esxi-02.vclass.local

The VM remains disconnected because of stale components.


[root@sa-esxi-02:/var/log] grep stale vmkernel.log
2020-05-28T17:01:59.670Z cpu3:263136)HBX: 4720: 1 stale HB slot(s) owned by me have been garbage collected on vol 'OSDATA-5e23537f-a781ac54-a3be-005056029973'
2020-05-28T17:12:22.073Z cpu3:262496)PLOG: PLOGMapDataPartition:2592: VSAN device mpx.vmhba0:C0:T3:L0:2 not ready to map. Info: is stale 1, ann. 1 grpHandle 0x431315e10900
2020-05-28T17:12:22.073Z cpu3:262496)PLOG: PLOGMapDataPartition:2592: VSAN device mpx.vmhba0:C0:T2:L0:2 not ready to map. Info: is stale 1, ann. 1 grpHandle 0x431315e10900
2020-05-28T17:12:27.118Z cpu2:264925)PLOG: PLOGMapDataPartition:2592: VSAN device mpx.vmhba0:C0:T3:L0:2 not ready to map. Info: is stale 1, ann. 1 grpHandle 0x431315e10900
2020-05-28T17:12:27.118Z cpu2:264925)PLOG: PLOGMapDataPartition:2592: VSAN device mpx.vmhba0:C0:T2:L0:2 not ready to map. Info: is stale 1, ann. 1 grpHandle 0x431315e10900

Host sa-esxi-01 is taken out of maintenance mode and comes back online, and the component
becomes Active. However, the sequence number of the component on host sa-esxi-01 is
outdated, and the component on host sa-esxi-03 is still offline. In other words, the component
on host sa-esxi-01 is missing the most recent changes, which vSAN is aware of because of the
difference in sequence numbers.

Even though two of the three components that make up the object are active, vSAN keeps the
object inaccessible to avoid data loss or corruption. The VM remains disconnected until the
component on host sa-esxi-03 is online with the most recent data. vSAN then synchronizes the stale
component with the component that contains the latest data and enables access to the object.

If a component is Active but its sequence number is different from or older than the current
sequence number for the object, the component is marked as Stale. This behavior usually occurs
when the components of an object go offline and come back online at different times.
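The sequence-number logic in this example can be sketched as a simplified model of the two data replicas (the witness is left out). The sequence values 100 and 120 are hypothetical. The point is that the returning sa-esxi-01 replica is online but Stale, so the object cannot serve I/O until sa-esxi-03, which holds the latest data, returns.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Replica:
    host: str
    online: bool
    seq: int  # hypothetical: last sequence number this replica has applied


def classify(replicas: List[Replica], object_seq: int) -> Dict[str, str]:
    """Label each data replica Absent, Stale, or Active (simplified model)."""
    labels: Dict[str, str] = {}
    for r in replicas:
        if not r.online:
            labels[r.host] = "Absent"
        elif r.seq != object_seq:  # online, but missing the latest changes
            labels[r.host] = "Stale"
        else:
            labels[r.host] = "Active"
    return labels


def has_current_data(replicas: List[Replica], object_seq: int) -> bool:
    """The object can serve I/O only if an online replica holds the latest data."""
    return any(r.online and r.seq == object_seq for r in replicas)
```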

10-17 About the Full Data Migration Option
The Full data migration option evacuates all components from the disk groups of the host
entering maintenance mode onto other available ESXi hosts.

You use this option only when the host is being decommissioned, permanently removed, or put
into maintenance mode for a considerably long time.

The remaining hosts in the vSAN cluster must be able to satisfy the policy requirements of the
objects being evacuated.

10-18 Full Data Migration: Component Placement
If you select the Full data migration option, vSAN determines the placement of each component.

[Figure: Enter maintenance mode with Full data migration. VM1 and VM2 components (C1, C2) and the vSAN witness on hosts ESXi1 to ESXi4, before and after the components on the host entering maintenance mode are moved to the remaining hosts.]

10-19 Full Data Migration: Cluster Size Considerations
To use the Full data migration option, you must have additional ESXi hosts available in the vSAN
cluster.

[Figure: Replica and witness placement in a three-host cluster (ESXi1 to ESXi3) compared with a four-host cluster (ESXi1 to ESXi4).]

10-20 Full Data Migration: Assessing Impact
Before you begin a full data migration, run the data migration precheck to understand the
potential impact on the cluster.

[Screenshot: Data Migration Pre-check for sa-esxi-01.vclass.local with Full data migration. Latest test result (08/11/2020, 8:35:01 AM): The host can enter maintenance mode. 1.02 GB of data will be moved. Cluster capacity before: Used 6.11 GB, Total 199.97 GB (3%). Cluster capacity after: Used 5.26 GB, Total 149.98 GB (4%). Object predicted capacity and requirements: each host provides 49.99 GB; sa-esxi-01.vclass.local shows "Maintenance mode - no capacity".]
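The totals in this precheck can be checked with simple arithmetic. The helpers below are hypothetical and only reproduce the total-capacity drop and the utilization percentage. They do not model the predicted "used" value, which the precheck computes from actual object placement.

```python
def capacity_after_removal(total_gb: float, host_gb: float) -> float:
    """Cluster capacity left when one host's capacity is taken out of service."""
    return round(total_gb - host_gb, 2)


def utilization_pct(used_gb: float, total_gb: float) -> float:
    """Used capacity as a percentage of total capacity, rounded to two decimals."""
    return round(100 * used_gb / total_gb, 2)
```

With the values above, removing one 49.99 GB host from 199.97 GB leaves 149.98 GB, and 6.11 GB used out of 199.97 GB is about 3.06%.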

10-21 About the No Data Migration Option
When you select the No data migration option, vSAN does not evacuate any data from the
host. As a result, some VM objects might become inaccessible.

Selecting this option can leave objects in a noncompliant or inaccessible state.

The No data migration option is useful when you want to shut down all hosts in a vSAN cluster
for maintenance or when data on the hosts is not required.

Selecting the No data migration option does not create delta components.
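The guidance for the three data migration options can be condensed into a small decision helper. This is a hypothetical rule of thumb, not a VMware tool. The enum values reuse the advanced setting values (ensureAccessibility, evacuateAllData, noAction) that appear in this lesson.

```python
from enum import Enum


class DataMigration(Enum):
    ENSURE_ACCESSIBILITY = "ensureAccessibility"
    FULL_DATA_MIGRATION = "evacuateAllData"
    NO_DATA_MIGRATION = "noAction"


def choose_option(decommissioning: bool, long_outage: bool, whole_cluster_shutdown: bool) -> DataMigration:
    """Hypothetical rule of thumb mirroring the guidance in this lesson."""
    if whole_cluster_shutdown:
        return DataMigration.NO_DATA_MIGRATION     # all hosts go down anyway
    if decommissioning or long_outage:
        return DataMigration.FULL_DATA_MIGRATION   # host is removed or away for a long time
    return DataMigration.ENSURE_ACCESSIBILITY      # default for short, planned maintenance
```

The precheck should still be run for whatever option the helper suggests.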

10-22 No Data Migration: Assessing Impact
Before selecting No data migration, run the data migration precheck to understand the
potential impact on the vSAN objects in the cluster.

[Screenshot: Data Migration Pre-check for sa-esxi-01.vclass.local with No data migration. Latest test result (08/11/2020, 8:44:46 AM): The host can enter maintenance mode. Tabs: Object Compliance and Accessibility, Cluster Capacity, Predicted Health. The following health checks could be directly affected by the operation: vSAN object health. Health/Objects: Component, Reduced availability with no rebuild, object count 7.]

10-23 Changing the Default Maintenance Mode
The default vSAN maintenance mode is Ensure accessibility, which can be changed through an
advanced host-level setting (VSAN.DefaultHostDecommissionMode). This setting must be identical
on all hosts in the cluster.

In the vSphere Client, select the ESXi host and select Configure > Advanced System Settings.
The available options are ensureAccessibility, evacuateAllData, and noAction.

[Screenshot: Edit Advanced System Settings for sa-esxi-01.vclass.local, filtered on VSAN. Warning: Modifying configuration parameters is unsupported and can cause instability. Continue only if you know what you are doing.]

Name                                  Value
VSAN.ClomRebalanceThreshold           80
VSAN.DedupScope                       0
VSAN.DefaultHostDecommissionMode      ensureAccessibility
VSAN.DomBriefIoTraces                 0
VSAN.DomFullIoTraces                  0
VSAN.DomLongOpTraceMS                 1000
VSAN.DomLongOpUrgentTraceMS           10000
VSAN.MaxComponentsPerWitness          0
VSAN.MaxWitnessClusters               0
(66 items)

Description of VSAN.DefaultHostDecommissionMode: Default host decommission mode for a given node.
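Because the setting must match on every host, a consistency check is useful before maintenance. The sketch validates hypothetical per-host values that you have collected. It also formats an esxcli command in the general `esxcli system settings advanced set -o <option> -s <value>` form. Treat that command form as an assumption to verify against your ESXi release before use.

```python
from typing import Dict, List

VALID_MODES = {"ensureAccessibility", "evacuateAllData", "noAction"}


def check_decommission_mode(host_settings: Dict[str, str]) -> List[str]:
    """Return problems with VSAN.DefaultHostDecommissionMode across hosts (empty list = OK)."""
    problems = [f"{h}: invalid value {v!r}" for h, v in host_settings.items() if v not in VALID_MODES]
    if len(set(host_settings.values())) > 1:
        problems.append("value differs between hosts; it must be identical on all hosts")
    return problems


def esxcli_command(mode: str) -> str:
    """Assumed esxcli form for setting the option on one host (verify before use)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return f"esxcli system settings advanced set -o /VSAN/DefaultHostDecommissionMode -s {mode}"
```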

10-24 Planned Maintenance
When performing maintenance, you must plan your tasks to avoid failures and consider the
following recommendations:

• Unless Full data migration is selected, components on a host become absent when the host
enters maintenance mode, which counts as a failure.

• Data loss can occur if too many unrecoverable failures occur and no backups exist.

• Never reboot, disconnect, or disable more hosts than the FTT values allow.

• Never start another maintenance activity before all resyncs are completed.

• Never put a host into maintenance mode if another failure exists in the cluster.

• Never use FTT = 0 without application-level protection.
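These recommendations can be turned into a simple preflight check before a host enters maintenance mode. The `ClusterState` fields are hypothetical inputs that you would gather yourself, for example from Skyline Health and the resync view. The rule is deliberately conservative.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    min_ftt: int              # lowest FTT among the storage policies in use
    hosts_offline: int        # hosts already failed, disconnected, or in maintenance mode
    resync_in_progress: bool  # any resync still running


def safe_to_start_maintenance(state: ClusterState) -> bool:
    """Hypothetical preflight check based on the planned-maintenance recommendations."""
    if state.resync_in_progress:   # never start before all resyncs are completed
        return False
    if state.hosts_offline > 0:    # never enter maintenance mode if another failure exists
        return False
    return state.min_ftt >= 1      # FTT = 0 data has no redundancy to fall back on
```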

10-25 About vSAN Disk Balance
The vSAN Disk Balance health check helps to monitor the balance state among disks.

By default, automatic rebalance is disabled. The status of this check turns yellow if the imbalance
exceeds a system-determined threshold.

[Screenshot: Skyline Health > vSAN Disk Balance. Last checked: 08/11/2020, 7:07:12 AM. Buttons: RETEST, CONFIGURE AUTOMATIC REBALANCE. Cluster checks listed: Advanced vSAN configuration in sync, vSAN daemon liveness, vSAN Disk Balance, Resync operations throttling, vCenter state is authoritative, vSAN cluster configuration consistency.]

Metric                   Value
Average Disk Usage       3%
Maximum Disk Usage       3%
Maximum Load Variance    1%
Average Load Variance    0%

10-26 About Automatic Rebalance
When automatic rebalance is enabled, vSAN automatically rebalances the cluster to keep the
disk balance status green.

Rebalancing can wait up to 30 minutes to start, giving high-priority tasks, such as entering
maintenance mode or object repair, time to use resources before rebalancing.

The rebalancing threshold determines when background rebalancing starts in the system.

For example, with the default threshold of 30%, rebalancing begins if any two disks in the cluster
have a variance of 30%. Rebalancing continues until it is turned off or the variance between disks
is less than half of the rebalancing threshold.
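The start and stop rules form a hysteresis band, which prevents rebalancing from toggling on and off around a single value. The sketch assumes that "variance" means the difference in usage percentage between the fullest and emptiest disks. That interpretation is an assumption; vSAN's internal metric may be computed differently.

```python
from typing import List


def disk_variance_pct(usage_pct: List[float]) -> float:
    """Largest usage difference between any two disks, in percentage points (assumed metric)."""
    return max(usage_pct) - min(usage_pct)


def next_rebalancing_state(usage_pct: List[float], rebalancing: bool, threshold_pct: float = 30.0) -> bool:
    """Return True if background rebalancing should be running after this check."""
    variance = disk_variance_pct(usage_pct)
    if not rebalancing:
        return variance >= threshold_pct       # start when any two disks differ by the threshold
    return variance >= threshold_pct / 2       # continue until below half the threshold
```

For example, disks at 20% and 40% usage (a 20-point variance) do not start rebalancing. However, if rebalancing is already running, it continues until the variance drops below 15 points.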

10-27 Enabling Automatic Rebalance
To enable automatic rebalance and set a rebalancing threshold, select the vSAN cluster and
select Configure > vSAN > Services > Advanced Options > Edit.

[Screenshot: Advanced Options dialog for SA-VSAN-01.]

Object repair timer: 60
The number of minutes vSAN waits before repairing an object after a host is either in a failed state (absent failures) or in Maintenance Mode.

Site read locality
When enabled, reads to vSAN objects occur locally. When disabled, reads occur across both sites for stretched clusters.

Thin swap
When enabled, swap objects will not reserve 100% of their space on the vSAN datastore; storage policy reservation will be respected.

Large cluster support
By default, a vSAN cluster can grow up to 32 nodes. When you set this option, a vSAN cluster can grow up to a maximum of 64 nodes. If you change this option before vSAN is enabled, the change applies to the hosts immediately. If you change the option on an existing vSAN cluster, you must reboot all hosts to apply the change. After setting this option, review the vSAN health check result to determine which hosts require rebooting.

Automatic rebalance
When the cluster is unbalanced, rebalance starts automatically after enabling automatic rebalance. Rebalance can wait up to 30 minutes to start, giving time to high-priority tasks like EMM, repair, etc. to use the resources before rebalancing.

Rebalancing threshold %: 30
Determines when background rebalancing starts in the system. If any two disks in the cluster have this much variance, then rebalancing begins. It will continue until it is turned off or the variance between disks is less than 1/2 of the rebalancing threshold.

10-28 Reserving vSAN Storage Capacity (1)
You can reserve vSAN storage capacity for the following maintenance activities:

• Operations reserve: Reserves capacity for internal vSAN operations, such as object rebuild
or repair.

• Host rebuild reserve: Reserves capacity to ensure that all objects can be rebuilt if a host
failure occurs. To enable this reserve, you must have a minimum of four hosts.

When reservation is enabled and capacity usage reaches the limit, new workloads fail to deploy.

[Figure: Reserve capacity and usable capacity on a four-host cluster, compared for vSAN 7 and vSAN 7 U1.]
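The effect of reservations on new deployments can be illustrated with a hypothetical admission check. Two values are assumptions, not documented vSAN sizing rules: the host rebuild reserve is approximated as one host's share of capacity (equal-sized hosts), and the operations reserve is a caller-supplied percentage.

```python
def can_deploy(request_gb: float, used_gb: float, total_gb: float, hosts: int,
               ops_reserve_pct: float = 0.0, host_rebuild_reserve: bool = False) -> bool:
    """Hypothetical admission check: does a new workload fit outside the reserved capacity?"""
    if host_rebuild_reserve and hosts < 4:
        raise ValueError("host rebuild reserve requires at least four hosts")
    reserved = total_gb * ops_reserve_pct / 100
    if host_rebuild_reserve:
        reserved += total_gb / hosts  # assumption: capacity of one equal-sized host
    return used_gb + request_gb <= total_gb - reserved
```

When the check returns False, the workload is refused even though raw free space exists. This matches the statement that new workloads fail to deploy once usage reaches the reservation limit.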

10-29 Reserving vSAN Storage Capacity (2)
To enable vSAN capacity reserve for internal operations and host rebuild, select the vSAN
cluster and select Monitor > vSAN > Capacity > Capacity Usage > Configure.

[Screenshot: Enable Capacity Reserve dialog for SA-VSAN-01. "Enabling operations reserve for vSAN helps ensure that there will be enough space in the cluster for internal operations to complete successfully. Enabling host rebuild reserve allows vSAN to tolerate one host failure. When reservation is enabled and capacity usage reaches the limit, new workloads fail to deploy." Options: Operations reserve, Host rebuild reserve.]

The reserved capacity is displayed in the capacity overview, alongside Actually written: 6.11 GB (3.06%).

10-30 Shutting Down and Restarting vSAN Clusters
To safely shut down a vSAN cluster, you must power off all VMs and put all hosts into
maintenance mode:

1. Place all hosts into maintenance mode, one at a time.

2. Deselect the option to move VMs to other hosts.

3. Select the No data migration option.

[Screenshot: Enter Maintenance Mode dialog for sa-esxi-01.vclass.local. "This host is in a vSAN cluster. Once the host is in maintenance mode, it cannot access the vSAN datastore and the state of any virtual machines on the datastore. No virtual machines can be provisioned on this host while in maintenance mode. You must either power off or migrate the virtual machines from the host manually." Options: Move powered-off and suspended virtual machines to other hosts in the cluster; vSAN data migration: No data migration. Note: "VMware recommends to run a data migration pre-check before entering maintenance mode. Pre-check determines if the operation will be successful, and reports the state of the cluster once the host enters maintenance mode." Prompt: Put the selected hosts in maintenance mode? Buttons: GO TO PRE-CHECK, CANCEL.]
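The shutdown sequence can be written as a runbook generator that keeps the order explicit. This is a hypothetical helper that only produces text steps; it does not call any vSphere API. The final power-off of each host is included because it is the purpose of the shutdown.

```python
from typing import List


def shutdown_plan(hosts: List[str], vms: List[str]) -> List[str]:
    """Ordered steps for a full vSAN cluster shutdown (hypothetical runbook generator)."""
    steps = [f"Power off VM {vm}" for vm in vms]
    for host in hosts:  # one host at a time
        steps.append(
            f"Enter maintenance mode on {host}: do not move VMs, vSAN data migration = No data migration"
        )
    steps.extend(f"Shut down {host}" for host in hosts)
    return steps
```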

10-31 Rebooting vSAN Clusters Without Downtime
When rebooting a vSAN cluster, you must reboot one host at a time so that the VMs do not
incur downtime:

1. Select the Ensure accessibility data migration option when placing hosts into maintenance
mode.

2. Migrate VMs to other hosts.

3. Reboot the host.

4. Exit maintenance mode after the host is running again.

5. Repeat the process on other hosts, one at a time.
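The rolling reboot can be sketched as a loop over hosts that refuses to start the next host while a resync is still running. All four callables are hypothetical hooks that you would implement with your own automation. The `enter_mm` hook is expected to migrate the VMs off the host before maintenance mode completes.

```python
from typing import Callable, List


def rolling_reboot(hosts: List[str],
                   enter_mm: Callable[[str, str], None],
                   reboot: Callable[[str], None],
                   exit_mm: Callable[[str], None],
                   resync_pending: Callable[[], bool]) -> None:
    """Reboot hosts one at a time; all callables are hypothetical caller-supplied hooks."""
    for host in hosts:
        if resync_pending():
            raise RuntimeError(f"resync still running; not starting maintenance on {host}")
        enter_mm(host, "ensureAccessibility")  # hook migrates VMs, then enters maintenance mode
        reboot(host)                           # hook returns once the host is running again
        exit_mm(host)
```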

10-32 Moving vSAN Clusters to Other vCenter Server Instances
You might be required to move the vSAN cluster from the existing vCenter Server instance to
another:

1. Build a new vCenter Server instance using the same or a later version.

2. Ensure that networking is configured correctly on the new vCenter Server instance.

3. Create a cluster with only vSAN enabled.

4. Configure other vSAN data services to match the original cluster.

5. Create vSAN storage policies to match the vSAN policies of the original cluster.

6. Disconnect and remove all hosts from the inventory in the original vCenter Server instance.

7. Add hosts to the cluster enabled with vSAN in the new vCenter Server instance.

8. Verify host connectivity and VM existence.

9. Apply the vSAN storage policies to the VMs.
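Steps 5 and 9 depend on the policies matching exactly. A small comparison of exported policy definitions can catch gaps before the hosts are moved. The dictionary layout and the example rule key `hostFailuresToTolerate` are assumptions about how you export the policies; they are not a vCenter export format.

```python
from typing import Dict, List

Policy = Dict[str, str]  # rule name -> value, e.g. {"hostFailuresToTolerate": "1"}


def policy_gaps(original: Dict[str, Policy], new: Dict[str, Policy]) -> List[str]:
    """List policies that are missing or different in the new vCenter Server instance."""
    gaps = []
    for name, rules in sorted(original.items()):
        if name not in new:
            gaps.append(f"missing: {name}")
        elif new[name] != rules:
            gaps.append(f"differs: {name}")
    return gaps
```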

10-33 vSAN Logs and Traces
vSAN support logs are contained in the ESXi host support bundle in the form of vSAN traces.

The vSAN support logs are collected automatically by gathering the ESXi support bundle of all
hosts.

Because vSAN is distributed across multiple ESXi hosts, you should gather the ESXi support logs
from all the hosts configured for vSAN in a cluster.

VMware does not support storing logs and traces on the vSAN datastore.

By default, vSAN traces are saved to the /var/log/vsantraces path on the ESXi host system
partition.

10-34 Redirecting vSAN Logs and Traces
When USB and SD card devices are used as boot devices, the logs and traces reside in RAM
disks, which are not persistent during reboots.

Consider redirecting logs and traces to other persistent storage when these devices are used
as boot devices.

To redirect vSAN traces to a persistent datastore, use the esxcli vsan trace set
command.

For more information about redirecting vSAN logs and traces, see VMware knowledge base
article 1033696 at https://kb.vmware.com/s/article/1033696.
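Before redirecting traces, you can check that a candidate directory meets both constraints in this section: it is not on the vSAN datastore (unsupported) and not on a non-persistent RAM disk. The path prefixes passed in are hypothetical examples that you must supply for your own hosts. This helper only validates a path; it does not run `esxcli`.

```python
from typing import Iterable


def is_supported_trace_dir(path: str, vsan_datastore_paths: Iterable[str],
                           ramdisk_paths: Iterable[str]) -> bool:
    """Reject trace locations on the vSAN datastore or on a non-persistent RAM disk."""
    norm = path.rstrip("/") + "/"
    for prefix in list(vsan_datastore_paths) + list(ramdisk_paths):
        # Compare whole path components so "/vmfs/volumes/vsanDatastore2" is not matched.
        if norm.startswith(prefix.rstrip("/") + "/"):
            return False
    return True
```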

10-35 Configuring Syslog Servers
It is good practice to configure a remote Syslog server to capture all logs from ESXi hosts.

To configure a Syslog server in the vSphere Client, select the ESXi host, select Configure >
Advanced System Settings, and edit Syslog.global.logHost.

[Screenshot: Edit Advanced System Settings for sa-esxi-01.vclass.local, filtered on Syslog. Warning: Modifying configuration parameters is unsupported and can cause instability. Continue only if you know what you are doing.]

Name                                   Value
Syslog.global.defaultRotate            8
Syslog.global.defaultSize              1024
Syslog.global.logCheckSSLCerts         true
Syslog.global.logDir                   [] /scratch/log
Syslog.global.logDirUnique             false
Syslog.global.logHost
Syslog.loggers.apiForwarder.rotate     8
Syslog.loggers.apiForwarder.size       1024
Syslog.loggers.attestd.rotate          8
Syslog.loggers.attestd.size            1024
(144 items)

Description of Syslog.global.logHost: The remote host to output logs to. Reset to default on null. Multiple hosts are supported and must be separated with comma (,). Example: udp://hostName1:514, hostName2, ssl://hostName3:1514
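To sanity-check a Syslog.global.logHost value before applying it, the sketch below splits it into (protocol, host, port) entries using the comma-separated format described above. Two behaviors are assumptions, not taken from ESXi documentation: an entry without a protocol is treated as UDP, and default ports of 514 for UDP/TCP and 1514 for SSL are applied. IPv6 literals are not handled.

```python
from typing import List, Tuple

DEFAULT_PORTS = {"udp": 514, "tcp": 514, "ssl": 1514}  # assumption: common syslog defaults


def parse_log_host(value: str) -> List[Tuple[str, str, int]]:
    """Split a Syslog.global.logHost value into (protocol, host, port) tuples."""
    targets = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        proto, sep, rest = entry.partition("://")
        if not sep:  # no protocol given; assume UDP
            proto, rest = "udp", entry
        host, colon, port = rest.partition(":")
        targets.append((proto, host, int(port) if colon else DEFAULT_PORTS[proto]))
    return targets
```

Applied to the example string in the setting description, the parser returns three targets: hostName1 over UDP on port 514, hostName2 with the assumed UDP default, and hostName3 over SSL on port 1514.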

10-36 Lab 12: Verifying the vSAN Cluster Data Migration Precheck
Examine data migration options and their effect on components:

1. Examine the Data Migration Precheck Options

10-37 Review of Learner Objectives
• Describe vSAN maintenance mode and data evacuation options

• Detail how to assess the vSAN maintenance mode precheck reports

• Define the steps to shut down a vSAN cluster for maintenance

• Explain how to migrate a vSAN cluster to a new vCenter Server instance

• List best practices for vSAN logs and traces

10-38 Lesson 2: vSAN Cluster Scaling and Hardware Replacement

10-39 Learner Objectives


• Describe vSAN cluster scaling

• Detail the removal and replacement of disks and disk groups in a vSAN cluster

• Detail the removal and replacement of a host in a vSAN cluster

• Describe how to add a disk and a disk group to scale up a vSAN cluster

10-40 About vSAN Cluster Scaling
vSAN scales up and scales out if you need more compute or storage resources in the cluster.

Scaling up adds resources to an existing host:

• Capacity disks for storage space

• Caching tier devices for performance

Scaling out adds nodes to the cluster:

• Nodes for compute and storage capacity

[Figure: Scaling Up adds cache and capacity SSDs to the disk groups of existing hosts. Scaling Out adds hosts, each with its own disk groups of cache and capacity SSDs.]

10-41 Increasing Capacity by Scaling Up
To increase storage capacity in a vSAN cluster:

• Replace capacity devices in an existing disk group with higher-capacity devices.

• Add capacity devices to an existing disk group.

Before replacing disks, ensure that the vSAN cluster has sufficient capacity to migrate your data
from the existing capacity devices.

10-42 Adding New Hosts to vSAN Clusters
You can add an ESXi host to a running vSAN cluster without disrupting any ongoing operations:

• See the VMware Compatibility Guide for supported hardware.

• Create uniformly configured hosts.

• Use the vSAN Disk Balance health check to rebalance the disks.

[Figure: A new host with one disk group (a cache device and capacity SSDs) joins three existing hosts that each have one disk group.]

10-43 Adding New Capacity Devices to Disk Groups
You can expand the capacity of a disk group by adding disks:

• Ensure that the disks do not have partitions.

• Increase the capacity of all disk groups to maintain a balanced configuration.

• Add devices with the same performance characteristics as the existing disks.

Adding capacity devices affects the cache-to-capacity ratio of the disk group.
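To make the effect on the cache-to-capacity ratio concrete, the following Python sketch computes the ratio before and after a capacity device is added. The function name and the device sizes (a 400 GB cache device with 1 TB capacity devices) are hypothetical examples, not sizing guidance.

```python
# Sketch: effect of adding capacity devices on a disk group's
# cache-to-capacity ratio. All device sizes are hypothetical.


def cache_to_capacity_ratio(cache_gb: float, capacity_devices_gb: list[float]) -> float:
    """Return the cache device size divided by the total raw capacity."""
    total_capacity_gb = sum(capacity_devices_gb)
    if total_capacity_gb <= 0:
        raise ValueError("a disk group needs at least one capacity device")
    return cache_gb / total_capacity_gb


if __name__ == "__main__":
    before = cache_to_capacity_ratio(400, [1000] * 4)  # 400 GB cache, 4 x 1 TB capacity
    after = cache_to_capacity_ratio(400, [1000] * 5)   # add a fifth 1 TB device
    print(f"before: {before:.0%}, after: {after:.0%}")  # before: 10%, after: 8%
```

The cache device stays the same size, so every added capacity device lowers the ratio.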

10-44 About Disk Claim Management
vSAN has a uniform workflow for claiming disks in any scenario. Available disks are grouped
either by model and size or by host.

You can claim disks by using one of the following methods:

• Select the vSAN cluster, select Configure > vSAN > Disk Management, and click Claim
Unused Disks.

• Select the vSAN cluster, select Configure > vSAN > Disk Management, select a host, and
click Create disk group.

[Screenshot: Disk Management pane reporting "All 8 disks on version 13.0." with the CLAIM UNUSED DISKS, ADD DISKS, and GO TO PRE-CHECK buttons, and Disk Group and Disks in Use columns.]

10-45 Replacing Capacity Tier Disks
If you detect a failure, replace a capacity device:

1. Select the disk group and remove the capacity disk from the disk group.

2. Replace the faulty drive and rescan the storage adapter.

3. Add the new disk to the disk group.

4. Verify the disk group storage space.


[Figure: Three hosts, each with a disk group containing a cache SSD and capacity SSDs; a capacity device is replaced in one disk group.]

If deduplication and compression is enabled on the cluster, a capacity device failure affects the entire disk group. If you must replace a capacity device in a disk group enabled for deduplication and compression, you must remove the entire disk group.

10-46 Replacing Cache Tier Disks
When decommissioning a cache tier device, you must take the entire disk group offline:

1. Ensure that adequate capacity is available in the vSAN cluster.

2. Evacuate disk group data.

3. Delete the disk group.

4. Replace the cache device.

5. Recreate the disk group.


[Figure: Three hosts with disk groups; the disk group whose cache device is being replaced is taken offline.]

10-47 Removing Disk Groups
When you are removing a disk group from a vSAN cluster, the vSphere Client describes the
impact of a disk group evacuation.

Running a precheck before removing a disk group is a best practice. Prechecks determine if the
operation will be successful and report the state of the cluster after the disk group is removed.

[Screenshot: Remove Disk Group dialog for disk group 52a435e2-a53a-d4e2-36e4-0c1ec9496dc0. It warns that data on the disks will be deleted and that, unless you evacuate the data, removing the disks might disrupt working VMs. The vSAN data migration mode is set to Full data migration, and the dialog recommends running a pre-check, with GO TO PRE-CHECK, CANCEL, and REMOVE buttons.]

10-48 Replacing vSAN Nodes
When you are replacing a host, the replacement should have the same hardware configuration,
whenever possible.

Before removing the host, ensure the following conditions:

• Data evacuation is complete.

• All objects are currently healthy.

To replace the host:

1. Place the host in maintenance mode.

2. Remove the host from the cluster.

3. Add the new host.

10-49 Decommissioning vSAN Nodes
To permanently decommission a vSAN node, you must follow the correct procedure:

1. Ensure that sufficient storage capacity is available in the vSAN cluster.

2. Place the host in maintenance mode and select Full data migration.

3. Wait for the data migration to complete and the host to enter maintenance mode.

4. Delete disk groups that reside on the host that you want to decommission.

5. Use the vSphere Client to move the ESXi host out of the cluster, disassociating it from the vSAN cluster.

6. Shut down the ESXi host.
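Step 1 is essentially a capacity comparison. The following Python sketch checks whether the remaining hosts could hold the data of the host being decommissioned. The HostCapacity type, the host names, and the 25% free-space reserve are illustrative assumptions; the sketch ignores storage policy placement rules and fault domains, which vSAN also has to satisfy during a Full data migration.

```python
from dataclasses import dataclass


@dataclass
class HostCapacity:
    name: str          # hypothetical host name
    capacity_gb: float
    used_gb: float


def can_decommission(target: HostCapacity, others: list[HostCapacity],
                     reserve_fraction: float = 0.25) -> bool:
    """Check whether the remaining hosts can absorb the target host's data.

    reserve_fraction is an assumed share of the remaining capacity to keep
    free; it is an illustrative default, not a documented vSAN value.
    """
    remaining_capacity = sum(h.capacity_gb for h in others)
    used_after_migration = sum(h.used_gb for h in others) + target.used_gb
    return used_after_migration <= remaining_capacity * (1 - reserve_fraction)


if __name__ == "__main__":
    hosts = [HostCapacity(f"esxi-{i:02d}", 10_000, 5_000) for i in range(1, 5)]
    print(can_decommission(hosts[0], hosts[1:]))  # True: 20,000 GB <= 22,500 GB
```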

10-50 Lab 13: Decommissioning the vSAN Cluster
Evacuate and delete the vSAN cluster:

1. Configure Retreat Mode for vCLS VMs

2. Place vSAN Cluster Hosts in Maintenance Mode

3. Delete the vSAN Disk Groups

10-51 Lab 14: Scaling Out the vSAN Cluster


Scale out a vSAN stretched cluster by adding hosts to the cluster:

1. Add Hosts to the vSAN Cluster

2. Claim Disks for a vSAN Disk Group

10-52 Review of Learner Objectives
• Describe vSAN cluster scaling

• Detail the removal and replacement of disk and disk groups in a vSAN cluster

• Detail the removal and replacement of a host in a vSAN cluster

• Describe how to add a disk and a disk group to scale up a vSAN cluster

10-53 Lesson 3: Upgrading and Updating vSAN

10-54 Learner Objectives


• Describe the stages in the vSAN upgrade process

• List the best practices to prepare for a vSAN upgrade

• Describe how to evaluate supported upgrade paths and disk formats

• Explain how to generate upgrade recommendations

10-55 vSAN Upgrades
The vSAN upgrade process includes several stages.

Depending on the vSAN and disk format versions that you are running, an object and disk
format conversion might be required.

Disk format changes can be time-consuming if data evacuation is required.

If you upgrade the disk format, you cannot roll back software on the hosts or add incompatible
hosts to the cluster.

10-56 vSAN Upgrade Process
Before attempting a vSAN upgrade, review the complete vSphere upgrade process to ensure a
smooth, uninterrupted, and successful upgrade:

1. Plan the upgrade.

2. Configure the backup.

3. Patch or upgrade vCenter Server.

4. Patch or upgrade ESXi hosts.

5. Perform a VMware Skyline Health check.

6. Perform a disk format version upgrade.

7. Complete the upgrade.

10-57 Preparing to Upgrade vSAN
Before upgrading to the latest version of vSAN, always verify your current environment.

Consult the following resources:

• vSAN version-specific release notes

• VMware Product Interoperability Matrices

Review the VMware Compatibility Guide to verify support for the following items:

• Server, storage controller, SSDs, and disks

• Storage controller firmware and disk firmware

• Device drivers


10-58 vSAN Upgrade Phases
vSAN is upgraded in two phases:

1. Upgrade vSphere (vCenter Server and ESXi hosts)

2. Upgrade vSAN objects and disk format conversion (DFC)

[Figure: Three hosts, each with a disk group of one cache device and several capacity SSDs.]

10-59 Supported Upgrade Paths
VMware supports a range of upgrade paths for vSphere, which includes vSAN.

[Screenshot: VMware Product Interoperability Matrices, Upgrade Path tab, with VMware vSAN selected. The matrix lists source versions 6.5 through 7.0 against target versions 6.6.1 through 7.0 U1 and marks each path as Compatible, Incompatible, or Not supported.]

10-60 About the vSAN Disk Format
The disk format upgrade is optional. Your vSAN cluster continues to run smoothly even if you
use a previous disk format version.

For best results, upgrade disks to use the latest disk format version, which provides the new
vSAN feature set.

After you upgrade the on-disk format, you cannot roll back software on the hosts or add certain
older hosts to the cluster.

Disk format upgrade is an optional final step when you upgrade a vSAN cluster. You might
choose not to upgrade the disk format if you want to maintain backward compatibility with hosts
on an earlier version of vSAN. For example, you might want to retain the ability to add hosts to
the cluster with vSAN 7.0 GA to provide burst capacity.

Disk format upgrades from v3.0 (vSAN 6.2) to a later version only update disk metadata and do
not require data evacuation. For more information, see VMware knowledge base article
2148493 at https://kb.vmware.com/s/article/2148493.

10-61 vSAN Disk Format Upgrade Prechecks
When you initiate an upgrade precheck of the on-disk format, vSAN verifies the following conditions:

• All hosts are connected.

• All hosts have the same software version.

• All disks are healthy.

• All objects are accessible.

vSAN also verifies that no outstanding issues exist that might prevent upgrade completion.
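The precheck conditions can be expressed as a simple validation pass. In this Python sketch, HostState and disk_format_precheck are hypothetical names, and the host inventory is supplied by the caller rather than queried from vCenter Server; it only illustrates the logic of the listed checks, not how vSAN implements them.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    name: str
    connected: bool
    software_version: str   # hypothetical version string, for example "7.0.3"
    disks_healthy: bool


def disk_format_precheck(hosts: list[HostState], inaccessible_objects: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the precheck passes."""
    problems = []
    for host in hosts:
        if not host.connected:
            problems.append(f"{host.name}: host is not connected")
        if not host.disks_healthy:
            problems.append(f"{host.name}: one or more disks are unhealthy")
    # All hosts must run the same software version.
    versions = sorted({host.software_version for host in hosts})
    if len(versions) > 1:
        problems.append(f"hosts run different software versions: {versions}")
    # All objects must be accessible.
    for obj in inaccessible_objects:
        problems.append(f"object {obj} is not accessible")
    return problems
```

For example, passing two connected, healthy hosts that report different versions returns a single problem describing the version mismatch.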

10-62 Verifying vSAN Disk Format Upgrades
After you complete the disk format upgrade, you must verify whether the vSAN cluster is using
the new on-disk format.

Select the vSAN cluster and select Configure > vSAN > Disk Management.

The current disk format version appears under Disk Management.

[Screenshot: vSAN-Cluster > Configure > vSAN > Disk Management, reporting "All 8 disks on version 13.0." Hosts sa-esxi-01 through sa-esxi-04 (vclass.local) each show 2 of 2 disks in use, Connected, and Healthy.]

10-63 vSAN Build Recommendations
vSAN build recommendations include patch and applicable driver updates. To update the
firmware on vSAN 7.0 clusters, you must use an image through vSphere Lifecycle Manager.

[Screenshot: Skyline Health, vSAN build recommendation test group, with the vSAN Build Recommendation Engine, vSAN release catalog up-to-date, and vSAN build recommendation checks.]

10-64 vSAN System Baselines
vSAN build recommendations are provided through vSAN system baselines for vSphere Lifecycle Manager:

• vSAN generates one baseline group for each vSAN cluster.

• vSAN system baselines are listed in the baselines pane of the vSphere Lifecycle Manager.

• vSAN system baselines can include custom ISO images provided by certified vendors.

• vSphere Lifecycle Manager automatically scans each vSAN cluster to verify compliance
against the baseline group.

To upgrade your cluster, you must manually remediate the system baseline through vSphere
Lifecycle Manager.

10-65 Review of Learner Objectives
• Describe the stages in the vSAN upgrade process

• List the best practices to prepare for a vSAN upgrade

• Describe how to evaluate supported upgrade paths and disk formats

• Explain how to generate upgrade recommendations

10-66 Key Points


• The Ensure accessibility maintenance mode option migrates only the components that are
required to keep objects available.

• The Full data migration maintenance mode option migrates all data and can be used when
the host is being decommissioned or removed permanently from the cluster.

• The No data migration maintenance mode option is used when the entire cluster must be shut down.

• A running vSAN cluster can be scaled up and scaled out without disrupting any ongoing
operations.

• vSAN on-disk format conversion enables new data services whose impact on your
environment must be considered before upgrading.

• Before upgrading to the latest version of vSAN, always verify your current environment.

Questions?

Module 11
vSAN Stretched and Two-Node Clusters

11-2 Importance
The vSAN stretched cluster is a solution that is implemented in cases where disaster avoidance
or swift disaster recovery is important.

A two-node vSAN cluster is implemented in environments where a minimal configuration is a key requirement, typically running a small number of workloads that require high availability.

11-3 Module Lessons


1. vSAN Stretched Clusters

2. vSAN Stretched Cluster Failure Handling

3. Two-Node vSAN Clusters

11-4 Lesson 1: vSAN Stretched Clusters

11-5 Learner Objectives


• Identify characteristics of vSAN stretched clusters

• Discuss the vSAN stretched cluster architecture and use cases

• Explain how read and write I/O management works in vSAN stretched clusters

• List networking requirements of vSAN stretched clusters

• Describe the purpose of the vSAN witness host

11-6 About vSAN Stretched Clusters
A standard vSAN cluster is limited to one site. The data availability and fault tolerance provided by VM storage policies are limited to a single site.

A vSAN stretched cluster spans three sites to protect against site-level failure. If one data site goes down, the VMs can be powered on at the other data site with minimal downtime.

A vSAN stretched cluster extends the concept of fault domains so that each site represents a fault domain. The distance between the sites is limited, such as in metropolitan or campus environments.

[Figure: A vSAN stretched cluster. VMs 01 to 04 run on vSphere and vSAN across Data Site 1 (Preferred) and Data Site 2 (Secondary), and a witness host resides at a third site.]

vSAN stretched clusters can be used in environments where disaster and downtime avoidance
is a key requirement.

Stretched clusters protect VMs across data centers, not only racks.

11-7 vSAN Stretched Cluster Use Cases (1)
vSAN stretched clusters have the following use cases:

• Planned site maintenance

• Preventing service outages resulting from site failure

• Automated recovery

With stretched clusters, you can perform planned maintenance of one site without any service
downtime. You can use vSphere DRS affinity rules to run VMs on a specific data site.

You can also prevent production outages by acting before an impending service outage, such as a power failure.

11-8 vSAN Stretched Cluster Use Cases (2)
vSAN stretched clusters can be used with vSphere Replication and Site Recovery Manager.

Replication between vSAN datastores enables a recovery point objective (RPO) as low as 5
minutes.

[Figure: A vSAN stretched cluster used with Site Recovery Manager. Mirror 1 and Mirror 2 reside in Data Site 1 and Data Site 2, which are connected with less than 5 ms latency over 10/20/40 Gbps or faster links. A replica is sent to a Recovery Site at any distance, with an RPO of 5 minutes or more.]

11-9 Design of vSAN Stretched Clusters
A vSAN stretched cluster spans three sites to protect against site-level failure.

• Preferred data site

• Secondary (or nonpreferred) data site

• Witness site

Only the preferred and secondary data sites contribute to the compute and storage resources.

Preferred and secondary sites can have a maximum of 15 ESXi hosts each, so a stretched cluster can have a maximum of 15+15+1 hosts.

[Figure: VMs 01 to 04 run across Data Site 1 (Preferred) and Data Site 2 (Secondary), with the witness host at a third site.]

VMs deployed on a vSAN stretched cluster have one copy of their data on site A, a second copy on site B, and any witness components on the witness host at site C. This placement is achieved through fault domains, together with host and VM groups and affinity rules.

The witness site contains a single virtualized witness host that stores only witness components. The purpose of the witness site is to provide a tiebreaker if a split-brain scenario occurs.

If a complete site failure occurs, a full copy of the VM data and more than 50% of the components remain available. This enables the VM to remain available on the vSAN datastore. If VMs must be restarted on the other site, you can configure vSphere HA to manage this task.
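The availability rule, a full copy plus more than 50% of the components, can be illustrated with a simplified vote count. This Python sketch assumes one vote per component and an object with one replica on each data site plus one witness component. Actual vSAN vote assignment can differ, and the Component type and site labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    kind: str       # "replica" or "witness"
    site: str       # hypothetical labels: "site-a", "site-b", "witness-site"
    votes: int = 1  # simplification: one vote per component


def object_accessible(components: list[Component], failed_sites: set[str]) -> bool:
    """Accessible if a full replica survives and surviving votes exceed 50%."""
    surviving = [c for c in components if c.site not in failed_sites]
    total_votes = sum(c.votes for c in components)
    surviving_votes = sum(c.votes for c in surviving)
    has_full_copy = any(c.kind == "replica" for c in surviving)
    return has_full_copy and 2 * surviving_votes > total_votes


VM_DISK = [
    Component("replica", "site-a"),
    Component("replica", "site-b"),
    Component("witness", "witness-site"),
]

if __name__ == "__main__":
    print(object_accessible(VM_DISK, {"site-b"}))                  # True
    print(object_accessible(VM_DISK, {"site-a", "witness-site"}))  # False
```

Losing one data site leaves two of three votes and a full copy, so the object stays available. Losing a data site together with the witness does not.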

11-10 About Preferred Sites
The preferred site is the data site that remains active when a network partition occurs between
the two data sites.

If a failure occurs, VMs on the secondary site are powered off and vSphere HA restarts them on
the preferred site.

[Figure: A failure between Data Site 1 and Data Site 2 in a stretched cluster with a witness host.]

11-11 About Witness Hosts
A vSAN stretched cluster requires a witness host to store the witness components for VM
objects:

• Each stretched cluster must have its own witness host.

• The witness host cannot run any VMs.

• The witness host stores only witness components to provide a cluster quorum.

• The witness host is typically packaged as a virtual appliance.

• The vSAN witness appliance includes an embedded license.

The witness host can be deployed as either a physical ESXi host or a vSAN witness appliance. If a vSAN witness appliance is used for the witness host, it does not consume any of the customer's vSphere licenses. A physical ESXi host that is used as a witness host must be licensed accordingly.

11-12 Sizing Witness Hosts
When deploying a vSAN witness appliance, you must estimate how many VMs the business requires and how many components make up each VM. This estimate depends on the number of virtual disks, policy settings, and snapshot requirements.

Deployment Type    Components    Supported VMs in a Stretched Cluster
Tiny               750           10
Medium             21,833        500
Large              24,000        More than 500
Extra Large        64,000        More than 500
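Choosing a deployment type from the table amounts to finding the smallest size whose component limit covers the estimate. This Python sketch uses only the Components column above. It does not check the supported VM counts, and pick_witness_size is a hypothetical helper.

```python
from typing import Optional

# (deployment type, maximum components), copied from the table above
WITNESS_SIZES = [
    ("Tiny", 750),
    ("Medium", 21_833),
    ("Large", 24_000),
    ("Extra Large", 64_000),
]


def pick_witness_size(required_components: int) -> Optional[str]:
    """Return the smallest deployment type whose component limit fits, or None."""
    for name, max_components in WITNESS_SIZES:
        if required_components <= max_components:
            return name
    return None  # the estimate exceeds every size in the table
```

For example, an estimate of 996 components exceeds the Tiny limit of 750, so the sketch selects Medium.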

11-13 vSAN Stretched Cluster Heartbeats
vSAN designates a master node on the preferred site and a backup node on the secondary site
to send and receive heartbeats.

Heartbeats are sent every second:

• Between the master and backup nodes

• Between the master node and the witness host

• Between the backup node and the witness host

If communication is lost between the witness host and one of the data sites for five consecutive
heartbeats, the witness is considered down until the heartbeats resume.
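The five-heartbeat rule can be modeled as a counter of consecutive misses. In this Python sketch, each list element represents one one-second heartbeat interval (True if the heartbeat was received). The function name is hypothetical, and the sketch models only the rule stated above, not the vSAN implementation.

```python
MISSED_LIMIT = 5  # consecutive missed heartbeats before the witness is considered down


def witness_states(heartbeats: list[bool]) -> list[str]:
    """Map per-second heartbeat results (True = received) to "up" or "down"."""
    states = []
    missed = 0
    for received in heartbeats:
        # A received heartbeat resets the counter; a miss increments it.
        missed = 0 if received else missed + 1
        states.append("down" if missed >= MISSED_LIMIT else "up")
    return states
```

Four missed heartbeats in a row leave the witness up. The fifth marks it down, and the next received heartbeat brings it back up.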

[Figure: Heartbeats between the master node on Data Site 1 (Preferred Site), the backup node on Data Site 2 (Secondary Site), and the witness host.]

11-14 Managing Read and Write Operations
vSAN stretched clusters use a read locality algorithm to read 100% of the data from the copy on the local site. Read locality reduces the latency incurred during read operations.

[Figure: Read locality. A VM on the preferred site reads 100% from its local replica, while 100% of writes go to the replicas on both the preferred and secondary sites; the witness component resides at the witness site. Each disk group uses an NVMe cache device (read cache and write buffer) and SSD capacity devices.]

In vSAN stretched clusters, the mirrors are located on different sites, and the distance between them increases latency. As a result, reading data from the remote site is inefficient because it can affect application performance. Writes, however, must be sent to all available mirrors on both the preferred and secondary sites.
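The read and write paths described above can be summarized in a few lines. This Python sketch returns the sites that service a read or a write for an object mirrored across both data sites. The io_targets function and the site labels are hypothetical. The fallback to a remote copy when no local copy exists is an assumption that this section does not state.

```python
def io_targets(op: str, vm_site: str, replica_sites: list[str]) -> list[str]:
    """Return the sites that service an I/O for an object mirrored across sites."""
    if op == "write":
        # Writes go to every available mirror, on both data sites.
        return list(replica_sites)
    if op == "read":
        # Read locality: serve reads from the copy on the VM's own site.
        local = [site for site in replica_sites if site == vm_site]
        # Assumption: with no local copy, read from the first remaining copy.
        return local if local else replica_sites[:1]
    raise ValueError(f"unknown operation: {op!r}")
```

For a VM on the preferred site, a read touches only the preferred-site replica, while a write touches both.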

11-15 Stretched Cluster Networking
A stretched cluster has the following network requirements:

• Connectivity to the management network and the vSAN network on all three sites

• VM network connectivity between the data sites

Both data sites must be connected to a vSphere vMotion network for VM migration.

[Figure: Data Site 1 and Data Site 2 connected to the witness site over a Layer 3 network.]

11-16 Network Requirements: Between Data Sites
A vSAN stretched cluster network requires connectivity across all three sites. It must have independent routing and connectivity between the data sites and the witness host.

The bandwidth between the sites hosting VM objects and the witness host depends on how many objects reside on the vSAN cluster. You must appropriately size the bandwidth between the data sites and the witness site for both availability and growth.

In a vSAN stretched configuration, you size the intersite bandwidth according to the write I/O. By default, read traffic is handled by the site on which the VM resides.

vSAN Cluster Configuration                     Requirements
vSAN stretched cluster (between data sites)    10 Gb or faster with latency < 5 ms RTT

The required bandwidth between the two data sites (B) is equal to the write bandwidth (Wb) multiplied by the data multiplier (md) and the resynchronization multiplier (mr): B = Wb * md * mr.

The data multiplier accounts for the overhead of vSAN metadata traffic and miscellaneous related operations.

Using a data multiplier of 1.4 is a best practice. The resynchronization multiplier accounts for resynchronization events. To make room for resynchronization traffic, another best practice is to add 25% to the required bandwidth capacity (mr = 1.25).

Example network bandwidth calculation:

• A workload of 10,000 writes per second with a typical 4 KB write size requires 40 MBps, or 320 Mbps, of bandwidth.

• B = 320 Mbps * 1.4 * 1.25 = 560 Mbps.

• Including the vSAN network requirements, the required bandwidth is 560 Mbps.
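You can check the data-site bandwidth formula and the worked example with a short calculation. This Python sketch uses decimal units (1 KB = 1,000 bytes, 1 Mbps = 1,000,000 bits per second), which matches the 40 MBps and 320 Mbps figures above. The function names are hypothetical.

```python
def write_bandwidth_mbps(writes_per_sec: float, write_size_kb: float) -> float:
    """Write bandwidth (Wb) in Mbps, using decimal units (1 KB = 1,000 bytes)."""
    bytes_per_sec = writes_per_sec * write_size_kb * 1_000
    return bytes_per_sec * 8 / 1_000_000


def intersite_bandwidth_mbps(wb_mbps: float, data_multiplier: float = 1.4,
                             resync_multiplier: float = 1.25) -> float:
    """B = Wb * md * mr, with the best-practice multipliers as defaults."""
    return wb_mbps * data_multiplier * resync_multiplier


if __name__ == "__main__":
    wb = write_bandwidth_mbps(10_000, 4)
    b = intersite_bandwidth_mbps(wb)
    print(f"Wb = {wb:.0f} Mbps, B = {b:.0f} Mbps")  # Wb = 320 Mbps, B = 560 Mbps
```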

11-17 Network Requirements: Between the Data Sites and the Witness Site
The network bandwidth required between the data sites and the witness site is calculated differently from the intersite bandwidth required between the data sites.

Witness sites do not maintain VM data. They contain only component metadata.

vSAN Stretched Cluster Site            Bandwidth                           Latency
Between data sites and witness host    2 Mbps per 1,000 vSAN components    • < 500 ms RTT (1 host per site)
                                                                           • < 200 ms RTT (up to 10 hosts per site)
                                                                           • < 100 ms RTT (11 to 15 hosts per site)
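The latency column can be read as a lookup on the number of hosts per data site. This Python sketch treats each value as an exclusive upper bound on round-trip time, as the < signs in the table suggest. The function names are hypothetical.

```python
def max_witness_rtt_ms(hosts_per_site: int) -> int:
    """Exclusive upper bound on data-site-to-witness RTT, from the table above."""
    if not 1 <= hosts_per_site <= 15:
        raise ValueError("a data site in a stretched cluster has 1 to 15 hosts")
    if hosts_per_site == 1:
        return 500
    if hosts_per_site <= 10:
        return 200
    return 100


def witness_latency_ok(measured_rtt_ms: float, hosts_per_site: int) -> bool:
    """True if a measured RTT is within the limit for the site size."""
    return measured_rtt_ms < max_witness_rtt_ms(hosts_per_site)
```

For example, a 150 ms RTT is acceptable with four hosts per site but not with twelve.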

The bandwidth required between the witness and each data site is approximately 1138 B x number of components / 5 seconds.

Example network bandwidth calculation:

• 166 VMs require the witness to contain 996 components (166 VMs * 3 components/VM * 2 (FTT + 1) * 1 (stripe width)).

To satisfy the witness bandwidth requirements for a total of 1,000 components on vSAN, use the following calculation:

• B = 1138 B * 8 * 1,000 / 5 s = 1,820,800 bits per second = 1.82 Mbps

• To convert bytes (B) to bits (b), multiply by 8.

• A best practice is to add a 10% safety margin and round up.

• B + 10% = 1.82 Mbps + 182 Kbps = 2.00 Mbps

• With the 10% buffer included, 2 Mbps is generally appropriate for every 1,000 components.
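You can reproduce the witness bandwidth example as follows. This Python sketch applies the 1138 B per component over 5 seconds formula and the 10% safety margin from the text, and it includes the component estimate from the 166-VM example. The function names are hypothetical.

```python
BYTES_PER_COMPONENT = 1138   # from the formula above
INTERVAL_SECONDS = 5


def witness_components(vms: int, components_per_vm: int, ftt: int, stripe_width: int = 1) -> int:
    """Estimate used in the example: VMs * components/VM * (FTT + 1) * stripe width."""
    return vms * components_per_vm * (ftt + 1) * stripe_width


def witness_bandwidth_mbps(components: int, safety_margin: float = 0.10) -> float:
    """1138 B * 8 * components / 5 s, plus a safety margin, in Mbps."""
    bits_per_sec = BYTES_PER_COMPONENT * 8 * components / INTERVAL_SECONDS
    return bits_per_sec * (1 + safety_margin) / 1_000_000


if __name__ == "__main__":
    print(witness_components(166, 3, ftt=1))            # 996
    print(f"{witness_bandwidth_mbps(1_000):.2f} Mbps")  # 2.00 Mbps
```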

11-18 Static Routes for vSAN Traffic
By default, vSphere uses a single default gateway. All routed traffic tries to reach its destination through this common gateway.

In certain situations, you might need to create static routes in your environment to override the default gateway for vSAN traffic:

• If your deployment has the witness host on a different network

• If the stretched cluster deployment has both data sites and the witness host on different networks

Before overriding the default gateway, you can create a static route by using the following esxcli command:

esxcli network ip route ipv4 add -g gateway-to-use -n remote-network

[Screenshot: vmk2 - Edit Settings dialog, IPv4 settings page showing the IPv4 address, the subnet mask 255.255.255.0, and the Override default gateway for this adapter option.]

11-19 Planning for High Availability
For high availability, a best practice is to run at 50% resource consumption across the vSAN stretched cluster. If a complete site failure occurs, all VMs can then run on the surviving site.

Some customers might prefer to use more than 50% of the available resources. However, if a failure occurs, not all VMs can be restarted on the surviving site.
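The 50% guideline follows from a simple capacity comparison. This Python sketch assumes two data sites of equal capacity and a single resource dimension (for example, CPU or memory in arbitrary units). The function names and figures are hypothetical.

```python
def surviving_site_can_run_all(site1_demand: float, site2_demand: float,
                               site_capacity: float) -> bool:
    """True if one data site alone can absorb the demand of both sites."""
    return site1_demand + site2_demand <= site_capacity


def cluster_utilization(site1_demand: float, site2_demand: float,
                        site_capacity: float) -> float:
    """Fraction of the stretched cluster's total resources in use."""
    return (site1_demand + site2_demand) / (2 * site_capacity)
```

With 100 units per site, a demand of 45 + 50 units (47.5% utilization) still fits on one site after a failure. A demand of 60 + 55 units (57.5% utilization) does not, so some VMs would not be restarted.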

11-20 Configuring Stretched Clusters

To configure a stretched cluster, select the vSAN cluster and select Configure > vSAN > Fault Domains > Enable Stretched Cluster.

Group hosts into preferred and secondary fault domains, select a witness host, and create a disk group on the witness host.

[Screenshot: Configure Stretched Cluster wizard, Review step (1 Configure fault domains, 2 Select witness host, 3 Claim disks for witness host, 4 Review). Preferred fault domain name: Preferred, with hosts sa-esxi-01.vclass.local and sa-esxi-02.vclass.local. Secondary fault domain name: Secondary, with hosts sa-esxi-03.vclass.local and sa-esxi-04.vclass.local. Witness host: sc-witness-01.vclass.local. Claimed cache: 10.00 GB. Claimed capacity: 35.00 GB.]

11-21 Replacing a Witness Host
If the witness host fails, a new witness host can easily be added to the stretched configuration.

[Screenshot: Fault Domains pane. Fault domain failures to tolerate: 1. Configuration type: Stretched cluster (DISABLE STRETCHED CLUSTER). Witness host: sc-witness-01.vclass.local (CHANGE WITNESS HOST). The Preferred fault domain contains sa-esxi-01 and sa-esxi-02; the Secondary fault domain contains sa-esxi-03 and sa-esxi-04.]

11-22 Stretched Clusters and Maintenance Mode
In a stretched cluster, you can use maintenance mode on data site hosts and on the witness host:

• For a data site host, select the required vSAN data migration option.

• For the witness host, data migration does not occur.

[Screenshot: Two Enter Maintenance Mode dialogs. For a data site host, the dialog warns that the host will lose access to the vSAN datastore and offers the vSAN data migration options Full data migration, Ensure accessibility, and No data migration. For the witness host sc-witness-01.vclass.local, the dialog offers no data migration option.]

11-23 Monitoring Stretched Clusters
VMware Skyline Health provides a range of tests to verify the health status of stretched clusters.

[Screenshot: vSAN-Cluster > Monitor > vSAN > Skyline Health, Stretched cluster test group: Witness host not found; Unexpected number of fault dom...; Unicast agent configuration incon...; Invalid preferred fault domain on ...; Preferred fault domain unset; Witness host within vCenter cluster; Witness host fault domain miscon...; Unicast agent not configured; No disk claimed on witness host; Unsupported host version; Invalid unicast agent; Site latency health.]

11-24 Review of Learner Objectives
• Identify characteristics of vSAN stretched clusters

• Discuss the vSAN stretched cluster architecture and use cases

• Explain how read and write I/O management works in vSAN stretched clusters

• List the networking requirements of vSAN stretched clusters

• Describe the purpose of the vSAN witness host

11-25 Lesson 2: vSAN Stretched Cluster Failure Handling

11-26 Learner Objectives


• Explain how vSAN stretched clusters handle failures

• Describe VM storage policies for vSAN stretched clusters

11-27 vSAN Stretched Cluster Failure Handling (1)
Each site in a stretched cluster resides in a separate fault domain. A vSAN stretched cluster can tolerate one link failure at a time without data becoming unavailable.

The witness host serves as a tiebreaker when the network connection between the two data sites is lost and a decision must be made about the availability of datastore components. In this case, the witness host typically forms a vSAN cluster with the preferred site.

[Figure: A link failure between Data Site 1 (Preferred) and Data Site 2 (Secondary) in a stretched cluster with a witness host.]

11-28 vSAN Stretched Cluster Failure Handling (2)

If the preferred site becomes isolated from both the secondary site and the witness, the witness host forms a cluster with the secondary site. When the preferred site is back online, data is resynchronized so that both sites have the latest copies of all data. If the witness host fails, all corresponding objects become noncompliant but remain fully accessible.
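The failure scenarios from this topic and the previous one can be condensed into a function that reports which sites continue to serve objects. This Python sketch models only the scenarios described in this section and treats a failed witness host as both witness links being down. The names are hypothetical, and this is not how vSAN determines cluster membership.

```python
def active_partition(pref_sec_link: bool, pref_witness_link: bool,
                     sec_witness_link: bool) -> set[str]:
    """Return the sites that keep objects accessible under the rules above."""
    if pref_sec_link:
        # Both data sites still see each other: two of three sites agree,
        # even if the witness is lost (objects become noncompliant).
        sites = {"preferred", "secondary"}
        if pref_witness_link or sec_witness_link:
            sites.add("witness")
        return sites
    # The data sites are partitioned: the witness acts as tiebreaker.
    if pref_witness_link:
        return {"preferred", "witness"}   # witness sides with the preferred site
    if sec_witness_link:
        return {"secondary", "witness"}   # preferred site is isolated
    return set()  # no partition holds a majority
```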

[Figure: Preferred Site Isolated. Data Site 1 (Preferred) loses its connections to Data Site 2 (Secondary) and to the witness host.]

11-29 vSAN Stretched Cluster Site Disaster Tolerance
Use stretched cluster-aware storage policies to tolerate a site failure.

If one data site fails, the replica on the other site is available to continue VM operations:

• Configure Site disaster tolerance to govern the fault tolerance between sites.

• Configure Failures to tolerate to govern the fault tolerance within each site.

[Screenshot: vSAN storage policy, Availability tab. The Site disaster tolerance options are None - standard cluster, None - standard cluster with nested fault domains, Dual site mirroring (stretched cluster), None - keep data on Preferred (stretched cluster), None - keep data on Non-preferred (stretched cluster), and None - stretched cluster. Two example configurations pair Dual site mirroring (stretched cluster) with Failures to tolerate set to 1 failure - RAID-1 (Mirroring) or 1 failure - RAID-5 (Erasure Coding).]

Use dual site mirroring (stretched cluster) VM storage policy rules to determine the fault
tolerance.

The site disaster tolerance setting governs fault tolerance across sites. If a data site goes down,
the replica on the remaining site and the witness component remain available to continue
operations.

The failures to tolerate setting governs fault tolerance within each site. This setting ensures that
the object can survive failures within a site.

11-30 Site Disaster Tolerance: Dual Site Mirroring
The dual site mirroring policy in a stretched cluster maintains one replica on each data site.

If one data site goes down, the replica on the remaining site and the witness component remain
available to continue operations.

When choosing this policy, you must ensure that each data site has sufficient storage capacity
to accommodate a replica.

Consider the number of objects and their space requirements when applying a dual site mirroring
policy.
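To make the capacity consideration concrete, the following minimal Python sketch estimates the raw capacity that one object consumes under dual site mirroring. It assumes the standard vSAN overheads for the within-site scheme (2x for RAID 1 with one failure to tolerate, 3x with two, about 1.33x for RAID 5, 1.5x for RAID 6) and ignores witness components, which hold only metadata. The dictionary keys and the `raw_capacity_gb` helper are hypothetical.

```python
from __future__ import annotations

# Raw capacity multiplier for the protection chosen with "Failures to tolerate"
# inside each data site. Keys are hypothetical labels; witness components are
# ignored because they hold only metadata.
WITHIN_SITE_OVERHEAD = {
    "none": 1.0,        # a single copy in the site
    "raid1-ftt1": 2.0,  # two full mirrors
    "raid1-ftt2": 3.0,  # three full mirrors
    "raid5": 4 / 3,     # 3 data + 1 parity
    "raid6": 1.5,       # 4 data + 2 parity
}


def raw_capacity_gb(object_gb: float, scheme: str, dual_site: bool = True) -> tuple[float, float]:
    """Return (raw GB needed in each data site that holds data, raw GB across the cluster)."""
    per_site = object_gb * WITHIN_SITE_OVERHEAD[scheme]
    sites = 2 if dual_site else 1  # dual site mirroring keeps one replica per data site
    return per_site, per_site * sites


for scheme in ("raid1-ftt1", "raid5"):
    per_site, total = raw_capacity_gb(100, scheme)
    print(f"{scheme}: {per_site:.1f} GB per site, {total:.1f} GB in total")
```

For a 100 GB object, RAID 1 within each site needs 200 GB in each data site (400 GB in total), while RAID 5 needs about 133.3 GB per site, which is why the within-site scheme matters as much as the number of objects.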

11-31 Dual Site Mirroring with RAID 1
Dual site mirroring with RAID 1 ensures that the object remains accessible in the event of a site
failure, in addition to a node failure on the remaining site.

You must ensure that the number of hosts and drives available on each site can satisfy the
Failures To Tolerate and Stripe Width policy settings.

[Figure: Dual site mirroring with RAID 1 across the Witness Site (witness), the Preferred Site (Data Site 1), and the Non-Preferred Site (Data Site 2), each data site holding mirrored components and a witness component]
11-32 Dual Site Mirroring with RAID 5/6
vSAN stretched clusters also support RAID 5/6 erasure coding within the two data sites. Within
each site, four or six hosts are required for RAID 5 and RAID 6, respectively.

In the example, the vSAN object is mirrored between the sites. If a single host failure occurs
within a site, the object tolerates it through RAID 5 protection within that site.
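The host requirements per data site can be checked with a small helper. This is a minimal Python sketch: the RAID 5 and RAID 6 minimums come from the text, and the RAID 1 rule assumes the usual vSAN layout of FTT + 1 mirrors plus FTT witness components, each on a separate host (2 x FTT + 1 hosts). The function names are hypothetical.

```python
def min_hosts_per_site(scheme: str, ftt: int) -> int:
    """Minimum hosts each data site needs for the within-site policy (assumed rules)."""
    if scheme == "RAID-1":
        # ftt + 1 mirrors plus ftt witness components, each on its own host
        return 2 * ftt + 1
    if scheme == "RAID-5" and ftt == 1:
        return 4  # 3 data + 1 parity
    if scheme == "RAID-6" and ftt == 2:
        return 6  # 4 data + 2 parity
    raise ValueError(f"unsupported combination: {scheme} with FTT={ftt}")


def site_can_satisfy(hosts_in_site: int, scheme: str, ftt: int) -> bool:
    """True if a data site with this many hosts can place every component."""
    return hosts_in_site >= min_hosts_per_site(scheme, ftt)


print(site_can_satisfy(4, "RAID-5", 1))  # True
print(site_can_satisfy(5, "RAID-6", 2))  # False
```

Because both data sites hold a full replica under dual site mirroring, the check must pass for each site independently.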

[Figure: Dual site mirroring with RAID 5 across the Witness Site (witness), the Preferred Site (Data Site 1), and the Non-Preferred Site (Data Site 2), each data site holding erasure-coded components]
11-33 Keeping Data on a Single Site
You can use the following VM storage policy options to place the components of an object on a
single site within the stretched cluster:

• None - keep data on Preferred

• None - keep data on Non-preferred

No mirroring is performed across the data sites.

vSphere Fault Tolerance is supported for VMs that are restricted to a single site.
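The relationship between the site disaster tolerance option, the sites that hold data, and vSphere Fault Tolerance support can be summarized in a short sketch. This is a minimal Python illustration, not a vSphere API: the `DATA_SITES` mapping and `fault_tolerance_supported` function are hypothetical, and only the three options with clear placement semantics are modeled.

```python
# Hypothetical mapping from the "Site disaster tolerance" option to the data
# sites that hold replicas of the object.
DATA_SITES = {
    "Dual site mirroring (stretched cluster)": ("preferred", "secondary"),
    "None - keep data on Preferred (stretched cluster)": ("preferred",),
    "None - keep data on Non-preferred (stretched cluster)": ("secondary",),
}


def fault_tolerance_supported(site_disaster_tolerance: str) -> bool:
    """vSphere FT is supported only for VMs whose data is restricted to a single site."""
    return len(DATA_SITES[site_disaster_tolerance]) == 1


for option in DATA_SITES:
    print(option, "->", DATA_SITES[option], fault_tolerance_supported(option))
```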

[Screenshot: Create VM Storage Policy wizard (steps 1 Name and description, 2 Policy structure, 3 vSAN), Availability tab, with the Site disaster tolerance drop-down listing None - standard cluster, Dual site mirroring (stretched cluster), None - keep data on Preferred (stretched cluster), and None - keep data on Non-preferred (stretched cluster)]
11-34 Symmetrical and Asymmetrical Configuration
vSAN 6.1 or later supports symmetrical configurations, where site 1 and site 2 contain the same
number of ESXi hosts and the witness host is in a third site.

vSAN 6.6 or later supports asymmetrical configurations, where, for example, site 1 might contain 20 ESXi
hosts, site 2 might contain 10 ESXi hosts, and the witness host is in a third site.

With the asymmetrical configuration, some workloads would be available only in site 1 (using
PFTT=0/SiteAffinity=Preferred), and others would be available in both site 1 and site 2.

11-35 Activity 1

In this scenario, an outage removes all access to the nonpreferred site.

Scenario factors:

• Site disaster tolerance: Dual-Site Mirroring

• Failures to tolerate: 1 - RAID-1 (Mirroring)

• vSphere HA - Host Monitoring: enabled

How does the cluster respond to the outage?

[Figure: Witness host and VMs 01 to 04 on a vSphere/vSAN stretched cluster spanning Data Site 1 (Preferred) and Data Site 2 (Secondary)]
11-36 Activity 1 Solution
How does the cluster respond to the outage?

vSphere HA restarts the VMs on the preferred site using the replica component that is available
on the preferred site and the witness component.

[Figure: vSphere HA restarts VMs 03 and 04 on Data Site 1 (Preferred) after Data Site 2 (Secondary) becomes inaccessible; the witness host remains available]
11-37 Activity 2
In this scenario, a host in the preferred site failed.

Scenario factors:

• Site disaster tolerance: Dual-Site Mirroring

• Failures to tolerate: 1 - RAID-1 (Mirroring)

• vSphere HA - Host Monitoring: enabled

How does the system respond to the outage of one host in the preferred site?

How does the system respond to the outage of multiple hosts in the preferred site?

[Figure: Witness host and VMs 01 to 04; one host in Data Site 1 (Preferred) has failed; Data Site 2 (Secondary) is unaffected]
11-38 Activity 2 Solution
How does the system respond to the outage of one host in the preferred site?

vSphere HA restarts the VMs in the preferred site.

How does the system respond to the outage of multiple hosts in the preferred site?

vSphere HA restarts VMs in the secondary data site.

[Figure: vSphere HA restarts VMs 01 and 02 either within Data Site 1 (Preferred) or on Data Site 2 (Secondary); the witness host remains available]
11-39 Activity 3

In this scenario, the witness host stopped responding to the data sites, but both data sites remain
connected to each other.

Scenario factors:

• Site disaster tolerance: Dual-Site Mirroring

• Failures to tolerate: 1 - RAID-1 (Mirroring)

• vSphere HA - Host Monitoring: enabled

How does the cluster respond to the outage?

Why does the system respond in this way?

[Figure: Unresponsive witness host; VMs 01 to 04 on Data Site 1 (Preferred) and Data Site 2 (Secondary), which remain connected to each other]
11-40 Activity 3 Solution
How does the cluster respond to the outage?

The witness components of the VM objects are marked absent, and the VMs continue to run without interruption.

Why does the system respond in this way?

Because the components on the two data sites constitute a quorum, the object remains
available.

[Figure: Failed witness host; VMs 01 to 04 continue to run on Data Site 1 (Preferred) and Data Site 2 (Secondary)]
11-41 Activity 4

In this scenario, a network outage occurs between the witness and the preferred site.

Scenario factors:

• Site disaster tolerance: Dual-Site Mirroring

• Failures to tolerate: 1 - RAID-1 (Mirroring)

• vSphere HA - Host Monitoring: enabled

How does the cluster respond to the outage?

Why does the system respond in this way?

[Figure: Network failure between the witness host and Data Site 1 (Preferred); VMs 01 to 04 on Data Site 1 (Preferred) and Data Site 2 (Secondary)]
11-42 Activity 4 Solution
How does the cluster respond to the outage?

VMs continue to run without interruption. The witness is placed in a network partition until
communication is re-established.

Why does the system respond in this way?

The data sites maintain a quorum for the VM data. Because the witness host does not have
connectivity to all hosts, it is placed in its own network partition to prevent conflicts.

[Figure: Network failure between the witness host and Data Site 1 (Preferred); VMs 01 to 04 continue to run on Data Site 1 (Preferred) and Data Site 2 (Secondary)]
11-43 Activity 5

In this scenario, a network outage occurs between the preferred and the secondary sites.

Scenario factors:

• Site disaster tolerance: Dual-Site Mirroring

• Failures to tolerate: 1 - RAID-1 (Mirroring)

• vSphere HA - Host Monitoring: enabled

• vSphere DRS: enabled

How does the cluster respond to the outage?

What role could vSphere DRS play after the recovery?

[Figure: Network failure between Data Site 1 (Preferred) and Data Site 2 (Secondary); witness host; VMs 01 to 04]
11-44 Activity 5 Solution
How does the cluster respond to the outage?

All VMs running on the preferred site continue to run uninterrupted. All VMs on the secondary
site are automatically powered off, and vSphere HA restarts them on the preferred site.

What role could vSphere DRS play after the recovery?

After the outage is resolved, vSphere DRS migrates VMs based on the affinity rules that are in
place.

[Figure: vSphere HA restarts VMs 03 and 04 on Data Site 1 (Preferred) after the failure between Data Site 1 (Preferred) and Data Site 2 (Secondary); the witness host remains available]
11-45 Activity 6

In this scenario, the preferred site has failed, and vCenter Server also resides on the failed site.

Scenario factors:

• Site disaster tolerance: Dual-Site Mirroring

• Failures to tolerate: 1 - RAID-1 (Mirroring)

• vSphere HA - Host Monitoring: enabled

• vSphere DRS: enabled

How does the cluster respond to the outage?

Is the response affected by vCenter Server being unavailable?

[Figure: Witness host; Data Site 1 (Preferred) has failed; Data Site 2 (Secondary) remains available]
11-46 Activity 6 Solution
How does the cluster respond to the outage?

vSphere HA restarts all the VMs from the failed site on the other data site.

Is the response affected by vCenter Server being unavailable?

No, vSphere HA recovery is independent of the vCenter Server system.

[Figure: vSphere HA restarts the VMs from the failed Data Site 1 (Preferred) on Data Site 2 (Secondary); the witness host remains available]
11-47 Lab 15: Configuring the vSAN Stretched Cluster
Configure the vSAN stretched cluster:

1. Configure the vSAN Stretched Cluster

2. Verify the Health of the vSAN Stretched Cluster

3. Configure the Dual Site Mirroring VM Storage Policy

4. Examine the VM Components Placement

11-48 Review of Learner Objectives
• Explain how vSAN stretched clusters handle failures

• Describe VM storage policies for vSAN stretched clusters

11-49 Lesson 3: Two-Node vSAN Clusters

11-50 Learner Objectives

• Explain the two-node vSAN cluster architecture and use cases

• Describe the purpose of the vSAN shared witness appliance

11-51 About Two-Node vSAN Clusters
vSAN two-node clusters are implemented with two ESXi hosts and a witness host.

[Figure: Two-node cluster: an ESXi host in Site 1 and an ESXi host in Site 2 forming a vSphere/vSAN cluster, plus a separate witness host]
11-52 Two-Node vSAN Cluster Use Cases
The two-node architecture is ideal for remote office/branch office (ROBO) use cases:

• Remote offices are managed centrally by one vCenter Server instance.

• Multiple two-node clusters can share the common witness node.

Sharing a witness host between a two-node cluster and a stretched cluster or between multiple
stretched clusters is not supported.

[Figure: ROBO Sites 1, 2, and 3, each a two-node vSphere/vSAN cluster, sharing witness appliances that run in a centralized data center]
11-53 Two-Node Direct Connect vSAN Clusters
Two-node direct connect vSAN clusters are intended for small site configurations without a
physical network switch at the remote site. You can connect both data nodes of the remote site
cluster using a direct network cable.

A two-node direct connect configuration further reduces the cost of deploying a two-node
cluster. It eliminates the need to configure a physical switch at each remote site for local
connectivity between the hosts of a two-node cluster. It also eliminates physical networking
configuration overhead between the two hosts.

You can use the following command to tag a VMkernel port for witness traffic, where vmkN is the
VMkernel adapter name (for example, vmk1):

esxcli vsan network ip add -i vmkN -T=witness

[Figure: Two-node direct connect cluster: ESXi hosts in Site 1 and Site 2 connected by a 10 Gb cable carrying vSAN traffic, plus a separate witness ESXi host]
To set up a two-node direct connect vSAN cluster:

1. Connect both data nodes of the remote site cluster using a direct network cable.

2. Configure a separate network adapter to communicate with the witness node over the WAN.

3. Configure static routes to allow communication between data nodes and the witness node.
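Steps 2 and 3 map to ESXCLI commands on each data node. The following minimal Python sketch only builds the command strings; the adapter name, witness network, and gateway are hypothetical placeholders, and the witness host is assumed to need a matching route back to the data nodes, configured the same way. Step 1 is physical cabling and has no command.

```python
from __future__ import annotations


def direct_connect_commands(witness_vmk: str, witness_network: str, wan_gateway: str) -> list[str]:
    """Return the esxcli commands to run on each data node for steps 2 and 3."""
    return [
        # Step 2: tag the WAN-facing VMkernel adapter for witness traffic.
        f"esxcli vsan network ip add -i {witness_vmk} -T=witness",
        # Step 3: static route to the witness network through the WAN gateway.
        f"esxcli network ip route ipv4 add -n {witness_network} -g {wan_gateway}",
    ]


# Hypothetical adapter name and addresses.
for command in direct_connect_commands("vmk0", "192.168.110.0/24", "192.168.15.1"):
    print(command)
```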

11-54 Shared vSAN Witness Nodes
Multiple remote or branch office sites can share a common vSAN witness host to store the
witness components for their vSAN objects.

A single witness host can support up to 64 two-node vSAN clusters. The number of two-node
vSAN clusters supported by a shared witness host is based on the host memory.

A shared vSAN witness node has the following limitations:

• Does not support vSAN stretched clusters

• Does not support data-in-transit encryption

[Figure: ROBO Sites 1, 2, and 3, each a two-node vSphere/vSAN cluster, sharing witness appliance nodes that run in a centralized data center]
11-55 Witness Node Locations
You can run a vSAN witness node at the following locations:

• On a vSphere environment with a VMFS datastore, an NFS datastore, or a vSAN cluster

• On a public cloud environment backed by supported storage

• On any vCloud Air Network partner-hosted solution

• On a vSphere hypervisor (free) installation using any supported storage (VMFS datastore or
NFS datastore)

11-56 Shared vSAN Witness Node Memory Requirements
The number of witness components that a shared vSAN witness node can support depends on the
memory allocated during the appliance deployment. A best practice is to allocate 16 GB or 32 GB
of memory to a witness node that is shared between multiple two-node vSAN clusters.

The table lists the capacity for a shared witness appliance.

Allocated Memory   Maximum Number of Components Supported   Maximum Number of Clusters Supported
>= 32 GB           64,000                                   64
16 GB              32,000                                   32
8 GB               750                                      1
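The table can be turned into a quick sizing check. This minimal Python sketch encodes the three rows above and assumes that each row acts as a lower bound, so a memory size between two rows gets the limits of the smaller row; that interpolation rule and the function names are assumptions, not documented behavior.

```python
from __future__ import annotations

# Rows from the table: (minimum memory in GB, max components, max clusters).
# Treating each row as a lower bound for sizes between rows is an assumption.
WITNESS_SIZES = [(32, 64_000, 64), (16, 32_000, 32), (8, 750, 1)]


def witness_capacity(memory_gb: int) -> tuple[int, int] | None:
    """Return (max components, max two-node clusters) for a shared witness."""
    for min_gb, components, clusters in WITNESS_SIZES:
        if memory_gb >= min_gb:
            return components, clusters
    return None  # below the smallest size in the table


def fits(memory_gb: int, clusters: int, components: int) -> bool:
    """Check whether a planned set of two-node clusters fits on one shared witness."""
    limits = witness_capacity(memory_gb)
    return limits is not None and components <= limits[0] and clusters <= limits[1]


print(witness_capacity(32), fits(16, 20, 10_000))
```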

11-57 Shared vSAN Witness Node for a Mixed Environment
The vSAN shared witness node can be shared with multiple two-node vSAN clusters running
different versions of ESXi.

If the two-node vSAN cluster version is later than the version of the vSAN shared witness node,
the witness node cannot participate in that cluster.
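The version rule reduces to a single comparison. This minimal Python sketch represents versions as hypothetical integer tuples such as (7, 0) and (6, 7), so that Python's element-by-element tuple comparison orders them; real version strings would need parsing first.

```python
from __future__ import annotations


def can_participate(witness_version: tuple[int, ...], cluster_version: tuple[int, ...]) -> bool:
    """The shared witness cannot serve a cluster running a later version than itself."""
    # Tuples compare element by element, so (6, 7) < (7, 0) < (7, 0, 1).
    return cluster_version <= witness_version


witness = (7, 0)  # hypothetical version of the shared witness node
clusters = {"ROBO Site 1": (7, 0), "ROBO Site 2": (6, 7)}
for name, version in clusters.items():
    print(name, can_participate(witness, version))
```

Both clusters in this example can use the witness; a cluster upgraded beyond (7, 0) could not until the witness is upgraded too.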

[Figure: A shared vSAN witness node in a centralized data center serving ROBO Site 1 (vSAN v7) and ROBO Site 2 (vSAN v6)]
11-58 Configuring a Two-Node vSAN Cluster
You can use the Cluster Quickstart guide to configure a two-node vSAN cluster. However, the
witness host must be added to the data center, not to the cluster, before you start the wizard.

[Screenshot: Configure cluster wizard, Advanced options step, with vSAN Options > Deployment type set to Two node vSAN cluster. The wizard steps are 1 Distributed switches, 2 Advanced options, 3 Claim disks, 4 Proxy settings, 5 Select witness host, 6 Claim disks for witness host, and 7 Review.]
11-59 Review of Learner Objectives
• Explain the two-node vSAN cluster architecture and use cases

• Describe the purpose of the vSAN shared witness appliance

11-60 Key Points


• A vSAN stretched cluster spans three sites to protect against site-level failures.

• Only data site hosts contribute to the cluster compute and storage resources.

• The witness host stores only the witness components for vSAN stretched cluster VM
objects.

• vSAN stretched cluster-aware VM storage policy rules are available to determine fault
tolerance.

• Two-node vSAN clusters are suitable for running a small number of workloads that require
high availability.

• Shared vSAN witness nodes can participate in multiple two-node vSAN clusters.

Questions?

Module 12
vSAN Cluster Monitoring

12-2 Importance
You must regularly monitor the health and performance of a vSAN environment. Doing so
enables you to make timely decisions and to avoid any unwanted events that might affect the
performance of your virtual infrastructure.

vSphere Client provides several tools to enable you to monitor the health and performance of
the vSAN cluster.

12-3 Module Lessons


1. vSAN Health Monitoring

2. vSAN Performance Monitoring

3. vSAN Capacity Monitoring

12-4 Lesson 1: vSAN Health Monitoring

12-5 Learner Objectives


• Describe how the Customer Experience Improvement Program (CEIP) enables VMware to
improve products and services

• Use proactive tests to check the integrity of a vSAN cluster

• Use VMware Skyline Health to monitor vSAN health

12-6 About CEIP
CEIP helps VMware improve its products and services by regularly collecting anonymized,
technical information about VMware products from your organization.

The technical information that is collected includes any or all of the following types of data:

• Configuration

• Feature use

• Performance

• Product logs

12-7 Joining CEIP
To enable CEIP from vSphere Client, select Menu > Administration > Deployment > Customer
Experience Improvement Program > Join.

[Screenshot: Administration > Deployment > Customer Experience Improvement Program in vSphere Client, showing Program status: Joined and Internet connection status: Connected. The page lists features that CEIP enables, including Skyline Health for vSphere, Skyline Health for vSAN, vCenter Server Update Planner, vSAN Performance Analytics, Host Hardware Compatibility, and vSAN Support Insight, and notes that CEIP collects configuration, feature usage, and performance information without personally identifiable information, using port 443.]
12-8 Running Proactive Tests
You can use proactive tests to check the integrity of your vSAN cluster. These tests are useful
to verify that your vSAN cluster is working properly before you place it in production.

[Screenshot: vSAN-Cluster > Monitor > vSAN > Proactive Tests, showing the VM Creation Test (Passed) and the Network Performance Test (Warning). For storage performance testing, the page recommends HCIBench, a storage performance testing automation tool supported through the VMware vSAN Community Forum.]
12-9 VMware Skyline Health
VMware Skyline Health is the primary and most convenient way to monitor vSAN health.

VMware Skyline Health provides you with findings and recommendations to resolve problems,
reducing the time spent on resolving issues.

[Screenshot: vSAN-Cluster > Monitor > vSAN > Skyline Health, listing the health check categories Online health, Network, Physical disk, Data, Cluster, Capacity utilization, Hardware compatibility, Performance service, vSAN Build Recommendation, and Hyperconverged cluster configuration compliance]
12-10 Online Health
Online health includes vSAN Support Insight and Skyline Advisor.

vSAN Support Insight helps vSAN users maintain a reliable and consistent compute, storage, and
network environment. This feature is available when you join CEIP.

Skyline Advisor, included with your Premier Support contract, enhances your proactive support
experience with additional features and functionality, including automatic support log bundle
transfer with Log Assist.

[Screenshot: Skyline Health > Online health, showing vSAN Support Insight and Advisor]
12-11 VMware Skyline Health: vSAN Cluster Partition
To ensure the proper operation of vSAN, all hosts must be able to communicate over the vSAN
network. If they cannot, the vSAN cluster splits into multiple partitions.

vSAN objects might become unavailable until the network misconfiguration is resolved.
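A partition is a group of hosts that can reach each other over the vSAN network, so finding partitions is a connected-components problem. This minimal Python sketch assumes that connectivity is symmetric and is given as a hypothetical list of host pairs that can communicate; it is an illustration of the idea, not how the health check collects its data.

```python
from __future__ import annotations


def find_partitions(hosts: list[str], links: list[tuple[str, str]]) -> list[set[str]]:
    """Group hosts into partitions of mutually reachable hosts (links assumed symmetric)."""
    neighbors: dict[str, set[str]] = {h: set() for h in hosts}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    partitions: list[set[str]] = []
    seen: set[str] = set()
    for start in hosts:
        if start in seen:
            continue
        # Depth-first walk collects every host reachable from `start`.
        group: set[str] = set()
        stack = [start]
        while stack:
            host = stack.pop()
            if host in group:
                continue
            group.add(host)
            stack.extend(neighbors[host] - group)
        seen |= group
        partitions.append(group)
    return partitions


hosts = [f"sa-esxi-0{i}.vclass.local" for i in range(1, 5)]
# Hypothetical failure: hosts 01/02 and 03/04 lost vSAN connectivity to each other.
links = [(hosts[0], hosts[1]), (hosts[2], hosts[3])]
parts = find_partitions(hosts, links)
print(len(parts))  # 2, so the cluster is partitioned
```

A healthy cluster yields exactly one partition that contains every host.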

[Screenshot: Skyline Health > Network > vSAN cluster partition, listing hosts sa-esxi-01 to sa-esxi-04.vclass.local with their host UUIDs, all in partition 1]
12-12 VMware Skyline Health: Network Latency Check
The network latency check looks at vSAN hosts and reports warnings based on a threshold of 5
milliseconds.

If this check fails, check VMKNICs, uplinks, VLANs, physical switches, and associated settings to
locate the network issue.
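The check compares each measured host-to-host latency against the 5 ms threshold. This minimal Python sketch assumes the measurements are available as a dictionary keyed by (from host, to host) and treats exactly 5 ms as passing; both the data shape and the boundary handling are assumptions. Two sample values come from the lab screenshot, and the slow link is hypothetical.

```python
from __future__ import annotations

LATENCY_THRESHOLD_MS = 5.0  # threshold used by the network latency check


def latency_warnings(samples: dict[tuple[str, str], float]) -> list[tuple[str, str, float]]:
    """Return (from_host, to_host, ms) for every measurement above the threshold."""
    return sorted(
        (src, dst, ms) for (src, dst), ms in samples.items() if ms > LATENCY_THRESHOLD_MS
    )


samples = {
    ("sa-esxi-02", "sa-esxi-03"): 0.93,
    ("sa-esxi-03", "sa-esxi-02"): 2.24,
    ("sa-esxi-01", "sa-esxi-04"): 6.10,  # hypothetical slow link
}
print(latency_warnings(samples))
```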

[Screenshot: Skyline Health > Network > Network latency check, listing the latency between each pair of hosts sa-esxi-01 to sa-esxi-04 (0.66 ms to 2.24 ms), all below the 5 ms threshold]
12-13 VMware Skyline Health: vSAN Object Health
This check summarizes the health state of all objects in the cluster.

You can immediately initiate a repair object action to override the default absent component
rebuild delay of 60 minutes.

You can also purge inaccessible VM swap objects.
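The timing of an absent-component rebuild can be expressed with two timestamps. This minimal Python sketch assumes the default 60-minute delay from the text and models a manual repair object action as a hypothetical `repair_now_at` time that takes effect only if it comes before the scheduled rebuild.

```python
from __future__ import annotations

from datetime import datetime, timedelta

REPAIR_DELAY = timedelta(minutes=60)  # default absent component rebuild delay


def rebuild_start(absent_since: datetime, repair_now_at: datetime | None = None) -> datetime:
    """Return when vSAN starts rebuilding an absent component.

    A manual repair object action at `repair_now_at` overrides the delay.
    """
    scheduled = absent_since + REPAIR_DELAY
    if repair_now_at is not None and repair_now_at < scheduled:
        return repair_now_at
    return scheduled


t0 = datetime(2020, 9, 8, 7, 0)
print(rebuild_start(t0))                                 # 2020-09-08 08:00:00
print(rebuild_start(t0, datetime(2020, 9, 8, 7, 5)))     # 2020-09-08 07:05:00
```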

[Screenshot: Skyline Health > Data > vSAN object health, showing 13 healthy objects]
12-14 VMware Skyline Health: Time Synchronization
This check looks at time differences between vCenter Server and hosts. A difference greater
than 60 seconds causes this check to fail.

If this check fails, you should review the NTP server configuration on vCenter Server and the
ESXi hosts.
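The pass/fail rule is a simple threshold on the absolute clock offset. This minimal Python sketch assumes offsets are given in seconds relative to vCenter Server (positive or negative) and treats exactly 60 seconds as passing; the `+10 s` values match the lab screenshot, and the drifting host is hypothetical.

```python
from __future__ import annotations

MAX_SKEW_S = 60  # the check fails for differences greater than 60 seconds


def unsynchronized_hosts(offsets_s: dict[str, float]) -> list[str]:
    """Hosts whose clock differs from vCenter Server by more than MAX_SKEW_S."""
    return sorted(host for host, offset in offsets_s.items() if abs(offset) > MAX_SKEW_S)


offsets = {
    "sa-esxi-01.vclass.local": 10,
    "sa-esxi-02.vclass.local": 10,
    "sa-esxi-03.vclass.local": -75,  # hypothetical drifting host
}
print(unsynchronized_hosts(offsets))
```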

[Screenshot: Skyline Health > Cluster > Time is synchronized across hosts and VC, showing hosts sa-esxi-01 to sa-esxi-04 each with a time difference of +10 s from vCenter Server and the NTP service enabled]
12-15 VMware Skyline Health: vSAN Disk Balance
This check monitors the balance state among disks. By default, automatic rebalance is disabled.

When automatic rebalance is enabled, vSAN automatically rebalances disks if a usage difference
greater than 30% is found between capacity devices.

Rebalance can wait up to 30 minutes to start, providing time for high-priority tasks such as Enter
maintenance mode and object repair to complete.
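The rebalance trigger can be sketched as a threshold on the spread of capacity-device usage. This minimal Python sketch assumes that the 30% difference is measured in percentage points between the most-used and least-used capacity devices; that interpretation and the device names are assumptions, and the sketch ignores the start delay described above.

```python
from __future__ import annotations

REBALANCE_THRESHOLD_PCT = 30.0  # usage difference that triggers automatic rebalance


def needs_rebalance(usage_pct: dict[str, float], auto_rebalance: bool) -> bool:
    """True if automatic rebalance is enabled and device usage differs by more than 30%.

    Assumption: the difference is measured in percentage points between the
    most-used and least-used capacity devices.
    """
    if not auto_rebalance or len(usage_pct) < 2:
        return False
    spread = max(usage_pct.values()) - min(usage_pct.values())
    return spread > REBALANCE_THRESHOLD_PCT


# Hypothetical capacity-device usage, in percent.
print(needs_rebalance({"disk-a": 4.0, "disk-b": 5.0}, auto_rebalance=True))
print(needs_rebalance({"disk-a": 10.0, "disk-b": 45.0}, auto_rebalance=True))
```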

[Screenshot: Skyline Health > Cluster > vSAN Disk Balance, with a CONFIGURE AUTOMATIC REBALANCE option and the metrics Average Disk usage 4%, Maximum Disk usage 5%, Maximum Load variance 3%, and Average Load Variance 1%]
12-16 VMware Skyline Health: Disk Format Version
This check examines the disk format version. For disks with a format version lower than the
expected version, a vSAN on-disk format upgrade is recommended to support the latest vSAN
features.

vSAN 7.0 U1 introduces on-disk format version 13, which is the highest version supported by any
host in the cluster.

[Figure: Skyline Health, "Disk format version" check. Each host, sa-esxi-01 through sa-esxi-04.vclass.local, shows 0/2 disks with an older format and a check result of OK.]

12-17 VMware Skyline Health: vSAN Extended Configuration
This check verifies the default settings for the Object Repair Timer, site read locality, customized
swap object, and large-scale cluster support.

For hosts with inconsistent extended configurations, vSAN cluster remediation is recommended.

The default clusterwide setting for the Object Repair Timer is 60 minutes. By default, site read locality is enabled, customized swap object is enabled, and large-scale cluster support is disabled.
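The Match and Mismatch columns of this check can be modeled by comparing each host's settings with the cluster defaults listed above. This is a hypothetical Python sketch and not a vSAN API. The field names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExtendedConfig:
    """Hypothetical model of the vSAN extended settings (defaults from the text)."""
    object_repair_timer_min: int = 60
    site_read_locality: bool = True
    customized_swap_object: bool = True
    large_scale_cluster_support: bool = False


def compare_to_cluster(host_cfg: ExtendedConfig,
                       cluster_cfg: ExtendedConfig = ExtendedConfig()) -> dict[str, str]:
    """Return "Match" or "Mismatch" for each extended setting on a host."""
    return {
        f.name: ("Match" if getattr(host_cfg, f.name) == getattr(cluster_cfg, f.name)
                 else "Mismatch")
        for f in fields(ExtendedConfig)
    }


drifted = ExtendedConfig(object_repair_timer_min=120)
result = compare_to_cluster(drifted)
print(result["object_repair_timer_min"], result["site_read_locality"])  # Mismatch Match
```

If any host reports a Mismatch, the check recommends cluster remediation.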

[Figure: Skyline Health, "vSAN extended configuration in sync" check, with a REMEDIATE INCONSISTENT CONFIGURATION option. For each host, sa-esxi-01 through sa-esxi-04.vclass.local, the Object repair timer, Site read locality, Customized swap object, and Large scale cluster support settings all show Match.]

12-18 VMware Skyline Health: vSAN Component Utilization
This check examines component utilization for the entire cluster and for each host. It displays a warning or error if utilization exceeds 80% for the cluster or 90% for any host.

If the component limit is reached, the deployment of new VMs and rebuild operations are not allowed.

[Figure: Skyline Health, "Component" check, host-level view. Component utilization is Normal on every host: sa-esxi-04 1% (11 of 750), sa-esxi-03 1% (10 of 750), sa-esxi-02 1% (9 of 750), sa-esxi-01 1% (9 of 750).]

12-19 VMware Skyline Health: What if the Most Consumed Host Fails
This check simulates a failure of the host with the most resources consumed and then displays the resulting cluster resource consumption.
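The simulation can be sketched for the component resource only. This hypothetical Python model is not the vSAN algorithm. It makes two assumptions: the "most consumed" host is the one with the highest component utilization, and that host's components are rebuilt on the surviving hosts, so total usage stays the same while the available limit shrinks. With the lab counts of 11, 10, 9, and 9 components out of 750 per host, the sketch yields 39 of 2250. This is consistent with the Component utilization value that the lab cluster reports for this check.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HostComponents:
    name: str
    used: int    # components placed on the host
    limit: int   # per-host component limit


def what_if_most_consumed_host_fails(hosts: list[HostComponents]) -> tuple[str, int, int, float]:
    """Simulate losing the host with the highest component utilization.

    Assumption: the failed host's components are rebuilt on the surviving
    hosts, so total usage is unchanged while the total limit shrinks.
    """
    failed = max(hosts, key=lambda h: h.used / h.limit)
    used = sum(h.used for h in hosts)
    limit = sum(h.limit for h in hosts if h is not failed)
    return failed.name, used, limit, 100.0 * used / limit


lab = [
    HostComponents("sa-esxi-01.vclass.local", 9, 750),
    HostComponents("sa-esxi-02.vclass.local", 9, 750),
    HostComponents("sa-esxi-03.vclass.local", 10, 750),
    HostComponents("sa-esxi-04.vclass.local", 11, 750),
]
name, used, limit, pct = what_if_most_consumed_host_fails(lab)
print(name, used, limit, round(pct, 2))  # sa-esxi-04.vclass.local 39 2250 1.73
```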

[Figure: Skyline Health, "What if the most consumed host fails" check, cluster-level view: component utilization 1% (39 of 2250), disk space utilization 5% (8.1 GB of 150.0 GB), read cache reservations 0% (0.0 GB of 0.0 GB).]

12-20 Review of Learner Objectives
• Describe how the Customer Experience Improvement Program (CEIP) enables VMware to
improve products and services

• Use proactive tests to check the integrity of a vSAN cluster

• Use VMware Skyline Health to monitor vSAN health

12-21 Lesson 2: vSAN Performance Monitoring

12-22 Learner Objectives


• Use performance views to access metrics for monitoring vSAN clusters, hosts, and VMs

• Explain how writing data generates I/O traffic and affects vSAN performance

• Use performance metrics to analyze the vSAN environment

12-23 vSAN Online Performance Diagnostics
The vSAN online performance diagnostics tool collects performance data and sends it to
VMware for diagnostic and benchmarking purposes. VMware analyzes the performance data
and provides recommendations.

[Figure: vSAN-Cluster > Monitor > vSAN > Performance Diagnostics. The page reads: "Performance diagnostics analyzes previously executed benchmarks. It detects issues, suggests remediation steps, and provides supporting performance graphs for further insight. Select a desired benchmark goal and a time range during which the benchmark ran. The analysis might take some time, depending on the cluster size and the time range chosen. This feature is not expected to be used for general evaluation of performance on a production vSAN cluster." Benchmark goal options: Max IOPS, Max Throughput, Min Latency. Time range: last 1 hour.]

12-24 vSAN Performance Service
The vSAN performance service monitors performance-based metrics at the cluster, host, VM,
and virtual disk levels.

The performance service is enabled by default. The performance history database is created and stored as the StatsDB object in the vSAN datastore.

[Figure: vSAN-Cluster > Configure > vSAN > Services. Space efficiency: None. Data-at-Rest encryption: Disabled. Data-in-Transit encryption: Disabled. Performance Service: Enabled, with stats object health Healthy, stats object storage policy vSAN Default Storage Policy, compliance status Compliant, verbose mode Disabled, and network diagnostic mode Disabled.]

Be aware that the vSAN performance service object is not directly mounted into VMs and is listed as unassociated. This does not mean that the object is unused or safe to delete.

12-25 About I/O Impact on Performance
When analyzing performance, you must consider the sources of I/O traffic.

Front-end storage traffic is generated by VM storage I/O traffic.

Back-end storage traffic is generated by the following sources:

• vSAN object redundancy

• vSAN object rebuilds

• vSAN object resyncs

[Figure: VMs running on a vSphere and vSAN cluster generate front-end storage traffic. Back-end storage traffic flows between the SSDs of all hosts that make up the vSAN datastore.]

12-26 About vSAN Cluster Metrics
In addition to standard cluster performance metrics, vSAN clusters record storage I/O metrics for both VM and vSAN back-end traffic.

You can view VM I/O metrics at the following levels:

• Cluster

• Specific VMs

[Figure: Performance view with VM, BACKEND, and IOINSIGHT tabs. The VM tab offers Cluster level metrics or Show specific VMs (select a maximum of 10).]

12-27 Cluster-Level Metrics for VMs
The chart displays cluster-level metrics from the perspective of VM I/O traffic.

[Figure: Cluster-level VM charts over one hour: IOPS (read, write), Throughput (read, write), Latency (read, write), Congestions, and Outstanding IO.]

12-28 Back-End Cluster-Level Metrics
The chart displays cluster-level metrics from the perspective of the vSAN back end.

[Figure: Cluster-level back-end charts over one hour: IOPS; Throughput (read, resync read, write, recovery write); Latency (read, resync read, write, recovery write); Congestions; Outstanding IO.]

12-29 Throughput Comparison
Compare the charts for the throughput metric of VMs and the back end.

Back-end throughput shows higher values than VM throughput because additional I/O is generated to write data to mirror copies and to carry object repair or rebuild traffic.
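A rough estimate of this amplification for a mirrored (RAID-1) policy is sketched below in Python. The model is deliberately simple and is an assumption, not how vSAN accounts for traffic. It assumes that each front-end write lands on FTT + 1 mirror copies and that resync or repair traffic is added on top. Metadata, checksum, and witness traffic are ignored.

```python
def backend_write_mbps(frontend_write_mbps: float, ftt: int,
                       resync_write_mbps: float = 0.0) -> float:
    """Estimate back-end write throughput for a RAID-1 (mirroring) policy.

    Simplifying assumptions: every front-end write is written to FTT + 1
    mirror copies, and resync/repair writes are added on top. Metadata,
    checksum, and witness traffic are ignored.
    """
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return frontend_write_mbps * (ftt + 1) + resync_write_mbps


print(backend_write_mbps(10.0, ftt=1))                         # 20.0
print(backend_write_mbps(10.0, ftt=1, resync_write_mbps=5.0))  # 25.0
```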

[Figure: Throughput charts side by side. VMs: read and write throughput. BACKEND: read, resync read, write, and recovery write throughput.]

12-30 IOInsight
IOInsight captures I/O traces from ESXi and generates metrics that represent the storage I/O behavior at the VMDK level.

The IOInsight report contains no sensitive information about the VM applications.

To start IOInsight, select the vSAN cluster and select Monitor > vSAN > Performance > IOINSIGHT > NEW INSTANCE.


12-31 Preparing an IOInsight Instance
Select VMs or hosts to monitor all VMDKs associated with them.

Name the IOInsight instance, and select the duration to run (the default is 10 minutes).

The system limits the CPU and memory overhead of IOInsight monitoring to less than 1%.
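The instance settings above (a name, the selected targets, and a run duration that defaults to 10 minutes) can be captured in a small hypothetical Python model. This is not a vSphere API. It only shows how the duration determines the collection window. A run that starts at 2:48 AM ends at 2:58 AM, which matches the lab instance.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class IOInsightInstance:
    """Hypothetical model of an IOInsight instance; not a vSphere API."""
    name: str
    targets: list[str] = field(default_factory=list)  # selected VMs or hosts
    duration: timedelta = timedelta(minutes=10)       # default run duration

    def collection_window(self, start: datetime) -> tuple[datetime, datetime]:
        """Return the start and end times of the trace collection."""
        return start, start + self.duration


run = IOInsightInstance("vSAN-Cluster, 09/09/2020, 2:34 AM",
                        targets=["sa-esxi-01.vclass.local"])
begin, end = run.collection_window(datetime(2020, 9, 9, 2, 48))
print(end.strftime("%H:%M"))  # 02:58
```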

[Figure: New IOInsight instance wizard, Review step: "The IOInsight instance will start running immediately." Name: vSAN-Cluster, 09/09/2020, 2:34 AM. Duration: 10 minutes. Targets: sa-esxi-01, sa-esxi-02, sa-esxi-03, and sa-esxi-04.vclass.local (1 of 1 VMs each).]

12-32 Viewing IOInsight Instance Metrics
After the IOInsight instance completes the collection, you can view detailed disk-related metrics.

[Figure: vSAN-Cluster > Monitor > vSAN > Performance > IOINSIGHT tab. The instance "vSAN-Cluster, 09/09/2020, 2:34 AM" is Completed. It ran from 09/09/2020, 2:48 AM to 2:58 AM (10 minutes) and offers View Metrics, Rename, Rerun, and Delete actions.]

12-33 Host-Level Metrics for Disks
The DISKS tab displays performance metrics at disk and disk group levels.

You can use the Disk Group drop-down menus to select individual cache or capacity disks, or the entire disk group.

[Figure: sa-esxi-01.vclass.local > Monitor > vSAN > Performance > DISKS tab. The Disk Group drop-down offers the whole group or a single disk, such as Local VMware Disk (mpx.vmhba0:C0:T1:L0) or Local VMware Disk (mpx.vmhba0:C0:T2:L0).]

12-34 Host-Level Metrics for the Cache Tier
The Write Buffer Free Percentage chart indicates the amount of free capacity on the cache tier.

As the buffer starts to fill up, the destaging rate increases. This increase is shown in the Cache Disk De-stage Rate chart.

If the write buffer free percentage drops below 20%, vSAN introduces artificial latency (congestion) to slow down the incoming data rate.
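The 20% rule can be expressed as a small classification function. This hypothetical Python sketch models only the threshold stated in the text. It does not model the destage rate, which the text describes only qualitatively.

```python
# Hypothetical sketch of the write buffer congestion rule; not a vSAN API.
CONGESTION_FREE_PCT = 20.0  # threshold stated in the text


def cache_tier_state(write_buffer_free_pct: float) -> str:
    """Classify the cache tier from the write buffer free percentage."""
    if not 0.0 <= write_buffer_free_pct <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    if write_buffer_free_pct < CONGESTION_FREE_PCT:
        # Below 20% free, vSAN adds artificial latency (congestion)
        # to slow incoming writes while destaging catches up.
        return "congestion"
    return "normal"


print(cache_tier_state(55.0), cache_tier_state(12.5))  # normal congestion
```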

[Figure: Write Buffer Free Percentage and Cache Disk De-stage Rate charts for a disk group.]

12-35 Host-Level Metrics for Resync Operations
Use the Resync IOPS, Resync Throughput, and Resync Latency charts to observe the impact of
resync operations on a disk group.

The charts display metrics for the following resync operation types:

• Policy change

• Evacuation

• Rebalance

• Repair

[Figure: Resync IOPS, Resync Throughput, and Resync Latency charts, with read and write series for policy change, evacuation, rebalance, and repair.]

12-36 Host-Level Metrics for Network Performance
Network throughput is important to the overall health of the vSAN cluster.

The PHYSICAL ADAPTERS and HOST NETWORK tabs enable you to monitor physical NICs
and VMkernel adapters, respectively.

[Figure: Performance > HOST NETWORK tab with Network: Host Network selected. "The performance statistics count all network IOs processed in the network adapters used by vSAN." The vSAN Host Network I/O Throughput chart shows inbound and outbound throughput.]

12-37 VM Metrics
The VM tab shows the IOPS, throughput, and latency statistics of individual VMs.

The VIRTUAL DISKS tab shows metrics for each individual disk on the selected VM.

The Virtual Disk drop-down menu lists all the disks that you can select from.

[Figure: sa-vm-01 > Monitor > vSAN > Performance, with VM and VIRTUAL DISKS tabs. The VIRTUAL DISKS tab shows Virtual Disk: Hard disk 1 selected.]

12-38 Review of Learner Objectives
• Use performance views to access metrics for monitoring vSAN clusters, hosts, and VMs

• Explain how writing data generates I/O traffic and affects vSAN performance

• Use performance metrics to analyze the vSAN environment

12-39 Lesson 3: vSAN Capacity Monitoring

12-40 Learner Objectives


• Use vSphere Client to monitor the vSAN capacity utilization

• Identify sources of the vSAN datastore capacity utilization

12-41 Capacity Usage Overview

Information in the capacity usage overview includes:

• Used ("actually written") space at the capacity tier

• Free space on disks

[Figure: vSAN-Cluster > Monitor > vSAN > Capacity > CAPACITY USAGE tab. Capacity Overview: Used 8.15 GB/199.97 GB (4.07%), Actually written 8.15 GB (4.07%), Free space on disks 191.82 GB.]

12-42 Capacity Usage with Space Efficiency
Deduplication and compression savings provide an overview of the space savings achieved.

[Figure: Capacity Overview: Used 19.78 GB/199.99 GB (9.89%), Actually written 19.78 GB (9.89%), Free space on disks 180.21 GB. Deduplication and compression savings: 3.80 GB (Ratio: 1.41x).]

12-43 Usable Capacity Analysis
The Usable capacity analysis panel enables you to select a different storage policy and see how
this policy affects the available free space on the datastore.

The effective free space is half of the free space on disks when the vSAN default storage policy
is selected.
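The "half of the free space" rule follows from the vSAN Default Storage Policy, which uses FTT=1 with RAID-1 mirroring and so stores two full copies of every block. The Python sketch below generalizes the estimate to a few common schemes: RAID-1 with FTT + 1 copies, RAID-5 as 3 data + 1 parity, and RAID-6 as 4 data + 2 parity. Like the panel, it ignores deduplication and compression. It also ignores any reserved capacity, so treat it as an approximation rather than the exact calculation that vCenter Server performs.

```python
def capacity_multiplier(raid: str, ftt: int) -> float:
    """Raw capacity consumed per unit of VM data (simplified)."""
    if raid == "RAID-1":
        return float(ftt + 1)          # FTT + 1 full mirror copies
    if raid == "RAID-5" and ftt == 1:
        return 4 / 3                   # 3 data + 1 parity
    if raid == "RAID-6" and ftt == 2:
        return 1.5                     # 4 data + 2 parity
    raise ValueError(f"unsupported combination: {raid} with FTT={ftt}")


def effective_free_gb(free_on_disks_gb: float, raid: str = "RAID-1", ftt: int = 1) -> float:
    """Estimate the free space a new workload can use under a policy.

    Ignores deduplication, compression, and reserved capacity.
    """
    return free_on_disks_gb / capacity_multiplier(raid, ftt)


print(round(effective_free_gb(191.82), 2))  # 95.91
```

With the 191.82 GB of free space in the lab, the default policy gives 95.91 GB, which is the value that the Usable capacity analysis panel shows.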

[Figure: Capacity Overview (Used 8.15 GB/199.97 GB, Free space on disks 191.82 GB) with a CONFIGURE option for operations and host rebuild reserve. Below it is the Usable capacity analysis panel: "Use this panel to estimate the effective free space to a new workload with the selected storage policy (not considering deduplication and compression)." With the vSAN Default Storage Policy selected, the effective free space with the policy is 95.91 GB.]

12-44 Capacity Usage Breakdown
The capacity usage breakdown section provides detailed information about the type of objects
or data that are consuming the vSAN storage capacity.

Usage breakdown before deduplication and compression (total usage: 23.69 GB):

• VM objects: 3.25 GB (13.75%)
  VMDK: 144.00 MB (4.33%), consisting of primary data 48.00 MB and replica usage 96.00 MB
  VM home objects (VM namespace): 2.96 GB (91.34%)
  Swap objects: 144.00 MB (4.33%)

• System usage: 20.44 GB (86.63%)
  Performance management objects: 1.47 GB (7.2%)
  File system overhead: 5.95 GB (29.1%)
  Checksum overhead: 2.44 GB (11.94%)
  Deduplication and compression overhead: 10.58 GB (51.76%)

12-45 Capacity History
The CAPACITY HISTORY tab displays changes in the used capacity over a selectable date range.

You can use the tab to extrapolate future growth rates and capacity requirements.
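One way to extrapolate growth from the history samples is a least-squares linear fit of used capacity against time. The Python sketch below uses that technique, which is an assumption: vSphere does not expose this calculation. It returns the number of days until usage crosses a chosen threshold. The 80% default threshold and the sample data are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def days_until_threshold(samples: list[tuple[float, float]], capacity_gb: float,
                         threshold_pct: float = 80.0) -> Optional[float]:
    """samples are (day, used_gb) points read from the capacity history.

    Fits used_gb = slope * day + intercept by least squares and returns the
    number of days after the last sample until the threshold is crossed,
    or None if usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sxy / sxx                    # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    target_gb = capacity_gb * threshold_pct / 100.0
    crossing_day = (target_gb - intercept) / slope
    return max(0.0, crossing_day - samples[-1][0])


history = [(0, 10.0), (1, 12.0), (2, 14.0), (3, 16.0)]  # hypothetical: +2 GB/day
print(days_until_threshold(history, capacity_gb=100.0))  # 32.0
```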

[Figure: CAPACITY HISTORY tab: "The capacity usage charts for a given period of time." Date range: last 1 day. The overview chart plots total capacity (199.97 GB), free capacity, and used capacity.]

12-46 vSAN Capacity Reserve
You can enable vSAN capacity reserve for the following use cases:

• Operation reserve

• Host rebuild reserve

Enabling operation reserve for vSAN helps ensure that the cluster has enough space for internal operations to complete successfully.

Enabling host rebuild reserve sets aside enough capacity for vSAN to rebuild data after one host failure.

When reservations are enabled and capacity usage reaches the limit, new workloads cannot be deployed.
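The effect of the reserves on deployments can be sketched as an admission check. This hypothetical Python sketch is not how vSAN sizes its reserves. The operations reserve is supplied as an input because the text does not state how vSAN calculates it. The host rebuild reserve is modeled as one host's share of raw capacity, which assumes equal-sized hosts. The new workload size is the raw capacity it needs after the storage policy is applied.

```python
def can_deploy(used_gb: float, new_workload_gb: float, total_gb: float,
               num_hosts: int, operations_reserve_gb: float = 0.0,
               host_rebuild_reserve: bool = False) -> bool:
    """Hypothetical admission check with vSAN capacity reserves enabled."""
    reserved = operations_reserve_gb  # supplied by the caller, not computed here
    if host_rebuild_reserve:
        # Assumption: equal-sized hosts, so one host holds 1/num_hosts of raw capacity.
        reserved += total_gb / num_hosts
    return used_gb + new_workload_gb <= total_gb - reserved


print(can_deploy(100, 40, 200, 4))                             # True
print(can_deploy(100, 40, 200, 4, host_rebuild_reserve=True))  # True: limit is 150 GB
print(can_deploy(100, 40, 200, 4, operations_reserve_gb=20,
                 host_rebuild_reserve=True))                   # False: limit is 130 GB
```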

[Figure: Capacity Overview: Used 19.78 GB/199.99 GB (9.89%), Free space on disks 180.21 GB, Deduplication and compression savings 3.80 GB (Ratio: 1.41x). The prompt "You can enable operations and host rebuild reserve." appears with a CONFIGURE option.]

12-47 Lab 16: Monitoring vSAN Performance and Capacity
Monitor the vSAN cluster performance and capacity details:

1. View vSAN Cluster Performance Metrics

2. View vSAN Disk Group Performance Metrics

3. View the vSAN Storage Capacity Details

12-48 Review of Learner Objectives
• Use vSphere Client to monitor the vSAN capacity utilization

• Identify sources of the vSAN datastore capacity utilization

12-49 Key Points
• VMware Skyline Health actively tests and monitors the vSAN environment.

• You must regularly analyze performance charts related to vSAN clusters, hosts, and virtual disks.

• Monitoring vSAN capacity utilization is critical to maintaining a stable vSAN environment.

Questions?

Module 13
Troubleshooting Methodology

13-2 Lesson 1: Troubleshooting Methodology

13-3 Importance
The process of troubleshooting is key to restoring application functionality and performance, as
well as objects that have become inaccessible.

Understanding the PNOMA framework and troubleshooting methodologies enables you to determine the best method to identify and correct issues and failures in a vSAN environment.

13-4 Learner Objectives


• Use a structured approach to solve configuration and operational problems

• Apply a troubleshooting methodology to logically diagnose faults and optimize troubleshooting efficiency

13-5 PNOMA Troubleshooting Framework
VMware Global Support uses the PNOMA troubleshooting framework when troubleshooting vSAN issues.

[Figure: The PNOMA layers, from top to bottom.
A (Application): VPXD (vCenter Server), plus VPXA and HOSTD on each host.
M (Management): DISKLIB / OSFSD / VSANVPD on each host.
O (Object): DOM, CLOM, and CMMDS.
N (Network): Reliable Datagram Transport.
P (Physical): LSOM with LLOG / PLOG on each host, above disk groups of cache and capacity SSDs.]

The following vSAN components are listed in the PNOMA troubleshooting framework:

• Physical layer:

LSOM: The Local Log-Structured Object Manager (disk management).

LLOG: The logical log. LSOM uses LLOG for log recovery on reboot.

PLOG: The physical log.

The LLOG and the PLOG share the write buffer.

When a data block arrives in the write buffer, a corresponding entry for the data block is kept in the LLOG for log recovery on reboot.

However, after the data block is in the write buffer, vSAN must calculate where to place this block of data on the magnetic disk when destaging in hybrid configurations. To calculate the location, vSAN consults the filesystem on the magnetic disk. This placement process can cause the filesystem to generate its own metadata updates, for example, to the mapping of logical block addresses to physical locations. This I/O is intercepted and buffered on the SSD, and a record is kept in the PLOG. After the physical locations for the data blocks are obtained from the filesystem, vSAN stores the location in the PLOG. At this point, the LLOG entry is no longer kept.
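The hand-off from LLOG to PLOG can be pictured with a toy Python model. This is purely conceptual and is not vSAN code. It only tracks which log holds a record for a block while the block sits in the write buffer. The class, method names, and location values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WriteBufferModel:
    """Toy model of the LLOG/PLOG hand-off; not vSAN code."""
    llog: dict[int, bytes] = field(default_factory=dict)  # block id -> buffered data
    plog: dict[int, int] = field(default_factory=dict)    # block id -> physical location

    def write(self, block_id: int, data: bytes) -> None:
        # A block arrives in the write buffer: record it in the LLOG
        # so that it can be recovered if the host reboots.
        self.llog[block_id] = data

    def place(self, block_id: int, physical_location: int) -> None:
        # The filesystem has returned a physical location: record it in
        # the PLOG and drop the LLOG entry, which is no longer needed.
        if block_id not in self.llog:
            raise KeyError(f"block {block_id} is not in the LLOG")
        self.plog[block_id] = physical_location
        del self.llog[block_id]


buf = WriteBufferModel()
buf.write(1, b"payload")
buf.place(1, physical_location=4096)
print(buf.llog, buf.plog)  # {} {1: 4096}
```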

For more information about the LLOG and PLOG, see vSAN Monitoring and Troubleshooting at https://docs.vmware.com/en/VMware-vSphere/7.0/vsan-701-monitoring-troubleshooting-guide.pdf.

• Network layer:

RDT: The Reliable Datagram Transport is the communication mechanism within vSAN. It uses TCP at the transport layer. It is also responsible for creating and destroying TCP connections (sockets) on demand.

An RDT association is used to track peer-to-peer network states within vSAN.

• Object layer:

DOM: The Distributed Object Manager manages object access.

CLOM: The Cluster-Level Object Manager manages policy compliance.

CMMDS: The Cluster Membership, Monitoring, and Directory Services is the vSAN
record keeper.

• Management layer:

DISKLIB: The DISKLIB invokes object creation. vSAN objects created are vdisk,
namespace, vswap, or vmem.

OSFSD: The OSFSD, also called the OSFS-Daemon, is responsible for the object creation and query tasks within the vSAN filesystem.

VSANVPD: vSAN uses the vSANVPD or vSAN VASA Provider Daemon to expose
SPBM, RAID, fault-tolerance, object space reservation, and striping operations to
vCenter Server over port 8080. If the vSANVPD services are down, you cannot create
VMs or change policies for existing VMs.

• Application layer:

VPXD: If the VPXD, also known as the vCenter Server daemon service, is stopped, you cannot connect to vCenter Server through the vSphere Client.

VPXA: The VPXA, also called the vCenter Server agent, acts as an intermediary
between vCenter Server and hostd, allowing communication between the vCenter
Server and ESXi hosts. VPXA is the communication conduit to hostd, w hich in turn
communicates with the ESXi kernel.

HOSTD: HOSTD, also called the vmware-hostd management service, is the main communication channel between ESXi hosts and VMkernel. HOSTD runs in the service console and is responsible for managing most ESXi operations. It has visibility to the VMs registered on that host and to the LUNs and VMFS volumes visible to the host.

If vmware-hostd fails, the ESXi host disconnects from vCenter Server. If this disconnection occurs, the ESXi host cannot be managed, even if you try to connect to the ESXi host directly.

13-6 PNOMA: vSAN Physical Layer

[Figure: PNOMA physical layer: LSOM with LLOG / PLOG on each host, above disk groups of cache and capacity SSDs.]

The vSAN physical layer includes:

• Host hardware

• Storage hardware

• Physical disks

• The disk management service (LSOM)

The LSOM independently runs on each host and manages data placement, access, and disk
health.

You can find the following information on the vSAN physical layer:

• Whether all hosts are in the cluster

• Whether all storage devices are in the cluster

• Whether the storage devices are at maximum capacity

• Whether best practices for sizing have been followed

• Whether hardware listed in the VMware Compatibility Guide has been used

• Whether validated and correct versions of drivers and firmware are being used

13-7 Activity: vSAN Physical Layer


What can go wrong at the vSAN physical layer?

13-8 Activity: vSAN Physical Layer Solution


What can go wrong at the vSAN physical layer?

• Host failure

• Offline controllers

• Offline disks

You can find information about the first three layers of PNOMA in VMware Skyline Health:

• Host failure: You can determine if all hosts are in the vSAN cluster and contributing.

• Networking: You can validate that all vSAN hosts are correctly communicating throughout
the environment.

• Physical disk: You can validate that vSAN storage devices are healthy and available. You
can also determine if the physical disks have reached capacity or if storage space is
available.

• Storage limits: You can view whether your vSAN environment is full at the disk or the cluster level.

You can also view the information that VMware Skyline Health reports by using the following commands:

• You can verify whether all hosts are in the cluster from the command line by running the esxcli vsan cluster get command.
• You can verify whether all disks are in the cluster and healthy by running the esxcli vsan debug disk summary get command.
• You can verify whether any of the disks in the cluster are full, or whether limits such as Max Components, Used Disk Space, or Reserved Read Cache Size are reached, by running the esxcli vsan debug limit get command.

13-9 PNOMA: vSAN Network Layer

[Figure: PNOMA network layer: Reliable Datagram Transport.]

The vSAN network layer includes:

• Access ports or trunk ports

• Jumbo frames

• NIC teaming

• Unicast transmission

• Data flow

You can find the following information on the vSAN network layer:

• Whether the hosts can ping each other

• Whether all hosts are responding as cluster members

• Whether all hosts are in the appropriate subnet and VLAN

• Whether all hosts are using the same MTU size

• Whether the interfaces are tagged for vSAN

The network layer manages how hosts communicate with each other (for example, how host_1 talks to host_2) and the network configuration. You can use the esxcli vsan network list command to list the VMkernel port being used for vSAN.

You verify traffic in both directions to eliminate issues caused by a firewall, either by design or in
error.

To validate bidirectional traffic, use the vmkping -I command to specify the vmk0 interface, and enter the IP address of one of the destination nodes. For example, vmkping -I vmk0 10.21.21.181. The command response shows whether the communication path is established.
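Checking every direction between every pair of hosts quickly adds up, so a small helper can generate the full test plan. The Python sketch below only builds the command strings and does not run them. Every command is meant to be run on the host named first in the pair. For the jumbo-frame variant, it uses vmkping's -d (do not fragment) and -s (payload size) options with 8972 bytes, which assumes a 9000-byte MTU minus 28 bytes of IPv4 and ICMP headers. The host names and all IP addresses except 10.21.21.181 are hypothetical.

```python
from __future__ import annotations

from itertools import permutations

JUMBO_PAYLOAD = 9000 - 20 - 8  # assumed 9000 MTU minus IPv4 and ICMP headers = 8972


def vmkping_plan(vsan_ips: dict[str, str], vmk: str = "vmk0",
                 jumbo: bool = False) -> list[tuple[str, str]]:
    """Return (run_on_host, command) pairs that cover both directions."""
    plan = []
    for src, dst in permutations(vsan_ips, 2):  # every ordered host pair
        cmd = f"vmkping -I {vmk}"
        if jumbo:
            cmd += f" -d -s {JUMBO_PAYLOAD}"
        cmd += f" {vsan_ips[dst]}"
        plan.append((src, cmd))
    return plan


hosts = {"esxi-01": "10.21.21.181", "esxi-02": "10.21.21.182", "esxi-03": "10.21.21.183"}
for run_on, command in vmkping_plan(hosts):
    print(f"{run_on}: {command}")
# esxi-01: vmkping -I vmk0 10.21.21.182  (first of 6 lines)
```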

For vSAN versions 6.5 and earlier, or environments using multicast, additional troubleshooting is
necessary.

13-10 Activity: vSAN Network Layer


What can go wrong at the vSAN network layer?

13-11 Activity: vSAN Network Layer Solution


What can go wrong at the vSAN network layer?

• Host networking misconfigurations such as IP, MTU, subnet, and VLAN

• Upstream networking issues such as physical hardware failure, power events, or routing
misconfigurations

• Network inconsistencies such as CRC errors, dropped packets, and buffer overrun or
underrun

Issues seen in the network layer are typically caused by misconfiguration. When troubleshooting at the network layer, test network configurations in the following order:

1. The local host

2. The VMkernel

3. The standard or distributed switch

4. The uplinks

5. The physical switch
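The ordered approach can be sketched as a loop that runs each layer's check in sequence and stops at the first failure. In this Python sketch, the check functions are hypothetical placeholders for whatever tests you run at each layer, such as vmkping or switch configuration reviews. Only the order comes from the list above.

```python
from __future__ import annotations

from typing import Callable, Optional

# Order taken from the list above; the checks themselves are hypothetical.
CHECK_ORDER = ["local host", "VMkernel", "virtual switch", "uplinks", "physical switch"]


def first_failing_layer(checks: dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the checks in the recommended order and stop at the first failure."""
    for layer in CHECK_ORDER:
        if not checks[layer]():
            return layer
    return None


checks = {layer: (lambda: True) for layer in CHECK_ORDER}
checks["uplinks"] = lambda: False          # for example, an uplink port misconfigured
checks["physical switch"] = lambda: False
print(first_failing_layer(checks))  # uplinks
```

Stopping at the first failing layer keeps you from chasing symptoms further along the path that might only be side effects of that failure.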

Configuration example scenario:

You have a four-node environment with two uplinks configured for each ESXi host for redundancy. One port is configured as an access port, which allows tagging for the one VLAN. The second port is configured as a trunk port with no native VLAN. An event caused the ports to flip from the access port to the trunk port, which resulted in a vSAN outage because vSAN communication was broken. The fix required returning the ports to their original configuration, which recovered vSAN.

13-12 PNOMA: vSAN Object Layer

[Figure: PNOMA object layer: DOM, CLOM, and CMMDS.]
The vSAN object layer includes:

• DOM: Object availability, ownership, and rebuild

• CLOM: Object placement, policy compliance, and object health

• CMMDS: Tracking of vSAN hosts, disk groups, objects, network configurations, and policies

You can find the following information on the vSAN object layer:

• The number and types of affected objects, such as a VMDK or a namespace

• Whether the objects are on the same disk, disk group, host, or spread out in the vSAN
environment

• The type of policy assigned to the object, for example FTT=1, FTT=2, or FTT=3

• Whether the RAID assignment for the object is RAID 5 or RAID 6

• Whether the objects and components are showing as healthy

• I/O errors, specifically when powering on a VM


The following vSAN processes are part of the object layer:

• The DOM process handles the initial I/O requests that come from vSAN.

Example:

1. A VM must read or write to or from a file. The host recognizes that the VM is on vSAN
and passes the request to the DOM.

2. The DOM receives the request and walks through the path that allows the VM to talk to
the vSAN disks.

3. The DOM takes the read or write request and passes it to the correct party.

• The CLOM is responsible for ensuring that all objects understand their storage policy and
master storage policies. The CLOM ensures and validates that objects match with their
assigned policies, including faults to tolerate, policy compliance, and RAID levels.

• The CMMDS manages everything in the vSAN environment, including hosts, disk groups,
disks, objects, network configurations, policy configurations, and many other things. If the
object does not exist in the CMMDS, it does not exist in vSAN.

13-13 Activity: vSAN Object Layer

What can go wrong at the vSAN object layer?

13-14 Activity: vSAN Object Layer Solution

What can go wrong at the vSAN object layer?

• Host fails to enter maintenance mode

• Placement fails during object creation

• VM object is inaccessible

• Performance issues
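Placement failures during object creation often come down to a policy that asks for more hosts than the cluster can offer. The minimal sketch below encodes the standard minimum host counts for a standard cluster: RAID 1 mirroring needs 2 x FTT + 1 hosts, RAID 5 (FTT=1) needs 4, and RAID 6 (FTT=2) needs 6. The function names are hypothetical, and the check deliberately ignores capacity, explicit fault domains, and stretched or two-node designs.

```python
def min_hosts(ftt: int, raid: str) -> int:
    """Minimum number of hosts a policy needs in a standard vSAN cluster."""
    if raid == "RAID-1" and 1 <= ftt <= 3:
        return 2 * ftt + 1  # enough hosts that a majority survives ftt host failures
    if raid == "RAID-5" and ftt == 1:
        return 4            # 3 data + 1 parity
    if raid == "RAID-6" and ftt == 2:
        return 6            # 4 data + 2 parity
    raise ValueError(f"unsupported policy: FTT={ftt}, {raid}")


def placement_possible(ftt: int, raid: str, usable_hosts: int) -> bool:
    """True if enough usable hosts (for example, not in maintenance mode) exist."""
    return usable_hosts >= min_hosts(ftt, raid)


# A four-host cluster with one host in maintenance mode has three usable hosts.
print(placement_possible(1, "RAID-1", 3))  # True
print(placement_possible(1, "RAID-5", 3))  # False
print(placement_possible(2, "RAID-6", 4))  # False
```

The second call shows why a RAID 5 policy that works on a four-host cluster can start failing placement as soon as one host enters maintenance mode.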

The most common causes of failures and performance issues in vSAN environments are:

1. Driver and firmware problems, the leading cause of failure.

2. Controller or disk hardware failures, the second leading cause.

3. Network misconfiguration, the third leading cause.

4. Incorrect RAID configuration, which can also cause performance issues.

Multiple failures can cause inaccessible objects.

For example, an administrator places an ESXi host into maintenance mode. While that host is
in maintenance mode, a disk fails on a different host in the vSAN cluster. Together, these two
separate events make some objects inaccessible. To restore access to those objects, take the
ESXi host out of maintenance mode.

In a similar scenario where disks or hardware fail on both hosts, a double fault occurs,
which might result in data loss.
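The double-failure scenario above follows from how vSAN decides whether an object is accessible: more than 50 percent of the object's votes must be available, and for a mirrored object at least one full replica must survive. The sketch below models an FTT=1, RAID 1 object as two replicas and a witness on three hosts. It assumes one vote per component, which is a simplification (vSAN can give some components extra votes), and it treats a failed disk as making its host's component unavailable. The host names and the Component class are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    host: str
    kind: str       # "replica" or "witness"
    votes: int = 1  # simplification: one vote per component


def is_accessible(components, unavailable_hosts):
    """More than half of the votes and at least one full replica must remain."""
    total = sum(c.votes for c in components)
    alive = [c for c in components if c.host not in unavailable_hosts]
    alive_votes = sum(c.votes for c in alive)
    has_replica = any(c.kind == "replica" for c in alive)
    return alive_votes * 2 > total and has_replica


# FTT=1, RAID 1 object: two replicas and a witness.
vmdk = [Component("esxi-01", "replica"),
        Component("esxi-02", "replica"),
        Component("esxi-03", "witness")]

print(is_accessible(vmdk, {"esxi-01"}))             # True: host in maintenance mode
print(is_accessible(vmdk, {"esxi-01", "esxi-02"}))  # False: plus a disk failure on esxi-02
print(is_accessible(vmdk, {"esxi-02"}))             # True: esxi-01 back from maintenance
```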

13-15 PNOMA: vSAN Management Layer

[Figure: Management layer of the PNOMA model: DISKLIB, OSFSD, and VSANVPD on each host]
The vSAN management layer includes:

• Object creation, such as vdisk and namespace objects, through DISKLIB

• Task queries within the vSAN file system through OSFSD

• Operation visibility to vCenter Server through VSANVPD

You can find the following information on the vSAN management layer:

• Accessible and inaccessible VMs

• Interaction between vmkfstools and virtual disks

• Visibility of lock files (uuid.lck) in the namespace folder

• VM power state

• Errors present in vsanvpd.log

Issues specific to vSAN are identified in the management layer.

The management layer sees all storage types in the same way.

For example:

• Applications do not see vSAN. They simply read from or write to an object, without needing
to understand vSAN.

• VMs simply use a simulated folder structure. Data stored on vSAN does not resemble what
can be seen in the datastore browser.

• vSphere API for Storage Awareness functions more at the application layer. vCenter Server
uses vSphere API for Storage Awareness APIs to communicate with storage appliances.

13-16 Activity: vSAN Management Layer

What can go wrong at the vSAN management layer?

13-17 Activity: vSAN Management Layer
Solution

What can go wrong at the vSAN management layer?

• Inability to create a storage policy.

• Performance service does not start.

• Inability to create a VM.

Storage policies are created in vCenter Server using the vSphere Client. Storage policies
created for objects on vSphere API for Storage Awareness appliances are passed from vCenter
Server to the ESXi host using vSphere API for Storage Awareness.

Various issues can prevent the creation of storage policies for objects on vSphere API for
Storage Awareness appliances. The most common of these issues are:

• Services associated with vSphere API for Storage Awareness did not start or are not running.

• vSphere API for Storage Awareness providers are not registered or are registered incorrectly.

The Performance service is backed by an object. If the Performance service does not start,
verify that its object is accessible, for example by mounting the object.

If you are unable to create a VM, remember that VM creation also involves the PNOMA application
layer. Validate the communication between VPXA on the ESXi host that will host the VM and VPXD
on vCenter Server.

13-18 PNOMA: vSAN Application Layer

[Figure: Application layer of the PNOMA model: VPXD on vCenter Server; VPXA and HOSTD on each ESXi host]
The vSAN application layer includes:

• vCenter Server daemon service (VPXD)

• ESXi management agents (VPXA)

• vSphere API for Storage Awareness providers

When troubleshooting at the vSAN application layer, consider the following points:

• Verify the lower layers first.

• When hosts are nonresponsive, verify whether the host services have been restarted.

• Verify whether the vSphere API for Storage Awareness providers are registered.

• Verify whether the local vSAN release catalog file is up to date.

The latest copy of the vSAN release catalog provides up-to-date information for VMware
Skyline Health checks, and for the drivers and firmware for the vSAN controllers.

13-19 Activity: vSAN Application Layer

What can go wrong at the vSAN application layer?

13-20 Activity: vSAN Application Layer Solution

What can go wrong at the vSAN application layer?

• Hosts appear as nonresponsive (hostd).

• VMware Skyline Health check reports alarms.

• vSAN objects are left behind after delete operations.

If a host is not responding in vCenter Server, verify that its hosted VMs can still access their
associated storage. A host that is not responding to vCenter Server is not necessarily absent
from the vSAN cluster. vCenter Server is not required for vSAN to work; its benefit for vSAN is
ease of operation. Communication issues between an ESXi host and vCenter Server do not mean
that the host is not working from a vSAN perspective. In this scenario, do not cold-reboot the
ESXi host. Powering off the host can cause a second outage or a double fault.

VMware Skyline Health runs at the application layer but shows you alarms and alerts from the
first three layers. Troubleshoot issues from the physical, network, and object layers as far as you
can go using the health checks.
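Because Skyline Health surfaces findings from the lower layers, one way to follow the advice to verify the lower layers first is to sort failed checks by PNOMA layer before working on them. The category-to-layer mapping below is a hypothetical illustration, not an official VMware classification; the check names are examples of the kind of items that appear under each category.

```python
# PNOMA layers, lowest first.
PNOMA_ORDER = ["physical", "network", "object", "management", "application"]

# Hypothetical mapping of Skyline Health categories to PNOMA layers.
CATEGORY_LAYER = {
    "Physical disk": "physical",
    "Hardware compatibility": "physical",
    "Network": "network",
    "Data": "object",
    "Cluster": "object",
    "Capacity utilization": "object",
    "Performance service": "management",
    "Online health": "application",
}


def triage_order(failed_checks):
    """Sort (category, check) pairs so that lower-layer failures come first.

    Unknown categories are treated as application layer; sorted() is stable,
    so checks in the same layer keep their original order.
    """
    def layer_rank(item):
        layer = CATEGORY_LAYER.get(item[0], "application")
        return PNOMA_ORDER.index(layer)

    return sorted(failed_checks, key=layer_rank)


failed = [("Data", "vSAN object health"),
          ("Network", "vSAN cluster partition"),
          ("Physical disk", "Operation health")]
for category, check in triage_order(failed):
    print(f"{category}: {check}")
```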

vSAN objects left behind after delete operations are sporadic issues caused by applications that
try to bypass vCenter Server and vSAN. Newer applications that know how to communicate with
vCenter Server have reduced how often this problem occurs.

In a healthy vSAN environment, vCenter Server should see the vSphere API for Storage
Awareness provider's status as Online and Active.

13-21 vSAN Layers: Creating the vSAN Cluster

[Figure: vSAN layers in a new cluster. Object layer: CMMDS. Network layer: Reliable Datagram Transport. LSOM (LLOG and PLOG) on each of three hosts. Physical layer: six SSD disk groups, two per host, each with a cache device.]

When a vSAN cluster is created:

• Logical layers are created.

• Networking communicates using RDT.

• The DOM controls object availability and initial I/O.

• The CLOM controls object placement, object compliance, and rebalance.

• The CMMDS manages cluster membership and information such as inventory.

The vSAN physical, network, and object layers are involved and interact during the creation of
the vSAN cluster.

13-22 Troubleshooting by Layer and
Importance


Tools by layer and importance:

• vCenter Server:

vCenter Server is the fastest way to identify alerts and issues.

VMware Skyline Health identifies risks and monitors, troubleshoots, and diagnoses
cluster component problems.

• ESXi:

ESXi connectivity issues can be diagnosed on the source, such as vmnic or vmk.

Use esxtop to find high loads on certain layers, disks, CPU, or memory.

• VMs:

Lower layer issues, such as CPU, memory, storage, and network, affect VMs.

The guest OS can help to find a possible cause.

Undersized or overloaded VMs are problematic.

13-23 Troubleshooting Process: Defining the
Problem
Troubleshooting is a systematic approach to identify the root cause of a problem and the best
solution to resolve the problem.

In this methodology, the troubleshooting process has three distinct phases. Defining the problem
is the first phase.

Defining the Problem:
• Identify and confirm the symptoms
• Gather useful information

Identifying the Root Cause of the Problem:
• Identify possible causes
• Determine the root cause

Resolving the Problem:
• Identify possible solutions
• Implement the best solution

13-24 Defining the Problem (1)
A system problem is a fault in a system, or in one of its components, that negatively affects your
vSAN environment.

System problems arise from various sources:

• Configuration problems

• Network problems and failures

• Resource contention

• Software bugs and incompatibilities

• Hardware failures

13-25 Defining the Problem (2)


Ask questions to gather information and define the problem:

• Can the problem be reproduced?

Determine a repeatable means to reproduce the problem.

When a problem is reproducible, validating a solution is easier.

• What is the scope?

Examine the effects of the problem and get an idea of how many systems are affected.

• Was the system working before?

Through examination, audits, and logs, verify that your configuration has not changed.

• Is the problem a known problem?

Consult references such as release notes, VMware knowledge base articles, and
community forums to determine whether the problem is documented.

13-26 Defining the Problem (3)
For problems with VMs that are inaccessible, fail to start, or fail to respond, use a standard
approach by addressing the following questions:

• Are all the hosts in the cluster?

• Are any hosts in maintenance mode?

• Are all the disk groups functional in the cluster?

Are all the disks in a healthy state?

Are the disks full?

• Are the associated objects of the VM online?

• Does the cluster show aberrant behavior, aside from the affected systems?

13-27 Defining the Problem (4)

Determining the root cause of any outage requires a targeted approach that follows the logic of
expected outcomes.

Defining the problem is the hardest part. Probing questions are key.

Cut the problem in half, and then cut it in half again:

• Is the entire cluster affected, or only a subset of the cluster?

• Are all VMs affected, or only a sampling?

• Is the observed behavior limited to one type of VM, policy, or configuration?

Most VMware Support cases are seen in the physical or network layers.
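The scoping questions above work by comparing affected and unaffected systems. A minimal sketch of that comparison: for each attribute, check whether all affected VMs share a single value that no unaffected VM has. The VM records, attribute names, and policy names are hypothetical; in practice the data would come from the vSphere Client inventory.

```python
def narrow_down(vms, attributes=("host", "policy", "guest_os")):
    """Return {attribute: value} where every affected VM shares one value
    that no unaffected VM has."""
    affected = [vm for vm in vms if vm["affected"]]
    unaffected = [vm for vm in vms if not vm["affected"]]
    suspects = {}
    for attr in attributes:
        values = {vm[attr] for vm in affected}
        if len(values) == 1:
            (value,) = values
            if all(vm[attr] != value for vm in unaffected):
                suspects[attr] = value
    return suspects


vms = [
    {"name": "app-01", "host": "esxi-02", "policy": "FTT1-RAID1", "guest_os": "linux", "affected": True},
    {"name": "db-01", "host": "esxi-02", "policy": "FTT1-RAID5", "guest_os": "windows", "affected": True},
    {"name": "web-01", "host": "esxi-01", "policy": "FTT1-RAID1", "guest_os": "linux", "affected": False},
    {"name": "web-02", "host": "esxi-03", "policy": "FTT1-RAID5", "guest_os": "windows", "affected": False},
]
print(narrow_down(vms))  # {'host': 'esxi-02'}
```

An empty result means no single attribute separates the two groups, which is a cue to cut the problem along a different dimension.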

13-28 Defining the Problem (5)
Most support cases arise from suboptimal configurations. Review some of the more common
types of incidents:

• Ensure that the hardware used for the vSAN cluster is listed as compliant in the VMware
Compatibility Guide:

VMware provides best-effort support for noncompliant hardware, including driver and
firmware versions.

For storage controllers, this support also includes such things as the mode of the
controller, its queue depth, and other variables.

• Network configuration:

Layer 2 and layer 3 networking boundaries are potential points of failure.

Stretched cluster configurations require additional networking to support witness traffic.

• Build versions:

Your vSAN cluster should not include hosts with different versions of ESXi.

13-29 Activity: Defining the Problem
Your instructor leads a discussion to answer questions based on the screenshot of the problem.

1. Is this problem specific to a cluster or a host?

2. Are any VMs negatively affected by this problem?

[Screenshot: vSphere Client, Monitor tab of cluster SA-vSAN-01. A banner reads: "There are connectivity issues in this cluster. One or more hosts are unable to communicate with the vSAN datastore. Data below does not reflect the real state of the system." In the inventory, sa-esxi-02.vclass.local is Not responding. Placement and availability status shows Reduced availability with no rebuild for 4 VMs. For New Virtual Machine, Hard disk 1, VM home, and the virtual machine swap object all show Reduced availability with no rebuild.]

13-30 Activity: Defining the Problem Solution
Your instructor leads a discussion to answer questions based on the screenshot of the problem.

1. Is this problem specific to a cluster or a host? The problem is specific to host sa-esxi-02
but also affects the vSAN cluster.

2. Are any VMs negatively affected by this problem? Yes.


13-31 Troubleshooting Process: Identifying the
Root Cause of the Problem
Identifying the root cause of the problem is the second phase of the troubleshooting process.


13-32 Identifying the Root Cause
The vSphere Client contains powerful troubleshooting and informational tools for determining
the following items:

• Overall cluster health:

Network

Disk

Object data

• Performance data collection status

• Cluster capacity

The vSAN health service performs health checks and the following proactive tests:

• VM creation

• Storage performance

13-33 Identifying the Root Cause: Health
Checks
Because health check tests are weighted, a single problem can cause multiple health check
failures. Fixing the problem that causes the first failed test often resolves other failed tests.

On the Health page, click the selected alarm or alert. Click Info for more information about the
health item.

For more information, click Ask VMware to open a VMware knowledge base article about the
selected alert or error.
[Screenshot: Skyline Health for SA-vSAN-01 (last checked 03/25/2020, 11:07:19 PM). The vSAN Build Recommendation Engine Health group is expanded with the vSAN Build Recommendation Engine Issues check selected and its Info pane open, including the Ask VMware button. Other items include vSAN release catalog up-to-date and Online health (Disabled).]

13-34 Identifying the Root Cause: Questions to
Consider (1)
Consider the following questions when you try to identify the root cause:

• Are any of the hosts in maintenance mode?

• Was the cluster built according to best practices?

• Did the uplink fail, or is this a switch problem?

• Were the network settings changed?


13-35 Identifying the Root Cause: Questions to
Consider (2)
• Is the host located in the cluster?

• Does the cluster information between hosts match?

[root@sb-esxi-02:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time:
   Local Node UUID: 5e234b0d-f73d-164f-5efb-0050560154f2
   Local Node Type: NORMAL
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 5e234bf7-7b37-1d9d-e55a-0050560154ed
   Sub-Cluster Backup UUID: 5e234879-5cfb-2661-de29-0050560154f8
   Sub-Cluster UUID: 52f99421-30f0-0f31-a4a6-44d6f18a565b
   Sub-Cluster Membership Entry Revision: 3
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 5e234879-5cfb-2661-de29-0050560154f8, 5e234bf7-7b37-1d9d-e55a-0050560154ed, 5e233570-940e-5dd0-ba44-005056029d23, 5e234b0d-873d-164f-5efb-0050560154f2
   Sub-Cluster Member HostNames: sb-esxi-03.vclass.local, sb-esxi-01.vclass.local, sb-esxi-04.vclass.local, sb-esxi-02.vclass.local
   Sub-Cluster Membership UUID: fdf4ab5e-1f67-5249-2876-0050560154ed
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: e4931e65-728a-488e-a45d-4220c6df7a51 3 2020-03-25T17:45:43.902
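To answer whether the cluster information matches between hosts, you can collect the output of esxcli vsan cluster get from every host and compare the fields that should be identical on all members of an unpartitioned cluster. The sketch below parses the "Key: Value" lines and reports any field that differs. The set of compared fields is an assumption, and the sample text is trimmed: the sb-esxi-02 values come from the output above, while the isolated sb-esxi-04 values are invented for illustration.

```python
def parse_cluster_get(text):
    """Turn 'Key: Value' lines from esxcli vsan cluster get into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as the
        # Config Generation timestamp keep their own colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            info[key.strip()] = value.strip()
    return info


# Fields assumed to be identical on every member of a healthy, unpartitioned cluster.
MUST_MATCH = ("Sub-Cluster UUID", "Sub-Cluster Master UUID",
              "Sub-Cluster Member Count", "Sub-Cluster Membership UUID")


def compare_hosts(outputs):
    """outputs: {host: raw esxcli text}. Returns {field: {host: value}} for mismatches."""
    parsed = {host: parse_cluster_get(text) for host, text in outputs.items()}
    mismatches = {}
    for field in MUST_MATCH:
        values = {host: info.get(field) for host, info in parsed.items()}
        if len(set(values.values())) > 1:
            mismatches[field] = values
    return mismatches


healthy = """Cluster Information
   Sub-Cluster Master UUID: 5e234bf7-7b37-1d9d-e55a-0050560154ed
   Sub-Cluster UUID: 52f99421-30f0-0f31-a4a6-44d6f18a565b
   Sub-Cluster Member Count: 4
"""
isolated = """Cluster Information
   Sub-Cluster Master UUID: 5e233570-940e-5dd0-ba44-005056029d23
   Sub-Cluster UUID: 52f99421-30f0-0f31-a4a6-44d6f18a565b
   Sub-Cluster Member Count: 1
"""
for field, values in compare_hosts({"sb-esxi-02": healthy, "sb-esxi-04": isolated}).items():
    print(field, values)
```

A matching Sub-Cluster UUID with a different master and member count, as in this sample, is the pattern you would expect from a host that is in the right cluster but cannot reach the other members.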

13-36 Identifying the Root Cause: Questions to
Consider (3)
• What are the drive states for the hosts?

If the host state is Not Responding, the number of Disks in Use is 0.

Disk Group                                      Disks in Use   State            vSAN Health Status
   Disk group (0000000000766d686261313a343...)  3              Mounted          Healthy
sb-esxi-02.vclass.local                         6 of 6         Connected        Healthy
   Disk group (0000000000766d686261313a313...)  3              Mounted          Healthy
   Disk group (0000000000766d686261313a343...)  3              Mounted          Healthy
sb-esxi-01.vclass.local                         0 of 0         Not Responding   Unknown
   Disk group (0000000000766d686261313a313...)  0              Mounted          Healthy
   Disk group (0000000000766d686261313a343...)  0              Mounted          Healthy

13-37 Identifying the Root Cause: Questions to
Consider (4)
• Are the associated objects of the VM active?

• If absent objects exist, are they on the same host?

[Screenshot: Monitor > vSAN > Virtual Objects for New Virtual Machine, with components grouped by host placement. Virtual machine swap object (RAID 0): one Active component. VM home (RAID 1): Component Active on sa-esxi-01, Component Absent on sa-esxi-02, Witness Active on sa-esxi-03. Hard disk 1 (RAID 1): Component Active on sa-esxi-01, Component Active on sa-esxi-03, Witness Absent on sa-esxi-02.]
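To answer whether absent components sit on the same host, group the Absent entries from the Virtual Objects view by host. The sample rows below transcribe the VM home and Hard disk 1 components from this slide's screenshot; the tuple layout and function names are hypothetical.

```python
from collections import Counter


def absent_by_host(components):
    """components: (object, type, state, host) rows. Count Absent rows per host."""
    return Counter(host for _obj, _type, state, host in components if state == "Absent")


def single_suspect_host(components):
    """Return the host if every Absent component is on one host, else None."""
    counts = absent_by_host(components)
    return next(iter(counts)) if len(counts) == 1 else None


rows = [
    ("VM home", "Component", "Active", "sa-esxi-01"),
    ("VM home", "Component", "Absent", "sa-esxi-02"),
    ("VM home", "Witness", "Active", "sa-esxi-03"),
    ("Hard disk 1", "Component", "Active", "sa-esxi-01"),
    ("Hard disk 1", "Component", "Active", "sa-esxi-03"),
    ("Hard disk 1", "Witness", "Absent", "sa-esxi-02"),
]
print(absent_by_host(rows))       # Counter({'sa-esxi-02': 2})
print(single_suspect_host(rows))  # sa-esxi-02
```

When all absent components point to one host, start with that host's network and hardware; absent components spread across several hosts suggest a wider problem, such as a network partition.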

13-38 Identifying the Root Cause: Questions to
Consider (5)
• Is aberrant behavior seen in VMware Skyline Health for the cluster?

[Screenshot: Skyline Health for SA-vSAN-01 (last checked 05/02/2020, 1:05:18 AM) with the Network group expanded: Hosts with connectivity issues; vSAN cluster partition; All hosts have a vSAN vmknic configured; Hosts disconnected from VC; vSAN: Basic (unicast) connectivity check; vSAN: MTU check (ping with large packet size); vMotion: Basic (unicast) connectivity check; vMotion: MTU check (ping with large packet size).]

13-39 Troubleshooting Process: Resolving the
Problem
Resolving the problem is the third phase of the troubleshooting process.


13-40 Avoiding and Resolving Common
Problems (1)
Only a handful of the problems reported to VMware Support occur frequently enough to merit a
discussion on how to avoid or resolve them.

[Screenshot: VMware Support Contact Options page (Get Help From VMware Support) with Get Support, Support Request History, and Support Contracts sections]

13-41 Avoiding and Resolving Common
Problems (2)
Problem:

• VMs become unavailable as or after the ESXi host enters maintenance mode.

Avoidance:

• Verify the status of the vSAN cluster and the health of its objects before you start
maintenance tasks.

• The best practice is to work on only one host at a time. Ensure that no resync activity is in
progress before you begin maintenance work on other hosts.

Resolution:

• Taking the host out of maintenance mode typically enables unavailable VMs and objects to
regain quorum and become available.
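The avoidance steps above amount to a short precheck before each host enters maintenance mode. A minimal sketch, assuming you have already read the counts from the Virtual Objects and Resyncing Objects views in the vSphere Client; the ClusterState fields and function name are hypothetical, not a vSAN API.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    inaccessible_objects: int   # from the Virtual Objects view
    resyncing_bytes: int        # from the Resyncing Objects view
    hosts_in_maintenance: int   # hosts already in maintenance mode


def maintenance_blockers(state):
    """Return reasons to wait; an empty list means these checks pass."""
    reasons = []
    if state.inaccessible_objects:
        reasons.append(f"{state.inaccessible_objects} inaccessible object(s)")
    if state.resyncing_bytes:
        reasons.append(f"resync in progress ({state.resyncing_bytes} bytes left)")
    if state.hosts_in_maintenance:
        reasons.append("another host is already in maintenance mode")
    return reasons


print(maintenance_blockers(ClusterState(0, 0, 0)))  # []
print(maintenance_blockers(ClusterState(0, 5_000_000, 1)))
```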

13-42 Avoiding and Resolving Common
Problems (3)
Problem:

• Powering on a VM results in an I/O failure error.

Avoidance:

• Ensure that all VM components, such as the namespace and the VMDK objects, are
available.

• Review the vSphere Client health checks.

Resolution:

• Resolving object accessibility problems brings the VM into an available state.

• Validate the snapshot chain, if snapshots exist.

• Remove hosts from maintenance mode, if applicable. Whether this operation is helpful in
solving the issue depends on the policies in place, as well as the evacuation mode selected.

• Contact VMware Support if disks failed or are in an unhealthy state.

13-43 Avoiding and Resolving Common
Problems (4)
Problem:

• A vSAN host fails to enter maintenance mode.

Avoidance:

• Verify the status of the vSAN cluster and the health of its objects before you begin any
maintenance task. Inaccessible objects can block maintenance mode operations.

• Use the vSphere Client to examine the VM storage policies that your VMs use. Modify
them, if necessary, to allow the host to enter maintenance mode.

Resolution:

• If the problem persists after you try these steps, contact VMware Support.

13-44 Avoiding and Resolving Common
Problems (5)
Problem:

• Erratic performance is seen, including disks or disk groups intermittently going offline.

Avoidance:

• Ensure that the HBA driver and firmware, solid-state drives, and capacity disks are at
supported levels according to the vSAN Compatibility Guide.

• Verify new third-party updates and hardware against the vSAN Compatibility Guide.

Resolution:

• For a controller or disk issue, use maintenance mode to try to isolate the problem host.

• Events, such as disk failure or vSAN congestion, might contribute to the problem.
Congestion on the vSAN network is not inherently bad but can be an indicator of a problem
when it is prolonged.

• Contact VMware Support if you are unsure about the appropriate resolution.
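The distinction between brief and prolonged congestion can be made explicit by looking for a run of consecutive high samples rather than a single spike. In this minimal sketch, the threshold and run length are hypothetical tuning values, not VMware recommendations; the samples would come from the congestion metrics in the vSAN performance charts.

```python
def prolonged_congestion(samples, threshold, min_consecutive):
    """True if congestion exceeds threshold for at least min_consecutive samples in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


print(prolonged_congestion([0, 40, 0, 35, 0], threshold=20, min_consecutive=3))    # False: short spikes
print(prolonged_congestion([0, 40, 45, 50, 10], threshold=20, min_consecutive=3))  # True: sustained
```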

13-45 Avoiding and Resolving Common
Problems (6)
Problem:

• The vSAN on-disk upgrade process fails.

Avoidance:

• Verify that the cluster is healthy.

• Do not manually delete or create disk groups during the upgrade, because these actions
disrupt the on-disk upgrade workflow.

• Do not try to put hosts in maintenance mode during the upgrade.

Resolution:

• During the upgrade, objects and the disk format are upgraded in a sequential process.
Failures that occur during the upgrade result in unique scenarios. Contact VMware Support
for corrective steps.

13-46 Avoiding and Resolving Common
Problems (7)
Problem:

• The vSAN cluster has network-isolated hosts.

• VM high availability events might occur.

Avoidance:

• Verify correct network settings and connectivity between all hosts in the cluster before
enabling vSAN.

Resolution:

• Fix the incorrect network settings.
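When verifying network settings, the vSAN MTU health check pings with a large packet size. To reproduce that check by hand with vmkping, the payload size passed to -s must leave room for the IP and ICMP headers (20 + 8 bytes for IPv4, 40 + 8 bytes for IPv6), and -d prevents fragmentation. The helper below only does that arithmetic; the VMkernel interface name and the peer address placeholder in the printed command are assumptions to replace with your own values.

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
ICMP_HEADER = 8  # same size for ICMPv4 and ICMPv6 echo


def max_ping_payload(mtu, ipv6=False):
    """Largest ICMP echo payload that fits in one unfragmented packet at this MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - ICMP_HEADER


for mtu in (1500, 9000):
    size = max_ping_payload(mtu)
    # Placeholder interface and peer address; substitute your vSAN VMkernel port.
    print(f"MTU {mtu}: vmkping -I vmk1 -d -s {size} <peer vSAN IP>")
```

If the large ping fails while a default-size ping succeeds, an MTU mismatch somewhere on the path is the likely cause.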

13-47 Avoiding and Resolving Common
Problems (8)
Problem:

• Some VMware Skyline Health checks appear as failed.

• Alarms might be triggered.

Avoidance:

• Follow best-practice recommendations for design, deployment, monitoring, and
maintenance tasks.

• Transient conditions, such as resync activity, can trigger temporary warnings.

Resolution:

• VMware Skyline Health check failures often correlate with specific vSAN activities.

• Review the VMware Skyline Health check details to understand whether a corrective action
is necessary.

• The VMware knowledge base has articles on each VMware Skyline Health check and what
failures indicate.

For links to these articles, see VMware knowledge base article 2114803 at
https://kb.vmware.com/s/article/2114803.

13-48 Avoiding and Resolving Common
Problems (9)
Problem:

• The vSAN datastore is out of space.

Avoidance:

• Plan for the fact that VM storage policies affect the allocated space in the vSAN datastore.
Policies that include multiple failures to tolerate are especially storage-intensive.

• Editing storage policies in use can affect the datastore well beyond the estimated final
consumption as the changes are committed. New objects must sometimes be created
before old ones can be removed to free up space.

Resolution:

• Scale the cluster out or up.

• Power off and delete VMs.

• Migrate some of the VMs to other datastores if possible.
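The capacity effect of a policy can be estimated from its data layout: RAID 1 keeps FTT + 1 full copies, RAID 5 (FTT=1) stores 3 data plus 1 parity segment (about 1.33x), and RAID 6 (FTT=2) stores 4 data plus 2 parity segments (1.5x). The sketch also models a policy change as the old and new layouts coexisting until the old components are deleted, a simplified worst case that assumes the whole object is rebuilt at once. It ignores witness components, metadata overhead, thin provisioning, deduplication and compression, and slack space; the function names are hypothetical.

```python
def raw_gb(size_gb, ftt, raid):
    """Approximate raw vSAN capacity consumed by an object of size_gb."""
    if raid == "RAID-1":
        return size_gb * (ftt + 1)  # ftt + 1 full mirrors
    if raid == "RAID-5" and ftt == 1:
        return size_gb * 4 / 3      # 3 data + 1 parity
    if raid == "RAID-6" and ftt == 2:
        return size_gb * 6 / 4      # 4 data + 2 parity
    raise ValueError(f"unsupported policy: FTT={ftt}, {raid}")


def policy_change_peak_gb(size_gb, old, new):
    """Simplified worst case while the new layout is built before the old one is removed."""
    return raw_gb(size_gb, *old) + raw_gb(size_gb, *new)


print(raw_gb(100, 1, "RAID-1"))            # 200
print(round(raw_gb(100, 1, "RAID-5"), 1))  # 133.3
print(round(policy_change_peak_gb(100, (1, "RAID-5"), (1, "RAID-1")), 1))  # 333.3
```

Under these assumptions, moving a 100 GB disk from RAID 5 to RAID 1 can briefly need about 333 GB of raw capacity even though the final footprint is 200 GB.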

13-49 Review of Learner Objectives
• Use a structured approach to solve configuration and operational problems

• Apply a troubleshooting methodology to logically diagnose faults and optimize
troubleshooting efficiency

13-50 Key Points
• The vSphere Client is the main tool for monitoring the health and performance of your
vSAN cluster.

• The health service act ively tests and monitors the vSAN environment.

• You can use ESXCLI commands to view information about your vSAN environment,
including information not available in the vSphere Client.

Questions?

Module 14
Troubleshooting Tools

14-2 Importance
Although vSAN is primarily configured and managed through the vSphere Client, vSAN has
additional troubleshooting tools.

14-3 Module Lessons


1. VMware Skyline Health

2. Commands for vSAN

3. Useful Log Files

14-4 Lesson 1: VMware Skyline Health

14-5 Learner Objectives


• Discuss VMware Skyline Health and the associated service

• Describe the use of VMware Skyline Health to identify and correct problems in vSAN

• Apply information presented by VMware Skyline Health to problem-solving

14-6 About VMware Skyline Health
All customers with active support for vSAN 6.7 and later are entitled to VMware Skyline Health.

VMware Skyline Health provides self-service findings for:

• Configuration

• Patches

• Upgrades

• Security

Skyline Health

Key Capabilities
• vSphere and vSAN findings
• Available in vSphere Client
• Supports vSAN 6.7 and up

VMware Skyline Health for vSAN provides findings based on VMware Skyline Health data from
thousands of vSAN deployments. It does not require the Skyline Collector, which means that no
data needs to be sent to VMware to receive the benefits of VMware Skyline Health.

VMware Skyline Health findings include rules based on VMware knowledge base articles and
best practices. To get the most out of VMware Skyline Health, vCenter Server must be
connected online and enrolled in the Customer Experience Improvement Program (CEIP).
However, customers can still receive some health checks offline.

VMware Skyline Health offline tests run hourly. Offline tests do not require active support or
CEIP enrollment.

VMware Skyline Health replaces Health in the vSAN UI and contains both VMware Skyline Health
findings and the vSAN Health summary. VMware Skyline Health is available with vSphere
6.7 P01 (or vSAN 6.7 U3a) and later.

14-7 Accessing VMware Skyline Health
You access VMware Skyline Health using the vSphere Client by selecting Skyline Health under
vSAN for a selected vSAN cluster.

[Screenshot: Monitor > vSAN > Skyline Health for SA-vSAN-01 (last checked 04/20/2020, 11:23:50 PM) with the RETEST button and the Online health, Network, Physical disk, Data, Cluster, Capacity utilization, and Hardware compatibility groups. The vSAN navigation pane also lists Virtual Objects, Physical Disks, Resyncing Objects, Proactive Tests, Capacity, Performance, Performance Diagnostics, Support, and Data Migration Pre-check.]

VMware Skyline Health provides proactive findings and recommendations to avoid problems
before they occur, reducing the time spent on resolving support requests.

14-8 VMware Skyline Health Check Categories
VMware Skyline Health checks are sorted into categories that contain individual health checks.

VMware Skyline Health Check Category   Description
vSAN Build Recommendations             Monitors vSAN build recommendations for vSphere Lifecycle Manager
Online Health                          Monitors vSAN cluster health and sends failed health checks to the VMware analytics back-end system for advanced analysis
Network                                Monitors vSAN network health
Physical Disk                          Monitors the health of physical devices in the vSAN cluster
Data                                   Monitors vSAN data health
Cluster                                Monitors vSAN cluster health
Capacity Utilization                   Monitors vSAN free disk space to ensure that capacity use does not exceed the threshold setting
Hardware Compatibility                 Monitors the cluster components to ensure that they are using supported hardware, software, and drivers
Performance Service                    Monitors the health of the vSAN performance service

VMware Skyline Health includes several health check categories. Many checks have
preconfigured health check tests that run every hour to monitor, troubleshoot, and diagnose the
cause of cluster component problems, identify issues in the environment, and avoid problems
before they occur.

Additional checks include Hyperconverged cluster configuration compliance, which monitors host
compliance for the hyperconverged cluster configuration.

For more information, see VMware knowledge base article 2114803 at
https://kb.vmware.com/s/article/2114803. This article provides an overview and serves as a
homepage for the VMware knowledge base articles about the vSAN on-premises and online
health checks.

14-9 VMware Skyline Health for vSAN
VMware Skyline Health includes preconfigured health check tests to monitor, troubleshoot, and
diagnose the cause of vSAN cluster problems. It also identifies potential risks.

[Figure: Skyline Health page for cluster SA-vSAN-01 (Last checked: 04/20/2020, 7:21:52 PM) with the Network category expanded and the vSAN cluster partition check selected.]

On the Skyline Health page, click vSAN Health Details, or the alarm or alert, and then click Info for more information about the health item.

For more information, click Ask VMware to open a VMware knowledge base article about the selected alert or error.

VMware Skyline Health checks all aspects of a vSAN cluster. VMware Skyline Health performs
checks on several items, such as hardware compatibility, network connectivity, storage device
health, and cluster health.

Using VMware Skyline Health, vSAN administrators can ensure that the vSAN deployment is
fully supported, functional, and operational. Administrators can also receive immediate
indications of a root cause if a failure occurs.

To verify the configuration and operation of your vSAN cluster:

1. Navigate to the vSAN cluster.

2. Click the Monitor tab.

3. Under vSAN, select Skyline Health to review the different vSAN health check categories.

4. If the Test Result column displays the Warning (yellow) or Failed (red) icon, expand the
category to review the results of individual health checks.

5. Select an individual health check to view the detailed information.

In the Info section, you can click Ask VMware to open a VMware knowledge base article
that describes the health check and provides information about how to resolve the issue.

14-10 Online Health Checks

vSAN 7.0 has a built-in online health check capability that monitors vSAN cluster health and
sends the collected data to the VMware analytics back-end system for advanced analysis.

[Figure: Skyline Health page for SA-vSAN-01 (Last checked: 04/20/2020, 11:23:50 PM) with the Online health category expanded (Last check: 57 minute(s) ago). Listed checks include Advisor, vSAN Support Insight, Audit CEIP Collected Data, Physical network adapter link speed consistency, Patch available for critical vSAN issue for All Flash clusters with deduplication enabled, and vSAN max component size.]

You must participate in the Customer Experience Improvement Program (CEIP) to use online
health checks.

The online health check feature improves as data about customer implementations is collected
and issues are documented. The cloud checks provide a link to a related VMware knowledge
base article for the issue detected so that customers can solve the issue without the need to
contact VMware Technical Support.

14-11 vSAN Release Catalog Up-to-Date
Health Check
The vSAN release catalog up-to-date health check object verifies the age of the vSAN release
catalog.

The vSAN release catalog is used for vSAN build recommendations. The health check shows a
warning when the catalog is older than 90 days and an error when it is older than 180 days. The
vSAN release catalog is updated with new releases or critical patches.
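The age rule can be sketched as a small classifier. This is a minimal illustration, not the health service's implementation: the helper name `catalog_status` is hypothetical, and treating "older than" as a strict comparison is an assumption.

```python
from datetime import date

# Thresholds from the text: warning when older than 90 days,
# error when older than 180 days.
WARNING_AGE_DAYS = 90
ERROR_AGE_DAYS = 180


def catalog_status(last_updated: date, today: date) -> str:
    """Classify the age of the local vSAN release catalog copy.

    Hypothetical helper; the strict ">" boundary is an assumption.
    """
    age_days = (today - last_updated).days
    if age_days > ERROR_AGE_DAYS:
        return "error"
    if age_days > WARNING_AGE_DAYS:
        return "warning"
    return "passed"


# A catalog last updated 111 days earlier falls in the warning band.
print(catalog_status(date(2020, 1, 1), date(2020, 4, 21)))  # warning
```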
[Figure: Skyline Health page for SA-vSAN-01 with the vSAN Build Recommendation category expanded and the vSAN release catalog up-to-date check selected, showing the UPDATE FROM FILE option, the current time in UTC, and when the local copy was last updated.]

To manually update the vSAN release catalog, locate and click the vSAN release catalog up-to-date health check object and select Update From File.
vSphere and vSAN support various hardware configurations. The list of hardware components
and corresponding drivers that are supported with vSAN can be found in the vSAN release
catalog. Use only the hardware, firmware, and drivers listed there to help ensure the stability
and performance of a vSAN environment.

From the Health and Performance pane, you can update the information in the vSAN release
catalog. If the environment has Internet connectivity, updates can be obtained directly from
VMware. Otherwise, you can download the vSAN release catalog as a file and upload it to
perform an offline update.

For more information about the vSAN release catalog up-to-date health check object, including
the vSAN release catalog download link, see VMware knowledge base article 58891 at
https://kb.vmware.com/s/article/58891.

14-12 Scenario: Troubleshooting Network
Health Issues (1)
In this example, the Skyline Health check network category has two tests that failed.

[Figure: Monitor tab of SA-vSAN-01 showing Skyline Health (Last checked: 04/20/2020, 7:21:52 PM) with the Network category expanded. Listed checks: Hosts with connectivity issues, vSAN cluster partition, All hosts have a vSAN vmknic configured, Hosts disconnected from VC, vSAN: Basic (unicast) connectivity check, vSAN: MTU check (ping with large packet size), vMotion: Basic (unicast) connectivity check, vMotion: MTU check (ping with large packet size), and Network latency check.]

Monitor the Health pane regularly. Expand the test category to view the individual tests. Select a
test to view detailed test results. You can click Retest to manually run all tests. Otherwise, the
tests run every 60 minutes.

Identify the test categories that do not have a status of Passed. If an issue is detected in the
environment, a result of Failed or Warning appears next to the test category in the Health pane.
Expanding the test category displays the specific tests that failed or produced a warning.
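The triage step of finding every check that is not Passed can be sketched in Python. The nested dictionary is a hypothetical stand-in for health results and is not the format returned by any vSAN API. The check names come from the network example in this lesson, but the statuses assigned to them here are illustrative.

```python
# Hypothetical Skyline Health results: category -> {check -> status}.
# This layout is illustrative only, not the format of any vSAN API.
results = {
    "Network": {
        "Hosts with connectivity issues": "Passed",
        "vSAN cluster partition": "Failed",
        "All hosts have a vSAN vmknic configured": "Failed",
        "Hosts disconnected from VC": "Passed",
    },
}


def checks_needing_attention(results):
    """Return (category, check, status) for every check that did not pass."""
    flagged = []
    for category, checks in results.items():
        for check, status in checks.items():
            if status != "Passed":
                flagged.append((category, check, status))
    return flagged


for category, check, status in checks_needing_attention(results):
    print(f"{category}: {check} -> {status}")
```

Failed and Warning results are both flagged, matching the text's instruction to review any category that does not show Passed.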

The example shows the Monitor tab of a vSAN cluster. The vSphere Client reports that two
vSAN health alarms were triggered.

14-13 Scenario: Troubleshooting Network
Health Issues (2)
In this example, the network category has two tests that failed:

• vSAN cluster partition

• All hosts have a vSAN vmknic configured

For more information about a failed health check test, select the individual test.

[Figure: Skyline Health (Last checked: 04/20/2020, 7:21:52 PM) with the Network category expanded. The vSAN cluster partition and All hosts have a vSAN vmknic configured checks are marked as failed. Hosts with connectivity issues, Hosts disconnected from VC, vSAN: Basic (unicast) connectivity check, and vSAN: MTU check (ping with large packet size) are also listed.]

Clicking a specific VMware Skyline Health check test provides more details about why the test
failed or produced a warning.

14-14 Scenario: Troubleshooting Network
Health Issues (3)

View the details of the test that failed.

Click Ask VMware for more help.

[Figure: Network category with the All hosts have a vSAN vmknic configured check selected, showing the Hosts with no vSAN vmknic present table and the Info panel with an Ask VMware button. The Info panel reads: "In order to participate in a vSAN cluster, and form a single group of fully connected hosts, each host in a vSAN cluster must have a vmknic (VMkernel network interface or VMkernel adapter) configured for vSAN traffic."]

You can view details of each test that did not pass in the Info panel. The Info panel explains the
health check to help you identify a possible root cause of the issue. For a more detailed
explanation, click Ask VMware.

14-15 Scenario: Troubleshooting Network
Health Issues (4)
Clicking Ask VMware takes you to a VMware reference.

In this example, you are taken to VMware knowledge base article 2108062.

[Figure: VMware knowledge base article 2108062, "vSAN Health Service - Network Health - All hosts have a vSAN vmknic configured", opened in a browser. Its Purpose section states that the article explains the purpose of this check in the vSAN Health Service and provides details on why it might report an error. Additional related articles are available on the right side of the page.]

Clicking Ask VMware takes you to a VMware knowledge base article that describes the test and
probable causes, and offers advice on how to resolve the issue.

14-16 vSAN Capacity Check
The CAPACITY USAGE and CAPACITY HISTORY panels display the Capacity usage, Usable
capacity analysis, and Capacity history views at the vSAN cluster level.

[Figure: CAPACITY USAGE and CAPACITY HISTORY panels for SA-vSAN-01. The Capacity Overview shows Used 19.27 GB/179.98 GB (10.71%), Free space on disks 160.71 GB, Actually written 19.27 GB (10.71%), and Dedup & compression savings 18.35 GB (Ratio 2.88x).]

Capacity Usage displays the percentage used and free capacity on vSAN disks.

Usable capacity analysis estimates the effective space available with selected storage policies.

The CAPACITY HISTORY pane displays capacity information over the selected time period.

For more information about capacity check usage categories, see Monitor vSAN Capacity at
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan-monitoring.doc/GUID-6F7F134E-A6F7-4459-8C31-C021FF281F54.html.
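The Capacity Overview figures are related by simple arithmetic. The sketch below recomputes the used percentage and the free space from the example values (19.27 GB used of 179.98 GB). The helper name is hypothetical, and the sketch does not model deduplication or storage policy overhead.

```python
def capacity_summary(used_gb, total_gb):
    """Return (used percentage, free GB), each rounded to two decimals."""
    used_pct = used_gb / total_gb * 100
    free_gb = total_gb - used_gb
    return round(used_pct, 2), round(free_gb, 2)


print(capacity_summary(19.27, 179.98))  # (10.71, 160.71)
```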

14-17 Performance Service Charts

The performance service monitors performance-based metrics at the cluster, host, VM, and
virtual disk levels.

[Figure: vSAN performance charts in the vSphere Client for the vclass.local inventory. Callout: Point to a performance chart to view a specific metric from the selected time.]

You use the performance charts to monitor and troubleshoot performance problems involving IOPS,
throughput, congestion, and latency.

14-18 vSAN Performance Checks: VM
The VM panel for the selected vSAN cluster displays an overview of vSAN performance
statistics in a graphical format at the vSAN host and vSAN cluster levels.

[Figure: VM performance charts for the cluster: IOPS, Throughput, Latency, and Congestion.]

You can use the vSAN performance charts to monitor the workload in your cluster and
determine the root cause of problems.

When the Performance service is running, the cluster summary displays an overview of vSAN
performance statistics in a graphical format, including vSAN IOPS, throughput, and latency. At
the cluster level, you can view detailed statistical charts for all VM consumption and the vSAN
back end.

To view iSCSI performance charts, all hosts in the vSAN cluster must be running ESXi 6.5 or
later, and the iSCSI Target service must be enabled.
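The two prerequisites can be expressed as one check. This is a sketch over a hypothetical inventory that maps host names to ESXi version tuples. It does not query vCenter Server, and the helper name is made up for illustration.

```python
# Hypothetical inventory: host name -> ESXi version as a tuple.
hosts = {
    "sa-esxi-01": (7, 0, 0),
    "sa-esxi-02": (7, 0, 0),
    "sa-esxi-03": (6, 7, 0),
}

MIN_ESXI_VERSION = (6, 5)


def iscsi_charts_available(host_versions, iscsi_target_enabled):
    """True if every host runs ESXi 6.5 or later and the iSCSI Target
    service is enabled, mirroring the two conditions in the text."""
    all_hosts_supported = all(
        version >= MIN_ESXI_VERSION for version in host_versions.values()
    )
    return iscsi_target_enabled and all_hosts_supported


print(iscsi_charts_available(hosts, iscsi_target_enabled=True))  # True
```

Python compares version tuples element by element, so (6, 7, 0) >= (6, 5) holds without any string parsing.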

vSAN cluster performance charts are viewed using the following steps:

1. Under vSAN, select Performance.

2. Select the VM tab and select a time range for your query.

vSAN displays performance charts for VMs running on the cluster, including IOPS,
throughput, latency, congestion, and outstanding I/O. The statistics on these charts are
aggregated from the hosts within the cluster.

3. Select Backend and select a time range for your query.

vSAN displays performance charts for the cluster back-end operations, including IOPS,
throughput, latency, congestion, and outstanding I/O. The statistics on these charts are
aggregated from the hosts within the cluster.

4. Select iSCSI, select an iSCSI target or LUN, and select a time range for your query.

vSAN displays performance charts for iSCSI targets or LUNs, including IOPS, bandwidth,
latency, and outstanding I/O.

For more information about vSAN performance views and graphs, see VMware knowledge base
article 2144493 at https://kb.vmware.com/s/article/2144493.
14-19 vSAN Performance Checks: Disks
The DISKS panel displays performance metrics for the disk group and individual disks at the disk
group and vSAN host level.

[Figure: DISKS panel for host sb-esxi-02.vclass.local, showing metrics about disk groups (for example, Frontend (Guest) IOPS) and metrics about capacity-tier disks (for example, Physical/Firmware Layer IOPS).]

Graphs can present performance data about entire disk groups or individual disks.

Metric specifics such as Total Queue Throughput and Physical-layer Read Latency are visible by pointing to the selected graph.

The examples show the delayed I/O throughput for the entire disk group and the
physical/firmware layer latency for a local physical drive within the disk group.

vSAN requires at least one disk group and can be configured with up to five disk groups per
host.
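The one-to-five rule can be applied across a cluster layout with a simple range check. This is a hypothetical sketch: the host-to-count mapping is illustrative input, and the sketch does not check the devices inside each disk group.

```python
MIN_DISK_GROUPS = 1
MAX_DISK_GROUPS = 5


def hosts_with_invalid_disk_group_count(disk_groups_per_host):
    """Return hosts whose disk group count is outside the 1-5 range."""
    return sorted(
        host
        for host, count in disk_groups_per_host.items()
        if not MIN_DISK_GROUPS <= count <= MAX_DISK_GROUPS
    )


# Hypothetical layout for illustration.
layout = {"sb-esxi-01": 2, "sb-esxi-02": 0, "sb-esxi-03": 6}
print(hosts_with_invalid_disk_group_count(layout))  # ['sb-esxi-02', 'sb-esxi-03']
```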

Using the Disk Group pane, you can distinguish how each disk group is performing, independent
of any other disk groups on the same host.

The Disk Group pane has several graphs with which you can monitor the performance of the
cache tier and vSAN internal queues. This pane also has a graph that shows the disk group
capacity and usage.

14-20 vSAN Performance Check: Physical
Adapter
The PHYSICAL ADAPTER panel displays the pNIC Throughput, pNIC Packet Per Second, pNIC
Packet Error Rate, pNIC vSwitch Port Drop Rate, and pNIC Flow Control views.

[Figure: PHYSICAL ADAPTERS tab for host sb-esxi-02.vclass.local with physical adapter vmnic3 selected, showing the pNIC Throughput chart (inbound and outbound) for the last hour. The panel notes: "The performance statistics count all network IOs processed in the network adapters used by vSAN. The counted network IOs are not limited to vSAN traffic only."]

If a host's physical network adapter used by vSAN is slow compared with other hosts,
performance issues might be introduced. You might notice symptoms such as high network
latency, congestion, and so on.

For more information about the vSAN health service and the physical network adapter link
speed consistency check, see VMware knowledge base article 50387 at
https://kb.vmware.com/s/article/50387.

For more information about how the health test checks the network latency between vSAN
hosts and displays network latency in real time, see VMware knowledge base article 2149511 at
https://kb.vmware.com/s/article/2149511.

14-21 vSAN Performance Check: Host
Network
The HOST NETWORK panel displays the metrics for the host network and for the vSAN VMkernel
network adapter.

[Figure: HOST NETWORK tab for host sb-esxi-02.vclass.local. The first view shows Metrics for vSAN Host Network, including vSAN Host Network IO Throughput. The second view, with VMkernel adapter vmk2 selected, shows Metrics for vSAN VMkernel Network Adapter, including VMkernel Network Adapter Throughput (inbound and outbound). Both views note that the counted network IOs are not limited to vSAN traffic.]

When you select Host Network from the Network drop-down menu, the HOST NETWORK health
check table lists the physical network adapter used by vSAN on each host. To determine which
host has a link speed that is inconsistent with the others, examine the last column in the table,
which shows the link speed.

When you select the VMkernel adapter, in this example vmk2, the HOST NETWORK health check
table lists information about network traffic for that VMkernel adapter.

For more information about the vSAN Health service and the physical network adapter link
speed consistency check, see VMware knowledge base article 50387 at
https://kb.vmware.com/s/article/50387.
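Spotting the odd host out in the link speed column can be sketched as a majority comparison. The assumption is that "inconsistent" means "different from the most common speed". The real health check logic may differ, and the speeds below are hypothetical.

```python
from collections import Counter


def inconsistent_link_speeds(link_speed_mbps):
    """Return hosts whose vSAN uplink speed differs from the most common one.

    Assumes "inconsistent" means "not equal to the majority speed"; the
    real health check logic may differ.
    """
    if not link_speed_mbps:
        return []
    majority_speed, _count = Counter(link_speed_mbps.values()).most_common(1)[0]
    return sorted(
        host for host, speed in link_speed_mbps.items() if speed != majority_speed
    )


# Hypothetical link speeds in Mbps.
speeds = {
    "sb-esxi-01": 10000,
    "sb-esxi-02": 1000,
    "sb-esxi-03": 10000,
    "sb-esxi-04": 10000,
}
print(inconsistent_link_speeds(speeds))  # ['sb-esxi-02']
```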

14-22 vSAN Performance Check: VM Virtual
Disks
The VIRTUAL DISKS pane shows metrics for each individual disk (VMDK) on the selected VM.

VIRTUAL DISKS panel metric views:

• IOPS and IOPS limit

• Delayed normalized IOPS

[Figure: VIRTUAL DISKS tab for the VM New Virtual Machine with Virtual Disk: Hard disk 1 selected, showing the IOPS and IOPS limit chart (Normalized IOPS and IOPS Limit) and the Delayed Normalized IOPS chart for the last hour.]

A VM can be configured with one or more virtual disks. The VIRTUAL DISKS pane shows IOPS,
throughput, and latency for the selected virtual disk.

Each virtual disk can be assigned a different storage policy. The storage policy settings can
contribute to the performance characteristics of the virtual disk. For example, you might create a
storage policy that sets an IOPS limit for the VM object.
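The "normalized" in the charts refers to how vSAN weights I/Os for an IOPS limit. With the default base size of 32 KB, a 64 KB I/O counts as two I/O operations. The sketch below shows that arithmetic only. Rounding a partial 32 KB unit up is an assumption, the helper names are hypothetical, and the sketch does not model how vSAN actually delays I/O.

```python
import math

# Default base size used to weight I/Os for vSAN IOPS limits.
BASE_IO_SIZE_KB = 32


def normalized_io_count(io_size_kb):
    """Normalized I/Os charged for one I/O of the given size.

    A 64 KB I/O counts as two; rounding a partial unit up is an assumption.
    """
    return max(1, math.ceil(io_size_kb / BASE_IO_SIZE_KB))


def normalized_iops(io_sizes_kb, interval_s):
    """Normalized IOPS for the I/Os observed during interval_s seconds."""
    return sum(normalized_io_count(size) for size in io_sizes_kb) / interval_s


def exceeds_iops_limit(io_sizes_kb, interval_s, iops_limit):
    """True if the normalized IOPS exceed the policy's IOPS limit."""
    return normalized_iops(io_sizes_kb, interval_s) > iops_limit


print(normalized_iops([4, 64, 128], 1))  # 7.0
```

Because large I/Os are charged as several normalized I/Os, a workload with big blocks can reach an IOPS limit at a lower raw I/O rate than a small-block workload.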

For more information about VM Consumption graphs, see VMware knowledge base article
2144493 at https://kb.vmware.com/s/article/2144493#VMConsumptionGraph.

14-23 Running Proactive Tests
Proactive tests are useful for verifying that your vSAN cluster is working properly before you
put it into production.

[Figure: Proactive Tests page for SA-vSAN-01, listing two tests with Name, Last Run Result, and Last Run Time columns. The page notes: "For storage performance test, use HCIBench. HCIBench is a storage performance testing automation tool that simplifies and accelerates customer Proof of Concept (POC) performance testing in a consistent and controlled way. VMware vSAN Community Forum provides support for HCIBench."]

vSAN includes two additional tests that you can run proactively, instead of reactively, on a
vSAN cluster. You can use these tests to verify that the vSAN environment is functioning
correctly and performing as expected.

The following proactive tests are available:

• VM creation test: This test typically takes 20-40 seconds, and at most 180 seconds if
timeouts occur. One VM create and one VM delete task are spawned per host. These tasks
are displayed in the Recent Tasks pane.

• Network performance test: This test assesses whether connectivity issues occur and whether the
network bandwidth between hosts can satisfy vSAN requirements.
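The VM creation test figures give two simple expectations: the number of tasks to look for in Recent Tasks, and the duration range. The sketch encodes both. The duration labels and helper names are hypothetical and are not statuses reported by vSAN.

```python
TYPICAL_MAX_S = 40   # text: typically 20-40 seconds
TIMEOUT_MAX_S = 180  # text: at most 180 seconds if timeouts occur


def expected_vm_test_tasks(host_count):
    """One VM create and one VM delete task are spawned per host."""
    return 2 * host_count


def classify_vm_test_duration(elapsed_s):
    """Label a run's duration against the ranges quoted in the text."""
    if elapsed_s <= TYPICAL_MAX_S:
        return "typical"
    if elapsed_s <= TIMEOUT_MAX_S:
        return "slower than typical"
    return "longer than documented maximum"


print(expected_vm_test_tasks(4))  # 8
```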

14-24 Exporting Support Bundles: Local Files
When a serious error occurs, VMware Technical Support might ask you to generate a vm-support
package. The package includes log files and other information, including core dumps.

[Figure: Administration menu in the vSphere Client with Support > Upload File to Service Request selected, and the Upload File to Service Request dialog box with the Service request ID and File to upload fields.]

In this example, local log files are uploaded and sent to the support representative by clicking Administration > Support > Upload File to Service Request.

An administrator can select Upload File to Service Request to enter an existing support request
number and upload the necessary logs to VMware Support.

The data collected in a host support bundle includes the name of the affected ESXi host, logs,
VM descriptions (but never the contents of virtual disks or snapshot files), information about the
state of the affected VM, and, if present, core dumps.

VMware Technical Support routinely requests diagnostic information from you when a support
request is addressed. Data collected in a host support bundle might be considered sensitive.
Also, as of vSphere 6.5, support bundles can include encrypted information from an ESXi host.

For more information about data collected when gathering diagnostic information from vSphere
products, see VMware knowledge base article 2147388 at
https://kb.vmware.com/s/article/2147388.

14-25 Exporting Support Bundles: vCenter
Server
The vm-support package and core dumps can be exported from different objects in the
vSphere Client. In this example, the system logs are exported from vCenter Server.

[Figure: Export System Logs wizard for sa-vcsa-01.vclass.local, opened from the vCenter Server object's context menu. The wizard has a Select hosts page and a Select logs page, and finishes with EXPORT LOGS.]

If you deployed vCenter Server or vCenter Server Appliance, you can export a support bundle
containing log files for the node that you select in the vSphere Client.

For more information about exporting support bundles for vCenter Server, see Export a
Support Bundle at
https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.vcsa.doc/GUID-C54CA3F8-BD74-4339-A2A5-AE89F1C55175.html.

14-26 Exporting Support Bundles: ESXi Host
In this example, the system logs are exported from an ESXi host.

[Figure: Actions menu for host sb-esxi-01.vclass.local with Export System Logs selected, and the Export System Logs dialog box for that host. In the dialog box, you can select the system logs to include (for example, CoreDumps), optionally gather performance data with a duration and an interval in seconds, and enter and confirm a password. A note states that you can upload logs to VMware by selecting Administration > Support > Upload File to Service Request.]

VMware Technical Support routinely requests diagnostic information from you when a
support request is addressed. Data collected in a host support bundle might be considered
sensitive.

Furthermore, as of vSphere 6.5, support bundles can include encrypted information from an
ESXi host. You can protect this information with a password when you export the bundle, and
make that password available to your support representative over a secure channel. If only
some hosts in your environment use encryption, some files in the package are encrypted.

For more information on what information is included in the support bundles, see VMware
knowledge base article 2147388 at https://kb.vmware.com/s/article/2147388.

14-27 Review of Learner Objectives
• Discuss VMware Skyline Health and the associated service

• Describe the use of VMware Skyline Health to identify and correct problems in vSAN

• Apply information presented by VMware Skyline Health to problem-solving

14-28 Lesson 2: Commands for vSAN

14-29 Learner Objectives


• Use vsantop to view vSAN performance metrics

• Discuss how to run commands from the vCenter Server and ESXi command lines

• Discuss how to access vSphere ESXi Shell

• Use commands to view, configure, and manage your vSphere environment

• Discuss the esxcli vsan namespace commands

• Discuss when t o use Ruby vSphere Console (RVC) commands

14-30 About vSphere ESXi Shell
You can use vSphere ESXi Shell to obtain command-line access to an ESXi host.

vSphere ESXi Shell includes the following functionality:

• ESXCLI commands

• Additional vSphere and storage-related commands

14-31 Accessing vSphere ESXi Shell
You can access vSphere ESXi Shell in the following ways:

• Local access through the host's Direct Console User Interface (DCUI):

Enable the vSphere ESXi Shell service, either in the DCUI or the vSphere Client.

Access vSphere ESXi Shell from the DCUI by pressing Alt+F1.

Swap between the DCUI and local ESXi Shell by pressing Alt+F2 and Alt+F1,
respectively.

Log out of the ESXi Shell by pressing Ctrl+D or entering the exit command.

Disable the vSphere ESXi Shell service when not using it.

• Remote access through SSH:

Enable the SSH service, either in the DCUI or the vSphere Client.

Use an SSH client, such as PuTTY, to access vSphere ESXi Shell.

Disable the SSH service when you are not using it.

14-32 Examining the vsantop Utility
The vsantop utility focuses on monitoring vSAN performance metrics at an individual host
level.

Ensure that the following conditions are true before you use vsantop:

• You are running vSphere 6.7 Update 3 or later.

• SSH is enabled on the host.

• You are logged in to the host with root user privileges.

[root@sa-esxi-01:~] vsantop
8:13:38pm | entity type: host-domclient

nodeId      iops  throughput  latencyAvg  latencyStd  ioCount  congestion  oio
5e2351bc-c  0     4860        1140        248         5        0           1

Information and metrics presented for the host-domclient entity on host sa-esxi-01 are:

• The entity type, for example, host-domclient

• nodeId, IOPS, throughput, latencyAvg, latencyStd, ioCount, congestion, and oio (outstanding I/O)

vSAN includes a CLI called vsantop that provides this data. The vsantop utility is built with an awareness of vSAN architecture, so it can retrieve focused metrics at a granular interval.

You can invoke batch mode using the following syntax:

vsantop -b -d <delay> -n <iterations> > <filename and location>

Sample command: vsantop -b -d 10 -n 6 > /vmfs/volumes/datastore-1/test/test.csv

This command runs vsantop in batch mode, capturing a snapshot of the metrics every 10 seconds for 6 iterations and writing the output to the specified file. As a result, the output file contains one minute of statistical data.
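The capture window is simply delay x iterations. The following Python sketch builds the batch-mode command line from the -b, -d, and -n flags shown above and computes the capture window, or the iteration count needed for a longer capture. The helper names are hypothetical and are not part of vsantop.

```python
import math


def vsantop_batch_command(delay_s: int, iterations: int, output_path: str) -> str:
    """Build a vsantop batch-mode command line (-b batch, -d delay, -n iterations)."""
    if delay_s <= 0 or iterations <= 0:
        raise ValueError("delay and iterations must be positive integers")
    return f"vsantop -b -d {delay_s} -n {iterations} > {output_path}"


def captured_seconds(delay_s: int, iterations: int) -> int:
    """Length of the capture window: one snapshot every delay_s seconds."""
    return delay_s * iterations


def iterations_for(duration_s: int, delay_s: int) -> int:
    """Smallest iteration count that covers at least duration_s seconds."""
    return math.ceil(duration_s / delay_s)


if __name__ == "__main__":
    print(vsantop_batch_command(10, 6, "/vmfs/volumes/datastore-1/test/test.csv"))
    print(captured_seconds(10, 6))      # 60 seconds, the one minute described above
    print(iterations_for(15 * 60, 10))  # 90 iterations for a 15-minute capture
```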

14-33 Navigating vsantop
You can view or switch between entity types by entering the E command and choosing any of the supported entity types.

[root@sa-esxi-01:~] vsantop
9:00:22pm | entity type: host-domclient

latencyAvg  latencyStd  ioCount  congestion  oio  iopsRead  throughput
5e2351bc-c  0  4759  3186  3442  5  0  1  4759
[root@sa-esxi-01:~] vsantop

Current Entity type: host-domclient

1: cache-disk              2: capacity-disk           3: clom-disk-stats
4: clom-host-stats         5: cluster                 6: cluster-domclient
7: cluster-domcompmgr      8: cmmds                   9: cmmds-net
10: cmmds-workloadstats    11: ddh-disk-stats         12: disk-group
13: dom-per-proxy-owner    14: dom-proxy-owner        15: dom-world-cpu
16: host                   17: host-cpu               18: host-domclient
19: host-domcompmgr        20: host-domowner          21: host-memory-heap
22: host-memory-slab       23: host-vsansparse        24: lsom-world-cpu
25: nfs-client-vol         26: object                 27: statsdb
28: system-mem             29: virtual-disk           30: virtual-machine
31: vmdk-vsansparse        32: vsan-cluster-capacity  33: vsan-distribution
34: vsan-host-net          35: vsan-iscsi-host        36: vsan-iscsi-lun
37: vsan-iscsi-target      38: vsan-pnic-net          39: vsan-vnic-net
40: vscsi

Select entity type with 1-40, any other key to return:


[root@sa-esxi-01:~] vsantop
9:03:36pm | entity type: cluster

nodeId      iopsRead  throughput  latencyAvg  iopsWrite  throughput  latencyAvg  congestion  oio
52d17529-c  0         4822        861         0          0           0           0           1

Enter the chosen entity type number to change the viewed entity type.

vSAN architecture comprises multiple layers of software and hardware entities. An entity type can denote a host, a drive, or a software component, and has a unique identifier. Each instance of vsantop displays one entity type and up to nine columns of associated metrics. This categorization helps you understand usage patterns and correct or optimize the appropriate entity.

You can add or remove metric fields by using the f command, which also displays the list of metrics associated with the entity.

For more information about navigating the vsantop utility, see Getting Started with vsantop at https://core.vmware.com/resource/getting-started-vsantop.

14-34 Examples of vsantop Entity Outputs
Information and metrics presented for the vsan-host-net entity are nodeId, rx and tx throughput, and tcp tx packets.

[root@sa-esxi-01:~] vsantop
10:20:47pm | entity type: vsan-host-net

5e2351bc-c  18279  0  47985  0  4  3  0

Information and metrics presented for the vsan-cluster-capacity entity are nodeId, capacity total, used, and free, and space saved by deduplication.

[root@sa-esxi-01:~] vsantop
10:21:38pm | entity type: vsan-cluster-capacity

nodeId      total       used        free        savedByDedup  dedupRatio
52d17529-c  1932545310  2102336921  1722311618  1970911862    282

Information and metrics presented for the vsan-distribution entity are nodeId, components, domClients, domOwners, and domColocated.

[root@sa-esxi-01:~] vsantop
10:23:15pm | entity type: vsan-distribution

5e2351bc-c  3  3  1  3  1

14-35 ESXCLI Commands
ESXCLI commands offer options in the following namespaces:

• esxcli esxcli namespace
• esxcli device namespace
• esxcli elxnet namespace
• esxcli fcoe namespace
• esxcli graphics namespace
• esxcli hardware namespace
• esxcli iscsi namespace
• esxcli network namespace
• esxcli nvme namespace
• esxcli rdma namespace
• esxcli sched namespace
• esxcli software namespace
• esxcli storage namespace
• esxcli system namespace
• esxcli vm namespace
• esxcli vsan namespace

14-36 Viewing vSphere Storage Information (1)
You use the esxcli storage command to display storage information, including
multipathing configuration, LUN specifics, and datastore settings.

[root@sa-esxi-01:~] esxcli storage
Usage: esxcli storage {cmd} [cmd options]

Available Namespaces:
  core         VMware core storage commands.
  hpp          VMware High Performance Plugin (HPP).
  nfs          Operations to create, manage, and remove Network Attached
               Storage filesystems.
  nfs41        Operations to create, manage, and remove NFS v4.1
               filesystems.
  nmp          VMware Native Multipath Plugin (NMP). This is the VMware
               default implementation of the Pluggable Storage
               Architecture.
  san          IO device management operations to the SAN devices on the
               system.
  vflash       virtual flash Management Operations on the system.
  vmfs         VMFS operations.
  vvol         Operations pertaining to Virtual Volumes
  filesystem   Operations pertaining to filesystems, also known as
               datastores, on the ESX host.
  iofilter     IOFilter related commands.

14-37 Viewing vSphere Storage Information (2)
You use the esxcli storage core device list command to display storage
device-related information.
[root@sa-esxi-01:~] esxcli storage core device list
mpx.vmhba0:C0:T3:L0
   Display Name: Local VMware Disk (mpx.vmhba0:C0:T3:L0)
   Has Settable Display Name: false
   Size: 30720
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/mpx.vmhba0:C0:T3:L0
   Vendor: VMware
   Model: Virtual disk
   Revision: 2.0
   SCSI Level: 6
   Is Pseudo: false
   Status: on
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.0000000000766d686261303a333a30
   Is Shared Clusterwide: false

[root@sa-esxi-01:~] esxcli storage core device list | grep vmhba0
mpx.vmhba0:C0:T3:L0
   Display Name: Local VMware Disk (mpx.vmhba0:C0:T3:L0)
   Devfs Path: /vmfs/devices/disks/mpx.vmhba0:C0:T3:L0
mpx.vmhba0:C0:T2:L0
   Display Name: Local VMware Disk (mpx.vmhba0:C0:T2:L0)
   Devfs Path: /vmfs/devices/disks/mpx.vmhba0:C0:T2:L0
mpx.vmhba0:C0:T1:L0
   Display Name: Local VMware Disk (mpx.vmhba0:C0:T1:L0)
   Devfs Path: /vmfs/devices/disks/mpx.vmhba0:C0:T1:L0
mpx.vmhba0:C0:T0:L0
   Display Name: Local VMware Disk (mpx.vmhba0:C0:T0:L0)
   Devfs Path: /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0
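Because the device list is a device name followed by indented Key: Value lines, it is easy to post-process once captured (for example, by redirecting the command output to a file). The Python sketch below turns that text into one dictionary per device and lists devices that report Is SSD: true. The function names are hypothetical, and the sample is trimmed from the output above.

```python
from typing import Dict, List

# Trimmed from the 'esxcli storage core device list' output above.
SAMPLE = """\
mpx.vmhba0:C0:T3:L0
   Display Name: Local VMware Disk (mpx.vmhba0:C0:T3:L0)
   Size: 30720
   Devfs Path: /vmfs/devices/disks/mpx.vmhba0:C0:T3:L0
   Is Local: true
   Is SSD: true
mpx.vmhba0:C0:T2:L0
   Display Name: Local VMware Disk (mpx.vmhba0:C0:T2:L0)
   Devfs Path: /vmfs/devices/disks/mpx.vmhba0:C0:T2:L0
"""


def parse_device_list(text: str) -> Dict[str, Dict[str, str]]:
    """Turn 'device name' plus indented 'Key: Value' lines into nested dicts."""
    devices: Dict[str, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line is a device identifier such as mpx.vmhba0:C0:T3:L0.
            current = line.strip()
            devices[current] = {}
        elif current is not None:
            # Split on the first colon only; paths and names contain colons too.
            key, _, value = line.strip().partition(":")
            devices[current][key.strip()] = value.strip()
    return devices


def ssd_devices(devices: Dict[str, Dict[str, str]]) -> List[str]:
    """Names of devices that report 'Is SSD: true'."""
    return [name for name, fields in devices.items() if fields.get("Is SSD") == "true"]


if __name__ == "__main__":
    print(ssd_devices(parse_device_list(SAMPLE)))
```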

14-38 Viewing vSphere Network Information (1)
You use the esxcli network command to display physical and virtual network information.

[root@sa-esxi-01:~] esxcli network
Usage: esxcli network {cmd} [cmd options]

Available Namespaces:
  ens        Commands to list and manipulate Enhanced Networking Stack (ENS)
             feature on virtual switch.
  firewall   A set of commands for firewall related operations
  ip         Operations that can be performed on vmknics
  multicast  Operations having to do with multicast
  nic        Operations having to do with the configuration of Network
             Interface Card and getting and updating the NIC settings.
  port       Commands to get information about a port
  sriovnic   Operations having to do with the configuration of SRIOV enabled
             Network Interface Card and getting and updating the NIC
             settings.
  vm         A set of commands for VM related operations
  vswitch    Commands to list and manipulate Virtual Switches on an ESX host.
  diag       Operations pertaining to network diagnostics

14-39 Viewing vSphere Network Information (2)

You use the esxcli network nic list command to display vmnic information.

[root@sa-esxi-01:~] esxcli network nic list
Name    PCI Device    Driver    Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
------  ------------  --------  ------------  -----------  -----  ------  -----------------  ----  -----------
vmnic0  0000:03:00.0  nvmxnet3  Up            Up           10000  Full    00:50:56:01:3d:f7  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic1  0000:0b:00.0  nvmxnet3  Up            Up           10000  Full    00:50:56:01:3d:f8  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic2  0000:13:00.0  nvmxnet3  Up            Up           10000  Full    00:50:56:01:3d:f9  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic3  0000:1b:00.0  nvmxnet3  Up            Up           10000  Full    00:50:56:01:3d:fa  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
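The nic list output is a fixed-column table whose last column, Description, contains spaces. This Python sketch splits each data row into at most ten fields so that the description stays intact, then reports NICs whose admin or link status is not Up and the set of MTU values in use. The helper names are hypothetical, and the sample rows are copied from the output above.

```python
from typing import Dict, List, Set

# Header, separator, and two rows copied from 'esxcli network nic list' above.
SAMPLE = """\
Name    PCI Device    Driver    Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
------  ------------  --------  ------------  -----------  -----  ------  -----------------  ----  -----------
vmnic0  0000:03:00.0  nvmxnet3  Up            Up           10000  Full    00:50:56:01:3d:f7  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
vmnic1  0000:0b:00.0  nvmxnet3  Up            Up           10000  Full    00:50:56:01:3d:f8  1500  VMware Inc. vmxnet3 Virtual Ethernet Controller
"""

FIELDS = ["Name", "PCI Device", "Driver", "Admin Status", "Link Status",
          "Speed", "Duplex", "MAC Address", "MTU", "Description"]


def parse_nic_list(text: str) -> List[Dict[str, str]]:
    """Parse the table; the first two lines are the header and the dashes."""
    nics = []
    for line in text.splitlines()[2:]:
        if not line.strip():
            continue
        # maxsplit keeps the multi-word Description together in the last field.
        values = line.split(maxsplit=len(FIELDS) - 1)
        nics.append(dict(zip(FIELDS, values)))
    return nics


def nics_not_up(nics: List[Dict[str, str]]) -> List[str]:
    """NICs whose admin or link status is anything other than Up."""
    return [n["Name"] for n in nics
            if n["Admin Status"] != "Up" or n["Link Status"] != "Up"]


def mtu_values(nics: List[Dict[str, str]]) -> Set[int]:
    """Distinct MTU values; more than one value may indicate a mismatch."""
    return {int(n["MTU"]) for n in nics}


if __name__ == "__main__":
    nics = parse_nic_list(SAMPLE)
    print(nics_not_up(nics), mtu_values(nics))
```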

14-40 Listing Available Subcommands (1)
You use the esxcli esxcli command list command to display all available
subcommands.

You can pipe the output to grep to find the command you need more easily.

[root@sa-esxi-01:~] esxcli esxcli command list | grep cluster
vsan.cluster                        get      Get information about the vSAN cluster that this host is joined to.
vsan.cluster                        join     Join the host to a vSAN cluster.
vsan.cluster                        leave    Leave the vSAN cluster the host is currently joined to.
vsan.cluster                        new      Create a vSAN cluster with current host joined. A random sub-cluster UUID will be generated.
vsan.cluster.preferredfaultdomain   get      Get the preferred fault domain for a stretched cluster.
vsan.cluster.preferredfaultdomain   set      Set the preferred fault domain for a stretched cluster.
vsan.cluster                        restore  Restore the persisted vSAN cluster configuration.
vsan.cluster.unicastagent           add      Add a unicast agent to the vSAN cluster configuration.
vsan.cluster.unicastagent           clear    Removes all unicast agents in the vSAN cluster configuration.
vsan.cluster.unicastagent           list     List all unicast agents in the vSAN cluster configuration.
vsan.cluster.unicastagent           remove   Remove a unicast agent from the vSAN cluster configuration.
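In this listing, the first column is a dotted namespace and the second is the command; when you type the command, the dots become spaces (vsan.cluster.unicastagent list is run as esxcli vsan cluster unicastagent list). The Python sketch below groups commands by namespace and rebuilds the full esxcli invocation. The helper names are hypothetical, and the sample lines come from the output above.

```python
from collections import defaultdict
from typing import Dict, List

# A few lines copied from the 'esxcli esxcli command list | grep cluster' output.
SAMPLE = """\
vsan.cluster                        get      Get information about the vSAN cluster that this host is joined to.
vsan.cluster                        join     Join the host to a vSAN cluster.
vsan.cluster.preferredfaultdomain   get      Get the preferred fault domain for a stretched cluster.
vsan.cluster.unicastagent           add      Add a unicast agent to the vSAN cluster configuration.
vsan.cluster.unicastagent           list     List all unicast agents in the vSAN cluster configuration.
"""


def group_by_namespace(text: str) -> Dict[str, List[str]]:
    """Map each dotted namespace to the commands listed under it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        parts = line.split(maxsplit=2)  # namespace, command, description
        if len(parts) >= 2:
            groups[parts[0]].append(parts[1])
    return dict(groups)


def to_esxcli(namespace: str, command: str) -> str:
    """Turn ('vsan.cluster.unicastagent', 'list') into the command line you type."""
    return "esxcli " + " ".join(namespace.split(".")) + " " + command


if __name__ == "__main__":
    for ns, cmds in group_by_namespace(SAMPLE).items():
        for cmd in cmds:
            print(to_esxcli(ns, cmd))
```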

14-41 Listing Available Subcommands (2)

You use the esxcli esxcli command list command to display all available
subcommands.

You can pipe the output to grep debug to list only the debug-related commands.

[root@sa-esxi-01:~] esxcli esxcli command list | grep debug
nvme.driver.loglevel               set       Set NVMe driver log level and debug level
vsan.debug.advcfg                  list      List all advanced configuration options with non-default values.
vsan.debug.controller              list      Print detailed information about all vSAN disk controllers (output may change between releases)
vsan.debug.disk                    list      Print detailed information about all vSAN disks in the cluster.
vsan.debug.disk                    overview  Print overview information about all vSAN disks in the cluster.
vsan.debug.disk.summary            get       Print summary information about all vSAN disks in the cluster.
vsan.debug.evacuation              precheck  Examine what it takes if an entity (disk group or host) is evacuated in various modes (Action). The result is accurate when all hosts in the vSAN cluster are of the same version and have the same disk format.
vsan.debug.limit                   get       Print summary information about vSAN limits (output may change between releases)
vsan.debug.memory                  list      Print both userworld and kernel memory consumptions of vSAN.
vsan.debug.mob                     start     Start vSAN Managed Object Browser Service.
vsan.debug.mob                     status    Query vSAN Managed Object Browser Service is running or not.
vsan.debug.mob                     stop      Stop vSAN Managed Object Browser Service.
vsan.debug.object.health.summary   get       Print health summary information about all vSAN objects in the cluster (output may change between releases)
vsan.debug.object                  list      Print detailed information about vSAN objects in the cluster. This command would only show 100 objects at most by default.
vsan.debug.object                  overview  Print overview information about all vSAN objects in the cluster. This command would only show 100 objects at most by default.
vsan.debug.resync                  list      Print detailed information about vSAN resyncing objects (output may change between releases)
vsan.debug.resync.summary          get       Print summary information about vSAN resyncing objects (output may change between releases)
vsan.debug.vmdk                    list      Print summary information about VMDKs on local vSAN datastore (output may change between releases)

14-42 Other Useful Commands in vSphere ESXi Shell (1)
In addition to ESXCLI commands, vSphere ESXi Shell provides other useful commands.

Press the Tab key twice to display the additional commands.

[root@sa-esxi-01:~]
BootModuleConfig.sh        host_shutdown.sh     scp
VmfsLatencyStats.py        hostd                sdrsinjector
Xorg                       hostd-probe          secpolicytools
[                          hostd-probe.sh       sed
[[                         hostdCgiServer       sensord
amldump                    hostname             seq
apiForwarder               hwclock              services.sh
apply-host-profiles        indcfg               setsid
applyHostProfile           inetd                sfcbd
applyHostProfileWrapper    init                 sh

14-43 Other Useful Commands in vSphere ESXi Shell (2)
Enter -h or --help to learn how to use a command.

[root@sa-esxi-01:~] esxcfg-nics --help
esxcfg-nics <options> [nic]
   -s|--speed <speed>     Set the speed of this NIC to one of 10/100/1000/10000.
                          Requires a NIC parameter.
   -d|--duplex <duplex>   Set the duplex of this NIC to one of 'full' or 'half'.
                          Requires a NIC parameter.
   -a|--auto              Set speed and duplexity automatically. Requires a NIC parameter.
   -l|--list              Print the list of NICs and their settings.
   -e|--ens               Print the ENS settings of all NICs.
   -r|--restore           Restore the nics configured speed/duplex settings (INTERNAL ONLY)
   -h|--help              Display this message.

14-44 Other Useful Commands in vSphere ESXi Shell (3)
Use the vdq command to see if the disks can be used in a vSAN cluster.
[root@sa-esxi-01:~] vdq -iq
[
   {
      "Name"     : "mpx.vmhba0:C0:T3:L0",
      "VSANUUID" : "5292a16e-3e2b-8aa1-4fc7-7a1d5a3863b1",
      "State"    : "In-use for VSAN",
      "Reason"   : "None",
      "IsSSD"    : "1",
      "IsCapacityFlash": "1",
      "IsPDL"    : "0",
      "Size(MB)" : "30720",
      "FormatType" : "512n",
   },

   {
      "Name"     : "mpx.vmhba0:C0:T2:L0",
      "VSANUUID" : "52c0ba5a-c9bf-3a7d-49d5-8c3a1b3abde8",
      "State"    : "In-use for VSAN",
      "Reason"   : "None",
      "IsSSD"    : "1",
      "IsCapacityFlash": "1",
      "IsPDL"    : "0",
      "Size(MB)" : "30720",
      "FormatType" : "512n",
   },
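The vdq -iq output looks like JSON, but as the excerpt above shows, it places a comma before each closing brace, which strict JSON parsers reject. A minimal Python sketch, assuming the rest of the output follows the same pattern, strips those trailing commas with a regular expression before calling json.loads, then groups disk names by State. The helper names are hypothetical; the sample reuses the two entries above with fields abbreviated and a closing bracket added.

```python
import json
import re
from collections import defaultdict
from typing import Dict, List

# Two entries from the 'vdq -iq' output above (fields trimmed), closed with ']'.
SAMPLE = """[
   {
      "Name"     : "mpx.vmhba0:C0:T3:L0",
      "VSANUUID" : "5292a16e-3e2b-8aa1-4fc7-7a1d5a3863b1",
      "State"    : "In-use for VSAN",
      "IsSSD"    : "1",
      "IsCapacityFlash": "1",
   },
   {
      "Name"     : "mpx.vmhba0:C0:T2:L0",
      "VSANUUID" : "52c0ba5a-c9bf-3a7d-49d5-8c3a1b3abde8",
      "State"    : "In-use for VSAN",
      "IsSSD"    : "1",
      "IsCapacityFlash": "1",
   },
]"""

# A comma followed only by whitespace and then '}' or ']' is a trailing comma.
TRAILING_COMMA = re.compile(r",(\s*[}\]])")


def parse_vdq(text: str) -> List[dict]:
    """Parse vdq -iq output into a list of dicts after removing trailing commas."""
    return json.loads(TRAILING_COMMA.sub(r"\1", text))


def disks_by_state(disks: List[dict]) -> Dict[str, List[str]]:
    """Group device names by their State field."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for disk in disks:
        groups[disk["State"]].append(disk["Name"])
    return dict(groups)


if __name__ == "__main__":
    print(disks_by_state(parse_vdq(SAMPLE)))
```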

14-45 Python Scripts for Testing Systems
Python scripts are useful for introducing faults for testing purposes. Several scripts are
applicable to vSAN.

[root@sa-esxi-01:/usr/lib/vmware/vsan/bin] ls
VSANDeviceMonitor.py       upgrade-vsanmgmtd-config.pyc
VsanSystemCmd              vitRecoveryTool.pyc
__pycache__                vitd
clom-tool                  vitsafehd
clomd                      vsan-config.py
cmmdsAnalyzer.py           vsan-health-status.pyc
cmmdsTimeMachine.py        vsan-perfsvc-collector.py
cmmdsd                     vsanDiskFaultInjection.pyc
configVsanRP               vsanobserver
dbobjtool                  vsanobserver.sh
ddecomd                    vsanObserverObfuscated.sh
epd                        vsansparseRealign
fixDescriptors.py          vsanTraceCollector.pyc
iperf3                     vsanTraceReader
iperf3.copy                vsanTraceReader.py
killInaccessibleVms.py     vsanUpdateUuid.py
obfuscatecmmdsDump.py      vsandf.pyc
obfuscateLog.pyc           vsandpd
reboot_helper.py           vsandpd-support.py
rpd                        vsanmgmtd
slotfstool                 vsansvcctl.py
tokenBucket.py

Python scripts often change between ESXi releases.

14-46 Using Python to Inject Errors
You can use the vsanDiskFaultInjection.pyc script to introduce a hot-unplug failure
state into a storage device on a host.

This script introduces a soft error, not a real hardware failure.

# python /usr/lib/vmware/vsan/bin/vsanDiskFaultInjection.pyc -u -d mpx.vmhba0:C0:T1:L0
Injecting hot unplug on device vmhba0:C0:T1:L0
vsish -e set /reliability/vmkstress/ScsiPathInjectError 0x1
vsish -e set /storage/scsifw/paths/vmhba0:C0:T1:L0/injectError 0x004C0400000002
#

14-47 About PowerCLI
PowerCLI delivers a set of cmdlets for managing vSAN.

• Export-SpbmStoragePolicy
• Get-SpbmCompatibleStorage
• Get-SpbmStoragePolicy
• Get-VsanDisk
• Get-VsanDiskGroup
• Import-SpbmStoragePolicy
• New-SpbmRuleSet
• New-SpbmStoragePolicy
• New-VsanDisk
• New-VsanDiskGroup
• Remove-SpbmStoragePolicy
• Remove-VsanDisk
• Remove-VsanDiskGroup
• Set-VsanClusterConfiguration
• Set-VsanFaultDomain
• Set-SpbmStoragePolicy
• Test-VsanClusterHealth
• Test-VsanNetworkPerformance
• Test-VsanStoragePerformance
• Test-VsanVMCreation

This list does not include all cmdlets available for vSAN.

14-48 PowerCLI Commands: Example 1
An example of the Get-VsanDisk PowerCLI command is shown.

CanonicalName          DevicePath                                  IsSsd
-------------          ----------                                  -----
mpx.vmhba1:C0:T1:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0     True
mpx.vmhba1:C0:T2:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T2:L0     True
mpx.vmhba1:C0:T3:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T3:L0     True
mpx.vmhba1:C0:T4:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T4:L0     True
mpx.vmhba1:C0:T6:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T6:L0     True
mpx.vmhba1:C0:T5:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T5:L0     True
mpx.vmhba1:C0:T1:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T1:L0     True
mpx.vmhba1:C0:T2:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T2:L0     True
mpx.vmhba1:C0:T3:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T3:L0     True
mpx.vmhba1:C0:T4:L0    /vmfs/devices/disks/mpx.vmhba1:C0:T4:L0     True

14-49 PowerCLI Commands: Example 2
An example of the Get-VsanDiskGroup PowerCLI command is shown.

PowerCLI C:\> Get-VsanDiskGroup

Name                                          VMHost
----                                          ------
Disk group (0000000000766d686261313a313a30)   sa-esxi-01.vclass...
Disk group (0000000000766d686261313a343a30)   sa-esxi-01.vclass...
Disk group (0000000000766d686261313a313a30)   sa-esxi-02.vclass...
Disk group (0000000000766d686261313a343a30)   sa-esxi-02.vclass...
Disk group (0000000000766d686261313a313a30)   sb-esxi-03.vclass...
Disk group (0000000000766d686261313a343a30)   sb-esxi-03.vclass...
Disk group (0000000000766d686261313a313a30)   sb-esxi-04.vclass...
Disk group (0000000000766d686261313a343a30)   sb-esxi-04.vclass...
Disk group (0000000000766d686261313a323a30)   sc-witness-01.vcl...

14-50 ESXCLI Namespaces in vSAN
In vSAN 7, the esxcli vsan command offers the following namespaces and ESXCLI
functions.

# vmware -l
VMware ESXi 7.0 GA
# esxcli vsan
Usage: esxcli vsan {cmd} [cmd options]

Available Namespaces:
  cluster          Commands for vSAN host cluster configuration
  cmmds            Commands for vSAN CMMDS (Cluster monitoring, membership, and directory service).
  datastore        Commands for vSAN datastore configuration
  debug            Commands for vSAN debugging
  encryption       Commands for vSAN Encryption.
  health           Commands for vSAN Health
  iscsi            Commands for vSAN iSCSI target configuration
  network          Commands for vSAN host network configuration
  perf             Commands for vSAN performance service configuration.
  resync           Commands for vSAN resync configuration
  storage          Commands for vSAN physical storage configuration
  faultdomain      Commands for vSAN fault domain configuration
  maintenancemode  Commands for vSAN maintenance mode operation
  policy           Commands for vSAN storage policy configuration
  trace            Commands for vSAN trace configuration

14-51 Using the esxcli vsan network Command

You use the esxcli vsan network command to gather vSAN network configuration and other
network-related information.

[root@sa-esxi-01:~] esxcli vsan network
Usage: esxcli vsan network {cmd} [cmd options]

Available Namespaces:
  ip       Commands for configuring IP network for vSAN.
  ipv4     Compatibility alias for "ip"

Available Commands:
  clear    Clear the vSAN network configuration.
  list     List the network configuration currently in use by vSAN.
  remove   Remove an interface from the vSAN network configuration.
  restore  Restore the persisted vSAN network configuration.

14-52 Using the esxcli vsan network list Command
You use the esxcli vsan network list command to verify whether a VMkernel port is
used by vSAN.

[root@sa-esxi-01:~] esxcli vsan network list
Interface
   VmkNic Name: vmk2
   IP Protocol: IP
   Interface UUID: 7e803a5e-be2d-e848-24b6-00505602b80e
   Agent Group Multicast Address: 224.2.3.4
   Agent Group IPv6 Multicast Address: ff19::2:3:4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group IPv6 Multicast Address: ff19::1:2:3
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Multicast TTL: 5
   Traffic Type: vsan

14-53 Activity: Using the esxcli vsan network Command
The esxcli vsan network command is used to list network details, and to add and
remove the VMkernel adapter that provides vSAN network connectivity.

Does running the remove command on a host in a vSAN cluster create a network partition?

[root@sa-esxi-01:~] esxcli vsan network remove -i vmk2
[root@sa-esxi-01:~]

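14-54 Activity: Using the esxcli vsan network Command Solution
The esxcli vsan network command is used to list network details, and to add and
remove the VMkernel adapter that provides vSAN network connectivity.

Does running the remove command on a host in a vSAN cluster create a network partition?
Yes. After vmk2 is removed, esxcli vsan network list returns nothing: the host has no interface for vSAN traffic and is partitioned from the other cluster members until the interface is added back.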

[root@sa-esxi-01:~] esxcli vsan network remove -i vmk2
[root@sa-esxi-01:~] esxcli vsan network list
[root@sa-esxi-01:~]
[root@sa-esxi-01:~] esxcli vsan network ipv4 add -i vmk2
[root@sa-esxi-01:~]
[root@sa-esxi-01:~] esxcli vsan network list
Interface
   VmkNic Name: vmk2
   IP Protocol: IP
   Interface UUID: 20fadb59-0ac3-4ba7-98a5-005056013df7
   Agent Group Multicast Address: 224.2.3.4
   Agent Group IPv6 Multicast Address: ff19::2:3:4
   Agent Group Multicast Port: 23451
   Master Group Multicast Address: 224.1.2.3
   Master Group IPv6 Multicast Address: ff19::1:2:3
   Master Group Multicast Port: 12345
   Host Unicast Channel Bound Port: 12321
   Multicast TTL: 5
   Traffic Type: vsan
[root@sa-esxi-01:~]
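The activity shows that an empty esxcli vsan network list result means the host has no interface carrying vSAN traffic. The Python check sketched below parses captured list output into one dictionary per Interface block and returns the VMkernel adapters whose Traffic Type is vsan; an empty result flags a host that is likely partitioned. The helper names are hypothetical, and the samples are trimmed from the output above.

```python
from typing import Dict, List

# Output after 'esxcli vsan network ipv4 add -i vmk2' (trimmed from above).
AFTER_ADD = """\
Interface
   VmkNic Name: vmk2
   IP Protocol: IP
   Interface UUID: 20fadb59-0ac3-4ba7-98a5-005056013df7
   Agent Group IPv6 Multicast Address: ff19::2:3:4
   Host Unicast Channel Bound Port: 12321
   Traffic Type: vsan
"""

# Output after 'esxcli vsan network remove -i vmk2': nothing at all.
AFTER_REMOVE = ""


def parse_interfaces(text: str) -> List[Dict[str, str]]:
    """Return one dict per 'Interface' block of esxcli vsan network list output."""
    interfaces: List[Dict[str, str]] = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Interface":
            interfaces.append({})
        elif interfaces and ":" in stripped:
            # Split on the first colon only; IPv6 values contain colons.
            key, _, value = stripped.partition(":")
            interfaces[-1][key.strip()] = value.strip()
    return interfaces


def vsan_vmknics(text: str) -> List[str]:
    """VMkernel adapters whose Traffic Type is vsan."""
    return [i.get("VmkNic Name", "") for i in parse_interfaces(text)
            if i.get("Traffic Type") == "vsan"]


if __name__ == "__main__":
    for label, output in (("after add", AFTER_ADD), ("after remove", AFTER_REMOVE)):
        nics = vsan_vmknics(output)
        print(label, nics if nics else "no vSAN vmknic: host is likely partitioned")
```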

14-55 Using the ESXCLI Debug Namespace
You can use the debug namespace to troubleshoot vSAN.

[root@sa-esxi-01:~] esxcli vsan debug
Usage: esxcli vsan debug {cmd} [cmd options]

Available Namespaces:
  disk        Debug commands for vSAN physical disks
  object      Debug commands for vSAN objects
  resync      Debug commands for vSAN resyncing objects
  advcfg      Debug commands for vSAN advanced configuration options.
  controller  Debug commands for vSAN disk controllers
  evacuation  Debug commands for simulating host, disk or disk group evacuation in
              various modes and their impact on objects in vSAN cluster
  limit       Debug commands for vSAN limits
  memory      Debug commands for vSAN memory consumption.
  mob         Debug commands for vSAN Managed Object Browser Service.
  vmdk        Debug commands for vSAN VMDKs

14-56 Activity: Using the esxcli vsan debug Command
See the screenshot to answer questions about the esxcli vsan debug namespace.
• Which namespace is used to view the state of the virtual disks?

• Which namespace is used to view the queue depth?

[root@sa-esxi-01:~] esxcli vsan debug
Usage: esxcli vsan debug {cmd} [cmd options]

Available Namespaces:
  disk        Debug commands for vSAN physical disks
  object      Debug commands for vSAN objects
  resync      Debug commands for vSAN resyncing objects
  advcfg      Debug commands for vSAN advanced configuration options.
  controller  Debug commands for vSAN disk controllers
  evacuation  Debug commands for simulating host, disk or disk group evacuation in
              various modes and their impact on objects in vSAN cluster
  limit       Debug commands for vSAN limits
  memory      Debug commands for vSAN memory consumption.
  mob         Debug commands for vSAN Managed Object Browser Service.
  vmdk        Debug commands for vSAN VMDKs

14-57 Activity: Using the esxcli vsan debug Command Solution
See the screenshot to answer questions about the esxcli vsan debug namespace.
• Which namespace is used to view the state of the virtual disks? vmdk

• Which namespace is used to view the queue depth? controller

[root@sa-esxi-01:~] esxcli vsan debug
Usage: esxcli vsan debug {cmd} [cmd options]

Available Namespaces:
  disk        Debug commands for vSAN physical disks
  object      Debug commands for vSAN objects
  resync      Debug commands for vSAN resyncing objects
  advcfg      Debug commands for vSAN advanced configuration options.
  controller  Debug commands for vSAN disk controllers
  evacuation  Debug commands for simulating host, disk or disk group evacuation in
              various modes and their impact on objects in vSAN cluster
  limit       Debug commands for vSAN limits
  memory      Debug commands for vSAN memory consumption.
  mob         Debug commands for vSAN Managed Object Browser Service.
  vmdk        Debug commands for vSAN VMDKs

14-58 Using ESXCLI to Investigate Object Health (1)
You use the esxcli vsan debug object overview command to display the overall
health of individual objects.

[root@sa-esxi-01:~] esxcli vsan debug object overview
Object UUID                           Group UUID                            Version  Size       Used     SPBM Profile                 Healthy Components
------------------------------------  ------------------------------------  -------  ---------  -------  ---------------------------  ------------------
77f6965e-1639-6aac-c884-005056013df7  00000000-0000-0000-0000-000000000000  11       255.00 GB  0.04 GB  N/A                          1 of 1
3bce975e-101e-0e1d-d623-005056013e28  3bce975e-101e-0e1d-d623-005056013e28  11       0.00 GB    0.48 GB  N/A                          2 of 4
42ce975e-ed14-e0f8-7056-005056013e28  3bce975e-101e-0e1d-d623-005056013e28  11       0.00 GB    0.01 GB  N/A                          2 of 4
621e985e-36cb-9a8a-2f72-005056013e28  621e985e-36cb-9a8a-2f72-005056013e28  11       255.00 GB  2.03 GB  vSAN Default Storage Policy  3 of 3
7cce975e-65e1-0108-c64d-005056013e2c  7cce975e-65e1-0108-c64d-005056013e2c  11       0.00 GB    0.46 GB  N/A                          2 of 4
83ce975e-bd72-8ac0-9e5b-005056013e2c  7cce975e-65e1-0108-c64d-005056013e2c  11       0.00 GB    0.01 GB  N/A                          2 of 4
81f5965e-6f5e-f022-fdf5-005056013df7  7ef5965e-0187-007f-672d-005056013df7  11       12.00 GB   2.59 GB  FSVM_Profile_DO_NOT_MODIFY   1 of 1
81f5965e-3f5b-2d51-ea57-005056013df7  7ef5965e-0187-007f-672d-005056013df7  11       0.25 GB    0.26 GB  FSVM_Profile_DO_NOT_MODIFY   1 of 1
3bf6965e-1ca5-ed77-b81f-005056013df7  7ef5965e-0187-007f-672d-005056013df7  11       12.00 GB   1.01 GB  FSVM_Profile_DO_NOT_MODIFY   1 of 1
81f5965e-0b44-ab7a-ad88-005056013df7  7ef5965e-0187-007f-672d-005056013df7  11       15.00 GB   0.08 GB  FSVM_Profile_DO_NOT_MODIFY   1 of 1
7ef5965e-0187-007f-672d-005056013df7  7ef5965e-0187-007f-672d-005056013df7  11       255.00 GB  0.41 GB  FSVM_Profile_DO_NOT_MODIFY   1 of 1
3bf6965e-6984-bea7-1a16-005056013df7  7ef5965e-0187-007f-672d-005056013df7  11       0.25 GB    0.00 GB  FSVM_Profile_DO_NOT_MODIFY   1 of 1
3bf6965e-fbc7-73d7-e96a-005056013df7  7ef5965e-0187-007f-672d-005056013df7  11       15.00 GB   0.05 GB  FSVM_Profile_DO_NOT_MODIFY   1 of 1
8dfcda5e-f290-5a0a-f095-005056013cdc  85fcda5e-fb26-e0ee-5569-005056013cdc  11       0.00 GB    0.01 GB  N/A                          2 of 4
85fcda5e-fb26-e0ee-5569-005056013cdc  85fcda5e-fb26-e0ee-5569-005056013cdc  11       0.00 GB    0.42 GB  N/A                          2 of 4
abcd975e-11d9-f80a-566c-005056013df7  a5cd975e-11a9-a2ce-24c7-005056013df7  11       0.00 GB    0.01 GB  N/A                          2 of 4
a5cd975e-11a9-a2ce-24c7-005056013df7  a5cd975e-11a9-a2ce-24c7-005056013df7  11       0.00 GB    0.52 GB  N/A                          2 of 4
e7cd975e-7fbe-548c-9f03-005056013cdc  e7cd975e-7fbe-548c-9f03-005056013cdc  11       0.00 GB    0.49 GB  N/A                          2 of 4
edcd975e-d097-5fc5-118c-005056013cdc  e7cd975e-7fbe-548c-9f03-005056013cdc  11       0.00 GB    0.01 GB  N/A                          2 of 4
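In the overview, the last column reports healthy components as "x of y". The Python sketch below picks out object rows by their leading UUID and lists objects that report fewer healthy components than total, as candidates for a closer look with the other debug commands. It assumes the Healthy Components column always ends the row, as it does here; the helper name is hypothetical and the rows are copied from the output above.

```python
import re
from typing import List, Tuple

# Header plus three rows copied from the esxcli vsan debug object overview output above.
SAMPLE = """\
Object UUID                           Group UUID                            Version  Size       Used     SPBM Profile                 Healthy Components
77f6965e-1639-6aac-c884-005056013df7  00000000-0000-0000-0000-000000000000  11       255.00 GB  0.04 GB  N/A                          1 of 1
3bce975e-101e-0e1d-d623-005056013e28  3bce975e-101e-0e1d-d623-005056013e28  11       0.00 GB    0.48 GB  N/A                          2 of 4
621e985e-36cb-9a8a-2f72-005056013e28  621e985e-36cb-9a8a-2f72-005056013e28  11       255.00 GB  2.03 GB  vSAN Default Storage Policy  3 of 3
"""

# An object row starts with an 8-4-4-4-12 hexadecimal UUID.
UUID = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b")
# The Healthy Components column ends the row, for example "2 of 4".
HEALTHY = re.compile(r"(\d+) of (\d+)\s*$")


def degraded_objects(text: str) -> List[Tuple[str, int, int]]:
    """(object UUID, healthy, total) for rows with fewer healthy components than total."""
    result: List[Tuple[str, int, int]] = []
    for line in text.splitlines():
        uuid = UUID.match(line)
        counts = HEALTHY.search(line)
        if uuid and counts:
            healthy, total = int(counts.group(1)), int(counts.group(2))
            if healthy < total:
                result.append((uuid.group(0), healthy, total))
    return result


if __name__ == "__main__":
    for uuid, healthy, total in degraded_objects(SAMPLE):
        print(f"{uuid}: {healthy} of {total} components healthy")
```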

14-59 Using ESXCLI to Investigate Object Health (2)
You use the esxcli vsan debug object health summary get command to display how many objects
are in each health state when you troubleshoot vSAN.

[root@sa-esxi-02:~] esxcli vsan debug object health summary get
Health Status                                     Number Of Objects
------------------------------------------------  -----------------
healthy                                                          12
nonavailability-related-incompliance                              0
nonavailability-related-reconfig                                  0
reduced-availability-with-active-rebuild                          0
data-move                                                         0
inaccessible                                                      0
reduced-availability-with-no-rebuild-delay-timer                  0
reduced-availability-with-no-rebuild                              0

[root@sb-esxi-02:~] esxcli vsan debug object health summary get
Health Status                                     Number Of Objects
------------------------------------------------  -----------------
nonavailability-related-incompliance                              0
reduced-availability-with-no-rebuild-delay-timer                  0
data-move                                                         0
inaccessible                                                     10
reduced-availability-with-active-rebuild                          0
healthy                                                           2
reduced-availability-with-no-rebuild                              0
nonavailability-related-reconfig                                  0
[root@sb-esxi-02:~]
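Any state other than healthy with a non-zero count, such as the 10 inaccessible objects on sb-esxi-02, deserves attention. When you compare this summary across several hosts, a quick filter helps; the Python sketch below parses the captured table and returns only those states. The helper names are hypothetical, and the sample is copied from the sb-esxi-02 output above.

```python
from typing import Dict

# Copied from the sb-esxi-02 'esxcli vsan debug object health summary get' output above.
SAMPLE = """\
Health Status                                     Number Of Objects
------------------------------------------------  -----------------
nonavailability-related-incompliance                              0
reduced-availability-with-no-rebuild-delay-timer                  0
data-move                                                         0
inaccessible                                                     10
reduced-availability-with-active-rebuild                          0
healthy                                                           2
reduced-availability-with-no-rebuild                              0
nonavailability-related-reconfig                                  0
"""


def parse_health_summary(text: str) -> Dict[str, int]:
    """Map each health state to its object count."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows are exactly 'state count'; the header and dash rows are skipped.
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def problem_states(counts: Dict[str, int]) -> Dict[str, int]:
    """States other than 'healthy' that currently contain objects."""
    return {state: n for state, n in counts.items() if state != "healthy" and n > 0}


if __name__ == "__main__":
    print(problem_states(parse_health_summary(SAMPLE)))
```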

14-60 Using ESXCLI to Investigate VMDK Files
You use the esxcli vsan debug vmdk list command to display the health of
individual VMDK objects.
[root@sa-esxi-01:~] esxcli vsan debug vmdk list
Object: 14e4a15e-2898-70b1-9790-00505602b80e
   Health: reduced-availability-with-no-rebuild
   Type: vdisk
   Path: /vmfs/volumes/vsan:52d17529cc48e68f-81ac938c97a5941b/0de4a15e-f9e9-b5f4-54eb-00505602b80e/New Virtual Machine.vmdk
   Directory Name: N/A

Object: 0de4a15e-f9e9-b5f4-54eb-00505602b80e
   Health: reduced-availability-with-no-rebuild
   Type: vmnamespace
   Path: /vmfs/volumes/vsan:52d17529cc48e68f-81ac938c97a5941b/New Virtual Machine
   Directory Name: New Virtual Machine

Object: 6f813a5e-fa21-8792-3f80-00505602b80e
   Health: reduced-availability-with-no-rebuild
   Type: vmnamespace
   Path: /vmfs/volumes/vsan:52d17529cc48e68f-81ac938c97a5941b/.vsan.stats
   Directory Name: .vsan.stats
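Each object in the vmdk list output is a block of Key: Value lines that starts with Object. The Python sketch below parses those blocks and maps virtual disk objects that are not healthy back to their file paths, which tells you which VM is affected. The helper names are hypothetical, and the sample is copied from the output above.

```python
from typing import Dict, List

# Copied from the 'esxcli vsan debug vmdk list' output above.
SAMPLE = """\
Object: 14e4a15e-2898-70b1-9790-00505602b80e
   Health: reduced-availability-with-no-rebuild
   Type: vdisk
   Path: /vmfs/volumes/vsan:52d17529cc48e68f-81ac938c97a5941b/0de4a15e-f9e9-b5f4-54eb-00505602b80e/New Virtual Machine.vmdk
   Directory Name: N/A

Object: 0de4a15e-f9e9-b5f4-54eb-00505602b80e
   Health: reduced-availability-with-no-rebuild
   Type: vmnamespace
   Path: /vmfs/volumes/vsan:52d17529cc48e68f-81ac938c97a5941b/New Virtual Machine
   Directory Name: New Virtual Machine

Object: 6f813a5e-fa21-8792-3f80-00505602b80e
   Health: reduced-availability-with-no-rebuild
   Type: vmnamespace
   Path: /vmfs/volumes/vsan:52d17529cc48e68f-81ac938c97a5941b/.vsan.stats
   Directory Name: .vsan.stats
"""


def parse_objects(text: str) -> List[Dict[str, str]]:
    """One dict per block; a new block starts at each 'Object:' line."""
    objects: List[Dict[str, str]] = []
    for line in text.splitlines():
        # Split on the first colon only; Path values contain 'vsan:' as well.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank separator lines
        key, value = key.strip(), value.strip()
        if key == "Object":
            objects.append({"Object": value})
        elif objects:
            objects[-1][key] = value
    return objects


def unhealthy_vdisk_paths(objects: List[Dict[str, str]]) -> List[str]:
    """Paths of virtual disk objects whose Health is anything other than healthy."""
    return [o["Path"] for o in objects
            if o.get("Type") == "vdisk" and o.get("Health") != "healthy"]


if __name__ == "__main__":
    for path in unhealthy_vdisk_paths(parse_objects(SAMPLE)):
        print(path)
```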

14-61 Activity: Using the esxcli vsan debug vmdk list Command
The esxcli vsan debug vmdk list command can show other objects on the vSAN
datastore.

Is the iSCSI target service configured in this vSAN cluster?

[Screenshot: esxcli vsan debug vmdk list output listing several objects on the vSAN datastore, all reporting Health: healthy. They include namespace, virtual disk (.vmdk), and swap (.vswp) objects. The last object listed is a vmnamespace object whose Directory Name is .iSCSI-CONFIG.]

14-62 Activity: Using the esxcli vsan debug vmdk list Command Solution
The esxcli vsan debug vmdk list command can show other objects on the vSAN
datastore.

Is the iSCSI target service configured in this vSAN cluster? Yes. The output includes a vmnamespace object whose Directory Name is .iSCSI-CONFIG.

Object: bfc6d659-b712-25d2-8939-005056013 21
   Health: healthy
   Type: vmnamespace
   Path: /vmfs/volumes/vsan:521305e4430dd845-d59b1943736fdc88/
   Directory Name: ub1404lt01

Object: 44c7d659-bb36-bdda-0952-005056013e28
   Health: healthy
   Type: vdisk
   Path: /vmfs/volumes/vsan:521305e4430dd845-d59b1943736fdc88/3fc7d659-ef19-b692-6088-005056013e28/ub1404lt02.vmdk
   Directory Name: N/A

Object: 124dd659-f201-516c-d 4b-005056013df7
   Health: healthy
   Type: vmswap
   Path: /vmfs/volumes/vsan:521305e4430dd845-d59b1943736fdc88/c c7d659-f2 c-2d1c-c054-005056013c2 /ub1404lt05-275f0bf0.vswp
   Directory Name: N/A

Object: 3224d659-4dec-41f7-3fff-005056013e28
   Health: healthy
   Type: vmswap
   Path: /vmfs/volumes/vsan:521305e4430dd845-d59b1943736fdc88/3fc7d659-ef19-b692-6088-005056013e28/ub1404lt02-04f5 178.vswp
   Directory Name: N/A

Object: 0df86859-7ac1-cb41-c91c-00505601d84a
   Health: healthy
   Type: vmnamespace
   Path: /vmfs/volumes/vsan:52c13480d4c3bf5c-9667b2a94f69c02f
   Directory Name: .iSCSI-CONFIG
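If you save this output to a file, you can automate the check. The following Python code is a minimal sketch under stated assumptions. It is not a VMware tool. It splits text in the format shown above into one record per object. It then treats a vmnamespace object whose Directory Name is .iSCSI-CONFIG as the sign that the iSCSI target service is configured. The function names and the abbreviated sample text are hypothetical.

```python
def parse_objects(text):
    """Split `esxcli vsan debug vmdk list`-style text into one dict per object."""
    objects = []
    current = None
    for line in text.splitlines():
        # partition() splits at the first colon only, so values such as
        # "/vmfs/volumes/vsan:..." keep their own colons.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # skip blank lines and lines without a key
        key, value = key.strip(), value.strip()
        if key == "Object":
            current = {"Object": value}
            objects.append(current)
        elif current is not None:
            current[key] = value
    return objects


def iscsi_namespace_present(objects):
    # Assumption: a .iSCSI-CONFIG namespace object means the vSAN iSCSI
    # target service has been configured in this cluster.
    return any(
        obj.get("Type") == "vmnamespace" and obj.get("Directory Name") == ".iSCSI-CONFIG"
        for obj in objects
    )


# Hypothetical sample, abbreviated from the output above.
SAMPLE = """\
Object: 44c7d659-bb36-bdda-0952-005056013e28
   Health: healthy
   Type: vdisk
   Directory Name: N/A

Object: 0df86859-7ac1-cb41-c91c-00505601d84a
   Health: healthy
   Type: vmnamespace
   Directory Name: .iSCSI-CONFIG
"""

objects = parse_objects(SAMPLE)
print(len(objects), iscsi_namespace_present(objects))  # 2 True
```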

14-63 vSAN Health Check Results: Overall State
The esxcli vsan health cluster list command displays the overall state of the cluster.
# esxcli vsan health cluster list
Health Test Name                                    Status
--------------------------------------------------  ----------
Overall health                                      green (OK)
Cluster                                             green
  ESXi vSAN Health service installation             green
  vSAN Health Service up-to-date                    green
  Advanced vSAN configuration in sync               green
  vSAN CLOMD liveness                               green
  vSAN Disk Balance                                 green
  Resync operations throttling                      green
  Software version compatibility                    green
  Disk format version                               green
Network                                             green
  Hosts disconnected from VC                        green
  Hosts with connectivity issues                    green
  vSAN cluster partition                            green
  All hosts have a vSAN vmknic configured           green
  vSAN: Basic (unicast) connectivity check          green
  vSAN: MTU check (ping with large packet size)     green
  vMotion: Basic (unicast) connectivity check       green
  vMotion: MTU check (ping with large packet size)  green
  Network latency check                             green
Data                                                green
  vSAN object health                                green
Limits                                              green
  Current cluster situation                         green
  After 1 additional host failure                   green
  Host component limit                              green
Physical disk                                       green
  Operation health                                  green
  Disk capacity                                     green
  Congestion                                        green
  Component limit health                            green
  Component metadata health                         green
  Memory pools (heaps)                              green
  Memory pools (slabs)                              green
Performance service                                 green
  Stats DB object                                   green
  Stats master election                             green
  Performance data collection                       green
  All hosts contributing stats                      green
  Stats DB object conflicts                         green
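When the list is long, it helps to pull out only the tests that are not green. The following is a minimal Python sketch. It assumes you have saved the output of esxcli vsan health cluster list as text, and that each row is a name, at least two spaces, and a status of green, yellow, or red (yellow does not appear above, so treat it as an assumption). The regular expression and function name are hypothetical, not part of ESXCLI.

```python
import re

# Name, at least two spaces, a status colour, and an optional reason in
# parentheses, for example "red (Network misconfiguration)".
ROW = re.compile(
    r"^\s*(?P<name>\S.*?)\s{2,}(?P<status>green|yellow|red)(?:\s+\((?P<reason>[^)]*)\))?\s*$"
)


def non_green(text):
    """Return (name, status) pairs for every row whose status is not green."""
    results = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match and match.group("status") != "green":
            results.append((match.group("name"), match.group("status")))
    return results


SAMPLE = """\
Health Test Name                                    Status
--------------------------------------------------  ------------------------------
Overall health                                      red (Network misconfiguration)
Network                                             red
  Hosts with connectivity issues                    red
  All hosts have a vSAN vmknic configured           green
Data                                                red
  vSAN object health                                red
"""

for name, status in non_green(SAMPLE):
    print(f"{status:6} {name}")
```

The header and dash rows are skipped because neither ends in a status colour.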

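14-64 Using ESXCLI to Investigate Health Check Results
You use the esxcli vsan health cluster list command to display the most recent health check
results. In the first output, every test passes. In the second output, a network misconfiguration
causes several tests to fail.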
[root@sa-esxi-01:~] esxcli vsan health cluster list
Health Test Name                                    Status
--------------------------------------------------  ----------
Overall health                                      green (OK)
  Advanced vSAN configuration in sync               green
  vSAN daemon liveness                              green
  vSAN Disk Balance                                 green
  Resync operations throttling                      green
  Software version compatibility                    green
  Disk format version                               green
Network                                             green
  Hosts with connectivity issues                    green
  vSAN cluster partition                            green
  All hosts have a vSAN vmknic configured           green
  vSAN: Basic (unicast) connectivity check          green
  vSAN: MTU check (ping with large packet size)     green
  vMotion: Basic (unicast) connectivity check       green
  vMotion: MTU check (ping with large packet size)  green
Data                                                green
  vSAN object health                                green
Capacity utilization                                green
  Disk space                                        green
  Read cache reservations                           green
  Component                                         green
  What if the most consumed host fails              green
Physical disk                                       green
  Operation health                                  green
  Disk capacity                                     green
  Congestion                                        green
  Component limit health                            green
  Component metadata health                         green
  Memory pools (heaps)                              green
Performance service                                 green
  Stats DB object                                   green
  Stats master election                             green
  Performance data collection                       green
  All hosts contributing stats                      green
  Stats DB object conflicts                         green

[root@sa-esxi-01:~] esxcli vsan health cluster list
Health Test Name                                    Status
--------------------------------------------------  ------------------------------
Overall health                                      red (Network misconfiguration)
Network                                             red
  Hosts with connectivity issues                    red
  vSAN cluster partition                            red
  All hosts have a vSAN vmknic configured           green
  vSAN: Basic (unicast) connectivity check          green
  vSAN: MTU check (ping with large packet size)     green
  vMotion: Basic (unicast) connectivity check       green
  vMotion: MTU check (ping with large packet size)  green
Data                                                red
  vSAN object health                                red
Performance service                                 red
  Stats DB object                                   red
  Stats master election                             green
  Performance data collection                       green
  All hosts contributing stats                      green
  Stats DB object conflicts                         green
Physical disk                                       green
  Operation health                                  green
  Disk capacity                                     green
  Congestion                                        green
  Component limit health                            green
  Component metadata health                         green
  Memory pools (heaps)                              green
  Memory pools (slabs)                              green
Cluster                                             green
  Advanced vSAN configuration in sync               green
  vSAN daemon liveness                              green
  vSAN Disk Balance                                 green
  Resync operations throttling                      green
  Software version compatibility                    green
  Disk format version                               green
Capacity utilization                                green
  Disk space                                        green
  Read cache reservations                           green
  Component                                         green
  What if the most consumed host fails              green

14-65 vSAN Health Check Results: Query Failed Tests

The esxcli vsan health cluster get -t "name of test" command can query any failed test and
show vSAN disks as absent.

[root@sa-esxi-01:~] esxcli vsan health cluster get -t "vSAN cluster partition"
vSAN cluster partition  red

Checks if the vSAN cluster is partitioned due to a network issue.

Ask VMware: http://www.vmware.com/esx/support/askvmware/index.php?eventtype=com.vmware.vsan.health.test.clusterpartition

Partition list
Host            Partition             Host UUID
172.20.211.200  1                     5e2351bc-ce8d-ee1b-cf36-00505602b80e
172.20.211.207  1                     5e235396-8145-b769-fa88-005056027dae
172.20.211.206  Partition is unknown  5e2352d1-eb3d-0c16-5ce8-005056029973

You use the esxcli vsan health cluster get command to query and view vSAN Health test results.
To query a test, specify the name of any test that appears in the esxcli vsan health cluster list
results.

You use the esxcli vsan health cluster get -t command to query a specific vSAN Health test. This
command can be helpful when you troubleshoot issues from the command line. In this example, the
vSAN cluster is partitioned because of a network issue: host 172.20.211.206 reports that its
partition is unknown, while the other two hosts are in partition 1.
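The Partition list can also be checked programmatically. The following is a minimal Python sketch under stated assumptions. It reads saved text of the table and groups hosts by their Partition value. It treats more than one distinct value as a sign of a partition, which is a simplifying assumption: the health test result itself remains authoritative. The function name is hypothetical.

```python
from collections import defaultdict


def hosts_by_partition(partition_rows):
    """Group the rows of the "Partition list" table by their Partition value."""
    groups = defaultdict(list)
    for line in partition_rows.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Host":
            continue  # skip blank lines and the header row
        host = fields[0]
        # Everything between the host and the trailing UUID is the partition
        # label, which can contain spaces ("Partition is unknown").
        partition = " ".join(fields[1:-1])
        groups[partition].append(host)
    return dict(groups)


ROWS = """\
Host            Partition             Host UUID
172.20.211.200  1                     5e2351bc-ce8d-ee1b-cf36-00505602b80e
172.20.211.207  1                     5e235396-8145-b769-fa88-005056027dae
172.20.211.206  Partition is unknown  5e2352d1-eb3d-0c16-5ce8-005056029973
"""

groups = hosts_by_partition(ROWS)
print(groups)
print("partitioned:", len(groups) > 1)
```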

14-66 Activity: Using the esxcli vsan health cluster get -t Command
The esxcli vsan health cluster get -t "name of test" command returns the reason for the test
result.

• Why does a warning appear?

• What can you do to address the cluster partition?

[root@sa-esxi-01:~] esxcli vsan health cluster get -t "vSAN cluster partition"
vSAN cluster partition  red

Checks if the vSAN cluster is partitioned due to a network issue.

Ask VMware: http://www.vmware.com/esx/support/askvmware/index.php?eventtype=com.vmware.vsan.health.test.clusterpartition

Partition list
Host            Partition             Host UUID
172.20.211.200  1                     5e2351bc-ce8d-ee1b-cf36-00505602b80e
172.20.211.207  1                     5e235396-8145-b769-fa88-005056027dae
172.20.211.206  Partition is unknown  5e2352d1-eb3d-0c16-5ce8-005056029973

14-67 Activity: Using the esxcli vsan health cluster get -t Command Solution
The esxcli vsan health cluster get -t "name of test" command returns the reason for the test
result.

• Why does a warning appear? A vSAN cluster partition has been identified.

• What can you do to address the cluster partition? Verify vSAN host network connectivity.

[root@sa-esxi-01:~] esxcli vsan health cluster get -t "vSAN cluster partition"
vSAN cluster partition  red

Checks if the vSAN cluster is partitioned due to a network issue.

Ask VMware: http://www.vmware.com/esx/support/askvmware/index.php?eventtype=com.vmware.vsan.health.test.clusterpartition

Partition list
Host            Partition             Host UUID
172.20.211.200  1                     5e2351bc-ce8d-ee1b-cf36-00505602b80e
172.20.211.207  1                     5e235396-8145-b769-fa88-005056027dae
172.20.211.206  Partition is unknown  5e2352d1-eb3d-0c16-5ce8-005056029973

14-68 Using ESXCLI to Investigate vSAN Controllers
You use the esxcli vsan debug controller list command to query the controller for its
statistics.

[root@sa-esxi-04:~] esxcli vsan debug controller list
Device Name: vmhba0
   Device Display Name: Intel Corporation PIIX4 for 430TX/440BX/MX IDE Controller
   Used By VSAN: false
   PCI ID: 8086/7111/15ad/1976
   Driver Name: vmkata
   Driver Version: 0.1-1vmw.650.1.26.5969303
   Max Supported Queue Depth: 1

Device Name: vmhba1
   Device Display Name: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI
   Used By VSAN: true
   PCI ID: 1000/0030/15ad/1976
   Driver Name: mptspi
   Driver Version: 4.23.01.00-10vmw
   Max Supported Queue Depth: 127

14-69 Activity: Using the esxcli vsan debug controller list Command
You refer to the VMware Compatibility Guide and determine that the minimum I/O controller
queue depth for your environment is 512.

• Do the listed controllers meet VMware requirements for vSAN controllers?


[root@sa-esxi-04:~] esxcli vsan debug controller list
Device Name: vmhba0
   Device Display Name: Intel Corporation PIIX4 for 430TX/440BX/MX IDE Controller
   Used By VSAN: false
   PCI ID: 8086/7111/15ad/1976
   Driver Name: vmkata
   Driver Version: 0.1-1vmw.650.1.26.5969303
   Max Supported Queue Depth: 1

Device Name: vmhba1
   Device Display Name: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI
   Used By VSAN: true
   PCI ID: 1000/0030/15ad/1976
   Driver Name: mptspi
   Driver Version: 4.23.01.00-10vmw
   Max Supported Queue Depth: 127

14-70 Activity: Using the esxcli vsan debug controller list Command Solution
You refer to the VMware Compatibility Guide and determine that the minimum I/O controller
queue depth for your environment is 512.

• Do the listed controllers meet VMware requirements for vSAN controllers?


No, neither controller meets the minimum required queue depth.
[root@sa-esxi-04:~] esxcli vsan debug controller list
Device Name: vmhba0
   Device Display Name: Intel Corporation PIIX4 for 430TX/440BX/MX IDE Controller
   Used By VSAN: false
   PCI ID: 8086/7111/15ad/1976
   Driver Name: vmkata
   Driver Version: 0.1-1vmw.650.1.26.5969303
   Max Supported Queue Depth: 1

Device Name: vmhba1
   Device Display Name: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI
   Used By VSAN: true
   PCI ID: 1000/0030/15ad/1976
   Driver Name: mptspi
   Driver Version: 4.23.01.00-10vmw
   Max Supported Queue Depth: 127

Queue depth is important because controllers with small queue depths cause problems.
Controllers with queue depths of less than 256 can affect VM I/O performance when vSAN is
rebuilding components, either because of a failure or because a host is entering maintenance mode.
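You can script the comparison in this solution against saved command output. The following Python code is a minimal sketch with hypothetical helper names. The 512 threshold comes from this activity's scenario and the 256 figure comes from the note above. The code does not look anything up in the VMware Compatibility Guide.

```python
def parse_controllers(text):
    """Return one dict per "Device Name:" block of controller list output."""
    controllers = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank line
        key, value = key.strip(), value.strip()
        if key == "Device Name":
            current = {}
            controllers.append(current)
        if current is not None:
            current[key] = value
    return controllers


def below_queue_depth(controllers, minimum):
    """Names of controllers whose Max Supported Queue Depth is under `minimum`."""
    return [
        c["Device Name"]
        for c in controllers
        if int(c["Max Supported Queue Depth"]) < minimum
    ]


SAMPLE = """\
Device Name: vmhba0
   Device Display Name: Intel Corporation PIIX4 for 430TX/440BX/MX IDE Controller
   Used By VSAN: false
   Max Supported Queue Depth: 1

Device Name: vmhba1
   Device Display Name: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI
   Used By VSAN: true
   Max Supported Queue Depth: 127
"""

controllers = parse_controllers(SAMPLE)
print(below_queue_depth(controllers, 512))  # ['vmhba0', 'vmhba1']

# Optionally restrict the check to controllers that vSAN actually uses.
vsan_only = [c for c in controllers if c["Used By VSAN"] == "true"]
print(below_queue_depth(vsan_only, 256))  # ['vmhba1']
```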

14-71 Using ESXCLI to Investigate Fault Domains
The esxcli vsan faultdomain get command retrieves details about the fault domain
membership of the hosts.

[root@sa-esxi-04:~] esxcli vsan faultdomain get
Fault Domain Id: 59aabf29-f948-36f2-7f75-005056013e2c
Fault Domain Name:
[root@sa-esxi-04:~]

[root@sb-esxi-01:~] esxcli vsan faultdomain get
Fault Domain Id: a054ccb4-ff68-4c73-cbc2-d272d45e32df
Fault Domain Name: Preferred
[root@sb-esxi-01:~]

14-72 Activity: Using the esxcli vsan faultdomain Command
The esxcli vsan faultdomain get command shows whether a host is a member of a fault
domain.

• Which host is a member of a fault domain?

[root@sa-esxi-04:~] esxcli vsan faultdomain get
Fault Domain Id: 59aabf29-f948-36f2-7f75-005056013e2c
Fault Domain Name:
[root@sa-esxi-04:~]

[root@sb-esxi-01:~] esxcli vsan faultdomain get
Fault Domain Id: a054ccb4-ff68-4c73-cbc2-d272d45e32df
Fault Domain Name: Preferred
[root@sb-esxi-01:~]

14-73 Activity: Using the esxcli vsan faultdomain Command Solution
The esxcli vsan faultdomain get command shows whether a host is a member of a fault
domain.

• Which host is a member of a fault domain? sb-esxi-01.

[root@sa-esxi-04:~] esxcli vsan faultdomain get
Fault Domain Id: 59aabf29-f948-36f2-7f75-005056013e2c
Fault Domain Name:
[root@sa-esxi-04:~]

[root@sb-esxi-01:~] esxcli vsan faultdomain get
Fault Domain Id: a054ccb4-ff68-4c73-cbc2-d272d45e32df
Fault Domain Name: Preferred
[root@sb-esxi-01:~]
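The solution depends on whether Fault Domain Name has a value. The following is a minimal Python sketch of that rule, applied to the two outputs above. It assumes that an empty name means the host is not in a named fault domain. The helper name and the OUTPUTS dictionary are hypothetical.

```python
def parse_key_values(text):
    """Turn "Key: value" lines into a dict; an empty value stays an empty string."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Output of `esxcli vsan faultdomain get` from each host, without the prompts.
OUTPUTS = {
    "sa-esxi-04": "Fault Domain Id: 59aabf29-f948-36f2-7f75-005056013e2c\nFault Domain Name:\n",
    "sb-esxi-01": "Fault Domain Id: a054ccb4-ff68-4c73-cbc2-d272d45e32df\nFault Domain Name: Preferred\n",
}

for host, text in OUTPUTS.items():
    name = parse_key_values(text).get("Fault Domain Name", "")
    print(host, "->", name if name else "no named fault domain")
```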

14-74 Using ESXCLI to Investigate Drive Type and Tier
You use the esxcli vsan storage list command to display information about all the
drives (also called disks) on a host.
[root@sb-esxi-03:~] esxcli vsan storage list
mpx.vmhba0:C0:T2:L0
   Device: mpx.vmhba0:C0:T2:L0
   Display Name: mpx.vmhba0:C0:T2:L0
   Is SSD: true
   VSAN UUID: 524e3cb7-d353-7519-7de0-cb0df5912c18
   VSAN Disk Group UUID: 529e4a13-76f6-f62c-09d6-4be300fd65c4
   VSAN Disk Group Name: mpx.vmhba0:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 11
   Deduplication: true
   Compression: true
   Checksum: 5086114152554878937
   Checksum OK: true
   Is Capacity Tier: true
   Encryption Metadata Checksum OK: true
   Encryption: false
   DiskKeyLoaded: false
   Is Mounted: true
   Creation Time: Wed Feb 5 09:18:47 2020

mpx.vmhba0:C0:T1:L0
   Device: mpx.vmhba0:C0:T1:L0
   Display Name: mpx.vmhba0:C0:T1:L0
   Is SSD: true
   VSAN UUID: 529e4a13-76f6-f62c-09d6-4be300fd65c4
   VSAN Disk Group UUID: 529e4a13-76f6-f62c-09d6-4be300fd65c4
   VSAN Disk Group Name: mpx.vmhba0:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 11
   Deduplication: true
   Compression: true
   Checksum: 408489704197683578
   Checksum OK: true
   Is Capacity Tier: false
   Encryption Metadata Checksum OK: true
   Encryption: false
   DiskKeyLoaded: false
   Is Mounted: true
   Creation Time: Wed Feb 5 09:18:47 2020
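Two fields answer the drive-type and tier questions: Is SSD and Is Capacity Tier. The following is a minimal Python sketch that reads saved output and labels each device. Two points are assumptions: "Is SSD: false" is treated as a magnetic drive, and "Is Capacity Tier: false" is treated as the cache tier. The helper names are hypothetical.

```python
def parse_devices(text):
    """One dict per device; a new record starts at each "Device:" line."""
    devices = []
    current = None
    for line in text.splitlines():
        # Split on ": " (colon plus space) so device names such as
        # mpx.vmhba0:C0:T2:L0 are not broken apart; bare device-name
        # header lines contain no ": " and are skipped.
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue
        if key == "Device":
            current = {}
            devices.append(current)
        if current is not None:
            current[key] = value
    return devices


def describe(device):
    medium = "solid-state" if device["Is SSD"] == "true" else "magnetic"
    tier = "capacity" if device["Is Capacity Tier"] == "true" else "cache"
    return f"{device['Device']}: {medium} drive, {tier} tier"


SAMPLE = """\
mpx.vmhba0:C0:T2:L0
   Device: mpx.vmhba0:C0:T2:L0
   Is SSD: true
   Is Capacity Tier: true

mpx.vmhba0:C0:T1:L0
   Device: mpx.vmhba0:C0:T1:L0
   Is SSD: true
   Is Capacity Tier: false
"""

devices = parse_devices(SAMPLE)
for device in devices:
    print(describe(device))
```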

14-75 Activity: Using the esxcli vsan storage list Command
The esxcli vsan storage list command displays details about each storage device
attached to the host.

• Is the first device a magnetic or a solid-state drive?

• Is it used for cache or for capacity?

mpx.vmhba1:C0:T2:L0
   Device: mpx.vmhba1:C0:T2:L0
   Display Name: mpx.vmhba1:C0:T2:L0
   Is SSD: true
   VSAN UUID: 5286dd02-4a63-52aa-63f3-8aaee4139eaa
   VSAN Disk Group UUID: 52e4397e-cc90-71b3-c08d-c1
   VSAN Disk Group Name: mpx.vmhba1:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 5
   Deduplication: false
   Compression: false
   Checksum: 2162719562245289367
   Checksum OK: true
   Is Capacity Tier: true
   Encryption: false
   DiskKeyLoaded: false

mpx.vmhba1:C0:T1:L0
   Device: mpx.vmhba1:C0:T1:L0
   Display Name: mpx.vmhba1:C0:T1:L0
   Is SSD: true
   VSAN UUID: 52e4397e-cc90-71b3-c08d-c1e0e51dde49
   VSAN Disk Group UUID: 52e4397e-cc90-71b3-c08d-c1
   VSAN Disk Group Name: mpx.vmhba1:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 5
   Deduplication: false
   Compression: false

14-76 Activity: Using the esxcli vsan storage list Command Solution
The esxcli vsan storage list command displays details about each storage device
attached to the host.

• Is the first device a magnetic or a solid-state drive? Solid-state drive.

• Is it used for cache or for capacity? For capacity.

mpx.vmhba1:C0:T2:L0
   Device: mpx.vmhba1:C0:T2:L0
   Display Name: mpx.vmhba1:C0:T2:L0
   Is SSD: true
   VSAN UUID: 5286dd02-4a63-52aa-63f3-8aaee4139eaa
   VSAN Disk Group UUID: 52e4397e-cc90-71b3-c08d-c1
   VSAN Disk Group Name: mpx.vmhba1:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 5
   Deduplication: false
   Compression: false
   Checksum: 2162719562245289367
   Checksum OK: true
   Is Capacity Tier: true
   Encryption: false
   DiskKeyLoaded: false

mpx.vmhba1:C0:T1:L0
   Device: mpx.vmhba1:C0:T1:L0
   Display Name: mpx.vmhba1:C0:T1:L0
   Is SSD: true
   VSAN UUID: 52e4397e-cc90-71b3-c08d-c1e0e51dde49
   VSAN Disk Group UUID: 52e4397e-cc90-71b3-c08d-c1
   VSAN Disk Group Name: mpx.vmhba1:C0:T1:L0
   Used by this host: true
   In CMMDS: true
   On-disk format version: 5
   Deduplication: false
   Compression: false

14-77 Using ESXCLI to Investigate iSCSI Information
You use the esxcli vsan iscsi status get command to display iSCSI information for a host.

[root@sa-esxi-04:~] esxcli vsan iscsi status get
Enabled: true
[root@sa-esxi-04:~] esxcli vsan iscsi target list
Alias       iSCSI Qualified Name (IQN)                                   Interface  Port  Authentication type  LUNs  Is Compliant  UUID                                  I/O Owner UUID
----------  -----------------------------------------------------------  ---------  ----  -------------------  ----  ------------  ------------------------------------  ------------------------------------
vSAN-iSCSI  iqn.1998-01.com.vmware:1cc8bfb8-e85e-58d9-1855-077f0a6fc04a  vmk2       3260  No-Authentication       1  true          370edd59-0742-4032-0bfc-005056013e2c  59aabf29-f948-36f2-7f75-005056013e2c
[root@sa-esxi-04:~]

14-78 Activity: Using the esxcli vsan iscsi Command
The esxcli vsan iscsi status get command displays details about the vSAN iSCSI
service.

• Is iSCSI enabled?

• Is CHAP being used?

[root@sa-esxi-04:~] esxcli vsan iscsi status get
Enabled: true
[root@sa-esxi-04:~] esxcli vsan iscsi target list
Alias       iSCSI Qualified Name (IQN)                                   Interface  Port  Authentication type  LUNs  Is Compliant  UUID                                  I/O Owner UUID
----------  -----------------------------------------------------------  ---------  ----  -------------------  ----  ------------  ------------------------------------  ------------------------------------
vSAN-iSCSI  iqn.1998-01.com.vmware:1cc8bfb8-e85e-58d9-1855-077f0a6fc04a  vmk2       3260  No-Authentication       1  true          370edd59-0742-4032-0bfc-005056013e2c  59aabf29-f948-36f2-7f75-005056013e2c
[root@sa-esxi-04:~]

14-79 Activity: Using the esxcli vsan iscsi Command Solution
The esxcli vsan iscsi status get command displays details about the vSAN iSCSI
service.

• Is iSCSI enabled? Yes.

• Is CHAP being used? No.

[root@sa-esxi-04:~] esxcli vsan iscsi status get
Enabled: true
[root@sa-esxi-04:~] esxcli vsan iscsi target list
Alias       iSCSI Qualified Name (IQN)                                   Interface  Port  Authentication type  LUNs  Is Compliant  UUID                                  I/O Owner UUID
----------  -----------------------------------------------------------  ---------  ----  -------------------  ----  ------------  ------------------------------------  ------------------------------------
vSAN-iSCSI  iqn.1998-01.com.vmware:1cc8bfb8-e85e-58d9-1855-077f0a6fc04a  vmk2       3260  No-Authentication       1  true          370edd59-0742-4032-0bfc-005056013e2c  59aabf29-f948-36f2-7f75-005056013e2c
[root@sa-esxi-04:~]
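The CHAP answer comes from the Authentication type column. The following is a minimal Python sketch that maps one data row of the target list to its column names. It assumes that no value in the row contains a space, which holds for the row above. It also assumes that any value other than No-Authentication means CHAP is configured. The column keys and function name are hypothetical.

```python
COLUMNS = [
    "Alias", "IQN", "Interface", "Port", "Authentication type",
    "LUNs", "Is Compliant", "UUID", "I/O Owner UUID",
]


def parse_target_row(row):
    """Map one data row of `esxcli vsan iscsi target list` to the column names."""
    values = row.split()
    # Whitespace splitting only works while no value contains a space.
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


ROW = ("vSAN-iSCSI  iqn.1998-01.com.vmware:1cc8bfb8-e85e-58d9-1855-077f0a6fc04a  "
       "vmk2  3260  No-Authentication  1  true  "
       "370edd59-0742-4032-0bfc-005056013e2c  59aabf29-f948-36f2-7f75-005056013e2c")

target = parse_target_row(ROW)
chap_in_use = target["Authentication type"] != "No-Authentication"
print(target["Interface"], target["Port"], "CHAP" if chap_in_use else "no CHAP")
```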

14-80 Using ESXCLI to Investigate Cluster Details
The esxcli vsan cluster command has several subcommands that enable the
administrator to manage the host's membership in a vSAN cluster.

[root@sa-esxi-04:~] esxcli vsan cluster
Usage: esxcli vsan cluster {cmd} [cmd options]

Available Namespaces:
  preferredfaultdomain  Commands for configuring a preferred fault domain for vSAN.
  unicastagent          Commands for configuring unicast agents for vSAN.

Available Commands:
  get                   Get information about the vSAN cluster that this host is joined to.
  join                  Join the host to a vSAN cluster.
  leave                 Leave the vSAN cluster the host is currently joined to.
  new                   Create a vSAN cluster with current host joined. A random sub-cluster UUID
                        will be generated.
  restore               Restore the persisted vSAN cluster configuration.

14-81 Activity: Using the esxcli vsan cluster get Command

The esxcli vsan cluster get command displays detailed information for the vSAN
cluster of which this host is a member.

• What is the role of this node for this cluster?

• What is the sub-cluster UUID?


[root@sa-esxi-04:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2017-10-10T19:59:01Z
   Local Node UUID: 59aabf29-f948-36f2-7f75-005056013e2c
   Local Node Type: NORMAL
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 59aabf36-75df-f75b-45fa-005056013e24
   Sub-Cluster Backup UUID: 59aabf2b-93df-2246-45f7-005056013e28
   Sub-Cluster UUID: 521305e4-430d-d845-d59b-1943736fdc88
   Sub-Cluster Membership Entry Revision: 15
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 59aabf2b-93df-2246-45f7-005056013e28, 59aabf36-75df-f75b-45fa-005056013e24, 59aabf29-f948-36f2-7f75-005056013e2c, 59aabf3e-e787-5f05-aade-005056013df7
   Sub-Cluster Membership UUID: 7f10cc59-fb30-5dc4-64bb-005056013e24
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: 734bca23-7d13-4460-8125-c7e844d7297c 2 2017-10-09T22:36:25.573

14-82 Activity: Using the esxcli vsan cluster get Command Solution
The esxcli vsan cluster get command displays detailed information for the vSAN
cluster of which this host is a member.

• What is the role of this node for this cluster? Agent.

• What is the sub-cluster UUID? 521305e4-430d-d845-d59b-1943736fdc88.


[root@sa-esxi-04:~] esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2017-10-10T19:59:01Z
   Local Node UUID: 59aabf29-f948-36f2-7f75-005056013e2c
   Local Node Type: NORMAL
   Local Node State: AGENT
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 59aabf36-75df-f75b-45fa-005056013e24
   Sub-Cluster Backup UUID: 59aabf2b-93df-2246-45f7-005056013e28
   Sub-Cluster UUID: 521305e4-430d-d845-d59b-1943736fdc88
   Sub-Cluster Membership Entry Revision: 15
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 59aabf2b-93df-2246-45f7-005056013e28, 59aabf36-75df-f75b-45fa-005056013e24, 59aabf29-f948-36f2-7f75-005056013e2c, 59aabf3e-e787-5f05-aade-005056013df7
   Sub-Cluster Membership UUID: 7f10cc59-fb30-5dc4-64bb-005056013e24
   Unicast Mode Enabled: true
   Maintenance Mode State: OFF
   Config Generation: 734bca23-7d13-4460-8125-c7e844d7297c 2 2017-10-09T22:36:25.573
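Both answers are plain key/value fields in this output. The following is a minimal Python sketch that reads them from saved text. It also cross-checks that the number of listed member UUIDs matches Sub-Cluster Member Count. The function name and the abbreviated sample are hypothetical.

```python
def parse_cluster_get(text):
    """Collect "Key: value" lines; split on ": " so timestamps keep their colons."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            info[key] = value
    return info


SAMPLE = """\
Cluster Information
   Enabled: true
   Local Node UUID: 59aabf29-f948-36f2-7f75-005056013e2c
   Local Node State: AGENT
   Sub-Cluster UUID: 521305e4-430d-d845-d59b-1943736fdc88
   Sub-Cluster Member Count: 4
   Sub-Cluster Member UUIDs: 59aabf2b-93df-2246-45f7-005056013e28, 59aabf36-75df-f75b-45fa-005056013e24, 59aabf29-f948-36f2-7f75-005056013e2c, 59aabf3e-e787-5f05-aade-005056013df7
"""

info = parse_cluster_get(SAMPLE)
members = info["Sub-Cluster Member UUIDs"].split(", ")
print(info["Local Node State"], info["Sub-Cluster UUID"])
print(len(members) == int(info["Sub-Cluster Member Count"]))  # True
```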

14-83 About Ruby vSphere Console
Ruby vSphere Console (RVC) is a Linux console UI for vSphere on vCenter Server. It is used for
managing and troubleshooting vSAN environments.

RVC is built into vCenter Server Appliance as a command shell.

14-84 Logging In to the Ruby vSphere Console (1)
To log in to the Ruby vSphere Console, you first connect to vCenter Server Appliance, log in as
root, and run the shell command to access the Bash shell.

Using username "root".

VMware vCenter Server 7.0.0.10100

Type: vCenter Server with an embedded Platform Services Controller

Connected to service

    * List APIs: "help api list"
    * List Plugins: "help pi list"
    * Launch BASH: "shell"

Initially, you log in to vCenter Server as root.

14-85 Logging In to the Ruby vSphere Console (2)
Then you run the rvc command and log in as administrator.

root@sa-vcsa-01 [ ~ ]# rvc administrator@vsphere.local@localhost
Install the "ffi" gem for better tab completion.
WARNING: Nokogiri was built against LibXML version 2.9.4, but has dynamically loaded 2.9.8
The authenticity of host 'localhost' can't be established.
Public key fingerprint is f081d32dc92dca4742a84b93eb28cb60eb3fd69fc47362d05b4b2fcab0515003.
Are you sure you want to continue connecting (y/n)? y
Warning: Permanently added 'localhost' (vim) to the list of known hosts
password:
Welcome to RVC. Try the 'help' command.
0 /
1 localhost/

You are prompted for a user@host account. Use an account that has administrator privileges on
vCenter Server, the vSAN data center, and the vSAN clusters, for example,
administrator@vsphere.local.

14-86 Navigating the vSphere and vSAN Infrastructure
Ruby vSphere Console includes commands such as ls and cd to navigate the vSphere
infrastructure hierarchy. Press Ctrl+L to clear the screen.
0 /
> cd 1
/localhost> ls
0 SA-DC-01 (datacenter)
/localhost> cd 0
/localhost/SA-DC-01> ls
0 storage/
1 computers [host]/
2 networks [network]/
3 datastores [datastore]/
4 vms [vm]/
/localhost/SA-DC-01> cd 1
/localhost/SA-DC-01/computers> ls
0 SA-vSAN-01 (cluster): cpu 33 GHz, memory 3 GB
1 SB-vSAN-01 (cluster): cpu 5 GHz, memory 3 GB
2 Hot-Spare/
3 Witness-Nodes/
/localhost/SA-DC-01/computers> cd 0
/localhost/SA-DC-01/computers/SA-vSAN-01> ls
0 hosts/
1 resourcePool [Resources]: cpu 33.82/33.82/normal, mem 3.38/3.38/normal
/localhost/SA-DC-01/computers/SA-vSAN-01> ls 0
0 sa-esxi-02.vclass.local (host): cpu 1*4*2.80 GHz, memory 8.00 GB
1 sa-esxi-03.vclass.local (host): cpu 1*4*2.80 GHz, memory 8.00 GB
2 sa-esxi-04.vclass.local (host): cpu 1*4*2.80 GHz, memory 8.00 GB
3 sa-esxi-01.vclass.local (host): cpu 1*4*2.80 GHz, memory 8.00 GB

14-87 Using Ruby vSphere Console Help
Run the help vs an command to get a list of all available RVC commands related to vSAN
administration and management.

> help vsan
Namespaces:
  health
  perf
  sizing
  stretchedcluster
  vsanmgmt

Commands:
  apply_license_to_cluster: Apply license to VSAN
  check_limits: Gathers (and checks) counters against limits
  check_state: Checks state of VMs and VSAN objects
  clear_disks_cache: Clear cached disks information
  cluster_change_autoclaim: Enable/Disable autoclaim on a VSAN cluster
  cluster_change_checksum: Enable/Disable VSAN checksum enforcement on a cluster
  cluster_info: Print VSAN config info about a cluster or hosts
  cluster_set_default_policy: Set default policy on a cluster
  cmmds_find: CMMDS Find
  disable_vsan_on_cluster: Disable VSAN on a cluster
  disk_object_info: Fetch information about all VSAN objects on a given physical disk
  disks_info: Print physical disk info about a host
  disks_stats: Show stats on all disks in VSAN
  enable_vsan_on_cluster: Enable VSAN on a cluster
  enter_maintenance_mode: Put hosts into maintenance mode
    Choices for vsan-mode: ensureObjectAccessibility, evacuateAllData, noAction

14-88 Using the Ruby vSphere Console to List vSAN Commands
Enter vsan. and press Tab twice to list all the available vSAN Ruby vSphere Console
commands and namespaces.

> vsan.
vsan.apply_license_to_cluster        vsan.lldpnetmap
vsan.check_limits                    vsan.obj_status_report
vsan.check_state                     vsan.object_info
vsan.clear_disks_cache               vsan.object_reconfigure
vsan.cluster_change_autoclaim        vsan.observer
vsan.cluster_change_checksum         vsan.observer_process_statsfile
vsan.cluster_info                    vsan.perf.
vsan.cluster_set_default_policy      vsan.proactive_rebalance
vsan.cmmds_find                      vsan.proactive_rebalance_info
vsan.disable_vsan_on_cluster         vsan.purge_inaccessible_vswp_objects
vsan.disk_object_info                vsan.reapply_vsan_vmknic_config
vsan.disks_info                      vsan.recover_spbm
vsan.disks_stats                     vsan.resync_dashboard
vsan.enable_vsan_on_cluster          vsan.scrubber_info
vsan.enter_maintenance_mode          vsan.sizing.
vsan.fix_renamed_vms                 vsan.stretchedcluster.
vsan.health.                         vsan.support_information
vsan.host_claim_disks_differently    vsan.v2_ondisk_upgrade
vsan.host_consume_disks              vsan.vm_object_info
vsan.host_evacuate_data              vsan.vm_perf_stats
vsan.host_exit_evacuation            vsan.vmdk_stats
vsan.host_info                       vsan.vsanmgmt.
vsan.host_wipe_non_vsan_disk         vsan.whatif_host_failures
vsan.host_wipe_vsan_disks
>

14-89 Viewing Host-Specific Information
The vsan.host_info command displays information about hosts participating in the vSAN
cluster.

/localhost/SA-DC-01/computers/SA-vSAN-01/hosts> vsan.host_info 0
Fetching host info from sa-esxi-02.vclass.local (may take a moment) ...
Product: VMware ESXi 7.0.0 build-15843807
vSAN enabled: yes
Cluster info:
  Cluster role: backup
  Cluster UUID: 521dec18-ba02-ee1c-65b2-fcdc98ab19fb
  Node UUID: 5e95d45c-c227-7e78-1e3e-005056013cdc
  Member UUIDs: ["5e95d45c-c227-7e78-1e3e-005056013cdc", "5e95fac4-4ce0-4182-eb3c-005056013e2c"
5056013e28", "5e96ed6d-1b87-ab11-7948-005056013df7"] (4)
  Node evacuated: no
Storage info:
  Auto claim: no
  Disk Mappings:
    Cache Tier: Local VMware Disk (mpx.vmhba0:C0:T1:L0) - 10 GB, v11
    Capacity Tier: Local VMware Disk (mpx.vmhba0:C0:T3:L0) - 20 GB, v11
    Capacity Tier: Local VMware Disk (mpx.vmhba0:C0:T2:L0) - 20 GB, v11
FaultDomainInfo:
  Not configured
NetworkInfo:
  Adapter: vmk1 (10.10.10.29)
Data efficiency enabled: no
Encryption enabled: no

14-90 Viewing Host-Specific Disk Information
The vsan.disks_info command displays disk information for a specific host.
/localhost/SA-DC-01/computers/SA-vSAN-01/hosts> vsan.disks_info 0
2020-06-02 19:19:21 +0000: Gathering disk information for host sa-esxi-02.vclass.local
2020-06-02 19:19:22 +0000: Done gathering disk information
Disks on host sa-esxi-02.vclass.local:
+-----------------------------------------+-------+-------+--------------------------+
| DisplayName                             | isSSD | Size  | State                    |
+-----------------------------------------+-------+-------+--------------------------+
| Local VMware Disk (mpx.vmhba0:C0:T3:L0) | SSD   | 20 GB | inUse                    |
| VMware Virtual disk                     |       |       | vSAN Format Version: v11 |
+-----------------------------------------+-------+-------+--------------------------+
| Local VMware Disk (mpx.vmhba0:C0:T2:L0) | SSD   | 20 GB | inUse                    |
| VMware Virtual disk                     |       |       | vSAN Format Version: v11 |
+-----------------------------------------+-------+-------+--------------------------+
| Local VMware Disk (mpx.vmhba0:C0:T1:L0) | SSD   | 10 GB | inUse                    |
| VMware Virtual disk                     |       |       | vSAN Format Version: v11 |
+-----------------------------------------+-------+-------+--------------------------+

14-91 Using the Ruby vSphere Console to Investigate VM Objects

You use the vsan.check_state command to display the status of any invalid or
inaccessible VM or vSAN objects.

/localhost/SA-DC-01/computers/SA-vSAN-01/hosts> vsan.check_state 0
2020-06-02 19:20:03 +0000: Step 1: Check for inaccessible vSAN objects
Detected 0 objects to be inaccessible

2020-06-02 19:20:03 +0000: Step 2: Check for invalid/inaccessible VMs

2020-06-02 19:20:03 +0000: Step 3: Check for VMs for which VC/hostd/vmx are out of sync
Did not find VMs for which VC/hostd/vmx are out of sync

/localhost/SA Datacenter/computers> vsan.check_state 0
2017-07-19 15:52:01 +0000: Step 1: Check for inaccessible vSAN objects
Detected 1 objects to be inaccessible
Detected fc6c6f59-bb0e-e720-38fb-005056013e24 on sa-esxi-02.vclass.local to be inaccessible

2017-07-19 15:52:02 +0000: Step 2: Check for invalid/inaccessible VMs

2017-07-19 15:52:02 +0000: Step 3: Check for VMs for which VC/hostd/vmx are out of sync
Did not find VMs for which VC/hostd/vmx are out of sync

Do not delete unassociated objects, such as the stats database or iSCSI LUN objects, without
further investigation.
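When you collect check_state output from several clusters, you can extract the Step 1 results from the saved text before you investigate further. The following is a minimal Python sketch. It only lists candidates and does not delete or change anything. The regular expressions assume the exact message wording shown above, and the function name is hypothetical.

```python
import re

COUNT = re.compile(r"Detected (\d+) objects to be inaccessible")
DETAIL = re.compile(r"Detected ([0-9a-f-]{36}) on (\S+) to be inaccessible")


def inaccessible(text):
    """Return (reported count, [(object UUID, host), ...]) from vsan.check_state output."""
    count = COUNT.search(text)
    return (int(count.group(1)) if count else 0), DETAIL.findall(text)


SAMPLE = """\
2017-07-19 15:52:02 +0000: Step 1: Check for inaccessible vSAN objects
Detected 1 objects to be inaccessible
Detected fc6c6f59-bb0e-e720-38fb-005056013e24 on sa-esxi-02.vclass.local to be inaccessible
"""

print(inaccessible(SAMPLE))
```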

14-92 About Unassociated Objects
An unassociated object is a vSAN object that has no association with a valid entity, such as a
VM.

Considerations about unassociated objects:

• Most of the effort in investigating unassociated objects goes into identifying the objects.

• Identifying unassociated objects is challenging for the following reasons:

  - The objects are healthy and functional.

  - The objects are available, online, and normal.

  - Some objects, such as the performance stats database or iSCSI LUNs, are unassociated by
    definition.

• The object might or might not be useful to the customer.

Concerns about unassociated objects:

• The object consumes vSAN resources and must be rebalanced periodically, accommodated
for during maintenance, and so on.

• The resources that vSAN puts into managing these unassociated objects are wasted if the
unassociated objects are useless.

14-93 Signs of Unassociated Objects
Various signs indicate the existence of unassociated objects:

• Unexplained folders in the datastore explorer view indicate the existence of objects that
might be unassociated.

• A failure to create a VM on a vSAN datastore with sufficient available space often indicates
that vSAN has reached the component limit.

  Example: Heap objectNameCache already at its maximum size of 20976488. Cannot expand.

• Unassociated objects can cause observable capacity consumption problems:

  - All data is migrated off a disk group, but some space is still consumed.

  - The calculated consumed space does not match the consumed space on the datastore.

14-94 Using the Ruby vSphere Console to Investigate Unassociated Objects
You use the vsan.obj_status_report -t -u command to display the status of any
unassociated VM or vSAN objects.
/localhost/SA-DC-01/computers/SA-vSAN-01/hosts> vsan.obj_status_report 0 -t -u
Querying all VMs on vSAN ...
2020-06-02 19:20:51 +0000: Querying DOM_OBJECT in the system from sa-esxi-02.vclass.local ...
2020-06-02 19:20:51 +0000: Querying all disks in the system from sa-esxi-02.vclass.local ...
2020-06-02 19:20:51 +0000: Querying LSOM_OBJECT in the system from sa-esxi-02.vclass.local ...
2020-06-02 19:20:51 +0000: Querying all object versions in the system ...
2020-06-02 19:20:52 +0000: Got all the info, computing table ...

Histogram of component health for non-orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 4/4 (OK)                            | 9                            |
| 1/1 (OK)                            | 8                            |
| 3/3 (OK)                            | 1                            |
+-------------------------------------+------------------------------+
Total non-orphans: 18

Histogram of component health for possibly orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+
Total orphans: 0

Total v11 objects: 60


+-----------------------------------------+---------+---------------------------+
| VM/Object                               | objects | num healthy / total comps |
+-----------------------------------------+---------+---------------------------+
| vSAN File Service Node (4)              | 8       |                           |
|   3ef6965e-22ee-9797-89cd-005056013cdc  |         | 1/1                       |
|   6bf6965e-058d-282d-e268-005056013cdc  |         | 1/1                       |
|   41f6965e-e2fd-6294-8f38-005056013cdc  |         | 1/1                       |
|   6bf6965e-dab6-3b55-592d-005056013cdc  |         | 1/1                       |
|   64f6965e-deae-a4cb-9c18-005056013cdc  |         | 1/1                       |
|   6cf6965e-87c9-f57e-f466-005056013cdc  |         | 1/1                       |
|   69f6965e-4096-0d7e-5f20-005056013cdc  |         | 1/1                       |
|   5969d65e-396b-4017-1c63-005056013cdc  |         | 1/1                       |

14-95 Creation of Unassociated Objects (1)
Unassociated objects might exist for various reasons:

• A VM is improperly deleted, for example, by using the datastore browser.

• The objects were associated with a vSphere Replication job that failed.

• A snapshot consolidation task left related snapshot vSAN objects behind.

• The objects are remnants from a bypassed or discarded snapshot chain.

• The objects were created manually, outside vCenter Server or the hostd management
agent.

14-96 Creation of Unassociated Objects (2)


Unassociated objects might exist for several reasons:

• Uploading files such as ISO images not associated with a VM creates unassociated
namespaces.

• The objects are remnants of VMs that were removed from inventory but not deleted from
disk.

• The objects are remnants from external software that does not delete vSAN objects
correctly. This condition is common when using third-party replication and application
virtualization software.

• The objects are associated with advanced vSAN configurations such as iSCSI or vSAN performance metrics.

14-97 Using the Ruby vSphere Console to Investigate a VM
You use the vsan.object_info command to gather details about VM objects.

/localhost/SA Datacenter/computers> vsan.object_info 0 c9d46359-9525-eb2d-8040-00505601d852

DOM Object: c9d46359-9525-eb2d-8040-00505601d852 (v5, owner: sb-esxi-01.vclass.local, proxy owner: None, policy: spbmProfileId = 9117066f-6135-462b-a300-ea7e69df727f, hostFailuresToTolerate = 1, stripeWidth = 1, spbmProfileGenerationNumber = 0, replicaPreference = Capacity, proportionalCapacity = [0, 100], CSN = 60, spbmProfileName = raid5, SCSN = 59)
  RAID_5
    Component: a33a6659-7108-389e-a864-00505601d852 (state: ACTIVE (5), host: sa-esxi-01.vclass.local, md: 52160217-29e4-f208-19b3-4d6cbacb7c2a, ssd: 526d7fef-f3a7-8672-7907-fe6c23c1f087,
      votes: 2, usage: 0.2 GB, proxy component: false)
    Component: ffd46359-faef-ca98-1b78-00505601d852 (state: ACTIVE (5), host: sb-esxi-02.vclass.local, md: mpx.vmhba1:C0:T6:L0, ssd: mpx.vmhba1:C0:T4:L0,
      votes: 1, usage: 0.1 GB, proxy component: false)
    Component: ffd46359-5df0-ce98-365c-00505601d852 (state: DEGRADED (9), host: Unknown, md: mpx.vmhba1:C0:T6:L0, ssd: Unknown, note: LSOM object not found,
      votes: 1, proxy component: false)
    Component: ffd46359-de14-d298-6017-00505601d852 (state: ABSENT (6), csn: STALE (57!=60), host: sa-esxi-02.vclass.local, md: mpx.vmhba1:C0:T5:L0, ssd: mpx.vmhba1:C0:T4:L0,
      votes: 1, usage: 0.1 GB, proxy component: false)

14-98 Activity: Using the vsan.object_info Command
The vsan.object_info command, followed by an object's UUID, displays details for specific objects.

• What method for failure tolerance is used by this VM?

• How many components are not healthy?

• What is preventing the VM from starting?

/localhost/SA Datacenter/computers> vsan.object_info 0 c9d46359-9525-eb2d-8040-00505601d852

DOM Object: c9d46359-9525-eb2d-8040-00505601d852 (v5, owner: sb-esxi-01.vclass.local, proxy owner: None, policy: spbmProfileId = 9117066f-6135-462b-a300-ea7e69df727f, hostFailuresToTolerate = 1, stripeWidth = 1, spbmProfileGenerationNumber = 0, replicaPreference = Capacity, proportionalCapacity = [0, 100], CSN = 60, spbmProfileName = raid5, SCSN = 59)
  RAID_5
    Component: a33a6659-7108-389e-a864-00505601d852 (state: ACTIVE (5), host: sa-esxi-01.vclass.local, md: 52160217-29e4-f208-19b3-4d6cbacb7c2a, ssd: 526d7fef-f3a7-8672-7907-fe6c23c1f087,
      votes: 2, usage: 0.2 GB, proxy component: false)
    Component: ffd46359-faef-ca98-1b78-00505601d852 (state: ACTIVE (5), host: sb-esxi-02.vclass.local, md: mpx.vmhba1:C0:T6:L0, ssd: mpx.vmhba1:C0:T4:L0,
      votes: 1, usage: 0.1 GB, proxy component: false)
    Component: ffd46359-5df0-ce98-365c-00505601d852 (state: DEGRADED (9), host: Unknown, md: mpx.vmhba1:C0:T6:L0, ssd: Unknown, note: LSOM object not found,
      votes: 1, proxy component: false)
    Component: ffd46359-de14-d298-6017-00505601d852 (state: ABSENT (6), csn: STALE (57!=60), host: sa-esxi-02.vclass.local, md: mpx.vmhba1:C0:T5:L0, ssd: mpx.vmhba1:C0:T4:L0,
      votes: 1, usage: 0.1 GB, proxy component: false)

14-99 Activity: Using the vsan.object_info Command Solution
The vsan.object_info command, followed by an object's UUID, displays details for specific objects.

• What method for failure tolerance is used by this VM? RAID-5.

• How many components are not healthy? Two.

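• What is preventing the VM from starting? Not enough components exist for a quorum: only two of the four RAID-5 components are ACTIVE, and RAID-5 can tolerate the loss of only one component.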

/localhost/SA Datacenter/computers> vsan.object_info 0 c9d46359-9525-eb2d-8040-00505601d852

DOM Object: c9d46359-9525-eb2d-8040-00505601d852 (v5, owner: sb-esxi-01.vclass.local, proxy owner: None, policy: spbmProfileId = 9117066f-6135-462b-a300-ea7e69df727f, hostFailuresToTolerate = 1, stripeWidth = 1, spbmProfileGenerationNumber = 0, replicaPreference = Capacity, proportionalCapacity = [0, 100], CSN = 60, spbmProfileName = raid5, SCSN = 59)
  RAID_5
    Component: a33a6659-7108-389e-a864-00505601d852 (state: ACTIVE (5), host: sa-esxi-01.vclass.local, md: 52160217-29e4-f208-19b3-4d6cbacb7c2a, ssd: 526d7fef-f3a7-8672-7907-fe6c23c1f087,
      votes: 2, usage: 0.2 GB, proxy component: false)
    Component: ffd46359-faef-ca98-1b78-00505601d852 (state: ACTIVE (5), host: sb-esxi-02.vclass.local, md: mpx.vmhba1:C0:T6:L0, ssd: mpx.vmhba1:C0:T4:L0,
      votes: 1, usage: 0.1 GB, proxy component: false)
    Component: ffd46359-5df0-ce98-365c-00505601d852 (state: DEGRADED (9), host: Unknown, md: mpx.vmhba1:C0:T6:L0, ssd: Unknown, note: LSOM object not found,
      votes: 1, proxy component: false)
    Component: ffd46359-de14-d298-6017-00505601d852 (state: ABSENT (6), csn: STALE (57!=60), host: sa-esxi-02.vclass.local, md: mpx.vmhba1:C0:T5:L0, ssd: mpx.vmhba1:C0:T4:L0,
      votes: 1, usage: 0.1 GB, proxy component: false)
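The solution can be cross-checked against the component list. The following Python snippet is a minimal sketch under a simplified model, not vSAN's internal logic: it assumes an object is readable only if the ACTIVE components hold more than 50% of the votes and a RAID-5 (3+1) object is missing at most one of its four components. Component and raid5_accessible are hypothetical names; the UUIDs, states, and votes are copied from the output above.

```python
from dataclasses import dataclass


@dataclass
class Component:
    uuid: str
    state: str
    votes: int


def raid5_accessible(components):
    """Return (vote_majority, data_available) for a 4-component RAID-5 object.

    Simplified model (assumption): more than half of all votes must sit on
    ACTIVE components, and RAID-5 (3+1) can rebuild its data only when at
    most one of its four components is missing.
    """
    total_votes = sum(c.votes for c in components)
    active = [c for c in components if c.state == "ACTIVE"]
    active_votes = sum(c.votes for c in active)
    vote_majority = 2 * active_votes > total_votes
    data_available = len(active) >= len(components) - 1
    return vote_majority, data_available


# Values copied from the vsan.object_info output.
components = [
    Component("a33a6659-7108-389e-a864-00505601d852", "ACTIVE", 2),
    Component("ffd46359-faef-ca98-1b78-00505601d852", "ACTIVE", 1),
    Component("ffd46359-5df0-ce98-365c-00505601d852", "DEGRADED", 1),
    Component("ffd46359-de14-d298-6017-00505601d852", "ABSENT", 1),
]

print(raid5_accessible(components))  # (True, False)
```

In this model the two ACTIVE components still hold 3 of the 5 votes, but with two of the four RAID-5 components unavailable the data cannot be reconstructed, so the object remains inaccessible.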

14-100 Using the Ruby vSphere Console to Investigate Swap Objects
You use the vsan.purge_inaccessible_vswp_objects command to purge any inaccessible VSWP objects.

/localhost/SA Datacenter/computers> vsan.purge_inaccessible_vswp_objects
--force            Two-Node Cluster
--help             sc-witness-01.vclass.local   0
SA\ Compute-01     sc-witness-02.vclass.local   1

/localhost/SA Datacenter/computers> vsan.purge_inaccessible_vswp_objects 1
2017-10-10 20:37:10 +0000: Collecting all inaccessible vSAN objects ...
2017-10-10 20:37:10 +0000: Found 0 inaccessbile objects.
/localhost/SA Datacenter/computers>

Purging inaccessible VSWP objects is also available from the vSAN object health check in the vSphere Client.

14-101 Using the Ruby vSphere Console to Investigate Object Status
You use the vsan.obj_status_report command to display a status report for all vSAN objects.

/localhost/SA Datacenter/computers> vsan.obj_status_report 1
2017-10-10 20:41:32 +0000: Querying all VMs on vSAN ...
2017-10-10 20:41:32 +0000: Querying all objects in the system from sa-esxi-01.vclass.local ...
2017-10-10 20:41:32 +0000: Querying all disks in the system from sa-esxi-01.vclass.local ...
2017-10-10 20:41:33 +0000: Querying all components in the system from sa-esxi-01.vclass.local ...
2017-10-10 20:41:34 +0000: Querying all object versions in the system ...
2017-10-10 20:41:37 +0000: Got all the info, computing table ...

14-102 Activity: Using the vsan.obj_status_report Command
The vsan.obj_status_report command displays a summary of information for all vSAN objects.

• Do any objects need investigation?

Histogram of component health for non-orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 3/3 (OK)                            | 8                            |
| 4/4 (OK)                            | 2                            |
| 1/1 (OK)                            | 2                            |
+-------------------------------------+------------------------------+
Total non-orphans: 12

Histogram of component health for possibly orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+
Total orphans: 0

Total v1 objects: 0
Total v2 objects: 0
Total v2.5 objects: 0
Total v3 objects: 0
Total v5 objects: 12

14-103 Activity: Using the vsan.obj_status_report Command Solution
The vsan.obj_status_report command displays a summary of information for all vSAN objects.

• Do any objects need investigation? No.

Histogram of component health for non-orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
| 3/3 (OK)                            | 8                            |
| 4/4 (OK)                            | 2                            |
| 1/1 (OK)                            | 2                            |
+-------------------------------------+------------------------------+
Total non-orphans: 12

Histogram of component health for possibly orphaned objects

+-------------------------------------+------------------------------+
| Num Healthy Comps / Total Num Comps | Num objects with such status |
+-------------------------------------+------------------------------+
+-------------------------------------+------------------------------+
Total orphans: 0

Total v1 objects: 0
Total v2 objects: 0
Total v2.5 objects: 0
Total v3 objects: 0
Total v5 objects: 12
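When many reports must be reviewed, the reasoning behind this answer can be scripted. The following is a minimal sketch, assuming the report text has been captured to a string with histogram rows in the form | healthy/total (...) | count |. needs_investigation is a hypothetical helper, and treating any row with fewer healthy than total components, or any orphan, as worth investigating is an assumption that mirrors the activity answer.

```python
import re


def needs_investigation(report: str) -> bool:
    """Hypothetical helper: flag a vsan.obj_status_report summary for review.

    A report is flagged if any histogram row shows fewer healthy components
    than total components, or if the orphan count is greater than zero.
    """
    for healthy, total in re.findall(r"\|\s*(\d+)/\s*(\d+)", report):
        if int(healthy) < int(total):
            return True
    orphans = re.search(r"Total orphans\s*:\s*(\d+)", report)
    return orphans is not None and int(orphans.group(1)) > 0


# Rows and totals from the activity.
sample = """
| 3/3 (OK)  | 8 |
| 4/4 (OK)  | 2 |
| 1/1 (OK)  | 2 |
Total non-orphans: 12
Total orphans: 0
"""
print(needs_investigation(sample))  # False
```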

14-104 Using the Ruby vSphere Console to Predict Failures
The vsan.whatif_host_failures command runs a simulation on the cluster to predict storage usage if a host failure occurs.

/localhost/SA Datacenter/computers> vsan.whatif_host_failures 1

Simulating 1 host failures:

+-----------------+----------------------------+-----------------------------------+
| Resource        | Usage right now            | Usage after failure/re-protection |
+-----------------+----------------------------+-----------------------------------+
| HDD capacity    | 11% used (71.33 GB free)   | 14% used (51.34 GB free)          |
| Components      | 1% used (2966 available)   | 2% used (2216 available)          |
| RC reservations | 0% used (0.00 GB free)     | 0% used (0.00 GB free)            |
+-----------------+----------------------------+-----------------------------------+
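The Components row can be reproduced with simple arithmetic. The numbers suggest, but the output does not state, a four-host cluster with a limit of 750 components per host and 34 components in use: 2966 + 34 = 3000 = 4 × 750 now, and 2216 + 34 = 2250 = 3 × 750 after one host is lost. The sketch below assumes exactly that; whatif_components is a hypothetical helper, and it does not model the capacity rows, which also depend on how much data must be re-protected.

```python
def whatif_components(per_host_limit: int, hosts: int, used: int):
    """Reproduce the Components row for a single host failure.

    Assumptions: every host contributes the same component limit, and the
    failed host's components are rebuilt on the survivors, so the used count
    stays the same while the cluster-wide limit shrinks by one host.
    """
    total_now = per_host_limit * hosts
    total_after = per_host_limit * (hosts - 1)
    return {
        "now": (round(100 * used / total_now), total_now - used),
        "after": (round(100 * used / total_after), total_after - used),
    }


print(whatif_components(750, 4, 34))
# {'now': (1, 2966), 'after': (2, 2216)}
```

Each tuple is (percent used, components available), matching the 1% / 2966 and 2% / 2216 values in the table.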

You can also view predicted storage usage in the Skyline Health > Limits > After one additional host failure check.

14-105 Review of Learner Objectives
• Use vsantop to view vSAN performance metrics

• Discuss how to run commands from the vCenter Server and ESXi command lines

• Discuss how to access vSphere ESXi Shell

• Use commands to view, configure, and manage your vSphere environment

• Discuss the esxcli vsan namespace commands

• Discuss when to use Ruby vSphere Console (RVC) commands

14-106 Lesson 3: Useful Log Files

14-107 Learner Objectives


• Explain which log files are useful for vSAN troubleshooting

• Use log files to help troubleshoot vSAN problems

14-108 Log Files for vSAN
Several logs are useful when troubleshooting vSAN 7.0.
Log files located in the ESXi host /var/log directory are:
• boot.gz
• clomd.log
• hostd.log
• vmkernel.log
• vmkwarning.log
• vobd.log
• vsanmgmt.log
• vsantraces.gz
• vsanvpd.log
Log files located in the vCenter Server /var/log/vmware and /var/log/vmware/vsan-health/ directories are:
• sps.log
• vsanvcmgmtd.log
• vmware-vsan-health-service.log

14-109 Examining boot.gz
The boot.gz log captures everything that happens during the boot process.

Two troubleshooting use cases for the boot.gz log are:

Firmware validation
[root@sb-esxi-04:/var/log] zcat boot.gz | grep -i firmware
2020-03-03T23:07:29.543Z cpu0:266071)lsi_mr3: mfiGetAdapterInfo:1606: firmware version
l 2s . s . 4 . ooo6 I -
Validating the successful mounting of vSAN disks during system boot-up
[root@sb-esxi-04:/var/log] zcat boot.gz | grep PLOGAnnounceSSD
2020-03-03T23:06:25.143Z cpu0:262477)PLOG: PLOGAnnounceSSD:8123: Trace task started for device 527df3bd-c551-5da4-4afb-ba4932ed9a45
2020-03-03T23:06:25.143Z cpu0:262477)PLOG: PLOGAnnounceSSD:8136: Successfully added VSAN SSD (mpx.vmhba0:C0:T1:L0:2) with UUID 527df3bd-c551-5da4-4afb-ba4932ed9a45. kt 1, en 0, enc 0.
2020-03-03T23:06:43.467Z cpu1:262477)PLOG: PLOGAnnounceSSD:8136: Successfully added VSAN SSD (mpx.vmhba0:C0:T1:L0:2) with UUID 527df3bd-c551-5da4-4afb-ba4932ed9a45. kt 1, en 0, enc 0.

The boot.gz log is a compressed log file that captures everything that happens during the boot process, from the time that the host is started. Because this log is compressed, you can extract the file to read it or use the zcat tool to view it on the host.

Reasons to view this log file include:

• To find the firmware of the controller: Not all controllers present their firmware to the ESXi host after the boot process, but most present their firmware during the boot process.

Using the grep -i firmware command makes the search easier.

As a caution, verify the timestamp for the firmware record. If the controller's firmware has been updated but the host has not been rebooted, the firmware version number might not be what is running on the ESXi host.

• If local storage devices do not mount during the ESXi host boot cycle, look in the boot.gz file for errors and issues related to disk mounting during the boot process.

After a successful boot, you see a success message and a disk UUID listed for each vSAN disk. The success message and disk UUIDs mean that the LLOG and PLOG can successfully read these disks.
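The PLOGAnnounceSSD check can also be scripted, for example against a boot.gz copied from a support bundle. The following is a minimal sketch, assuming the record format shown on the slide; announced_ssds and announced_ssds_from_boot_log are hypothetical helper names. Compare the returned UUIDs with the disks you expect in each disk group: a missing UUID points to a disk that did not mount.

```python
import gzip
import re

# Matches the "Successfully added VSAN SSD" records shown on the slide.
SUCCESS = re.compile(
    r"PLOGAnnounceSSD:\d+: Successfully added VSAN SSD \((?P<device>[^)]+)\) "
    r"with UUID (?P<uuid>[0-9a-f-]{36})"
)


def announced_ssds(lines):
    """Return {uuid: device} for every vSAN disk that PLOG reported as added."""
    found = {}
    for line in lines:
        match = SUCCESS.search(line)
        if match:
            found[match.group("uuid")] = match.group("device")
    return found


def announced_ssds_from_boot_log(path="/var/log/boot.gz"):
    """Read a compressed boot log, similar to: zcat boot.gz | grep PLOGAnnounceSSD."""
    with gzip.open(path, "rt", errors="replace") as log:
        return announced_ssds(log)
```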

14-110 Examining clomd.log
The clomd.log log is the cluster-level object manager daemon log.

The following collected records are useful when troubleshooting:

• The DECOM_STATE related to the host's maintenance mode evacuation option settings
2020-05-20T15:47:36.481Z 263597 info clomd(30263987712) [Originator@6876] CLOM_ProcessDecomUpdate: Node 5e234bf7-7b37-1d9d-e55a-0050560154ed state change. Old: DECOM_STATE_NONE New: DECOM_STATE_ACTIVE Mode:1 JobUuid:00000000-0000-0000-0000-000000000000
... CLOM_ProcessDecomUpdate: Node 5e234bf7-7b37-1d9d-e55a-0050560154ed state change. Old: DECOM_STATE_ACTIVE New: DECOM_STATE_AUDIT Mode:1 JobUuid:4d1a3a46-9ff3-b799-416c-3901bc3b12f9

• The number of magnetic disks currently available in the vSAN environment


2020-05-06T14:42:35.713Z 263604 info clomd(33092017664) [Originator@6876] CLOMMemoryTest: Cluster usages - usableDataMDs: 8 hFull[P]: 0.078 meanFull[P]: 0.061 lFull[P]: 0.044 hFull[S]: 0.000 meanFull[S]: 0.000 lFull[S]: 0.000 capTotal: 162684370944 capUsed: 9852475064 ssdCapTotal: 42937022464 ssdCapUsed: 0
... CLOMMemoryTest: Cluster usages - usableDataMDs: 8 hFull[P]: 0.078 meanFull[P]: 0.061 lFull[P]: 0.044 hFull[S]: 0.000 meanFull[S]: 0.000 lFull[S]: 0.000 capTotal: 162684370944 capUsed: 9854576312 ssdCapTotal: 42937022464 ssdCapUsed: 0
2020-05-06T15:48:16.625Z 263604 info clomd(33092017664) [Originator@6876] CLOM_LogDiskState: Cluster state. 4 nodes, 8 MDs, 4 SSDs

The clomd.log log is the cluster-level object manager daemon log.

As discussed in PNOMA, CLOM ensures that all objects are compliant with their storage policies.

You can see which maintenance mode evacuation option is applied, and at what time any state change took place:

DECOM_STATE_ACTIVE with Mode:0 - No Data Migration

DECOM_STATE_ACTIVE with Mode:1 - Ensure Accessibility

DECOM_STATE_ACTIVE with Mode:2 - Full Data Migration
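The mode numbers listed above can be decoded automatically when scanning clomd.log. The following is a minimal sketch, assuming the DECOM record format shown on the slide; the EVACUATION_MODES table only restates this lesson's list, and decom_events is a hypothetical helper.

```python
import re

# Mode numbers as listed in this lesson.
EVACUATION_MODES = {
    0: "No Data Migration",
    1: "Ensure Accessibility",
    2: "Full Data Migration",
}

DECOM = re.compile(
    r"Node (?P<node>[0-9a-f-]{36}) state change\. "
    r"Old: (?P<old>DECOM_STATE_\w+) New: (?P<new>DECOM_STATE_\w+) Mode:(?P<mode>\d+)"
)


def decom_events(lines):
    """Yield (node, old_state, new_state, evacuation option) for each DECOM change."""
    for line in lines:
        m = DECOM.search(line)
        if m:
            option = EVACUATION_MODES.get(int(m["mode"]), "unknown")
            yield m["node"], m["old"], m["new"], option
```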

Every hour, the clomd.log log reports how many magnetic disks are in the environment.

Here, the term magnetic disks (MDs) refers to capacity devices, whether they are spinning disks (HDDs) or SSDs. The term SSDs always refers to cache devices.

In the second screenshot, begin by looking at the timestamp. At 15:48:16, eight magnetic (capacity) disks and four SSD (cache) disks are available in the vSAN environment. If the number of available storage devices later drops below eight capacity disks or four cache disks, you can see when the loss of disks occurred and compare that time with other logs.
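The disk-count comparison can be scripted. The following is a minimal sketch, assuming the CLOM_LogDiskState record format shown in the second screenshot; disk_count_drops is a hypothetical helper, and the expected counts (eight MDs and four SSDs in this lab) must come from your own cluster design.

```python
import re

STATE = re.compile(
    r"^(?P<ts>\S+) .*CLOM_LogDiskState: Cluster state\. "
    r"(?P<nodes>\d+) nodes, (?P<mds>\d+) MDs, (?P<ssds>\d+) SSDs"
)


def disk_count_drops(lines, expected_mds, expected_ssds):
    """Return (timestamp, mds, ssds) for each record below the expected counts."""
    drops = []
    for line in lines:
        m = STATE.search(line)
        if m and (int(m["mds"]) < expected_mds or int(m["ssds"]) < expected_ssds):
            drops.append((m["ts"], int(m["mds"]), int(m["ssds"])))
    return drops
```

The timestamp of the first returned record is the point to cross-reference in vmkernel.log and vobd.log.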

Also, the clomd.log log is helpful for detecting object state changes, such as healthy or absent.

Every 24 hours, the clomd.log log reports how many ESXi hosts are in the vSAN cluster.
14-111 Examining hostd.log
The hostd.log log tracks anything related to VM activities:

• Power On

• Power Off

• Resets

• Reconfiguration

/var/run/log/hostd.log: 2020-03-03T15:35:35.960Z info hostd[FFBE8B20] [Originator@6876 sub=Vmsvc.vm623-nac-65-b0-87-7b-4951 user=vpxuser] State Transition (VM_STATE_ON -> VM_STATE_RECONFIGURING)

2020-03-03T21:34:02.960Z: [netCorrelator] 38903692us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down
2020-03-03T21:34:02.960Z: [scsiCorrelator] 38977221us: [vob.scsi.scsipath.add] Add path: vmhba0:C0:T0:L0
2020-03-03T21:34:02.960Z: [netCorrelator] 39833699us: [vob.net.vmnic.linkstate.down] vmnic vmnic5 linkstate down
2020-03-03T21:34:02.960Z: [netCorrelator] 39833728us: [vob.net.vmnic.linkstate.down] vmnic vmnic0 linkstate up
2020-03-03T21:34:03.960Z: [netCorrelator] 39833699us: [esx.problem.net.vmnic.linkstate.down] vmnic vmnic0 linkstate down

The hostd.log log is important to vSAN because it records everything related to VM operations. When VMs cannot access storage, you should see an entry in the hostd.log log indicating that the VM has an issue communicating with its storage, for example, it cannot find its component or it cannot write to storage.

The first screenshot shows VM_STATE_RECONFIGURING, which indicates that a change was made to the VM. You can see exactly when the change occurred and then use the vmware.log log to find more information about what happened. The vmware.log log is inside the namespace for the VM.

In the second screenshot, the hostd.log log reports the link state of vmnic0 as down. A vmnic (physical NIC) with a link state of down can disconnect the host from the vSAN network, which prevents VMs on that host from accessing their objects. NIC teaming can prevent this scenario.
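Both kinds of records can be pulled out of a copied log with a short script. The following is a minimal sketch, assuming the record formats shown in the two screenshots; state_transitions and vmnics_down are hypothetical helpers. The timestamps they return are the values to look up in the VM's vmware.log and in the other host logs.

```python
import re

TRANSITION = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z) .*?State Transition "
    r"\((?P<old>VM_STATE_\w+) -> (?P<new>VM_STATE_\w+)\)"
)
LINK_DOWN = re.compile(r"vmnic (?P<nic>vmnic\d+) linkstate down")


def state_transitions(lines):
    """Return (timestamp, old_state, new_state) for each VM state change."""
    found = []
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            found.append((m["ts"], m["old"], m["new"]))
    return found


def vmnics_down(lines):
    """Return the sorted names of vmnics reported with a link state of down."""
    return sorted({m["nic"] for m in map(LINK_DOWN.search, lines) if m})
```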

14-112 Activity: Mounting vSAN Disks Issues
You cannot add a disk group to vSAN. Answer the questions based on the screenshot of the hostd.log log.

• What is the issue detected by LSOM?

• What is the UUID of the failed SSD?

• Can the Redolog be recovered from the PLOG?

• What is the status of the initialization for the SSD?

2017-12-07T21:05:06.618Z cpu44:67463)WARNING: LSOM: LSOMAddDiskGroupDispatch:8303: Failed to add disk group. SSD 52e50c82-4f5e-f136-1511-7d7dc41289d5: Corrupt Redolog
2017-12-07T21:05:06.618Z cpu50:67340)PLOG: PLOGVerifyDiskGroupNotifyCompletion:4410: Notify disk group failed for SSD UUID 52e50c82-4f5e-f136-1511-7d7dc41289d5 :Corrupt Redolog was recovery complete ? No
2017-12-07T21:05:28.791Z cpu28:66107)WARNING: PLOG: PLOGCheckRecoveryStatusForOneDevice:7336: Recovery failed for disk 52e50c82-4f5e-f136-1511-7d7dc41289d5
2017-12-07T21:05:28.791Z cpu28:66107)VSAN: Initialization for SSD: 52e50c82-4f5e-f136-1511-7d7dc41289d5 Failed

14-113 Activity: Mounting vSAN Disks Issues Solution
You cannot add a disk group to vSAN. Answer the questions based on the screenshot of the hostd.log log.
• What is the issue detected by LSOM? Corrupt Redolog.

• What is the UUID of the failed SSD? UUID 52e50c82-4f5e-f136-1511-7d7dc41289d5.

• Can the Redolog be recovered from the PLOG? No.

• What is the status of the initialization for the SSD? Failed.

2017-12-07T21:05:06.618Z cpu44:67463)WARNING: LSOM: LSOMAddDiskGroupDispatch:8303: Failed to add disk group. SSD 52e50c82-4f5e-f136-1511-7d7dc41289d5: Corrupt Redolog
2017-12-07T21:05:06.618Z cpu50:67340)PLOG: PLOGVerifyDiskGroupNotifyCompletion:4410: Notify disk group failed for SSD UUID 52e50c82-4f5e-f136-1511-7d7dc41289d5 :Corrupt Redolog was recovery complete ? No
2017-12-07T21:05:28.791Z cpu28:66107)WARNING: PLOG: PLOGCheckRecoveryStatusForOneDevice:7336: Recovery failed for disk 52e50c82-4f5e-f136-1511-7d7dc41289d5
2017-12-07T21:05:28.791Z cpu28:66107)VSAN: Initialization for SSD: 52e50c82-4f5e-f136-1511-7d7dc41289d5 Failed

In this activity, the hostd.log log shows that:

• The Redolog is corrupt.

• The Redolog cannot be recovered.

• The PLOGCheckRecoveryStatusForOneDevice status is Recovery failed.

• The initialization of the SSD 52e50c82-4f5e-f136-1511-7d7dc41289d5 has failed.

Corrupt Redolog indicates a corrupt record in the LLOG or PLOG. If a corrupt LLOG or PLOG record prevents a disk group from coming back online, the disk group must be deleted and re-created. vSAN automatically rebuilds lost components. If the components cannot be rebuilt, for example, because no other RAID-1 mirrors exist or too few RAID-5 or RAID-6 stripe components remain, the VM data must be restored from backup.
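The activity answers can also be extracted programmatically. The following is a minimal sketch, assuming the LSOM and VSAN record formats shown in the screenshot; failed_disk_groups is a hypothetical helper that reports each cache device whose initialization failed and whether a Corrupt Redolog record was also logged for it.

```python
import re

CORRUPT = re.compile(r"SSD (?P<uuid>[0-9a-f-]{36}): Corrupt Redolog")
INIT_FAILED = re.compile(r"Initialization for SSD: (?P<uuid>[0-9a-f-]{36}) Failed")


def failed_disk_groups(lines):
    """Map each SSD UUID whose initialization failed to True if a Corrupt
    Redolog record was also logged for that SSD, otherwise False."""
    corrupt, failed = set(), []
    for line in lines:
        m = CORRUPT.search(line)
        if m:
            corrupt.add(m["uuid"])
        m = INIT_FAILED.search(line)
        if m:
            failed.append(m["uuid"])
    return {uuid: uuid in corrupt for uuid in failed}
```

A True value matches the situation in this activity: the disk group for that cache device must be deleted and re-created.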

14-114 Examining vmkernel.log (1)
The vmkernel.log log records activities related to VMs and ESXi. Useful records when troubleshooting are:

• Timestamps

• Heartbeat timeouts

• Reports of read or write issues


2020-05-06T18:26:03.55?Z cpu3:263259)CMMDS: CMMDSLogStateTransition:1773: 52d17529-cc48-e68f-81ac-938c97a5941b: Transitioning(5e2352d1-eb3d-0c16-5ce8-005056029973) from Discover to Rejoin: (Reason: Found a master node)
2020-05-06T18:26:03.557Z cpu3:263259)CMMDS: RejoinSetup:2790: 52d17529-cc48-e68f-81ac-938c97a5941b: Setting batching to 1
2020-05-06T18:26:03.557Z cpu3:263259)CMMDSNet: CMMDSNetSetMaster:1244: Updating master node: old=none new=5e2351bc-ce8d-ee1b-cf36-00505602b80e
2020-05-06T18:26:03.5??Z cpu3:263259)CMMDS: CMMDSAgentlikeSetMembership:540: 52d17529-cc48-e68f-81ac-938c97a5941b: Setting new membership uuid 56fdb25e-ce22-b5bb-9769-00505602b80e
2020-05-06T18:26:04.557Z cpu3:263259)CMMDS: RejoinRxMasterHeartbeat:2281: 52d17529-cc48-e68f-81ac-938c97a5941b: Saw self listed in master heartbeat

CMMDS: MasterSendHeartbeatRequest:1474: Sending a reliable heartbeat request to 5a1fe47d-e4ef-ba4b-a1c3-d094660509fd
CMMDS: CMMDSHeartbeatRequestHBWork:844: Request heartbeat: Retry the operation.
CMMDS: CMMDSHeartbeatRequestHBWork:844: Request heartbeat: ...
CMMDS: CMMDSHeartbeatCheckHBLogWork:726: Check node returned Failure for node 5a1fe47d-e4ef-ba4b-a1c3-d094660509fd count 5
CMMDS: CMMDSStateDestroyNode:676: Destroying node 5a1fe47d-e4ef-ba4b-a1c3-d094660509fd: Heartbeat timeout
CMMDS: MasterSendHeartbeatRequest:1474: Sending a reliable heartbeat request to 5a1fe491-6612-be51-ae48-d0946604e641
CMMDS: CMMDSHeartbeatRequestHBWork:844: Request heartbeat: Success.
CMMDS: MasterRxHeartbeatRequest:2184: Replied to a reliable heartbeat request. Last msg sent: 238 ms back
CMMDS: RejoinRxMasterHeartbeat:1941: Saw self listed in master heartbeat
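Heartbeat timeouts like the one above can be summarized per node. The following is a minimal sketch, assuming the CMMDS record formats shown on the slide; heartbeat_problems is a hypothetical helper that returns the highest failure count reported for each node and the nodes that CMMDS destroyed because of a heartbeat timeout.

```python
import re

CHECK_FAILED = re.compile(
    r"Check node returned Failure for node (?P<node>[0-9a-f-]{36}) count (?P<count>\d+)"
)
DESTROYED = re.compile(r"Destroying node (?P<node>[0-9a-f-]{36}): Heartbeat timeout")


def heartbeat_problems(lines):
    """Return ({node: highest failure count}, [nodes destroyed on timeout])."""
    failures, destroyed = {}, []
    for line in lines:
        m = CHECK_FAILED.search(line)
        if m:
            count = int(m["count"])
            failures[m["node"]] = max(failures.get(m["node"], 0), count)
        m = DESTROYED.search(line)
        if m:
            destroyed.append(m["node"])
    return failures, destroyed
```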

14-115 Examining vmkernel.log (2)
For vSAN, important records in the vmkernel.log log are SCSI read and write errors.

0x28 indicates a read error.

WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.ATA____SATA_SSD_8450075052400104305" state in doubt; requested fast path state update...
ScsiDeviceIO: 2927: Cmd(0x439d41008d00) 0x28, CmdSN 0x1b1252 from world 67389 to dev "t10.ATA____SATA_SSD_8450075052400104305" failed H:0x2 D:0x0 P:0x0 Invalid

0x2a indicates a write error.

WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "t10.ATA____SATA_SSD_8450075052400104337" state in doubt; requested fast path state update...
ScsiDeviceIO: 2927: Cmd(0x439ed1838200) 0x2a, CmdSN 0x13f29 from world 65591 to dev "t10.ATA____SATA_SSD_8450075052400104337" failed H:0x2 D:0x0 P:0x0 Invalid
ScsiDeviceIO: 2927: Cmd(0x439ed1838200) 0x2a, CmdSN 0x13f29 from world 65591 to dev "t10.ATA____SATA_SSD_8450075052400104337" failed H:0x2 D:0x0 P:0x0 Invalid

Events within the ESXi host are important to record. Timestamps are a critical piece of information because they tell you when the issue occurred. Using the date and time provided by the timestamp, you can look at all other logs to see what occurred at the time of the issue.

The vmkernel.log file also records issues with reads from and writes to the local disks:

• A 0x28 error indicates a read issue with the disk.

• A 0x2A error indicates a write issue with the disk.

If vSAN sees a 0x2A error associated with a disk UUID, vSAN takes that disk offline.
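The opcode check can be automated when scanning a copy of vmkernel.log. The following is a minimal sketch, assuming the ScsiDeviceIO record format shown above; classify and OPCODES are hypothetical names, and the table covers only the two SCSI operation codes discussed here (0x28 is READ(10), 0x2A is WRITE(10)).

```python
import re

# SCSI operation codes discussed in this lesson.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

FAILED_CMD = re.compile(
    r"Cmd\(0x[0-9a-f]+\) (?P<op>0x[0-9a-f]+),.*?"
    r'to dev\s*:?\s*"(?P<dev>[^"]+)" failed (?P<status>H:\S+ D:\S+ P:\S+)',
    re.IGNORECASE,
)


def classify(line):
    """Return (device, operation, status) for a failed ScsiDeviceIO record, else None."""
    m = FAILED_CMD.search(line)
    if not m:
        return None
    operation = OPCODES.get(int(m["op"], 16), "other")
    return m["dev"], operation, m["status"]
```

A WRITE(10) result for a vSAN disk is the case to correlate with the disk going offline in vobd.log.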

14-116 Examining vmkwarning.log
The vmkwarning.log log records activities related to VMs. You can use the grep and tail commands to search vmkwarning.log for indications of the problem.

Use the grep command to search the vmkwarning.log file for unregistered devices.
# grep unregistered vmkwarning.log
2017-05-30T14:11:... cpu0:... opID=a4e71359)WARNING: NMP: nmpUnclaimPath:1579: Physical path "vmhba0:C0:T2:L0" is the last path to NMP device "Unregistered Device". The device has been unregistered.
2017-05-30T14:11:52.830Z cpu0:67527 opID=bc1f970d)WARNING: NMP: nmpUnclaimPath:1579: Physical path "vmhba0:C0:T1:L0" is the last path to NMP device "Unregistered Device". The device has been unregistered.
2017-05-30T14:12:18.636Z cpu0:67559 opID=4a68e3cf)WARNING: NMP: nmpUnclaimPath:1579: Physical path "vmhba0:C0:T2:L0" is the last path to NMP device "Unregistered Device". The device has been unregistered.
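The grep output above lists the same path more than once. The following minimal sketch, assuming the nmpUnclaimPath record format shown above, reduces it to the distinct physical paths involved; unregistered_paths is a hypothetical helper.

```python
import re

LAST_PATH = re.compile(
    r'Physical path "\s*(?P<path>[^"]+?)\s*" is the last path to '
    r'NMP device "\s*(?P<device>[^"]+?)\s*"'
)


def unregistered_paths(lines):
    """Return the sorted, de-duplicated paths reported for unregistered devices."""
    paths = set()
    for line in lines:
        m = LAST_PATH.search(line)
        if m and m["device"] == "Unregistered Device":
            paths.add(m["path"])
    return sorted(paths)
```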

Use the tail -f command to display new entries as they are written to the log.


# tail -f vmkwarning.log
2017-09-26T09:44:03.881Z cpu1:66262)WARNING: PLOG: PLOG_FreeDevice:490: Stashing 0x4304fc3dd450 for disk mpx.vmhba0:C0:T5:L0:2 0x809
2017-09-26T09:44:03.882Z cpu1:66263)WARNING: PLOG: PLOG_FreeDevice:490: Stashing 0x4304fc3e3820 for disk mpx.vmhba0:C0:T4:L0:1 0x10809
2017-09-26T09:44:03.882Z cpu1:66263)WARNING: PLOG: PLOG_FreeDevice:453: Releasing trace_task op for device mpx.vmhba0:C0:T4:L0:2
2017-09-26T09:44:03.882Z cpu1:66263)WARNING: PLOG: PLOG_FreeDevice:490: Stashing 0x4304fc3e25b0 for disk mpx.vmhba0:C0:T4:L0:2 0x10809
2017-09-26T09:44:07.881Z cpu0:86321)WARNING: LSOM: LSOMEventNotify:6868: Virtual SAN device 521f68f5-60e9-e3bf-7450-de850c0779d4 has gone offline.

14-117 Examining vobd.log
The vobd.log log records storage and network-related activities:

• Disk latency

• When a host enters maintenance mode

• When a host exits maintenance mode

• NIC uplink status of up or down

2020-05-07T00:32:17.816Z: [GenericCorrelator] 22024925476us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-05-07T00:32:17.816Z: [UserLevelCorrelator] 22024925476us: [vob.user.maintenancemode.entering] The host has begun entering maintenance mode
2020-05-07T00:32:17.816Z: [UserLevelCorrelator] 22024925932us: [esx.audit.maintenancemode.entering] The host has begun entering maintenance mode.
2020-05-07T00:32:19.846Z: [UserLevelCorrelator] 22026955146us: [vob.user.maintenancemode.entered] The host has entered maintenance mode
2020-05-07T00:32:19.846Z: [UserLevelCorrelator] 22026955745us: [esx.audit.maintenancemode.entered] The host has entered maintenance mode.
2020-05-07T00:32:19.846Z: [GenericCorrelator] 22026955146us: [vob.user.maintenancemode.entered] The host has entered maintenance mode
2020-05-07T00:41:00.927Z: [UserLevelCorrelator] 22548035986us: [vob.user.maintenancemode.exited] The host has exited maintenance mode
2020-05-07T00:41:00.927Z: [GenericCorrelator] 22548035986us: [vob.user.maintenancemode.exited] The host has exited maintenance mode
2020-05-07T00:41:00.927Z: [UserLevelCorrelator] 22548037288us: [esx.audit.maintenancemode.exited] The host has exited maintenance mode.

2020-05-06T18:25:37.528Z: [netCorrelator] 27298830us: [vob.net.vmnic.linkstate.up] vmnic vmnic3 linkstate up
2020-05-06T18:25:37.544Z: [netCorrelator] 27315774us: [vob.net.vmnic.linkstate.up] vmnic vmnic0 linkstate up
2020-05-06T18:25:37.559Z: [netCorrelator] 27329213us: [vob.net.vmnic.linkstate.up] vmnic vmnic1 linkstate up
2020-05-06T18:25:37.576Z: [netCorrelator] 27347089us: [vob.net.vmnic.linkstate.up] vmnic vmnic2 linkstate up

The vobd . 1 og file records latency information about the individual disks, such as latency

issues.

After an individual disk is recorded as having latency issues, you can verify whether it is healthy and when the latency occurs, for example, during intensive read or write operations. You can also determine whether the disk is a capacity disk or a cache disk. If the issue is with a cache disk in a hybrid environment, both reads and writes can be affected.

A caveat with maintenance mode entries in the vobd.log file is that the selected maintenance mode evacuation option is not recorded. However, the timestamp is present and is useful for cross-referencing other log files.
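Because the timestamp is the part you can rely on, a small script can pull the maintenance mode transitions out of a vobd.log excerpt so that they can be matched against other logs. The following Python sketch is one way to do this, not a VMware tool. It assumes the line layout shown in the excerpt above (timestamp, correlator, uptime in microseconds, event ID in brackets) and reads only the esx.audit.maintenancemode.* events, because each transition is also logged as a vob.user.* event by more than one correlator. The function name maintenance_windows is hypothetical.

```python
import re
from datetime import datetime

# vobd.log layout assumed from the excerpt:
# <timestamp>Z: [<correlator>] <uptime>us: [<event ID>] <message>
VOB_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z: "
    r"\[(?P<correlator>\w+)\] (?P<uptime_us>\d+)us: \[(?P<event>[\w.]+)\]"
)


def maintenance_windows(lines):
    """Return a list of (entering, entered, exited) datetimes, one per exit event.

    Only esx.audit.maintenancemode.* events are used, because the same
    transitions also appear as vob.user.* events from several correlators.
    A missing 'entering' or 'entered' event is reported as None.
    """
    windows = []
    current = {}
    for line in lines:
        m = VOB_LINE.match(line)
        if not m or not m["event"].startswith("esx.audit.maintenancemode."):
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        state = m["event"].rsplit(".", 1)[1]  # "entering", "entered", or "exited"
        current[state] = ts
        if state == "exited":
            windows.append((current.get("entering"), current.get("entered"), ts))
            current = {}
    return windows
```

Applied to the excerpt above (for example, by passing an open copy of the file as lines), the host took about 2 seconds to enter maintenance mode and remained in maintenance mode for about 8 minutes 41 seconds. The script cannot recover the evacuation option, because vobd.log does not record it.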

14-118 vobd.log: Device Repaired
After the error condition is cleared, the vobd.log file shows the device coming back online.

Device Going Offline

2017-09-26T09:44:02.892Z: [VsanCorrelator] 4065029556us: [esx.problem.vob.vsan.pdl.offline] vSAN device 522a93dc-9eaa-db0b-fcc4-c22044c30417 has gone offline.
2017-09-26T09:44:07.881Z: [VsanCorrelator] 4070018632us: [vob.vsan.pdl.offline] Virtual SAN device 521f68f5-60e9-e3bf-7450-de850c0779d4 has gone offline.
2017-09-26T09:44:07.881Z: [VsanCorrelator] 4070019003us: [esx.problem.vob.vsan.pdl.offline] vSAN device 521f68f5-60e9-e3bf-7450-de850c0779d4 has gone offline.
2017-09-26T09:44:09.158Z: [scsiCorrelator] 4071294976us: [vob.scsi.device.state.permanentloss.noopens] Permanently inaccessible device: mpx.vmhba0:C0:T1:L0 has no more open connections. It is now safe to unmount datastores (if any) and delete the device.
2017-09-26T09:44:09.158Z: [scsiCorrelator] 4071295289us: [esx.problem.scsi.device.state.permanentloss.noopens] Permanently inaccessible device: mpx.vmhba0:C0:T1:L0 has no more opens. It is now safe to unmount datastores (if any) : Unknown and delete the device
2017-09-26T09:44:09.159Z: [scsiCorrelator] 4071295658us: [vob.scsi.scsipath.remove] Remove path: vmhba0:C0:T1:L0

Device Coming Online

2017-09-26T09:48:11.324Z: [scsiCorrelator] 4313461719us: [vob.scsi.scsipath.add] Add path: vmhba0:C0:T1:L0
2017-09-26T09:48:11.330Z: [scsiCorrelator] 4313467278us: [vob.scsi.scsipath.pathstate.on] scsiPath vmhba0:C0:T1:L0 changed state from dead
2017-09-26T09:48:22.744Z: [VsanCorrelator] 4324881489us: [vob.vsan.pdl.online] Virtual SAN device 521f68f5-60e9-e3bf-7450-de850c0779d4 has come online.
2017-09-26T09:48:22.744Z: [VsanCorrelator] 4324881728us: [esx.clear.vob.vsan.pdl.online] vSAN device 521f68f5-60e9-e3bf-7450-de850c0779d4 has come online.
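To measure how long a device was unavailable, you can pair each offline event with the next online event for the same vSAN device UUID. The following Python sketch assumes the vobd.log layout shown above and treats any event ID ending in vsan.pdl.offline or vsan.pdl.online as a state change, so duplicate entries for the same transition (vob.*, esx.problem.*, esx.clear.*) are collapsed. SCSI path events are ignored. The function name pdl_outages is hypothetical.

```python
import re
from datetime import datetime

# Timestamp, correlator, uptime, event ID, then "... device <vSAN device UUID> ..."
PDL_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z: \[\w+\] \d+us: "
    r"\[(?P<event>[\w.]+)\] .*?device "
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
)


def pdl_outages(lines):
    """Return (outages, still_offline).

    outages maps a vSAN device UUID to the number of seconds it was offline;
    still_offline maps UUIDs that have no matching online event to the time
    they went offline.
    """
    offline_since = {}
    outages = {}
    for line in lines:
        m = PDL_LINE.match(line)
        if not m:
            continue  # SCSI path events and unrelated entries are skipped
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        uuid, event = m["uuid"], m["event"]
        if event.endswith("vsan.pdl.offline"):
            # vob.* and esx.problem.* report the same transition; keep the first.
            offline_since.setdefault(uuid, ts)
        elif event.endswith("vsan.pdl.online") and uuid in offline_since:
            # A duplicate esx.clear.* entry finds no pending UUID and is ignored.
            outages[uuid] = (ts - offline_since.pop(uuid)).total_seconds()
    return outages, offline_since
```

In the excerpt above, device 521f68f5-60e9-e3bf-7450-de850c0779d4 is offline for about 255 seconds, while device 522a93dc-9eaa-db0b-fcc4-c22044c30417 has no online event in the excerpt and is reported as still offline.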

14-119 Examining vsanmgmt.log
The vsanmgmt.log file records connections from vCenter Server to vSAN for the performance service and health check.

Ping Test
2020-05-06T20:00:48.402Z info vsand[263976] [opID=03703ece VsanHealthPing::PingTest] Ready to send ping with id= 28568, seq= 1
2020-05-06T20:00:48.407Z info vsand[263976] [opID=03703ece VsanHealthPing::_parseRecvPacket] Pinger: all host response come back, ping done Seq:1, size:64
2020-05-06T20:00:48.411Z info vsand[263976] [opID=03703ece VsanHealthPing::Ping] Run ping test for the hosts ['172.20.12.52', '172.20.12.51'] from local 172.20.12.53
2020-05-06T20:00:48.411Z info vsand[263976] [opID=03703ece VsanHealthPing::PingTest] Ready to send ping with id= 28568, seq= 2
2020-05-06T20:00:48.415Z info vsand[263976] [opID=03703ece VsanHealthPing::_parseRecvPacket] Pinger: all host response come back, ping done Seq:2, size:64
2020-05-06T20:00:48.417Z info vsand[263976] [opID=03703ece VsanHealthPing::Ping] Run ping test for the hosts ['172.20.12.52', '172.20.12.51'] from local 172.20.12.53
2020-05-06T20:00:48.417Z info vsand[263976] [opID=03703ece VsanHealthPing::PingTest] Ready to send ping with id= 28568, seq= 3
2020-05-06T20:00:48.422Z info vsand[263976] [opID=03703ece VsanHealthPing::_parseRecvPacket] Pinger: all host response come back, ping done Seq:3, size:64
2020-05-06T20:00:48.426Z info vsand[263976] [opID=03703ece VsanHealthPing::Ping] Pinger: ping target number: 2
2020-05-06T20:00:48.426Z info vsand[263976] [opID=03703ece VsanHealthPing::Ping] Run ping test for the hosts ['172.20.12.52', '172.20.12.51'] from local 172.20.12.53
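The ping test entries can be checked in a similar way. The following Python sketch pairs each "Ready to send ping" entry with the "ping done" entry that has the same sequence number, reports the time between the two log timestamps, and lists any sequence numbers that never completed. It assumes the line layout shown above and tolerates optional spaces after id=, seq=, Seq:, and size:. The elapsed time is only the spacing between millisecond log timestamps, not a measured network round-trip time. The function name ping_rounds is hypothetical.

```python
import re
from datetime import datetime

# "<timestamp>Z <level> vsand[<pid>] [<opID and function>] <message>"
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z \w+ vsand\[\d+\] "
    r"\[[^\]]*\] (?P<msg>.*)$"
)
SEND = re.compile(r"Ready to send ping with id=\s*(\d+), seq=\s*(\d+)")
DONE = re.compile(r"ping done Seq:\s*(\d+), size:\s*(\d+)")


def ping_rounds(lines):
    """Return (rounds, unanswered).

    rounds maps a sequence number to (payload size, milliseconds between the
    send and done log entries); unanswered lists sequence numbers with no
    matching "ping done" entry.
    """
    sent = {}
    rounds = {}
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        send = SEND.search(m["msg"])
        done = DONE.search(m["msg"])
        if send:
            sent[int(send.group(2))] = ts
        elif done and int(done.group(1)) in sent:
            seq = int(done.group(1))
            elapsed_ms = (ts - sent.pop(seq)).total_seconds() * 1000
            rounds[seq] = (int(done.group(2)), round(elapsed_ms, 3))
    return rounds, sorted(sent)
```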

vSAN Metadata Node Information


2020-05-06T18:34:48.106Z info vsand[263976] [opID=W2778-W2780-3d69 VsanConfigUtil::isFeatureEnabledOnHost] feature VSAN/VsanMetadataNode current enabled state: 1
2020-05-06T18:34:50.604Z info vsand[263910] [opID=W2778-W2779-3d72 VsanCapabilitySystemImpl::_GetConfigOptionValue] _GetConfigOptionValue value 1 for VsanMetadataNode

vSAN Storage Device Information


2020-05-06T19:46:38.000Z info vsand[263981] [opID=Thread-2 VsanSystemImpl::_ConfigInfoPrintLog] deviceName=/vmfs/devices/disks/mpx.vmhba0:C0:T2:L0, deviceType=disk, key=key-vim.host.ScsiDisk-0000000000766d686261303a323a30, uuid=0000000000766d686261303a323a30, canonicalName=mpx.vmhba0:C0:T2:L0, displayName=Local VMware Disk (mpx.vmhba0:C0:T2:L0), lunType=disk, vendor=VMware, model=Virtual disk, revision=2.0, scsiLevel=6, serialNumber=unavailable, queueDepth=1024, vStorageSupport=vStorageUnsupported, protocolEndpoint=False, perenniallyReserved=False, clusteredVmdkSupported=False, devicePath=/vmfs/devices/disks/mpx.vmhba0:C0:T2:L0, ssd=True, localDisk=True, emulatedDIXDIFEnabled=False, scsiDiskType=native512
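The storage device entry is a flat list of key=value pairs, which makes it straightforward to check fields such as ssd, localDisk, and scsiDiskType for a disk. The following Python sketch assumes the prefix layout shown above (timestamp, level, vsand[pid], bracketed operation ID and function) and that no value contains the sequence ", <key>="; values are returned exactly as the logged strings. The names parse_device_info and is_local_flash are hypothetical.

```python
import re

# Prefix: "<timestamp>Z <level> vsand[<pid>] [<opID and function>] ", then key=value pairs.
PREFIX = re.compile(r"^\S+ \w+ vsand\[\d+\] \[[^\]]*\] (?P<payload>.*)$")
# A value runs until the next ", <key>=" or the end of the line.
PAIR = re.compile(r"(\w+)=(.*?)(?=, \w+=|$)")


def parse_device_info(line):
    """Return the key=value fields of a vsanmgmt.log storage device entry as a dict."""
    m = PREFIX.match(line.rstrip())
    if not m:
        return {}
    return dict(PAIR.findall(m["payload"]))


def is_local_flash(info):
    """True if the entry reports a local SSD (values are compared as logged strings)."""
    return info.get("ssd") == "True" and info.get("localDisk") == "True"
```

Splitting only on ", <key>=" rather than on every comma keeps values such as "Local VMware Disk (mpx.vmhba0:C0:T2:L0)" and "Virtual disk" intact.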

14-120 Examining vmware-vsan-health-service.log
The vmware-vsan-health-service.log file on vCenter Server records information that is useful for troubleshooting vSAN issues:

• vSAN health service results

• INFO and WARNING messages


2020-05-12T23:51:09.525Z INFO vsan-mgmt[58427] [VsanPyVmomiProfiler::logProfile opID=noOpId] invoke-method: ha-vsan-health-system GetHclInfo: 0.65s: sb-esxi-03.vclass.local
2020- 5- l2 T2°:3 : 5'i : o g . z INFO vsan-mgmt 4228] [VsanPyVmomiProfiler::logProfile opID=noOpId] invoke-method: ha-vsan-health-system: QueryVsanConfigsByFilter: 0.03s: sb-esxi-02.vclass.local
2020-05-12T23:52:07.462Z INFO vsan-mgmt[57607] [VsanPyVmomiProfiler::logProfile opID=noOpId] invoke-method: vsan-performance-manager QueryNodeInformation: 0.06s: sb-esxi-02.vclass.local
2020-05-12T23:52:07.462Z WARNING vsan-mgmt[57607] [VsanVcPerformanceManagerImpl::PerHostThreadMain opID=noOpId] sb-esxi-02.vclass.local: Node info: (vim.cluster.VsanPerfNodeInformation) [
   (vim.cluster.VsanPerfNodeInformation) {
      version = '7.0.0',
      hostname = <unset>,
      error = <unset>,
      isCmmdsMaster = false,
      isStatsMaster = false,
      vsanMasterUuid = '5e234bf7-7b37-1d9d-e55a-0050560154ed',
      vsanNodeUuid = '5e234b0d-873d-164f-5efb-0050560154f2',
      masterInfo = <unset>,
      diagnosticMode = false

The vmware-vsan-health-service.log file on vCenter Server, in the /var/log/vmware/vsan-health directory, is useful for troubleshooting File Service enablement and ESX Agent deployment issues.
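The VsanPyVmomiProfiler entries record how long each API call took and which host it targeted, so they can be filtered for slow calls. The following Python sketch assumes the layout shown in the excerpt above and accepts the object and method separated either by a space or by a colon, because both forms appear in the excerpt. Lines that do not match this layout, such as the WARNING entry, are skipped. The function name slow_calls and its threshold_s default of 0.5 seconds are hypothetical choices, not VMware defaults.

```python
import re

# "<timestamp> <LEVEL> vsan-mgmt[<pid>] [VsanPyVmomiProfiler::logProfile opID=...]
#  invoke-method: <object>[:] <method>: <seconds>s: <host>"
PROFILE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) vsan-mgmt\[\d+\] "
    r"\[VsanPyVmomiProfiler::logProfile [^\]]*\] invoke-method:\s*"
    r"(?P<obj>\S+?):?\s+(?P<method>\w+):\s*(?P<secs>\d+(?:\.\d+)?)s:\s*(?P<host>\S+)"
)


def slow_calls(lines, threshold_s=0.5):
    """Return (seconds, object, method, host) tuples for calls taking at least
    threshold_s seconds, slowest first."""
    hits = []
    for line in lines:
        m = PROFILE.match(line)
        if m and float(m["secs"]) >= threshold_s:
            hits.append((float(m["secs"]), m["obj"], m["method"], m["host"]))
    return sorted(hits, reverse=True)
```

Grouping the results by host can show whether slow health or performance queries are concentrated on one ESXi host.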

14-121 Lab 17: Reviewing the Troubleshooting Lab Environment
Review information to become familiar with the troubleshooting lab environment:

1. Access Your Lab Environment

2. Determine the Normal Cluster State

3. Use the vSphere Client to Examine the Lab Environment

4. Use vSphere ESXi Shell to Construct ESXCLI Commands

5. Use vSphere ESXi Shell to Examine the Lab Environment

14-122 Lab 18: Troubleshooting the Maintenance Mode Issue
Diagnose and fix the vSAN cluster problem:

1. Troubleshoot the Problem

2. Review Your Findings and Fix the Problem

3. Clean Up for the Next Lab

14-123 Lab 19: Troubleshooting the vSAN Datastore Capacity Increasing Issue
Diagnose and fix the vSAN cluster problem:

1. Troubleshoot the Problem

2. Review Your Findings and Fix the Problem

14-124 Lab 20: Troubleshooting the Two-Node vSAN Cluster Configuration Issue
Diagnose and fix the vSAN cluster problem:

1. Run the Break Script

2. Troubleshoot the Problem

3. Review Your Findings and Fix the Problem

4. Clean Up for the Next Lab

14-125 Lab 21: Troubleshooting the vSAN Cluster Issue
Diagnose and fix the vSAN cluster problem:

1. Run the Break Script

2. Troubleshoot the Problem

3. Review Your Findings and Fix the Problem

4. Clean Up for the Next Lab

14-126 Lab 22: Troubleshooting the vSAN Node Configuration Issue
Diagnose and fix the vSAN cluster problem:

1. Run the Break Script

2. Troubleshoot the Problem

3. Review Your Findings and Fix the Problem

4. Clean Up for the Next Lab

14-127 Lab 23: Troubleshooting the vSAN Cluster Configuration Issue (1)
Diagnose and fix the vSAN cluster problem:

1. Run the Break Script

2. Troubleshoot the Problem

3. Review Your Findings and Fix the Problem

4. Clean Up for the Next Lab

14-128 Lab 24: Troubleshooting the vSAN Cluster Configuration Issue (2)
Diagnose and fix the vSAN cluster problem:

1. Run the Break Script

2. Troubleshoot the Problem

3. Review Your Findings and Fix the Problem

4. Clean Up for the Next Lab

14-129 Lab 25: Troubleshooting the vSAN Cluster Configuration Issue (3)
Diagnose and fix the vSAN cluster problem:

1. Run the Break Script

2. Troubleshoot the Problem

3. Review Your Findings and Fix the Problem

4. Clean Up for the Next Lab

14-130 Lab 26: Troubleshooting the vSAN Cluster Configuration Issue (4)
Diagnose and fix the vSAN cluster problem:

1. Run the Break Script

2. Troubleshoot the Problem

3. Review Your Findings and Fix the Problem

4. Clean Up for the Next Lab

14-131 Lab 27: Troubleshooting the vSAN Cluster Datastore Capacity Reporting Issue
Diagnose and fix the vSAN cluster problem:

1. Run the Break Script

2. Troubleshoot the Problem

3. Review Your Findings and Fix the Problem

4. Clean Up for the Next Lab

14-132 Review of Learner Objectives
• Explain which log files are useful for vSAN troubleshooting

• Use log files to help troubleshoot vSAN problems

14-133 Key Points


• The vSphere Client is a useful tool for troubleshooting vSAN problems.

• With ESXCLI commands and the vsantop utility, administrators can investigate problems with specific hosts from the command line.

• You can use the RVC to view information about your vSAN environment that is not available through ESXCLI commands.

• The esxcli vsan health cluster get and esxcli vsan health cluster list commands give valuable access to the health check feature even when vCenter Server is unavailable.

• Including log files as a troubleshooting tool gives administrators an in-depth view of vSAN operations.

Questions?
