VMware vSphere: Troubleshooting

Lab Manual
ESXi 7 and vCenter Server 7

VMware® Education Services


VMware, Inc.
www.vmware.com/education
Part Number EDU-EN-VSTS7-LAB (04-JUN-2021)

Copyright © 2021 VMware, Inc. All rights reserved. This manual and its accompanying
materials are protected by U.S. and international copyright and intellectual property laws.
VMware products are covered by one or more patents listed at
http://www.vmware.com/go/patents. VMware is a registered trademark or trademark of
VMware, Inc. in the United States and/or other jurisdictions. All other marks and names
mentioned herein may be trademarks of their respective companies. VMware vSphere®
vMotion®, VMware vSphere® High Availability, VMware vSphere® ESXi™ Shell, VMware
vSphere® Client™, VMware vSphere®, VMware vSAN™, VMware vRealize® Log Insight™ for
vCenter™, VMware vRealize® Log Insight™, VMware vRealize®, VMware vCloud Director®,
VMware vCloud Director® for Service Providers, VMware vCloud®, VMware vCenter®
Server Appliance™, VMware vCenter Server®, VMware View®, VMware Horizon® View™,
VMware Verify™, VMware Horizon® 7, VMware Horizon® 7 on VMware
Cloud™ on AWS, VMware vSphere® Storage I/O Control, VMware PowerCLI™, Project
Photon OS™, VMware Photon™, VMware NSX®, VMware vCenter® Log Insight™, VMware
Go™, VMware ESXi™ and VMware ESX® are registered trademarks or trademarks of
VMware, Inc. in the United States and/or other jurisdictions.

The training material is provided “as is,” and all express or implied conditions, representations,
and warranties, including any implied warranty of merchantability, fitness for a particular
purpose or noninfringement, are disclaimed, even if VMware, Inc., has been advised of the
possibility of such claims. This material is designed to be used for reference purposes in
conjunction with a training course.

The training material is not a standalone training tool. Use of the training material for self-
study without class attendance is not recommended. These materials and the computer
programs to which it relates are the property of, and embody trade secrets and confidential
information proprietary to, VMware, Inc., and may not be reproduced, copied, disclosed,
transferred, adapted or modified without the express written approval of VMware, Inc.

Typographical Conventions

The following typographical conventions are used in this course.

Conventions       Usage and Examples

Monospace         Identifies command names, command options, parameters, code
                  fragments, error messages, filenames, folder names, directory
                  names, and path names:
                  • Run the esxtop command.
                  • ... found in the /var/log/messages file.

Monospace Bold    Identifies user inputs:
                  • Enter ipconfig /release.

Boldface          Identifies user interface controls:
                  • Click the Configuration tab.

Italic            Identifies book titles:
                  • vSphere Virtual Machine Administration

<>                Indicates placeholder variables:
                  • <ESXi_host_name>
                  • ... the Settings/<Your_Name>.txt file

Contents

Lab 1 Using the Command Line
Task 1: Access Your Student Desktop System
Task 2: Validate the vSphere Licenses
Task 3: Directly Access the DCUI of the ESXi Host
Task 4: Remotely Access the DCUI of the ESXi Host
Task 5: Use ESXCLI Commands to View Host Hardware Configuration
Task 6: Use ESXCLI Commands to View Storage Information
Task 7: Use ESXCLI Commands to View Virtual Switch Information

Lab 2 Using vim-cmd Commands
Task 1: Get VM Information
Task 2: Manage the ESXi Hosts
Task 3: Register a VM
Task 4: Power On a VM
Task 5: Unregister a VM

Lab 3 Using Standalone ESXCLI and DCLI
Task 1: Log In to Standalone ESXCLI
Task 2: Load the Digital Security Certificate from the vCenter Server System
Task 3: Test the Digital Security Certificate from the vCenter Server System
Task 4: (Optional) Add Credentials and Thumbprint for ESXCLI Commands
Task 5: Use the DCLI to Manage vCenter Server

Lab 4 ESXi Command History
Task 1: View ESXi Command History

Lab 5 Monitoring NIC Teaming During Failover
Task 1: Verify the Distributed Switch Configuration
Task 2: Verify Network Operation on the ESXi Host
Task 3: Monitor the ESXi Host When the Active Link Goes Down
Task 4: Monitor the ESXi Host When the Standby Link Goes Down
Task 5: Reconfigure the Port Group pg-SA-Production-01

Lab 6 Monitoring and Recovering Distributed Switches
Task 1: Display Distributed Switch Information
Task 2: Disable the Network Rollback Option
Task 3: Recover from a Distributed Switch Failure
Task 4: Enable the Network Rollback Option
Task 5: Migrate Management Network

Lab 7 Applying the Troubleshooting Methodology
Task 1: Run a Break Script
Task 2: Narrow the Scope of the Problem to a VM
Task 3: Narrow the Scope of the Problem to the ESXi Host
Task 4: Resolve the Problem
Task 5: Verify the Solution

Lab 8 Troubleshooting Network Problems
Task 1: Run a Break Script
Task 2: Verify That the System Is Not Functioning Properly
Task 3: Troubleshoot and Resolve the Problem
Task 4: Verify the Solution

Lab 9 Investigating Disk Issues on ESXi
Task 1: Run a Break Script
Task 2: Create a Virtual Machine
Task 3: Troubleshoot the Problem
Task 4: Resolve the Problem
Task 5: Verify the Solution

Lab 10 Troubleshooting Storage Performance Issues
Task 1: Generate VM Disk Activity
Task 2: Start esxtop Utility and Review Disk Statistics
Task 3: Monitor Performance by Storage Adapter
Task 4: Monitor Performance by Storage Device
Task 5: Monitor Storage Performance by VM

Lab 11 Troubleshooting VM Power-On Problems
Task 1: Create and Power On the VM
Task 2: Troubleshoot Problems or Errors
Task 3: Resolve the Problem
Task 4: Verify the Solution

Lab 12 Troubleshooting VM Snapshot Problems
Task 1: Power On the VM
Task 2: Troubleshoot Problems or Errors
Task 3: Resolve the Problem
Task 4: Verify the Solution

Lab 13 Working with VM Snapshots Using the Command Line
Task 1: Power On a VM
Task 2: Create Snapshots and Monitor Their Creation
Task 3: Monitor Snapshot Deletion

Lab 14 Troubleshooting Storage Problems
Task 1: Run a Break Script
Task 2: Verify That the System Is Not Functioning Properly
Task 3: Troubleshoot and Resolve the Problem
Task 4: Verify the Solution

Lab 15 Troubleshooting Cluster Problems
Task 1: Create a Cluster and Power Off VMs
Task 2: Run the Break Script Break-8-1.ps1
Task 3: Run a Break Script
Task 4: Verify That the System Is Not Functioning Properly
Task 5: Troubleshoot and Resolve the Problem
Task 6: Verify the Solution

Lab 16 Resolving VM Power-On Problems
Task 1: Run a Break Script
Task 2: Troubleshoot the Problem
Task 3: Resolve the Problem
Task 4: Verify the Solution

Lab 17 Troubleshooting VM Problems
Task 1: Run a Break Script
Task 2: Verify That the System Is Not Functioning Properly
Task 3: Troubleshoot and Resolve the Problem
Task 4: Verify the Solution

Lab 18 Restarting ESXi Management Agents
Task 1: Restart Management Agents Using the DCUI
Task 2: Restart Management Agents from the Command Line

Lab 19 Troubleshooting ESXi Host Disconnection Problems
Task 1: Run a Break Script
Task 2: Troubleshoot the Problem
Task 3: Resolve the Problem
Task 4: Verify the Solution

Lab 20 Troubleshooting vCenter Server Connection Problems
Task 1: Run a Break Script
Task 2: Troubleshoot the Problem
Task 3: Resolve the Problem
Task 4: Verify the Solution

Lab 21 Troubleshooting vCenter Server and ESXi Host Problems
Task 1: Run a Break Script
Task 2: Verify That the System Is Not Functioning Properly
Task 3: Troubleshoot and Resolve the Problem
Task 4: Verify the Solution

Lab 22 Appendix: Troubleshooting Network Communication Failures
Task 1: Verify the IP Configuration
Task 2: Verify the VLAN Configuration
Task 3: Verify the Speed, Duplex, or MTU Configuration
Task 4: Verify the Uplink Configuration
Task 5: Verify the Teaming Configuration
Task 6: Verify the Network Link Status
Task 7: Investigate a Host Failure
Task 8: Investigate a Network Failure
Task 9: Investigate a Communications or Port Failure

Lab 23 Appendix: Troubleshooting Storage Failures
Task 1: Follow Storage Troubleshooting Procedures
Task 2: Investigate a VM Disk Failure
Task 3: Investigate an I/O Overload Problem
Task 4: Investigate an iSCSI Storage Failure
Task 5: Investigate an NFS Storage Failure
Task 6: Investigate a Fibre Channel Storage Connectivity Failure
Task 7: Investigate a FCoE Failure
Task 8: Troubleshoot a Path Failure
Task 9: Troubleshoot a Local Disk Failure
Task 10: Troubleshoot a Storage Array Failure
Task 11: Troubleshoot a Storage Site Disaster

Lab 24 Appendix: Troubleshooting Cluster Failures
Task 1: Troubleshoot a vSphere vMotion Migration Failure
Task 2: Investigate a Management Agent Problem
Task 3: Reset Migrate Enabled and Verify the Result
Task 4: Investigate an HA Configuration Problem
Task 5: Investigate an HA Resources Problem
Task 6: Investigate Why DRS Never Migrates
Task 7: Investigate Why DRS Rarely Migrates
Task 8: Investigate DRS Erratic Behavior

Lab 25 Appendix: Troubleshooting Virtual Machine Failures
Task 1: Investigate a CID Problem
Task 2: Investigate a Quiesced VM Problem
Task 3: Investigate a General Snapshot Failure
Task 4: Investigate a Power-On Failure
Task 5: Investigate a VM That Shows an Invalid or Orphaned State
Task 6: Investigate a VMware Tools Installation Failure

Lab 26 Appendix: Troubleshooting ESXi Host and vCenter Server System Failures
Task 1: Investigate a Certificate Problem
Task 2: Replace Self-Signed Certificate with CA-Generated Certificate
Task 3: Restart the vCenter Server Service
Task 4: Investigate a vCenter Server Database Free Space Problem
Task 5: Investigate a vCenter Server PostgreSQL Problem
Task 6: Investigate a Purple Diagnostic Screen
Task 7: Investigate Why an ESXi Host Is Unresponsive

Answer Key

Lab 1 Using the Command Line

Objective and Tasks


Use the command line to review the ESXi host configuration:

1. Access Your Student Desktop System

2. Validate the vSphere Licenses

3. Directly Access the DCUI of the ESXi Host

4. Remotely Access the DCUI of the ESXi Host

5. Use ESXCLI Commands to View Host Hardware Configuration

6. Use ESXCLI Commands to View Storage Information

7. Use ESXCLI Commands to View Virtual Switch Information

Task 1: Access Your Student Desktop System


You access and log in to your student desktop system.

Use the following information from the class configuration handout:

• Student desktop system name or IP address

• Student desktop system user name

• Student desktop system password

1. Verify that you are logged in to the student desktop.

NOTE

If not, log in to your student desktop by entering vclass\administrator as the user name and
VMware1! as the password.

Task 2: Validate the vSphere Licenses
You log in to the vCenter Server system and determine whether the vSphere licenses are valid. If
the licenses are expired, you add valid licenses to the vCenter Server system and ESXi hosts.

1. Open the Firefox web browser.

2. Select the vSphere Client (SA-VCSA-01) bookmark in the vSphere Site-A folder to connect
to vCenter Server Appliance at https://sa-vcsa-01.vclass.local/ui.

3. On the VMware vSphere Login page, enter the vCenter Server user name
[email protected] and password VMware1! and click Login.
4. Select Menu > Administration.

5. In the navigation pane, click Licenses.

6. Click Assets.

7. Verify that the required assets are licensed.

The following assets should be licensed:

• sa-vcsa-01.vclass.local under VCENTER SERVER SYSTEMS

• sa-esxi-01.vclass.local under HOSTS

• sa-esxi-02.vclass.local under HOSTS

• sa-esxi-03.vclass.local under HOSTS

An asset is licensed if the license expiration date is in the future.

8. If the licenses are not expired, go to task 3.

9. If any license has expired, obtain new licenses from this link.

Task 3: Directly Access the DCUI of the ESXi Host
You directly access the ESXi host’s direct console user interface (DCUI).

Accessing the DCUI directly is useful when troubleshooting vSphere problems.

The VM console provides access to the DCUI of the ESXi host.

1. Click on the CONSOLES tab to open a list of available consoles.

2. In the list of VMs, find the VM named SA-ESXi-01.


a. Click SA-ESXi-01 to switch to the console for SA-ESXi-01.

3. Click in the console window, press F2, and log in to the host by entering root as the ESXi
host user name and VMware1! as the password.

4. Use the up and down arrow keys to view the menu selections.

5. Navigate to the Troubleshooting Options menu and press Enter.

6. If vSphere ESXi Shell is disabled, select Enable ESXi Shell and press Enter to activate it.

7. If SSH is disabled, select Enable SSH and press Enter to activate it.

8. Press Esc until you are logged out of the DCUI.

9. Press Ctrl+Alt to release the insertion point from the ESXi console window.

10. Repeat steps 1 through 9 for sa-esxi-02.vclass.local and sa-esxi-03.vclass.local.

11. Return to the student desktop.

a. Click on the CONSOLES tab to open a list of available consoles.


b. Click STUDENT-A-01 to switch to the console for the student desktop.

Task 4: Remotely Access the DCUI of the ESXi Host
You access the ESXi host’s DCUI from an MTPuTTY session.

Accessing the DCUI remotely is useful when troubleshooting vSphere problems.

1. Minimize the Firefox browser on your desktop system.

2. On the desktop, double-click the MTPuTTY icon.

3. Double-click the entry for the SA-ESXI-01 host.

4. If a security warning displays, click Yes.

The session automatically connects as root.

a. If the connection does not automatically complete, log in manually by entering the ESXi
host user name root and password VMware1!

5. At the command prompt, enter dcui.

6. Press F2 to display the login screen and log in by entering the ESXi host user name root
and password VMware1!

7. View the default gateway of the host.

a. Using the down arrow key, select Configure Management Network and press Enter.

b. Select IPv4 Configuration and view the IP configuration in the right pane.

c. Press Esc to return to the main menu.

8. Use the up and down arrow keys to view the other menu selections.

You must not change any settings.

9. Press Esc until you are logged out of the DCUI.

10. Press Ctrl+C to exit the DCUI process.

Task 5: Use ESXCLI Commands to View Host Hardware Configuration
You use the CLI to view the hardware configuration of the vSphere environment.

1. If the SSH session closed, double-click the entry for the SA-ESXI-01 host in the MTPuTTY
utility.

2. View the hardware configuration by using the command prompt.

a. Enter esxcli hardware clock get to view the time and date on the host.

b. Enter esxcli hardware cpu list | less to view the number of CPUs on
the host.

Press the space bar to scroll through the output. When done, press q to exit
the less utility.

c. Enter esxcli hardware memory get to view the host memory.

d. Enter esxcli hardware pci list and find VMkernel Name: vmnic7 to
identify the PCI address that it is listed under.
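
The vmnic7 lookup in step 2d can also be scripted instead of scanned by eye. The following is a minimal sketch, not part of the lab. It assumes that each device record in the esxcli hardware pci list output prints its Address: field before its VMkernel Name: field. The helper name pci_address_for_nic is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the lab): reads `esxcli hardware pci list`
# output on stdin and prints the PCI address of the NIC named in $1.
# Assumption: each device record prints "Address:" before "VMkernel Name:".
pci_address_for_nic() {
    awk -v nic="$1" '
        # Remember the address of the record currently being read.
        $1 == "Address:" { addr = $2 }
        # Print that address when the record names the requested NIC.
        $1 == "VMkernel" && $2 == "Name:" && $3 == nic { print addr }
    '
}

# Run the lookup only where esxcli exists (for example, in the ESXi Shell).
if command -v esxcli >/dev/null 2>&1; then
    esxcli hardware pci list | pci_address_for_nic vmnic7
else
    echo "esxcli not found; run this on the ESXi host" >&2
fi
```

If the output layout on your host differs from the assumption above, fall back to reading the esxcli hardware pci list output directly, as the step describes.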

Task 6: Use ESXCLI Commands to View Storage Information


You use the CLI to view the storage configuration of the vSphere environment.

1. View the storage configuration by using the command prompt.

a. Enter esxcli storage vmfs extent list to view the number of VMFS
extents that are available to the host.

b. Enter esxcli storage core adapter list to view the SCSI host bus
adapters.

c. Enter esxcli storage core path stats get to view the SCSI path
statistics.

d. Enter esxcli storage filesystem list to view the boot partitions and the
datastores that are available to each host.

e. Enter esxcli storage nfs list to view the information about the NFS 3
datastores that are available on this host.

Task 7: Use ESXCLI Commands to View Virtual Switch Information
You use the CLI to view the virtual switch configuration of the vSphere environment.

1. View the virtual switch configuration by using the command prompt.

a. Enter esxcli network ip dns server list to view the IP address of the
DNS server.

b. Enter esxcli network nic list to view the physical NICs.

c. Enter esxcli network vswitch standard list to view that two standard
switches are available to the host.

d. Enter esxcli network vswitch dvs vmware list | more to view the
available distributed switches.

e. Enter esxcli network vswitch standard portgroup list to view
the standard switch port groups.

f. Enter esxcli network ip interface list | less to view the VMkernel
interfaces on the host.

g. Enter esxcli network ip interface ipv4 get to view the IP address
and subnet mask of the VMkernel interfaces on the host.

h. Enter esxcli network ip route ipv4 list to view the default gateway
address for the VMkernel interfaces on the host.

2. Close the SA-ESXi-01 tab to end the SSH session.

Lab 2 Using vim-cmd Commands

Objective and Tasks


Use vim-cmd commands to manage ESXi hosts and VMs:

1. Get VM Information

2. Manage the ESXi Hosts

3. Register a VM

4. Power On a VM

5. Unregister a VM

NOTE

For useful information related to this lab, see "VMware ESXi vim cmd Command: A Quick
Tutorial" at https://communities.vmware.com/docs/DOC-31025. Before starting the lab,
review this reference and then use the information, as needed, while performing the lab tasks.

Task 1: Get VM Information
You use vim-cmd commands to list information about the VMs that run on the sa-esxi-
03.vclass.local host. You also use vim-cmd commands to change the power state of a VM.

1. Use MTPuTTY to establish an SSH session with sa-esxi-03.vclass.local.

2. List the commands available under the vmsvc namespace.


vim-cmd vmsvc
3. List and review information about the VMs that are registered on the ESXi host.

a. List information about the VMs that are registered on the ESXi host.

vim-cmd vmsvc/getallvms
b. Record the VMID for the Win-6 VM. __________

4. Get the configuration of the VM running on the ESXi host.

vim-cmd vmsvc/get.guest <VMID of Win-6>


Information about VM disk capacity does not appear in the command output because Win-6
is powered off. The get.guest command only provides disk capacity information if the
VM is powered on and has VMware Tools installed.

5. List the power-related commands under the vmsvc namespace.

vim-cmd vmsvc/power
6. View the power state of Win-6.

vim-cmd vmsvc/power.getstate <VMID of Win-6>


The command output should state that Win-6 is powered off.

7. Power on Win-6.

vim-cmd vmsvc/power.on <VMID of Win-6>


The command should return the ESXi command prompt and boot the VM.

8. View the power state of Win-6.

vim-cmd vmsvc/power.getstate <VMID of Win-6>


9. Wait for the OS and VMware Tools services to fully start and then get the configuration of
Win-6 running on the ESXi host.

vim-cmd vmsvc/get.guest <VMID of Win-6> | less


10. From the command output, determine the disk capacity for this VM.
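
Looking up a VMID by eye works for a handful of VMs. For longer lists, the lookup can be sketched in Python as shown here. This is a minimal sketch that assumes each data row of the vim-cmd vmsvc/getallvms output starts with the numeric VMID, followed by the VM name and then the datastore in square brackets. The sample text and the function name find_vmid are hypothetical.

```python
import re

# Matches "<vmid>  <name>  [<datastore>] ..." rows.
# The header row does not start with digits, so it is skipped automatically.
_ROW = re.compile(r"^(\d+)\s+(.+?)\s+\[")

# Hypothetical sample output; the datastore, guest OS, and version values are invented.
SAMPLE_GETALLVMS = """\
Vmid   Name     File                          Guest OS           Version   Annotation
1      Win-6    [Shared3] Win-6/Win-6.vmx     windows9_64Guest   vmx-17
2      Win-11   [Shared3] Win-11/Win-11.vmx   windows9_64Guest   vmx-17
"""


def find_vmid(getallvms_output, vm_name):
    """Return the VMID (int) of the VM with the exact given name, or None."""
    for line in getallvms_output.splitlines():
        match = _ROW.match(line)
        if match and match.group(2) == vm_name:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    print(find_vmid(SAMPLE_GETALLVMS, "Win-6"))
```

Matching the name exactly, rather than by substring, avoids confusing similarly named VMs.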

Task 2: Manage the ESXi Hosts
You use vim-cmd commands to place the sa-esxi-03.vclass.local host in maintenance mode,
take it out of maintenance mode, and view host configuration information.

1. Place sa-esxi-03 in maintenance mode.

vim-cmd hostsvc/maintenance_mode_enter
The operation times out because Win-6 is powered on, and the host does not belong to a
fully automated DRS cluster.

2. Use vim-cmd to shut down Win-6.

3. Place sa-esxi-03 in maintenance mode.

4. View the configuration of host sa-esxi-03.

vim-cmd hostsvc/hostsummary | less


5. In the command output, find information about the ESXi host's memory size, CPU
information, number of NICs, and number of HBAs.

6. Take sa-esxi-03 out of maintenance mode.

vim-cmd hostsvc/maintenance_mode_exit

Task 3: Register a VM
You use vim-cmd commands to register the Win-11 VM with the host.

The Win-11 files are on the Shared3 datastore.

1. Register Win-11 with the vCenter Server system.

vim-cmd solo/registervm /vmfs/volumes/Shared3/Win-11/Win-11.vmx

The command returns the VMID of the newly registered VM.

2. List all the VMs on sa-esxi-03.vclass.local.

vim-cmd vmsvc/getallvms
Win-11 should appear in the list.

3. Verify that Win-11 appears in the vSphere Client inventory.

a. In the Firefox bookmarks toolbar, click the vSphere Client (SA-VCSA-01) bookmark in
the vSphere Site-A folder.

b. On the login page, enter [email protected] as the user name and
VMware1! as the password.

c. Verify that Win-11 appears in the Hosts and Clusters inventory.

Task 4: Power On a VM
You use vim-cmd commands to power on the Win-11 VM.

1. Return to the MTPuTTY session for sa-esxi-03.

2. Use vim-cmd to get the VMID for Win-11.


3. View the power state of Win-11.

The command output should state that Win-11 is powered off.

4. Power on Win-11 using vim-cmd.

5. View the power state of Win-11 again and verify that this VM is powered on.

Task 5: Unregister a VM
You use vim-cmd commands to unregister the Win-11 VM from the host and the vCenter
Server system.

1. Use vim-cmd to power off Win-11.

The VM must be powered off before it can be unregistered.

2. Unregister Win-11.

vim-cmd vmsvc/unregister <VMID of Win-11>


3. Verify that Win-11 is unregistered.

vim-cmd vmsvc/getallvms
Win-11 should not appear in the list.

4. View Win-11 in the vSphere Client inventory.

Win-11 should be in an orphaned state. An orphaned VM is one that exists in the vCenter
Server database but is no longer present on the ESXi host.

a. If Win-11 is not in an orphaned state, refresh the vSphere Client to update the navigation
pane.

5. In the vSphere Client, remove Win-11 from the Hosts and Clusters inventory.

Lab 3 Using Standalone ESXCLI and
DCLI

Objective and Tasks


Use Standalone ESXCLI and DCLI to review the ESXi host configuration and the data center
configuration:

1. Log In to Standalone ESXCLI

2. Load the Digital Security Certificate from the vCenter Server System

3. Test the Digital Security Certificate from the vCenter Server System

4. (Optional) Add Credentials and Thumbprint for ESXCLI Commands

5. Use the DCLI to Manage vCenter Server

Task 1: Log In to Standalone ESXCLI


You start an MTPuTTY session to log in to the Ubuntu-CLI VM so that you can use Standalone
ESXCLI.

1. On your student desktop system, double-click the MTPuTTY icon.

2. In the Servers pane on the left, double-click Ubuntu-CLI.

3. If a PuTTY Security Alert dialog box appears, click Yes to accept and cache the server’s
host key.

You are automatically logged in as the root user.

Task 2: Load the Digital Security Certificate from the vCenter Server
System
You load the digital security certificate from the vCenter Server system into the Ubuntu VM for
use with ESXCLI commands.

With this digital security certificate, you can run commands on ESXi hosts without entering a
digital thumbprint for each ESXi host.

NOTE

All commands are case-sensitive.

1. To examine the CPU hardware on sa-esxi-01, enter the esxcli command from the
vSphere CLI VM.

esxcli -s sa-esxi-01.vclass.local hardware cpu list

2. Enter root for the user name.

This command fails. For security reasons, you are required to enter the thumbprint of the
target ESXi host. Instead of manually entering a long thumbprint, you will load the digital
certificate from the vCenter Server system.

3. Minimize the MTPuTTY utility but do not close it.

4. Return to the Firefox web browser, open a new tab, and go to
https://sa-vcsa-01.vclass.local.

5. Click Download trusted root CA certificates.

6. Select Save File and click OK.

7. Open Windows File Explorer and go to the Downloads folder (select This PC > Downloads)
on the student desktop.

8. Right-click download.zip and select Extract All.

9. Click Browse and navigate to C:\Materials\Downloads\Certs\vcsa-cert.

10. Click OK and click Extract.

11. Use Windows File Explorer to navigate to
C:\Materials\Downloads\Certs\vcsa-cert\certs\lin.

Two files are in the folder. Both files begin with an eight-character hexadecimal code, for
example, d819a6fb.0 and d819a6fb.r0. The d819a6fb.0 file is the certificate. The
d819a6fb.r0 file is a certificate revocation list (CRL) file.

12. Rename the d819a6fb.0 file to sa-vcsa-01.crt.

The file extension must be .crt using lowercase letters.

13. Click the WinSCP utility icon on the student desktop taskbar.

14. Select the Ubuntu-CLI site and click Login to open an SCP session to the Ubuntu-CLI VM.

15. If you see a security warning, click Yes to add the thumbprint to the cache.

16. In the left pane, navigate to
C:\Materials\Downloads\Certs\vcsa-cert\certs\lin.

17. In the right pane, navigate to the /usr/local/share/ca-certificates folder.

Different operating systems use different folders and procedures to load the digital
certificates of certificate authority (CA) servers. The procedure used in this lab is required for
Ubuntu Linux servers. If you host vSphere CLI software on a different OS, you must look up
the required procedure and file location for that OS.

18. Select the sa-vcsa-01.crt certificate file in the left pane and click Upload.

19. Click OK to upload the file.

20. Close the WinSCP window and return to MTPuTTY.

21. In the Ubuntu-CLI SSH session, enter the update-ca-certificates command.

The command output shows that a new certificate is added.

22. Leave your MTPuTTY session open.

Task 3: Test the Digital Security Certificate from the vCenter Server
System
You test the vCenter Server system's digital security certificate that you loaded into the Ubuntu
VM for use with ESXCLI commands.

1. Use the Ubuntu-CLI VM session in MTPuTTY and enter the command to change the
directory to where the certificate is stored.

cd /usr/local/share/ca-certificates/
You must either be in the same directory in which the certificate file is stored or use the full
path to the certificate file when you enter a command.

2. Enter the command to test your certificate.

esxcli --vihost sa-esxi-01.vclass.local --server sa-vcsa-01.vclass.local --cacertsfile sa-vcsa-01.crt hardware cpu list

The name of the server must be in FQDN form to match the name on the security certificate.

3. When prompted for a user name, enter [email protected].

4. When prompted for a password, enter VMware1!.

You should see a complete configuration description of all CPUs on sa-esxi-01.vclass.local.

Task 4: (Optional) Add Credentials and Thumbprint for ESXCLI
Commands
You add the user name, password, and digital thumbprint of the sa-esxi-01 host into the Ubuntu
VM credential store for use with ESXCLI commands.

1. Return to the Ubuntu-CLI VM session in MTPuTTY and enter this command.

cd /root/vmware-vsphere-cli-distrib/apps/general

2. Try to display a list of the CPU hardware by entering this command.

esxcli -s sa-esxi-01.vclass.local hardware cpu list


3. When prompted for a user name, enter root.

The command fails, but it shows the thumbprint of the ESXi host.

4. Add the user name and password for the sa-esxi-01.vclass.local ESXi host to the local
credentials store.

a. Add the root user.

./credstore_admin.pl add -s sa-esxi-01.vclass.local -u root

b. When prompted for the password, enter VMware1!

When adding credentials to the credential store, you always add the user name and
password before you add the thumbprint.

5. Add the thumbprint to the credentials store.

a. Add the thumbprint.

./credstore_admin.pl add -s sa-esxi-01.vclass.local -t <thumbprint>

b. Replace <thumbprint> with the thumbprint provided in the error message that you
received, for example,

1D:67:07:E9:58:FC:97:81:AC:17:8F:BF:0E:74:E9:8F:BD:61:27:D5

The thumbprint is case-sensitive and must match exactly.

6. Display a list of the CPU hardware.

esxcli -s sa-esxi-01.vclass.local hardware cpu list


This command is the same one that failed in an earlier step. Now the command should
successfully connect to the sa-esxi-01.vclass.local host and display the CPU hardware.

You can use the following commands to manage the credentials store:

• ./credstore_admin.pl help
• ./credstore_admin.pl list
• ./credstore_admin.pl add
• ./credstore_admin.pl remove
• ./credstore_admin.pl clear
To remove a bad thumbprint, run this command:

• ./credstore_admin.pl remove -s server-name -t <thumbprint>

To remove a bad user name and password, use this command:

• ./credstore_admin.pl remove -s server-name -u <user>
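
Because the thumbprint must match exactly, it can help to check its format before adding it to the credentials store. The following Python sketch validates the layout used in the example above (20 uppercase hexadecimal pairs separated by colons). It also shows how such a string could be derived from certificate bytes, assuming that the thumbprint is the SHA-1 digest of the DER-encoded certificate, which the 20-byte length suggests but which this lab does not state. The function names are hypothetical.

```python
import hashlib
import re

# 20 colon-separated pairs of uppercase hex digits, as in the lab's example thumbprint.
_THUMBPRINT = re.compile(r"[0-9A-F]{2}(:[0-9A-F]{2}){19}")


def is_well_formed(thumbprint):
    """Return True if the string has the 20-pair, uppercase, colon-separated layout."""
    return _THUMBPRINT.fullmatch(thumbprint) is not None


def sha1_thumbprint(der_bytes):
    """Format the SHA-1 digest of the given bytes as an uppercase colon-separated string.

    Assumption: the host thumbprint is the SHA-1 digest of the DER-encoded certificate.
    """
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


if __name__ == "__main__":
    example = "1D:67:07:E9:58:FC:97:81:AC:17:8F:BF:0E:74:E9:8F:BD:61:27:D5"
    print(is_well_formed(example))
```

A lowercase or truncated thumbprint fails the is_well_formed check, which mirrors the exact-match requirement described in this task.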

Task 5: Use the DCLI to Manage vCenter Server


You use the Data Center CLI from the Ubuntu-CLI VM to manage the vCenter Server system.

1. Return to the Ubuntu-CLI VM session in MTPuTTY and enter the command to start a
DCLI interactive session to vCenter Server.

dcli +interactive +server sa-vcsa-01.vclass.local +cacert-file /usr/local/share/ca-certificates/sa-vcsa-01.crt

2. At the dcli> prompt, enter the command to list the datastores visible to vCenter Server.

com vmware vcenter datastore list


3. When prompted, enter [email protected] as the user name.

4. When prompted, enter VMware1! as the password.

5. Enter y to save the credentials.

You can use the following commands to manage the credentials store:

• +credstore-list
• +credstore-add
• +credstore-remove

6. Enter exit to quit the DCLI.

Lab 4 ESXi Command History

Objective and Tasks


Determine commands run by each user in the ESXi Shell command history:

1. View Command History

NOTE

For information about vSphere ESXi Shell logins and commands, see VMware knowledge
base article 2004810 at https://kb.vmware.com/kb/2004810.

Task 1: View ESXi Command History


You view the command history on sa-esxi-03.vclass.local.

An administrator might run commands directly on an ESXi host that cause downtime or
disconnection. In the same session, you can use the up arrow key to find which commands were
previously run. However, if the session is closed or you log in as a different user, you must use a
different method to view the history of the commands that were previously run.
1. Use MTPuTTY to connect to sa-esxi-03.vclass.local.

2. Determine the most recent date and time that sa-esxi-03 was placed in maintenance mode
using the vim-cmd command.

a. Use /var/log/shell.log to determine the most recent date and time that sa-
esxi-03 was placed into maintenance mode using the vim-cmd command.

b. Record the user that ran the vim-cmd command. __________

c. Record the date and time that the command was run. __________

3. Use /var/log/auth.log to determine the date and time that the user logged in and
the IP address from which the user logged in.
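
If you copy shell.log to a workstation, the search in step 2 can be sketched in Python as follows. This sketch assumes that each log line has the form timestamp, shell[PID]:, [user]:, and then the command, which is an assumption about the log layout that you should confirm against the actual file. The sample lines, the user name exampleuser, and the function name last_maintenance_enter are invented for illustration.

```python
import re

# Assumed line layout: "<timestamp> shell[<pid>]: [<user>]: <command>"
_LINE = re.compile(
    r"^(?P<ts>\S+)\s+shell\[\d+\]:\s+\[(?P<user>[^\]]+)\]:\s+(?P<cmd>.*)$"
)

# Hypothetical sample lines; they are not taken from the lab host.
SAMPLE_SHELL_LOG = """\
2021-06-04T10:15:02Z shell[2101]: [exampleuser]: esxcli network nic list
2021-06-04T10:17:45Z shell[2101]: [exampleuser]: vim-cmd hostsvc/maintenance_mode_enter
2021-06-04T10:30:11Z shell[2101]: [exampleuser]: vim-cmd hostsvc/maintenance_mode_exit
"""


def last_maintenance_enter(log_text):
    """Return (timestamp, user) of the last vim-cmd maintenance_mode_enter entry, or None."""
    hit = None
    for line in log_text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue
        cmd = match.group("cmd")
        if "vim-cmd" in cmd and "maintenance_mode_enter" in cmd:
            hit = (match.group("ts"), match.group("user"))  # later lines overwrite earlier ones
    return hit


if __name__ == "__main__":
    print(last_maintenance_enter(SAMPLE_SHELL_LOG))
```

The returned timestamp can then be used to narrow the search in auth.log for the matching login.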

Lab 5 Monitoring NIC Teaming During
Failover

Objective and Tasks


Monitor NIC teaming behavior when one of the links in the team goes down:

1. Verify the Distributed Switch Configuration

2. Verify Network Operation on the ESXi Host

3. Monitor the ESXi Host When the Active Link Goes Down

4. Monitor the ESXi Host When the Standby Link Goes Down

5. Reconfigure the Port Group pg-SA-Production-01

NOTE

For useful information about the NIC teaming failover process, see the following references.
Review these references before you start the lab and use the information, as needed, while
performing the lab tasks.

Reference                                      Link

NIC teaming in ESXi and ESX                    https://kb.vmware.com/kb/1004088

Configuring NIC teaming, failover, and load    https://docs.vmware.com/en/VMware-vSphere/index.html
balancing on standard switches and             Search for configure NIC teaming.
distributed switches

Task 1: Verify the Distributed Switch Configuration
You verify that networking for the sa-esxi-01, sa-esxi-02, and sa-esxi-03 hosts is configured
correctly on the dvs-SA-Datacenter distributed switch.

1. Log in to the vSphere Client.

2. In the vSphere Client, reset all the triggered alarms to return them to a normal state.

3. Select Menu > Networking.

4. Expand the dvs-SA-Datacenter distributed switch and select pg-SA-Production-01.

5. Select ACTIONS > Edit Settings > Teaming and failover.

6. Move Uplink 5 to Unused uplinks and click OK.

7. Click OK on the warning pop-up window to confirm that no active uplinks exist.
8. Add Uplink 5 as a standby uplink on pg-SA-Production-02.

NOTE

Ensure that you add the uplink to pg-SA-Production-02, and not pg-SA-Production-01.

9. Verify that the pg-SA-Production-02 distributed port group consists of two uplinks: Uplink 6
(active uplink) and Uplink 5 (standby uplink).

10. Verify that vmnic4 is assigned to Uplink 5 and vmnic5 is assigned to Uplink 6.

11. Verify that the linux-a-07 VM is connected to pg-SA-Production-02.

Task 2: Verify Network Operation on the ESXi Host
You verify that networking on sa-esxi-02.vclass.local is functioning properly by pinging the
gateway from the linux-a-07 VM.

1. Power on the linux-a-07 VM and open a web console from the vSphere Client.

2. Log in to the VM.

a. Enter root for the user name.

b. Enter VMware1! for the password.

3. From the linux-a-07 VM, ping the gateway (172.20.11.10).

The ping should be successful.

Task 3: Monitor the ESXi Host When the Active Link Goes Down
You bring Uplink 6 (active link) down and monitor the behavior of the ESXi host sa-esxi-
02.vclass.local.

1. Start an MTPuTTY session with sa-esxi-02.vclass.local.

2. View the uplinks in use.

a. Run the esxtop command.

b. Enter n to view the uplinks in use.

Q1. Which uplink is used by linux-a-07 VM?


A1. vmnic5, the active uplink

3. Take down Uplink 6 (vmnic5) and monitor the behavior of sa-esxi-02.vclass.local.

a. Start a second SSH session with sa-esxi-02.vclass.local.

b. Enter the esxcli command to take down vmnic5.


esxcli network nic down -n vmnic5
4. Verify network connectivity to sa-esxi-02.

a. From linux-a-07, ping the gateway (172.20.11.10).

The ping should continue to be successful.

5. Return to the esxtop display and verify the uplink that the VM is using.

Q2. Which uplink is now used by the linux-a-07 VM?


A2. vmnic4, the standby uplink

6. In the vSphere Client, check for messages related to vmnic5 being down on sa-esxi-
02.vclass.local.

Q3. What messages did you find?


A3. On sa-esxi-02.vclass.local's Summary tab, the critical alarm Network uplink redundancy lost appears. On sa-esxi-02.vclass.local's Monitor tab, the Events pane shows the same alarm but with a little more information, informing you that Physical NIC vmnic5 is down.

7. View the log files on sa-esxi-02 for any entries related to vmnic5 being down.

Q4. What log entries did you find?


A4. Entries related to vmnic5 appear in the vmkernel.log, hostd.log, and vobd.log files. They report that the link on physical adapter vmnic5 was set down and include link down notifications for vmnic5.

8. Enter the command to bring vmnic5 back online.


esxcli network nic up -n vmnic5
9. View the log files on sa-esxi-02 for any entries related to vmnic5.

Q5. What log entries did you find?


A5. Entries related to vmnic5 appear in the vmkernel.log, hostd.log, and vobd.log files. They include link up notifications for vmnic5 and a message that NIC vmnic5 is up.

Task 4: Monitor the ESXi Host When the Standby Link Goes Down
You bring Uplink 5 (standby link) down and monitor the behavior of the ESXi host sa-esxi-
02.vclass.local.

1. Using the vSphere Client, view the configuration on pg-SA-Production-02 to verify that
Uplink 5 is a standby uplink and Uplink 6 is an active uplink.

2. Enter the command to take down Uplink 5 (vmnic4) and monitor the behavior of sa-esxi-
02.vclass.local.

esxcli network nic down -n vmnic4


3. Verify network connectivity to sa-esxi-02.vclass.local.

a. From linux-a-07, ping the gateway (172.20.11.10).

The ping should continue to be successful.

4. Return to the esxtop display and verify the uplinks that the VM is using.

Q1. Which uplink is now used by the VM?

A1. vmnic5, the active uplink

5. Enter the command to bring vmnic4 back online.

esxcli network nic up -n vmnic4
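
The behavior observed in Task 3 and Task 4 can be summarized with a small model. The following Python sketch is a deliberate simplification of an explicit failover order: traffic uses the first active uplink whose link is up and falls back to a standby uplink only if no active uplink is available. The function name select_uplink is hypothetical, and the sketch ignores options such as load balancing policy and failback.

```python
def select_uplink(active, standby, link_up):
    """Return the uplink that carries traffic, or None if every link is down.

    active, standby: lists of vmnic names in failover order.
    link_up: dict that maps a vmnic name to True (link up) or False (link down).
    """
    for nic in list(active) + list(standby):
        if link_up.get(nic, False):  # unknown NICs are treated as down
            return nic
    return None


if __name__ == "__main__":
    active, standby = ["vmnic5"], ["vmnic4"]
    print(select_uplink(active, standby, {"vmnic5": True, "vmnic4": True}))
    print(select_uplink(active, standby, {"vmnic5": False, "vmnic4": True}))
    print(select_uplink(active, standby, {"vmnic5": True, "vmnic4": False}))
```

With vmnic5 active and vmnic4 standby, the model selects vmnic4 only while vmnic5 is down, which matches the answers recorded in both tasks.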

Task 5: Reconfigure the Port Group pg-SA-Production-01
Using best practices, you configure pg-SA-Production-01 to ensure network reliability if an
outage occurs.

1. Return to the Networking view in the vSphere Client.

2. Edit the Settings on pg-SA-Production-01.

3. Set Uplink 5 as the active uplink and Uplink 6 as the standby uplink.

Lab 6 Monitoring and Recovering
Distributed Switches

Objective and Tasks


Use command-line tools to monitor distributed switches and recover from a distributed switch
failure:

1. Display Distributed Switch Information

2. Disable the Network Rollback Option

3. Recover from a Distributed Switch Failure

4. Enable the Network Rollback Option

5. Migrate the Management Network

NOTE

For information about monitoring distributed switches and recovering from a distributed
switch failure, see the following references. Review these references before you start the lab
and use the information, as needed, while performing the lab tasks.

Reference                                        Link

Locating the connection ID for an uplink/vmnic   https://kb.vmware.com/kb/2053259
in a vSphere Distributed Switch (2053259)

Adding an ESX host into a Distributed Virtual    https://kb.vmware.com/kb/1020736
Switch fails with the error: Unable to Create
Proxy DVS (1020736)

Configuring vSwitch or vNetwork Distributed      https://kb.vmware.com/kb/1008127
Switch from the command line in ESXi/ESX
(1008127)

Task 1: Display Distributed Switch Information
You run the net-dvs command to display information about the dvs-SA-Datacenter distributed
switch configuration.

The command retrieves this information from the /etc/vmware/dvsdata.db binary file.
This file is maintained by the ESXi host and is updated at 5-minute intervals.

1. Use MTPuTTY to log in to sa-esxi-02.vclass.local.

2. Display the output for the distributed switch configuration one page at a time.

net-dvs | less
3. Find information about the distributed switch.

a. Find the switch UUID.

The UUID is the long hexadecimal string that follows the word switch.

b. Identify how many uplinks are connected to the switch.

Scroll down and look for common.uplinkPorts.

c. Identify the ports that the uplinks are connected to.

Scroll down and look for host.uplinkPorts.

d. Identify the MTU for this switch.

Find the mtu string.

e. Verify that Cisco Discovery Protocol (CDP) is enabled for this switch.

CDP is enabled when CDP is set to listen, advertise, or advertise & listen.

Task 2: Disable the Network Rollback Option
In the vSphere Client, you disable the network rollback option. The network rollback feature
prevents the ESXi hosts from disconnecting from the management network.

By disabling this option, you force the ESXi host to disconnect from the management network.

1. Open a new Firefox tab.

2. In the Firefox bookmarks toolbar, select vSphere Client (SA-VCSA-01) from the vSphere
Site-A folder.

3. At the login window, enter [email protected] as the user name and
VMware1! as the password.

4. Select sa-vcsa-01.vclass.local in the navigation pane.

5. Click the Configure tab.

6. Under Settings, select Advanced Settings and click EDIT SETTINGS.

7. Click the filter icon next to the Name column to search for parameters with the word
rollback.

8. Change config.vpxd.network.rollback to false and click SAVE.

9. Remain logged in to the vSphere Client.

Task 3: Recover from a Distributed Switch Failure


You might encounter a situation where the distributed switch is misconfigured, causing you to
lose connectivity to your ESXi hosts.

You recover connectivity to your ESXi hosts by creating a standard switch from the command
line.

The DCUI provides an option to create a standard switch, but this option is disruptive and can
cause you to lose much of your distributed switch configuration. Instead, you can manually create
a standard switch from the command line. By manually creating a standard switch, you can
control the vmnics and VMkernel interfaces that get migrated to the new standard switch.

1. If necessary, log in to the vSphere Client.

2. In the pg-SA-Management network for sa-esxi-02.vclass.local, unassign the uplinks (Uplink 1
and Uplink 2) from the dvs-SA-Datacenter distributed switch.

Hint: Right-click dvs-SA-Datacenter in the inventory and select Add and Manage Hosts.

NOTE

Only unassign the physical adapters. Do not modify anything else.

3. Verify that you can no longer ping sa-esxi-02.

Hint: Use MTPuTTY to log in to sa-esxi-03.vclass.local and try to ping sa-esxi-02.

In the vSphere Client inventory, sa-esxi-02.vclass.local should eventually appear as Not
responding.

4. Log in to the DCUI for sa-esxi-02.vclass.local and open the pop-out console.

a. Click on the CONSOLES tab to open a list of available consoles.

b. In the list of VMs, find the VM named SA-ESXi-02.

c. Click on SA-ESXi-02 to switch to the console for SA-ESXi-02.

5. In the DCUI window, press ALT+F1 to go to the vSphere ESXi Shell.

For Mac users, press fn+option+F1.

6. At the vSphere ESXi Shell login window, log in by entering root as the user name and
VMware1! as the password.

7. Enter the command to verify the status of the current distributed switch configuration.

esxcli network vswitch dvs vmware list | less


The command output shows that the vmnic0 and vmnic1 uplinks are not present.

8. Create a standard switch and add the uplinks to it.

a. Create a standard switch called recoveryswitch.

esxcli network vswitch standard add --vswitch-name=recoveryswitch

b. Verify that recoveryswitch is created.

esxcli network vswitch standard list


c. Create a port group called recoveryportgroup on recoveryswitch.

esxcli network vswitch standard portgroup add -p=recoveryportgroup -v=recoveryswitch

d. Verify that recoveryportgroup is added to recoveryswitch.

esxcli network vswitch standard list

e. Add the vmnic0 and vmnic1 uplinks to recoveryswitch.

esxcli network vswitch standard uplink add -u=vmnic0 -v=recoveryswitch
esxcli network vswitch standard uplink add -u=vmnic1 -v=recoveryswitch

9. Configure the vmk0 interface on recoveryportgroup.

a. Remove vmk0 from the pg-SA-Management port group on dvs-SA-Datacenter.

esxcli network ip interface remove -i=vmk0


b. Recreate vmk0 on recoveryportgroup.

esxcli network ip interface add -i=vmk0 -p=recoveryportgroup

c. Set the original IP address on vmk0.

esxcli network ip interface ipv4 set -i=vmk0 -I=172.20.10.52 -N=255.255.255.0 --type=static

d. Verify that the IP address is set correctly for vmk0.

esxcli network ip interface ipv4 get


e. Recreate the default route.

esxcli network ip route ipv4 add -g 172.20.10.10 -n default

10. Restart all the services on sa-esxi-02.

services.sh restart

11. Verify that you can ping sa-esxi-02 again.

Hint: Try to ping sa-esxi-02 from sa-esxi-03.

In the vSphere Client inventory, sa-esxi-02.vclass.local appears as Connected.

a. If sa-esxi-02 does not appear as Connected, right-click the host, select Connection, and
click Connect.

12. Log out of DCUI and select the STUDENT-A-01 console from the VM list to return to the
student desktop.

a. Click on the CONSOLES tab to open a list of available consoles.

b. Click on STUDENT-A-01 to switch to the console for the student desktop.
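
The order of the recovery commands in steps 8 and 9 matters: the switch and port group must exist before the VMkernel interface is recreated on them. The following Python sketch collects the commands from this task into an ordered list so that the sequence can be reviewed or adapted for another host. It only builds strings and does not run anything. The function name recovery_commands and its parameters are hypothetical.

```python
def recovery_commands(vswitch, portgroup, uplinks, vmk, ip, netmask, gateway):
    """Return the esxcli commands from this task, in the order the lab runs them."""
    cmds = [
        f"esxcli network vswitch standard add --vswitch-name={vswitch}",
        f"esxcli network vswitch standard portgroup add -p={portgroup} -v={vswitch}",
    ]
    # One uplink add command per physical NIC.
    cmds += [
        f"esxcli network vswitch standard uplink add -u={nic} -v={vswitch}"
        for nic in uplinks
    ]
    cmds += [
        f"esxcli network ip interface remove -i={vmk}",
        f"esxcli network ip interface add -i={vmk} -p={portgroup}",
        f"esxcli network ip interface ipv4 set -i={vmk} -I={ip} -N={netmask} --type=static",
        f"esxcli network ip route ipv4 add -g {gateway} -n default",
    ]
    return cmds


if __name__ == "__main__":
    for cmd in recovery_commands(
        "recoveryswitch", "recoveryportgroup", ["vmnic0", "vmnic1"],
        "vmk0", "172.20.10.52", "255.255.255.0", "172.20.10.10",
    ):
        print(cmd)
```

The verification commands (esxcli network vswitch standard list and esxcli network ip interface ipv4 get) are left out of the list because they do not change the configuration.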

Task 4: Enable the Network Rollback Option
You enable the network rollback option.

1. In the vSphere Client, select sa-vcsa-01.vclass.local in the navigation pane and click the
Configure tab.

2. Under Settings, select Advanced Settings and click EDIT SETTINGS.

3. Click the filter icon next to the Name column to search for parameters with the word
rollback.

4. Change config.vpxd.network.rollback to true and click SAVE.

Task 5: Migrate the Management Network


You migrate the management network from recoveryswitch to dvs-SA-Datacenter.

1. In the vSphere Client, migrate the management network of sa-esxi-02.vclass.local from
recoveryswitch to dvs-SA-Datacenter.

Hint: Right-click dvs-SA-Datacenter and select Add and Manage Hosts.

After migrating the networking over to dvs-SA-Datacenter, sa-esxi-02 should remain
connected to the vCenter Server system, and the management network should be up and
running.

a. Assign vmnic0 to Uplink 1 and vmnic1 to Uplink 2.

b. Assign vmk0 to pg-SA-Management.

Lab 7 Applying the Troubleshooting
Methodology

Objective and Tasks


Follow the troubleshooting methodology to solve a networking problem:

1. Run a Break Script

2. Narrow the Scope of the Problem to a VM

3. Narrow the Scope of the Problem to the ESXi Host

4. Resolve the Problem

5. Verify the Solution

Task 1: Run a Break Script


You run a break script to damage networking in the lab environment.

1. Use the vSphere Client to power on the linux-a-01 virtual machine.

You must wait for the guest OS on the VM to initialize.

2. Find the VM's IP address.

3. Double-click the PowerCLI icon on the student desktop.

4. In the PowerCLI window, enter cd \Materials\Scripts\Mod4.

5. To run the break script, enter .\Break-ts-method.ps1 and wait for it to finish.

6. In the vSphere Client, open a remote console on the linux-a-01 VM.

An end-user support request is filed: The linux-a-01 VM cannot ping its default gateway,
172.20.11.10.

NOTE

In the lab environment, having multiple VM consoles open at the same time might degrade
performance. Never open more than one VM console at a time in the lab. This problem does
not occur in production systems.

7. Log in to linux-a-01 by entering root as the user name and VMware1! as the password.

8. Verify that the linux-a-01 VM cannot ping the default gateway 172.20.11.10.

The problem is now defined. You continue using the troubleshooting methodology by
narrowing the scope of this problem to identify its cause.

Task 2: Narrow the Scope of the Problem to a VM


Following the troubleshooting methodology, you narrow the scope of the problem to the linux-a-
01 VM. You determine whether the networking problem affects this VM only.

1. In the linux-a-01 VM console, enter the ifconfig -a command to verify the IP
configuration.

Q1. Is the host IP in the correct subnet?


A1. Because this host is on the Production network, the IP subnet should be in the 172.20.11.0/24 range. If the host IP is configured as a DHCP address and a network problem occurs, no IP address is assigned.

2. Enter the route -n command to confirm the default gateway address.

Q2. Does the host have the correct default router?


A2. The default router for the Production network should be 172.20.11.10. However, in a DHCP network configuration, no router is assigned if a network problem occurs.

3. In the vSphere Client, verify that the correct uplink (network) is connected to the VM.

Q3. Does the host have the correct network configured?


A3. The network should be configured as either the pg-SA-Production-01 or the pg-SA-Production-02 network.

4. In the vSphere Client, verify that the network link status is connected.

Q4. Does the host have a network link status of connected?


A4. Yes, the network status is connected.

The problem is not with this VM's configuration.

5. Power on a second VM (linux-a-02) to verify that the problem is not specific to linux-a-01.

6. Verify that the second VM is running on the same ESXi host as linux-a-01 and that it is
connected to the same network (pg-SA-Production-01 or pg-SA-Production-02).

Both the pg-SA-Production-01 and pg-SA-Production-02 network port groups are on the
same physical network.

7. Open a remote console on linux-a-02 and log in by entering root as the user name and
VMware1! as the password.
8. Repeat the same tests on linux-a-02.

• Is the host IP in the correct subnet?

• Does the host have the correct default router?

• Is the correct network configured on the host?

• Does the host have a connected network link status?

You determine that the problem is not VM-specific.

9. Close the linux-a-02 remote console.
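
The subnet and default-router questions in this task reduce to two comparisons against the values
given for the Production network. The following is a minimal Python sketch, assuming that you copy
the address and router by hand from the ifconfig -a and route -n output. The function name
check_guest_network is hypothetical and is not part of the lab materials.

```python
import ipaddress

# Values taken from the lab text: the Production network and its default router.
PRODUCTION_SUBNET = ipaddress.ip_network("172.20.11.0/24")
PRODUCTION_GATEWAY = ipaddress.ip_address("172.20.11.10")


def check_guest_network(ip, gateway):
    """Return a list of findings for one VM's IP configuration.

    Pass None for a value that the guest does not have, for example when a
    DHCP client obtained no lease. (Hypothetical helper for illustration.)
    """
    findings = []
    if ip is None:
        findings.append("no IP address assigned")
    elif ipaddress.ip_address(ip) not in PRODUCTION_SUBNET:
        findings.append(f"{ip} is outside {PRODUCTION_SUBNET}")
    if gateway is None:
        findings.append("no default router assigned")
    elif ipaddress.ip_address(gateway) != PRODUCTION_GATEWAY:
        findings.append(f"default router {gateway} is not {PRODUCTION_GATEWAY}")
    return findings
```

An empty list means that the guest configuration matches the Production network. Running the same
check for linux-a-01 and linux-a-02 mirrors the repeated tests in steps 1, 2, and 8.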

Task 3: Narrow the Scope of the Problem to the ESXi Host


Following the troubleshooting methodology, you further narrow the scope of the problem. You
determine whether the networking problem affects a specific ESXi host.

1. In the vSphere Client, verify that a VMkernel adapter is not assigned on the ESXi host for the
Production network.

a. Select Hosts and Clusters > ESXi_host_name > Configure > VMkernel adapters.

• Is the host IP address in the correct subnet?

• Does the host have the correct default router?

2. Verify that the VLAN setting of any distributed switch is correct on vCenter Server.

a. In the vSphere Client, select Networking > dvs-SA-Datacenter > port_group_name >
Actions > Edit Settings > VLAN.

Does the network have the correct VLAN configuration?

The VLAN setting should match the physical network VLAN setting.

If a VLAN is assigned where it should not be, or if the VLAN setting is incorrect,
communications do not work.

3. Verify that the speed and duplex setting of any ESXi host is correct on the vCenter Server
system.

This setting must match the actual network hardware.

a. In the vSphere Client, select Hosts and Clusters > ESXi_host_name > Configure >
Networking > Physical adapters to verify the setting.

You can also run the command esxcli -s <ESXi_host_name> network nic list.

Does the network have the correct speed and duplex configuration?

4. In the vSphere Client, verify that the correct uplink (network) is connected to the virtual
switch.

a. Select Networking > dvs-SA-Datacenter > Configure > Topology and select the
individual port groups that you want to verify.

If a virtual switch is connected to the wrong uplink on any ESXi host, the distributed
switch does not work or behaves erratically. All standard and distributed switches must
be connected to the same uplinks, and the uplinks must be correct for the physical
hardware.

In this case, the pg-SA-Production-01 port group should be connected to Uplink 5
(vmnic4) on both sa-esxi-01 and sa-esxi-02. The pg-SA-Production-02 port group
should be connected to Uplink 6 (vmnic5) on both sa-esxi-01 and sa-esxi-02.

Q1. Does the host have the correct uplink configured?


A1. Uplinks are not correct. The sa-esxi-01 host does not have uplinks configured on the pg-SA-Production-01 and pg-SA-Production-02 port groups.
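
The uplink check in step 4 is a comparison between the expected port group assignments and what
each host actually shows in the Topology view. The following is a minimal Python sketch of that
comparison, assuming that you record each host's assignments by hand. The function name
missing_uplinks and the dictionary layout are hypothetical.

```python
# Expected mapping from the lab text: port group -> (uplink, physical adapter).
EXPECTED_UPLINKS = {
    "pg-SA-Production-01": ("Uplink 5", "vmnic4"),
    "pg-SA-Production-02": ("Uplink 6", "vmnic5"),
}


def missing_uplinks(host_uplinks):
    """Return the port groups whose recorded uplink differs from the expected one.

    host_uplinks maps a port group name to an (uplink, vmnic) tuple. A port
    group with no uplink configured on the host is simply left out.
    """
    return sorted(
        port_group
        for port_group, expected in EXPECTED_UPLINKS.items()
        if host_uplinks.get(port_group) != expected
    )
```

For a host with no uplinks recorded, such as sa-esxi-01 in this lab, the function returns both
port groups. For a correctly configured host, it returns an empty list.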

Task 4: Resolve the Problem


You correct the configuration on the sa-esxi-01 host and resolve the networking problem.

1. In the vSphere Client, select Networking.

2. In the navigation pane, right-click the dvs-SA-Datacenter distributed switch and select Add
and Manage Hosts.

3. Select Manage host networking and click Next.

4. Click the + Attached hosts icon.

5. Select sa-esxi-01.vclass.local and click OK.

You select only the sa-esxi-01 host, which is the ESXi host with the configuration problem.
When troubleshooting, the best approach is to change only what needs to be changed to
resolve the problem. Otherwise, production systems that do not require changes might be
impacted.

6. Click Next.

7. Select the vmnic4 physical adapter and click Assign uplink.

8. Select Uplink 5 and click OK.

9. Select the vmnic5 physical adapter and click Assign uplink.

10. Select Uplink 6 and click OK.

11. Click Next.

12. Click Next to skip the Manage VMkernel adapters page.

13. Click Next to skip the Migrate VM networking page.

14. Click Finish.

Wait for the update network configuration task to finish.

Task 5: Verify the Solution


You verify that network connectivity is restored to the linux-a-01 VM.

1. Reopen the linux-a-01 VM console.

2. Because this VM is configured with DHCP, enter the dhclient command to renew the IP
address configuration.

3. Enter the ifconfig -a command in the linux-a-01 VM console to verify the IP
configuration.

4. Enter the route -n command to verify the default gateway address.

5. Enter the ping 172.20.11.10 command.

The ping should be successful.

6. Press Ctrl+C to stop the ping.

7. Close the linux-a-01 VM console.

8. Reopen the linux-a-02 VM console.

Because you powered on a second VM (linux-a-02) to troubleshoot the problem, you must
repeat these steps on this VM.

9. Because this VM is configured with DHCP, enter the dhclient command to renew the IP
address configuration.

10. Enter the ifconfig -a command in the linux-a-02 VM console to verify the IP
configuration.

11. Enter the route -n command to verify the default gateway address.

12. Enter the ping 172.20.11.10 command.

The ping should be successful.

13. Press Ctrl+C to stop the ping.

14. Close the linux-a-02 VM console.

If any warnings or alerts remain in the vSphere Client, you should clear them before
continuing to the next lab. Sometimes, a refresh of the vSphere Client clears stale warnings
or alerts.
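
The default gateway check in this task can also be done programmatically on text captured from
route -n. The following is a minimal Python sketch. The sample output follows the usual net-tools
column layout, and the interface name eth0 is an assumption, not a value from the lab.

```python
# Illustrative route -n output. The interface name eth0 is an assumption.
SAMPLE_ROUTE_OUTPUT = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         172.20.11.10    0.0.0.0         UG    0      0        0 eth0
172.20.11.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
"""


def default_gateway(route_output):
    """Return the gateway of the default route, or None if there is none."""
    for line in route_output.splitlines():
        fields = line.split()
        # A default route has destination 0.0.0.0 and the G (gateway) flag.
        if len(fields) >= 4 and fields[0] == "0.0.0.0" and "G" in fields[3]:
            return fields[1]
    return None
```

A return value of None corresponds to the DHCP failure case described in Task 2, where no router
is assigned.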

Lab 8 Troubleshooting Network Problems

Objective and Tasks


Identify, diagnose, and resolve virtual networking problems:

1. Run a Break Script

2. Verify That the System Is Not Functioning Properly

3. Troubleshoot and Resolve the Problem

4. Verify the Solution

Task 1: Run a Break Script


You run a break script to damage networking in the lab environment.

Several break scripts are provided to create network problems. Each script damages networking
in the lab environment in a different way. You can run the break scripts in any order, and you can
choose which problems to resolve.

IMPORTANT

The VMs that are impacted by each break script are listed in the Support Request table in
task 2. Before running a break script, verify that the impacted VM or VMs are powered on
with the guest operating systems online.

1. On the student desktop, double-click the PowerCLI icon.

2. In the PowerCLI window, enter cd \Materials\Scripts\Mod4.

3. Enter the name of a break script.

For example, you enter .\Break-6-1.ps1.

In the Difficulty column of the table, 1 signifies least difficult and 3 signifies most difficult to
resolve.

Break Script Difficulty

.\Break-6-1.ps1 1

.\Break-6-2.ps1 2

.\Break-6-4.ps1 2

.\Break-6-5.ps1 3

.\Break-6-8.ps1 3

.\Break-6-9.ps1 3

NOTE

After a break script completes, do not run another break script until after you complete
tasks 2 through 4 for each network problem. You must run the scripts one at a time.

4. Wait until the You are ready to start the lab message appears.

5. Leave the PowerCLI window open.

Task 2: Verify That the System Is Not Functioning Properly


You verify that networking is damaged in your lab environment.

1. Use the support request summary information to verify that you see the symptoms reported
for your break script and that your lab environment is not working.

NOTE

The first time you open a virtual machine console, you are prompted to click either Web
Console or VMware Remote Console. You must click Web Console.

Break Script: .\Break-6-1.ps1
Impacted Virtual Machines: linux-a-01
Support Request: The linux-a-01 VM cannot ping its default gateway, 172.20.11.10.

Break Script: .\Break-6-2.ps1
Impacted Virtual Machines: linux-a-04, linux-a-05, linux-a-09, linux-a-10
Support Request: The linux-a-04, linux-a-05, linux-a-09, and linux-a-10 VMs are no longer
accessible over the network. You cannot open a remote console to them. Some datastores are also
marked as inactive or inaccessible.

Break Script: .\Break-6-4.ps1
Impacted Virtual Machines: linux-a-01, linux-a-02
Support Request: Users on the linux-a-01 and linux-a-02 VMs cannot communicate with each other.
The IP address for linux-a-01 is 172.20.11.200. The IP address for linux-a-02 is 172.20.11.201.

Break Script: .\Break-6-5.ps1
Impacted Virtual Machines: linux-a-01, linux-a-02, linux-a-03, linux-a-04
Support Request: Users on the following VMs report total network communication failures:
linux-a-01, linux-a-02, linux-a-03, and linux-a-04.

Break Script: .\Break-6-8.ps1
Impacted Virtual Machines: linux-a-11
Support Request: A vSphere administrator attempted to use vSphere vMotion to move the linux-a-11
VM from sa-esxi-01 to sa-esxi-02. The VM is connected to a standard virtual switch. The migration
wizard reports a compatibility issue.

Break Script: .\Break-6-9.ps1
Impacted Virtual Machines: linux-a-11, linux-a-12
Support Request: A vSphere administrator migrates the linux-a-11 VM from sa-esxi-02 to
sa-esxi-01. The VM is connected to a standard virtual switch. The migration succeeds, but the VM
loses network connectivity.

To run this script:

1. Ensure that linux-a-11 and linux-a-12 are on host sa-esxi-02.

2. Power on linux-a-11 and linux-a-12.

3. Open a web console on linux-a-11 and log in.

4. Start a ping to the IP address of linux-a-12 (usually 172.20.12.201).

You can verify this address from the linux-a-12 VM Summary tab because VMware Tools is
installed.

5. Run the break script.

IMPORTANT

Before you run Break-6-9.ps1, read the Support Request description.

Task 3: Troubleshoot and Resolve the Problem


You troubleshoot and resolve the problem with the network.

1. Use the available techniques and tools to troubleshoot and resolve the problem.

• Lab topology handout, which provides important information about the network,
storage, host, and VM configurations.

• Lecture manual for this course

• Virtual machine, vCenter Server, and ESXi host log files

• VMware knowledge base articles, available at http://kb.vmware.com

• Internet

2. Apply your resolution.

Task 4: Verify the Solution


You verify that the virtual network is functioning properly.

1. Reread the support request summary information in task 2.

2. Use the vSphere Client and VM web console, as needed, to verify that the problem is
resolved.

3. Leave the vSphere Client open until you complete all network troubleshooting problems.

4. After you verify that the problem is resolved, return to task 1 and run another break script.

Lab 9 Investigating Disk Issues on ESXi

Objective and Tasks


Analyze and resolve disk space issues on an ESXi host:

1. Run a Break Script

2. Create a Virtual Machine

3. Troubleshoot the Problem

4. Resolve the Problem

5. Verify the Solution

NOTE

For useful information about troubleshooting ESXi storage problems, see VMware
knowledge base article 1003564 at https://kb.vmware.com/kb/1003564. Review this
reference before you start the lab and use the information, as needed, while performing the
lab tasks.

Task 1: Run a Break Script


You run a break script to introduce a disk problem on one of your ESXi hosts.

1. Use MTPuTTY to log in to sa-esxi-01.vclass.local.

An SSH connection starts, and you are automatically logged in to sa-esxi-01.vclass.local as
user root.

2. Change to the studentscripts directory.

cd /vmfs/volumes/sa-esxi-01-local/studentscripts

3. List the contents of the studentscripts directory.

ls /vmfs/volumes/sa-esxi-01-local/studentscripts

4. Enter ./script1.sh to run the script.

The script runs for a few seconds and returns to a command prompt.

5. Leave the SSH session open.

Task 2: Create a Virtual Machine


You try to create a VM but the task fails.

1. In the Firefox bookmarks toolbar, select the vSphere Client (SA-VCSA-01) bookmark in the
vSphere Site-A folder.

2. At the login window, enter [email protected] as the user name and
VMware1! as the password.

3. Create a VM on sa-esxi-01.vclass.local.

Parameter Value

Name Win-1

Datastore Shared3

Compatibility ESXi 7.0 and later

Guest OS Windows Server 2012 (64-bit)

Disk 5 GB (Thick Provision Lazy Zeroed)

Network Leave the default.

NOTE

The VM creation task fails. If you do not see an error message in the Recent Tasks pane,
click the Refresh icon at the top of the window.

Task 3: Troubleshoot the Problem


You analyze diagnostic messages and log files to identify the root cause of the failed task.

1. In the vSphere Client, review the error messages that appear in the Recent Tasks pane.

You must determine whether the cause of the problem is the VM, ESXi host, or storage.

2. In the vSphere Client, select the ESXi host's Monitor tab and review the Tasks and Events
list.

3. Identify tasks and events that provide insight into the cause of the problem.

4. Use the MTPuTTY session to sa-esxi-01.vclass.local to view the
/var/log/vmkernel.log file and identify the log entries, if any, that provide hints
about the cause of the problem.

NOTE

Focus on log entries that have a time stamp close to the time that the error occurred.

5. View the /var/log/hostd.log file and identify the log entries, if any, that provide hints
about the cause of the problem.

6. Using the information that you found in the vSphere Client and the log files, identify the root
cause of the problem.

7. List all the possible ways to resolve the problem.
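
The advice to focus on log entries near the time of the error can be applied mechanically by
filtering lines on their leading time stamp. The following is a minimal Python sketch, assuming
that each entry starts with a UTC time stamp of the form 2021-06-04T10:15:32.456Z. The function
name entries_near is hypothetical, and the sample messages are placeholders rather than real ESXi
log text.

```python
from datetime import datetime, timedelta

# Assumed shape of the leading time stamp on each log entry.
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def entries_near(lines, error_time, window_seconds=60):
    """Return the lines whose time stamp is within window_seconds of error_time."""
    window = timedelta(seconds=window_seconds)
    selected = []
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue  # skip lines that do not start with a time stamp
        if abs(when - error_time) <= window:
            selected.append(line)
    return selected


# Placeholder entries: only the time stamp shape matters for this sketch.
SAMPLE_LINES = [
    "2021-06-04T10:14:01.120Z placeholder message A",
    "2021-06-04T10:15:32.456Z placeholder message B",
    "2021-06-04T10:40:00.000Z placeholder message C",
]
```

With an error time of 10:15:30 and the default 60-second window, only the second placeholder
entry is kept. Widen the window if the failed task ran for longer.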

Task 4: Resolve the Problem


You resolve the problem by analyzing possible solutions.

1. Analyze each possible resolution and its impact, if any, on the vSphere environment.

2. Apply the appropriate resolution.

Task 5: Verify the Solution


You verify that the VM creation task is successful.

1. Create a VM on sa-esxi-01.vclass.local.

Parameter Value

Name Win-1

Datastore Shared3

Compatibility ESXi 7.0 and later

Guest OS Windows Server 2012 (64-bit)

Disk 5 GB (Thick Provision Lazy Zeroed)

Network Leave the default.

2. Verify that the VM is successfully created.

3. Verify that the VM powers on successfully.

Lab 10 Troubleshooting Storage Performance Issues

Objective and Tasks


Use the esxtop utility to analyze storage performance issues that affect HBAs, LUNs, and VMs:

1. Generate VM Disk Activity

2. Review esxtop Disk Statistics

3. Monitor Performance by Storage Adapter

4. Monitor Performance by Storage Device

5. Monitor Storage Performance by VM

NOTE

For useful information about using the esxtop utility, see the following references. Review
these references before you start the lab and use the information, as needed, while
performing the lab tasks.

Reference Link

Using esxtop to identify storage performance issues for ESX/ESXi (1008205) https://kb.vmware.com/kb/1008205

Interpreting esxtop Statistics https://communities.vmware.com/docs/DOC-9279

Identifying disks when working with VMware ESXi/ESX (1014953) https://kb.vmware.com/kb/1014953

Task 1: Generate VM Disk Activity


You power on the Win-4, Win-5, and Win-6 VMs to generate disk activity.

After you log in to these VMs, a script runs on each VM to generate disk activity.

1. If you are logged out of the vSphere Client, log in again.

2. Power on the Win-4, Win-5, and Win-6 VMs.

3. Open a web console to each of the Windows VMs.

You are automatically logged in as administrator.

Wait for a few minutes for the scripts to start.

NOTE

Because the lab environment contains only a small number of VMs, you cannot generate a heavy
load (IOPS) in the environment.

Task 2: Start esxtop Utility and Review Disk Statistics


You start the esxtop utility and review disk statistics to familiarize yourself with what the statistics
mean and how to use them.

1. In MTPuTTY, open an SSH session to sa-esxi-03.vclass.local.

2. Enter esxtop.

By default, you are presented with CPU statistics.

3. Find out what the disk statistics mean and how they are useful in troubleshooting
performance issues.

a. Read VMware knowledge base article 1008205 at
https://kb.vmware.com/s/article/1008205.

Task 3: Monitor Performance by Storage Adapter


You view storage adapter (HBA) statistics on sa-esxi-03.vclass.local to determine which adapter
experiences the highest disk activity.

1. In the esxtop display, enter d to view the disk adapter information.

The output should be similar to the example.

2. Enter f to display the Fields menu.

The asterisks next to the A, B, C, E, and G fields signify that statistics in these fields are
shown in the disk statistics display. Each field letter acts as a toggle. For example, entering a
turns the A field on or off in the display.

3. Verify that only the A, C, D, E, and G fields are selected (an asterisk should appear next to
the letter).

4. Press any key, such as Return, to get back to the disk statistics display.

The table shows the statistics that you see when you select the D, E, and G fields.

Field Letter Statistic Names

D: Queue Stats AQLEN

E: I/O Stats CMDS/s, READS/s, WRITES/s, MBREAD/s, and MBWRTN/s

G: Overall Latency Stats (ms) DAVG/cmd, KAVG/cmd, GAVG/cmd, and QAVG/cmd

5. Interpret the statistics that are shown in the storage adapter display.

Q1. Which HBA might be the cause of slow storage performance?


A1. vmhba65, because this HBA shows high IOPS.

Q2. What condition is degrading storage performance?


A2. A high number of read commands are being issued from vmhba65.
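
When you interpret the latency fields, it helps to remember that GAVG/cmd (latency as seen by the
guest) is approximately DAVG/cmd (device latency) plus KAVG/cmd (VMkernel latency). The following
is a minimal Python sketch of that relationship. The threshold values are rule-of-thumb
assumptions for this sketch and should be confirmed against knowledge base article 1008205. The
function name classify_latency is hypothetical.

```python
# Rule-of-thumb thresholds in milliseconds. These are assumptions for this
# sketch; confirm them against KB 1008205 before relying on them.
THRESHOLDS_MS = {"DAVG": 25.0, "KAVG": 2.0, "GAVG": 25.0}


def classify_latency(davg_ms, kavg_ms):
    """Return (gavg_ms, exceeded), where exceeded lists the counters over threshold."""
    # Guest-observed latency is roughly device latency plus kernel latency.
    gavg_ms = davg_ms + kavg_ms
    observed = {"DAVG": davg_ms, "KAVG": kavg_ms, "GAVG": gavg_ms}
    exceeded = [
        name for name in ("DAVG", "KAVG", "GAVG")
        if observed[name] > THRESHOLDS_MS[name]
    ]
    return gavg_ms, exceeded
```

A high DAVG/cmd value points toward the storage device or the path to it, whereas a high KAVG/cmd
value points toward queuing inside the ESXi host.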

Task 4: Monitor Performance by Storage Device


You view storage device (LUN) activity on sa-esxi-03.vclass.local.

1. In the esxtop utility, enter u to view information about the storage devices (LUNs).

You should see output similar to the example.

2. Enter f to display the Fields menu.

3. Verify that only the A, F, G, and I fields are selected.

The table shows the statistics that you see when you select the F, G, and I fields.

Field Letter Statistic Names

F: Queue Stats DQLEN, ACTV, QUED, %USD, LOAD

G: I/O Stats CMDS/s, READS/s, WRITES/s, MBREAD/s, and MBWRTN/s

I: Overall Latency Stats (ms) DAVG/cmd, KAVG/cmd, GAVG/cmd, and QAVG/cmd

4. Interpret the statistics that are shown in the storage device display.

Q1. Which storage device seems to be affected?


A1. The device with the storage identifier naa.60003ff44dc75adcaf760d6a0ac8e3fe

5. In MTPuTTY, open a second SSH session to sa-esxi-03.vclass.local.

6. Enter the command to view the datastore name of the affected storage device.

esxcli storage vmfs extent list

Q2. What is the datastore name of the affected storage device?


A2. Shared3
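
Mapping the device identifier to a datastore name is a lookup in the extent list. The following
is a minimal Python sketch that does the lookup on captured command output. The sample text only
illustrates the column layout: the VMFS UUID is a placeholder, and the partition number is an
assumption. The function name datastore_for_device is hypothetical.

```python
# Illustrative output layout. The UUID is a placeholder, not a real value.
SAMPLE_EXTENT_LIST = """\
Volume Name  VMFS UUID                            Extent Number  Device Name                           Partition
-----------  -----------------------------------  -------------  ------------------------------------  ---------
Shared3      00000000-00000000-0000-000000000000              0  naa.60003ff44dc75adcaf760d6a0ac8e3fe          1
"""


def datastore_for_device(extent_list_output, device_name):
    """Return the volume name whose extent is on device_name, or None."""
    for line in extent_list_output.splitlines():
        # Split from the right so that volume names containing spaces survive.
        fields = line.rsplit(None, 4)
        if len(fields) == 5 and fields[3] == device_name:
            return fields[0].strip()
    return None
```

Splitting from the right keeps the lookup working when a datastore name contains spaces, because
the last four columns never do.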

Task 5: Monitor Storage Performance by VM


You use the VM disk view in esxtop to monitor the disk activity on sa-esxi-03.vclass.local.

1. In the esxtop display, enter v to view information about the VM disk activity.

You should see output similar to the example.

2. In the Fields menu, verify that only the B, C, D, E, I, J, and K fields are selected.

The table shows the statistics that you see when you select the I, J, and K fields.

Field Letter Statistic Names

I: I/O Stats CMDS/s, READS/s, WRITES/s, MBREAD/s, and MBWRTN/s

J: Read Latency Stats (ms) LAT/rd

K: Write Latency Stats (ms) LAT/wr

3. Interpret the statistics that are shown in the VM disk display.

Q1. Which VM or VMs might be contributing to slow storage performance?


A1. Win-4, Win-5, and Win-6 are running several read commands per second. However, these VMs do not seem to be causing a significant amount of latency because the load is still low.

Q2. What possible solutions can help you get better performance?


A2. Migrate one or two VMs to another datastore. Enable Storage I/O Control and set the value to 5 ms. Also, check the DAVG value or latency values for the VM. Add another VMkernel port and vmnic for software iSCSI multipathing and set the multipathing policy to Round Robin.

4. In the vSphere Client, shut down Win-4, Win-5, and Win-6 and close each VM's console
window.

Lab 11 Troubleshooting VM Power-On Problems

Objective and Tasks


Analyze and resolve a VM problem that prevents you from powering on the VM:

1. Create and Power On the VM

2. Troubleshoot Problems or Errors

3. Resolve the Problem

4. Verify the Solution

NOTE

For useful information about troubleshooting VM power-on failures, see VMware knowledge
base article 1014501 at https://kb.vmware.com/kb/1014501. Review this reference before
you start the lab and use the information, as needed, while performing the lab tasks.

Task 1: Create and Power On the VM


You create a VM and attempt to power on this VM to determine the power-on problem.

1. In the Firefox bookmarks toolbar, click the vSphere Client (SA-VCSA-01) bookmark in the
vSphere Site-A folder.

2. At the login window, enter [email protected] as the user name and
VMware1! as the password.

3. Create a VM called linux-a-14.

If you cannot create this VM, shut down all the other VMs stored on Shared storage.

a. Configure options for name, location, host, storage, compatibility, and guest OS with the
values in the table.

Option Value

Name linux-a-14

Location SA-Datacenter

Host sa-esxi-03.vclass.local

Storage Shared

Compatibility ESXi 7.0 and later

Guest OS Family Linux

Guest OS Version VMware Photon OS (64-Bit)

b. Configure hardware options.

Option Value

Memory 8 GB

Hard disk 2 GB

c. Leave the default values for the remaining hardware options.

4. Power on the linux-a-14 VM.

The VM fails to power on.

Task 2: Troubleshoot Problems or Errors


You view and analyze the warning and error messages that are generated in the vSphere Client
and log files as a result of the VM's failure to power on.

1. In the vSphere Client, find information that helps you determine the cause of the linux-a-14
VM's failure to power on.

• What error messages are displayed?

• Are any alarms triggered?

• What tasks are initiated?

• What events occurred while the tasks were running?

2. Drawing on your observations, determine potential causes of the problem.

a. Record your initial ideas about what might be causing the problem.

b. Record other potential causes of the problem, if any.

c. Determine how to verify your initial assumption of what might be causing the problem.

3. Verify whether your initial assumption is valid by viewing log files to find relevant information.

a. Find the log files in the /var/log directory that contain information related to the
linux-a-14 VM.

b. Focusing on the files that contain information about linux-a-14, examine each of these log
files to identify data that is related to linux-a-14.

NOTE

Even if the VM name is mentioned in a log file, the information in that file might not be
helpful when troubleshooting.

If you are new to troubleshooting, you might find it worthwhile to investigate all log files
to familiarize yourself with the types of information that each log provides.

As you gain more experience with troubleshooting, you can go directly to the most
useful log files.

c. Identify the log files in /var/log that contain information that is useful in determining
the problem's root cause.

4. Identify the root cause of the problem.

Task 3: Resolve the Problem


You identify potential resolutions to the problem and apply the most appropriate resolution
based on your analysis.

1. Identify ways to resolve the problem and describe any negative impacts of these resolutions.

a. If the problem can be resolved in more than one way, list the potential resolutions and
explain how each resolution works.

b. If any of these resolutions might have a negative impact on the environment, describe
the possible negative impact of each.

2. Choose a resolution to implement.

NOTE

For purposes of this lab, do not choose resolutions that involve increasing the size of the
datastore.

3. Apply the resolution that you selected.

Task 4: Verify the Solution


You run vSphere commands to verify that the problem is resolved and that the linux-a-14 VM
powers on successfully.

You do not use the vSphere Client to perform this task.

1. At the command line, identify the VM ID of the linux-a-14 VM.

2. Verify the power state of the VM.

3. Power on the VM.

4. Verify that the linux-a-14 VM powers on successfully.

5. After the linux-a-14 VM powers on successfully, power off the VM and delete it.
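
One possible way to approach steps 1 through 3 is to read the VM ID from the output of vim-cmd
vmsvc/getallvms and then pass it to vim-cmd vmsvc/power.getstate and vim-cmd vmsvc/power.on. The
following is a minimal Python sketch of the lookup on captured output. The sample line only
illustrates the column layout, and the VM ID 12 is a made-up value. The function names find_vmid
and power_commands are hypothetical.

```python
# Illustrative getallvms output. The Vmid value 12 is made up for this sketch.
SAMPLE_GETALLVMS = """\
Vmid      Name                 File                          Guest OS          Version   Annotation
12     linux-a-14   [Shared] linux-a-14/linux-a-14.vmx   vmwarePhoton64Guest   vmx-17
"""


def find_vmid(getallvms_output, vm_name):
    """Return the numeric VM ID for vm_name, or None if it is not listed.

    Assumes that the VM name contains no spaces, as is the case in this lab.
    """
    for line in getallvms_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == vm_name:
            return int(fields[0])
    return None


def power_commands(vmid):
    """Return the ESXi Shell commands that check the power state and power on."""
    return [
        f"vim-cmd vmsvc/power.getstate {vmid}",
        f"vim-cmd vmsvc/power.on {vmid}",
    ]
```

Run power.getstate again after power.on to confirm that the VM reports a powered-on state.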

Lab 12 Troubleshooting VM Snapshot Problems

Objective and Tasks


Analyze and resolve a VM snapshot problem that prevents you from powering on a VM:

1. Power On the VM

2. Troubleshoot Problems or Errors

3. Resolve the Problem

4. Verify the Solution

NOTE

For information about troubleshooting VM snapshot problems, see the following references.
Review these references before you start the lab and use the information, as needed, while
performing the lab tasks.

Reference Link

"Cannot open the disk" errors powering on a VM (1004232) https://kb.vmware.com/kb/1004232

"The parent virtual disk has been modified since the child was created" error (1007969) https://kb.vmware.com/kb/1007969

Task 1: Power On the VM


In the vSphere Client, you attempt to power on a VM called linux-a-13.

1. If you are logged out of the vSphere Client, log in again.

2. Locate a VM called linux-a-13 in the inventory and power it on.

The VM should fail to power on.

Task 2: Troubleshoot Problems or Errors


You view and analyze the warning and error messages that are generated in the vSphere Client
and log files as a result of the VM power-on failure.

1. In the vSphere Client, find information that might help you to identify the cause of the
linux-a-13 VM's failure to power on.

• What error messages are displayed?

• Are any alarms triggered?

• What tasks are initiated?

• What events occurred while the tasks were running?

2. Drawing on your observations, identify potential causes of the problem.

a. Record your initial thoughts about what might be causing the problem.

b. Record other potential causes of the problem, if any.

c. Determine how to verify your initial assumption of what might be causing the problem.

3. Verify whether your initial assumption is valid by viewing log files to find relevant information.

a. Determine which log files in the /var/log directory contain information related to the
linux-a-13 VM.

b. Focusing on the files that contain information about linux-a-13, examine each of these log
files to identify data that is related to linux-a-13.

NOTE

Even if the VM name is mentioned in a log file, the information in that file might not be
helpful when troubleshooting.

If you are new to troubleshooting, you might find it worthwhile to investigate each of
these log files to identify the types of information about the failure that the logs provide.

As you gain more experience with troubleshooting, you can go directly to the log files
that are the most useful.

c. Identify the log files in /var/log that contain information that is useful in determining
the problem's root cause.

4. Identify the root cause of the problem.

Task 3: Resolve the Problem


You identify potential resolutions to the problem and apply the most appropriate resolution
based on your analysis.

1. Identify ways to resolve the problem and describe any negative impacts of these resolutions.

a. If the problem can be resolved in more than one way, list the potential resolutions and
explain how each resolution works.

b. If any of these resolutions might have a negative impact on the environment, describe
the possible negative impact of each.

2. Choose a resolution to implement.

3. Using the command line, apply your resolution.

NOTE

Although the vSphere Client might be easier for you to use, practice resolving the problem
using the appropriate vSphere commands.

Task 4: Verify the Solution


You run vSphere commands to verify that the problem is resolved and that the linux-a-13 VM
powers on successfully.

You do not use the vSphere Client to perform this task.

1. Identify the VM ID of the linux-a-13 VM.

2. Verify the power state of the VM.

3. Power on the VM.

a. If the task seems to hang, use the vSphere Client to determine whether a question is
pending on the VM.

4. Verify that the linux-a-13 VM powers on successfully.

Lab 13 Working with VM Snapshots Using the Command Line

Objective and Tasks


Create, monitor, and manage snapshots from the command line:

1. Power On a VM

2. Create Snapshots and Monitor Their Creation

3. Monitor Snapshot Deletion

NOTE

For useful information about using the command line to create and delete VM snapshots and
to monitor these tasks, see the following references. Review these references before you
start the lab and use the information, as needed, while performing the lab tasks.

Reference Link

Consolidating/Committing snapshots in ESXi (1002310) https://kb.vmware.com/kb/1002310

Snapshot removal task stops at 99% in ESXi (1007566) https://kb.vmware.com/kb/1007566

How to monitor snapshot deletion using the vim-cmd command (2146185) https://kb.vmware.com/kb/2146185

Quick Tutorial for vim-cmd commands https://communities.vmware.com/docs/DOC-31025

Man page for the watch command https://www.man7.org/linux/man-pages/man1/watch.1.html

Task 1: Power On a VM


You use the vSphere Client to power on the Win-2 VM.

1. If you are logged out of the vSphere Client, log in again.

2. Locate and power on the Win-2 VM.

The VM should power on successfully.

Task 2: Create Snapshots and Monitor Their Creation


You use the command line to create five snapshots for the Win-2 VM. As you create each
snapshot, you monitor changes to the list of data disks in the VM's home directory.

NOTE

Use concurrent SSH sessions to run the commands.

1. Enter the vim-cmd command to create five VM snapshots, where each snapshot includes
the VM's memory.

2. Enter the watch command to monitor changes to the VM’s home directory.

3. View the watch command output as each snapshot is created.

Each new snapshot file should appear in the VM's home directory.

4. Enter the vim-cmd command to view details about the snapshot creation task.

5. Monitor the creation of the five snapshots.

a. Run the watch command to monitor the snapshot creation task.

b. Run the vim-cmd command to view details about the snapshot creation task.
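
Creating five snapshots means repeating one vim-cmd call with a different snapshot name each time.
The following is a minimal Python sketch that only builds the command lines, assuming that the
positional arguments of vim-cmd vmsvc/snapshot.create are the VM ID, snapshot name, description,
include-memory flag, and quiesce flag. Confirm the argument order in the command's help output on
the host. The snapshot names, descriptions, and the function name snapshot_create_commands are
hypothetical.

```python
def snapshot_create_commands(vmid, count=5, include_memory=True, quiesce=False):
    """Return one snapshot.create command line per snapshot.

    The argument order (vmid, name, description, includeMemory, quiesced) is an
    assumption of this sketch and should be checked on the ESXi host.
    """
    commands = []
    for index in range(1, count + 1):
        name = f"snap{index}"                  # hypothetical snapshot name
        description = f"lab-snapshot-{index}"  # hypothetical description
        commands.append(
            f"vim-cmd vmsvc/snapshot.create {vmid} {name} {description} "
            f"{int(include_memory)} {int(quiesce)}"
        )
    return commands
```

Because each snapshot includes the VM's memory, run the commands one at a time in one SSH session
and keep the watch command running in the other.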

Task 3: Monitor Snapshot Deletion


You delete the Win-2 VM snapshots and monitor changes to the list of data disks in the VM's
home directory.

1. Delete all the VM snapshots and monitor their deletion by running the watch command.

a. Enter the watch command to monitor the VM’s home directory.

b. Enter the vim-cmd command to view details of the snapshot deletion task.

Lab 14 Troubleshooting Storage Problems

Objective and Tasks


Identify, diagnose, and resolve virtual storage problems:

1. Run a Break Script

2. Verify That the System Is Not Functioning Properly

3. Troubleshoot and Resolve the Problem

4. Verify the Solution

Task 1: Run a Break Script


You use PowerCLI to run a break script to damage storage in the lab environment.

Several break scripts are provided to create storage problems. Each script damages storage in
the lab environment in a different way. You can run the break scripts in any order, and you can
choose which problems to resolve.

1. Double-click the PowerCLI icon on the student desktop system to start a PowerCLI session.

2. In the PowerCLI window, enter cd \Materials\Scripts\Mod5.

3. Enter the name of a break script.

For example, you enter .\Break-7-1.ps1 to run the first break script.

In the Difficulty column of the table, 1 signifies least difficult and 3 signifies most difficult to
resolve.

Break Script Difficulty

.\Break-7-1.ps1 1

.\Break-7-2.ps1 2

.\Break-7-3.ps1 3

.\Break-7-4.ps1 3

.\Break-7-5.ps1 3

.\Break-7-6.ps1 3

.\Break-7-7.ps1 2

.\Break-7-8.ps1 2

.\Break-7-9.ps1 2

.\Break-7-10.ps1 3

IMPORTANT

After the break script completes, do not run another break script until you complete tasks 2
through 4 for each storage problem. You must run the scripts one at a time.

4. Wait for the You are ready to start the lab message to appear.

5. Leave the PowerCLI window open for the next problem and go to task 2.

Task 2: Verify That the System Is Not Functioning Properly


You verify that storage is damaged in your lab environment.

1. Use the support request summary information to verify that you see the symptoms reported
for your break script and that your lab environment is not working.

Break Script Support Request

.\Break-7-1.ps1 A vSphere administrator cannot create any VMs on the NFS datastore.
The administrator also cannot migrate any existing VMs to the NFS
datastore.

.\Break-7-2.ps1 A vSphere administrator cannot establish a console connection to any VM
on the Shared or Shared2 datastore. In the inventory pane, all VMs stored
on the Shared or Shared2 datastore are marked as inaccessible.

.\Break-7-3.ps1 A vSphere administrator cannot establish a console connection to any VM
that is hosted on sa-esxi-02.vclass.local and stored on the Shared datastore.
NOTE: Before troubleshooting, run the script and wait for
sa-esxi-02.vclass.local to finish rebooting.

.\Break-7-4.ps1 Storage paths have disappeared from the iSCSI storage adapter of one ESXi
host. The vSphere administrator did not specify which ESXi host or
storage had the problem. After running the script, you might need to wait
10 to 15 minutes for the problem to appear.

.\Break-7-5.ps1 A vSphere administrator cannot establish a console connection to any VM
that is stored on the Shared datastore.

.\Break-7-6.ps1 End users report extremely poor performance on several VMs. All VMs
that were reported are stored on the Shared datastore.

.\Break-7-7.ps1 A vSphere administrator reports that storage performance is very slow on
the Shared datastore. The vSphere administrator did not specify which
ESXi host had the problem.

.\Break-7-8.ps1 A vSphere administrator reports that storage performance is very slow on
the Shared datastore. The vSphere administrator did not specify which
ESXi host had the problem.

.\Break-7-9.ps1 A vSphere administrator cannot establish a console connection to any VM
that is stored on the Shared datastore. The Shared datastore is also
marked as inactive.

.\Break-7-10.ps1 A vSphere administrator reports that some VMs stored on the Shared
datastore are now marked inaccessible. The vSphere administrator did not
specify which ESXi host or VMs had the problem.

2. After verifying that the system is not functioning, go to task 3.

Task 3: Troubleshoot and Resolve the Problem


You troubleshoot and repair the problem with storage.

1. Use the available techniques and tools to troubleshoot and repair the problem.

• Lab topology handout, which contains important information about the network,
storage, host, and VM configurations

• Lecture manual for this course

• VM, vCenter Server, and ESXi host log files

• vRealize Log Insight

• VMware knowledge base articles available at http://kb.vmware.com

• Internet

2. After applying your resolution, go to task 4.
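
When a support request mentions missing storage paths or an inactive datastore, listing the state of every storage path from an SSH session on the ESXi host can narrow the search. The following is a minimal sketch and not part of the original lab. The helper name non_active_paths is hypothetical, and the sketch assumes that esxcli storage core path list prints indented "Runtime Name:" and "State:" fields for each path.

```shell
# Hypothetical helper: read `esxcli storage core path list` output on stdin
# and print the runtime name and state of every path that is not active.
non_active_paths() {
  awk '
    /^ +Runtime Name:/ { name = $3 }
    /^ +State:/ && $2 != "active" { print name, $2 }
  '
}

# Run the live check only where esxcli exists (an ESXi Shell or SSH session).
if command -v esxcli >/dev/null 2>&1; then
  esxcli storage core path list | non_active_paths
fi
```

No output means that every listed path reports an active state. A path that has disappeared entirely is not listed at all, so compare the result with the lab topology handout.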

Task 4: Verify the Solution


You verify that all storage systems are repaired.

1. Reread the support request summary information in task 2.

2. Use the vSphere Client and remote consoles, as needed, to verify that the problem is
resolved.

3. Leave the vSphere Client open until you complete all storage troubleshooting problems.

4. Return to task 1 and run another break script.

Lab 15 Troubleshooting Cluster Problems

Objective and Tasks


Identify, diagnose, and resolve cluster problems:

1. Create a Cluster and Power Off VMs

2. Run the Break Script Break-8-1.ps1

3. Run a Break Script

4. Verify That the System Is Not Functioning Properly

5. Troubleshoot and Resolve the Problem

6. Verify the Solution

Task 1: Create a Cluster and Power Off VMs


You create a cluster in the lab environment.

This cluster is used by the break scripts. Without this cluster, the break scripts fail to complete.

1. Create a cluster called Lab Cluster and leave all features disabled.

2. Move sa-esxi-01.vclass.local and sa-esxi-02.vclass.local into the cluster.

3. Power off all VMs in the inventory before running any break scripts.

Task 2: Run the Break Script Break-8-1.ps1
You use PowerCLI to run the script called Break-8-1.ps1, which configures the cluster and affects
RAM disk use in the lab environment.

1. Double-click the PowerCLI icon on the student desktop system to start a PowerCLI session.

2. In the PowerCLI window, enter cd \Materials\Scripts\Mod6.

3. Enter the name of the break script.

.\Break-8-1.ps1
After the script completes, do not run another break script until you complete tasks 4
through 6.

IMPORTANT

You must start with the first break script, Break-8-1.ps1. After you run the first break script
and solve that problem, you can run the remaining break scripts in any order.
You need to run the .\Break-8-1.ps1 script only once.

4. Wait for the "You are ready to start the lab" message to appear.

5. Leave the PowerCLI window open for the next problem, skip task 3, and go to task 4.

Task 3: Run a Break Script


You use PowerCLI to run a break script to damage the lab environment in some way.

Several break scripts are provided to create cluster problems. Each script damages the cluster
configuration in the lab environment in a different way. You can run these break scripts in any
order, and you can choose which problems to resolve.

1. Double-click the PowerCLI icon on the student desktop system to start a PowerCLI session.

2. In the PowerCLI window, enter cd \Materials\Scripts\Mod6.

3. Enter the name of a break script.

For example, you enter .\Break-8-4.ps1 to run the first optional break script.

In the Difficulty column of the table, 1 signifies least difficult and 3 signifies most difficult to
resolve.

Break Script        Difficulty

.\Break-8-4.ps1     2
.\Break-8-6.ps1     2
.\Break-8-8.ps1     2
                    NOTE: Before running this script, move sa-esxi-03.vclass.local
                    into the cluster that you created (Lab Cluster).
.\Break-8-10.ps1    2

IMPORTANT

You can run the remaining break scripts in any order. After the break script completes, do
not run another break script until you complete tasks 4 through 6 for each cluster problem.
You must run the scripts one at a time.

4. Wait for the "You are ready to start the lab" message to appear.

5. Leave the PowerCLI window open for the next problem and go to task 4.

Task 4: Verify That the System Is Not Functioning Properly
You verify that the cluster configuration is damaged in your lab environment.

1. Use the support request summary information to verify that you see the symptoms reported
for your break script and that your lab environment is not working.

Break Script        Support Request

.\Break-8-1.ps1     A vSphere administrator reports that one of the hosts in the
                    inventory is experiencing issues because of RAM disk exhaustion.
                    The administrator did not state which host was experiencing the
                    problem or which RAM disk was full.

.\Break-8-4.ps1     A vSphere administrator reports that several issues related to
                    vSphere HA appear in the Issues pane of the vSphere Client.

.\Break-8-6.ps1     NOTE: Before running this script, move sa-esxi-03.vclass.local into
                    the cluster that you created (Lab Cluster).
                    A vSphere administrator cannot power on any VMs in the Test or
                    Production resource pools.

.\Break-8-8.ps1     A vSphere administrator reports that CPU use is not balanced across
                    hosts in the cluster.

.\Break-8-10.ps1    A vSphere administrator reports that CPU use is not balanced across
                    hosts in the cluster.

2. After you verify that the system is not working, go to task 5.

Task 5: Troubleshoot and Resolve the Problem


You troubleshoot and repair the problem with your configuration.

1. Use the available techniques and tools to troubleshoot and repair the problem.

• Lab topology handout, which contains important information about the network,
storage, host, and VM configurations

• Lecture manual for this course

• VM, vCenter Server, and ESXi host log files

• VMware knowledge base articles, available at http://kb.vmware.com

• Internet

2. After applying your resolution, go to task 6.
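
For the RAM disk exhaustion request, one possible starting point is to compare RAM disk usage on each host from an SSH session. The following is a minimal sketch and not part of the original lab. The helper name full_ramdisks and the 90 percent threshold are hypothetical, and the sketch assumes that vdf -h prints df-style rows with the usage percentage in the fifth column.

```shell
# Hypothetical helper: read df-style rows on stdin and print the name and
# usage of every row whose fifth column (Use%) is at or above the limit.
full_ramdisks() {
  limit="${1:-90}"
  awk -v limit="$limit" '
    $5 ~ /^[0-9]+%$/ {
      used = $5
      sub(/%/, "", used)
      if (used + 0 >= limit + 0) print $1, $5
    }
  '
}

# Run the live check only where vdf exists (an ESXi Shell or SSH session).
if command -v vdf >/dev/null 2>&1; then
  vdf -h | full_ramdisks 90
fi
```

Run the check on every host in the cluster, because the support request does not say which host is affected.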

Task 6: Verify the Solution
You verify that the cluster is repaired.

1. Reread the support request summary information in task 4.

2. Use the vSphere Client and remote consoles, as needed, to verify that the problem is
resolved.

3. Leave the vSphere Client open until you complete all cluster troubleshooting problems.

4. Return to task 3 and run another break script.

Lab 16 Resolving VM Power-On Problems

Objective and Tasks


Troubleshoot a VM that fails to power on:

1. Run a Break Script

2. Troubleshoot the Problem

3. Resolve the Problem

4. Verify the Solution

NOTE

For useful information about troubleshooting a VM that fails to power on, and if you need
help while performing the tasks in this lab, see VMware knowledge base article 2001005 at
https://kb.vmware.com/kb/2001005.

Although this knowledge base article covers many possible similar issues and solutions, the
solution that you require might not be listed.

Search for other articles that are specific to the error message that you receive when you try
to power on the VM and it fails. It is important to understand the various factors and errors
that can cause a VM to fail when powered on.

Task 1: Run a Break Script
You run a break script on the ESXi host on which Win-4 is located.

1. In the vSphere Client, verify that the Win-4 VM is on sa-esxi-01.vclass.local.

a. If necessary, migrate Win-4 to sa-esxi-01.vclass.local.

2. Use MTPuTTY to log in to sa-esxi-01.vclass.local.

3. Determine the datastore on which Win-4 is located.


vim-cmd vmsvc/getallvms
a. Record the VMID of Win-4. __________

b. Record the datastore on which Win-4 is located. __________

4. Change to the studentscripts directory, located in the sa-esxi-01-local datastore.

cd /vmfs/volumes/sa-esxi-01-local/studentscripts
5. Run the script3.sh script, where <datastore name> is the datastore on which
Win-4 is located.

./script3.sh /vmfs/volumes/<datastore name>/Win-4/Win-4.vmx


The script output looks like this example.

Script Running......
Powering off VM:

Script Complete: Power on the VM from vCenter


If the script returns with a message stating that the power off failed, Win-4 is already
powered off and you can ignore the message.

6. Using the vSphere Client, power on Win-4.

Win-4 should fail to power on.
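
Step 3 asks for the VMID and the datastore of Win-4, both of which appear in the vim-cmd vmsvc/getallvms output. The following is a minimal sketch that extracts them, not part of the original lab. The helper name vm_id_and_datastore is hypothetical, and the sketch assumes that the Vmid, Name, and File columns come first and that neither the VM name nor the datastore name contains spaces.

```shell
# Hypothetical helper: read `vim-cmd vmsvc/getallvms` output on stdin and
# print "<Vmid> <datastore>" for the VM whose name matches the first argument.
# Assumes the VM name and datastore name contain no spaces.
vm_id_and_datastore() {
  awk -v vm="$1" '
    $2 == vm {
      ds = $3
      gsub(/\[|\]/, "", ds)   # strip the brackets around the datastore name
      print $1, ds
    }
  '
}

# Run the live lookup only where vim-cmd exists (an ESXi Shell or SSH session).
if command -v vim-cmd >/dev/null 2>&1; then
  vim-cmd vmsvc/getallvms | vm_id_and_datastore Win-4
fi
```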

Task 2: Troubleshoot the Problem
You view and analyze error messages that occurred when the Win-4 VM failed to power on. You
view information in the vSphere Client and files on the sa-esxi-01.vclass.local host to determine
the root cause.

1. Find information in the vSphere Client that might give you hints about why the Win-4 VM did
not power on.

• What error messages are displayed?

• Are any alarms triggered?

• What tasks are initiated?

• What events are displayed?

2. Based on your observations, determine the potential causes of the problem.

• What are your initial thoughts as to what is causing the problem?

• Are there any other potential causes? If so, what are they?

• How can you verify your initial assumption of what is causing the problem?

3. Identify log files that might contain information about Win-4's failure to power on and view
them to find relevant information.

4. Identify the root cause of the problem.

Task 3: Resolve the Problem


Drawing on your problem analysis, you apply the resolution that is the most appropriate for the
problem.

1. List the ways to resolve the problem.

• Is there more than one way to resolve the problem? If so, list the potential resolutions
and explain why each resolution might work.

• Do any of these resolutions have a negative impact? If so, which ones and why?

2. Choose a resolution to implement.

3. Using the command line, apply the resolution that you selected.

Task 4: Verify the Solution
You use vSphere commands, instead of the vSphere Client, to verify that the problem is resolved
and that the Win-4 VM powers on successfully.

1. Use MTPuTTY to establish an SSH session with the ESXi host on which the Win-4 VM is
located.

2. Using the command line, power on Win-4.

3. Using the command line, verify that the Win-4 VM powers on successfully.

4. Verify that Win-4 also shows as powered on in the vSphere Client.
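
One possible way to carry out steps 2 and 3 is with the vim-cmd vmsvc/power.on and vim-cmd vmsvc/power.getstate commands. The following is a minimal sketch, not the only valid approach. The helper name is_powered_on and the VMID variable are hypothetical, and the sketch assumes that power.getstate prints a line beginning with "Powered on" for a running VM.

```shell
# Hypothetical helper: succeed when `vim-cmd vmsvc/power.getstate` output
# (read on stdin) reports that the VM is powered on.
is_powered_on() {
  grep -q '^Powered on'
}

# VMID is a placeholder; set it to the value recorded from
# `vim-cmd vmsvc/getallvms` before running the sketch on the ESXi host.
VMID="${VMID:-}"
if command -v vim-cmd >/dev/null 2>&1 && [ -n "$VMID" ]; then
  vim-cmd vmsvc/power.on "$VMID"
  if vim-cmd vmsvc/power.getstate "$VMID" | is_powered_on; then
    echo "VM $VMID is powered on"
  else
    echo "VM $VMID is not powered on"
  fi
fi
```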

Lab 17 Troubleshooting VM Problems

Objective and Tasks


Identify, diagnose, and resolve VM problems:

1. Run a Break Script

2. Verify That the System Is Not Functioning Properly

3. Troubleshoot and Resolve the Problem

4. Verify the Solution

Task 1: Run a Break Script
You use PowerCLI to run a break script to damage VMs in the lab environment.

Several break scripts are provided to create VM problems. Each script damages VMs in the lab
environment in a different way. You can run the break scripts in any order, and you can choose
which problems to resolve.

1. Double-click the PowerCLI icon on the student desktop system to start a PowerCLI session.

2. In the PowerCLI window, enter cd \Materials\Scripts\Mod7.

3. Enter the name of a break script.

For example, you enter .\Break-9-1.ps1 to run the first break script.

In the Difficulty column of the table, 1 signifies least difficult and 3 signifies most difficult to
resolve.

Break Script Difficulty

.\Break-9-1.ps1 2

.\Break-9-2.ps1 2

.\Break-9-6.ps1 2

.\Break-9-7.ps1 2

.\Break-9-8.ps1 3

.\Break-9-9.ps1 3

NOTE

After the break script completes, do not run another break script until you complete tasks 2
through 4 for each VM problem. You must run the scripts one at a time.

4. Wait for the "You are ready to start the lab" message to appear.

5. Leave the PowerCLI window open for the next problem and go to task 2.

Task 2: Verify That the System Is Not Functioning Properly
You verify that VMs are damaged in your lab environment.

1. Use the support request summary information to verify that you see the symptoms reported
for your break script and that your lab environment is not working.

Break Script        Support Request

.\Break-9-1.ps1     NOTE: If this script throws an error on first invocation, running it a
                    second time should resolve the issue.
                    An end user cannot power on the linux-a-06 VM.

.\Break-9-2.ps1     A vSphere administrator reports that the linux-a-02 VM is missing
                    from inventory.

.\Break-9-6.ps1     A vSphere administrator cannot mount the VMware Tools ISO on
                    the linux-a-08 VM.

.\Break-9-7.ps1     An end user cannot power on a VM. The user did not report which
                    VM failed to power on.

.\Break-9-8.ps1     NOTE: If this script throws an error on first invocation, running it a
                    second time should resolve the issue.
                    A vSphere administrator cannot mount the VMware Tools ISO into
                    any VM.

.\Break-9-9.ps1     An end user cannot power on the linux-a-03 VM.

2. After verifying that the system is not functioning properly, go to task 3.

Task 3: Troubleshoot and Resolve the Problem
You troubleshoot and resolve the problem with the VMs, drawing on relevant techniques and
tools.

1. Use the available techniques and tools to troubleshoot and resolve the problem.

• Lab topology handout, which contains important information about the network,
storage, host, and VM configurations

• Lecture manual for this course

• VM, vCenter Server, and ESXi host log files

• VMware knowledge base articles, available at http://kb.vmware.com

• Internet

2. Apply your resolution and go to task 4.

Task 4: Verify the Solution


You verify that the VM problem is resolved.

1. Reread the support request summary information in task 2.

2. Use the vSphere Client and remote consoles, as needed, to verify that the problem is
resolved.

3. Leave the vSphere Client open until you complete all the VM troubleshooting problems.

4. Return to task 1 and run another break script.

Lab 18 Restarting ESXi Management Agents

Objective and Tasks


Restart the ESXi services using the DCUI and the command line:

1. Restart Management Agents Using the DCUI

2. Restart Management Agents from the Command Line

NOTE

For useful information about restarting the ESXi management agents, see VMware knowledge
base article 1003490 at https://kb.vmware.com/kb/1003490. Review this reference before you
start the lab and use the information, as needed, while performing the lab tasks.

Task 1: Restart Management Agents Using the DCUI


You restart the management agents from the DCUI.

For troubleshooting purposes, it might be necessary to restart the management agents on your
ESXi host.

1. Log in to the DCUI for sa-esxi-01.vclass.local and open the pop-out console.

a. Click the CONSOLES tab to open a list of available consoles.

b. In the list of VMs, find the VM named SA-ESXi-01.

c. Click SA-ESXi-01 to switch to the console for SA-ESXi-01.

d. Press F2 and log in to the DCUI with user name root and password VMware1!

2. Restart the management agents on sa-esxi-01.
a. Use the down arrow key to select Troubleshooting Options.
b. From the Troubleshooting Mode Options menu, select Restart Management Agents.
The warning states that restarting the management agents disconnects all the remote
management software. You temporarily lose any SSH session opened to the ESXi host.
While the management services are restarting, you cannot access the ESXi host directly
from the vSphere Client. The ESXi host shows up as disconnected from the vCenter
Server system.
c. To proceed with restarting the management agents, press F11.
d. After the agents restart, press Enter.
e. Press ESC twice to log out of the DCUI.
f. Click the CONSOLES tab to open a list of available consoles.
g. Click STUDENT-A-01 to switch to the console for the student desktop.

Task 2: Restart Management Agents from the Command Line


You restart the management agents from the command line.

Management agents can be restarted from the local console or an SSH session.
1. Use MTPuTTY to log in to sa-esxi-01.vclass.local.
2. Enter the command to restart the management agents.
services.sh restart
Progress is output to the terminal and written to the /var/log/jumpstart-stdout.log file.

NOTE

The services.sh restart command restarts all the services on the ESXi host. This
command must be used with care because it can cause downtime in a production
environment.

3. Enter the command to restart the hostd management agent.

/etc/init.d/hostd restart
Instead of restarting all management agents at the same time, you can restart an individual
agent, such as hostd or vpxa.

4. Verify that the hostd agent restarted successfully by viewing hostd.log.

Hint: Search for BEGIN SERVICES in /var/log/hostd.log.
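
The hint in step 4 can be sketched as a small shell helper that prints the most recent matching line, so that its timestamp can be compared with the time of the restart. This is a minimal sketch, and the helper name last_hostd_start is hypothetical.

```shell
# Hypothetical helper: print the most recent "BEGIN SERVICES" line from a
# hostd log file; its timestamp should be later than the restart command.
last_hostd_start() {
  grep 'BEGIN SERVICES' "$1" | tail -n 1
}

# Run against the live log only where it exists (on the ESXi host).
if [ -r /var/log/hostd.log ]; then
  last_hostd_start /var/log/hostd.log
fi
```

If no line is printed, the log might have rotated since the last start, or the agent might not have restarted.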

Lab 19 Troubleshooting ESXi Host Disconnection Problems

Objective and Tasks


Troubleshoot an ESXi host disconnection problem and recover the ESXi host without causing any
VM downtime:

1. Run a Break Script

2. Troubleshoot the Problem

3. Resolve the Problem

4. Verify the Solution

NOTE

For useful information about troubleshooting an ESXi host in a nonresponding state, see
VMware knowledge base article 1003409 at https://kb.vmware.com/kb/1003409.

Task 1: Run a Break Script


You run a break script to introduce a problem on sa-esxi-01.vclass.local.

1. In the vSphere Client, verify that the linux-a-01 and Win-4 VMs are powered on.

2. Use MTPuTTY to log in to sa-esxi-01.vclass.local.

3. Navigate to the studentscripts directory in the sa-esxi-01-local datastore and locate
script2.sh.

cd /vmfs/volumes/sa-esxi-01-local/studentscripts
4. Run the script.

./script2.sh

Task 2: Troubleshoot the Problem
You analyze diagnostic messages in the vSphere Client and log files to identify the root cause of
the ESXi host disconnection problems.

1. Assess how the break script affected the environment.

• Does the vSphere Client indicate any problem with sa-esxi-01.vclass.local?

• If a problem exists, do other hosts have the same problem?

• What other tasks can you perform to assess the affected environment?

2. Drawing on the information that you gathered so far, determine what the root cause might
be.

3. Review the log files to find additional information that can aid you in identifying the root cause.

4. Conduct any additional tests and analysis to identify the root cause.

5. Identify the root cause of the problem.
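
As one of the additional tests, you might check whether the management agents on the host report that they are running. The following is a minimal sketch and is not presented as the cause of, or the fix for, this lab's problem. The helper name agent_state is hypothetical, and the sketch assumes that the status action of the hostd and vpxa init scripts prints a message containing "is running" when the agent is up.

```shell
# Hypothetical helper: classify the text printed by an init script's
# status action (read on stdin) as "running" or "not running".
agent_state() {
  if grep -q 'is running'; then
    echo "running"
  else
    echo "not running"
  fi
}

# Query each agent only where its init script exists (on the ESXi host).
for agent in hostd vpxa; do
  if [ -x "/etc/init.d/$agent" ]; then
    printf '%s: ' "$agent"
    "/etc/init.d/$agent" status 2>&1 | agent_state
  fi
done
```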

Task 3: Resolve the Problem


You apply the solution that you deem is the most appropriate based on your analysis.

1. List the ways to resolve the problem.

2. Choose a resolution to implement.

NOTE

For the purposes of this lab, do not resolve the problem by rebooting the host.

3. Using the command line, apply the resolution that you selected.

Task 4: Verify the Solution


You verify that the problem is resolved.

1. Verify that you can successfully log in to sa-esxi-01.vclass.local with VMware Host Client.

2. Log in to the vSphere Client and verify that sa-esxi-01 is operating normally.

• In the inventory, sa-esxi-01.vclass.local should not have a state of Not Responding.

• The VMs on sa-esxi-01.vclass.local should not have a state of Disconnected.

• The sa-esxi-01-local datastore should not have a state of Inaccessible.

Lab 20 Troubleshooting vCenter Server Connection Problems

Objective and Tasks


Troubleshoot a vCenter Server Appliance connection problem:

1. Run a Break Script

2. Troubleshoot the Problem

3. Resolve the Problem

4. Verify the Solution

NOTE

For useful information about troubleshooting vCenter Server connection problems, see these
references. Review these references before you start the lab and use the information, as
needed, while performing the lab tasks.

Reference                                             Link

Stopping, Starting, or Restarting VMware vCenter      https://kb.vmware.com/kb/2109887
Server Appliance 6.x & above services (2109887)

Stopping, starting, or restarting services in         https://kb.vmware.com/kb/2147152
vCenter Server Appliance 6.5 (2147152)

Platform Services Controller Services                 https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.psc.doc/GUID-FE4E0496-A14C-4331-A7D6-1200F7C068A5.html

Task 1: Run a Break Script
You run a break script to introduce a problem on the vCenter Server Appliance instance.

1. Log out of the vSphere Client.

2. Use MTPuTTY to log in to sa-vcsa-01.vclass.local.

3. Change to the Bash shell and navigate to the studentscripts directory.


cd /root/studentscripts
4. Run ./script4.sh.

5. When the script completes running, wait 15-20 seconds and log in to sa-vcsa-01.vclass.local
using the vSphere Client.

You should receive an error message.

Task 2: Troubleshoot the Problem


In the vSphere Client, you analyze diagnostic messages and log files to identify the root cause of
the connection problem.

1. Assess how the break script affected the environment.

What errors did you receive?

2. Review the log files for additional information that might help you to identify the root cause.

3. Perform any additional tests and analysis to determine the root cause.

4. Identify the root cause of the problem.
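
The referenced knowledge base articles describe the service-control command on vCenter Server Appliance. As one additional test, you might list the services that it reports as stopped. The following is a minimal sketch, not part of the original lab. The helper name stopped_services is hypothetical, and the sketch assumes that service-control --status --all prints section headings such as "Running:" and "Stopped:", each followed by an indented list of service names.

```shell
# Hypothetical helper: read `service-control --status --all` output on stdin
# and print one stopped service name per line.
stopped_services() {
  awk '
    /^Stopped:/   { in_stopped = 1; next }   # start of the Stopped section
    /^[A-Za-z]+:/ { in_stopped = 0 }         # any other section heading
    in_stopped    { for (i = 1; i <= NF; i++) print $i }
  '
}

# Run the live check only where service-control exists (the appliance Bash shell).
if command -v service-control >/dev/null 2>&1; then
  service-control --status --all | stopped_services
fi
```

Some services are stopped by default on a healthy appliance, so a name in this list is a lead to investigate rather than proof of the root cause.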

Task 3: Resolve the Problem
You apply the resolution that you deem is the most appropriate based on your analysis.

1. List the ways to resolve the problem.

2. Choose a resolution to implement.

NOTE

For the purposes of this lab, do not resolve the problem by rebooting the system.

3. Using the command line, apply your resolution.

Task 4: Verify the Solution


You verify that the problem is resolved.

1. Verify that you can successfully log in to sa-vcsa-01.vclass.local using the vSphere Client.

Lab 21 Troubleshooting vCenter Server and ESXi Host Problems

Objective and Tasks


1. Run a Break Script

2. Verify That the System Is Not Functioning Properly

3. Troubleshoot and Resolve the Problem

4. Verify the Solution

Task 1: Run a Break Script
You use PowerCLI to run a break script to damage your vCenter Server configuration or ESXi
host configuration in the lab environment.

Several break scripts are provided to create vCenter Server configuration and ESXi host
configuration problems. Each script damages the configuration in the lab environment in a
different way. You can run the break scripts in any order. You can choose which problems to
resolve.

1. On the student desktop, double-click the PowerCLI icon.

2. In the PowerCLI window, enter cd \Materials\Scripts\Mod8.

3. Enter the name of a break script.

For example, you enter .\Break-11-1.ps1 to run the first break script.

In the Difficulty column, 1 = least difficult to resolve, and 3 = most difficult to resolve.

Break Script        Difficulty

.\Break-11-1.ps1    3
.\Break-11-2.ps1    2
.\Break-11-5.ps1    2
.\Break-11-7.ps1    3
.\Break-11-11.ps1   1

NOTE

After the break script completes, do not run another break script until you complete tasks 2
through 4 for each problem. You must run the scripts one at a time.

4. Wait until the "You are ready to start the lab" message appears.

5. Leave the PowerCLI window open for the next problem.

Task 2: Verify That the System Is Not Functioning Properly
You verify that an ESXi host or vCenter Server configuration is damaged in your lab environment.

1. Using the support request summary information, verify that the symptoms reported for your
break script occur and that your lab environment is not working.

Break Script        Support Request

.\Break-11-1.ps1    A vSphere administrator reports that the inventory in the
                    vSphere Client is empty.

.\Break-11-2.ps1    NOTE: Power off all VMs in the inventory before running this
                    script.
                    A vSphere administrator cannot use SSH or the DCUI to
                    access an ESXi host. The administrator did not report which
                    host had the problem.

.\Break-11-5.ps1    A vSphere administrator notices that the size of the vCenter
                    Server log files is rapidly expanding and the logs are rotating
                    quickly. These events make troubleshooting difficult.

.\Break-11-7.ps1    A vSphere administrator logged out of the vSphere Client
                    before going to lunch. When the administrator returns and
                    logs back in, the inventory is empty.

.\Break-11-11.ps1   A vSphere administrator cannot log in to the vSphere Client.

Task 3: Troubleshoot and Resolve the Problem


You troubleshoot and resolve the problem with your configuration.

1. Use the available techniques and tools to troubleshoot and resolve the problem.

• Lab topology handout, which provides important information about the network,
storage, host, and VM configurations

• Lecture manual for this course

• VM, vCenter Server, and ESXi host log files

• VMware knowledge base articles, available at http://kb.vmware.com

• Internet

2. Apply your resolution.

Task 4: Verify the Solution
You verify that the vCenter Server and ESXi host configuration problem is resolved.

1. Reread the support request summary information in task 2.

2. Use the vSphere Client and VM web console, as needed, to verify that the problem is
resolved.

3. Leave the vSphere Client open until you complete all vCenter Server and ESXi host
troubleshooting problems.

4. Return to task 1 and run another break script.

Lab 22 Appendix: Troubleshooting Network Communication Failures

Troubleshooting Flowchart
The flowchart presents a logical sequence for troubleshooting failures related to network
communications.

Troubleshooting Tasks
To troubleshoot network communication failures, you might perform the following tasks:

1. Verify the IP Configuration

2. Verify the VLAN Configuration

3. Verify the Speed, Duplex, or MTU Configuration

4. Verify the Uplink Configuration

5. Verify the Teaming Configuration

6. Verify the Network Link Status

7. Investigate a Host Failure

8. Investigate a Network Failure

9. Investigate a Communications or Port Failure

Task 1: Verify the IP Configuration
A specific host (ESXi host, VM, or vCenter Server) on a specific network seems to have a
problem.

1. Verify that the IP address and subnet mask of the VMkernel ports (ESXi hosts) or assigned
NICs (vCenter Server system, VM) are correct.

2. Verify that the default gateway of the VMkernel ports (ESXi hosts) or assigned NICs
(vCenter Server system, VM) is correct.

3. Verify that the DNS settings of the VMkernel ports (ESXi hosts) or assigned NICs (vCenter
Server system, VM) are correct.

4. Verify that observed IP ranges match the expected IP address settings.

a. Select Hosts > specific_host > Configure > Physical adapters.
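
For an ESXi host, the address, gateway, and DNS checks in this task can also be made from an SSH session with ESXCLI. The following is a minimal sketch. The helper name vmk_ipv4_summary is hypothetical, and the sketch assumes that esxcli network ip interface ipv4 get prints two header lines followed by one row per VMkernel port, with the name, IPv4 address, and netmask in the first three columns.

```shell
# Hypothetical helper: read `esxcli network ip interface ipv4 get` output on
# stdin and print "<vmk> <address> <netmask>" for each VMkernel port.
vmk_ipv4_summary() {
  awk 'NR > 2 { print $1, $2, $3 }'
}

# Run the live checks only where esxcli exists (an ESXi Shell or SSH session).
if command -v esxcli >/dev/null 2>&1; then
  esxcli network ip interface ipv4 get | vmk_ipv4_summary
  esxcli network ip route ipv4 list      # default gateway and routes
  esxcli network ip dns server list      # configured DNS servers
fi
```

Compare each value with the lab topology handout or your own network documentation.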

Task 2: Verify the VLAN Configuration


A VLAN configuration error on a specific network causes network connectivity to fail.

1. In the vSphere Client, verify that the VLAN configuration of any distributed switch is correct
on the vCenter Server system.

a. Select Networking > distributed_switch > port_group > Configure > Edit Settings >
VLAN.

2. Verify that the VLAN configuration of any standard switch is correct on every ESXi host.

a. Select Hosts and Clusters > ESXi_host > Configure > Virtual Switches > virtual_switch >
port_group > ... > View Settings > Properties > VLAN ID.
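
For standard switches, the VLAN ID of every port group on a host can also be listed from an SSH session. The following is a minimal sketch that covers standard switches only, not distributed switches. The helper name portgroup_vlans is hypothetical, and the sketch assumes that esxcli network vswitch standard portgroup list prints two header lines and that the last three columns are the virtual switch, the active client count, and the VLAN ID. Port group names can contain spaces. Virtual switch names are assumed not to.

```shell
# Hypothetical helper: read `esxcli network vswitch standard portgroup list`
# output on stdin and print "<port group>: vSwitch=<switch> VLAN=<id>".
portgroup_vlans() {
  awk 'NR > 2 && NF >= 4 {
    name = $1
    for (i = 2; i <= NF - 3; i++) name = name " " $i   # rejoin multiword names
    print name ": vSwitch=" $(NF - 2) " VLAN=" $NF
  }'
}

# Run the live check only where esxcli exists (an ESXi Shell or SSH session).
if command -v esxcli >/dev/null 2>&1; then
  esxcli network vswitch standard portgroup list | portgroup_vlans
fi
```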

Task 3: Verify the Speed, Duplex, or MTU Configuration


A speed, duplex, or MTU configuration error on a specific network causes network connectivity
to fail.

1. In the vSphere Client, verify that the hardware configuration for speed, duplex, and MTU of
any physical adapters is correct on every ESXi host.

a. Select Hosts > ESXi_host > Configure > Networking > Physical adapters.
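
An MTU mismatch along the path can be tested from a host by sending a ping that must not be fragmented. The largest ICMP payload that fits in one IPv4 packet is the MTU minus 28 bytes (20 bytes of IPv4 header plus 8 bytes of ICMP header). The following is a minimal sketch. The helper name icmp_payload_for_mtu and the VMK and TARGET variables are hypothetical placeholders for a VMkernel interface and a peer address.

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# VMK and TARGET are placeholders, for example VMK=vmk1 and a peer IP address.
# vmkping -d sets the do-not-fragment bit and -s sets the payload size.
if command -v vmkping >/dev/null 2>&1 && [ -n "${VMK:-}" ] && [ -n "${TARGET:-}" ]; then
  vmkping -I "$VMK" -d -s "$(icmp_payload_for_mtu 9000)" "$TARGET"
fi
```

If this ping fails while a ping with a smaller payload succeeds, some device on the path is probably using a smaller MTU than expected.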

Task 4: Verify the Uplink Configuration
An uplink configuration error on a specific host causes network connectivity to fail.

The correct uplink must be connected to the correct virtual switch on all ESXi hosts that are
connected to a standard switch or a distributed switch. VMs must be connected to the correct
port group.

1. On an ESXi host, verify that the uplink configuration is correct.

a. In the vSphere Client, select Networking > specific_switch > Configure > Topology.

2. On a VM, verify that the correct network is configured.

a. In the vSphere Client, select the VM and select VM Hardware > Network adapter.

Task 5: Verify the Teaming Configuration


A teaming and failover configuration error on a specific host causes network connectivity to fail.

The correct teaming and failover configuration must be set on every ESXi host connected to a
standard switch or a distributed switch.

1. In the vSphere Client, verify that the teaming and failover configuration is correct by
selecting Networking > switch > port_group > Edit Settings > Teaming and failover.

Task 6: Verify the Network Link Status


A link-down error on a specific host causes network connectivity to fail.

1. For an ESXi host, verify that the hardware link is online.

a. On each host, run the command to verify that link status is up on all network links.

esxcli -s <specific host> network nic list


2. For a VM, verify that the correct network is in a connected state.

a. In the vSphere Client, select the VM and select VM Hardware > Network adapter.
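
On a host with many adapters, the NIC list from step 1 can be filtered so that only links that are not up are shown. The following is a minimal sketch. The helper name down_links is hypothetical, and the sketch assumes that esxcli network nic list prints two header lines and that the link status is the fifth whitespace-separated column of each row.

```shell
# Hypothetical helper: read `esxcli network nic list` output on stdin and
# print the name and link status of every NIC whose link is not Up.
down_links() {
  awk 'NR > 2 && $5 != "Up" { print $1, $5 }'
}

# Run the live check only where esxcli exists (an ESXi Shell or SSH session).
if command -v esxcli >/dev/null 2>&1; then
  esxcli network nic list | down_links
fi
```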

Task 7: Investigate a Host Failure
One host (ESXi, VM, or vCenter Server) seems to have a problem on all networks.

1. Verify that all other hosts can communicate.

a. Use ping and other communication tools to verify normal communications everywhere
else.

2. Verify all local network configuration settings on this specific host from inside the host for
each network device.

a. Use ESXCLI commands and other tools, such as ping.

b. Verify DNS, gateway, subnet masks, and firewall settings for each network device in
use.

c. Verify that all network devices are correctly identified.

A correct configuration on eth0 does not help if you are using eth1.

3. From the vSphere Client, verify all network configuration settings that are local to this
specific host on the network.

a. Verify VMkernel settings (if applicable), port settings, port group settings, NSX firewalls,
NSX routing, MTU settings, and so on.

b. Verify that the network device for this host is set to active (instead of standby or
unused).

4. Review logs for any indication of recent configuration changes.

If you can, identify a specific time when communication failed.

5. Review logs for any indication of traffic overloads or DoS attacks on this specific host.

Task 8: Investigate a Network Failure


A specific network seems to have a problem on multiple hosts (ESXi, VM, or vCenter Server).
The outage is limited to one network.

1. Verify that all hosts can communicate normally on other networks.

a. Use ping and other communication tools to verify normal communications everywhere
else.

2. If a host is connected only to this network, attach it to a different network.

a. Verify that the host can communicate.

If not, a network-specific problem exists.

3. Verify whether a routing or gateway problem exists.

Do local network subnet communications work?

4. Verify all network configuration settings specific to this network, starting with virtual
hardware and then physical hardware (if applicable).

Configuration settings include VMkernel (if applicable) settings, port settings, port group
settings, NSX firewalls, NSX routing, MTU settings, and so on.

Is the network device for this host set to active (instead of standby or unused) on this
configuration?

5. Review logs for any indication of traffic overloads or DoS attacks on this specific network.

Task 9: Investigate a Communications or Port Failure


Network communications of a specific type or to a specific port seem to have a problem.

1. Verify that all hosts can communicate normally with other protocols.

If possible, test communications on a different TCP or UDP port.

2. Examine all network firewall configuration settings that are specific to this network.

3. Examine all firewall configuration settings within each host.

4. Review logs for any indication of recent configuration changes.

Lab 23 Appendix: Troubleshooting
Storage Failures

Troubleshooting Flowchart
The flowchart presents a logical sequence for troubleshooting storage failures.

Troubleshooting Tasks
To troubleshoot storage failures, you might perform the following tasks:

1. Follow Storage Troubleshooting Procedures

2. Investigate a VM Disk Failure

3. Investigate an I/O Overload Problem

4. Investigate an iSCSI Storage Failure

5. Investigate an NFS Storage Failure

6. Investigate a Fibre Channel Storage Connectivity Failure

7. Investigate an FCoE Failure

8. Troubleshoot a Path Failure

9. Troubleshoot a Local Disk Failure

10. Troubleshoot a Storage Array Failure

11. Troubleshoot a Storage Site Disaster

Task 1: Follow Storage Troubleshooting Procedures


General troubleshooting procedures might help you to identify and resolve storage problems that
are difficult to diagnose.

If you cannot determine the specific problem, use these procedures.

1. Verify that all individual ESXi hosts can see all LUNs.

esxcli storage core path list


2. Attempt to run a rescan command on any ESXi host that cannot see the datastore.

esxcli storage core adapter rescan -A vmhbaX


3. Verify the capacity of datastores from each ESXi host.

df -h | grep VMFS

All datastores should be visible and show free space.

4. If storage is working but performing poorly, look for overloads from specific ESXi hosts,
VMs, and so on.

a. Run the esxtop or resxtop commands.

b. If network-based storage (iSCSI, NFS, FCoE, and so on) is used, follow network
troubleshooting procedures to identify bandwidth overloads or other network configuration
problems.

Task 2: Investigate a VM Disk Failure
A single VM has lost connection to one or more disk devices.

1. Verify that all the VMs that use the same datastore are operating normally.

2. If other VMs on the same datastore are experiencing problems, go to Datastore Failure.

3. View the vmware.log file to find error messages related to this VM.

4. In the vSphere Client datastore browser, locate all files for the VM.

a. Verify that no files are missing, especially VM disk descriptor files, VM configuration files,
and VM disk files (.vmdk and .vmx).

5. Verify that the VM configuration correctly identifies the VM disk files and has the correct
path to the VM disk files.

6. Make a backup copy of all VM files.

7. Attempt to edit the disk descriptor file with a text editor to resolve any CID mismatch errors.

8. Use vmkfstools to determine whether a VM disk file is locked by a single ESXi host.

If a VM disk file is locked, attempt to unlock the file.

Task 3: Investigate an I/O Overload Problem


A storage device is working but performing poorly. An I/O overload might be degrading
performance.

1. If storage is working but performing poorly, look for overloads from specific ESXi hosts,
VMs, and so on.

a. Run the esxtop or resxtop commands.

2. If network-based storage (iSCSI, NFS, FCoE, and so on) is used, follow network
troubleshooting procedures to identify bandwidth overloads or other network configuration
problems.

Task 4: Investigate an iSCSI Storage Failure
A network storage device connected by the iSCSI protocol is offline.

1. Verify that the iSCSI target array is supported and that it presents the LUN to the ESXi host.

2. Verify that the iSCSI storage adapter on each ESXi host is configured correctly.

a. Review the configuration of parameters.

• iSCSI target name / IP address

• Port

• Authentication

• Port binding

• iSCSI initiator name

3. Verify that the IP networking components on each ESXi host are configured correctly.

• VMkernel port

• TCP/IP network configuration

• Uplink

4. Verify that you can ping the VMkernel port from other devices on the IP storage network.

5. Verify that the network port group is configured correctly.

a. Review the configuration of parameters.

• VLAN

• Teaming

• Traffic shaping

• Elastic port allocation

• Port blocking

6. Verify that path status is active on iSCSI storage adapters.

7. Verify that port group policy is compliant on iSCSI storage adapters.

8. Verify that the virtual switch is configured correctly, including MTU and filtering settings.

9. Verify that iSCSI network traffic is not fighting congestion from other types of IP traffic on
the network.

An isolated storage network is recommended, and you might need to isolate iSCSI from NFS
traffic.

10. Verify that IP communications between the affected ESXi hosts and the target array are
working.

11. Verify that no firewalls are blocking TCP 3260 between the affected ESXi hosts and the
target array.

a. Run the command to verify communication between the ESXi host and the iSCSI array.

nc -z IPaddr 3260

12. Verify that the physical storage hardware on the iSCSI target array is functioning correctly.

13. Use the VMware vSphere On-disk Metadata Analyzer (VOMA) to verify VMFS metadata
consistency.
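
The TCP 3260 reachability test in step 11 can also be sketched in Python, for example from a management workstation on the storage network. This is an assumption-laden illustration: tcp_port_open is a hypothetical helper, and the target address in the usage line is a placeholder rather than a value from this lab.

```python
import socket


def tcp_port_open(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection completes a full TCP handshake, much like `nc -z`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Placeholder address for the iSCSI target array.
print(tcp_port_open("192.0.2.10"))
```

A True result from a workstation does not prove that the ESXi host itself can reach the array, so the nc test run on the host remains the more direct check.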

Task 5: Investigate an NFS Storage Failure


A network storage device accessed through the NFS protocol is offline.

1. Verify that the IP networking components on each ESXi host are correctly configured.

• VMkernel port

• TCP/IP network configuration

• Uplink

2. Verify that NFS storage is configured correctly.

3. Run ESXCLI and other commands from an SSH session or PowerCLI to verify all local
network configuration settings on the specific host from inside the host.

a. Verify DNS, gateways, subnet masks, firewall, and other settings for each network
device in use.

4. Verify that all network devices are configured correctly.

a. On ESXi hosts, verify that you are troubleshooting the correct virtual switch, port group,
VMkernel address, and uplink.

A correct configuration on eth0 does not help if you are using eth1 for storage.

5. Verify that network communications are correctly configured on the NFS storage provider.

a. Verify folder names, TCP ports, security, and so on.

6. Verify that the NFS storage provider and the ESXi hosts are consistent on the NFS protocol
that is used (v3, v4.1, and so on).

7. If Kerberos authentication is used, verify that it is configured correctly.

8. Verify that time is synchronized between the ESXi host and the NFS storage provider.

Task 6: Investigate a Fibre Channel Storage Connectivity Failure
A connectivity problem occurs with Fibre Channel storage. Fibre Channel storage arrays are
connected with vendor-specific dedicated hardware.

1. Determine whether the shared storage array is inaccessible to all ESXi hosts or only to a
single host.

If the problem is specific to a single ESXi host, it might be either a configuration error on the
storage adapter or a hardware failure.

2. Attempt a rescan of the storage adapter.

3. Verify that the storage adapter on the ESXi host is configured correctly.

4. Use the vSphere On-disk Metadata Analyzer (VOMA) to verify that the VMFS metadata is
consistent.

5. Verify that the fibre switch zoning configuration permits the ESXi host to see the storage
array.

6. If your configuration requires an ESXi host reboot after the zone set change on the FC-SAN
array, reboot your ESXi host.

7. For more information about troubleshooting Fibre Channel storage connectivity, see
VMware knowledge base article 1003680 at https://kb.vmware.com/s/article/1003680.

8. Contact the FC SAN administrator.

Task 7: Investigate an FCoE Failure


Fibre Channel over Ethernet (FCoE) connects a Fibre Channel storage array over Ethernet,
using the Fibre Channel SCSI protocol instead of TCP/IP.

1. Verify that Ethernet layer 2 connectivity between the ESXi host and FCoE storage array is
good.

2. Verify that the storage adapter on the ESXi host is configured correctly.

3. Attempt a rescan of the storage adapter.

4. Use the vSphere On-disk Metadata Analyzer (VOMA) to verify that the VMFS metadata is
consistent.

5. Contact the FCoE SAN administrator.

Task 8: Troubleshoot a Path Failure
Path failure means one or more paths between the ESXi host and the storage device or storage
array are down. Either a device is permanently unavailable to an ESXi host (permanent device
loss or PDL) or all paths between the ESXi host and the storage device or array are down (all
paths down or APD). An APD condition is expected to be temporary.

Possible causes of PDL:

• The device is unintentionally removed.

• The device’s unique ID changes.

• The device experiences an unrecoverable hardware error.

• The device runs out of space, causing it to become inaccessible.

Symptoms:

• Operational state is Lost Communication.

• All paths appear as dead.

• Datastores are unavailable.

1. Verify that all other hosts can communicate normally to the device.

a. Use ping and other communication tools to verify normal communications everywhere
else.

2. Run ESXCLI commands to perform initial checks on the LUN paths.

• esxcli storage core path list

• esxcli storage nmp device list


• esxcli storage core adapter rescan -A vmhbaX

3. Examine all local network configuration settings on this specific host from inside the host.

Settings include DNS, gateways, subnet masks, and firewall settings for each network device
in use.

a. Verify that all network devices are correctly identified.

A correct configuration on eth0 is not going to help if you are using eth1.

4. If multiple hosts lose communications with the same storage array, verify configuration
settings on the storage array.

Also verify that no network hardware between the hosts and storage array is misconfigured
(firewalls, VLANs, port blocking, MTU, and so on).

5. Verify that teaming and failover on the required NICs are correctly configured and that the
correct uplink is active.

6. Verify that default APD handling is enabled on the ESXi host with the global setting
Misc.APDHandlingEnable = 1.

Also verify that the time-out setting is long enough. Misc.APDTimeout = 140 is
recommended (140 seconds).

7. Verify that the path selection policy is correctly configured on the storage adapter.

8. For NFS devices, verify that the correct NFS protocol (NFS 3 or NFS 4.1) is used.

9. After configuration problems are solved, you might need to reattach or remount the
datastore.
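
The two APD settings from step 6 can be compared against the values named there with a small Python check. This is a sketch only: check_apd_settings is a hypothetical helper, the settings are assumed to have been collected into a dictionary by some other means, and treating 140 seconds as a minimum is an interpretation of "long enough".

```python
def check_apd_settings(settings, min_timeout=140):
    """Compare APD advanced settings with the values named in step 6."""
    findings = []
    if settings.get("Misc.APDHandlingEnable") != 1:
        findings.append("Misc.APDHandlingEnable should be 1")
    timeout = settings.get("Misc.APDTimeout")
    # A missing value is reported the same way as a value that is too short.
    if timeout is None or timeout < min_timeout:
        findings.append(f"Misc.APDTimeout should be at least {min_timeout} seconds")
    return findings


# Hypothetical values read from one ESXi host.
print(check_apd_settings({"Misc.APDHandlingEnable": 1, "Misc.APDTimeout": 140}))
```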

Task 9: Troubleshoot a Local Disk Failure


Disks that are locally attached to ESXi hosts can be used for storage, especially in vSAN
configurations.

1. Verify that your storage adapter is correctly configured.

2. Verify that local storage is available and functioning using ESXCLI commands.

A local disk failure might require reformatting the disk.

A local disk failure might be caused by a local hardware failure on the ESXi host.

Task 10: Troubleshoot a Storage Array Failure


You encounter a problem with storage that is part of a storage array.

1. If the storage array is an FC-SAN, run ESXCLI commands.

• Run the command to list adapter attributes.

esxcli storage san fc list


• Run the command to show adapter statistics.

esxcli storage san fc stats get


• Run the command to reset a specific adapter.

esxcli storage san fc reset -A vmhbaX


• Run the command to retrieve events for a specific adapter.

esxcli storage san fc events get -A vmhbaX


2. Contact your FC-SAN storage administrator.

Task 11: Troubleshoot a Storage Site Disaster
A disaster occurs at a storage site. All storage that is hosted at a specific physical site is offline.

1. Determine the scope of the problem by asking key questions.

• Is the disaster caused by a natural event (storm, fire, natural disaster)?

• Is the physical building intact?

• Is the outage caused by an internal physical problem, such as flooding because of
plumbing problems, high temperature caused by HVAC failure, or a construction error
that causes the power or communication lines to break?

• Is the problem a power outage?

• Is the problem a communications outage?

• Can the problem be resolved in a relatively short time?

2. Create a service recovery estimate for how long the facility will be offline.

3. Determine if storage hardware is physically damaged and must be physically replaced with
data restored from backups.

4. Drawing on the scope of the problem and the service recovery estimate, determine which
form of the disaster recovery or business continuity plan should be implemented.

Lab 24 Appendix: Troubleshooting
Cluster Failures

Troubleshooting Flowchart
The flowchart presents a logical sequence for troubleshooting cluster failures.

Troubleshooting Tasks
To troubleshoot cluster failures, you might perform the following tasks:

1. Troubleshoot a vSphere vMotion Migration Failure

2. Investigate a Management Agent Problem

3. Reset Migrate Enabled and Verify the Result

4. Investigate an HA Configuration Problem

5. Investigate an HA Resources Problem

6. Investigate Why DRS Never Migrates

7. Investigate Why DRS Rarely Migrates

8. Investigate DRS Erratic Behavior

Task 1: Troubleshoot a vSphere vMotion Migration Failure


A vSphere vMotion migration fails completely. The status bar does not report any progress.

1. Verify that the network configuration on the management network is correct by reviewing
the settings.

• VMkernel port settings

• IP address

• Subnet mask

• Gateway

• Uplink connections

• VLAN settings

2. Verify that the network configuration on the vMotion network is correct by reviewing the
settings.

• VMkernel port settings

• IP address

• Subnet mask

• Gateway

• Uplink connections

• VLAN settings

a. Verify that network bandwidth is sufficient to support vSphere vMotion.

3. Verify that name resolution is working on all ESXi hosts and vCenter Server systems.

4. Verify that time is synchronized across the environment (ESXi hosts and vCenter Server
systems).

5. Verify that enough free disk space is available on the target host. (This check applies only
during storage migration.)

6. Verify that the reservation requirements (if any) on the VM can be met on the target host.

7. Verify that the log.rotateSize parameter is not set too low for the VM.

8. Restart the hostd and vpxa management agents on both ESXi hosts.

Task 2: Investigate a Management Agent Problem


A management agent problem is often indicated by a vSphere vMotion migration failure at 15% or
less.

1. Verify that the network configuration on the management network is correct by reviewing
the settings.

• VMkernel port settings

• IP address

• Subnet mask

• Gateway

• Uplink connections

• VLAN settings

2. Verify that name resolution is working on all ESXi hosts and vCenter Server systems.

3. Verify that time is synchronized across the environment (ESXi hosts and vCenter Server
systems).

4. Verify that the log.rotateSize parameter is not set too low for the VM.

5. Restart the hostd and vpxa management agents on both ESXi hosts.

Task 3: Reset Migrate Enabled and Verify the Result
A reset of the Migrate.Enabled parameter can solve some vSphere vMotion migration
problems.

1. Reset the Migrate.Enabled parameter on the source ESXi host.

a. In the source ESXi host, change the Migrate.Enabled setting to 0.


b. Save the Advanced System Settings.

c. Change the Migrate.Enabled setting back to 1.

d. Save the Advanced System Settings.

2. Repeat these steps for the target ESXi host.

Task 4: Investigate an HA Configuration Problem


High availability (HA) cannot be enabled on the cluster.

1. Verify that the Fault Domain Manager (FDM) agent is installed on all ESXi hosts.

a. Disable and then reenable HA to attempt to reinstall the FDM agent.

Poor network bandwidth can prevent the FDM agent from installing. Insufficient disk
space in the /root directory can prevent the FDM agent from installing.

b. Verify that the /etc/opt/vmware/fdm directory exists and has the correct files
installed.

2. Verify that the FDM is running.

a. If the FDM is not running, restart it on an ESXi host after you determine what caused the
failure.

3. Verify that all ESXi hosts are connected to the vCenter Server system.

a. Test connectivity from the ESXi host back to the vCenter Server system using ping.

4. Verify that all ESXi hosts have static addresses.

a. If you use DHCP, verify that the IP address for each host persists across reboots.

Although HA supports both IPv4 and IPv6, ensure that all HA network traffic is either
one protocol or the other, not a mixture of both protocols.

5. Verify that all hosts have at least one heartbeat network in common (management network
or vSAN network, if vSAN was first enabled on the cluster).

Best practice is to have at least two management networks in common.

6. Verify that at least one heartbeat datastore is accessible by all hosts.


Best practice is to have two heartbeat datastores. Heartbeat datastores are not used in
vSAN configurations.

7. Verify that you have connectivity to the heartbeat datastore LUNs.

8. If using vSAN, verify that das.isolationAddress0 and das.useDefaultIsolationAddress
are configured so that HA uses the vSAN network as the HA network.

In a vSAN cluster, HA should not use the management network as the HA network.

9. To ensure that any VM can run on any host in the cluster, provide all hosts with access to the
same VM networks and datastores.

10. Verify that you have a minimum of two ESXi hosts in a vSphere HA cluster.

For a vSAN cluster, a minimum of three ESXi hosts is required.

11. Verify that VMware Tools is installed.

If VMware Tools is not installed, VM monitoring does not work.

12. Verify that all hosts are licensed for vSphere HA.

vSphere HA supports IPv4 and IPv6. However, a cluster that mixes both of these protocol
versions is more likely to result in a network partition.

13. Verify that all ESXi hosts can access the same networks.

14. Verify that all ESXi hosts can access the same shared datastores.
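
Two of the checks above, the minimum host count in step 10 and the advice not to mix IPv4 and IPv6, lend themselves to a quick Python sketch. check_ha_hosts is a hypothetical helper, and the addresses in the usage line are made up for illustration.

```python
import ipaddress


def check_ha_hosts(host_addresses, vsan=False):
    """Return problems with host count and IP protocol mix for an HA cluster."""
    problems = []
    # Two hosts for a vSphere HA cluster, three for a vSAN cluster.
    minimum = 3 if vsan else 2
    if len(host_addresses) < minimum:
        problems.append(
            f"need at least {minimum} hosts, found {len(host_addresses)}")
    # Collect the IP version (4 or 6) of every host address.
    versions = {ipaddress.ip_address(addr).version for addr in host_addresses}
    if len(versions) > 1:
        problems.append("hosts mix IPv4 and IPv6 addresses")
    return problems


# Hypothetical management addresses for a two-host cluster.
print(check_ha_hosts(["172.20.10.51", "fd00::52"]))
```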

Task 5: Investigate an HA Resources Problem


Insufficient resources in the HA cluster prevent VMs from powering on.

1. Verify that the cluster has sufficient physical resources.

2. Verify VM reservations.

One or more VMs might have excessive reservations. Check VM bandwidth reservations.

3. Verify that the vSphere HA admission control policy is configured correctly.

Task 6: Investigate Why DRS Never Migrates
DRS does not migrate VMs, even when the cluster is badly imbalanced.

1. Verify that DRS is not in manual mode.

2. Verify that the DRS automation level is not set too low.

3. Determine whether DRS affinity or anti-affinity rules might be preventing migration.

4. Verify that you can manually migrate running VMs.

If not, verify the configuration of the vMotion network.

5. Determine whether all VMs are using local host resources, which would prevent migration.

Task 7: Investigate Why DRS Rarely Migrates


DRS rarely migrates VMs, even when the cluster is badly imbalanced.

1. Verify that the VM loads and resource requirements are correct.

2. Verify that the DRS automation level is not set too low.

3. Determine whether DRS affinity or anti-affinity rules might be preventing migration.

4. Determine whether some VMs are using local host resources, which would limit the VMs
that DRS can migrate.

Task 8: Investigate DRS Erratic Behavior


DRS constantly migrates VMs, even when the cluster is relatively balanced.

1. Verify that VM loads and resource requirements are correct.

2. Verify that VM loads are not erratic in their resource demands.

VMs might be incorrectly configured (insufficient resources, badly designed applications,
operating system problems).

3. Verify that the DRS automation level is not set too high.

Lab 25 Appendix: Troubleshooting
Virtual Machine Failures

Troubleshooting Flowchart
The flowchart presents a logical sequence for troubleshooting VM failures.

Troubleshooting Tasks
To troubleshoot VM failures, you might perform the following tasks:

1. Investigate a CID Problem

2. Investigate a Quiesced VM Problem

3. Investigate a General Snapshot Failure

4. Investigate a Power-On Failure

5. Investigate a VM That Shows an Invalid or Orphaned State

6. Investigate a VMware Tools Installation Failure

Task 1: Investigate a CID Problem


An attempted snapshot fails with a CID mismatch error. This failure can be caused by an
interruption to a vSphere vMotion migration or by a VMware software error.

1. Examine the vmware.log file associated with the VM to identify the specific disk chain
that is affected.

For multiple .vmdk files, the CID and parentCID that are referenced in the files should
match.

2. Back up the higher-numbered .vmdk file that is incorrect.

3. Manually edit the higher-numbered .vmdk file.

a. Change the parentCID= entry to match the correct CID.

4. Run the vmkfstools command to verify that the CID is corrected.
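
The rule behind these steps is that each descriptor's parentCID must equal the CID of the disk before it in the chain. A minimal Python sketch of that comparison follows. It assumes that the descriptor files have already been copied off the datastore and read as text. read_cids and find_cid_mismatches are hypothetical helpers, and the CID values are invented.

```python
import re


def read_cids(descriptor_text):
    """Extract the CID and parentCID values from VMDK descriptor text."""
    cid = re.search(r"^CID=(\w+)", descriptor_text, re.MULTILINE)
    parent = re.search(r"^parentCID=(\w+)", descriptor_text, re.MULTILINE)
    if cid is None or parent is None:
        raise ValueError("descriptor has no CID or parentCID entry")
    return cid.group(1).lower(), parent.group(1).lower()


def find_cid_mismatches(chain):
    """Return indexes of descriptors whose parentCID breaks the chain.

    chain holds descriptor texts ordered from the base disk to the newest
    delta disk. The base disk (index 0) has no parent to compare against.
    """
    cids = [read_cids(text) for text in chain]
    return [i for i in range(1, len(cids)) if cids[i][1] != cids[i - 1][0]]


# Invented descriptor fragments: the second delta points at the wrong parent.
base = "# Disk DescriptorFile\nversion=1\nCID=a1b2c3d4\nparentCID=ffffffff\n"
delta1 = "# Disk DescriptorFile\nversion=1\nCID=0e0f1011\nparentCID=a1b2c3d4\n"
delta2 = "# Disk DescriptorFile\nversion=1\nCID=12345678\nparentCID=deadbeef\n"
print(find_cid_mismatches([base, delta1, delta2]))
```

The returned index identifies the descriptor whose parentCID= entry needs the edit described in step 3. The sketch only reports mismatches and does not modify any file.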

Task 2: Investigate a Quiesced VM Problem


A VM with a heavy I/O workload might fail to quiesce before a snapshot operation.

1. Verify that you can take a normal snapshot with Quiesce deselected.

2. Verify that VSS prerequisites are met.

3. Verify that appropriate services are running and startup types are correct.

4. Verify that the VSS provider is used.

5. Verify that all the VSS writers are stable and not reporting errors.

Task 3: Investigate a General Snapshot Failure
A user cannot create or commit a snapshot.

1. Verify that the virtual disk type is supported for snapshots.

2. Verify that fewer than 32 levels of snapshots exist.

3. Verify that you have permission to create or commit snapshots, including permission to write
to the datastore.

4. Verify that the descriptor file associated with each -delta.vmdk file is not missing.

5. Verify that the snapshot file size does not exceed the maximum size supported by the
datastore.

6. Verify that space is available on the datastore for snapshots.

Task 4: Investigate a Power-On Failure


A VM cannot be powered on.

1. Examine the vmware.log file associated with the VM.

2. Verify that no VM files are missing.

a. If files are missing, restore them from backup.

b. If the configuration file (.vmx) is missing, it must be recreated manually.

3. Verify that none of the VM files are locked.

a. Run the vmkfstools command to identify which ESXi host is locking the file.

b. Run the lsof command to identify the process that is locking the file.

c. Stop the process.

4. If a file is locked and you cannot stop the process that locks it, migrate all VMs to a new host
or reboot the ESXi host that is locking the file.

5. Verify that the ESXi host has sufficient resources.

a. Examine and decrease reservation settings or add more resources.

6. Verify that the ESXi host is online and connected to the vCenter Server system.

a. Verify that the ESXi host can respond to a network ping on the management interface.

b. Open the direct console to the ESXi host and look for a purple error screen.

Task 5: Investigate a VM That Shows an Invalid or Orphaned State
VMs might show as Invalid or Orphaned. This problem might be caused by a vCenter Server
system failure or a restart during a migration process.

1. If the vCenter Server system was rebooted, wait until it is completely back online and stable.

An invalid state caused by a vCenter Server system reboot is temporary.

2. Verify that the .vmx file is present and is not corrupt.

3. Restore from backup if the file is corrupt or missing.

a. Remove the VM from the inventory.

b. Restore the VM files from backup.

c. Add the VM back to the inventory.

4. Examine the Recent Tasks pane to determine whether the VM is being migrated.

5. If the VM is registered on one of the ESXi hosts, restart the management processes on that
ESXi host.

6. If the VM is not registered, attempt to reregister the VM.

7. Verify that all VM files still exist.

8. If the files are present, attempt to reregister.

9. If the files are missing (for example, because they were deleted outside of the vCenter
Server system), restore the files from backup and reregister the VM.

Task 6: Investigate a VMware Tools Installation Failure


VMware Tools cannot be installed.

1. Verify that the correct guest operating system is selected.

2. Verify that the correct VMware Tools ISO image is available and is not corrupt.

3. If possible, use open-vm-tools, available at https://github.com/vmware/open-vm-tools.

Lab 26 Appendix: Troubleshooting
ESXi Host and vCenter Server System
Failures

Troubleshooting Flowchart
The flowchart presents a logical sequence for troubleshooting ESXi host and vCenter Server
failures.

Troubleshooting Tasks
To troubleshoot ESXi host and vCenter Server problems, you might need to perform the
following tasks:

1. Investigate a Certificate Problem

2. Replace Self-Signed Certificate with CA-Generated Certificate

3. Restart the vCenter Server Service

4. Investigate a vCenter Server Database Free Space Problem

5. Investigate a vCenter Server PostgreSQL Problem

6. Investigate a Purple Diagnostic Screen

7. Investigate Why an ESXi Host Is Unresponsive

Task 1: Investigate a Certificate Problem


Digital security certificates cannot be replaced or are not working.

1. Examine the certificate-manager.log.

2. Verify that you are using base64 certificates.

3. Verify that all ICA and root CA certificates are published into the trusted store in the VECS.

4. Verify that you are not using self-signed certificates.

Task 2: Replace Self-Signed Certificate with CA-Generated Certificate


For more information about replacing vCenter Server SSL certificates, see VMware knowledge
base article 2111219 at http://kb.vmware.com/kb/2111219.

Task 3: Restart the vCenter Server Service


You attempt to restart the vCenter Server service to resolve the certificate problem.

For more information about restarting vCenter Server services, see VMware knowledge base
article 2109881 at http://kb.vmware.com/kb/2109881.

1. In the vCenter Server console, restart the vCenter Server service.

a. Connect to the vCenter Server Management Interface at
https://<vCenter_Server_FQDN>:5480/.

b. Click Services.

c. Select the service and click Restart.

2. Restart the vCenter Server service from an SSH session.

service-control --stop --all


service-control --start --all

Task 4: Investigate a vCenter Server Database Free Space Problem


The vSphere Client or log files indicate either that the database is low on free space or that
space is exhausted.

1. In the vSphere Client, examine database settings to verify that your configuration is not
trying to record too much data in the database.

2. Verify that tables are not too large.

Tables can expand at a fast rate.

3. Verify that the statistic level is set to level 2 or lower.

4. Verify that rollup jobs are correctly configured.

5. Verify that the rollup jobs last ran within the past 24 hours.

6. Verify that the vpx_hist_stat1 table does not include more than 10 million rows.

7. If you use an internal PostgreSQL database and it is out of space, shut down the vCenter
Server Appliance VM, expand the VM's hard disk, power on vCenter Server Appliance, and
run the vpxd_servicecfg storage lvm autogrow command.

For additional information, refer to VMware knowledge base article 2145603 at
http://kb.vmware.com/kb/2145603.
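
The numeric thresholds in steps 3, 5, and 6 can be collected into one Python check. This sketch assumes that the statistics level, the last rollup time, and the row count have already been gathered by other means. check_stats_health is a hypothetical helper, and the sample values are invented.

```python
from datetime import datetime, timedelta


def check_stats_health(stat_level, last_rollup, row_count, now=None):
    """Apply the thresholds from steps 3, 5, and 6 and return any findings."""
    now = now or datetime.now()
    findings = []
    if stat_level > 2:
        findings.append(f"statistics level {stat_level} is above 2")
    if now - last_rollup > timedelta(hours=24):
        findings.append("rollup jobs last ran more than 24 hours ago")
    if row_count > 10_000_000:
        findings.append(f"{row_count} rows exceed the 10 million row guideline")
    return findings


# Invented sample: level 3 statistics and a rollup that is 30 hours old.
sample_now = datetime(2021, 6, 4, 12, 0, 0)
print(check_stats_health(3, sample_now - timedelta(hours=30), 500_000,
                         now=sample_now))
```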

Task 5: Investigate a vCenter Server PostgreSQL Problem


The vCenter Server PostgreSQL database server is not running.

1. Verify that the postgres service is running.

2. If the postgres service is not running, start it.

3. If the postgres service is running, restart the service.

Task 6: Investigate a Purple Diagnostic Screen
An ESXi host stops responding and displays a purple diagnostic screen.

1. Record the state of the system.

a. Take a screenshot or photograph of the purple diagnostic screen.

b. Note any relevant environmental issues or conditions.

2. Restart the host.

a. Get the VMs up and running.

b. Collect a vm-support log bundle from the affected host.

3. Contact VMware Technical Support.

4. If VMware Technical Support determines that the issue is a hardware problem, contact your
hardware vendor.

Task 7: Investigate Why an ESXi Host Is Unresponsive


An ESXi host appears to be unresponsive.

1. Verify that an ESXi host is not responding by performing tasks on the host.

• Ping the VMkernel network interface.


• Determine whether the vSphere Client responds to queries.

• Monitor network traffic from the ESXi host and its VMs.

If any of the verification tasks are successful, your ESXi host should be at least minimally
operational.

2. In the ESXi host's DCUI, press ALT+F12 to display VMkernel messages on the screen.

3. Reboot the host.

4. Determine why the host locked up.

a. Review logs that led to the outage.

b. Set up serial-line logging.

c. Gather performance statistics.

5. After hardware problems are corrected, reinstall and configure the ESXi host, using your
most recent backup to ensure that faulty hardware did not corrupt the disk.

6. Install the latest patches and updates for the ESXi host.

Answer Key

Lab 5 Monitoring NIC Teaming During Failover


Q1. Which uplink is used by linux-a-07 VM?
A1. vmnic5, the active uplink
Q2. Which uplink is now used by the linux-a-07 VM?
A2. vmnic4, the standby uplink
Q3. What messages did you find?
A3. On sa-esxi-02.vclass.local's Summary tab, the critical alarm Network uplink
redundancy lost appears. On sa-esxi-02.vclass.local's Monitor tab, the Events pane
shows the same alarm but with a little more information, informing you that Physical
NIC vmnic5 is down.
Q4. What log entries did you find?
A4. In the hostd.log and vobd.log files, the following message is posted: Physical NIC
vmnic5. In the vmkernel.log file, the following messages are posted: Setting link
down on physical adapter vmnic5 ... [vmnic5] Taking down link ...
vmnic5: link down notification ...
Q5. What log entries did you find?
A5. In the hostd.log and vobd.log files, the following message is posted: Physical NIC
vmnic5 is up. In the vmkernel.log file, the following messages are posted: vmnic5:
link up event received ... vmnic5: device Up notification ...
vmnic5: link up notification ...
Q1. Which uplink is now used by the VM?
A1. vmnic5, the active uplink
Lab 7 Applying the Troubleshooting Methodology
Q1. Is the host IP in the correct subnet?
A1. Because this host is on the Production network, the host IP address should be in the
172.20.11.0/24 subnet. If the host IP is configured as a DHCP address and a network
problem occurs, no IP address is assigned.
Q2. Does the host have the correct default router?

A2. The default router for the Production network should be 172.20.11.10. However, in a
DHCP network configuration, no router is assigned if a network problem occurs.
Q3. Does the host have the correct network configured?
A3. The network should be configured as either the pg-SA-Production-01 or the pg-SA-
Production-02 network.
Q4. Does the host have a network link status of connected?
A4. Yes, the network status is connected.
Q1. Does the host have the correct uplink configured?
A1. Uplinks are not correct. The sa-esxi-01 host does not have uplinks configured on the pg-
SA-Production-01 and pg-SA-Production-02 port groups.
Lab 10 Troubleshooting Storage Performance Issues
Q1. Which HBA might be the cause of slow storage performance?
A1. vmhba65, because this HBA shows high IOPS.
Q2. What condition is degrading storage performance?
A2. A high number of read commands are being issued from vmhba65.
Q1. Which storage device seems to be affected?
A1. The device with the storage identifier naa.60003ff44dc75adcaf760d6a0ac8e3fe
Q2. What is the datastore name of the affected storage device?
A2. Shared3
Q1. Which VM or VMs might be contributing to slow storage performance?
A1. Win-4, Win-5, and Win-6 are running several read commands per second. However,
these VMs do not seem to be causing a significant amount of latency because the load is
still low.
Q2. What possible solutions can help you get better performance?
A2. Add another VMkernel port and vmnic for software iSCSI multipathing and set the
multipathing policy to Round Robin. Also, check the DAVG value or latency values for
the VM. Enable Storage I/O Control and set the value to 5 ms. Migrate one or two VMs
to another datastore.
