(IBM) System Storage SAN Volume Controller Version 6.4.0


IBM System Storage SAN Volume Controller

Version 6.4.0

Troubleshooting Guide



GC27-2284-03
Note
Before using this information and the product it supports, read the information in “Notices” on page 315.

This edition applies to IBM System Storage SAN Volume Controller, Version 6.4.0, and to all subsequent releases
and modifications until otherwise indicated in new editions.
This edition replaces GC27-2284-02.
© Copyright IBM Corporation 2003, 2012.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents

Figures . . . vii
Tables . . . ix
About this guide . . . xi
  Who should use this guide . . . xi
  Summary of changes for GC27-2284-03 SAN Volume Controller Troubleshooting Guide . . . xi
  Summary of changes for GC27-2284-02 SAN Volume Controller Troubleshooting Guide . . . xi
  Summary of changes for GC27-2284-01 SAN Volume Controller Troubleshooting Guide . . . xii
  Emphasis . . . xiii
  SAN Volume Controller library and related publications . . . xiv
  How to order IBM publications . . . xvii
  Sending your comments . . . xvii

Chapter 1. SAN Volume Controller overview . . . 1
  Clustered systems . . . 5
  Configuration node . . . 5
  Configuration node addressing . . . 5
  Management IP failover . . . 6
  SAN fabric overview . . . 7

Chapter 2. Introducing the SAN Volume Controller hardware components . . . 9
  SAN Volume Controller nodes . . . 9
  SAN Volume Controller front panel controls and indicators . . . 9
  SAN Volume Controller operator-information panel . . . 14
  SAN Volume Controller rear-panel indicators and connectors . . . 19
  Fibre Channel port numbers and worldwide port names . . . 35
  Requirements for the SAN Volume Controller environment . . . 36
  Redundant ac-power switch . . . 47
  Redundant ac-power environment requirements . . . 48
  Cabling of redundant ac-power switch (example) . . . 49
  Uninterruptible power supply . . . 51
  2145 UPS-1U . . . 51
  Uninterruptible power-supply environment requirements . . . 56
  Defining the SAN Volume Controller FRUs . . . 56
  SAN Volume Controller FRUs . . . 56
  Redundant ac-power switch FRUs . . . 65

Chapter 3. SAN Volume Controller user interfaces for servicing your system . . . 67
  Management GUI interface . . . 67
  When to use the management GUI . . . 68
  Accessing the management GUI . . . 68
  Deleting a node from a clustered system using the management GUI . . . 69
  Adding nodes to a clustered system . . . 71
  Service assistant interface . . . 74
  When to use the service assistant . . . 74
  Accessing the service assistant . . . 75
  Cluster (system) command-line interface . . . 75
  When to use the cluster (system) CLI . . . 75
  Accessing the cluster (system) CLI . . . 76
  Service command-line interface . . . 76
  When to use the service CLI . . . 76
  Accessing the service CLI . . . 76

Chapter 4. Performing recovery actions using the SAN Volume Controller CLI . . . 77
  Validating and repairing mirrored volume copies using the CLI . . . 77
  Repairing a space-efficient volume using the CLI . . . 78
  Recovering from offline volumes using the CLI . . . 79
  Replacing nodes nondisruptively . . . 80

Chapter 5. Viewing the vital product data . . . 87
  Viewing the vital product data using the management GUI . . . 87
  Displaying the vital product data using the CLI . . . 87
  Displaying node properties using the CLI . . . 87
  Displaying clustered system properties using the CLI . . . 88
  Fields for the node VPD . . . 90
  Fields for the system VPD . . . 94

Chapter 6. Using the front panel of the SAN Volume Controller . . . 97
  Boot progress indicator . . . 97
  Boot failed . . . 97
  Charging . . . 98
  Error codes . . . 98
  Hardware boot . . . 98
  Node rescue request . . . 98
  Power failure . . . 99
  Powering off . . . 99
  Recovering . . . 100
  Restarting . . . 100
  Shutting down . . . 100
  Validate WWNN? option . . . 101
  SAN Volume Controller menu options . . . 102
  Cluster (system) options . . . 104
  Node options . . . 106
  Version options . . . 106
  Ethernet options . . . 106
  Fibre Channel port options . . . 107
  Actions options . . . 108
  Language? option . . . 122
  Using the power control for the SAN Volume Controller node . . . 123

Chapter 7. Diagnosing problems . . . 125
  Event reporting . . . 125
  Power-on self-test . . . 126
  Understanding events . . . 126
  Managing the event log . . . 127
  Viewing the event log . . . 127
  Describing the fields in the event log . . . 127
  Event notifications . . . 128
  Inventory information email . . . 131
  Understanding the error codes . . . 131
  Using the error code tables . . . 132
  Event IDs . . . 132
  SCSI event reporting . . . 140
  Object types . . . 142
  Error event IDs and error codes . . . 143
  Determining a hardware boot failure . . . 154
  Boot code reference . . . 154
  Node error code overview . . . 155
  Clustered-system code overview . . . 156
  Error code range . . . 157
  Booting codes . . . 157
  Create cluster errors . . . 159
  Node errors . . . 159
  Cluster recovery and states . . . 169
  Cluster error codes . . . 169
  SAN problem determination . . . 205
  Fibre Channel and 10G Ethernet link failures . . . 206
  Ethernet iSCSI host-link problems . . . 206
  Fibre Channel over Ethernet host-link problems . . . 207
  Servicing storage systems . . . 208

Chapter 8. Recovery procedures . . . 211
  Recover system procedure . . . 211
  When to run the recover system procedure . . . 212
  Fix hardware errors . . . 213
  Removing clustered-system information for nodes with error code 550 or error code 578 using the front panel . . . 213
  Removing system information for nodes with error code 550 or error code 578 using the service assistant . . . 214
  Performing recovery procedure for clustered systems using the front panel . . . 214
  Performing system recovery using the service assistant . . . 216
  Recovering from offline VDisks using the CLI . . . 218
  What to check after running the system recovery . . . 218
  Backing up and restoring the system configuration . . . 219
  Backing up the system configuration using the CLI . . . 220
  Restoring the system configuration . . . 222
  Deleting backup configuration files using the CLI . . . 225
  Performing the node rescue when the node boots . . . 226

Chapter 9. Understanding the medium errors and bad blocks . . . 229

Chapter 10. Using the maintenance analysis procedures . . . 231
  MAP 5000: Start . . . 231
  MAP 5050: Power 2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, and 2145-8F2 . . . 238
  MAP 5060: Power 2145-8A4 . . . 245
  MAP 5150: 2145 UPS-1U . . . 248
  MAP 5250: 2145 UPS-1U repair verification . . . 254
  MAP 5320: Redundant ac power . . . 255
  MAP 5340: Redundant ac power verification . . . 256
  MAP 5350: Powering off a SAN Volume Controller node . . . 258
  Using the management GUI to power off a system . . . 259
  Using the SAN Volume Controller CLI to power off a node . . . 260
  Using the SAN Volume Controller Power control button . . . 261
  MAP 5400: Front panel . . . 263
  MAP 5500: Ethernet . . . 265
  Defining an alternate configuration node . . . 268
  MAP 5550: 10G Ethernet and Fibre Channel over Ethernet personality enabled Adapter port . . . 268
  MAP 5600: Fibre Channel . . . 271
  MAP 5700: Repair verification . . . 278
  MAP 5800: Light path . . . 279
  Light path for SAN Volume Controller 2145-CG8 . . . 280
  Light path for SAN Volume Controller 2145-CF8 . . . 286
  Light path for SAN Volume Controller 2145-8A4 . . . 292
  Light path for SAN Volume Controller 2145-8G4 . . . 294
  Light path for SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 . . . 298
  MAP 5900: Hardware boot . . . 302
  MAP 6000: Replace offline SSD . . . 308
  MAP 6001: Replace offline SSD in a RAID 0 array . . . 308
  MAP 6002: Replace offline SSD in RAID 1 array or RAID 10 array . . . 310

Appendix. Accessibility . . . 313

Notices . . . 315
  Trademarks . . . 317
  Electronic emission notices . . . 317
  Federal Communications Commission (FCC) statement . . . 317
  Industry Canada compliance statement . . . 318
  Avis de conformité à la réglementation d'Industrie Canada . . . 318
  Australia and New Zealand Class A Statement . . . 318
  European Union Electromagnetic Compatibility Directive . . . 318
  Germany Electromagnetic compatibility directive . . . 319
  Japan VCCI Council Class A statement . . . 320
  People's Republic of China Class A Electronic Emission Statement . . . 320
  International Electrotechnical Commission (IEC) statement . . . 320
  United Kingdom telecommunications requirements . . . 320
  Korean Communications Commission (KCC) Class A Statement . . . 320
  Russia Electromagnetic Interference (EMI) Class A Statement . . . 321
  Taiwan Class A compliance statement . . . 321
  European Contact Information . . . 321
  Taiwan Contact Information . . . 321

Index . . . 323

Figures

1. SAN Volume Controller system in a fabric . . . 2
2. Data flow in a SAN Volume Controller system . . . 3
3. SAN Volume Controller nodes with internal SSDs . . . 4
4. Configuration node . . . 5
5. SAN Volume Controller 2145-CG8 front panel . . . 10
6. SAN Volume Controller 2145-CF8 front panel . . . 10
7. SAN Volume Controller 2145-8A4 front-panel assembly . . . 11
8. SAN Volume Controller 2145-8G4 front-panel assembly . . . 11
9. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 front-panel assembly . . . 12
10. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel . . . 14
11. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel . . . 15
12. SAN Volume Controller 2145-8A4 operator-information panel . . . 16
13. SAN Volume Controller 2145-8G4 operator-information panel . . . 16
14. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 operator-information panel . . . 17
15. SAN Volume Controller 2145-CG8 rear-panel indicators . . . 19
16. SAN Volume Controller 2145-CG8 rear-panel indicators for the 10 Gbps Ethernet feature . . . 19
17. Connectors on the rear of the SAN Volume Controller 2145-CG8 . . . 20
18. 10 Gbps Ethernet ports on the rear of the SAN Volume Controller 2145-CG8 . . . 20
19. Power connector . . . 21
20. Service ports of the SAN Volume Controller 2145-CG8 . . . 21
21. SAN Volume Controller 2145-CG8 port not used . . . 21
22. SAN Volume Controller 2145-CF8 rear-panel indicators . . . 22
23. Connectors on the rear of the SAN Volume Controller 2145-CG8 or 2145-CF8 . . . 22
24. Power connector . . . 23
25. Service ports of the SAN Volume Controller 2145-CF8 . . . 23
26. SAN Volume Controller 2145-CF8 port not used . . . 24
27. SAN Volume Controller 2145-8A4 rear-panel indicators . . . 24
28. SAN Volume Controller 2145-8A4 external connectors . . . 25
29. Power connector . . . 25
30. Service ports of the SAN Volume Controller 2145-8A4 . . . 25
31. SAN Volume Controller 2145-8G4 rear-panel indicators . . . 26
32. SAN Volume Controller 2145-8G4 external connectors . . . 26
33. Power connector . . . 27
34. Service ports of the SAN Volume Controller 2145-8G4 . . . 27
35. SAN Volume Controller 2145-8F4 rear-panel indicators . . . 28
36. SAN Volume Controller 2145-8F4 external connectors . . . 28
37. Power connector . . . 29
38. Service ports of the SAN Volume Controller 2145-8F4 . . . 29
39. Ports not used during normal operation by the SAN Volume Controller 2145-8F4 . . . 30
40. Ports not used on the front panel of the SAN Volume Controller 2145-8F4 . . . 30
41. SAN Volume Controller 2145-8F2 rear-panel indicators . . . 30
42. SAN Volume Controller 2145-8F2 external connectors . . . 31
43. Power connector . . . 31
44. SAN Volume Controller 2145-CG8 or 2145-CF8 ac, dc, and power-error LEDs . . . 34
45. SAN Volume Controller 2145-8G4 ac and dc LEDs . . . 34
46. SAN Volume Controller 2145-8F4 and SAN Volume Controller 2145-8F2 ac and dc LEDs . . . 35
47. Photo of the redundant ac-power switch . . . 48
48. A four-node SAN Volume Controller system with the redundant ac-power switch feature . . . 50
49. 2145 UPS-1U front-panel assembly . . . 53
50. 2145 UPS-1U connectors and switches . . . 55
51. 2145 UPS-1U dip switches . . . 55
52. Ports not used by the 2145 UPS-1U . . . 55
53. Power connector . . . 56
54. SAN Volume Controller front-panel assembly . . . 97
55. Example of a boot progress display . . . 97
56. Example of an error code for a clustered system . . . 98
57. Example of a node error code . . . 98
58. Node rescue display . . . 99
59. Validate WWNN? navigation . . . 101
60. SAN Volume Controller options on the front-panel display . . . 103
61. Viewing the IPv6 address on the front-panel display . . . 105
62. Upper options of the actions menu on the front panel . . . 110
63. Middle options of the actions menu on the front panel . . . 111
64. Lower options of the actions menu on the front panel . . . 112
65. Language? navigation . . . 122
66. Example of a boot error code . . . 154
67. Example of a boot progress display . . . 155
68. Example of a displayed node error code . . . 155
69. Example of a node-rescue error code . . . 156
70. Example of a create error code for a clustered system . . . 156
71. Example of a recovery error code . . . 157
72. Example of an error code for a clustered system . . . 157
73. Node rescue display . . . 226
74. SAN Volume Controller service controller error light . . . 233
75. Error LED on the SAN Volume Controller models . . . 234
76. Hardware boot display . . . 234
77. Power LED on the SAN Volume Controller models 2145-CG8, 2145-CF8, 2145-8G4, and 2145-8F4 or 2145-8F2 operator-information panel . . . 239
78. Power LED on the SAN Volume Controller models 2145-8G4, 2145-8F4, and 2145-8F2 rear panel . . . 241
79. Power LED indicator on the rear panel of the SAN Volume Controller 2145-CG8 or 2145-CF8 . . . 241
80. SAN Volume Controller models 2145-8G4 and 2145-8F4 or 2145-8F2 ac and dc LED indicators on the rear panel . . . 242
81. Power LED indicator and ac and dc indicators on the rear panel of the SAN Volume Controller 2145-CG8 or 2145-CF8 . . . 242
82. Power LED on the SAN Volume Controller 2145-8A4 operator-information panel . . . 245
83. SAN Volume Controller 2145-8A4 system board LEDs . . . 247
84. 2145 UPS-1U front-panel assembly . . . 249
85. Power control button on the SAN Volume Controller models . . . 262
86. SAN Volume Controller service controller error light . . . 263
87. Front-panel display when push buttons are pressed . . . 264
88. Port 2 Ethernet link LED on the SAN Volume Controller rear panel . . . 266
89. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel . . . 280
90. SAN Volume Controller 2145-CG8 or 2145-CF8 light path diagnostics panel . . . 280
91. SAN Volume Controller 2145-CG8 system board LEDs diagnostics panel . . . 282
92. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel . . . 286
93. SAN Volume Controller 2145-CG8 or 2145-CF8 light path diagnostics panel . . . 286
94. SAN Volume Controller 2145-CF8 system board LEDs diagnostics panel . . . 288
95. SAN Volume Controller 2145-8A4 operator-information panel . . . 292
96. SAN Volume Controller 2145-8A4 system board LEDs . . . 293
97. SAN Volume Controller 2145-8G4 operator-information panel . . . 295
98. SAN Volume Controller 2145-8G4 light path diagnostics panel . . . 295
99. SAN Volume Controller 2145-8G4 system board LEDs . . . 296
100. SAN Volume Controller 2145-8F4 operator-information panel . . . 299
101. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 light path diagnostics panel . . . 299
102. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 system board LEDs . . . 300
103. Hardware boot display . . . 303
104. Node rescue display . . . 303
105. Keyboard and monitor ports on the SAN Volume Controller models 2145-8G4, 2145-8A4, 2145-8F4 and 2145-8F2 . . . 305
106. Keyboard and monitor ports on the SAN Volume Controller 2145-CF8 . . . 305
107. Keyboard and monitor ports on the SAN Volume Controller 2145-CG8 . . . 305


Tables

1. Terminology mapping table for version 6.1.0 . . . xiii
2. Terminology mapping table for version 6.2.0 . . . xiii
3. SAN Volume Controller library . . . xiv
4. Other IBM publications . . . xvi
5. IBM documentation and related websites . . . xvii
6. SAN Volume Controller communications types . . . 4
7. Link state and activity for the bottom Fibre Channel LED . . . 31
8. Link speed for the top Fibre Channel LED . . . 32
9. Actual link speeds . . . 32
10. Actual link speeds . . . 32
11. Maximum power consumption . . . 36
12. Physical specifications . . . 37
13. Environment requirements with redundant ac power . . . 37
14. Dimensions and weight . . . 38
15. Additional space requirements . . . 38
16. Maximum heat output of each SAN Volume Controller 2145-CG8 node . . . 38
17. Maximum heat output of each 2145 UPS-1U . . . 38
18. SAN Volume Controller 2145-CG8 FRU descriptions . . . 57
19. SAN Volume Controller 2145-CF8 FRU descriptions . . . 58
20. Ethernet feature FRU descriptions . . . 60
21. Solid-state drive (SSD) feature FRU descriptions . . . 60
22. 2145 UPS-1U FRU descriptions . . . 60
23. SAN Volume Controller 2145-8A4 FRU descriptions . . . 61
24. SAN Volume Controller 2145-8G4 FRU descriptions . . . 62
25. SAN Volume Controller 2145-8F4 FRU descriptions . . . 63
26. SAN Volume Controller 2145-8F2 FRU descriptions . . . 64
27. Fields for the system board . . . 90
28. Fields for the processors . . . 90
29. Fields for the fans . . . 91
30. Fields that are repeated for each installed memory module . . . 91
31. Fields that are repeated for each adapter that is installed . . . 91
32. Fields that are repeated for each SCSI, IDE, SATA, and SAS device that is installed . . . 92
33. Fields that are specific to the node software . . . 92
34. Fields that are provided for the front panel assembly . . . 92
35. Fields that are provided for the Ethernet port . . . 92
36. Fields that are provided for the power supplies in the node . . . 93
37. Fields that are provided for the uninterruptible power supply assembly that is powering the node . . . 93
38. Fields that are provided for the SAS host bus adapter (HBA) . . . 93
39. Fields that are provided for the SAS solid-state drive (SSD) . . . 94
40. Fields that are provided for the small form factor pluggable (SFP) transceiver . . . 94
41. Fields that are provided for the system properties . . . 95
42. When options are available . . . 108
43. Description of data fields for the event log . . . 127
44. Notification types . . . 128
45. SAN Volume Controller notification types and corresponding syslog level codes . . . 129
46. SAN Volume Controller values of user-defined message origin identifiers and syslog facility codes . . . 130
47. Informational events . . . 132
48. Configuration event IDs . . . 136
49. SCSI status . . . 140
50. SCSI sense keys, codes, and qualifiers . . . 141
51. Reason codes . . . 142
52. Object types . . . 142
53. Error event IDs and error codes . . . 143
54. Message classification number range . . . 157
55. Bad block errors . . . 229
56. 2145 UPS-1U error indicators . . . 249
57. SAN Volume Controller Fibre Channel adapter assemblies . . . 276
58. SAN Volume Controller Fibre Channel adapter connection hardware . . . 277
59. Diagnostics panel LED prescribed actions . . . 283
60. Diagnostics panel LED prescribed actions . . . 289
61. SAN Volume Controller 2145-8A4 diagnostics panel LED prescribed actions . . . 294
62. Diagnostics panel LED prescribed actions . . . 297
63. Diagnostics panel LED prescribed actions . . . 301

About this guide
This guide describes how to service the IBM® System Storage® SAN Volume
Controller.

The chapters that follow introduce you to the SAN Volume Controller, the
redundant ac-power switch, and the uninterruptible power supply. They describe
how you can configure and check the status of one SAN Volume Controller node
or a clustered system of nodes through the front panel or with the management
GUI.

The vital product data (VPD) chapter provides information about the VPD that
uniquely defines each hardware and microcode element that is in the SAN Volume
Controller. You can also learn how to diagnose problems using the SAN Volume
Controller.

The maintenance analysis procedures (MAPs) can help you analyze failures that
occur in a SAN Volume Controller. With the MAPs, you can isolate the
field-replaceable units (FRUs) of the SAN Volume Controller that fail. Begin all
problem determination and repair procedures from “MAP 5000: Start” on page 231.

Who should use this guide


This guide is intended for system administrators or systems services
representatives who use and diagnose problems with the SAN Volume Controller,
the redundant ac-power switch, and the uninterruptible power supply.

Summary of changes for GC27-2284-03 SAN Volume Controller Troubleshooting Guide

The summary of changes provides a list of new and changed information since the
last version of the guide.

New information

This topic describes the changes to this guide since the previous edition,
GC27-2284-02. The following sections summarize the changes that have since been
implemented from the previous version.

This version includes the following new information:

- Fibre Channel over Ethernet

Changed information

This version does not include any changed information.

Summary of changes for GC27-2284-02 SAN Volume Controller Troubleshooting Guide

The summary of changes provides a list of new and changed information since the
last version of the guide.

New information

This topic describes the changes to this guide since the previous edition,
GC27-2284-01. The following sections summarize the changes that have since been
implemented from the previous version.

This version includes the following new information:


- Information about understanding medium errors and bad blocks
- New error codes
- New event IDs

Changed information

This version includes the following changed information:


- “MAP 5050: Power 2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, and 2145-8F2” on
  page 238
- “MAP 6001: Replace offline SSD in a RAID 0 array” on page 308

Summary of changes for GC27-2284-01 SAN Volume Controller Troubleshooting Guide

The summary of changes provides a list of new and changed information since the
last version of the guide.

New information

This topic describes the changes to this guide since the previous edition,
GC27-2284-00. The following sections summarize the changes that have since been
implemented from the previous version.

This version includes the following new information:


- Support statements for the SAN Volume Controller 2145-CG8 node
- New error codes
- New event IDs
- Support statements for 10 Gbps Ethernet
- MAP 5550: 10 Gbps Ethernet
- MAP 6001: Replace offline SSD in a RAID 0 array
- MAP 6002: Replace offline SSD in RAID 1 array or RAID 10 array

Changed information

This version includes the following changed information:


- MAP 6000: Replace offline SSD
- Terminology changes:
To coincide with new and existing IBM products and functions, several common
terms have changed and are incorporated in the SAN Volume Controller
information. Certain SAN Volume Controller information, particularly
command-line interface (CLI) documentation, remains primarily unchanged.
The following table shows the current and previous use of the changed common
terms for version 6.1.0.

Table 1. Terminology mapping table for version 6.1.0

| 6.1.0 SAN Volume Controller term | Previous SAN Volume Controller term | Description |
| --- | --- | --- |
| event | error | An occurrence of significance to a task or system. Events can include completion or failure of an operation, a user action, or the change in state of a process. |
| host mapping | VDisk-to-host mapping | The process of controlling which hosts have access to specific volumes within a clustered system. |
| storage pool | managed disk (MDisk) group | A collection of storage capacity that provides the capacity requirements for a volume. |
| thin provisioning (or thin-provisioned) | space-efficient | The ability to define a storage unit (full system, storage pool, volume) with a logical capacity size that is larger than the physical capacity assigned to that storage unit. |
| volume | virtual disk (VDisk) | A discrete unit of storage on disk, tape, or other data recording medium that supports some form of identifier and parameter list, such as a volume label or input/output control. |

The following table shows the current and previous use of the changed common
terms for version 6.2.0.

Table 2. Terminology mapping table for version 6.2.0

| 6.2.0 SAN Volume Controller term | Previous SAN Volume Controller term | Description |
| --- | --- | --- |
| clustered system or system | cluster | A collection of nodes that are placed in pairs (I/O groups) for redundancy, which provide a single management interface. |

- Use of svctask and svcinfo command prefixes.
The svctask and svcinfo command prefixes are no longer necessary when
issuing a command. If you have existing scripts that use those prefixes, they will
continue to function. You do not need to change the scripts.
The satask and sainfo command prefixes are still required.
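
The prefix rule above can be illustrated with a minimal sketch. The helper below is hypothetical and is not part of the SAN Volume Controller CLI; it only models the rule as stated here, assuming that a command is a whitespace-separated string whose first word may be a prefix. The command names lsnode, detectmdisk, and lsservicenodes are used purely as sample input.

```python
# Hypothetical helper for illustration only; not an IBM-provided tool.
# It models the rule stated in the text: svctask and svcinfo may be
# omitted, while satask and sainfo must be kept.

OPTIONAL_PREFIXES = ("svctask", "svcinfo")


def strip_optional_prefix(command: str) -> str:
    """Return the command without a leading svctask or svcinfo prefix."""
    parts = command.split()
    # Only drop the prefix when something follows it.
    if len(parts) > 1 and parts[0] in OPTIONAL_PREFIXES:
        return " ".join(parts[1:])
    return " ".join(parts)


if __name__ == "__main__":
    samples = (
        "svcinfo lsnode",         # prefix is optional
        "svctask detectmdisk",    # prefix is optional
        "lsnode",                 # already in the short form
        "sainfo lsservicenodes",  # prefix is still required
    )
    for line in samples:
        print(f"{line!r} -> {strip_optional_prefix(line)!r}")
```

Because existing scripts that use the prefixes continue to function, a rewrite of this kind is optional; the sketch only shows that both spellings refer to the same command.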

Emphasis
Different typefaces are used in this guide to show emphasis.

The following typefaces are used to show emphasis:

Boldface
    Text in boldface represents menu items.

Bold monospace
    Text in bold monospace represents command names.

Italics
    Text in italics is used to emphasize a word. In command syntax, it is used
    for variables for which you supply actual values, such as a default
    directory or the name of a system.

Monospace
    Text in monospace identifies the data or commands that you type, samples of
    command output, examples of program code or messages from the system, or
    names of command flags, parameters, arguments, and name-value pairs.

SAN Volume Controller library and related publications


Product manuals, other publications, and websites contain information that relates
to SAN Volume Controller.

SAN Volume Controller Information Center

The IBM System Storage SAN Volume Controller Information Center contains all of
the information that is required to install, configure, and manage the SAN Volume
Controller. The information center is updated between SAN Volume Controller
product releases to provide the most current documentation. The information
center is available at the following website:

publib.boulder.ibm.com/infocenter/svc/ic/index.jsp

SAN Volume Controller library

Unless otherwise noted, the publications in the SAN Volume Controller library are
available in Adobe portable document format (PDF) from the following website:

www.ibm.com/storage/support/2145

Each of the PDF publications in Table 3 is available in this information center by
clicking the number in the “Order number” column:

Table 3. SAN Volume Controller library

| Title | Description | Order number |
| --- | --- | --- |
| IBM System Storage SAN Volume Controller Model 2145-CG8 Hardware Installation Guide | This guide provides the instructions that the IBM service representative uses to install the hardware for SAN Volume Controller model 2145-CG8. | GC27-3923 |
| IBM System Storage SAN Volume Controller Hardware Maintenance Guide | This guide provides the instructions that the IBM service representative uses to service the SAN Volume Controller hardware, including the removal and replacement of parts. | GC27-2283 |
| IBM System Storage SAN Volume Controller Troubleshooting Guide | This guide describes the features of each SAN Volume Controller model, explains how to use the front panel, and provides maintenance analysis procedures to help you diagnose and solve problems with the SAN Volume Controller. | GC27-2284 |
| IBM System Storage SAN Volume Controller Software Installation and Configuration Guide | This guide provides guidelines for configuring your SAN Volume Controller. Instructions for backing up and restoring the cluster configuration, using and upgrading the management GUI, using the CLI, upgrading the SAN Volume Controller software, and replacing or adding nodes to a cluster are included. | GC27-2286 |
| IBM System Storage SAN Volume Controller CIM Agent Developer's Guide | This guide describes the concepts of the Common Information Model (CIM) environment. Procedures describe such tasks as using the CIM agent object class instances to complete basic storage configuration tasks, establishing new Copy Services relationships, and performing CIM agent maintenance and diagnostic tasks. | GC27-2288 |
| IBM System Storage SAN Volume Controller Safety Notices | This guide contains translated caution and danger statements. Each caution and danger statement in the SAN Volume Controller documentation has a number that you can use to locate the corresponding statement in your language in the IBM System Storage SAN Volume Controller Safety Notices document. | GA32-0844 |
| IBM System Storage SAN Volume Controller Read First Flyer | This document introduces the major components of the SAN Volume Controller system and describes how to get started installing the hardware and software. | GA32-0843 |
| IBM System Storage SAN Volume Controller and IBM Storwize® V7000 Command-Line Interface User's Guide | This guide describes the commands that you can use from the SAN Volume Controller command-line interface (CLI). | GC27-2287 |
| IBM Statement of Limited Warranty (2145 and 2076) | This multilingual document provides information about the IBM warranty for machine types 2145 and 2076. | Part number: 85Y5978 |
| IBM License Agreement for Machine Code | This multilingual guide contains the License Agreement for Machine Code for the SAN Volume Controller product. | SC28-6872 (contains Z125-5468) |

Other IBM publications

Table 4 lists IBM publications that contain information related to the SAN Volume
Controller.
Table 4. Other IBM publications

Title: IBM System Storage Productivity Center Introduction and Planning Guide
Description: This guide introduces the IBM System Storage Productivity Center
  hardware and software.
Order number: SC23-8824

Title: Read This First: Installing the IBM System Storage Productivity Center
Description: This guide describes how to install the IBM System Storage
  Productivity Center hardware.
Order number: GI11-8938

Title: IBM System Storage Productivity Center User's Guide
Description: This guide describes how to configure the IBM System Storage
  Productivity Center software.
Order number: SC27-2336

Title: IBM System Storage Multipath Subsystem Device Driver User's Guide
Description: This guide describes the IBM System Storage Multipath Subsystem
  Device Driver for IBM System Storage products and how to use it with the SAN
  Volume Controller.
Order number: GC52-1309

Title: IBM Storage Management Pack for Microsoft System Center Operations
  Manager User Guide
Description: This guide describes how to install, configure, and use the IBM
  Storage Management Pack for Microsoft System Center Operations Manager
  (SCOM).
Order number: GC27-3909
  publibfp.dhe.ibm.com/epubs/pdf/c2739092.pdf

Title: IBM Storage Management Console for VMware vCenter, version 3.0.0, User
  Guide
Description: This publication describes how to install, configure, and use the
  IBM Storage Management Console for VMware vCenter, which enables SAN Volume
  Controller and other IBM storage systems to be integrated in VMware vCenter
  environments.
Order number: GA32-0929
  publibfp.dhe.ibm.com/epubs/pdf/a3209295.pdf

IBM documentation and related websites

Table 5 lists websites that provide publications and other information about the
SAN Volume Controller or related products or technologies.
Table 5. IBM documentation and related websites

Website                                                       Address
Support for SAN Volume Controller (2145)                      www.ibm.com/storage/support/2145
Support for IBM System Storage and IBM TotalStorage products  www.ibm.com/storage/support/
IBM Publications Center                                       www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss
IBM Redbooks® publications                                    www.redbooks.ibm.com/

Related accessibility information

To view a PDF file, you need Adobe Acrobat Reader, which can be downloaded
from the Adobe website:

www.adobe.com/support/downloads/main.html

How to order IBM publications


The IBM Publications Center is a worldwide central repository for IBM product
publications and marketing material.

The IBM Publications Center offers customized search functions to help you find
the publications that you need. Some publications are available for you to view or
download at no charge. You can also order publications. The publications center
displays prices in your local currency. You can access the IBM Publications Center
through the following website:

www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss

Sending your comments


Your feedback is important in helping to provide the most accurate and highest
quality information.

To submit any comments about this book or any other SAN Volume Controller
documentation:
• Go to the feedback page on the website for the SAN Volume Controller
  Information Center at
  publib.boulder.ibm.com/infocenter/svc/ic/index.jsp?topic=/com.ibm.storage.svc.console.doc/feedback.htm.
  There you can use the feedback page to enter and submit comments, or browse
  to the topic and use the feedback link in the running footer of that page to
  identify the topic for which you have a comment.
• Send your comments by email to [email protected]. Include the following
  information for this publication or use suitable replacements for the
  publication title and form number for the publication on which you are
  commenting:
  – Publication title: IBM System Storage SAN Volume Controller Troubleshooting
    Guide
  – Publication form number: GC27-2284-03
  – Page, table, or illustration numbers that you are commenting on
  – A detailed description of any information that should be changed

Chapter 1. SAN Volume Controller overview
The SAN Volume Controller combines software and hardware into a
comprehensive, modular appliance that uses symmetric virtualization.

Symmetric virtualization is achieved by creating a pool of managed disks (MDisks)
from the attached storage systems. Those storage systems are then mapped to a set
of volumes for use by attached host systems. System administrators can view and
access a common pool of storage on the storage area network (SAN). This
functionality helps administrators to use storage resources more efficiently and
provides a common base for advanced functions.

A SAN is a high-speed Fibre Channel network that connects host systems and
storage devices. In a SAN, a host system can be connected to a storage device
across the network. The connections are made through units such as routers and
switches. The area of the network that contains these units is known as the fabric of
the network.

SAN Volume Controller software

The SAN Volume Controller software performs the following functions for the host
systems that attach to SAN Volume Controller:
• Creates a single pool of storage
• Provides logical unit virtualization
• Manages logical volumes
• Mirrors logical volumes

The SAN Volume Controller system also provides the following functions:
• Large scalable cache
• Copy Services
  – IBM FlashCopy® (point-in-time copy) function, including thin-provisioned
    FlashCopy to make multiple targets affordable
  – Metro Mirror (synchronous copy)
  – Global Mirror (asynchronous copy)
  – Data migration
• Space management
  – IBM System Storage Easy Tier® to migrate the most frequently used data to
    higher performing storage
  – Metering of service quality when combined with IBM Tivoli® Storage
    Productivity Center
  – Thin-provisioned logical volumes
  – Compressed volumes to consolidate storage

Figure 1 shows hosts, SAN Volume Controller nodes, and RAID storage
systems connected to a SAN fabric. The redundant SAN fabric comprises a
fault-tolerant arrangement of two or more counterpart SANs that provide alternate
paths for each SAN-attached device.

Figure 1. SAN Volume Controller system in a fabric
(Diagram: hosts in the host zone and RAID storage systems in the storage system
zone, connected to the nodes through the redundant SAN fabric.)

Volumes

A system of SAN Volume Controller nodes presents volumes to the hosts. Most of
the advanced functions that SAN Volume Controller provides are defined on
volumes. These volumes are created from managed disks (MDisks) that are
presented by the RAID storage systems. All data transfer occurs through the SAN
Volume Controller nodes, which is described as symmetric virtualization.

Figure 2 shows the data flow across the fabric.

Figure 2. Data flow in a SAN Volume Controller system
(Diagram: hosts send I/O to volumes; the nodes send I/O to managed disks on the
RAID storage systems through the redundant SAN fabric.)

The nodes in a system are arranged into pairs known as I/O groups. A single pair is
responsible for serving I/O on a given volume. Because a volume is served by two
nodes, there is no loss of availability if one node fails or is taken offline.

System management

The SAN Volume Controller nodes in a clustered system operate as a single system
and present a single point of control for system management and service. System
management and error reporting are provided through an Ethernet interface to one
of the nodes in the system, which is called the configuration node. The configuration
node runs a web server and provides a command-line interface (CLI). The
configuration node is a role that any node can take. If the current configuration
node fails, a new configuration node is selected from the remaining nodes. Each
node also provides a command-line interface and web interface for performing
hardware service actions.

Fabric types

I/O operations between hosts and SAN Volume Controller nodes and between
SAN Volume Controller nodes and RAID storage systems are performed by using
the SCSI standard. The SAN Volume Controller nodes communicate with each
other by using private SCSI commands.

FCoE connectivity is supported on SAN Volume Controller node model 2145-CG8
only, after the system software has been upgraded to version 6.4.

Table 6 shows the fabric types that can be used for communication between
hosts, nodes, and RAID storage systems. These fabric types can be used at
the same time.

Table 6. SAN Volume Controller communications types

                                   Host to SAN   SAN Volume       SAN Volume
                                   Volume        Controller to    Controller to SAN
Communications type                Controller    storage system   Volume Controller
Fibre Channel SAN                  Yes           Yes              Yes
iSCSI (1 Gbps Ethernet or          Yes           No               No
  10 Gbps Ethernet)
Fibre Channel Over Ethernet SAN    Yes           Yes              Yes
  (10 Gbps Ethernet)
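
As an illustration only, the support matrix in Table 6 can be written as a small lookup. The following Python sketch is not product code: the dictionary keys (host, storage, node) and the function name fabrics_for are names chosen for this example, and the values are transcribed from the table.

```python
# Which connections each fabric type supports, transcribed from Table 6.
# Keys "host", "storage", and "node" are labels invented for this sketch:
#   host    = host to SAN Volume Controller
#   storage = SAN Volume Controller to storage system
#   node    = SAN Volume Controller to SAN Volume Controller
SUPPORTED = {
    "Fibre Channel SAN": {"host": True, "storage": True, "node": True},
    "iSCSI": {"host": True, "storage": False, "node": False},
    "Fibre Channel over Ethernet SAN": {"host": True, "storage": True, "node": True},
}


def fabrics_for(connection):
    """Return the fabric types that Table 6 lists as usable for a connection."""
    return [fabric for fabric, links in SUPPORTED.items() if links[connection]]


print(fabrics_for("host"))
print(fabrics_for("storage"))
```

The second call omits iSCSI, which Table 6 lists for host attachment only.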

Solid-state drives

Some SAN Volume Controller nodes contain solid-state drives (SSDs). These
internal SSDs can be used to create RAID-managed disks (MDisks) that in turn can
be used to create volumes. SSDs provide host servers with a pool of
high-performance storage for critical applications.

Figure 3 shows this configuration. Internal SSD MDisks can also be placed in a
storage pool with MDisks from regular RAID storage systems, and IBM System
Storage Easy Tier performs automatic data placement within that storage pool by
moving high-activity data onto better performing storage.

Figure 3. SAN Volume Controller nodes with internal SSDs
(Diagram: hosts send I/O to volumes, which are mapped to internal solid-state
drives in a node with SSDs on the redundant SAN fabric.)

SAN Volume Controller hardware

Each SAN Volume Controller node is an individual server in a SAN Volume
Controller clustered system on which the SAN Volume Controller software runs.

The nodes are always installed in pairs, with a minimum of one and a maximum
of four pairs of nodes constituting a system. Each pair of nodes is known as an I/O
group. All I/O operations that are managed by the nodes in an I/O group are
cached on both nodes.

I/O groups take the storage that is presented to the SAN by the storage systems as
MDisks and translate the storage into logical disks (volumes) that are used by
applications on the hosts. A node is in only one I/O group and provides access to
the volumes in that I/O group.
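
The pairing rule can be sketched in Python as follows. This is a minimal illustration, not product code: the function name and the node names are hypothetical, and pairing nodes by list order is an assumption made only for the example, because this guide does not describe how nodes are assigned to pairs.

```python
def pair_into_io_groups(node_names):
    """Group an ordered list of node names into I/O groups (pairs of nodes).

    Enforces the limits stated above: nodes are installed in pairs, and a
    system has a minimum of one and a maximum of four pairs.
    """
    count = len(node_names)
    if count < 2 or count > 8 or count % 2 != 0:
        raise ValueError("a system needs one to four pairs of nodes")
    # Pairing by list order is an assumption of this sketch.
    return [tuple(node_names[i:i + 2]) for i in range(0, count, 2)]


io_groups = pair_into_io_groups(["node1", "node2", "node3", "node4"])
print(io_groups)
```

Because each node appears in exactly one pair, the sketch also reflects the rule that a node is in only one I/O group.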

Clustered systems
All your configuration, monitoring, and service tasks are performed at the
clustered-system level. Therefore, after configuring your system, you can take
advantage of the virtualization and the advanced features of the SAN Volume
Controller system.

A system can consist of between two and eight SAN Volume Controller nodes.

All configuration settings are replicated across all nodes in the system. Because
configuration is performed at the system level, management IP addresses are
assigned to the system. Each interface accesses the system remotely through the
Ethernet system-management address.

Configuration node
A configuration node is a single node that manages configuration activity of the
system.

If the configuration node fails, the system chooses a new configuration node. This
action is called configuration node failover. The new configuration node takes over
the management IP addresses. Thus you can access the system through the same
IP addresses although the original configuration node has failed. During the
failover, there is a short period when you cannot use the command-line tools or
management GUI.

Figure 4 shows an example clustered system that contains four nodes. Node 1 has
been designated the configuration node. User requests (1) are handled by node 1.

Figure 4. Configuration node
(Diagram: nodes 1 to 4; user requests (1) arrive at the IP interface of node 1,
the configuration node.)

Configuration node addressing


At any given time, only one node within a SAN Volume Controller clustered
system is assigned the management IP addresses.

An IP address for the clustered system must be assigned to Ethernet port 1. An IP
address can also be assigned to Ethernet port 2. These are the only ports that can
be assigned management IP addresses.

The node that holds the management IP addresses acts as the focal point for all
configuration and other requests that are made from the management GUI
application or the CLI. This node is known as the configuration node.

If the configuration node is stopped or fails, the remaining nodes in the system
determine which node will take on the role of configuration node. The new
configuration node binds the management IP addresses to its Ethernet ports. It
broadcasts this new mapping so that connections to the system configuration
interface can be resumed.

The new configuration node broadcasts the new IP address mapping using the
Address Resolution Protocol (ARP). You must configure some switches to forward
the ARP packet on to other devices on the subnetwork. Ensure that all Ethernet
devices are configured to pass on unsolicited ARP packets. Otherwise, if the ARP
packet is not forwarded, a device loses its connection to the SAN Volume
Controller system.

If a device loses its connection to the SAN Volume Controller system, it can
regenerate the address quickly if the device is on the same subnetwork as the
system. However, if the device is not on the same subnetwork, it might take hours
for the address resolution cache of the gateway to refresh. In this case, you can
restore the connection by establishing a command line connection to the system
from a terminal that is on the same subnetwork, and then by starting a secure copy
to the device that has lost its connection.

Management IP failover
If the configuration node fails, the IP addresses for the clustered system are
transferred to a new node. The system services are used to manage the transfer of
the management IP addresses from the failed configuration node to the new
configuration node.

The following changes are performed by the system service:
• If software on the failed configuration node is still operational, the software
  shuts down the management IP interfaces. If the software cannot shut down the
  management IP interfaces, the hardware service forces the node to shut down.
• When the management IP interfaces shut down, all remaining nodes choose a
  new node to host the configuration interfaces.
• The new configuration node initializes the configuration daemons, including
  sshd and httpd, and then binds the management IP interfaces to its Ethernet
  ports.
• The router is configured as the default gateway for the new configuration node.
• The routing tables are established on the new configuration node for the
  management IP addresses. The new configuration node sends five unsolicited
  address resolution protocol (ARP) packets for each IP address to the local subnet
  broadcast address. The ARP packets contain the management IP and the Media
  Access Control (MAC) address for the new configuration node. All systems that
  receive ARP packets are forced to update their ARP tables. After the ARP tables
  are updated, these systems can connect to the new configuration node.

Note: Some Ethernet devices might not forward ARP packets. If the ARP
packets are not forwarded, connectivity to the new configuration node cannot be
established automatically. To avoid this problem, configure all Ethernet devices
to pass unsolicited ARP packets. You can restore lost connectivity by logging in
to the SAN Volume Controller and starting a secure copy to the affected system.
Starting a secure copy forces an update to the ARP cache for all systems
connected to the same switch as the affected system.
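
To make the content of such an announcement concrete, the following Python sketch builds one Ethernet frame that carries an unsolicited ARP packet. It is an illustration under stated assumptions, not the SAN Volume Controller implementation: the function name, the MAC address, and the IP address are example values, the frame is addressed to the Ethernet broadcast address, and the ARP opcode is set to request because this guide does not say whether a request or a reply is sent. The sketch only builds the bytes; it does not transmit anything.

```python
import socket
import struct


def build_unsolicited_arp(mac, ip):
    """Build an Ethernet frame carrying an unsolicited ARP announcement.

    The sender and target protocol addresses are both set to the announced
    IP address, so receivers learn the (IP, MAC) pairing from the sender fields.
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, sender MAC, EtherType 0x0806 (ARP).
    ethernet_header = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)
    arp_packet = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request (assumption of this sketch)
        mac_bytes,    # sender hardware address: new configuration node
        ip_bytes,     # sender protocol address: management IP
        b"\x00" * 6,  # target hardware address: unknown
        ip_bytes,     # target protocol address: same management IP
    )
    return ethernet_header + arp_packet


# Example values only: a locally administered MAC and a documentation-range IP.
frame = build_unsolicited_arp("02:00:00:00:00:01", "192.0.2.10")
print(len(frame), frame.hex())
```

A device that receives such a frame replaces its cached MAC address for the management IP with the sender hardware address, which is why Ethernet devices that drop unsolicited ARP packets delay reconnection.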

Ethernet link failures

If the Ethernet link to the SAN Volume Controller system fails because of an event
unrelated to the SAN Volume Controller, such as a cable being disconnected or an
Ethernet router failure, the SAN Volume Controller does not attempt to fail over
the configuration node to restore management IP access. SAN Volume Controller
provides the option for two Ethernet ports, each with its own management IP
address, to protect against this type of failure. If you cannot connect through one
IP address, attempt to access the system through the alternate IP address.

Note: IP addresses that are used by hosts to access the system over an Ethernet
connection are different from management IP addresses.

Routing considerations for event notification and Network Time Protocol

SAN Volume Controller supports the following protocols that make outbound
connections from the system:
• Email
• Simple Network Management Protocol (SNMP)
• Syslog
• Network Time Protocol (NTP)

These protocols operate only on a port configured with a management IP address.
When making outbound connections, the SAN Volume Controller uses the
following routing decisions:
• If the destination IP address is in the same subnet as one of the management IP
  addresses, the SAN Volume Controller system sends the packet immediately.
• If the destination IP address is not in the same subnet as either of the
  management IP addresses, the system sends the packet to the default gateway
  for Ethernet port 1.
• If the destination IP address is not in the same subnet as either of the
  management IP addresses and Ethernet port 1 is not connected to the Ethernet
  network, the system sends the packet to the default gateway for Ethernet port 2.

When configuring any of these protocols for event notifications, use these routing
decisions to ensure that error notification works correctly in the event of a network
failure.
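
When planning where to place a notification server, it can help to walk through the routing decisions listed above. The following Python sketch models them with the standard ipaddress module. It is a planning aid under stated assumptions, not product code: the function name, the dictionary layout, and all addresses are example values, and the string "direct" is simply a label for the case in which the packet is sent without a gateway.

```python
import ipaddress


def choose_next_hop(destination, port1, port2=None, port1_connected=True):
    """Apply the three routing decisions to one destination address.

    Each port is a dict with an "interface" (management IP with prefix length)
    and a "gateway". Returns "direct" or the gateway address that is used.
    """
    dest = ipaddress.ip_address(destination)
    for port in (port1, port2):
        if port and dest in ipaddress.ip_interface(port["interface"]).network:
            return "direct"          # same subnet as a management IP address
    if port1_connected:
        return port1["gateway"]      # default gateway for Ethernet port 1
    if port2:
        return port2["gateway"]      # port 1 is down: gateway for port 2
    return None                      # no usable route in this model


# Example values only, taken from the documentation address ranges.
PORT1 = {"interface": "192.0.2.10/24", "gateway": "192.0.2.1"}
PORT2 = {"interface": "198.51.100.10/24", "gateway": "198.51.100.1"}

print(choose_next_hop("192.0.2.50", PORT1, PORT2))
print(choose_next_hop("203.0.113.5", PORT1, PORT2))
print(choose_next_hop("203.0.113.5", PORT1, PORT2, port1_connected=False))
```

In this model, a server that is reachable only through the gateway for Ethernet port 1 stops receiving notifications if that gateway fails while the port stays connected, which is the kind of case the routing decisions help you identify.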

SAN fabric overview


The SAN fabric is an area of the network that contains routers and switches. A SAN
is configured into a number of zones. A device using the SAN can communicate
only with devices that are included in the same zones that it is in. A SAN Volume
Controller clustered system requires several distinct types of zones: a system zone,
host zones, and disk zones. The intersystem zone is optional.

In the host zone, the host systems can identify and address the SAN Volume
Controller nodes. You can have more than one host zone and more than one disk
zone. Unless you are using a dual-core fabric design, the system zone contains all
ports from all SAN Volume Controller nodes in the system. Create one zone for
each host Fibre Channel port. In a disk zone, the SAN Volume Controller nodes
identify the storage systems. Generally, create one zone for each external storage
system. If you are using the Metro Mirror and Global Mirror feature, create a zone
with at least one port from each node in each system; up to four systems are
supported.

Note: Some operating systems cannot tolerate other operating systems in the same
host zone, although you might have more than one host type in the SAN fabric.
For example, you can have a SAN that contains one host that runs on an IBM AIX®
operating system and another host that runs on a Microsoft Windows operating
system.
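
The zoning rule, that a device can communicate only with devices in the same zones, can be checked with a few lines of Python. This is a minimal sketch for reasoning about a zoning plan, not a switch configuration: the zone names and port names are hypothetical.

```python
def can_communicate(zones, port_a, port_b):
    """Return True if the two ports share at least one zone."""
    return any(port_a in members and port_b in members
               for members in zones.values())


# Hypothetical zoning plan: one system zone, one host zone, one disk zone.
zones = {
    "system_zone": {"node1_p1", "node2_p1"},
    "host_zone_aix": {"aix_host_p1", "node1_p1", "node2_p1"},
    "disk_zone_1": {"node1_p1", "node2_p1", "storage1_p1"},
}

print(can_communicate(zones, "aix_host_p1", "node1_p1"))
print(can_communicate(zones, "aix_host_p1", "storage1_p1"))
```

In this plan the host reaches the nodes but not the storage system directly, which matches symmetric virtualization: all host I/O passes through the SAN Volume Controller nodes.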

All communication between SAN Volume Controller nodes is performed through
the SAN. All SAN Volume Controller configuration and service commands are sent
to the system through an Ethernet network.

Chapter 2. Introducing the SAN Volume Controller hardware components
A SAN Volume Controller system consists of SAN Volume Controller nodes and
related hardware components, such as uninterruptible power supply units and the
optional redundant ac-power switches. Note that nodes and uninterruptible power
supply units are installed in pairs.

SAN Volume Controller nodes


SAN Volume Controller supports several different node types.

The following nodes are supported:


• The SAN Volume Controller 2145-CG8 node is available for purchase. The
  following features can be purchased for use with the 2145-CG8:
  – A high-speed SAS adapter with up to four solid-state drives (SSDs)
  – A two-port 10 Gbps Ethernet adapter
• The following nodes are no longer available for purchase but remain supported:
  – SAN Volume Controller 2145-CF8
  – SAN Volume Controller 2145-8A4
  – SAN Volume Controller 2145-8G4
  – SAN Volume Controller 2145-8F4
  – SAN Volume Controller 2145-8F2

A label on the front of the node indicates the SAN Volume Controller node type,
hardware revision (if appropriate), and serial number.

SAN Volume Controller front panel controls and indicators


The controls and indicators are used for power and navigation and to indicate
information such as system activity, service and configuration options, service
controller failures, and node identification.

SAN Volume Controller 2145-CG8 controls and indicators


The controls and indicators are used for power and navigation and to indicate
information such as system activity, service and configuration options, service
controller failures, and node identification.

Figure 5 shows the controls and indicators on the front panel of the
SAN Volume Controller 2145-CG8.

Figure 5. SAN Volume Controller 2145-CG8 front panel

1 Node-status LED
2 Front-panel display
3 Navigation buttons
4 Operator-information panel
5 Select button
6 Error LED

SAN Volume Controller 2145-CF8 controls and indicators


The controls and indicators are used for power and navigation and to indicate
information such as system activity, service and configuration options, service
controller failures, and node identification.

Figure 6 shows the controls and indicators on the front panel of the SAN Volume
Controller 2145-CF8.

Figure 6. SAN Volume Controller 2145-CF8 front panel

1 Node-status LED
2 Front-panel display
3 Navigation buttons
4 Operator-information panel
5 Select button
6 Error LED

SAN Volume Controller 2145-8A4 controls and indicators


The controls and indicators are used for power and navigation and to indicate
information such as system activity, service and configuration options, service
controller failures, and node identification.

Figure 7 shows the controls and indicators on the front panel of the SAN Volume
Controller 2145-8A4.

Figure 7. SAN Volume Controller 2145-8A4 front-panel assembly

1 Operator-information panel
2 Node status LED
3 Front-panel display
4 Navigation buttons
5 Serial number label
6 Select button
7 Node identification label
8 Error LED

SAN Volume Controller 2145-8G4 controls and indicators


The controls and indicators are used for power and navigation and to indicate
information such as system activity, service and configuration options, service
controller failures, and node identification.

Figure 8 shows the controls and indicators on the front panel of the SAN Volume
Controller 2145-8G4.

Figure 8. SAN Volume Controller 2145-8G4 front-panel assembly

1 Node status LED
2 Front panel display
3 Navigation buttons
4 Serial number label
5 Operator information panel
6 Select button
7 Node identification label
8 Error LED

SAN Volume Controller 2145-8F4 and SAN Volume Controller 2145-8F2 controls and indicators
The controls and indicators are used for power and navigation and to indicate
information such as system activity, service and configuration options, service
controller failures, and node identification.

Figure 9 shows the controls and indicators on the front panel of the SAN Volume
Controller 2145-8F4 and SAN Volume Controller 2145-8F2.

Figure 9. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 front-panel
assembly

1 Node status LED
2 Front-panel display
3 Navigation buttons
4 Serial number label
5 Operator-information panel
6 Select button
7 Node identification label
8 Error LED

Node status LED


System activity is indicated through the green node-status LED.

The node status LED provides the following system activity indicators:
Off The node is not operating as a member of a system.
On The node is operating as a member of a system.
Flashing
The node is dumping cache and state data to the local disk in anticipation
of a system reboot from a pending power-off action or other controlled
restart sequence.

Front-panel display
The front-panel display shows service, configuration, and navigation information.

You can select the language that is displayed on the front panel. The display can
show both alphanumeric information and graphical information (progress bars).

The front-panel display shows configuration and service information about the
node and the system, including the following items:
• Boot progress indicator
• Boot failed
• Charging
• Hardware boot
• Node rescue request
• Power failure
• Powering off
• Recovering
• Restarting
• Shutting down
• Error codes
• Validate WWNN?

Navigation buttons
You can use the navigation buttons to move through menus.

There are four navigational buttons that you can use to move throughout a menu:
up, down, right, and left.

Each button corresponds to the direction that you can move in a menu. For
example, to move right in a menu, press the navigation button that is located on
the right side. If you want to move down in a menu, press the navigation button
that is located on the bottom.

Note: The select button is used in tandem with the navigation buttons.

Product serial number


The node contains a SAN Volume Controller product serial number that is written
to the system board hardware. The product serial number is also printed on the
serial number label which is located on the front panel.

This number is used for warranty and service entitlement checking and is included
in the data sent with error reports. It is essential that this number is not changed
during the life of the product. If the system board is replaced, you must follow the
system board replacement instructions carefully and rewrite the serial number on
the system board.

Select button
Use the select button to select an item from a menu.

The select button and navigation buttons help you to navigate and select menu
and boot options, and start a service panel test. The select button is located on the
front panel of the SAN Volume Controller, near the navigation buttons.

Node identification label


The node identification label on the front panel displays a six-digit node
identification number. Sometimes this number is called the panel name or front
panel ID.

The node identification label is the six-digit number that is input to the addnode
command. It is readable by system software and is used by configuration and
service software as a node identifier. The node identification number can also be
displayed on the front-panel display when node is selected from the menu.

If the service controller assembly front panel is replaced, the configuration and
service software displays the number that is printed on the front of the
replacement panel. Future error reports contain the new number. No system
reconfiguration is necessary when the front panel is replaced.

Error LED
Critical faults on the service controller are indicated through the amber error LED.

The error LED has the following two states:


OFF The service controller is functioning correctly.
ON A critical service-controller failure was detected and you must replace the
service controller.
The error LED can light temporarily when the node is powered on. If the
error LED is on, but the front panel display is completely blank, wait five
minutes to allow the LED time to turn off before performing any service
action.

SAN Volume Controller operator-information panel


The operator-information panel is located on the front panel of the SAN Volume
Controller.

SAN Volume Controller 2145-CG8 operator-information panel


The operator-information panel contains buttons and indicators such as the
power-control button, and LEDs that indicate information such as system-board
errors, hard-drive activity, and power status.

Figure 10 shows the operator-information panel for the SAN Volume Controller
2145-CG8.

Figure 10. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel

1 Power-button cover
2 Ethernet 1 activity LED. The operator-information panel LEDs refer to the
  Ethernet ports that are mounted on the system board.
3 Ethernet 2 activity LED. The operator-information panel LEDs refer to the
  Ethernet ports that are mounted on the system board.
4 System-information LED
5 System-error LED
6 Release latch
7 Locator button and LED
8 Power button and LED

Note: If you install the 10 Gbps Ethernet feature, the port activity is not reflected
on the activity LEDs.

SAN Volume Controller 2145-CF8 operator-information panel


The operator-information panel contains buttons and indicators such as the
power-control button, and LEDs that indicate information such as system-board
errors, hard-drive activity, and power status.

Figure 11 shows the operator-information panel for the SAN Volume Controller
2145-CF8.

Figure 11. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel

1 Power-button cover
2 Ethernet 2 activity LED
3 Ethernet 1 activity LED
4 System-information LED
5 System-error LED
6 Release latch
7 Locator button and LED
8 Not used
9 Not used
10 Power button and LED

SAN Volume Controller 2145-8A4 operator-information panel


The operator-information panel contains buttons and indicators such as the
power-control button, and LEDs that indicate information such as system-board
errors, hard-drive activity, and power status.

Figure 12 on page 16 shows the operator-information panel for the SAN Volume
Controller 2145-8A4.

Figure 12. SAN Volume Controller 2145-8A4 operator-information panel

1 System-error LED (amber)
2 Locator LED (blue)
3 Hard-disk drive activity LED (green)
4 Reset button
5 Power-control button
6 Power LED (green)

SAN Volume Controller 2145-8G4 operator information panel


The operator-information panel contains buttons and indicators such as the release
latch for the light path diagnostics panel, the power-control button, and LEDs that
indicate information such as system-board errors, hard-drive activity, and power
status.

Figure 13 shows the operator information panel for the SAN Volume Controller
2145-8G4.

Figure 13. SAN Volume Controller 2145-8G4 operator-information panel

1 Release latch for light path diagnostics panel
2 System-error LED (amber)
3 System-information LED (amber)
4 Locator LED (blue)
5 Hard disk drive activity LED (green)
6 Power LED (green)
7 Power-control button

SAN Volume Controller 2145-8F4 and SAN Volume Controller 2145-8F2 operator information panel

The operator-information panel contains buttons and indicators such as the release
latch for the light path diagnostics panel, the power-control button, and LEDs that
indicate information such as system-board errors, hard-drive activity, and power
status.

Figure 14 shows the operator-information panel that is used by the SAN Volume
Controller 2145-8F4 and the SAN Volume Controller 2145-8F2 models.

Figure 14. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4
operator-information panel

1 Release latch for light path diagnostics panel
2 System-error LED (amber)
3 Information LED (amber)
4 Locator LED (blue)
5 Hard disk drive activity LED (green)
6 Power control button
7 Power LED (green)
8 USB connector

System-error LED
When it is lit, the system-error LED indicates that a system-board error has
occurred.

This amber LED lights up if the SAN Volume Controller hardware detects a fatal
error that requires a new field-replaceable unit (FRU). To isolate the faulty
FRU, see MAP 5800: Light path.

A system-error LED is also at the rear of the SAN Volume Controller models
2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, and 2145-8F2.

Hard-disk drive activity LED


When it is lit, the green hard-disk drive activity LED indicates that the hard disk
drive is in use.

Reset button
A reset button is available on the SAN Volume Controller 2145-8A4 node, but do
not use it.

Attention: If you use the reset button, the node restarts immediately without the
SAN Volume Controller control data being written to disk. Service actions are then
required to make the node operational again.

Power button
The power button turns main power on or off for the SAN Volume Controller.

To turn on the power, press and release the power button. You must have a
pointed device, such as a pen, to press the button.

To turn off the power, press and release the power button. For more information
about how to turn off the SAN Volume Controller node, see MAP 5350: Powering
off a SAN Volume Controller node.

Attention: When the node is operational and you press and immediately release
the power button, the SAN Volume Controller indicates on its front panel that it is
turning off and writes its control data to its internal disk. This can take up to five
minutes. If you press the power button but do not release it, the node turns off
immediately without the SAN Volume Controller control data being written to
disk. Service actions are then required to make the SAN Volume Controller
operational again. Therefore, during a power-off operation, do not press and hold
the power button for more than two seconds.

Note: The 2145 UPS-1U does not turn off when the SAN Volume Controller is shut
down from the power button.

Power LED
The green power LED indicates the power status of the SAN Volume Controller.

The power LED has the following properties:


Off     One or more of the following are true:
        • No power is present at the power supply input.
        • The power supply has failed.
        • The LED has failed.
On      The SAN Volume Controller node is turned on.
Flashing
        The SAN Volume Controller node is turned off, but is still connected to a
        power source.

Note: A power LED is also at the rear of the SAN Volume Controller 2145-CG8,
2145-CF8, 2145-8F2, 2145-8F4, and 2145-8G4 nodes.

Release latch
The release latch on the SAN Volume Controller models 2145-8G4, 2145-8F4, and
2145-8F2 gives you access to the light path diagnostics panel, which provides a
method for determining the location of a problem.

After pressing the release latch on the operator-information panel, you can slide
the light path diagnostics panel out to view the lit LEDs. The LEDs indicate the
type of error that has occurred. See MAP 5800: Light path for more detail.

To retract the panel, push it back into the node and snap it into place.

System-information LED
When the system-information LED is lit, a noncritical event has occurred.

Check the light path diagnostics panel and the event log. Light path diagnostics
are described in more detail in the light path maintenance analysis procedure
(MAP).

Locator LED
The SAN Volume Controller does not use the locator LED.

Ethernet-activity LED
An Ethernet-activity LED beside each Ethernet port indicates that the SAN Volume
Controller node is communicating on the Ethernet network that is connected to the
Ethernet port.

The operator-information panel LEDs refer to the Ethernet ports that are mounted
on the system board. If you install the 10 Gbps Ethernet card on a SAN Volume
Controller 2145-CG8, the port activity is not reflected on the activity LEDs.

SAN Volume Controller rear-panel indicators and connectors


The rear-panel indicators for the SAN Volume Controller are located on the
back-panel assembly. The external connectors are located on the SAN Volume
Controller node and the power supply assembly.

SAN Volume Controller 2145-CG8 rear-panel indicators


The rear-panel indicators consist of LEDs that indicate the status of the Fibre
Channel ports, Ethernet connection and activity, power, electrical current, and
system-board errors.

Figure 15 shows the rear-panel indicators on the SAN Volume Controller 2145-CG8
back-panel assembly.

Figure 15. SAN Volume Controller 2145-CG8 rear-panel indicators

1 Fibre Channel LEDs
2 Ethernet-link LEDs
3 Ethernet-activity LEDs
4 Ac, dc, and power-supply error LEDs
5 Power, location, and system-error LEDs

Figure 16 shows the rear-panel indicators on the SAN Volume Controller 2145-CG8
back-panel assembly that has the 10 Gbps Ethernet feature.

Figure 16. SAN Volume Controller 2145-CG8 rear-panel indicators for the 10 Gbps Ethernet
feature

1 10 Gbps Ethernet-link LEDs. The amber link LED is on when this port is
connected to a 10 Gbps Ethernet switch and the link is online.
2 10 Gbps Ethernet-activity LEDs. The green activity LED is on while data is
being sent over the link.

SAN Volume Controller 2145-CG8 connectors


External connectors that the SAN Volume Controller 2145-CG8 uses include four
Fibre Channel ports, a serial port, two Ethernet ports, and two power connectors.
The 2145-CG8 also has external connectors for the 10 Gbps Ethernet feature.

These figures show the external connectors on the SAN Volume Controller
2145-CG8 back panel assembly.

Figure 17. Connectors on the rear of the SAN Volume Controller 2145-CG8

1 Fibre Channel port 1
2 Fibre Channel port 2
3 Fibre Channel port 3
4 Fibre Channel port 4
5 Power-cord connector for power supply 1
6 Power-cord connector for power supply 2
7 Serial connection for UPS communication cable
8 Ethernet port 2
9 Ethernet port 1

Figure 18. 10 Gbps Ethernet ports on the rear of the SAN Volume Controller 2145-CG8

1 10 Gbps Ethernet port 3
2 10 Gbps Ethernet port 4

Figure 19 on page 21 shows the type of connector that is located on each
power-supply assembly. Use these connectors to connect the SAN Volume
Controller 2145-CG8 to the two power cables from the uninterruptible power
supply.

Neutral
Ground

Live

Figure 19. Power connector

SAN Volume Controller 2145-CG8 ports used during service procedures:

The SAN Volume Controller 2145-CG8 contains a number of ports that are only
used during service procedures.

Figure 20 shows ports that are used only during service procedures.

Figure 20. Service ports of the SAN Volume Controller 2145-CG8

1 System management port
2 Two monitor ports, one on the front and one on the rear
3 Four USB ports, two on the front and two on the rear

During normal operation, none of these ports are used. Connect a device to any of
these ports only when you are directed to do so by a service procedure or by an
IBM service representative.

SAN Volume Controller 2145-CG8 unused ports:

The SAN Volume Controller 2145-CG8 can contain one port that is not used.

Figure 21 shows the one port that is not used during service procedures or normal
use.

Figure 21. SAN Volume Controller 2145-CG8 port not used

1 Serial-attached SCSI (SAS) port

When present, this port is disabled in software to make the port inactive.

The SAS port is present when the optional high-speed SAS adapter is installed
with one or more solid-state drives (SSDs).

SAN Volume Controller 2145-CF8 rear-panel indicators


The rear-panel indicators consist of LEDs that indicate the status of the Fibre
Channel ports, Ethernet connection and activity, power, electrical current, and
system-board errors.

Figure 22 shows the rear-panel indicators on the SAN Volume Controller 2145-CF8
back-panel assembly.

Figure 22. SAN Volume Controller 2145-CF8 rear-panel indicators

1 Fibre Channel LEDs
2 Ac, dc, and power-supply error LEDs
3 Power, location, and system-error LEDs
4 Ethernet-link LEDs
5 Ethernet-activity LEDs

SAN Volume Controller 2145-CF8 connectors


External connectors that the SAN Volume Controller 2145-CF8 uses include four
Fibre Channel ports, a serial port, two Ethernet ports, and two power connectors.

Figure 23 shows the external connectors on the SAN Volume Controller 2145-CF8
back panel assembly.

Figure 23. Connectors on the rear of the SAN Volume Controller 2145-CG8 or 2145-CF8

1 Fibre Channel port 1
2 Fibre Channel port 2
3 Fibre Channel port 3
4 Fibre Channel port 4
5 Power-cord connector for power supply 1
6 Power-cord connector for power supply 2
7 Serial connection for UPS communication cable
8 Ethernet port 2
9 Ethernet port 1

Figure 24 shows the type of connector that is located on each power-supply
assembly. Use these connectors to connect the SAN Volume Controller 2145-CF8 to
the two power cables from the uninterruptible power supply.

Neutral
Ground

Live

Figure 24. Power connector

SAN Volume Controller 2145-CF8 ports used during service procedures:

The SAN Volume Controller 2145-CF8 contains a number of ports that are only
used during service procedures.

Figure 25 shows ports that are used only during service procedures.

Figure 25. Service ports of the SAN Volume Controller 2145-CF8

1 System management port
2 Two monitor ports, one on the front and one on the rear
3 Four USB ports, two on the front and two on the rear

During normal operation, none of these ports are used. Connect a device to any of
these ports only when you are directed to do so by a service procedure or by an
IBM service representative.

SAN Volume Controller 2145-CF8 unused ports:

The SAN Volume Controller 2145-CF8 can contain one port that is not used.

Figure 26 shows the one port that is not used during service procedures or normal
use.

Figure 26. SAN Volume Controller 2145-CF8 port not used

1 Serial-attached SCSI (SAS) port

When present, this port is disabled in software to make the port inactive.

The SAS port is present when the optional high-speed SAS adapter is installed
with one or more solid-state drives (SSDs).

SAN Volume Controller 2145-8A4 rear-panel indicators


The rear-panel indicators consist of LEDs that indicate the status of the Fibre
Channel ports, Ethernet connection and activity, power, electrical current, and
system-board errors.

Figure 27 shows the rear-panel indicators on the SAN Volume Controller 2145-8A4
back-panel assembly.

Figure 27. SAN Volume Controller 2145-8A4 rear-panel indicators

1 Fibre Channel LEDs
2 Ethernet port 1 activity LED
3 Ethernet port 1 link LED
4 Ethernet port 2 activity LED
5 Ethernet port 2 link LED

SAN Volume Controller 2145-8A4 connectors


The external connectors consist of Fibre Channel, serial and Ethernet ports, and the
power supply.

Figure 28 on page 25 shows the external connectors on the SAN Volume Controller
2145-8A4 back-panel assembly.

Figure 28. SAN Volume Controller 2145-8A4 external connectors

1 Fibre Channel port 1
2 Fibre Channel port 2
3 Fibre Channel port 3
4 Fibre Channel port 4
5 Power supply
6 Serial connection
7 Ethernet port 2
8 Ethernet port 1

Figure 29 shows the type of connector that is located on the power supply
assembly. The connector enables you to connect the SAN Volume Controller
2145-8A4 to the power source from the uninterruptible power supply.

Figure 29. Power connector

SAN Volume Controller 2145-8A4 ports used during service procedures

The SAN Volume Controller 2145-8A4 contains a number of ports that are used
only during service procedures. These ports are shown in Figure 30.

Figure 30. Service ports of the SAN Volume Controller 2145-8A4

1 System management port
2 Four USB ports, two on the front and two on the rear
3 One video port on the rear

During normal operation, none of these ports are used. Connect a device to any of
these ports only when you are directed to do so by a service procedure or by your
IBM service representative.

SAN Volume Controller 2145-8A4 ports not used

The SAN Volume Controller 2145-8A4 has no unused ports.

SAN Volume Controller 2145-8G4 rear-panel indicators


The rear-panel indicators consist of LEDs that indicate the status of the Fibre
Channel ports, Ethernet connection and activity, power, electrical current, and
system-board errors.

Figure 31 shows the rear-panel indicators on the SAN Volume Controller 2145-8G4
back-panel assembly.

Figure 31. SAN Volume Controller 2145-8G4 rear-panel indicators

1 Fibre Channel LEDs
2 Ethernet port 1 activity LED
3 Ethernet port 1 link LED
4 Ethernet port 2 activity LED
5 Ethernet port 2 link LED
6 Power, location, and system error LEDs
7 Ac and dc LEDs

SAN Volume Controller 2145-8G4 connectors


The external connectors consist of Fibre Channel, serial, and Ethernet ports, and
the power supply.

Figure 32 shows the external connectors on the SAN Volume Controller 2145-8G4
back panel assembly.

Figure 32. SAN Volume Controller 2145-8G4 external connectors

1 Fibre Channel port 1
2 Fibre Channel port 2
3 Fibre Channel port 3
4 Fibre Channel port 4
5 Power supply
6 Serial connection
7 Ethernet port 2
8 Ethernet port 1

Figure 33 shows the type of connector that is located on the power supply
assembly. The connector enables you to connect the SAN Volume Controller
2145-8G4 to the power source from the uninterruptible power supply.

Neutral
Ground

Live

Figure 33. Power connector

SAN Volume Controller 2145-8G4 ports used during service procedures

The SAN Volume Controller 2145-8G4 contains a number of ports that are only
used during service procedures. These ports are shown in Figure 34.

Figure 34. Service ports of the SAN Volume Controller 2145-8G4

1 System management port
2 Four USB ports, two on the front and two on the rear
3 Two monitor ports, one on the front and one on the rear

During normal operation, none of these ports are used. Connect a device to any of
these ports only when you are directed to do so by a service procedure or by your
IBM service representative.

SAN Volume Controller 2145-8G4 ports not used

The SAN Volume Controller 2145-8G4 has no unused ports.

SAN Volume Controller 2145-8F4 rear-panel indicators


The rear-panel indicators are located on the back-panel assembly.

Figure 35 shows the rear-panel indicators on the SAN Volume Controller 2145-8F4
back-panel assembly.

Figure 35. SAN Volume Controller 2145-8F4 rear-panel indicators

1 Fibre Channel LEDs
2 Ethernet port 1 link LED
3 Ethernet port 1 activity LED
4 Ethernet port 2 link LED
5 Ethernet port 2 activity LED
6 Power, location, and system error LEDs
7 Ac and dc LEDs

SAN Volume Controller 2145-8F4 connectors


The external connectors consist of Ethernet, serial, and Fibre Channel ports, and
the power supply.

Figure 36 shows the external connectors on the SAN Volume Controller 2145-8F4
back panel assembly.

Figure 36. SAN Volume Controller 2145-8F4 external connectors

1 Fibre Channel port 1
2 Fibre Channel port 2
3 Fibre Channel port 3
4 Fibre Channel port 4
5 Power supply
6 Serial connection
7 Ethernet port 2
8 Ethernet port 1

Figure 37 shows the type of connector that is located on the power supply
assembly. The connector enables you to connect the SAN Volume Controller
2145-8F4 to the power source from the uninterruptible power supply.

Neutral
Ground

Live

Figure 37. Power connector

SAN Volume Controller 2145-8F4 ports used during service procedures

The SAN Volume Controller 2145-8F4 contains the keyboard service port and the
monitor service port. These ports are used only during service procedures.
Figure 38 provides the locations of the service ports.

Figure 38. Service ports of the SAN Volume Controller 2145-8F4

1 Keyboard port
2 Monitor port

SAN Volume Controller 2145-8F4 ports not used during normal operation

The SAN Volume Controller 2145-8F4 is equipped with several ports that are not
used by the SAN Volume Controller during normal operation. Figure 39 on page
30 and Figure 40 on page 30 show the ports that are not used by the SAN Volume
Controller.

Figure 39. Ports not used during normal operation by the SAN Volume Controller 2145-8F4

1 System management port
2 Mouse port
3 Keyboard port
4 USB ports
5 Monitor port

Figure 40. Ports not used on the front panel of the SAN Volume Controller 2145-8F4

1 USB port

SAN Volume Controller 2145-8F2 rear-panel indicators


The rear-panel indicators are located on the back-panel assembly.

Figure 41 shows the rear-panel indicators on the SAN Volume Controller 2145-8F2
back-panel assembly.

Figure 41. SAN Volume Controller 2145-8F2 rear-panel indicators

1 Fibre Channel LEDs
2 Ethernet port 1 link LED
3 Ethernet port 1 activity LED
4 Ethernet port 2 link LED
5 Ethernet port 2 activity LED
6 Power, location, and system error LEDs
7 Ac and dc LEDs

SAN Volume Controller 2145-8F2 connectors


The external connectors consist of the power supply and Ethernet, Fibre Channel,
and serial ports.

Figure 42 shows the external connectors on the SAN Volume Controller 2145-8F2
back panel assembly.

Figure 42. SAN Volume Controller 2145-8F2 external connectors

1 Power supply
2 Fibre Channel port 4
3 Serial connection
4 Fibre Channel port 3
5 Fibre Channel port 2
6 Fibre Channel port 1
7 Ethernet port 2
8 Ethernet port 1

Figure 43 shows the type of connector that is located on the power supply
assembly. The connector enables you to connect the SAN Volume Controller
2145-8F2 to the power source from the uninterruptible power supply.

Neutral
Ground

Live

Figure 43. Power connector

Fibre Channel LEDs


The Fibre Channel LEDs indicate the status of the Fibre Channel ports.

Two LEDs are used to indicate the state and speed of the operation of each Fibre
Channel port. The bottom LED indicates the link state and activity.
Table 7. Link state and activity for the bottom Fibre Channel LED
LED state    Link state and activity indicated
Off          Link inactive
On           Link active, no I/O
Blinking     Link active, I/O active

Each Fibre Channel port can operate at one of three speeds. The top LED indicates
the relative link speed. The link speed is defined only if the link state is active.
Table 8. Link speed for the top Fibre Channel LED
LED state Link speed indicated
Off SLOW
On FAST
Blinking MEDIUM

Table 9 shows the actual link speeds for the SAN Volume Controller models
2145-8A4, 2145-8G4, and 2145-8F4.
Table 9. Actual link speeds
Link speed Actual link speeds
Slow 1 Gbps
Fast 4 Gbps
Medium 2 Gbps

Table 10 shows the actual link speeds for the SAN Volume Controller 2145-CF8 and
for the SAN Volume Controller 2145-CG8.
Table 10. Actual link speeds
Link speed Actual link speeds
Slow 2 Gbps
Fast 8 Gbps
Medium 4 Gbps
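
The lookups in Tables 7 through 10 can be combined into a single decoding step. The following Python sketch is illustrative only and is not part of any IBM tool; the function name `decode_fc_leds` and the lowercase LED-state strings are assumptions made for this example. It returns no speed when the bottom LED is off, because the link speed is defined only if the link state is active.

```python
# Illustrative lookup tables transcribed from Tables 7 to 10.
# All names here are hypothetical and exist only for this example.

# Table 7: bottom LED state -> link state and activity
LINK_STATE = {
    "off": "Link inactive",
    "on": "Link active, no I/O",
    "blinking": "Link active, I/O active",
}

# Table 8: top LED state -> relative link speed
RELATIVE_SPEED = {"off": "slow", "on": "fast", "blinking": "medium"}

# Tables 9 and 10: relative speed -> actual speed in Gbps, by node model
_SPEEDS_1_2_4 = {"slow": 1, "medium": 2, "fast": 4}
_SPEEDS_2_4_8 = {"slow": 2, "medium": 4, "fast": 8}
ACTUAL_SPEED_GBPS = {
    "2145-8A4": _SPEEDS_1_2_4,
    "2145-8G4": _SPEEDS_1_2_4,
    "2145-8F4": _SPEEDS_1_2_4,
    "2145-CF8": _SPEEDS_2_4_8,
    "2145-CG8": _SPEEDS_2_4_8,
}


def decode_fc_leds(model, top_led, bottom_led):
    """Return (link_state, speed_gbps) for one Fibre Channel port.

    speed_gbps is None when the link is inactive, because the link
    speed is defined only if the link state is active.
    """
    link_state = LINK_STATE[bottom_led]
    if bottom_led == "off":
        return link_state, None
    relative = RELATIVE_SPEED[top_led]
    return link_state, ACTUAL_SPEED_GBPS[model][relative]
```

For example, on a 2145-CG8 a top LED that is on and a bottom LED that is blinking decode to an active link with I/O at 8 Gbps.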

Ethernet activity LED


The Ethernet activity LED indicates that the node is communicating with the
Ethernet network that is connected to the Ethernet port.

There is a set of LEDs for each Ethernet connector. The top LED is the Ethernet
link LED. When it is lit, it indicates that there is an active connection on the
Ethernet port. The bottom LED is the Ethernet activity LED. When it flashes, it
indicates that data is being transmitted or received between the server and a
network device.

Ethernet link LED


The Ethernet link LED indicates that there is an active connection on the Ethernet
port.

There is a set of LEDs for each Ethernet connector. The top LED is the Ethernet
link LED. When it is lit, it indicates that there is an active connection on the
Ethernet port. The bottom LED is the Ethernet activity LED. When it flashes, it
indicates that data is being transmitted or received between the server and a
network device.

Power, location, and system-error LEDs


The power, location, and system-error LEDs are housed on the rear of the SAN
Volume Controller. These three LEDs are duplicates of the same LEDs that are
shown on the front of the node.

The following terms describe the power, location, and system-error LEDs:
Power LED
        This is the top of the three LEDs and indicates the following states:
        Off     One or more of the following are true:
                • No power is present at the power supply input.
                • The power supply has failed.
                • The LED has failed.
        On      The SAN Volume Controller is powered on.
        Flashing
                The SAN Volume Controller is turned off but is still connected to a
                power source.
Location LED
        This is the middle of the three LEDs and is not used by the SAN Volume
        Controller.
System-error LED
        This is the bottom of the three LEDs. When it is lit, it indicates that a
        system-board error has occurred. The light path diagnostics provide more
        information.

Ac and dc LEDs
The ac and dc LEDs indicate whether the node is receiving electrical current.
Ac LED
The upper LED indicates that ac current is present on the node.
Dc LED
The lower LED indicates that dc current is present on the node.

Ac, dc, and power-supply error LEDs on the SAN Volume Controller 2145-CF8
and SAN Volume Controller 2145-CG8:

The ac, dc, and power-supply error LEDs indicate whether the node is receiving
electrical current.

Figure 44 on page 34 shows the location of the SAN Volume Controller 2145-CF8
ac, dc, and power-supply error LEDs.

Figure 44. SAN Volume Controller 2145-CG8 or 2145-CF8 ac, dc, and power-error LEDs

Each of the two power supplies has its own set of LEDs.
Ac LED
The upper LED (1) on the left side of the power supply indicates that ac
current is present on the node.
Dc LED
The middle LED (2) on the left side of the power supply indicates that
dc current is present on the node.
Power-supply error LED
The lower LED (3) on the left side of the power supply indicates a
problem with the power supply.

Ac and dc LEDs on the SAN Volume Controller 2145-8G4:

The ac LED and dc LED are located on the rear of the SAN Volume Controller
2145-8G4.

Figure 45 shows the location of the ac and dc LEDs.

Figure 45. SAN Volume Controller 2145-8G4 ac and dc LEDs

Ac LED
The upper LED (1) indicates that ac current is present on the node.
Dc LED
The lower LED (2) indicates that dc current is present on the node.

Ac and dc LEDs on the SAN Volume Controller 2145-8F4 and the SAN Volume
Controller 2145-8F2:

The ac LED and dc LED are located on the rear of the SAN Volume Controller
2145-8F4 and the SAN Volume Controller 2145-8F2.

Figure 46 shows the location of the ac and dc LEDs.

Figure 46. SAN Volume Controller 2145-8F4 and SAN Volume Controller 2145-8F2 ac and dc
LEDs

Ac LED
The upper LED (1) indicates that ac current is present on the node.
Dc LED
The lower LED (2) indicates that dc current is present on the node.

Fibre Channel port numbers and worldwide port names


Fibre Channel ports are identified by their physical port number and by a
worldwide port name (WWPN).

The physical port numbers identify Fibre Channel cards and cable connections
when you perform service tasks. The physical port numbers are 1 - 4, counting
from left to right when you view the rear panel of the node. The WWPNs are used
for tasks such as Fibre Channel switch configuration and to uniquely identify the
devices on the SAN.

The WWPNs are derived from the worldwide node name (WWNN) of the SAN
Volume Controller node in which the ports are installed.

The WWNN is in the form 50050768010XXXXX, where XXXXX is initially derived
from the unit and is specific to a node.

The WWPNs are in the form 5005076801QXXXXX, where XXXXX is as
previously stated and Q is related to the port number as follows:

Port Value of Q
1 4
2 3
3 1
4 2
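
The port-to-Q mapping can be applied mechanically. The following Python sketch is illustrative only; the function name `wwpns_from_wwnn` and the sample WWNN used below are hypothetical. It assumes that a WWPN, like a WWNN, is 16 hexadecimal digits long, so that Q takes the place of the digit that follows 5005076801 in the WWNN.

```python
# Q value for each physical Fibre Channel port, from the table above.
PORT_TO_Q = {1: "4", 2: "3", 3: "1", 4: "2"}


def wwpns_from_wwnn(wwnn):
    """Return {port_number: wwpn} for a WWNN of the form 50050768010XXXXX.

    Assumption: each WWPN is 16 hex digits, formed by replacing the
    eleventh digit of the WWNN (the 0 after 5005076801) with Q.
    """
    wwnn = wwnn.upper()
    int(wwnn, 16)  # raises ValueError if the string is not hexadecimal
    if len(wwnn) != 16 or not wwnn.startswith("50050768010"):
        raise ValueError("expected a WWNN of the form 50050768010XXXXX")
    prefix, suffix = wwnn[:10], wwnn[11:]  # 5005076801 and XXXXX
    return {port: prefix + q + suffix for port, q in PORT_TO_Q.items()}
```

With the made-up WWNN 500507680100A1B2, the sketch yields 500507680140A1B2 for port 1 and 500507680120A1B2 for port 4.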

Requirements for the SAN Volume Controller environment


Certain specifications for the physical site of the SAN Volume Controller must be
met before the IBM representative can set up your SAN Volume Controller
environment.

SAN Volume Controller 2145-CG8 environment requirements


Before the SAN Volume Controller 2145-CG8 is installed, the physical environment
must meet certain requirements. This includes verifying that adequate space is
available and that requirements for power and environmental conditions are met.

Input-voltage requirements

Ensure that your environment meets the following voltage requirements.

Voltage Frequency
200 V to 240 V single phase ac 50 Hz or 60 Hz

Attention:
• If the uninterruptible power supply is cascaded from another uninterruptible
  power supply, the source uninterruptible power supply must have at least three
  times the capacity per phase and the total harmonic distortion must be less than
  5%.
• The uninterruptible power supply also must have input voltage capture that has
  a slew rate of no more than 3 Hz per second.

Maximum power requirements for each node

Ensure that your environment meets the following power requirements.

The maximum power that is required depends on the node type and the optional
features that are installed.
Table 11. Maximum power consumption
Components                                        Power requirements
SAN Volume Controller 2145-CG8 and 2145 UPS-1U    200 W

For each redundant ac-power switch, add 20 W to the power requirements.

For the high-speed SAS adapter with from one to four solid-state drives, add 50 W
to the power requirements.
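
The additions described here amount to simple arithmetic. The following Python sketch is illustrative only; the function and constant names are hypothetical, and the figures are the ones stated in this section for the SAN Volume Controller 2145-CG8.

```python
# Figures taken from this section; all names are hypothetical.
BASE_NODE_AND_UPS_W = 200     # 2145-CG8 node plus 2145 UPS-1U (Table 11)
AC_SWITCH_W = 20              # added for each redundant ac-power switch
SAS_ADAPTER_WITH_SSDS_W = 50  # high-speed SAS adapter with one to four SSDs


def max_power_watts(ac_switches=0, has_sas_adapter=False):
    """Return the maximum power requirement, in watts, for one node."""
    total = BASE_NODE_AND_UPS_W + AC_SWITCH_W * ac_switches
    if has_sas_adapter:
        total += SAS_ADAPTER_WITH_SSDS_W
    return total
```

For example, a node with one redundant ac-power switch and the high-speed SAS adapter requires 200 W + 20 W + 50 W = 270 W.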

Circuit breaker requirements

The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.

Environment requirements without redundant ac power

Ensure that your environment falls within the following ranges if you are not
using redundant ac power.
Table 12. Physical specifications
Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   10°C to 35°C (50°F to 95°F)    0 m to 914 m (0 ft to 3000 ft)        8% to 80% noncondensing                      23°C (73°F)
Operating in higher altitudes  10°C to 32°C (50°F to 90°F)    914 m to 2133 m (3000 ft to 7000 ft)  8% to 80% noncondensing                      23°C (73°F)
Turned off                     10°C to 43°C (50°F to 109°F)   0 m to 2133 m (0 ft to 7000 ft)       8% to 80% noncondensing                      27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 m to 2133 m (0 ft to 7000 ft)       5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 m to 10668 m (0 ft to 34991 ft)     5% to 100% condensing, but no precipitation  29°C (84°F)

Environment requirements with redundant ac power

Ensure that your environment falls within the following ranges if you are using
redundant ac power.
Table 13. Environment requirements with redundant ac power
Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   15°C to 32°C (59°F to 90°F)    0 m to 914 m (0 ft to 3000 ft)        20% to 80% noncondensing                     23°C (73°F)
Operating in higher altitudes  15°C to 32°C (59°F to 90°F)    914 m to 2133 m (3000 ft to 7000 ft)  20% to 80% noncondensing                     23°C (73°F)
Turned off                     10°C to 43°C (50°F to 109°F)   0 m to 2133 m (0 ft to 7000 ft)       20% to 80% noncondensing                     27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 m to 2133 m (0 ft to 7000 ft)       5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 m to 10668 m (0 ft to 34991 ft)     5% to 100% condensing, but no precipitation  29°C (84°F)
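
The operating rows of the two environment tables can be expressed as a range check. The following Python sketch is illustrative only; the function names are hypothetical. It covers only the two operating rows (not the turned-off, storing, or shipping rows), it does not check the maximum wet bulb temperature, and it assumes that an altitude of exactly 914 m is treated as the lower-altitude band.

```python
def operating_limits(altitude_m, redundant_ac):
    """Return (min_temp_c, max_temp_c, min_rh_pct, max_rh_pct).

    Values are transcribed from the operating rows of the tables above.
    Assumption: exactly 914 m counts as the lower-altitude band.
    """
    if not 0 <= altitude_m <= 2133:
        raise ValueError("operating altitude must be 0 m to 2133 m")
    if redundant_ac:
        # With redundant ac power, both altitude bands share one range.
        return (15, 32, 20, 80)
    max_temp_c = 35 if altitude_m <= 914 else 32
    return (10, max_temp_c, 8, 80)


def within_operating_range(temp_c, humidity_pct, altitude_m, redundant_ac=False):
    """Return True if temperature and humidity fall inside the operating range."""
    tmin, tmax, hmin, hmax = operating_limits(altitude_m, redundant_ac)
    return tmin <= temp_c <= tmax and hmin <= humidity_pct <= hmax
```

For example, 34°C at 500 m is inside the operating range without redundant ac power, but the same temperature at 1500 m is not.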

Preparing your environment

The following tables list the physical characteristics of the SAN Volume Controller
2145-CG8 node.

Dimensions and weight

Ensure that space is available in a rack that is capable of supporting the node.
Table 14. Dimensions and weight
Height              Width               Depth               Maximum weight
4.3 cm (1.7 in.)    44 cm (17.3 in.)    73.7 cm (29 in.)    15 kg (33 lb)

Additional space requirements

Ensure that space is also available in the rack for the following additional space
requirements around the node.
Table 15. Additional space requirements
Location                    Additional space requirements    Reason
Left side and right side    Minimum: 50 mm (2 in.)           Cooling air flow
Back                        Minimum: 100 mm (4 in.)          Cable exit

Maximum heat output of each SAN Volume Controller 2145-CG8 node

The node dissipates the following maximum heat output.


Table 16. Maximum heat output of each SAN Volume Controller 2145-CG8 node
Model                                                          Heat output per node
SAN Volume Controller 2145-CG8                                 160 W (546 Btu per hour)
SAN Volume Controller 2145-CG8 plus solid-state drives (SSDs)  210 W (717 Btu per hour)
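
The Btu-per-hour figures in this table follow from the watt values. The following minimal sketch reproduces them; the function name is hypothetical, and the conversion factor of about 3.412 Btu per hour per watt is the standard one rather than a value stated in this guide.

```python
# Standard conversion factor (an assumption here; it is not stated in this guide).
BTU_PER_HOUR_PER_WATT = 3.412


def watts_to_btu_per_hour(watts: float) -> int:
    """Convert a heat output in watts to Btu per hour, rounded to a whole number."""
    return round(watts * BTU_PER_HOUR_PER_WATT)


if __name__ == "__main__":
    # Heat output values taken from the 2145-CG8 table.
    for label, watts in [("2145-CG8", 160), ("2145-CG8 plus SSDs", 210)]:
        print(f"{label}: {watts} W = {watts_to_btu_per_hour(watts)} Btu per hour")
```

Some published values elsewhere in this chapter are rounded more coarsely, so a computed figure might differ slightly from the value in a table.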

Maximum heat output of each 2145 UPS-1U

The 2145 UPS-1U dissipates the following maximum heat output.


Table 17. Maximum heat output of each 2145 UPS-1U
Model                                                        Heat output per node
Maximum heat output of 2145 UPS-1U during normal operation   10 W (34 Btu per hour)
Maximum heat output of 2145 UPS-1U during battery operation  100 W (341 Btu per hour)

SAN Volume Controller 2145-CF8 environment requirements


Before installing a SAN Volume Controller 2145-CF8 node, your physical
environment must meet certain requirements. This includes verifying that adequate
space is available and that requirements for power and environmental conditions
are met.



Input-voltage requirements

Ensure that your environment meets the following voltage requirements.

Voltage Frequency
200 to 240 V single phase ac 50 or 60 Hz

Attention:
• If the uninterruptible power supply is cascaded from another uninterruptible
  power supply, the source uninterruptible power supply must have at least three
  times the capacity per phase and the total harmonic distortion must be less than
  5%.
• The uninterruptible power supply also must have input voltage capture that has
  a slew rate of no more than 3 Hz per second.
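
The limits in this attention notice can be expressed as a simple check. The following is a minimal sketch only: the function and parameter names are hypothetical, both capacities are assumed to be given in the same unit, and the slew-rate limit is assumed to be checked together with the other two limits.

```python
def source_ups_is_suitable(source_capacity_per_phase: float,
                           ups_capacity: float,
                           total_harmonic_distortion_pct: float,
                           slew_rate_hz_per_s: float) -> bool:
    """Return True if a cascaded configuration meets the three stated limits."""
    # The source supply must have at least three times the capacity per phase.
    has_capacity = source_capacity_per_phase >= 3 * ups_capacity
    # Total harmonic distortion must be less than 5%.
    low_distortion = total_harmonic_distortion_pct < 5.0
    # The slew rate must be no more than 3 Hz per second.
    slow_slew = slew_rate_hz_per_s <= 3.0
    return has_capacity and low_distortion and slow_slew


if __name__ == "__main__":
    print(source_ups_is_suitable(3000, 1000, 4.0, 2.0))
```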

Power requirements for each node

Ensure that your environment meets the following power requirements.

The power capacity that is required depends on the node type and which optional
features are installed.

Components                                                        Power requirements
SAN Volume Controller 2145-CF8 node and 2145 UPS-1U power supply  200 W

Notes:
• SAN Volume Controller 2145-CF8 nodes will not connect to all revisions of the
  2145 UPS-1U power supply unit. The SAN Volume Controller 2145-CF8 nodes
  require the 2145 UPS-1U power supply unit part number 31P1318. This unit has
  two power outlets that are accessible. Earlier revisions of the 2145 UPS-1U
  power supply unit have only one power outlet that is accessible and are not
  suitable.
• For each redundant ac-power switch, add 20 W to the power requirements.
• For each high-speed SAS adapter with one to four solid-state drives (SSDs), add
  50 W to the power requirements.
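
The power requirement for a given 2145-CF8 configuration can be totaled from the base value and the additions in the notes. The following is a minimal sketch; the function and constant names are hypothetical, and the wattages are the ones listed in this section.

```python
BASE_POWER_W = 200       # 2145-CF8 node and 2145 UPS-1U power supply
PER_AC_SWITCH_W = 20     # each redundant ac-power switch
PER_SAS_ADAPTER_W = 50   # each high-speed SAS adapter with one to four SSDs


def cf8_power_requirement(ac_switches: int = 0, sas_adapters_with_ssds: int = 0) -> int:
    """Return the power requirement in watts for one 2145-CF8 configuration."""
    if ac_switches < 0 or sas_adapters_with_ssds < 0:
        raise ValueError("counts must not be negative")
    return (BASE_POWER_W
            + PER_AC_SWITCH_W * ac_switches
            + PER_SAS_ADAPTER_W * sas_adapters_with_ssds)


if __name__ == "__main__":
    # One redundant ac-power switch and one SAS adapter with SSDs.
    print(cf8_power_requirement(ac_switches=1, sas_adapters_with_ssds=1), "W")
```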

Circuit breaker requirements

The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.

Environment requirements without redundant ac power

Ensure that your environment falls within the following ranges if you are not
using redundant ac power.

Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   10°C to 35°C (50°F to 95°F)    0 to 914 m (0 to 2998 ft)             8% to 80% noncondensing                      23°C (73°F)
Operating in higher altitudes  10°C to 32°C (50°F to 90°F)    914 to 2133 m (2998 to 6988 ft)       8% to 80% noncondensing                      23°C (73°F)
Turned off                     10°C to 43°C (50°F to 110°F)   0 to 2133 m (0 to 6988 ft)            8% to 80% noncondensing                      27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 to 2133 m (0 to 6988 ft)            5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 to 10668 m (0 to 34991 ft)          5% to 100% condensing, but no precipitation  29°C (84°F)
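
A site reading can be checked against the two operating rows of this table. The following is a minimal sketch under stated assumptions: the class and function names are hypothetical, only the temperature, altitude, and relative humidity columns are checked (not the wet bulb temperature), and a reading at exactly 914 m is accepted if it satisfies either row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingRange:
    min_temp_c: float
    max_temp_c: float
    min_altitude_m: float
    max_altitude_m: float
    min_humidity_pct: float
    max_humidity_pct: float


# Operating rows for the 2145-CF8 without redundant ac power.
LOWER_ALTITUDE = OperatingRange(10, 35, 0, 914, 8, 80)
HIGHER_ALTITUDE = OperatingRange(10, 32, 914, 2133, 8, 80)


def within_operating_range(temp_c: float, altitude_m: float, humidity_pct: float) -> bool:
    """Return True if the reading falls inside either operating row."""
    for r in (LOWER_ALTITUDE, HIGHER_ALTITUDE):
        if (r.min_altitude_m <= altitude_m <= r.max_altitude_m
                and r.min_temp_c <= temp_c <= r.max_temp_c
                and r.min_humidity_pct <= humidity_pct <= r.max_humidity_pct):
            return True
    return False


if __name__ == "__main__":
    print(within_operating_range(34, 500, 50))   # warm room at low altitude
    print(within_operating_range(34, 1500, 50))  # same temperature at higher altitude
```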

Environment requirements with redundant ac power

Ensure that your environment falls within the following ranges if you are using
redundant ac power.

Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   15°C to 32°C (59°F to 90°F)    0 to 914 m (0 to 2998 ft)             20% to 80% noncondensing                     23°C (73°F)
Operating in higher altitudes  15°C to 32°C (59°F to 90°F)    914 to 2133 m (2998 to 6988 ft)       20% to 80% noncondensing                     23°C (73°F)
Turned off                     10°C to 43°C (50°F to 110°F)   0 to 2133 m (0 to 6988 ft)            20% to 80% noncondensing                     27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 to 2133 m (0 to 6988 ft)            5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 to 10668 m (0 to 34991 ft)          5% to 100% condensing, but no precipitation  29°C (84°F)

Preparing your environment

The following tables list the physical characteristics of the SAN Volume Controller
2145-CF8 node.

Dimensions and weight

Ensure that space is available in a rack that is capable of supporting the node.

Height            Width               Depth            Maximum weight
43 mm (1.69 in.)  440 mm (17.32 in.)  686 mm (27 in.)  12.7 kg (28 lb)

Additional space requirements

Ensure that space is also available in the rack for the following additional space
requirements around the node.



Location              Additional space requirements  Reason
Left and right sides  50 mm (2 in.)                  Cooling air flow
Back                  Minimum: 100 mm (4 in.)        Cable exit

Heat output of each SAN Volume Controller 2145-CF8 node

The node dissipates the following maximum heat output.

Model                                                                          Heat output per node
SAN Volume Controller 2145-CF8                                                 160 W (546 Btu per hour)
SAN Volume Controller 2145-CF8 and up to four optional solid-state drives (SSDs)  210 W (717 Btu per hour)
Maximum heat output of 2145 UPS-1U during typical operation                    10 W (34 Btu per hour)
Maximum heat output of 2145 UPS-1U during battery operation                    100 W (341 Btu per hour)

SAN Volume Controller 2145-8A4 environment requirements


Before the SAN Volume Controller 2145-8A4 is installed, the physical environment
must meet certain requirements. This includes verifying that adequate space is
available and that requirements for power and environmental conditions are met.

Input-voltage requirements

Ensure that your environment meets the following voltage requirements.

Voltage Frequency
200 to 240 V single phase ac 50 or 60 Hz

Attention:
• If the uninterruptible power supply is cascaded from another uninterruptible
  power supply, the source uninterruptible power supply must have at least three
  times the capacity per phase and the total harmonic distortion must be less than
  5%.
• The uninterruptible power supply also must have input voltage capture that has
  a slew rate of no more than 3 Hz per second.

Power requirements for each node

Ensure that your environment meets the following power requirements.

The power that is required depends on the node type and whether the redundant
ac power feature is used.

Components                                          Power requirements
SAN Volume Controller 2145-8A4 and 2145 UPS-1U      180 W



For each redundant ac-power switch, add 20 W to the power requirements.

Circuit breaker requirements

The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.

Environment requirements without redundant ac power

Ensure that your environment falls within the following ranges if you are not
using redundant ac power.

Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   10°C to 35°C (50°F to 95°F)    0 to 914 m (0 to 3000 ft)             8% to 80% noncondensing                      23°C (73°F)
Operating in higher altitudes  10°C to 32°C (50°F to 90°F)    914 to 2133 m (3000 to 7000 ft)       8% to 80% noncondensing                      23°C (73°F)
Turned off                     10°C to 43°C (50°F to 109°F)   0 to 2133 m (0 to 7000 ft)            8% to 80% noncondensing                      27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 to 2133 m (0 to 7000 ft)            5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 to 10668 m (0 to 34991 ft)          5% to 100% condensing, but no precipitation  29°C (84°F)

Environment requirements with redundant ac power

Ensure that your environment falls within the following ranges if you are using
redundant ac power.

Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   15°C to 32°C (59°F to 90°F)    0 to 914 m (0 to 3000 ft)             20% to 80% noncondensing                     23°C (73°F)
Operating in higher altitudes  15°C to 32°C (59°F to 90°F)    914 to 2133 m (3000 to 7000 ft)       20% to 80% noncondensing                     23°C (73°F)
Turned off                     10°C to 43°C (50°F to 109°F)   0 to 2133 m (0 to 7000 ft)            20% to 80% noncondensing                     27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 to 2133 m (0 to 7000 ft)            5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 to 10668 m (0 to 34991 ft)          5% to 100% condensing, but no precipitation  29°C (84°F)

Preparing your environment

The following tables list the physical characteristics of the SAN Volume Controller
2145-8A4 node.



Dimensions and weight

Ensure that space is available in a rack that is capable of supporting the node.

Height            Width               Depth            Maximum weight
43 mm (1.75 in.)  440 mm (17.32 in.)  559 mm (22 in.)  10.1 kg (22 lb)

Additional space requirements

Ensure that space is also available in the rack for the following additional space
requirements around the node.

Location              Additional space requirements  Reason
Left and right sides  Minimum: 50 mm (2 in.)         Cooling air flow
Back                  Minimum: 100 mm (4 in.)        Cable exit

Heat output of each SAN Volume Controller 2145-8A4 node

The node dissipates the following maximum heat output.

Model                           Heat output per node
SAN Volume Controller 2145-8A4  140 W (478 Btu per hour)

SAN Volume Controller 2145-8G4 environment requirements


Before the SAN Volume Controller 2145-8G4 is installed, the physical environment
must meet certain requirements. This includes verifying that adequate space is
available and that requirements for power and environmental conditions are met.

Input-voltage requirements

Ensure that your environment meets the following voltage requirements.

Voltage Frequency
200 to 240 V single phase ac 50 or 60 Hz

Attention:
• If the uninterruptible power supply is cascaded from another uninterruptible
  power supply, the source uninterruptible power supply must have at least three
  times the capacity per phase and the total harmonic distortion must be less than
  5%.
• The uninterruptible power supply also must have input voltage capture that has
  a slew rate of no more than 3 Hz per second.

Power requirements for each node

Ensure that your environment meets the following power requirements.

The power that is required depends on the node type and whether the redundant
ac power feature is used.



Components                                          Power requirements
SAN Volume Controller 2145-8G4 and 2145 UPS-1U      470 W

For each redundant ac-power switch, add 20 W to the power requirements.

Circuit breaker requirements

The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.

Environment requirements without redundant ac power

Ensure that your environment falls within the following ranges if you are not
using redundant ac power.

Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   10°C to 35°C (50°F to 95°F)    0 to 914 m (0 to 2998 ft)             8% to 80% noncondensing                      23°C (73°F)
Operating in higher altitudes  10°C to 32°C (50°F to 90°F)    914 to 2133 m (2998 to 6988 ft)       8% to 80% noncondensing                      23°C (73°F)
Turned off                     10°C to 43°C (50°F to 110°F)   0 to 2133 m (0 to 6988 ft)            8% to 80% noncondensing                      27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 to 2133 m (0 to 6988 ft)            5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 to 10668 m (0 to 34991 ft)          5% to 100% condensing, but no precipitation  29°C (84°F)

Environment requirements with redundant ac power

Ensure that your environment falls within the following ranges if you are using
redundant ac power.

Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   15°C to 32°C (59°F to 90°F)    0 to 914 m (0 to 2998 ft)             20% to 80% noncondensing                     23°C (73°F)
Operating in higher altitudes  15°C to 32°C (59°F to 90°F)    914 to 2133 m (2998 to 6988 ft)       20% to 80% noncondensing                     23°C (73°F)
Turned off                     10°C to 43°C (50°F to 110°F)   0 to 2133 m (0 to 6988 ft)            20% to 80% noncondensing                     27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 to 2133 m (0 to 6988 ft)            5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 to 10668 m (0 to 34991 ft)          5% to 100% condensing, but no precipitation  29°C (84°F)



Preparing your environment

The following tables list the physical characteristics of the SAN Volume Controller
2145-8G4 node.

Dimensions and weight

Ensure that space is available in a rack that is capable of supporting the node.

Height            Width               Depth            Maximum weight
43 mm (1.69 in.)  440 mm (17.32 in.)  686 mm (27 in.)  12.7 kg (28 lb)

Additional space requirements

Ensure that space is also available in the rack for the following additional space
requirements around the node.

Location              Additional space requirements  Reason
Left and right sides  50 mm (2 in.)                  Cooling air flow
Back                  Minimum: 100 mm (4 in.)        Cable exit

Heat output of each SAN Volume Controller 2145-8G4 node

The node dissipates the following maximum heat output.

Model                           Heat output per node
SAN Volume Controller 2145-8G4  400 W (1350 Btu per hour)

SAN Volume Controller 2145-8F4 and SAN Volume Controller 2145-8F2 environment requirements

Before the SAN Volume Controller 2145-8F4 or SAN Volume Controller 2145-8F2 is
installed, the physical environment must meet certain requirements. This includes
verifying that adequate space is available and that requirements for power and
environmental conditions are met.

Input-voltage requirements

Ensure that your environment meets the following voltage requirements.

Voltage Frequency
200 to 240 V single phase ac 50 or 60 Hz

Power requirements for each node

Ensure that your environment meets the following power requirements.

The power that is required depends on the node type and whether the redundant
ac power feature is used.



Components                                          Power requirements
SAN Volume Controller 2145-8F4 and 2145 UPS-1U      520 W
SAN Volume Controller 2145-8F2 and 2145 UPS-1U      520 W

For each redundant ac-power switch, add 20 W to the power requirements.

Circuit breaker requirements

The 2145 UPS-1U has an integrated circuit breaker and does not require additional
protection.

Environment requirements without redundant ac power

Ensure that your environment falls within the following ranges if you are not
using redundant ac power.

Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   10°C to 35°C (50°F to 95°F)    0 to 914.4 m (0 to 3000 ft)           8% to 80% noncondensing                      23°C (74°F)
Operating in higher altitudes  10°C to 32°C (50°F to 88°F)    914.4 to 2133.6 m (3000 to 7000 ft)   8% to 80% noncondensing                      23°C (74°F)
Turned off                     10°C to 43°C (50°F to 110°F)   0 to 2133.6 m (0 to 7000 ft)          8% to 80% noncondensing                      27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 to 2133.6 m (0 to 7000 ft)          5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 to 10668 m (0 to 34991 ft)          5% to 100% condensing, but no precipitation  29°C (84°F)

Environment requirements with redundant ac power

Ensure that your environment falls within the following ranges if you are using
redundant ac power.

Environment                    Temperature                    Altitude                              Relative humidity                            Maximum wet bulb temperature
Operating in lower altitudes   15°C to 32°C (59°F to 89°F)    0 to 914.4 m (0 to 3000 ft)           20% to 80% noncondensing                     23°C (74°F)
Operating in higher altitudes  15°C to 32°C (59°F to 88°F)    914.4 to 2133.6 m (3000 to 7000 ft)   20% to 80% noncondensing                     23°C (74°F)
Turned off                     10°C to 43°C (50°F to 110°F)   0 to 2133.6 m (0 to 7000 ft)          20% to 80% noncondensing                     27°C (81°F)
Storing                        1°C to 60°C (34°F to 140°F)    0 to 2133.6 m (0 to 7000 ft)          5% to 80% noncondensing                      29°C (84°F)
Shipping                       -20°C to 60°C (-4°F to 140°F)  0 to 10668 m (0 to 34991 ft)          5% to 100% condensing, but no precipitation  29°C (84°F)

Preparing your environment

The following tables list the physical characteristics of the SAN Volume Controller
2145-8F4 and SAN Volume Controller 2145-8F2 nodes.

Dimensions and weight

Ensure that space is available in a rack that is capable of supporting the node.

Height            Width               Depth            Maximum weight
43 mm (1.69 in.)  440 mm (17.32 in.)  686 mm (27 in.)  12.7 kg (28 lb)

Additional space requirements

Ensure that space is also available in the rack for the following additional space
requirements around the node.

Location              Additional space requirements  Reason
Left and right sides  50 mm (2 in.)                  Cooling air flow
Back                  Minimum: 100 mm (4 in.)        Cable exit

Heat output of each SAN Volume Controller 2145-8F4 or SAN Volume Controller 2145-8F2 node

The nodes dissipate the following maximum heat output.

Model                           Heat output per node
SAN Volume Controller 2145-8F4  450 W (1540 Btu per hour)
SAN Volume Controller 2145-8F2  450 W (1540 Btu per hour)

Redundant ac-power switch


The redundant ac-power switch is an optional feature that makes the SAN Volume
Controller nodes resilient to the failure of a single power circuit. The redundant
ac-power switch is not a replacement for an uninterruptible power supply. You
must still use an uninterruptible power supply for each node.

You must connect the redundant ac-power switch to two independent power
circuits. One power circuit connects to the main power input port and the other
power circuit connects to the backup power-input port. If the main power to the
SAN Volume Controller node fails for any reason, the redundant ac-power switch
automatically uses the backup power source. When power is restored, the
redundant ac-power switch automatically changes back to using the main power
source.

Place the redundant ac-power switch in the same rack as the SAN Volume
Controller node. The redundant ac-power switch logically sits between the rack
power distribution unit and the 2145 UPS-1U.

You can use a single redundant ac-power switch to power one or two SAN Volume
Controller nodes. If you use the redundant ac-power switch to power two nodes,
the nodes must be in different I/O groups. In the event that the redundant
ac-power switch fails or requires maintenance, both nodes turn off. Because the
nodes are in two different I/O groups, the hosts do not lose access to the back-end
disk data.

For maximum resilience to failure, use one redundant ac-power switch to power
each SAN Volume Controller node.
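
The rules above (a switch powers at most two nodes, and two nodes on one switch must be in different I/O groups) can be checked against a planned layout. The following is a minimal sketch; the function name and the node, I/O group, and switch labels are hypothetical.

```python
from collections import defaultdict


def find_switch_violations(assignments):
    """Check (node, io_group, switch) tuples against the redundant ac-power switch rules.

    Returns a list of problem descriptions; an empty list means no rule is broken.
    """
    by_switch = defaultdict(list)
    for node, io_group, switch in assignments:
        by_switch[switch].append((node, io_group))

    problems = []
    for switch in sorted(by_switch):
        nodes = by_switch[switch]
        if len(nodes) > 2:
            problems.append(f"{switch}: powers {len(nodes)} nodes (maximum is 2)")
        groups = [group for _, group in nodes]
        if len(set(groups)) < len(groups):
            problems.append(f"{switch}: powers more than one node in the same I/O group")
    return problems


if __name__ == "__main__":
    # Hypothetical layout: each switch powers one node from each I/O group.
    layout = [("A", 0, "switch1"), ("C", 1, "switch1"),
              ("B", 0, "switch2"), ("D", 1, "switch2")]
    print(find_switch_violations(layout))
```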

Figure 47 shows a redundant ac-power switch.



Figure 47. Photo of the redundant ac-power switch

Redundant ac-power environment requirements


Ensure that your physical site meets the installation requirements for the
redundant ac-power switch.

The redundant ac-power switch requires two independent power sources that are
provided through two rack-mounted power distribution units (PDUs). The PDUs
must have IEC 320-C13 outlets.

The redundant ac-power switch comes with two IEC 320-C19 to C14 power cables
to connect to rack PDUs. There are no country-specific cables for the redundant
ac-power switch.

The power cable between the redundant ac-power switch and the 2145 UPS-1U is
rated at 10 A.

Redundant ac-power switch specifications

The following tables list the physical characteristics of the redundant ac-power
switch.

Dimensions and weight



Ensure that space is available in a rack that is capable of supporting the redundant
ac-power switch.

Height            Width              Depth   Maximum weight
43 mm (1.69 in.)  192 mm (7.56 in.)  240 mm  2.6 kg (5.72 lb)

Additional space requirements

Ensure that space is also available in the rack for the side mounting plates on
either side of the redundant ac-power switch.

Location    Width              Reason
Left side   124 mm (4.89 in.)  Side mounting plate
Right side  124 mm (4.89 in.)  Side mounting plate

Heat output (maximum)

The maximum heat output that is dissipated inside the redundant ac-power switch
is approximately 20 watts (70 Btu per hour).

Cabling of redundant ac-power switch (example)


You must properly cable the redundant ac-power switch units in your
environment.

Note: While this topic provides an example of the cable connections, it does not
indicate a preferred physical location for the components.

Figure 48 on page 50 shows an example of the main wiring for a SAN Volume
Controller clustered system with the redundant ac-power switch feature. The
four-node clustered system consists of two I/O groups:
• I/O group 0 contains nodes A and B
• I/O group 1 contains nodes C and D


Figure 48. A four-node SAN Volume Controller system with the redundant ac-power switch
feature

1 I/O group 0
2 SAN Volume Controller node A
3 2145 UPS-1U A
4 SAN Volume Controller node B
5 2145 UPS-1U B
6 I/O group 1
7 SAN Volume Controller node C
8 2145 UPS-1U C
9 SAN Volume Controller node D
10 2145 UPS-1U D
11 Redundant ac-power switch 1
12 Redundant ac-power switch 2
13 Site PDU X (C13 outlets)
14 Site PDU Y (C13 outlets)



The site PDUs X and Y (13 and 14) are powered from two independent power
sources.

In this example, only two redundant ac-power switch units are used, and each
power switch powers one node in each I/O group. However, for maximum
redundancy, use one redundant ac-power switch to power each node in the
system.

Some SAN Volume Controller node types have two power supply units. Both
power supplies must be connected to the same 2145 UPS-1U, as shown by node A
and node B. The SAN Volume Controller 2145-CG8 is an example of a node that
has two power supplies. The SAN Volume Controller 2145-8A4 is an example of a
node that has a single power supply.

Uninterruptible power supply


The uninterruptible power supply protects a SAN Volume Controller node against
blackouts, brownouts, and power surges. The uninterruptible power supply
contains a power sensor to monitor the supply and a battery to provide power
until an orderly shutdown of the system can be performed.

SAN Volume Controller models use the 2145 UPS-1U.

2145 UPS-1U
A 2145 UPS-1U is used exclusively to maintain data that is held in the SAN
Volume Controller dynamic random access memory (DRAM) in the event of an
unexpected loss of external power. This use differs from the traditional
uninterruptible power supply that enables continued operation of the device that it
supplies when power is lost.

With a 2145 UPS-1U, data is saved to the internal disk of the SAN Volume
Controller node. The uninterruptible power supply units are required to power the
SAN Volume Controller nodes even when the input power source is considered
uninterruptible.

Note: The uninterruptible power supply maintains continuous SAN Volume
Controller-specific communications with its attached SAN Volume Controller
nodes. A SAN Volume Controller node cannot operate without the uninterruptible
power supply. The uninterruptible power supply must be used in accordance with
documented guidelines and procedures and must not power any equipment other
than a SAN Volume Controller node.

2145 UPS-1U operation


Each SAN Volume Controller node monitors the operational state of the
uninterruptible power supply to which it is attached.

If the 2145 UPS-1U reports a loss of input power, the SAN Volume Controller node
stops all I/O operations and dumps the contents of its dynamic random access
memory (DRAM) to the internal disk drive. When input power to the 2145 UPS-1U
is restored, the SAN Volume Controller node restarts and restores the original
contents of the DRAM from the data saved on the disk drive.

A SAN Volume Controller node is not fully operational until the 2145 UPS-1U
battery state indicates that it has sufficient charge to power the SAN Volume
Controller node long enough to save all of its memory to the disk drive. In the
event of a power loss, the 2145 UPS-1U has sufficient capacity for the SAN Volume
Controller to save all its memory to disk at least twice. For a fully charged 2145
UPS-1U, even after battery charge has been used to power the SAN Volume
Controller node while it saves dynamic random access memory (DRAM) data,
sufficient battery charge remains so that the SAN Volume Controller node can
become fully operational as soon as input power is restored.
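
The relationship between battery charge and node readiness can be illustrated with a toy model. This is a sketch only and not the actual node software: the class name is hypothetical, and charge is counted in units of one complete memory save, which is a simplification of the description above.

```python
class UpsBattery:
    """Toy model: charge is counted in units of one complete memory save."""

    def __init__(self, charge_in_saves: float = 2.0):
        # A fully charged 2145 UPS-1U supports at least two saves, per the text.
        self.charge_in_saves = charge_in_saves

    def node_can_be_fully_operational(self) -> bool:
        # The node needs enough charge for one complete save to disk.
        return self.charge_in_saves >= 1.0

    def power_loss(self) -> None:
        # On loss of input power, the node saves its DRAM to disk on battery power.
        if not self.node_can_be_fully_operational():
            raise RuntimeError("insufficient charge to save memory to disk")
        self.charge_in_saves -= 1.0


if __name__ == "__main__":
    battery = UpsBattery()
    battery.power_loss()
    # After one save from a full charge, enough charge remains for the node
    # to become fully operational as soon as input power is restored.
    print(battery.node_can_be_fully_operational())
```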

Important: Do not shut down a 2145 UPS-1U without first shutting down the SAN
Volume Controller node that it supports. Data integrity can be compromised by
pushing the 2145 UPS-1U on/off button when the node is still operating. However,
in the case of an emergency, you can manually shut down the 2145 UPS-1U by
pushing the 2145 UPS-1U on/off button when the node is still operating. Service
actions must then be performed before the node can resume normal operations. If
multiple uninterruptible power supply units are shut down before the nodes they
support, data can be corrupted.

Connecting the 2145 UPS-1U to the SAN Volume Controller


To provide redundancy and concurrent maintenance, you must install the SAN
Volume Controller nodes in pairs.

For connection to the 2145 UPS-1U, each SAN Volume Controller of a pair must be
connected to only one 2145 UPS-1U.

Note: A clustered system can contain no more than eight SAN Volume Controller
nodes. The 2145 UPS-1U must be attached to a source that is both single phase and
200-240 V. The 2145 UPS-1U has an integrated circuit breaker and does not need
external protection.
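
The connection rules in this topic can be collected into a simple planning check. The following is a minimal sketch; the function and parameter names are hypothetical, and the check assumes one 2145 UPS-1U for each node, as required elsewhere in this chapter.

```python
from typing import List

MAX_NODES = 8  # a clustered system can contain no more than eight nodes


def check_system(node_count: int, ups_count: int,
                 supply_voltage: float, single_phase: bool) -> List[str]:
    """Return a list of problems with a planned configuration (empty if none)."""
    problems = []
    if node_count < 2 or node_count % 2 != 0:
        problems.append("nodes must be installed in pairs")
    if node_count > MAX_NODES:
        problems.append(f"no more than {MAX_NODES} nodes are allowed")
    if ups_count != node_count:
        problems.append("each node must be connected to its own 2145 UPS-1U")
    if not single_phase or not 200 <= supply_voltage <= 240:
        problems.append("the 2145 UPS-1U source must be single phase, 200-240 V")
    return problems


if __name__ == "__main__":
    print(check_system(node_count=4, ups_count=4, supply_voltage=230, single_phase=True))
```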

SAN Volume Controller provides a cable bundle for connecting the uninterruptible
power supply to a node. For SAN Volume Controller 2145-8F2, SAN Volume
Controller 2145-8F4, SAN Volume Controller 2145-8G4, and SAN Volume
Controller 2145-8A4, this is a single power cable plus a serial cable. For SAN
Volume Controller 2145-CF8 and SAN Volume Controller 2145-CG8, this is a
dual-power cable plus serial cable. This cable is used to connect both power
supplies of a node to the same uninterruptible power supply.

The SAN Volume Controller software determines whether the input voltage to the
uninterruptible power supply is within range and sets an appropriate voltage
alarm range on the uninterruptible power supply. The software continues to
recheck the input voltage every few minutes. If it changes substantially but
remains within the permitted range, the alarm limits are readjusted.

Note: The 2145 UPS-1U is equipped with a cable retention bracket that keeps the
power cable from disengaging from the rear panel. See the related documentation
for more information.

2145 UPS-1U controls and indicators


All controls and indicators for the 2145 UPS-1U are located on the front-panel
assembly.


Figure 49. 2145 UPS-1U front-panel assembly

1 Load segment 2 indicator
2 Load segment 1 indicator
3 Alarm or service indicator
4 On-battery indicator
5 Overload indicator
6 Power-on indicator
7 On/off button
8 Test and alarm reset button

Load segment 2 indicator:

The load segment 2 indicator on the 2145 UPS-1U is lit (green) when power is
available to load segment 2.

When the load segment 2 indicator is green, the 2145 UPS-1U is running normally
and power is available to this segment.

Load segment 1 indicator:

The load segment 1 indicator on the 2145 UPS-1U is not currently used by the
SAN Volume Controller.

Note: When the 2145 UPS-1U is configured by the SAN Volume Controller, this
load segment is disabled. During normal operation, the load segment 1 indicator is
off. A “Do not use” label covers the receptacles.

Alarm indicator:

If the alarm on the 2145 UPS-1U is flashing red, maintenance is required.

If the alarm is on, go to the 2145 UPS-1U MAP to resolve the problem.

On-battery indicator:

The amber on-battery indicator is on when the 2145 UPS-1U is powered by the
battery. This indicates that the main power source has failed.

If the on-battery indicator is on, go to the 2145 UPS-1U MAP to resolve the
problem.

Overload indicator:


The overload indicator lights up when the capacity of the 2145 UPS-1U is
exceeded.

If the overload indicator is on, go to MAP 5250: 2145 UPS-1U repair verification to
resolve the problem.

Power-on indicator:

The power-on indicator is displayed when the 2145 UPS-1U is functioning.

When the power-on indicator is a steady green, the 2145 UPS-1U is active.

On or off button:

The on or off button turns the power on or off for the 2145 UPS-1U.

Turning on the 2145 UPS-1U

After you connect the 2145 UPS-1U to the outlet, it remains in standby mode until
you turn it on. Press and hold the on or off button until the power-on indicator is
illuminated (approximately five seconds). On some versions of the 2145 UPS-1U,
you might need a pointed device, such as a screwdriver, to press the on or off
button. A self-test is initiated that takes approximately 10 seconds, during which
time the indicators are turned on and off several times. The 2145 UPS-1U then
enters normal mode.

Turning off the 2145 UPS-1U

Press and hold the on or off button until the power-on light is extinguished
(approximately five seconds). On some versions of the 2145 UPS-1U, you might
need a pointed device, such as a screwdriver, to press the on or off button. This
places the 2145 UPS-1U in standby mode. You must then unplug the 2145 UPS-1U
to turn off the unit.

Attention: Do not turn off the uninterruptible power supply before you shut
down the SAN Volume Controller node that it is connected to. Always follow the
instructions that are provided in MAP 5350 to perform an orderly shutdown of a
SAN Volume Controller node.

Test and alarm reset button:

Use the test and alarm reset button to start the self-test.

To start the self-test, press and hold the test and alarm reset button for three
seconds. This button also resets the alarm.

2145 UPS-1U connectors and switches


The 2145 UPS-1U has external connectors and dip switches.

Locations for the 2145 UPS-1U connectors and switches

Figure 50 on page 55 shows the location of the connectors and switches on the 2145
UPS-1U.


Figure 50. 2145 UPS-1U connectors and switches

1 Main power connector
2 Communication port
3 Dip switches
4 Load segment 1 receptacles
5 Load segment 2 receptacles

2145 UPS-1U dip switches

Figure 51 shows the dip switches, which can be used to configure the input and
output voltage ranges. Because this function is performed by the SAN Volume
Controller software, both switches must be left in the OFF position.

Figure 51. 2145 UPS-1U dip switches

2145 UPS-1U ports not used

The 2145 UPS-1U is equipped with ports that are not used by the SAN Volume
Controller and have not been tested. Use of these ports, in conjunction with the
SAN Volume Controller or any other application that might be used with the SAN
Volume Controller, is not supported. Figure 52 shows the 2145 UPS-1U ports that
are not used.

Figure 52. Ports not used by the 2145 UPS-1U

1 USB interface port
2 Network ports
3 Load segment receptacles



2145 UPS-1U power connector

Figure 53 shows the power connector for the 2145 UPS-1U.

Neutral
Ground

Live

Figure 53. Power connector

Uninterruptible power-supply environment requirements


An uninterruptible power-supply environment requires that certain specifications
for the physical site of the SAN Volume Controller must be met.

2145 UPS-1U environment


All SAN Volume Controller models are supported with the 2145 UPS-1U.

2145 UPS-1U specifications

The following tables describe the physical characteristics of the 2145 UPS-1U.

2145 UPS-1U dimensions and weight

Ensure that space is available in a rack that is capable of supporting the 2145
UPS-1U.

Height            Width              Depth              Maximum weight
44 mm (1.73 in.)  439 mm (17.3 in.)  579 mm (22.8 in.)  16 kg (35.3 lb)
Note: The 2145 UPS-1U package, which includes support rails, weighs 18.8 kg (41.4 lb).

Heat output

The 2145 UPS-1U unit produces the following approximate heat output.

Model        Heat output during normal operation  Heat output during battery operation
2145 UPS-1U  10 W (34 Btu per hour)               150 W (512 Btu per hour)
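
The Btu-per-hour figures in the heat output table can be cross-checked against the wattage figures. The following is a minimal sketch that assumes the standard conversion of roughly 3.412 Btu per hour for each watt; the function name is illustrative and is not part of any SAN Volume Controller tool.

```python
# Standard conversion factor: 1 W is approximately 3.412 Btu per hour.
BTU_PER_HOUR_PER_WATT = 3.412142


def watts_to_btu_per_hour(watts: float) -> int:
    """Convert watts to Btu per hour, rounded to the nearest whole number."""
    return round(watts * BTU_PER_HOUR_PER_WATT)


normal = watts_to_btu_per_hour(10)    # heat output during normal operation
battery = watts_to_btu_per_hour(150)  # heat output during battery operation
print(normal, battery)  # prints "34 512", matching the table
```

The same function can be used to estimate the cooling load of several units by passing the summed wattage.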

Defining the SAN Volume Controller FRUs


The SAN Volume Controller node, redundant ac-power switch, and uninterruptible
power supply each consist of one or more field-replaceable units (FRUs).

SAN Volume Controller FRUs


The SAN Volume Controller nodes each consist of several field-replaceable units
(FRUs), such as the Fibre Channel adapter, service controller, disk drive,
microprocessor, memory module, CMOS battery, power supply assembly, fan
assembly, and the operator-information panel.

SAN Volume Controller 2145-CG8 FRUs

Table 18 provides a brief description of each SAN Volume Controller 2145-CG8 FRU.

Table 18. SAN Volume Controller 2145-CG8 FRU descriptions
FRU
    Description
System board
    The system board for the SAN Volume Controller 2145-CG8 node.
Short-wave small form-factor pluggable (SFP) transceiver
    A compact optical transceiver that provides the optical interface to a
    Fibre Channel cable. It is capable of autonegotiating a 2, 4, or 8
    gigabits-per-second (Gbps) short-wave optical connection on the 4-port
    Fibre Channel adapter.
    Note: It is possible that small form-factor pluggable (SFP) transceivers
    other than those shipped with the product are in use on the Fibre Channel
    host bus adapter. It is a customer responsibility to obtain replacement
    parts for such SFP transceivers. The FRU part number is shown as "Non
    standard - supplied by customer" in the vital product data.
Long-wave small form-factor pluggable (SFP) transceiver
    A compact optical transceiver that provides the optical interface to a
    Fibre Channel cable. It is capable of autonegotiating a 2, 4, or 8 Gbps
    long-wave optical connection on the 4-port Fibre Channel adapter.
    Note: It is possible that small form-factor pluggable (SFP) transceivers
    other than those shipped with the product are in use on the Fibre Channel
    host bus adapter. It is a customer responsibility to obtain replacement
    parts for such SFP transceivers. The FRU part number is shown as "Non
    standard - supplied by customer" in the vital product data.
4-port Fibre Channel host bus adapter (HBA)
    The SAN Volume Controller 2145-CG8 is connected to the Fibre Channel
    fabric through the Fibre Channel HBA, which is located in PCI slot 1. The
    adapter assembly includes the Fibre Channel PCI Express adapter, four
    short-wave SFP transceivers, the riser card, and bracket.
Service controller
    The unit that provides the service functions and the front panel display
    and buttons.
Service controller cable
    The USB cable that is used to connect the service controller to the
    system board.
Disk drive
    The serial-attached SCSI (SAS) 2.5" disk drive.
Disk signal cable
    A 200 mm SAS disk-signal cable.
Disk power cable
    The power cable for the 2.5" SAS system disk.
Disk controller
    A SAS controller card for the SAS 2.5" disk drive.
USB riser card for the disk controller
    The riser card that connects the disk controller to the system board and
    provides the USB port to which the service controller cable connects.
Disk backplane
    The hot-swap SAS 2.5" disk drive backplane.
Memory module
    An 8-GB DDR3-1333 2RX4 LP RDIMM memory module.
Microprocessor
    The microprocessor on the system board: a 2.53 GHz quad-core
    microprocessor.
Power supply unit
    An assembly that provides dc power to the node.
CMOS battery
    A 3.0V battery on the system board that maintains power to back up the
    system BIOS settings.
Operator-information panel
    The information panel that includes the power-control button and LEDs
    that indicate system-board errors, hard drive activity, and power status.
Operator-information panel cable
    A cable that connects the operator-information panel to the system board.
Fan assembly
    A fan assembly that is used in all the fan positions.
Power cable assembly
    The cable assembly that connects the SAN Volume Controller and the 2145
    UPS-1U. The assembly consists of two power cables and a serial cable
    bundled together.
Blank drive bay filler assembly
    A blank drive bay filler assembly.
Alcohol wipe
    A cleaning wipe.
Thermal grease
    Grease that is used to provide a thermal seal between a processor and a
    heat sink.

SAN Volume Controller 2145-CF8 FRUs

Table 19 provides a brief description of each SAN Volume Controller 2145-CF8 FRU.

Table 19. SAN Volume Controller 2145-CF8 FRU descriptions
FRU
    Description
System board
    The system board for the SAN Volume Controller 2145-CF8 node.
Fibre Channel small form-factor pluggable (SFP) transceiver
    A compact optical transceiver that provides the optical interface to a
    Fibre Channel cable. It is capable of autonegotiating a 2, 4, or 8 Gbps
    short-wave optical connection on the 4-port Fibre Channel adapter.
    Note: It is possible that SFPs other than those shipped with the product
    are in use on the Fibre Channel host bus adapter. It is a customer
    responsibility to obtain replacement parts for such SFP transceivers. The
    FRU part number is shown as "Non standard - supplied by customer" in the
    vital product data.
4-port Fibre Channel host bus adapter (HBA)
    The SAN Volume Controller 2145-CF8 is connected to the Fibre Channel
    fabric through the Fibre Channel HBA, which is located in PCI slot 1. The
    adapter assembly includes the Fibre Channel PCI Express adapter, four
    short-wave SFP transceivers, the riser card, and bracket.
Service controller
    The unit that provides the service functions and the front panel display
    and buttons.
Service controller cable
    The USB cable that is used to connect the service controller to the
    system board.
Disk drive
    The serial-attached SCSI (SAS) 2.5" disk drive.
Disk signal cable
    A 200 mm SAS disk-signal cable.
Disk power cable
    A SAS disk-power cable.
Disk controller
    A SAS controller card for the SAS 2.5" disk drive.
Disk controller / USB riser card
    The riser card that connects the disk controller to the system board and
    provides the USB port to which the service controller cable connects.
Disk backplane
    The hot-swap SAS 2.5" disk drive backplane.
Memory module
    A 4 GB DDR3-1333 2RX4 LP RDIMM memory module.
Microprocessor
    The microprocessor on the system board: a 2.40 GHz quad-core
    microprocessor.
Power supply unit
    An assembly that provides dc power to the SAN Volume Controller 2145-CF8
    node.
CMOS battery
    A 3.0V battery on the system board that maintains power to back up the
    system BIOS settings.
Operator-information panel
    The information panel that includes the power-control button and LEDs
    that indicate system-board errors, hard drive activity, and power status.
Operator-information panel cable
    A cable that connects the operator-information panel to the system board.
Fan assembly
    A fan assembly that is used in all the fan positions.
Power cable assembly
    The cable assembly that connects the SAN Volume Controller and the 2145
    UPS-1U. The assembly consists of two power cables and a serial cable
    bundled together.
Alcohol wipe
    A cleaning wipe.
Thermal grease
    Grease that is used to provide a thermal seal between a processor and a
    heat sink.

Ethernet feature FRUs

Table 20 provides a brief description of each Ethernet feature FRU.

Table 20. Ethernet feature FRU descriptions
FRU
    Description
10 Gbps Ethernet adapter
    A 10 Gbps Ethernet adapter.
10 Gbps Ethernet fibre SFP
    A 10 Gbps Ethernet fibre SFP.

Solid-state drive (SSD) feature FRUs

Table 21 provides a brief description of each SSD feature FRU.

Table 21. Solid-state drive (SSD) feature FRU descriptions
FRU
    Description
High-speed SAS adapter
    An assembly that includes a high-speed SAS adapter card that provides
    connectivity for up to four solid-state drives (SSDs). The assembly also
    contains a riser card, blanking plate, and screws.
High-speed SAS cable
    The cable used to connect the high-speed SAS adapter to the disk
    backplane.
146 GB solid-state drive (SSD)
    A 146-GB solid-state drive (SSD).

2145 UPS-1U FRUs

Table 22 provides a brief description of each 2145 UPS-1U FRU.

Table 22. 2145 UPS-1U FRU descriptions
FRU
    Description
2145 UPS-1U assembly
    An uninterruptible power-supply assembly for use with the SAN Volume
    Controller.
Battery pack assembly
    The battery that provides backup power to the SAN Volume Controller if a
    power failure occurs.
Power cable, PDU to 2145 UPS-1U
    Input power cable for connecting the 2145 UPS-1U to a rack power
    distribution unit.
Power cable, mains to UPS-1 (US)
    Input power cable for connecting the 2145 UPS-1U to mains power (United
    States only).

SAN Volume Controller 2145-8A4 FRUs

Table 23 provides a brief description of each SAN Volume Controller 2145-8A4 FRU.

Table 23. SAN Volume Controller 2145-8A4 FRU descriptions
FRU
    Description
Memory module
    A 2 GB PC2–5300 ECC memory module.
Riser card, PCI Express
    An interconnection card that provides the interface between the system
    board and the 4-port Fibre Channel adapter.
4-port Fibre Channel host bus adapter (HBA)
    The SAN Volume Controller 2145-8A4 is connected to the Fibre Channel
    fabric through the Fibre Channel HBA, which is located in PCI slot 1.
Fibre Channel small form-factor pluggable (SFP) transceiver
    A compact optical transceiver that provides the optical interface to a
    Fibre Channel cable. It is capable of operating at up to 4 Gbps.
System board
    The system board for the SAN Volume Controller 2145-8A4 node.
Disk drive backplane with cables
    A SATA simple-swap hard disk drive backplane with cables.
Power supply
    An assembly that provides dc power to the SAN Volume Controller 2145-8A4
    node.
Fan
    A single fan.
Drive cage
    A cage for the SATA simple-swap hard disk drive.
Hard disk drive
    A SATA (serial advanced technology attachment) disk drive for the SAN
    Volume Controller 2145-8A4.
Service controller
    The unit that provides the service functions and the front panel display
    and buttons.
Operator-information panel
    The information panel that includes the power-control button and LEDs
    that indicate system-board errors, hard drive activity, and power status.
Operator-information panel cable
    A cable that connects the operator-information panel to the system board.
Air baffle
    An apparatus that redirects or contains air flow to keep the computer
    components cool.
Microprocessor
    The microprocessor on the system board.
CMOS battery
    A 3.0V battery on the system board that maintains power to back up the
    system BIOS settings.
Heat-sink assembly retention module
    The unit that is used to install the heat-sink assembly in the SAN Volume
    Controller 2145-8A4 node.
Heat-sink assembly
    An apparatus that is used to dissipate the heat that is generated by the
    microprocessor.
Input-power cable assembly
    The cable assembly that provides the power and signal connections between
    the SAN Volume Controller 2145-8A4 and the 2145 UPS-1U assembly.

SAN Volume Controller 2145-8G4 FRUs

Table 24 provides a brief description of each SAN Volume Controller 2145-8G4 FRU.

Table 24. SAN Volume Controller 2145-8G4 FRU descriptions
FRU
    Description
System board
    The planar for the SAN Volume Controller 2145-8G4 node.
4-port Fibre Channel host bus adapter (HBA)
    The SAN Volume Controller 2145-8G4 is connected to the Fibre Channel
    fabric through the Fibre Channel HBA, which is located in PCI slot 1.
Fibre Channel small form-factor pluggable (SFP) transceiver
    A compact optical transceiver that provides the optical interface to a
    Fibre Channel cable. It is capable of operating at up to 4 Gbps.
Riser card, PCI Express
    An interconnection card that provides the interface between the system
    board and the 4-port Fibre Channel adapter.
Service controller
    The FRU that provides the service functions and the front panel display
    and buttons.
Disk drive
    A SATA (serial advanced technology attachment) disk drive for the SAN
    Volume Controller 2145-8G4.
Disk drive cage assembly
    A SATA disk drive cage assembly for the SAN Volume Controller 2145-8G4.
Disk-drive backplane
    A SATA disk drive cable assembly with backplane.
Memory module
    An ECC DDR2 memory module.
Microprocessor
    The microprocessor on the system board.
Power supply assembly
    An assembly that provides dc power to the SAN Volume Controller 2145-8G4.
Power backplane
    An assembly that provides a power interface between the system board and
    the power supply assembly.
CMOS battery
    A 3.0V battery on the system board that maintains power to back up the
    system BIOS settings.
Front panel signal cable
    A ribbon cable that connects the operator-information panel to the system
    board.
Operator-information panel
    The information panel that includes the power-control button and the
    light path diagnostics LEDs.
Fan assembly
    A fan assembly containing two fans, which is used in all the fan
    positions.
Input-power cable assembly
    The cable assembly that provides the power and signal connections between
    the SAN Volume Controller 2145-8G4 and the 2145 UPS-1U assembly.

SAN Volume Controller 2145-8F4 FRUs

Table 25 provides a brief description of each SAN Volume Controller 2145-8F4 FRU.

Table 25. SAN Volume Controller 2145-8F4 FRU descriptions
FRU
    Description
Frame assembly
    A complete SAN Volume Controller 2145-8F4 with the exception of the Fibre
    Channel cards and the service controller.
4-port Fibre Channel host bus adapter (HBA)
    The SAN Volume Controller 2145-8F4 is connected to the Fibre Channel
    fabric through the Fibre Channel HBA. The card assembly is located in PCI
    slot 2. It is not permitted to install a Fibre Channel card in PCI slot 1
    when the card is installed.
Fibre Channel small form-factor pluggable (SFP) transceiver
    A compact optical transceiver that provides the optical interface to a
    Fibre Channel cable. It is capable of operating at up to 4 Gbps.
Riser card, PCI Express
    An interconnection card that provides the interface between the system
    board and the 4-port Fibre Channel adapter.
Service controller
    The FRU that provides the service functions and the front panel display
    and buttons.
Disk drive assembly
    A SATA (serial advanced technology attachment) disk drive assembly for
    the SAN Volume Controller 2145-8F4.
Memory module
    A 1 GB ECC DDR2 memory module.
Microprocessor
    The microprocessor on the system board.
Voltage regulator module (VRM)
    The VRM of the microprocessor.
Power supply assembly
    An assembly that provides dc power to the SAN Volume Controller 2145-8F4.
Power backplane
    An assembly that provides a power interface between the system board and
    the power supply assembly.
CMOS battery
    A 3.0V battery on the system board that maintains power to back up the
    system BIOS settings.
Fan power cable
    A kit that provides the cables for connecting the fan backplanes to the
    system board.
Front panel signal cable
    A ribbon cable that connects the operator-information panel to the system
    board.
Fan backplane
    A kit that provides all fan holder and fan backplane assemblies.
Operator-information panel
    The information panel that includes the power-control button and the
    light path diagnostics LEDs.
Fan, 40×40×28
    The single fan assemblies located in fan positions 1 - 3.
Fan, 40×40×56
    The double fan assemblies located in fan positions 4 - 7.
Input-power cable assembly
    The cable assembly that provides the power and signal connections between
    the SAN Volume Controller 2145-8F4 and the 2145 UPS-1U assembly.

SAN Volume Controller 2145-8F2 FRUs

Table 26 provides a brief description of each SAN Volume Controller 2145-8F2 FRU.

Table 26. SAN Volume Controller 2145-8F2 FRU descriptions
FRU
    Description
Frame assembly
    A complete SAN Volume Controller 2145-8F2 with the exception of the Fibre
    Channel cards and the service controller.
Fibre Channel host bus adapter (HBA) (full height)
    The SAN Volume Controller 2145-8F2 is connected to the Fibre Channel
    fabric through the Fibre Channel HBA. The full height card assembly is
    located in PCI slot 2.
Fibre Channel small form-factor pluggable (SFP) transceiver
    A compact optical transceiver that provides the optical interface to a
    Fibre Channel cable. Its maximum speed is limited to 2 Gbps by the Fibre
    Channel adapter.
Riser card, PCI (full height)
    An interconnection card that provides the interface between the system
    board and the PCI card in slot 2.
Fibre Channel HBA (low profile)
    The SAN Volume Controller 2145-8F2 is connected to the Fibre Channel
    fabric through the Fibre Channel HBA. The low profile card assembly is
    located in PCI slot 1.
Riser card, PCI (low profile)
    An interconnection card that provides the interface between the system
    board and the PCI card in slot 1.
Service controller
    The FRU that provides the service functions and the front panel display
    and buttons.
Disk drive assembly
    A SATA (serial advanced technology attachment) disk drive assembly for
    the SAN Volume Controller 2145-8F2.
Memory module
    A 1 GB ECC DDR2 memory module.
Microprocessor
    The microprocessor on the system board.
Voltage regulator module (VRM)
    The VRM of the microprocessor.
Power supply assembly
    An assembly that provides dc power to the SAN Volume Controller 2145-8F2.
Power backplane
    An assembly that provides a power interface between the system board and
    the power supply assembly.
CMOS battery
    A 3.0V battery on the system board that maintains power to back up the
    system BIOS settings.
Fan power cable
    A kit that provides the cables for connecting the fan backplanes to the
    system board.
Front panel signal cable
    A ribbon cable that connects the operator-information panel to the system
    board.
Fan backplane
    A kit that provides all fan holder and fan backplane assemblies.
Operator-information panel
    The information panel that includes the power-control button and the
    light path diagnostics LEDs.
Fan, 40×40×28
    The single fan assemblies located in fan positions 1 - 3.
Fan, 40×40×56
    The double fan assemblies located in fan positions 4 - 7.
Input-power cable assembly
    The cable assembly that provides the power and signal connections between
    the SAN Volume Controller 2145-8F2 and the 2145 UPS-1U assembly.

Redundant ac-power switch FRUs


The redundant ac-power switch consists of a single field replaceable unit (FRU).

FRU
    Description
Redundant ac-power switch assembly
    The redundant ac-power switch and its input power cables.

Chapter 3. SAN Volume Controller user interfaces for servicing your system

SAN Volume Controller provides a number of user interfaces to troubleshoot,
recover, or maintain your system. The interfaces provide various sets of facilities to
help resolve situations that you might encounter.
• Use the management GUI to monitor and maintain the configuration of storage
  that is associated with your clustered systems.
• Perform service procedures from the service assistant.
• Use the command-line interface (CLI) to manage your system.

Management GUI interface


The management GUI is a browser-based GUI for configuring and managing all
aspects of your system. It provides extensive facilities to help troubleshoot and
correct problems.

About this task

You use the management GUI to manage and service your system. The Monitoring
> Events panel provides access to problems that must be fixed and maintenance
procedures that step you through the process of correcting the problem.

The information on the Events panel can be filtered three ways:


Recommended actions (default)
    Shows only the alerts that require attention. Alerts are listed in priority
    order and should be fixed sequentially by using the available fix
    procedures. For each problem that is selected, you can:
    • Run a fix procedure.
    • View the properties.
Unfixed messages and alerts
    Displays only the alerts and messages that are not fixed. For each entry
    that is selected, you can:
    • Run a fix procedure.
    • Mark an event as fixed.
    • Filter the entries to show them by specific minutes, hours, or dates.
    • Reset the date filter.
    • View the properties.
Show all
    Displays all event types whether they are fixed or unfixed. For each entry
    that is selected, you can:
    • Run a fix procedure.
    • Mark an event as fixed.
    • Filter the entries to show them by specific minutes, hours, or dates.
    • Reset the date filter.
    • View the properties.



Some events require a certain number of occurrences in 25 hours before they are
displayed as unfixed. If they do not reach this threshold in 25 hours, they are
flagged as expired. Monitoring events are below the coalesce threshold and are
usually transient.
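
The threshold behavior described above can be pictured with a small sketch. It is only an illustration: the 25-hour window comes from the text, but the per-event threshold value, the state names, and the assumption that the window starts at the first occurrence are all hypothetical and do not describe the actual SAN Volume Controller implementation.

```python
from datetime import datetime, timedelta

# The 25-hour window comes from the text; everything else is illustrative.
WINDOW = timedelta(hours=25)


def classify_event(occurrences, threshold, now):
    """Classify one event as 'unfixed', 'expired', or 'monitoring'.

    occurrences: non-empty list of datetime objects, one per occurrence.
    threshold: hypothetical number of occurrences needed within the window.
    now: the time at which the classification is made.
    """
    if not occurrences:
        raise ValueError("at least one occurrence is required")
    first = min(occurrences)
    # Assumption: the window is anchored at the first occurrence.
    in_window = [t for t in occurrences if t - first <= WINDOW]
    if len(in_window) >= threshold:
        return "unfixed"
    if now - first > WINDOW:
        return "expired"
    return "monitoring"
```

For example, an event that occurs twice against a hypothetical threshold of three stays in the monitoring state until 25 hours have passed since its first occurrence, and is then treated as expired.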

You can also sort events by time or error code. When you sort by error code, the
most serious events, those with the lowest numbers, are displayed first. You can
select any event that is listed and select Actions > Properties to view details about
the event.
• Recommended Actions. For each problem that is selected, you can:
  – Run a fix procedure.
  – View the properties.
• Event log. For each entry that is selected, you can:
  – Run a fix procedure.
  – Mark an event as fixed.
  – Filter the entries to show them by specific minutes, hours, or dates.
  – Reset the date filter.
  – View the properties.
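
The sort order described above (lowest error code first, because lower numbers are more serious) can be sketched as follows. The event records, field names, and error code values are made up for illustration; they are not taken from a real event log.

```python
# Hypothetical event records; the codes and timestamps are placeholders.
events = [
    {"error_code": 2500, "time": "2012-06-01T09:15", "description": "event C"},
    {"error_code": 1030, "time": "2012-06-01T11:40", "description": "event A"},
    {"error_code": 1200, "time": "2012-06-01T10:05", "description": "event B"},
]

# Sorting by error code puts the most serious (lowest-numbered) events first.
by_severity = sorted(events, key=lambda event: event["error_code"])

# ISO-style timestamps sort correctly as plain strings.
by_time = sorted(events, key=lambda event: event["time"])

print([event["error_code"] for event in by_severity])  # [1030, 1200, 2500]
print([event["description"] for event in by_time])     # ['event C', 'event B', 'event A']
```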

When to use the management GUI


The management GUI is the primary tool that is used to service your system.

Regularly monitor the status of the system using the management GUI. If you
suspect a problem, use the management GUI first to diagnose and resolve the
problem.

Use the views that are available in the management GUI to verify the status of the
system, the hardware devices, the physical storage, and the available volumes. The
Monitoring > Events panel provides access to all problems that exist on the
system. Use the Recommended Actions filter to display the most important events
that need to be resolved.

If there is a service error code for the alert, you can run a fix procedure that assists
you in resolving the problem. These fix procedures analyze the system and provide
more information about the problem. They suggest actions to take and step you
through the actions that automatically manage the system where necessary. Finally,
they check that the problem is resolved.

If there is an error that is reported, always use the fix procedures within the
management GUI to resolve the problem. Always use the fix procedures for both
software configuration problems and hardware failures. The fix procedures analyze
the system to ensure that the required changes do not cause volumes to be
inaccessible to the hosts. The fix procedures automatically perform configuration
changes that are required to return the system to its optimum state.

Accessing the management GUI


This procedure describes how to access the management GUI.

About this task

You must use a supported web browser. Verify that you are using a supported web
browser from the following website:

www.ibm.com/storage/support/2145

You can use the management GUI to manage your system as soon as you have
created a clustered system.

Procedure
1. Start a supported web browser and point the browser to the management IP
address of your system.
The management IP address is set when the clustered system is created. Up to
four addresses can be configured for your use. There are two addresses for
IPv4 access and two addresses for IPv6 access.
2. When the connection is successful, you see a login panel.
3. Log on by using your user name and password.
4. When you have logged on, select Monitoring > Events.
5. Ensure that the event log is filtered using Recommended actions.
6. Select the recommended action and run the fix procedure.
7. Continue to work through the alerts in the order suggested, if possible.

Results

After all the alerts are fixed, check the status of your system to ensure that it is
operating as intended.
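
Step 1 of the procedure notes that up to four management IP addresses can be configured: two for IPv4 access and two for IPv6 access. As a minimal sketch, a planning script could check a list of candidate addresses against that limit with the Python standard library. The function name is hypothetical, and the addresses in the usage note are documentation-range placeholders rather than real system addresses.

```python
import ipaddress

MAX_PER_FAMILY = 2  # two IPv4 and two IPv6 addresses, as stated in step 1


def split_management_addresses(addresses):
    """Parse the addresses and return them as (ipv4_list, ipv6_list).

    Raises ValueError if an address is malformed or if more than two
    addresses of either family are supplied.
    """
    parsed = [ipaddress.ip_address(a) for a in addresses]
    ipv4 = [a for a in parsed if a.version == 4]
    ipv6 = [a for a in parsed if a.version == 6]
    if len(ipv4) > MAX_PER_FAMILY or len(ipv6) > MAX_PER_FAMILY:
        raise ValueError("at most two IPv4 and two IPv6 management addresses")
    return ipv4, ipv6
```

Calling `split_management_addresses(["192.0.2.10", "192.0.2.11", "2001:db8::10"])` returns two IPv4 addresses and one IPv6 address; supplying a third IPv4 address raises `ValueError`.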

Deleting a node from a clustered system using the management GUI

Remove a node from a system if the node has failed and is being replaced with a
new node or if the repair that has been performed has caused that node to be
unrecognizable by the system.

Before you begin

The cache on the selected node is flushed before the node is taken offline. In some
circumstances, such as when the system is already degraded (for example, when
both nodes in the I/O group are online and the volumes within the I/O group are
degraded), the system ensures that data loss does not occur as a result of deleting
the only node with the cache data. If a failure occurs on the other node in the I/O
group, the cache is flushed before the node is removed to prevent data loss.

Before deleting a node from the system, record the node serial number, worldwide
node name (WWNN), all worldwide port names (WWPNs), and the I/O group
that the node is currently part of. If the node is re-added to the system at a later
time, recording this node information can avoid data corruption.

Attention:
• If you are removing a single node and the remaining node in the I/O group is
  online, the data on the remaining node goes into write-through mode. This data
  can be exposed to a single point of failure if the remaining node fails.
• If the volumes are already degraded before you remove a node, redundancy to
  the volumes is degraded. Removing a node might result in a loss of access to
  data and data loss.
• Removing the last node in the system destroys the system. Before you remove
  the last node in the system, ensure that you want to destroy the system.
• When you remove a node, you remove all redundancy from the I/O group. As a
  result, new or existing failures can cause I/O errors on the hosts. The following
  failures can occur:
  – Host configuration errors
  – Zoning errors
  – Multipathing-software configuration errors
• If you are deleting the last node in an I/O group and there are volumes that are
  assigned to the I/O group, you cannot remove the node from the system if the
  node is online. You must back up or migrate all data that you want to save
  before you remove the node. If the node is offline, you can remove the node.
• When you remove the configuration node, the configuration function moves to a
  different node within the system. This process can take a short time, typically
  less than a minute. The management GUI reattaches to the new configuration
  node transparently.
• If you turn the power on to the node that has been removed and it is still
  connected to the same fabric or zone, it attempts to rejoin the system. The
  system tells the node to remove itself from the system and the node becomes a
  candidate for addition to this system or another system.
• If you are adding this node into the system, ensure that you add it to the same
  I/O group that it was previously a member of. Failure to do so can result in
  data corruption.
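
The details that should be recorded before a node is deleted, and the rule that a re-added node must go back into its original I/O group, can be captured in a small record. This is a sketch only: the class, field, and function names are hypothetical, and the values are placeholders rather than data from a real system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NodeRecord:
    """Details to record before a node is deleted (field names are hypothetical)."""
    serial_number: str
    wwnn: str
    wwpns: Tuple[str, ...]
    io_group: str


def safe_to_re_add(record: NodeRecord, target_io_group: str) -> bool:
    """Return True only if the node is going back into its original I/O group."""
    return record.io_group == target_io_group


# Placeholder values, not taken from a real system.
record = NodeRecord(
    serial_number="SERIAL-PLACEHOLDER",
    wwnn="WWNN-PLACEHOLDER",
    wwpns=("WWPN-1", "WWPN-2", "WWPN-3", "WWPN-4"),
    io_group="io_grp0",
)
print(safe_to_re_add(record, "io_grp0"))  # True
print(safe_to_re_add(record, "io_grp1"))  # False
```

Keeping the record immutable (`frozen=True`) reflects its purpose: it is a snapshot taken before the removal and should not change afterward.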

This task assumes that you have already accessed the management GUI.

About this task

Complete the following steps to remove a node from a system:

Procedure
1. Select Monitoring > System.
2. Find the node that you want to remove.
If the node that you want to remove is shown as Offline, then the node is not
participating in the system.
If the node that you want to remove is shown as Online, deleting the node can
cause the dependent volumes to also go offline. Verify whether the node has
any dependent volumes.
3. To check for dependent volumes before attempting to remove the node, click
Manage, and then click Show Dependent Volumes.
If any volumes are listed, determine why and if access to the volumes is
required while the node is removed from the system. If the volumes are
assigned from MDisk groups that contain solid-state drives (SSDs) that are
located in the node, check why the volume mirror, if it is configured, is not
synchronized. There can also be dependent volumes because the partner node
in the I/O group is offline. Fabric issues can also prevent the volume from
communicating with the storage systems. Resolve these problems before
continuing with the node removal.
4. Click Remove Node.
5. Click OK to remove the node. Before a node is removed, the SAN Volume
Controller checks to determine if there are any volumes that depend on that
node. If the node that you selected contains volumes within the following
situations, the volumes go offline and become unavailable if the node is
removed:
• The node contains solid-state drives (SSDs) and also contains the only
  synchronized copy of a mirrored volume.
• The other node in the I/O group is offline.
If you select a node to remove that has these dependencies, another panel
is displayed to confirm the removal.
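
The two situations listed in step 5 amount to a simple either-or condition. The sketch below expresses that condition under stated assumptions: the class and attribute names are hypothetical, and the real check is performed by the SAN Volume Controller itself, not by user code.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical summary of the state that matters when removing a node."""
    holds_only_synced_mirror_copy: bool  # SSDs in the node hold the only synchronized copy
    partner_node_online: bool            # state of the other node in the I/O group


def removal_takes_volumes_offline(node: NodeStatus) -> bool:
    """Return True if removing the node would make dependent volumes go offline."""
    return node.holds_only_synced_mirror_copy or not node.partner_node_online


print(removal_takes_volumes_offline(NodeStatus(False, True)))   # False
print(removal_takes_volumes_offline(NodeStatus(True, True)))    # True
print(removal_takes_volumes_offline(NodeStatus(False, False)))  # True
```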

Adding nodes to a clustered system


This topic provides instructions for adding a node to a clustered system. It also
contains information about adding a node if the node previously failed and is
being replaced with a new node or if a repair action has caused the node to be
unrecognizable by the system. When adding nodes, ensure that they are added in
pairs to create a full I/O group.

Before you add a node to a system, you must make sure that the switch zoning is
configured such that the node being added is in the same zone as all other nodes
in the system. If you are replacing a node and the switch is zoned by worldwide
port name (WWPN) rather than by switch port, make sure that the switch is
configured such that the node being added is in the same VSAN or zone.

Considerations when adding a node to a system

If you are adding a node that has been used previously, either within a different
I/O group within this system or within a different system, consider the following
situations before adding the node. If you add a node to the system without
changing its worldwide node name (WWNN), hosts might detect the node and use
it as if it were in its old location. This action might cause the hosts to access the
wrong volumes.
v If the new node requires a level of software that is higher than the software level
that is available on the system, the entire clustered system must be upgraded
before the new node can be added.
v If you are re-adding a node back to the same I/O group after a service action
required the node to be deleted from the system and the physical node has not
changed, no special procedures are required and the node can be added back to
the system.
v If you are replacing a node in a system either because of a node failure or an
upgrade, you must change the WWNN of the new node to match that of the
original node before you connect the node to the Fibre Channel network and
add the node to the system.
v If you are creating an I/O group in the system and are adding a new node,
there are no special procedures because this node was never added to a system
and the WWNN for the node did not exist.
v If you are creating an I/O group in the system and are adding a new node that
has been added to a system before, the host system might still be configured to
the node WWPNs and the node might still be zoned in the fabric. Because you
cannot change the WWNN for the node, you must ensure that other components
in your fabric are configured correctly. Verify that any host that was previously
configured to use the node has been correctly updated.
v If the node that you are adding was previously replaced, either for a node repair
or upgrade, you might have used the WWNN of that node for the replacement
node. Ensure that the WWNN of this node was updated so that you do not have
two nodes with the same WWNN attached to your fabric. Also ensure that the
WWNN of the node that you are adding is not 00000. If it is 00000, contact your
IBM representative.
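
The WWNN checks in the preceding list can be sketched as a small pre-check. This is a minimal illustration, not part of the product: the function name check_wwnn and its inputs are hypothetical, and it assumes that you have already collected the WWNNs that are attached to your fabric. It inspects only the values that you pass in.

```python
def check_wwnn(candidate_wwnn, fabric_wwnns):
    """Return a list of problems for a node that is about to be added.

    candidate_wwnn: WWNN of the node being added (hex string).
    fabric_wwnns:   WWNNs already attached to the fabric (hex strings).
    Both names are hypothetical; they are not SAN Volume Controller objects.
    """
    problems = []
    candidate = candidate_wwnn.strip().upper()
    # This guide identifies a WWNN by its last five characters
    # (for example 00000 or FFFFF), so only that suffix is tested here.
    if candidate[-5:] == "00000":
        problems.append("WWNN ends in 00000: contact your IBM representative")
    existing = {w.strip().upper() for w in fabric_wwnns}
    if candidate in existing:
        problems.append("duplicate WWNN on fabric: " + candidate)
    return problems


# Example with the WWNNs from the lsnode sample later in this guide.
print(check_wwnn("5005076801005A07", ["5005076801004F0F"]))
```

An empty list means that neither condition was detected; it does not replace the zoning and host configuration checks that are described above.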

Considerations when using multipathing device drivers


v Applications on the host systems direct I/O operations to file systems or logical
volumes that are mapped by the operating system to virtual paths (vpaths),
which are pseudo disk objects that are supported by the multipathing device
drivers. Multipathing device drivers maintain an association between a vpath
and a SAN Volume Controller volume. This association uses an identifier (UID)
which is unique to the volume and is never reused. The UID allows
multipathing device drivers to directly associate vpaths with volumes.
v Multipathing device drivers operate within a protocol stack that contains disk
and Fibre Channel device drivers that are used to communicate with the SAN
Volume Controller using the SCSI protocol over Fibre Channel as defined by the
ANSI FCS standard. The addressing scheme that is provided by these SCSI and
Fibre Channel device drivers uses a combination of a SCSI logical unit number
(LUN) and the worldwide node name (WWNN) for the Fibre Channel node and
ports.
v If an error occurs, the error recovery procedures (ERPs) operate at various tiers
in the protocol stack. Some of these ERPs cause I/O to be redriven using the
same WWNN and LUN numbers that were previously used.
v Multipathing device drivers do not check the association of the volume with the
vpath on every I/O operation that they perform.



Adding nodes to a system by using the management GUI

Attention:
1. If you are adding a node to the SAN again, ensure that you are adding the
node to the same I/O group from which it was removed. Failure to do so can
result in data corruption. You must use the information that was
recorded when the node was originally added to the system. If you do not
have access to this information, call the IBM Support Center to add the node
back into the system without corrupting the data.
2. For each external storage system, the LUNs that are presented to the ports on
the new node must be the same as the LUNs that are presented to the nodes
that currently exist in the system. You must ensure that the LUNs are the same
before you add the new node to the system.
3. For each external storage system, LUN masking for each LUN must be
identical for all nodes in a system. You must ensure that the LUN masking for
each LUN is identical before you add the new node to the system.
4. You must ensure that the model type of the new node is supported by the SAN
Volume Controller software level that is currently installed on the system. If the
model type is not supported by the SAN Volume Controller software level,
upgrade the system to a software level that supports the model type of the new
node. See the following website for the latest supported software levels:
www.ibm.com/storage/support/2145

Each node in an I/O group must be connected to a different uninterruptible power
supply. Each node must also have a unique name. If you do not provide a name,
the system assigns a default name to the object.

Note: Whenever possible, provide a meaningful name for each object to make
identifying that object easier in the future.

This task assumes that you have already accessed the management GUI.

To add a node to a clustered system, follow these steps:


1. Select Monitoring > System.
2. From the rack image, click an empty slot that is associated with the I/O group
to which you want to add the node.
3. Select the candidate node that you want to add.
If the node that you want to add is unavailable in the candidate list, the node
is in service state. Actions are required to release the node from service state
before it can be added to the system.
4. Select Add node. You are shown a warning.
5. Click OK.
6. If you are adding a node to a clustered system for the first time, record the
following information:
v Node serial number
v All WWPNs
v The I/O group that the node belongs to

Important: You need this information to avoid possible data corruption if you
must remove and add the node to the system again.
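
One way to keep the information from step 6 is a small record per node. The following sketch is illustrative only: the NodeRecord class, the safe_to_readd function, and the sample serial number and WWPN are all made up for this example and are not part of the SAN Volume Controller software.

```python
from dataclasses import dataclass, field


@dataclass
class NodeRecord:
    """Details to record when a node is first added (hypothetical structure)."""
    serial_number: str
    io_group: str
    wwpns: list = field(default_factory=list)


def safe_to_readd(record, target_io_group):
    """True only when the node returns to the I/O group it was removed from."""
    return record.io_group == target_io_group


# Made-up values, for illustration only.
record = NodeRecord("SERIAL-EXAMPLE", "io_grp0", ["WWPN-EXAMPLE-1"])
print(safe_to_readd(record, "io_grp0"))
print(safe_to_readd(record, "io_grp1"))
```

If the recorded I/O group is not available, the Attention notice earlier in this topic applies: call the IBM Support Center rather than guessing.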

If a node shows node error 578 or node error 690, the node is in service state.
Perform the following steps from the front panel to exit service state:
1. Press and release the up or down button until the Actions? option displays.
2. Press the select button.
3. Press and release the up or down button until the Exit Service? option
displays.
4. Press the select button.
5. Press and release the left or right button until the Confirm Exit? option
displays.
6. Press the select button.

For any other node errors, follow the appropriate service procedures to fix the
errors. After the errors are resolved and the node is in candidate state, you can try
to add the node to the system again.

Service assistant interface


The service assistant interface is a browser-based GUI that is used to service your
nodes.

You connect to the service assistant through the service IP address.

When to use the service assistant


The primary use of the service assistant is when a node is in service state. The
node cannot be active as part of a system while it is in service state.

Attention: Perform service actions on nodes only when directed to do so by the
fix procedures. If used inappropriately, the service actions that are available
through the service assistant can cause loss of access to data or even data loss.

The node might be in service state because it has a hardware issue, has corrupted
data, or has lost its configuration data.

Use the service assistant in the following situations:


v When you cannot access the system from the management GUI and you cannot
access the SAN Volume Controller to run the recommended actions.
v When the recommended action directs you to use the service assistant.

The management GUI operates only when there is an online clustered system. Use
the service assistant if you are unable to create a clustered system.

The service assistant provides detailed status and error summaries. You can also
perform the following service-related actions:
v Collect logs to create and download a package of files to send to support
personnel.
v Remove the data for the system from a node.
v Recover a system if it fails.
v Install a software package from the support site or rescue the software from
another node.
v Upgrade software on nodes manually instead of performing a standard upgrade
procedure.
v Change the service IP address that is assigned to Ethernet port 1 for the current
node.
v Install a temporary SSH key if a key is not installed and CLI access is required.
v Restart the services used by the system.

Accessing the service assistant


The service assistant is a web application that helps troubleshoot and resolve
problems on a node.

About this task

You must use a supported web browser. Verify that you are using a supported and
appropriately configured web browser from the following website:

www.ibm.com/storage/support/2145

To start the application, perform the following steps:

Procedure
1. Start a supported web browser and point your web browser to
<serviceaddress>/service for the node that you want to work on.
2. Log on to the service assistant using the superuser password.
If you do not know the current superuser password, reset the password.

Results

Perform the service assistant actions on the correct node.

Cluster (system) command-line interface


Use the command-line interface (CLI) to manage a clustered system using the task
commands and information commands.

For a full description of the commands and how to start an SSH command-line
session, see the “Command-line interface” topic in the “Reference” section of the
SAN Volume Controller Information Center.

When to use the cluster (system) CLI


The cluster (system) CLI is intended for advanced users who are comfortable
using a command-line interface.

Nearly all of the flexibility that is offered by the CLI is available through the
management GUI. However, the CLI does not provide the fix procedures that are
available in the management GUI. Therefore, use the fix procedures in the
management GUI to resolve the problems. Use the CLI when you require a
configuration setting that is unavailable in the management GUI.

You might also find it useful to create command scripts using the CLI commands
to monitor for certain conditions or to automate configuration changes that you
make on a regular basis.

Accessing the cluster (system) CLI
Follow the steps that are described in the “Command-line interface” topic in the
“Reference” section of the SAN Volume Controller Information Center to initialize
and use a CLI session.

Service command-line interface


Use the service command-line interface (CLI) to manage a node using the task
commands and information commands.

For a full description of the commands and how to start an SSH command-line
session, see the “Command-line interface” topic in the “Reference” section of the
SAN Volume Controller Information Center.

When to use the service CLI


The service CLI is intended for advanced users who are comfortable using
a command-line interface.

To access a node directly, it is normally easier to use the service assistant with its
graphical interface and extensive help facilities.

Accessing the service CLI


Follow the steps that are described in the “Command-line interface” topic in the
“Reference” section of the SAN Volume Controller Information Center to initialize
and use a CLI session.



Chapter 4. Performing recovery actions using the SAN
Volume Controller CLI
The SAN Volume Controller command-line interface (CLI) is a collection of
commands that you can use to manage SAN Volume Controller clusters. See the
Command-line interface documentation for the specific details about the
commands provided here.

Validating and repairing mirrored volume copies using the CLI


You can use the repairvdiskcopy command from the command-line interface (CLI)
to validate and repair mirrored volume copies.

Attention: Run the repairvdiskcopy command only if all volume copies are
synchronized.

When you issue the repairvdiskcopy command, you must use only one of the
-validate, -medium, or -resync parameters. You must also specify the name or ID
of the volume to be validated and repaired as the last entry on the command line.
After you issue the command, no output is displayed.
-validate
Use this parameter if you only want to verify that the mirrored volume copies
are identical. If any difference is found, the command stops and logs an error
that includes the logical block address (LBA) and the length of the first
difference. You can use this parameter, starting at a different LBA each time, to
count the number of differences on a volume.
-medium
Use this parameter to convert sectors on all volume copies that contain
different contents into virtual medium errors. Upon completion, the command
logs an event, which indicates the number of differences that were found, the
number that were converted into medium errors, and the number that were
not converted. Use this option if you are unsure what the correct data is, and
you do not want an incorrect version of the data to be used.
-resync
Use this parameter to overwrite contents from the specified primary volume
copy to the other volume copy. The command corrects any differing sectors by
copying the sectors from the primary copy to the copies being compared. Upon
completion, the command process logs an event, which indicates the number
of differences that were corrected. Use this action if you are sure that either the
primary volume copy data is correct or that your host applications can handle
incorrect data.
-startlba lba
Optionally, use this parameter to specify the starting Logical Block Address
(LBA) from which to start the validation and repair. If you previously used the
-validate parameter, an error was logged with the LBA where the first
difference, if any, was found. Reissue repairvdiskcopy with that LBA to avoid
reprocessing the initial sectors that compared identically. Continue to reissue
repairvdiskcopy using this parameter to list all the differences.



Issue the following command to validate and, if necessary, automatically repair
mirrored copies of the specified volume:
repairvdiskcopy -resync -startlba 20 vdisk8

Notes:
1. Only one repairvdiskcopy command can run on a volume at a time.
2. Once you start the repairvdiskcopy command, you cannot use the command to
stop processing.
3. The primary copy of a mirrored volume cannot be changed while the
repairvdiskcopy -resync command is running.
4. If there is only one mirrored copy, the command returns immediately with an
error.
5. If a copy being compared goes offline, the command is halted with an error.
The command is not automatically resumed when the copy is brought back
online.
6. In the case where one copy is readable but the other copy has a medium error,
the command process automatically attempts to fix the medium error by
writing the read data from the other copy.
7. If no differing sectors are found during repairvdiskcopy processing, an
informational error is logged at the end of the process.
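
The parameter rules above (exactly one of -validate, -medium, or -resync, an optional -startlba, and the volume as the last entry) can be checked before a command is issued. The following sketch only assembles the command text; the function name build_repairvdiskcopy is hypothetical, and nothing here contacts a system.

```python
def build_repairvdiskcopy(volume, validate=False, medium=False,
                          resync=False, startlba=None):
    """Assemble a repairvdiskcopy command line as described in this topic.

    Hypothetical helper: it returns a string and does not run anything.
    """
    chosen = [flag for flag, wanted in (("-validate", validate),
                                        ("-medium", medium),
                                        ("-resync", resync)) if wanted]
    if len(chosen) != 1:
        raise ValueError("specify exactly one of -validate, -medium, or -resync")
    parts = ["repairvdiskcopy", chosen[0]]
    if startlba is not None:
        if startlba < 0:
            raise ValueError("startlba must not be negative")
        parts += ["-startlba", str(startlba)]
    # The volume name or ID is the last entry on the command line.
    parts.append(str(volume))
    return " ".join(parts)


print(build_repairvdiskcopy("vdisk8", resync=True, startlba=20))
```

With the arguments shown, the helper reproduces the example command from this topic. Rejecting zero or several mode parameters up front avoids issuing a command that the CLI would refuse.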

Checking the progress of validation and repair of volume copies using the CLI

Use the lsrepairvdiskcopyprogress command to display the progress of mirrored
volume validation and repairs. You can specify a volume copy using the -copy id
parameter. To display the volumes that have two or more copies with an active
task, specify the command with no parameters; it is not possible to have only one
volume copy with an active task.

To check the progress of validation and repair of mirrored volumes, issue the
following command:
lsrepairvdiskcopyprogress -delim :

The following example shows how the command output is displayed:


vdisk_id:vdisk_name:copy_id:task:progress:estimated_completion_time
0:vdisk0:0:medium:50:070301120000
0:vdisk0:1:medium:50:070301120000
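
If you capture this colon-delimited output in a script, it can be split into one record per volume copy. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the text is assumed to have been captured already, and the reading of estimated_completion_time as YYMMDDHHMMSS is an assumption that you should confirm in the command-line interface documentation.

```python
from datetime import datetime

SAMPLE = """vdisk_id:vdisk_name:copy_id:task:progress:estimated_completion_time
0:vdisk0:0:medium:50:070301120000
0:vdisk0:1:medium:50:070301120000
"""


def parse_progress(output, delim=":"):
    """Turn delimited progress output into a list of dictionaries."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    # Normalize header names so that "copy id" and "copy_id" are treated alike.
    header = [name.strip().replace(" ", "_") for name in lines[0].split(delim)]
    rows = []
    for line in lines[1:]:
        row = dict(zip(header, line.split(delim)))
        # An empty progress field is kept as None instead of a number.
        row["progress"] = int(row["progress"]) if row.get("progress") else None
        rows.append(row)
    return rows


def parse_completion_time(stamp):
    """Assumption: the timestamp is laid out as YYMMDDHHMMSS."""
    return datetime.strptime(stamp, "%y%m%d%H%M%S")


for row in parse_progress(SAMPLE):
    print(row["vdisk_name"], row["copy_id"], row["task"], row["progress"])
```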

Repairing a space-efficient volume using the CLI


You can use the repairsevdiskcopy command from the command-line interface to
repair the metadata on a space-efficient volume.

The repairsevdiskcopy command automatically detects and repairs corrupted
metadata. The command holds the volume offline during the repair, but does not
prevent the disk from being moved between I/O groups.

If a repair operation completes successfully and the volume was previously offline
because of corrupted metadata, the command brings the volume back online. The
only limit on the number of concurrent repair operations is the number of virtual
disk copies in the configuration.



When you issue the repairsevdiskcopy command, you must specify the name or
ID of the volume to be repaired as the last entry on the command line. Once
started, a repair operation cannot be paused or cancelled; the repair can only be
terminated by deleting the copy.

Attention: Use this command only to repair a space-efficient volume
(thin-provisioned volume) that has reported corrupt metadata.

Issue the following command to repair the metadata on a space-efficient volume:


repairsevdiskcopy vdisk8

After you issue the command, no output is displayed.

Notes:
1. Because the volume is offline to the host, any I/O that is submitted to the
volume while it is being repaired fails.
2. When the repair operation completes successfully, the corrupted metadata error
is marked as fixed.
3. If the repair operation fails, the volume is held offline and an error is logged.

Checking the progress of the repair of a space-efficient volume using the CLI

Issue the lsrepairsevdiskcopyprogress command to list the repair progress for
space-efficient volume copies of the specified volume. If you do not specify a
volume, the command lists the repair progress for all space-efficient copies in the
system.

Note: Run this command only after you run the repairsevdiskcopy command,
which you must run only as required by the fix procedures or by IBM support.

Recovering from offline volumes using the CLI


If a node or an I/O group fails, you can use the command-line interface (CLI) to
recover offline volumes.

About this task

If you have lost both nodes in an I/O group and have, therefore, lost access to all
the volumes that are associated with the I/O group, you must perform one of the
following procedures to regain access to your volumes. Depending on the failure
type, you might have lost data that was cached for these volumes and the volumes
are now offline.

Data loss scenario 1

One node in an I/O group has failed and failover has started on the second node.
During the failover process, the second node in the I/O group fails before the data
in the write cache is written to hard disk. The first node is successfully repaired
but its hardened data is not the most recent version that is committed to the data
store; therefore, it cannot be used. The second node is repaired or replaced and has
lost its hardened data; therefore, the node has no way of recognizing that it is part
of the clustered system.

Perform the following steps to recover from an offline volume when one node has
down-level hardened data and the other node has lost hardened data:

Procedure
1. Recover the node and add it back into the system.
2. Delete all IBM FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the offline volumes.
3. Run the recovervdisk, recovervdiskbyiogrp or recovervdiskbysystem
command.
4. Re-create all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the volumes.

Example

Data loss scenario 2

Both nodes in the I/O group have failed and have been repaired. The nodes have
lost their hardened data; therefore, the nodes have no way of recognizing that they
are part of the system.

Perform the following steps to recover from an offline volume when both nodes
have lost their hardened data and cannot be recognized by the system:
1. Delete all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the offline volumes.
2. Run the recovervdisk, recovervdiskbyiogrp or recovervdiskbysystem
command.
3. Re-create all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the volumes.
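
The two scenarios share the same core sequence; scenario 1 only adds the node recovery step at the start. The following sketch restates that ordering as data so that the difference is easy to see. The function name recovery_plan is hypothetical, and the sketch lists steps only; it does not issue any commands.

```python
# Command names as given in this topic; arguments are deliberately omitted.
RECOVERY_COMMANDS = ("recovervdisk", "recovervdiskbyiogrp", "recovervdiskbysystem")


def recovery_plan(scenario):
    """Return the ordered recovery steps for data loss scenario 1 or 2."""
    if scenario not in (1, 2):
        raise ValueError("scenario must be 1 or 2")
    steps = []
    if scenario == 1:
        # Only scenario 1 starts by recovering the node.
        steps.append("Recover the node and add it back into the system.")
    steps.append("Delete all FlashCopy mappings and Metro Mirror or Global "
                 "Mirror relationships that use the offline volumes.")
    steps.append("Run one of: " + ", ".join(RECOVERY_COMMANDS) + ".")
    steps.append("Re-create all FlashCopy mappings and Metro Mirror or Global "
                 "Mirror relationships that use the volumes.")
    return steps


for number, step in enumerate(recovery_plan(1), start=1):
    print(number, step)
```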

Replacing nodes nondisruptively


These procedures describe how to replace most nodes nondisruptively.

Before you begin

These procedures are nondisruptive because changes to your SAN environment are
not required. The replacement (new) node uses the same worldwide node name
(WWNN) as the node that you are replacing. An alternative to this procedure is to
replace nodes disruptively either by moving volumes to a new I/O group or by
rezoning the SAN. The disruptive procedures, however, require additional work on
the hosts.

This task assumes that these conditions have been met:


v The existing system software must be at a version that supports the new node. If
a node is being replaced by a SAN Volume Controller 2145-CG8 node, the
system software version must be 6.2.0 or later. If a node is being replaced by a
SAN Volume Controller 2145-CF8 node, the system software version must be
5.1.0 or later. If a node is being replaced by a SAN Volume Controller 2145-8A4
node, the system software version must be 4.3.1 or later.

Note: For nodes that contain solid-state drives (SSDs): if the existing SSDs are
being moved to the new node, the new node must contain the necessary
serial-attached SCSI (SAS) adapter to support SSDs.
v All nodes that are configured in the system are present and online.
v All errors in the system event log are addressed and marked as fixed.
v There are no volumes, managed disks (MDisks), or external storage systems
with a status of degraded or offline.
v The replacement node is not powered on.
v The replacement node is not connected to the SAN.
v You have a 2145 UPS-1U unit (feature code 8115) for each new SAN Volume
Controller 2145-CG8, SAN Volume Controller 2145-CF8, or SAN Volume
Controller 2145-8A4 node.
v You have backed up the system configuration and saved the
svc.config.backup.xml file.
v The replacement node must be able to operate at the Fibre Channel or Ethernet
connection speed of the node it is replacing.
v If the node being replaced contains solid-state drives (SSDs), transfer all SSDs
and SAS adapters to the new node if it supports the drives. To prevent losing
access to the data, if the new node does not support the existing SSDs, transfer
the data from the SSDs before replacing the node.
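
The minimum software versions in the first condition can be expressed as a simple lookup. This is a sketch for illustration: the names MIN_VERSION and supports_model are hypothetical, the table holds only the three models that this topic names, and version strings are assumed to be dot-separated numbers.

```python
# Minimum system software version for each replacement node model,
# taken from the conditions listed in this topic.
MIN_VERSION = {
    "2145-CG8": (6, 2, 0),
    "2145-CF8": (5, 1, 0),
    "2145-8A4": (4, 3, 1),
}


def parse_version(text):
    """Convert a version such as '6.4.0' to a tuple, padded to three parts."""
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_model(system_version, model):
    """True when the system software is at or above the model's minimum."""
    if model not in MIN_VERSION:
        raise KeyError("no minimum version listed for " + model)
    return parse_version(system_version) >= MIN_VERSION[model]


print(supports_model("6.4.0", "2145-CG8"))
print(supports_model("6.1.0", "2145-CG8"))
```

A model that is missing from the table raises an error instead of returning a guess; check the support website for models that are not listed here.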

Important:
1. Do not continue this task if any of the conditions listed are not met unless you
are instructed to do so by the IBM Support Center.
2. Review all of the steps that follow before you perform this task.
3. Do not perform this task if you are not familiar with SAN Volume Controller
environments or the procedures described in this task.
4. If you plan to reuse the node that you are replacing, ensure that the WWNN of
the node is set to a unique number on your SAN. If you do not ensure that the
WWNN is unique, the WWNN and WWPN are duplicated in the SAN
environment and can cause problems.

Tip: You can change the WWNN of the node you are replacing to the factory
default WWNN of the replacement node to ensure that the number is unique.
5. The node ID and possibly the node name change during this task. After the
system assigns the node ID, the ID cannot be changed. However, you can
change the node name after this task is complete.

About this task

Perform these steps to replace active nodes in a system:

Procedure
1. (If the system software version is at 5.1 or later, complete this step.)
Confirm that no hosts have dependencies on the node.
When shutting down a node that is part of a system or when deleting the
node from a system, you can use either the management GUI or a
command-line interface (CLI) command. In the management GUI, select
Monitoring > System > Manage. Click Show Dependent Volumes to display
all the volumes that are dependent on a node. You can also use the node
parameter with the lsdependentvdisks CLI command to view dependent
volumes.
If dependent volumes exist, determine if the volumes are being used. If the
volumes are being used, either restore the redundant configuration or suspend
the host application. If a dependent quorum disk is reported, repair the access
to the quorum disk or modify the quorum disk configuration.

2. Use these steps to determine the system configuration node, and the ID,
name, I/O group ID, and I/O group name for the node that you want to
replace. If you already know the physical location of the node that you want
to replace, you can skip this step and proceed to step 3.

Tip: If one of the nodes that you want to replace is the system configuration
node, replace it last.
a. Issue this command from the command-line interface (CLI):
lsnode -delim :
The following output is an example of what this command displays:
id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:
config_node:UPS_unique_id:hardware:iscsi_name:iscsi_alias
3:dvt113294:100089J137:5005076801005A07:online:0:io_grp0:yes:
20400002096810C7:8A4:iqn.1986-03.com.ibm:2145.ldcluster-80.dvt113294:
14:des113004:10006BR010:5005076801004F0F:online:0:io_grp0:no:
2040000192880040:8G4:iqn.1986-03.com.ibm:2145.ldcluster-80.des113004:

b. In the config_node column, find the value yes and record the values in the
id and name columns.
c. Record the values in the id and the name columns for each node in the
system.
d. Record the values in the IO_group_id and the IO_group_name columns for
each node in the system.
e. Issue this command from the CLI for each node in the system to
determine the front panel ID:
lsnodevpd node_name or node_id
where node_name or node_id is the name or ID of the node for which you
want to determine the front panel ID.
f. Record the value in the front_panel_id column. The front panel ID is
displayed on the front of each node. You can use this ID to determine the
physical location of the node that matches the node ID or node name that
you want to replace.
3. Perform these steps to record the WWNN or iSCSI name of the node that you
want to replace:
a. Issue this command from the CLI:
lsnode -delim : node_name or node_id
where node_name or node_id is the name or ID of the node for which you
want to determine the WWNN or iSCSI name.
b. Record the WWNN or iSCSI name of the node that you want to replace.
Also record the order of the Fibre Channel and Ethernet ports.
4. Issue this command from the CLI to power off the node:
stopsystem -node node_name

Important:
a. Record and mark the order of the Fibre Channel or Ethernet cables with
the node port number (port 1 to 4 for Fibre Channel, or port 1 to 2 for
Ethernet) before you remove the cables from the back of the node. The
Fibre Channel ports on the back of the node are numbered 1 to 4 from left
to right. You must reconnect the cables in the exact order on the
replacement node to avoid issues when the replacement node is added to
the system. If the cables are not connected in the same order, the port IDs
can change, which impacts the ability of the host to access volumes. See
the hardware documentation specific to your model to determine how the
ports are numbered.
b. Do not connect the replacement node to different ports on the switch or
director. The SAN Volume Controller can have 4 Gbps or 8 Gbps HBAs.
However, do not move them to faster switch or director ports at this time
to avoid issues when the replacement node is added to the system. This
task is separate and must be planned independently of replacing nodes in
a system.
5. Issue this CLI command to delete this node from the system and I/O group:
rmnode node_name or node_id
Where node_name or node_id is the name or ID of the node that you want to
delete. You can use the CLI to verify that the deletion process has completed.
6. Issue this CLI command to ensure that the node is no longer a member of the
system:
lsnode

A list of nodes is displayed. Wait until the removed node is not listed in the
command output.
7. Perform these steps to change the WWNN or iSCSI name of the node that you
just deleted from the system to FFFFF:
For SAN Volume Controller V6.1.0 or later:
a. Power on the node. With the Cluster panel displayed, press the up or
down button until the Actions option is displayed.
b. Press and release the select button.
c. Press the up or down button until Change WWNN? is displayed.
d. Press and release the select button to display the current WWNN.
e. Press and release the select button to switch into edit mode. The Edit
WWNN? panel is displayed.
f. Change the WWNN to FFFFF.
g. Press and release the select button to exit edit mode.
h. Press the right button to confirm your selection. The Confirm WWNN? panel
is displayed.
i. Press and release the select button to confirm.
8. Install the replacement node and the uninterruptible power supply in the rack
and connect the uninterruptible power supply cables. See the IBM System
Storage SAN Volume Controller Model 2145-XXX Hardware Installation Guide to
determine how to connect the node and the uninterruptible power supply.

Important: Do not connect the Fibre Channel or Ethernet cables during this
step.
9. If you are removing SSDs from an old node and inserting them into a new
node, see the IBM System Storage SAN Volume Controller Hardware Maintenance
Guide for specific instructions.
10. Power on the replacement node.
11. Record the WWNN of the replacement node. You can use this name if you
plan to reuse the node that you are replacing.
12. Perform these steps to change the WWNN of the replacement node to
match the WWNN that you recorded in step 3 on page 82:
For SAN Volume Controller V6.1.0 or later:
a. With the Cluster panel displayed, press the up or down button until the
Actions option is displayed.
b. Press and release the select button.
c. Press the up or down button until Change WWNN? is displayed.
d. Press and release the select button to display the current WWNN.
e. Press the select button to switch into edit mode. The Edit WWNN? panel is
displayed.
f. Change the WWNN to the numbers that you recorded in step 3 on page
82.
g. Press and release the select button to exit edit mode.
h. Press the right button to confirm your selection. The Confirm WWNN? panel
is displayed.
i. Press the select button to confirm.
Wait one minute. If Cluster: is displayed on the front panel, this indicates
that the node is ready to be added to the system. If Cluster: is not displayed,
see the troubleshooting information to determine how to address this problem
or contact the IBM Support Center before you continue with the next step.
13. Connect the Fibre Channel or Ethernet cables to the same port numbers that
you recorded for the original node in step 4 on page 82.
14. Issue this CLI command to verify that the last five characters of the WWNN
are correct:
lsnodecandidate

Important: If the WWNN is not what you recorded in step 3 on page 82, you
must repeat step 12 on page 83.
15. Issue this CLI command to add the node to the system and ensure that the
node has the same name as the original node and is in the same I/O group as
the original node. See the addnode CLI command documentation for more
information.
addnode -wwnodename WWNN -iogrp iogroupname/id
WWNN and iogroupname/id are the values that you recorded for the original
node.
SAN Volume Controller V5.1 and later automatically reassigns the node
the name that was used originally. For versions before V5.1, use the name
parameter with the svctask addnode command to assign a name. If the
original name of the node was automatically assigned by SAN Volume
Controller, it is not possible to reuse the same name. It was automatically
assigned if its name starts with node. In this case, either specify a different
name that does not start with node or do not use the name parameter so that
SAN Volume Controller automatically assigns a new name to the node.
If necessary, the new node is updated to the same SAN Volume Controller
software version as the system. This update can take up to 20 minutes.

Important:
a. Both nodes in the I/O group cache data; however, the cache sizes are
asymmetric. The replacement node is limited by the cache size of the
partner node in the I/O group. Therefore, it is possible that the
replacement node does not use the full cache size until you replace the
other node in the I/O group.
b. You do not have to reconfigure the host multipathing device drivers
because the replacement node uses the same WWNN and WWPN as the
previous node. The multipathing device drivers should detect the recovery
of paths that are available to the replacement node.
c. The host multipathing device drivers take approximately 30 minutes to
recover the paths. Do not upgrade the other node in the I/O group for
at least 30 minutes after you have successfully upgraded the first node
in the I/O group. If you have other nodes in different I/O groups to
upgrade, you can perform those upgrades while you wait.
16. Query paths to ensure that all paths have been recovered before proceeding to
the next step. If you are using the IBM System Storage Multipath Subsystem
Device Driver (SDD), the command to query paths is datapath query device.
Documentation that is provided with your multipathing device driver shows
how to query paths.
17. Repair the faulty node.
If you want to use the repaired node as a spare node, perform these steps.
For SAN Volume Controller V6.1.0 or later:
a. With the Cluster panel displayed, press the up or down button until the
Actions option is displayed.
b. Press and release the select button.
c. Press the up or down button until Change WWNN? is displayed.
d. Press and release the select button to display the current WWNN.
e. Press and release the select button to switch into edit mode. The Edit
WWNN? panel is displayed.
f. Change the WWNN to 00000.
g. Press and release the select button to exit edit mode.
h. Press the right button to confirm your selection. The Confirm WWNN? panel
is displayed.
i. Press and release the select button to confirm.
This node can now be used as a spare node.
18. Repeat steps 3 on page 82 to 17 for each node that you want to replace.
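
The WWNN check in step 14 and the command in step 15 can be sketched in Python as follows. This is only an illustration, not part of the product: the helper names wwnn_suffix_matches and build_addnode_command are hypothetical and the values are examples. Only the command form addnode -wwnodename WWNN -iogrp iogroupname/id comes from the procedure.

```python
def wwnn_suffix_matches(candidate_wwnn: str, recorded_suffix: str) -> bool:
    """Return True if the last five characters of the WWNN reported by
    lsnodecandidate equal the five characters recorded for the original node."""
    suffix = recorded_suffix.strip().upper()
    if len(suffix) != 5:
        raise ValueError("expected the five characters recorded for the original node")
    return candidate_wwnn.strip().upper().endswith(suffix)


def build_addnode_command(wwnn: str, iogrp: str) -> str:
    """Assemble the addnode command line in the form shown in step 15.

    No name parameter is added, so V5.1 and later reuse the original name."""
    return f"addnode -wwnodename {wwnn} -iogrp {iogrp}"


# Example values only; substitute the values recorded for the original node.
candidate = "50050768010050B1"
if wwnn_suffix_matches(candidate, "050B1"):
    print(build_addnode_command(candidate, "io_grp0"))
```

If the comparison fails, the procedure requires that you repeat the WWNN change before you add the node, so the sketch deliberately builds no command in that case.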

Chapter 5. Viewing the vital product data
Vital product data (VPD) is information that uniquely records each element in the
SAN Volume Controller. The data is updated automatically by the system when
the configuration is changed.

The VPD lists the following types of information:


• System-related values such as the software version, space in storage pools, and
space allocated to volumes.
• Node-related values that include the specific hardware that is installed in each
node. Examples include the FRU part number for the system board and the level
of BIOS firmware that is installed. The node VPD is held by the system, which
makes it possible to get most of the VPD for nodes that are powered off.

Using different sets of commands, you can view the system VPD and the node
VPD. You can also view the VPD through the management GUI.

Viewing the vital product data using the management GUI


You can view the vital product data for a node from the management GUI.

About this task

Perform the following steps to view the vital product data for a node:

Procedure
1. From Home, click System Status.
2. Select the node for which you want to display the details.
3. Click VPD to view the data.

Displaying the vital product data using the CLI


You can use the command-line interface (CLI) to display the SAN Volume
Controller system or node vital product data (VPD).

Issue the following CLI commands to display the VPD:


sainfo lsservicestatus
lsnodehw
lsnodevpd nodename
lssystem system_name
lssystemip
lsdrive

Note: For the SAN Volume Controller 2145-8A4, 2145-8G4, and 2145-8F4 nodes,
the lsnodevpd nodename command displays the device serial number of the Fibre
Channel card as “N/A.”

Displaying node properties using the CLI


You can use the command-line interface (CLI) to display node properties.

About this task

Perform the following steps to display the node properties:

Procedure
1. Issue the lsnode CLI command to display a concise list of nodes in the system.
The following is an example of the CLI command you can issue to list the
nodes in the system:
lsnode -delim :
The following is an example of the output that is displayed:
id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:config_node:UPS_unique_id:hardware:iscsi_name:iscsi_alias:panel_name:enclosure_id:canister_id:enclosure_serial_number
1:node1:UPS_Fake_SN:50050768010050B1:online:0:io_grp0:yes:10000000000050B1:8G4:iqn.1986-03.com.ibm:2145.cluster0.node1:000368:::

2. Issue the lsnode CLI command and specify the node ID or name of the node
that you want to receive detailed output.
The following is an example of the CLI command you can issue to list detailed
output for a node in the system:
lsnode -delim : group1node1
Where group1node1 is the name of the node for which you want to view
detailed output.
The following is an example of the output that is displayed:
id:1
name:group1node1
UPS_serial_number:10L3ASH
WWNN:500507680100002C
status:online
IO_group_id:0
IO_group_name:io_grp0
partner_node_id:2
partner_node_name:group1node2
config_node:yes
UPS_unique_id:202378101C0D18D8
port_id:500507680110002C
port_status:active
port_speed:2GB
port_id:500507680120002C
port_status:active
port_speed:2GB
port_id:500507680130002C
port_status:active
port_speed:2GB
port_id:500507680140003C
port_status:active
port_speed:2GB
hardware:8A4
iscsi_name:iqn.1986-03.com.ibm:2145.ndihill.node2
iscsi_alias:
failover_active:no
failover_name:node1
failover_iscsi_name:iqn.1986-03.com.ibm:2145.ndihill.node1
failover_iscsi_alias:
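
The detailed view repeats the port_id, port_status, and port_speed fields once for each port, and the iscsi_name value itself contains colons. A script that reads this output therefore cannot split every line on every colon. The following Python sketch shows one way to handle both points; the function name parse_lsnode_detail is hypothetical and the sample is an abbreviated copy of the output above.

```python
def parse_lsnode_detail(output: str) -> dict:
    """Parse the detailed `lsnode -delim :` view into a dictionary.

    Each line is split at the first colon only, because values such as
    iscsi_name contain colons themselves. The repeated port_id, port_status,
    and port_speed lines are collected into a list under "ports".
    """
    node = {"ports": []}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        if key == "port_id":
            # A port_id line starts the description of a new port.
            node["ports"].append({"port_id": value})
        elif key in ("port_status", "port_speed") and node["ports"]:
            node["ports"][-1][key] = value
        else:
            node[key] = value
    return node


sample = """\
id:1
name:group1node1
status:online
port_id:500507680110002C
port_status:active
port_speed:2GB
port_id:500507680120002C
port_status:active
port_speed:2GB
iscsi_name:iqn.1986-03.com.ibm:2145.ndihill.node2
"""

node = parse_lsnode_detail(sample)
print(node["name"], node["status"], len(node["ports"]))
print(node["iscsi_name"])
```

The same caution applies to the concise view: with a colon delimiter, the colons inside iscsi_name are indistinguishable from field separators, so a delimiter that cannot occur in the data is easier to parse.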

Displaying clustered system properties using the CLI


You can use the command-line interface (CLI) to display the properties for a
clustered system.

About this task

Perform the following step to display clustered system properties:

Procedure

Issue the lssystem command to display the properties for a clustered system.
The following is an example of the command you can issue:
lssystem -delim : build1

where build1 is the name of the clustered system.

Results
id:000002007A00A0FE
name:build1
location:local
partnership:
bandwidth:
total_mdisk_capacity:90.7GB
space_in_mdisk_grps:90.7GB
space_allocated_to_vdisks:14.99GB
total_free_space:75.7GB
statistics_status:on
statistics_frequency:15
required_memory:0
cluster_locale:en_US
time_zone:522 UTC
code_level:6.1.0.0 (build 47.3.1009031000)
FC_port_speed:2Gb
console_IP:9.71.46.186:443
id_alias:000002007A00A0FE
gm_link_tolerance:300
gm_inter_cluster_delay_simulation:0
gm_intra_cluster_delay_simulation:0
email_reply:
email_contact:
email_contact_primary:
email_contact_alternate:
email_contact_location:
email_state:stopped
inventory_mail_interval:0
total_vdiskcopy_capacity:15.71GB
total_used_capacity:13.78GB
total_overallocation:17
total_vdisk_capacity:11.72GB
cluster_ntp_IP_address:
cluster_isns_IP_address:
iscsi_auth_method:none
iscsi_chap_secret:
auth_service_configured:no
auth_service_enabled:no
auth_service_url:
auth_service_user_name:
auth_service_pwd_set:no
auth_service_cert_set:no
relationship_bandwidth_limit:25
gm_max_host_delay:5
tier:generic_ssd
tier_capacity:0.00MB
tier_free_capacity:0.00MB
tier:generic_hdd
tier_capacity:90.67GB
tier_free_capacity:75.34GB
email_contact2:
email_contact2_primary:
email_contact2_alternate:
total_allocated_extent_capacity:16.12GB
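
The tier, tier_capacity, and tier_free_capacity fields repeat once per tier, and capacities are reported with a unit suffix. The following Python sketch groups the tier fields and converts capacities to a common unit. It rests on assumptions that this guide does not state: that units are binary multiples (1 GB = 1024 MB) and that TB can also appear. The function names are hypothetical.

```python
import re

# Assumption: binary multiples. The guide does not define the units.
_UNIT_IN_MB = {"MB": 1.0, "GB": 1024.0, "TB": 1024.0 ** 2}


def capacity_in_mb(text: str) -> float:
    """Convert a capacity string such as '90.67GB' into megabytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(MB|GB|TB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized capacity: {text!r}")
    return float(match.group(1)) * _UNIT_IN_MB[match.group(2)]


def parse_tiers(output: str) -> dict:
    """Group the repeated tier fields of lssystem output by tier name."""
    tiers = {}
    current = None
    for line in output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "tier":
            current = tiers.setdefault(value, {})
        elif key in ("tier_capacity", "tier_free_capacity") and current is not None:
            current[key] = capacity_in_mb(value)
    return tiers


sample = """\
tier:generic_ssd
tier_capacity:0.00MB
tier_free_capacity:0.00MB
tier:generic_hdd
tier_capacity:90.67GB
tier_free_capacity:75.34GB
"""

for name, fields in parse_tiers(sample).items():
    total = fields["tier_capacity"]
    free = fields["tier_free_capacity"]
    share = f"{100 * free / total:.1f}% free" if total else "no capacity"
    print(name, share)
```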

Fields for the node VPD
The node vital product data (VPD) provides information for items such as the
system board, processor, fans, memory module, adapter, devices, software, front
panel assembly, the uninterruptible power supply, SAS solid-state drive (SSD) and
SAS host bus adapter (HBA).

Table 27 shows the fields you see for the system board.
Table 27. Fields for the system board
Item Field name
System board Part number
System serial number
Number of processors
Number of memory slots
Number of fans
Number of Fibre Channel adapters
Number of SCSI, IDE, SATA, or SAS devices (Note: The service controller is a device.)
Number of power supplies
Number of high-speed SAS adapters
BIOS manufacturer
BIOS version
BIOS release date
System manufacturer
System product
Planar manufacturer
Power supply part number
CMOS battery part number
Power cable assembly part number
Service processor firmware
SAS controller part number

Table 28 shows the fields you see for each processor that is installed.
Table 28. Fields for the processors
Item Field name
Processor Part number
Processor location
Manufacturer
Version
Speed
Status
Processor serial number

Table 29 shows the fields that you see for each fan that is installed.
Table 29. Fields for the fans
Item Field name
Fan Part number
Location

Table 30 shows the fields that are repeated for each installed memory module.
Table 30. Fields that are repeated for each installed memory module
Item Field name
Memory module Part number
Device location
Bank location
Size (MB)
Manufacturer (if available)
Serial number (if available)

Table 31 shows the fields that are repeated for each installed adapter card.
Table 31. Fields that are repeated for each adapter that is installed
Item Field name
Adapter Adapter type
Part number
Port numbers
Location
Device serial number
Manufacturer
Device
Card revision
Chip revision

Table 32 on page 92 shows the fields that are repeated for each device that is
installed.

Table 32. Fields that are repeated for each SCSI, IDE, SATA, and SAS device that is
installed
Item Field name
Device Part number
Bus
Device
Model
Revision
Serial number
Approximate capacity
Hardware revision
Manufacturer

Table 33 shows the fields that are specific to the node software.
Table 33. Fields that are specific to the node software
Item Field name
Software Code level
Node name
Worldwide node name
ID
Unique string that is used in dump file
names for this node

Table 34 shows the fields that are provided for the front panel assembly.
Table 34. Fields that are provided for the front panel assembly
Item Field name
Front panel Part number
Front panel ID
Front panel locale

Table 35 shows the fields that are provided for the Ethernet port.
Table 35. Fields that are provided for the Ethernet port
Item Field name
Ethernet port Port number
Ethernet port status
MAC address
Supported speeds

Table 36 on page 93 shows the fields that are provided for the power supplies in
the node.

Table 36. Fields that are provided for the power supplies in the node
Item Field name
Power supplies Part number
Location

Table 37 shows the fields that are provided for the uninterruptible power supply
assembly that is powering the node.
Table 37. Fields that are provided for the uninterruptible power supply assembly that is
powering the node
Item Field name
Uninterruptible power supply Electronics assembly part number
Battery part number
Frame assembly part number
Input power cable part number
UPS serial number
UPS type
UPS internal part number
UPS unique ID
UPS main firmware
UPS communications firmware

Table 38 shows the fields that are provided for the SAS host bus adapter (HBA).
Table 38. Fields that are provided for the SAS host bus adapter (HBA)
Item Field name
SAS HBA Part number
Port numbers
Device serial number
Manufacturer
Device
Card revision
Chip revision

Table 39 on page 94 shows the fields that are provided for the SAS solid-state drive
(SSD).

Table 39. Fields that are provided for the SAS solid-state drive (SSD)
Item Field name
SAS SSD Part number
Manufacturer
Device serial number
Model
Type
UID
Firmware
Slot
FPGA firmware
Speed
Capacity
Expansion tray
Connection type

Table 40 shows the fields that are provided for the small form factor pluggable
(SFP) transceiver.
Table 40. Fields that are provided for the small form factor pluggable (SFP) transceiver
Item Field name
Small form factor pluggable (SFP) transceiver Part number
Manufacturer
Device
Serial number
Supported speeds
Connector type
Transmitter type
Wavelength
Maximum distance by cable type
Hardware revision
Port number
Worldwide port name

Fields for the system VPD


The system vital product data (VPD) provides various information about the
system, including its ID, name, location, IP address, email contact, code level, and
total free space.

Table 41 on page 95 shows the fields that are provided for the system properties as
shown by the management GUI.

Table 41. Fields that are provided for the system properties
Item Field name
General ID
Note: This is the unique identifier for the system.
Name
Location
Time Zone
Required Memory
Licensed Code Version
Channel Port Speed
Note: This field represents the speed at which non-negotiating
nodes in the system run, for example, the
SAN Volume Controller 2145-8F2. All other models that are
capable of speed negotiation are not affected by the speed
value that is indicated in this field.
IP Addresses (1) Ethernet Port 1 (attributes for both IPv4 and IPv6)
• IP Address
• Service IP Address
• Subnet Mask
• Prefix
• Default Gateway
Ethernet Port 2 (attributes for both IPv4 and IPv6)
• IP Address
• Service IP Address
• Subnet Mask
• Prefix
• Default Gateway
Remote Authentication Remote Authentication
Web Address
User Name
Password
SSL Certificate
Space Total MDisk Capacity
Space in Storage Pools
Space Allocated to Volumes
Total Free Space
Total Used Capacity
Total Allocation
Total Volume Copy Capacity
Total Volume Capacity
Statistics Statistics Status
Statistics Frequency

Metro and Global Mirror Link Tolerance
Intersystem Delay Simulation
Intrasystem Delay Simulation
Partnership
Bandwidth
Email SMTP Email Server
Email Server Port
Reply Email Address
Contact Person Name
Primary Contact Phone Number
Alternate Contact Phone Number
Physical Location of the System Reporting Error
Email Status
Inventory Email Interval
iSCSI iSNS Server Address
Supported Authentication Methods
CHAP Secret
(1) You can also use the lssystemip CLI command to view this data.

Chapter 6. Using the front panel of the SAN Volume Controller
The front panel of the SAN Volume Controller has a display, various LEDs,
navigation buttons, and a select button that are used when servicing your SAN
Volume Controller node.

Figure 54 shows where the front-panel display 1 is located on the SAN Volume
Controller node.

Figure 54. SAN Volume Controller front-panel assembly

Boot progress indicator


Boot progress is displayed on the front panel of the SAN Volume Controller.

The Boot progress display on the front panel shows that the node is starting.

Booting 130

Figure 55. Example of a boot progress display

During the boot operation, boot progress codes are displayed and the progress bar
moves to the right while the boot operation proceeds.

Boot failed
If the boot operation fails, boot code 120 is displayed.

Failed 120

See the "Error code reference" topic for a description of the failure and the
steps that you must perform to correct it.

Charging
The front panel indicates that the uninterruptible power supply battery is charging.

Charging

A node will not start and join a system if there is insufficient power in the
uninterruptible power supply battery to cope with a power failure. Charging is
displayed until it is safe to start the node. This might take up to two hours.

Error codes
Error codes are displayed on the front panel display.

Figure 56 and Figure 57 show how error codes are displayed on the front panel.

Figure 56. Example of an error code for a clustered system


Figure 57. Example of a node error code

For a full description of each error code that is displayed on the front panel,
and of the actions that you must perform to correct the failure, see the error
code topics.

Hardware boot
The hardware boot display shows system data when power is first applied to the
node as the node searches for a disk drive to boot.

If this display remains active for longer than 3 minutes, there might be a problem.
The cause might be a hardware failure or the software on the hard disk drive
might be missing or damaged.

Node rescue request


If software is lost, you can use the node rescue process to copy all software from
another node.

The node-rescue-request display, which is shown in Figure 58, indicates that a
request has been made to replace the software on this node. The SAN Volume
Controller software is preinstalled on all SAN Volume Controller nodes. This
software includes the operating system, the application software, and the SAN
Volume Controller publications. It is normally not necessary to replace the software
on a node, but if the software is lost for some reason (for example, the hard disk
drive in the node fails), it is possible to copy all the software from another node
that is connected to the same Fibre Channel fabric. This process is known as node
rescue.

Figure 58. Node rescue display

Power failure
The SAN Volume Controller node uses battery power from the uninterruptible
power supply to shut itself down.

The Power failure display shows that the SAN Volume Controller is running on
battery power because main power has been lost. All I/O operations have stopped.
The node is saving system metadata and node cache data to the internal disk
drive. When the progress bar reaches zero, the node powers off.

Note: When input power is restored to the uninterruptible power supply, the SAN
Volume Controller turns on without the front panel power button being pressed.

Powering off
The progress bar on the display shows the progress of the power-off operation.

Powering Off is displayed after the power button has been pressed and while the
node is powering off. Powering off might take several minutes.

The progress bar moves to the left when the power is removed.

Recovering
The front panel indicates that the uninterruptible power supply battery is not fully
charged.

Recovering

When a node is active in a system but the uninterruptible power supply battery is
not fully charged, Recovering is displayed. If the power fails while this message is
displayed, the node does not restart until the uninterruptible power supply has
charged to a level where it can sustain a second power failure.

Restarting
The front panel indicates when the software on a node is restarting.

Restarting

The software is restarting for one of the following reasons:


• An internal error was detected.
• The power button was pressed again while the node was powering off.

If you press the power button while powering off, the panel display changes to
indicate that the button press was detected; however, the power off continues until
the node finishes saving its data. After the data is saved, the node powers off and
then automatically restarts. The progress bar moves to the right while the node is
restarting.

Shutting down
The front-panel indicator tracks shutdown operations.

The Shutting Down display is shown when you issue a shutdown command to a
SAN Volume Controller clustered system or a SAN Volume Controller node. The
progress bar continues to move to the left until the node turns off.

When the shutdown operation is complete, the node turns off. When you power
off a node that is connected to a 2145 UPS-1U, only the node shuts down; the 2145
UPS-1U does not shut down.

Shutting Down

Validate WWNN? option
The front panel prompts you to validate the WWNN when the worldwide node
name (WWNN) that is stored in the service controller (the panel WWNN) does not
match the WWNN that is backed up on the SAN Volume Controller disk (the disk
WWNN).

Typically, this panel is displayed when the service controller has been replaced.
The SAN Volume Controller uses the WWNN that is stored on the service
controller. Usually, when the service controller is replaced, you modify the WWNN
that is stored on it to match the WWNN on the service controller that it replaced.
By doing this, the node maintains its WWNN address, and you do not need to
modify the SAN zoning or host configurations. The WWNN that is stored on disk
is the same as the one that was stored on the old service controller.

After the front panel is in this mode, it does not revert to its normal
displays, such as node or cluster (system) options or operational status, until the
WWNN is validated. Navigate the Validate WWNN option (shown in Figure 59) to
choose which WWNN you want to use.

[Navigation diagram: from Validate WWNN?, select leads to Disk WWNN: and Panel
WWNN:. From Use Disk WWNN? or Use Panel WWNN?, select leads to Node WWNN:.]

Figure 59. Validate WWNN? navigation

To choose which stored WWNN you want this node to use, perform the
following steps:
1. From the Validate WWNN? panel, press and release the select button. The Disk
WWNN: panel is displayed and shows the last five digits of the WWNN that is
stored on the disk.
2. To view the WWNN that is stored on the service controller, press and release
the right button. The Panel WWNN: panel is displayed and shows the last five
digits of the WWNN that is stored on the service controller.
3. Determine which WWNN you want to use.
a. To use the WWNN that is stored on the disk, perform the following steps:
1) From the Disk WWNN: panel, press and release the down button. The
Use Disk WWNN? panel is displayed.
2) Press and release the select button.
b. To use the WWNN that is stored on the service controller, perform the
following steps:
1) From the Panel WWNN: panel, press and release the down button. The
Use Panel WWNN? panel is displayed.
2) Press and release the select button.

The node is now using the selected WWNN. The Node WWNN: panel is displayed
and shows the last five digits of the WWNN that you selected.

If neither the WWNN that is stored on the service controller nor the WWNN that
is stored on the disk is suitable, you must wait until the node restarts before
you can change it. After the node restarts, select Change WWNN to change the
WWNN to the value that you want.
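
The Validate WWNN? navigation can be modeled as a small table of panel transitions. The following Python sketch is a reading aid built only from the steps in this topic. It is not how the front panel is implemented, and button presses that the steps do not describe are assumed to leave the panel unchanged.

```python
# Transitions taken from the steps in this topic:
# (current panel, button) -> next panel.
TRANSITIONS = {
    ("Validate WWNN?", "select"): "Disk WWNN:",
    ("Disk WWNN:", "right"): "Panel WWNN:",
    ("Disk WWNN:", "down"): "Use Disk WWNN?",
    ("Panel WWNN:", "down"): "Use Panel WWNN?",
    ("Use Disk WWNN?", "select"): "Node WWNN:",
    ("Use Panel WWNN?", "select"): "Node WWNN:",
}


def walk(buttons, start="Validate WWNN?"):
    """Follow a sequence of button presses and return the panels visited.

    Presses that the documented steps do not cover leave the panel unchanged
    in this model; the real front panel might behave differently."""
    panel = start
    visited = [panel]
    for button in buttons:
        panel = TRANSITIONS.get((panel, button), panel)
        visited.append(panel)
    return visited


# Choosing the WWNN that is stored on the service controller:
print(" -> ".join(walk(["select", "right", "down", "select"])))
```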

SAN Volume Controller menu options


During normal operations, menu options are available on the front panel display of
the SAN Volume Controller node.

Menu options enable you to review the operational status of the clustered system,
node, and external interfaces. They also provide access to the tools and operations
that you use to service the node.

Figure 60 on page 103 shows the sequence of the menu options. Only one option at
a time is displayed on the front panel display. For some options, additional data is
displayed on line 2. The first option that is displayed is the Cluster: option.

[Menu diagram not reproduced. Up and down (U/D) move between main options;
left and right (L/R) move between secondary options; S is the select button.]

Main option        Secondary options
Cluster:           Status:; Port-1 Address: and Port-2 Address: (select shows
                   IPv4 and IPv6 addresses if available); for each port:
                   IPv4 Address, IPv4 Subnet, IPv4 Gateway, IPv6 Address,
                   IPv6 Prefix, IPv6 Gateway
Node               Status:; Node WWNN:; Service Address (IPv4 Address, IPv4
                   Subnet, IPv4 Gateway, IPv6 Address, IPv6 Prefix, IPv6
                   Gateway)
Version:           Build:; Cluster Build:
Ethernet           Ethernet Port-1: through Port-4:; Speed-1: through Speed-4:;
                   MAC Address-1: through MAC Address-4:
FC Port-1 Status   FC Port-1 Speed; FC Port-2, FC Port-3, and FC Port-4 Status
                   and Speed
Actions            Select takes you to the Actions menu
Language?          English?; Japanese? (select activates language)

Figure 60. SAN Volume Controller options on the front-panel display

Use the left and right buttons to navigate through the secondary fields that are
associated with some of the main fields.

Note: Messages might not display fully on the screen. You might see a right angle
bracket (>) on the right side of the display screen. If you see a right angle bracket,
press the right button to scroll through the display. When there is no more text to
display, you can move to the next item in the menu by pressing the right button.

Similarly, you might see a left angle bracket (<) on the left side of the display
screen. If you see a left angle bracket, press the left button to scroll through the
display. When there is no more text to display, you can move to the previous item
in the menu by pressing the left button.

The following main options are available:


• Cluster
• Node
• Version
• Ethernet
• FC Port 1 Status
• Actions
• Language

Cluster (system) options


The main cluster (system) option from the menu can display the cluster name or
the field can be blank.

The main cluster (system) option displays the system name that the user has
assigned. If a clustered system is in the process of being created on the node, and
no system name has been assigned, a temporary name that is based on the IP
address of the system is displayed. If this node is not assigned to a system, the
field is blank.

Status option
Status is indicated on the front panel.

This field is blank if the node is not a member of a clustered system. If this node is
a member of a clustered system, the field indicates the operational status of the
system, as follows:
Active
Indicates that this node is an active member of the system.
Inactive
Indicates that the node is a member of a system, but is not currently
operational. It is not operational because the other nodes that are in the
system cannot be accessed or because this node was excluded from the system.
Degraded
Indicates that the system is operational, but one or more of the member nodes
are missing or have failed.

IPv4 Address option


A clustered system must have either an IPv4 address or an IPv6 address, or both,
assigned to Ethernet port 1. You can also assign an IPv4 address or an IPv6
address, or both, to Ethernet port 2. You can use any of the addresses to access the
system from the command-line tools or the management GUI.

These fields contain the IPv4 addresses of the system. If this node is not a member
of a system or if the IPv4 address has not been assigned, these fields are blank.

IPv4 Subnet options:

The IPv4 subnet mask addresses are set when the IPv4 addresses are assigned to
the system.

The IPv4 subnet options display the subnet mask addresses when the system has
IPv4 addresses. If the node is not a member of a system or if the IPv4 addresses
have not been assigned, this field is blank.

IPv4 Gateway options:

The IPv4 gateway addresses are set when the system is created.

The IPv4 gateway options display the gateway addresses for the system. If the
node is not a member of a system, or if the IPv4 addresses have not been assigned,
this field is blank.

IPv6 Address options


A clustered system must have either an IPv4 address or an IPv6 address, or both,
assigned to Ethernet port 1. You can also assign an IPv4 address or an IPv6
address, or both, to Ethernet port 2. You can use any of the addresses to access the
system from the command-line tools or the management GUI.

These fields contain the IPv6 addresses of the system. If the node is not a member
of a system, or if the IPv6 address has not been assigned, these fields are blank.

IPv6 Prefix option:

The IPv6 prefix is set when a system is created.

The IPv6 prefix option displays the network prefix of the system and the service
IPv6 addresses. The prefix has a value of 0 - 127. If the node is not a member of a
system, or if the IPv6 addresses have not been assigned, a blank line displays.

IPv6 Gateway option:

The IPv6 gateway addresses are set when the system is created.

This option displays the IPv6 gateway addresses for the system. If the node is not
a member of a system, or if the IPv6 addresses have not been assigned, a blank
line displays.

Displaying an IPv6 address


After you have set the IPv6 address, you can display the IPv6 addresses and the
IPv6 gateway addresses.

The IPv6 addresses and the IPv6 gateway addresses consist of eight (4-digit)
hexadecimal values that are shown across four panels, as shown in Figure 61. Each
panel displays two 4-digit values that are separated by a colon, the address field
position (such as 2/4) within the total address, and scroll indicators. Move between
the address panels by using the left button or right button.

Figure 61. Viewing the IPv6 address on the front-panel display
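
The four-panel layout can be illustrated with a short Python sketch that splits an address into the pairs of 4-digit values described above. The function name ipv6_panels and the tuple format are hypothetical; the exact characters that the front panel shows, including the scroll indicators, are not modeled.

```python
import ipaddress


def ipv6_panels(address: str):
    """Split an IPv6 address into four (text, position) pairs.

    Each pair holds two 4-digit hexadecimal groups joined by a colon and the
    position of that pair within the address, such as "2/4"."""
    groups = ipaddress.IPv6Address(address).exploded.split(":")
    panels = []
    for index in range(0, 8, 2):
        text = f"{groups[index]}:{groups[index + 1]}"
        panels.append((text, f"{index // 2 + 1}/4"))
    return panels


# 2001:db8::1 is a documentation address, used here only as an example.
for text, position in ipv6_panels("2001:db8::1"):
    print(text, position)
```

The sketch expands a compressed address first, so every panel always carries two full 4-digit values.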

Node options
The main node option displays the identification number or the name of the node
if the user has assigned a name.

Status option
The node status is indicated on the front panel. The status can be one of the
following states:
Active The node is operational, assigned to a system, and ready to perform I/O.
Service
There is an error that is preventing the node from operating as part of a
system. It is safe to shut down the node in this state.
Candidate
The node is not assigned to a system and is not in service. It is safe to shut
down the node in this state.
Starting
The node is part of a system and is attempting to join the system. It cannot
perform I/O.

Node WWNN option


The Node WWNN (worldwide node name) option displays the last five
hexadecimal digits of the WWNN that is being used by the node. Only the last five
digits of a WWNN vary on a node. The first 11 digits are always 50050768010.
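
Because the first 11 digits are fixed, the full 16-digit WWNN can be reconstructed from the five digits on the panel. The following Python sketch shows the arithmetic; the function name full_wwnn is hypothetical.

```python
WWNN_PREFIX = "50050768010"  # first 11 digits, as stated in this topic


def full_wwnn(panel_digits: str) -> str:
    """Return the 16-digit WWNN for the five digits shown on the front panel."""
    digits = panel_digits.strip().upper()
    if len(digits) != 5 or any(c not in "0123456789ABCDEF" for c in digits):
        raise ValueError("expected exactly five hexadecimal digits")
    return WWNN_PREFIX + digits


print(full_wwnn("050b1"))
```

The result can be compared with the WWNN field that the lsnode command reports for the node.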

Service Address option


Pressing select on the Service Address panel displays the IP address that is
configured for access to the service assistant and the service CLI.

Version options
The version option displays the version of the SAN Volume Controller software
that is active on the node. The version consists of four fields that are separated by
full stops. The fields are the version, release, modification, and fix level; for
example, 6.1.0.0.
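
The four-field format makes it straightforward to compare levels in a script. The following Python sketch parses a level into its fields; the names CodeLevel and parse_code_level are hypothetical. It also assumes that a build suffix, as in the code_level field of the lssystem output, follows the level after a space.

```python
from typing import NamedTuple


class CodeLevel(NamedTuple):
    version: int
    release: int
    modification: int
    fix: int


def parse_code_level(text: str) -> CodeLevel:
    """Parse '6.1.0.0' or '6.1.0.0 (build 47.3.1009031000)' into four integers."""
    level = text.split()[0]  # drop a trailing "(build ...)" suffix if present
    parts = level.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four fields, got {text!r}")
    return CodeLevel(*(int(part) for part in parts))


node_level = parse_code_level("6.1.0.0 (build 47.3.1009031000)")
other_level = parse_code_level("6.4.0.0")
print(node_level < other_level)
```

Named tuples compare field by field from left to right, so an older level sorts before a newer one without extra code.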

Build option

The Build: panel displays the level of the SAN Volume Controller software that is
currently active on this node.

Cluster Build option

The Cluster Build: panel displays the level of the software that is currently active
on the system that this node is operating in.

Ethernet options
The Ethernet options display the operational state of the Ethernet ports, the speed
and duplex information, and their media access control (MAC) addresses.

The Ethernet panel shows one of the following states:


Config - Yes
This node is the configuration node.
Config - No
This node is not the configuration node.

No Cluster
This node is not a member of a system.

Press the right button to view the details of the individual Ethernet ports.

Ethernet Port options

The Ethernet port options Port-1 through Port-4 display the state of the links and
indicate whether there is an active link with an Ethernet network.
Link Online
An Ethernet cable is attached to this port.
Link Offline
No Ethernet cable is attached to this port or the link has failed.

Speed options

The speed options Speed-1 through Speed-4 display the speed and duplex
information for the Ethernet port. The speed information can be one of the
following values:
10 The speed is 10 Mbps.
100 The speed is 100 Mbps.
1 The speed is 1 Gbps.
10 The speed is 10 Gbps.

The duplex information can be one of the following values:


Full Data can be sent and received at the same time.
Half Data can be sent and received in one direction at a time.

MAC Address options

The MAC address options MAC Address-1 through MAC Address-4 display the
media access control (MAC) address of the Ethernet port.

Fibre Channel port options


The Fibre Channel port-1 through port-4 options display the operational status of
the Fibre Channel ports.
Active The port is operational and can access the Fibre Channel fabric.
Inactive
The port is operational but cannot access the Fibre Channel fabric. One of
the following conditions caused this result:
• The Fibre Channel cable has failed.
• The Fibre Channel cable is not installed.
• The device that is at the other end of the cable has failed.
Failed The port is not operational because of a hardware failure.
Not installed
This port is not installed.

For the SAN Volume Controller 2145-8F2, you can use the Set FC Speed action
option to change the Fibre Channel port speed of a node that is not participating in
a system.

Actions options
During normal operations, action menu options are available on the front panel
display of the node. Only use the front panel actions when directed to do so by a
service procedure. Inappropriate use can lead to loss of access to data or loss of
data.

Figure 62 on page 110, Figure 63 on page 111, and Figure 64 on page 112 show the
sequence of the actions options. In the figures, bold lines indicate that the select
button was pressed. The lighter lines indicate the navigational path (up or down
and left or right). The circled X indicates that if the select button is pressed, an
action occurs using the data entered.

Only one action menu option at a time is displayed on the front-panel display.

Note: Options only display in the menu if they are valid for the current state of
the node. See Table 42 for a list of when the options are valid.

The following options are available from the Actions menu:


Table 42. When options are available

Front panel option   Option name                          When option is available for
                                                          the current state of the node
------------------   ----------------------------------   ------------------------------------
Cluster IPv4         Create a clustered system with an    Candidate state
                     IPv4 management address
Cluster IPv6         Create a clustered system with an    Candidate state
                     IPv6 management address
Service IPv4         Set the IPv4 service address of      All states
                     the node
Service IPv6         Set the IPv6 service address of      All states
                     the node
Service DHCPv4       Set a DHCP IPv4 service address      All states
Service DHCPv6       Set a DHCP IPv6 service address      All states
Change WWNN          Change the WWNN of the node          Candidate or service state
Enter Service        Enter service state                  Whenever error 690 is not showing
Exit Service         Leave service state if possible      Whenever error 690 is showing
Recover Cluster      Recover system configuration         Candidate or service state
Remove Cluster       Remove system state                  Whenever the node has a clustered
                                                          system state
Paced Upgrade        Perform user-paced CCU               Node in service without clustered
                                                          system state
Set FC Speed         Set Fibre Channel speed              Candidate or service state on a
                                                          SAN Volume Controller 2145-8F2
Reset Password       Reset password                       Not active or if the resetpassword
                                                          command is enabled
Rescue Node          Rescue node software                 All states
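
The availability rules in Table 42 can be read as a simple lookup. The following minimal sketch encodes a literal reading of the table; the function name, the parameter names, and the state strings "active", "candidate", and "service" are assumptions made for illustration, and the real menu logic on the node might differ.

```python
ALWAYS = ["Service IPv4", "Service IPv6", "Service DHCPv4",
          "Service DHCPv6", "Rescue Node"]


def available_actions(state, error_690_showing, has_system_state,
                      is_2145_8f2=False, resetpassword_enabled=False):
    """Return the Actions menu options that Table 42 lists for a node.

    state is "active", "candidate", or "service" (hypothetical labels).
    """
    options = list(ALWAYS)  # options that the table marks "All states"
    if state == "candidate":
        options += ["Cluster IPv4", "Cluster IPv6"]
    if state in ("candidate", "service"):
        options += ["Change WWNN", "Recover Cluster"]
        if is_2145_8f2:
            options.append("Set FC Speed")
    # Enter Service and Exit Service are mutually exclusive on error 690.
    options.append("Exit Service" if error_690_showing else "Enter Service")
    if has_system_state:
        options.append("Remove Cluster")
    if state == "service" and not has_system_state:
        options.append("Paced Upgrade")
    if state != "active" or resetpassword_enabled:
        options.append("Reset Password")
    return options


print(available_actions("candidate", error_690_showing=False,
                        has_system_state=False))
```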

Figure 62. Upper options of the actions menu on the front panel
[Figure: menu sequence for the Cluster IPv4?, Cluster IPv6?, Service IPv4?, Service IPv6?, Service DHCPv4?, Service DHCPv6?, and Change WWNN? options, each followed by its parameter panels and its Confirm and Cancel? panels.]

Figure 63. Middle options of the actions menu on the front panel
[Figure: menu sequence for the Enter Service?, Exit Service?, Recover Cluster?, Remove Cluster?, and Paced Upgrade? options, each followed by its Confirm and Cancel? panels.]

Figure 64. Lower options of the actions menu on the front panel
[Figure: menu sequence for the Set FC Speed?, Reset Password?, and Rescue Node? options, each followed by its Confirm and Cancel? panels, and the Exit Actions? option.]

To perform an action, navigate to the Actions option and press the select button.
The action is initiated. Available parameters for the action are displayed. Use the
left or right buttons to move between the parameters. The current setting is
displayed on the second display line.

To set or change a parameter value, press the select button when the parameter is
displayed. The value changes to edit mode. Use the left or right buttons to move
between subfields, and use the up or down buttons to change the value of a
subfield. When the value is correct, press select to leave edit mode.

Each action also has a Confirm? and a Cancel? panel. Pressing select on the
Confirm? panel initiates the action using the current parameter value setting.
Pressing select on the Cancel? panel returns to the Action option panel without
changing the node.

Note: Messages might not display fully on the screen. You might see a right angle
bracket (>) on the right side of the display screen. If you see a right angle bracket,
press the right button to scroll through the display. When there is no more text to
display, you can move to the next item in the menu by pressing the right button.

Similarly, you might see a left angle bracket (<) on the left side of the display
screen. If you see a left angle bracket, press the left button to scroll through the
display. When there is no more text to display, you can move to the previous item
in the menu by pressing the left button.
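
The scrolling behavior described in the two notes above can be pictured with a small sketch. The 16-character display width, the function name, and the sample message are assumptions for illustration only; the sketch marks hidden text with an angle bracket in the outermost column.

```python
def render(message: str, offset: int, width: int = 16) -> str:
    """Return the visible part of a message, with < or > marking hidden text."""
    window = message[offset:offset + width]
    if offset > 0:
        window = "<" + window[1:]   # more text to the left
    if offset + width < len(message):
        window = window[:-1] + ">"  # more text to the right
    return window


message = "Service Address: 192.168.1.10"  # hypothetical panel text
print(render(message, 0))
print(render(message, 13))
```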



Cluster IPv4 or Cluster IPv6 options
You can create a clustered system from the Cluster IPv4 or Cluster IPv6 action
options.

From the front panel, when you create a clustered system, you can set either the
IPv4 or the IPv6 address for Ethernet port 1. If required, you can add more
management IP addresses by using the management GUI or the CLI.

Press the left and right buttons to navigate through the parameters that are
associated with the Cluster option. When you have navigated to the desired
parameter, press the select button.

The parameters that are available include:


• IPv4 Address
• IPv4 Subnet
• IPv4 Gateway
• IPv4 Confirm Create?
• IPv6 Address
• IPv6 Prefix
• IPv6 Gateway
• IPv6 Confirm Create?

If you are creating the clustered system with an IPv4 address, complete the
following steps:
1. Press and release the up or down button until Actions? is displayed. Press and
release the select button.
2. Press and release the up or down button until Cluster IPv4? is displayed.
Press and release the select button.
3. Edit the IPv4 address, the IPv4 subnet, and the IPv4 gateway.
4. Press and release the left or right button until IPv4 Confirm Create? is
displayed.
5. Press and release the select button to confirm.

If you are creating the clustered system with an IPv6 address, complete the
following steps:
1. Press and release the up or down button until Actions? is displayed. Press and
release the select button.
2. Press and release the up or down button until Cluster IPv6? is displayed. Press
and release the select button.
3. Edit the IPv6 address, the IPv6 prefix, and the IPv6 gateway.
4. Press and release the left or right button until IPv6 Confirm Create? is
displayed.
5. Press and release the select button to confirm.

IPv4 Address option

Using this option, you can set the IPv4 address for Ethernet port 1 of the
clustered system that you are going to create. The clustered system can have either
an IPv4 or an IPv6 address, or both at the same time. You can set either the IPv4
or IPv6 management address for Ethernet port 1 from the front panel when you
are creating the system. If required, you can add more management IP addresses
from the CLI.

Attention: When you set the IPv4 address, ensure that you type the correct
address. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.

Perform the following steps to set the IPv4 address:


1. Navigate to the IPv4 Address panel.
2. Press the select button. The first IP address number is highlighted.
3. Press the up button if you want to increase the value that is highlighted; press
the down button if you want to decrease that value. If you want to quickly
increase the highlighted value, hold the up button. If you want to quickly
decrease the highlighted value, hold the down button.

Note: If you want to disable the fast increase or decrease function, press and
hold the down button, press and release the select button, and then release the
down button. The disabling of the fast increase or decrease function lasts until
the creation is completed or until the feature is again enabled. If the up button
or down button is pressed and held while the function is disabled, the value
increases or decreases once every two seconds. To again enable the fast increase
or decrease function, press and hold the up button, press and release the select
button, and then release the up button.
4. Press the right button or left button to move to the number field that you want
to set.
5. Repeat steps 3 and 4 for each number field that you want to set.
6. Press the select button to confirm the settings. Otherwise, press the right button
to display the next secondary option or press the left button to display the
previous options.


IPv4 Subnet option

Using this option, you can set the IPv4 subnet mask for Ethernet port 1.

Attention: When you set the IPv4 subnet mask, ensure that you type the
correct mask. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.

Perform the following steps to set the subnet mask:


1. Navigate to the IPv4 Subnet panel.
2. Press the select button. The first subnet mask number is highlighted.
3. Press the up button if you want to increase the value that is highlighted; press
the down button if you want to decrease that value. If you want to quickly
increase the highlighted value, hold the up button. If you want to quickly
decrease the highlighted value, hold the down button.

Note: If you want to disable the fast increase or decrease function, press and
hold the down button, press and release the select button, and then release the
down button. The disabling of the fast increase or decrease function lasts until
the creation is completed or until the feature is again enabled. If the up button
or down button is pressed and held while the function is disabled, the value
increases or decreases once every two seconds. To again enable the fast increase
or decrease function, press and hold the up button, press and release the select
button, and then release the up button.
4. Press the right button or left button to move to the number field that you want
to set.
5. Repeat steps 3 and 4 for each number field that you want to set.
6. Press the select button to confirm the settings. Otherwise, press the right button
to display the next secondary option or press the left button to display the
previous options.

IPv4 Gateway option

Using this option, you can set the IPv4 gateway address for Ethernet port 1.

Attention: When you set the IPv4 gateway address, ensure that you type the
correct address. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.

Perform the following steps to set the IPv4 gateway address:


1. Navigate to the IPv4 Gateway panel.
2. Press the select button. The first gateway address number field is highlighted.
3. Press the up button if you want to increase the value that is highlighted; press
the down button if you want to decrease that value. If you want to quickly
increase the highlighted value, hold the up button. If you want to quickly
decrease the highlighted value, hold the down button.

Note: If you want to disable the fast increase or decrease function, press and
hold the down button, press and release the select button, and then release the
down button. The disabling of the fast increase or decrease function lasts until
the creation is completed or until the feature is again enabled. If the up button
or down button is pressed and held while the function is disabled, the value
increases or decreases once every two seconds. To again enable the fast increase
or decrease function, press and hold the up button, press and release the select
button, and then release the up button.
4. Press the right button or left button to move to the number field that you want
to set.
5. Repeat steps 3 and 4 for each number field that you want to set.
6. Press the select button to confirm the settings. Otherwise, press the right button
to display the next secondary option or press the left button to display the
previous options.
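
Because a mistyped address, subnet mask, or gateway can leave the system unreachable, it can help to check the three values before you key them in. The following minimal sketch uses the Python standard ipaddress module; the function name and the sample addresses are hypothetical, and the rule that the gateway should lie inside the local subnet is a general networking assumption rather than something this guide states.

```python
import ipaddress


def check_ipv4_settings(address: str, subnet_mask: str, gateway: str) -> list:
    """Return a list of problems found; an empty list means no problem found."""
    try:
        iface = ipaddress.IPv4Interface(f"{address}/{subnet_mask}")
    except ValueError as err:
        return [f"address or subnet mask is not valid: {err}"]
    try:
        gw = ipaddress.IPv4Address(gateway)
    except ValueError as err:
        return [f"gateway is not valid: {err}"]
    problems = []
    if gw not in iface.network:
        problems.append(f"gateway {gw} is outside {iface.network}")
    return problems


# Sample values are made up for illustration.
print(check_ipv4_settings("192.168.1.10", "255.255.255.0", "192.168.1.1"))
print(check_ipv4_settings("192.168.1.10", "255.255.255.0", "10.0.0.1"))
```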

IPv4 Confirm Create? option

Using this option, you can start an operation to create a clustered system with an
IPv4 address.
1. Press and release the left or right button until IPv4 Confirm Create? is
displayed.
2. Press the select button to start the operation.
If the create operation is successful, Password is displayed on line 1. The
password that you can use to access the system is displayed on line 2. Be sure
to immediately record the password; it is required on the first attempt to
manage the system from the management GUI.

Attention: The password displays for only 60 seconds, or until a front panel
button is pressed. The clustered system is created only after the password
display is cleared.
If the create operation fails, Create Failed: is displayed on line 1 of the
front-panel display screen. Line 2 displays one of two possible error codes that
you can use to isolate the cause of the failure.

IPv6 Address option

Using this option, you can set the IPv6 address for Ethernet port 1 of the system
that you are going to create. The clustered system can have either an IPv4 or an
IPv6 address, or both at the same time. You can set either the IPv4 or IPv6
management address for Ethernet port 1 from the front panel when you are
creating the system. If required, you can add more management IP addresses from
the CLI.

Attention: When you set the IPv6 address, ensure that you type the correct
address. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.

Perform the following steps to set the IPv6 address:


1. From the Create Cluster? option, press the select button, and then press the
down button. The IPv6 Address option is displayed.
2. Press the select button again. The first IPv6 address number is highlighted.
3. Move between the address panels by using the left button or right button. The
IPv6 addresses and the IPv6 gateway addresses consist of eight (4-digit)
hexadecimal values that are shown across four panels.
4. You can change each number in the address independently. Press the up button
if you want to increase the value that is highlighted; press the down button if
you want to decrease that value.
5. Press the right button or left button to move to the number field that you want
to set.
6. Repeat steps 4 and 5 for each number field that you want to set.
7. Press the select button to confirm the settings. Otherwise, press the right button
to display the next secondary option or press the left button to display the
previous options.
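
To see how an address maps onto the panels, the following minimal sketch expands an IPv6 address into its eight 4-digit hexadecimal values and groups them into four panels. The even split of two values per panel is an assumption drawn from the arithmetic (eight values, four panels); the function name and the sample address are hypothetical.

```python
import ipaddress


def ipv6_panels(address: str) -> list:
    """Split an IPv6 address into four panels of two 4-digit values each."""
    groups = ipaddress.IPv6Address(address).exploded.split(":")
    return [":".join(groups[i:i + 2]) for i in range(0, 8, 2)]


# 2001:db8::1 is an address from the documentation prefix, used as a sample.
print(ipv6_panels("2001:db8::1"))
```

Writing the address out in this fully expanded form first makes it easier to key in each digit without relying on the :: shorthand.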

IPv6 Prefix option

Using this option, you can set the IPv6 prefix for Ethernet port 1.

Attention: When you set the IPv6 prefix, ensure that you type the correct
network prefix. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.

Perform the following steps to set the IPv6 prefix:

Note: This option is restricted to a value in the range 0 - 127.


1. Press and release the left or right button until IPv6 Prefix is displayed.
2. Press the select button. The first prefix number field is highlighted.
3. Press the up button if you want to increase the value that is highlighted; press
the down button if you want to decrease that value. If you want to quickly
increase the highlighted value, hold the up button. If you want to quickly
decrease the highlighted value, hold the down button.

Note: If you want to disable the fast increase or decrease function, press and
hold the down button, press and release the select button, and then release the
down button. The disabling of the fast increase or decrease function lasts until
the creation is completed or until the feature is again enabled. If the up button
or down button is pressed and held while the function is disabled, the value
increases or decreases once every two seconds. To again enable the fast increase
or decrease function, press and hold the up button, press and release the select
button, and then release the up button.
4. Press the select button to confirm the settings. Otherwise, press the right button
to display the next secondary option or press the left button to display the
previous options.
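
As a quick check before you enter the prefix, the following minimal sketch enforces the 0 - 127 range noted above and shows which network the address and prefix describe. It uses the Python standard ipaddress module; the function name and the sample values are hypothetical.

```python
import ipaddress


def ipv6_network(address: str, prefix: int) -> ipaddress.IPv6Network:
    """Return the network for an address and a front-panel prefix value."""
    if not 0 <= prefix <= 127:
        raise ValueError("the front panel accepts prefix values 0 to 127 only")
    return ipaddress.IPv6Interface(f"{address}/{prefix}").network


print(ipv6_network("2001:db8::1", 64))
```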

IPv6 Gateway option

Using this option, you can set the IPv6 gateway for Ethernet port 1.

Attention: When you set the IPv6 gateway address, ensure that you type the
correct address. Otherwise, you might not be able to access the system using the
command-line tools or the management GUI.

Perform the following steps to set the IPv6 gateway address:


1. Press and release the left or right button until IPv6 Gateway is displayed.
2. Press the select button. The first gateway address number is highlighted. The
IPv6 addresses and the IPv6 gateway addresses consist of eight (4-digit)
hexadecimal values that are shown across four panels.
3. You can change each number in the address independently. Press the up button
if you want to increase the value that is highlighted; press the down button if
you want to decrease that value.
4. Press the right button or left button to move to the number field that you want
to set.
5. Repeat steps 3 and 4 for each number field that you want to set.
6. Press the select button to confirm the settings. Otherwise, press the right button
to display the next secondary option or press the left button to display the
previous options.

IPv6 Confirm Create? option

Using this option, you can start an operation to create a clustered system with an
IPv6 address.
1. Press and release the left or right button until IPv6 Confirm Create? is
displayed.
2. Press the select button to start the operation.
If the create operation is successful, Password is displayed on line 1. The
password that you can use to access the system is displayed on line 2. Be sure
to immediately record the password; it is required on the first attempt to
manage the system from the management GUI.
Attention: The password displays for only 60 seconds, or until a front panel
button is pressed. The clustered system is created only after the password
display is cleared.

If the create operation fails, Create Failed: is displayed on line 1 of the
front-panel display screen. Line 2 displays one of two possible error codes that
you can use to isolate the cause of the failure.

Service IPv4 or Service IPv6 options


You can use the front panel to change a service IPv4 address or a service IPv6
address.

IPv4 Address option

The IPv4 Address panels show one of the following items for the selected Ethernet
port:
• The active service address if the system has an IPv4 address. This address can be
  either a configured or fixed address, or it can be an address obtained through
  DHCP.
• DHCP Failed if the IPv4 service address is configured for DHCP but the node
  was unable to obtain an IP address.
• DHCP Configuring if the IPv4 service address is configured for DHCP while the
  node attempts to obtain an IP address. This address changes to the IPv4 address
  automatically if a DHCP address is allocated and activated.
• A blank line if the system does not have an IPv4 address.

If the service IPv4 address was not set correctly or a DHCP address was not
allocated, you have the option of correcting the IPv4 address from this panel. The
service IP address must be in the same subnet as the management IP address.
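
The same-subnet requirement can be checked before you edit the panel. The following minimal sketch uses the Python standard ipaddress module; the function name and the sample addresses are hypothetical, and the subnet mask is assumed to be the one that is configured for the management address.

```python
import ipaddress


def same_subnet(service_ip: str, management_ip: str, subnet_mask: str) -> bool:
    """Return True if the service address lies in the management subnet."""
    # strict=False lets a host address stand in for its network.
    network = ipaddress.ip_network(f"{management_ip}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(service_ip) in network


print(same_subnet("192.168.1.20", "192.168.1.10", "255.255.255.0"))
print(same_subnet("192.168.2.20", "192.168.1.10", "255.255.255.0"))
```

For an IPv6 pair, pass the prefix length (for example, "64") in place of the subnet mask.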

To set a fixed service IPv4 address from the IPv4 Address: panel, perform the
following steps:
1. Press and release the select button to put the panel in edit mode.
2. Press the right button or left button to move to the number field that you want
to set.
3. Press the up button if you want to increase the value that is highlighted; press
the down button if you want to decrease that value. If you want to quickly
increase the highlighted value, hold the up button. If you want to quickly
decrease the highlighted value, hold the down button.

Note: If you want to disable the fast increase or decrease function, press and
hold the down button, press and release the select button, and then release the
down button. The disabling of the fast increase or decrease function lasts until
the creation is completed or until the feature is again enabled. If the up button
or down button is pressed and held while the function is disabled, the value
increases or decreases once every two seconds. To again enable the fast increase
or decrease function, press and hold the up button, press and release the select
button, and then release the up button.
4. When all the fields are set as required, press and release the select button to
activate the new IPv4 address.
The IPv4 Address: panel is displayed. The new service IPv4 address is not
displayed until it has become active. If the new address has not been displayed
after 2 minutes, check that the selected address is valid on the subnetwork and
that the Ethernet switch is working correctly.



IPv6 Address option

The IPv6 Address panels show one of the following conditions for the selected
Ethernet port:
• The active service address if the system has an IPv6 address. This address can be
  either a configured or fixed address, or it can be an address obtained through
  DHCP.
• DHCP Failed if the IPv6 service address is configured for DHCP but the node
  was unable to obtain an IP address.
• DHCP Configuring if the IPv6 service address is configured for DHCP while the
  node attempts to obtain an IP address. This address changes to the IPv6 address
  automatically if a DHCP address is allocated and activated.
• A blank line if the system does not have an IPv6 address.

If the service IPv6 address was not set correctly or a DHCP address was not
allocated, you have the option of correcting the IPv6 address from this panel. The
service IP address must be in the same subnet as the management IP address.

To set a fixed service IPv6 address from the IPv6 Address: panel, perform the
following steps:
1. Press and release the select button to put the panel in edit mode. When the
panel is in edit mode, the full address is still shown across four panels as eight
(four-digit) hexadecimal values. You edit each digit of the hexadecimal values
independently. The current digit is highlighted.
2. Press the right button or left button to move to the number field that you want
to set.
3. Press the up button if you want to increase the value that is highlighted; press
the down button if you want to decrease that value.
4. When all the fields are set as required, press and release the select button to
activate the new IPv6 address.
The IPv6 Address: panel is displayed. The new service IPv6 address is not
displayed until it has become active. If the new address has not been displayed
after 2 minutes, check that the selected address is valid on the subnetwork and
that the Ethernet switch is working correctly.

Service DHCPv4 or DHCPv6 options


The active service address for a system can be either a configured or fixed address,
or it can be an address obtained through DHCP.

If a service IP address does not exist, you must assign a service IP address or use
DHCP with this action.

To set the service IPv4 address to use DHCP, perform the following steps:
1. Press and release the up or down button until Service DHCPv4? is displayed.
2. Press and release the down button. Confirm DHCPv4? is displayed.
3. Press and release the select button to activate DHCP, or you can press and
release the up button to keep the existing address.
4. If you activate DHCP, DHCP Configuring is displayed while the node attempts
to obtain a DHCP address. It changes automatically to show the allocated
address if a DHCP address is allocated and activated, or it changes to DHCP
Failed if a DHCP address is not allocated.

To set the service IPv6 address to use DHCP, perform the following steps:

1. Press and release the up or down button until Service DHCPv6? is displayed.
2. Press and release the down button. Confirm DHCPv6? is displayed.
3. Press and release the select button to activate DHCP, or you can press and
release the up button to keep the existing address.
4. If you activate DHCP, DHCP Configuring is displayed while the node attempts
to obtain a DHCP address. It changes automatically to show the allocated
address if a DHCP address is allocated and activated, or it changes to DHCP
Failed if a DHCP address is not allocated.

Note: If an IPv6 router is present on the local network, SAN Volume Controller
does not differentiate between an autoconfigured address and a DHCP address.
Therefore, SAN Volume Controller uses the first address that is detected.
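
The panel text that this section describes for a service address (the active address, DHCP Configuring, DHCP Failed, or a blank line) can be summarized as a small decision function. This is a minimal sketch of that description only; the function and parameter names are hypothetical and do not correspond to any product interface.

```python
from typing import Optional


def service_address_panel(dhcp_configured: bool,
                          active_address: Optional[str],
                          dhcp_attempt_finished: bool = False) -> str:
    """Return the text that the address panel is described as showing."""
    if active_address:
        return active_address  # fixed address or address allocated by DHCP
    if dhcp_configured:
        # No address yet: still trying, or the attempt ended without one.
        return "DHCP Failed" if dhcp_attempt_finished else "DHCP Configuring"
    return ""  # blank line: no service address of this type


print(service_address_panel(True, None))
print(service_address_panel(True, None, dhcp_attempt_finished=True))
```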

Change WWNN? option


The Change WWNN? option displays the last five hexadecimal digits of the
WWNN that is being used by the node. Only the last five digits of a WWNN vary
on a node. The first 11 digits are always 50050768010.

To edit the WWNN, complete the following steps:

Important: Only change the WWNN when you are instructed to do so by a service
procedure. Nodes must always have a unique WWNN. If you change the WWNN,
you might have to reconfigure hosts and the SAN zoning.
1. Press and release the up or down button until Actions is displayed.
2. Press and release the select button.
3. Press and release the up or down button until Change WWNN? is displayed on
line 1. Line 2 of the display shows the last five digits of the WWNN that is
currently set. The first digit is highlighted.
4. Edit the highlighted digit to match the digit that is required. Use the up
and down buttons to increase or decrease the digit. The digits wrap F to
0 or 0 to F. Use the left and right buttons to move between the digits.
5. When the displayed value matches the required WWNN, press and release the
select button to activate the change. The Node WWNN: panel displays and the
second line shows the last five digits of the changed WWNN.
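
The editing behavior in step 4, including the wrap from F to 0 and from 0 to F, can be sketched as follows. The function name is hypothetical, and stopping the highlight at the first and last digit (rather than wrapping it) is an assumption that this guide does not confirm.

```python
HEX = "0123456789ABCDEF"


def press(digits: list, cursor: int, button: str) -> int:
    """Apply one button press to the five digits; return the new cursor."""
    if button in ("up", "down"):
        step = 1 if button == "up" else -1
        # Modulo 16 gives the wrap: F + 1 becomes 0, and 0 - 1 becomes F.
        digits[cursor] = HEX[(HEX.index(digits[cursor]) + step) % 16]
    elif button == "left":
        cursor = max(cursor - 1, 0)
    elif button == "right":
        cursor = min(cursor + 1, len(digits) - 1)
    else:
        raise ValueError(f"unknown button: {button}")
    return cursor


digits = list("0000F")  # made-up starting value
cursor = 0
for button in ["up", "right", "right", "right", "right", "up"]:
    cursor = press(digits, cursor, button)
print("".join(digits))
```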

Enter Service? option


You can enter service state from the Enter Service? option. Service state can be
used to remove a node from a candidate list or to prevent it from being readded to
a clustered system.

If the node is active, entering service state can cause disruption to hosts if other
faults exist in the system. While in service state, the node cannot join or run as part
of a clustered system.

To exit service state, ensure that all errors are resolved. You can exit service state
by using the Exit Service? option or by restarting the node.

Exit Service? option


You can exit service state from the Exit Service? option. This action releases the
node from the service state.

If there are no noncritical errors, the node enters candidate state. If possible, the
node then becomes active in a clustered system.



To exit service state, ensure that all errors are resolved. You can exit service state
by using this option or by restarting the node.

Recover Cluster? option


You can recover an entire clustered system if the data has been lost from all nodes
by using the Recover Cluster? option.

Perform service actions on nodes only when directed by the service procedures. If
used inappropriately, service actions can cause loss of access to data or data loss.

For information about the recover system procedure, see “Recover system
procedure” on page 211.

Remove Cluster? option


The Remove Cluster? option deletes the system state data from the node.

Use this option as the final step in decommissioning a clustered system after the
other nodes have been removed from the system using the command-line interface
(CLI) or the management GUI.

Attention: Use the front panel to remove state data from a single-node system. To
remove a node from a multi-node system, always use the CLI or the remove node
options from the management GUI.

From the Remove Cluster? panel, perform the following steps to delete the state
data from the node:
1. Press and hold the up button.
2. Press and release the select button.
3. Release the up button.
After the option is run, the node shows Cluster: with no system name. If this
option is performed on a node that is still a member of a system, the system shows
error 1195, Node missing, and the node is displayed in the list of nodes in the
system. Remove the node by using the management GUI or CLI.

Paced Upgrade? option


Use this option to control the time when individual nodes are upgraded within a
concurrent code upgrade.

Note: This action can be used only when the following conditions exist for the
node:
• The node is in service state.
• The node has no errors.
• The node has been removed from the clustered system.

For additional information, see the “Upgrading the software manually” topic in the
information center.

Set FC Speed? option


You can change the speed of the Fibre Channel ports on a SAN Volume Controller
by using the Set FC Speed? option.

Note: This option is available only on SAN Volume Controller 2145-8F2 nodes.

Reset Password? option
The Reset Password? option is useful if the system superuser password has been
lost or forgotten.

Use the Reset Password? option if the user has lost the system superuser password
or if the user is unable to access the system. If it is permitted by the user's
password security policy, use this selection to reset the system superuser password.

If your password security policy permits password recovery, and if the node is
currently a member of a clustered system, the system superuser password is reset
and a new password is displayed for 60 seconds. If your password security policy
does not permit password recovery or the node is not a member of a system,
completing these steps has no effect.

If the node is in active state when the password is reset, the reset applies to all
nodes in the system. If the node is in candidate or service state when the password
is reset, the reset applies only to the single node.

Rescue Node? option


You can start the automatic software recovery for this node by using the Rescue
Node? option.

Note: Another way to rescue a node is to force a node rescue when the node
boots. It is the preferred method. Forcing a node rescue when a node boots works
by booting the operating system from the service controller and running a program
that copies all the SAN Volume Controller software from any other node that can
be found on the Fibre Channel fabric. See “Performing the node rescue when the
node boots” on page 226.

Exit Actions? option


Return to the main menu by selecting the Exit Actions? option.

Language? option
You can change the language that displays on the front panel.

Before you begin

The Language? option allows you to change the language that is displayed on the
menu. Figure 65 shows the Language? option sequence.

Figure 65. Language? navigation (pressing select on Language? displays the
English? and Japanese? options)

The following languages are available:


v English
v Japanese

About this task

To select the language that you want to be used on the front panel, perform the
following steps:

Procedure
1. Press and release the up or down button until Language? is displayed.
2. Press and release the select button.
3. Use the left and right buttons to move to the language that you want. The
translated language names are displayed in their own character set. If you do
not understand the language that is displayed, wait for at least 60 seconds for
the menu to reset to the default option.
4. Press and release the select button to select the language that is displayed.

Results

If the selected language uses the Latin alphabet, the front panel display shows two
lines. The panel text is displayed on the first line and additional data is displayed
on the second line.

If the selected language does not use the Latin alphabet, the display shows only
one line at a time to clearly display the character font. For those languages, you
can switch between the panel text and the additional data by pressing and
releasing the select button.

Additional data is unavailable when the front panel displays a menu option, which
ends with a question mark (?). In this case, press and release the select button to
choose the menu option.

Note: You cannot select another language when the node is displaying a boot
error.

Using the power control for the SAN Volume Controller node
SAN Volume Controller nodes are powered by an uninterruptible power supply
that is located in the same rack as the nodes.

The power state of the SAN Volume Controller is displayed by a power indicator
on the front panel. If the uninterruptible power supply battery is not sufficiently
charged to enable the SAN Volume Controller to become fully operational, its
charge state is displayed on the front panel display of the node.

The power to a SAN Volume Controller is controlled by the power button on the
front panel of the node. Never turn off the node by removing the power cable. You
might lose data. For more information about how to power off the node, see “MAP
5350: Powering off a SAN Volume Controller node” on page 258.

If the SAN Volume Controller software is running and you request it to power off
from the management GUI, CLI, or power button, the node starts its power off
processing. During this time, the node indicates the progress of the power-off
operation on the front panel display. After the power-off processing is complete,
the front panel becomes blank and the front panel power light flashes. It is safe for
you to remove the power cable from the rear of the node. If the power button on
the front panel is pressed during power-off processing, the front panel display
changes to indicate that the node is being restarted, but the power-off process
completes before the restart is performed.

If the SAN Volume Controller software is not running when the front panel power
button is pressed, the node immediately powers off.

Note: The 2145 UPS-1U does not power off when the node is shut down from the
power button.

If you turn off a node using the power button or by a command, the node is put
into a power-off state. The SAN Volume Controller remains in this state until the
power cable is connected to the rear of the node and the power button is pressed.

During the startup sequence, the SAN Volume Controller tries to detect the status
of the uninterruptible power supply through the uninterruptible power supply
signal cable. If an uninterruptible power supply is not detected, the node pauses
and an error is shown on the front panel display. If the uninterruptible power
supply is detected, the software monitors the operational state of the
uninterruptible power supply. If no uninterruptible power supply errors are
reported and the uninterruptible power supply battery is sufficiently charged, the
SAN Volume Controller becomes operational. If the uninterruptible power supply
battery is not sufficiently charged, the charge state is indicated by a progress bar
on the front panel display. When an uninterruptible power supply is first turned
on, it might take up to two hours before the battery is sufficiently charged for the
SAN Volume Controller node to become operational.

If input power to the uninterruptible power supply is lost, the node immediately
stops all I/O operations and saves the contents of its dynamic random access
memory (DRAM) to the internal disk drive. While data is being saved to the disk
drive, a Power Failure message is shown on the front panel and is accompanied
by a descending progress bar that indicates the quantity of data that remains to be
saved. After all the data is saved, the node is turned off and the power light on the
front panel turns off.

Note: The node is now in standby state. If the input power to the uninterruptible
power supply unit is restored, the node restarts. If the uninterruptible power
supply battery was fully discharged, Charging is displayed and the boot process
waits for the battery to charge. When the battery is sufficiently charged, Booting is
displayed, the node is tested, and the software is loaded. When the boot process is
complete, Recovering is displayed while the uninterruptible power supply finalizes
its charge. While Recovering is displayed, the system can function normally.
However, when the power is restored after a second power failure, there is a delay
(with Charging displayed) before the node can complete its boot process.

Chapter 7. Diagnosing problems
You can diagnose problems with the SAN Volume Controller by using either the
command-line interface (CLI) or the management GUI. The diagnostic LEDs on the
SAN Volume Controller nodes and uninterruptible power supply units also help you
diagnose hardware problems.

Event logs

By understanding the event log, you can do the following tasks:


v Manage the event log
v View the event log
v Describe the fields in the event log

Error codes

The following topics provide information to help you understand and process the
error codes:
v Event reporting
v Understanding the events
v Understanding the error codes
v Determining a hardware boot failure

If the node is showing a boot message, failure message, or node error message,
and you determined that the problem was caused by a software or firmware
failure, you can restart the node to see if that might resolve the problem. Perform
the following steps to properly shut down and restart the node:
1. Follow the instructions in “MAP 5350: Powering off a SAN Volume Controller
node” on page 258.
2. Restart only one node at a time.
3. Do not shut down the second node in an I/O group for at least 30 minutes
after you shut down and restart the first node.
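
The 30-minute rule in step 3 can be expressed as a simple time check. The following is a minimal sketch in Python and is not part of the product; the function name and the idea of recording the restart time yourself are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Minimum wait, from step 3 above, between restarting the first node
# and shutting down its partner node in the same I/O group.
MIN_WAIT = timedelta(minutes=30)


def partner_shutdown_allowed(first_node_restarted_at: datetime, now: datetime) -> bool:
    """Return True if at least 30 minutes have passed since the first node
    in the I/O group was shut down and restarted (hypothetical helper)."""
    return now - first_node_restarted_at >= MIN_WAIT


restarted = datetime(2012, 6, 1, 10, 0)
print(partner_shutdown_allowed(restarted, datetime(2012, 6, 1, 10, 20)))  # False
print(partner_shutdown_allowed(restarted, datetime(2012, 6, 1, 10, 30)))  # True
```

A check of this kind is only a reminder of the waiting period. Follow MAP 5350 for the actual power-off procedure.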

Event reporting
Events that are detected are saved in an event log. As soon as an entry is made in
this event log, the condition is analyzed. If any service activity is required, a
notification is sent.

Event reporting process

The following methods are used to notify you and the IBM Support Center of a
new event:
v The most serious system error code is displayed on the front panel of each node
in the system.
v If you enabled Simple Network Management Protocol (SNMP), an SNMP trap is
sent to an SNMP manager that is configured by the customer.
The SNMP manager might be IBM Systems Director, if it is installed, or another
SNMP manager.
v If enabled, log messages can be forwarded from a sender to a receiver on an IP
network by using the syslog protocol.
v If enabled, event notifications can be forwarded from a sender to a receiver
through Call Home email.
v If Call Home is enabled, critical faults generate a problem management record
(PMR) that is sent directly to the appropriate IBM Support Center.

Power-on self-test
When you turn on the SAN Volume Controller, the system board performs
self-tests. During the initial tests, the hardware boot symbol is displayed.

All models perform a series of tests to check the operation of components and
some of the options that have been installed when the units are first turned on.
This series of tests is called the power-on self-test (POST).

If a critical failure is detected during the POST, the software is not loaded and the
system error LED on the operator information panel is illuminated. If this failure
occurs, use “MAP 5000: Start” on page 231 to help isolate the cause of the failure.

When the software is loaded, additional testing takes place, which ensures that all
of the required hardware and software components are installed and functioning
correctly. During the additional testing, the word Booting is displayed on the front
panel along with a boot progress code and a progress bar. If a test failure occurs,
the word Failed is displayed on the front panel.

The service controller performs internal checks and is vital to the operation of the
SAN Volume Controller. If the error (check) LED is illuminated on the service
controller front panel, the front-panel display might not be functioning correctly
and you can ignore any message displayed.

The uninterruptible power supply also performs internal tests. If the
uninterruptible power supply reports a failure condition, the SAN Volume
Controller displays critical failure information on the front-panel display or
sends noncritical failure information to the event log. If the SAN Volume
Controller cannot communicate with the uninterruptible power supply, it displays
a boot failure error message on the front-panel display. Further problem
determination information might also be displayed on the front panel of the
uninterruptible power supply.

Understanding events
When a significant change in status is detected, an event is logged in the event log.

Error data

Events are classified as either alerts or messages:


v An alert is logged when the event requires some action. Some alerts have an
associated error code that defines the service action that is required. The service
actions are automated through the fix procedures. If the alert does not have an
error code, the alert represents an unexpected change in state. This situation
must be investigated to see if it is expected or represents a failure. Investigate an
alert and resolve it as soon as it is reported.
v A message is logged when a change that is expected is reported, for instance, an
IBM FlashCopy operation completes.

Managing the event log
The event log has a limited size. After it is full, newer entries replace entries that
are no longer required.

To avoid having a repeated event that fills the event log, some records in the event
log refer to multiple occurrences of the same event. When event log entries are
coalesced in this way, the time stamp of the first occurrence and the last occurrence
of the problem is saved in the log entry. A count of the number of times that the
error condition has occurred is also saved in the log entry. Other data refers to the
last occurrence of the event.
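
The coalescing behavior described above can be illustrated with a short Python sketch. This is not the product's implementation. The record layout, the function name, and the choice to match records on event ID and object ID are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    event_id: int
    object_id: int
    first_time: int   # time stamp of the first occurrence
    last_time: int    # time stamp of the most recent occurrence
    event_count: int  # number of occurrences coalesced into this record
    sense_data: str   # other data refers to the last occurrence


def log_event(log: dict, event_id: int, object_id: int, time: int, sense_data: str) -> None:
    """Add an event, coalescing it with an existing record when one matches.

    Matching on (event_id, object_id) is an assumption for this sketch.
    """
    key = (event_id, object_id)
    record = log.get(key)
    if record is None:
        log[key] = LogRecord(event_id, object_id, time, time, 1, sense_data)
    else:
        record.last_time = time          # first_time is left unchanged
        record.event_count += 1
        record.sense_data = sense_data   # keep only the latest details


log = {}
log_event(log, 981020, 3, 100, "first")
log_event(log, 981020, 3, 160, "second")
log_event(log, 981020, 3, 220, "third")
record = log[(981020, 3)]
print(record.first_time, record.last_time, record.event_count, record.sense_data)
# prints: 100 220 3 third
```

Three occurrences of the same event occupy one record, so a repeating condition does not push other entries out of the limited-size log.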

Viewing the event log


You can view the event log by using the management GUI or the command-line
interface (CLI).

About this task

You can view the event log by using the Monitoring > Events options in the
management GUI. The event log contains many entries. You can, however, select
only the type of information that you need.

You can also view the event log by using the command-line interface (lseventlog).
See the “Command-line interface” topic for the command details.

Describing the fields in the event log


The event log includes fields with information that you can use to diagnose
problems.

Table 43 describes some of the fields that are available to assist you in diagnosing
problems.
Table 43. Description of data fields for the event log

Data field            Description
Event ID              This number precisely identifies why the event was logged.
Error code            This number describes the service action that should be
                      followed to resolve an error condition. Not all events have
                      error codes that are associated with them. Many event IDs
                      can have the same error code because the service action is
                      the same for all the events.
Sequence number       A number that identifies the event.
Event count           The number of events coalesced into this event log record.
Object type           The object type to which the event log relates.
Object ID             A number that uniquely identifies the instance of the object.
Fixed                 When an alert is shown for an error condition, it indicates if
                      the reason for the event was resolved. In many cases, the
                      system automatically marks the events fixed when appropriate.
                      There are some events that must be manually marked as fixed.
                      If the event is a message, this field indicates that you have
                      read and performed the action. The message must be marked as
                      read.
First time            The time when this error event was reported. If events of a
                      similar type are being coalesced together, so that one event
                      log record represents more than one event, this field is the
                      time the first error event was logged.
Last time             The time when the last instance of this error event was
                      recorded in the log.
Root sequence number  If set, this number is the sequence number of an event that
                      represents an error that probably caused this event to be
                      reported. Resolve the root event first.
Sense data            Additional data that gives the details of the condition that
                      caused the event to be logged.
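
The advice to resolve the root event first can be sketched as a lookup that follows the root sequence number field. The Python below is an illustrative sketch only. The dictionary layout is hypothetical, and the possibility that a root event has a root of its own is an assumption; the table above says only that the field, if set, points to a probable cause.

```python
def find_root(events: dict, sequence_number: int) -> int:
    """Follow root sequence numbers until an event with no root is reached.

    `events` maps a sequence number to its root sequence number, or to None
    when the field is not set. The `seen` set guards against a cycle in
    malformed input.
    """
    seen = set()
    current = sequence_number
    while events.get(current) is not None and current not in seen:
        seen.add(current)
        current = events[current]
    return current


# Hypothetical log: event 125 was probably caused by 121, which was caused by 120.
events = {120: None, 121: 120, 125: 121}
print(find_root(events, 125))  # 120
print(find_root(events, 120))  # 120
```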

Event notifications
The SAN Volume Controller product can use Simple Network Management
Protocol (SNMP) traps, syslog messages, and Call Home email to notify you and
the IBM Support Center when significant events are detected. Any combination of
these notification methods can be used simultaneously. Notifications are normally
sent immediately after an event is raised. However, there are some events that
might occur because of service actions that are being performed. If a recommended
service action is active, these events are notified only if they are still unfixed when
the service action completes.

Each event that SAN Volume Controller detects is assigned a notification type of
Error, Warning, or Information. When you configure notifications, you specify
where the notifications should be sent and which notification types are sent to that
recipient.

Table 44 describes the types of event notifications.


Table 44. Notification types

Notification type  Description
Error              An error notification is sent to indicate a problem that must be
                   corrected as soon as possible.

                   This notification indicates a serious problem with the SAN
                   Volume Controller. For example, the event that is being reported
                   could indicate a loss of redundancy in the system, and it is
                   possible that another failure could result in loss of access to
                   data. The most typical reason that this type of notification is
                   sent is because of a hardware failure, but some configuration
                   errors or fabric errors also are included in this notification
                   type. Error notifications can be configured to be sent as a Call
                   Home email to the IBM Support Center.
Warning            A warning notification is sent to indicate a problem or
                   unexpected condition with the SAN Volume Controller. Always
                   immediately investigate this type of notification to determine
                   the effect that it might have on your operation, and make any
                   necessary corrections.

                   A warning notification does not require any replacement parts
                   and therefore should not require IBM Support Center involvement.
                   The allocation of notification type Warning does not imply that
                   the event is less serious than one that has notification type
                   Error.
Information        An informational notification is sent to indicate that an
                   expected event has occurred: for example, a FlashCopy operation
                   has completed. No remedial action is required when these
                   notifications are sent.

Events with notification type Error or Warning are shown as alerts in the event log.
Events with notification type Information are shown as messages.
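
The relationship between notification types, recipients, and event log categories can be sketched in Python as follows. This is an illustration, not the product's configuration model; the recipient names and the dictionary layout are hypothetical.

```python
# Notification types named in the text above.
ALERT_TYPES = {"Error", "Warning"}
MESSAGE_TYPES = {"Information"}


def log_category(notification_type: str) -> str:
    """Return how an event of this notification type is shown in the event log."""
    if notification_type in ALERT_TYPES:
        return "alert"
    if notification_type in MESSAGE_TYPES:
        return "message"
    raise ValueError(f"unknown notification type: {notification_type}")


def recipients_for(notification_type: str, recipients: dict) -> list:
    """Return the recipients configured to receive this notification type.

    `recipients` maps a hypothetical recipient name to the set of types it accepts.
    """
    return sorted(name for name, types in recipients.items() if notification_type in types)


recipients = {
    "snmp-manager": {"Error", "Warning", "Information"},
    "syslog-server": {"Error", "Warning"},
    "call-home": {"Error"},
}
print(log_category("Warning"), recipients_for("Warning", recipients))
# prints: alert ['snmp-manager', 'syslog-server']
```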

SNMP traps

Simple Network Management Protocol (SNMP) is a standard protocol for
managing networks and exchanging messages. The system can send SNMP
messages that notify personnel about an event. You can use an SNMP manager to
view the SNMP messages that the system sends. You can use the management GUI
or the command-line interface to configure and modify your SNMP settings.

You can use the Management Information Base (MIB) file for SNMP to configure a
network management program to receive SNMP messages that are sent by the
system. This file can be used with SNMP messages from all versions of the
software. More information about the MIB file for SNMP is available at this
website:

www.ibm.com/storage/support/2145

Search for MIB. Go to the downloads results to find Management Information
Base (MIB) file for SNMP. Click this link to find download options.

Syslog messages

The syslog protocol is a standard protocol for forwarding log messages from a
sender to a receiver on an IP network. The IP network can be either IPv4 or IPv6.
The system can send syslog messages that notify personnel about an event. The
system can transmit syslog messages in either expanded or concise format. You can
use a syslog manager to view the syslog messages that the system sends. The
system uses the User Datagram Protocol (UDP) to transmit the syslog message.
You can use the management GUI or the SAN Volume Controller command-line
interface to configure and modify your syslog settings.

Table 45 shows how SAN Volume Controller notification types map to syslog
severity-level codes.
Table 45. SAN Volume Controller notification types and corresponding syslog level codes

Notification type  Syslog level code  Description
ERROR              LOG_ALERT          Fault that might require hardware replacement
                                      that needs immediate attention.
WARNING            LOG_ERROR          Fault that needs immediate attention. Hardware
                                      replacement is not expected.
INFORMATIONAL      LOG_INFO           Information message used, for example, when a
                                      configuration change takes place or an
                                      operation completes.
TEST               LOG_DEBUG          Test message
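
A syslog receiver script might use the mapping in Table 45 as a lookup. The Python sketch below is illustrative only. The numeric severities are the standard syslog values (alert 1, error 3, informational 6, debug 7); treating the LOG_ERROR entry as the standard error severity, which C headers name LOG_ERR, is an assumption.

```python
# Standard numeric syslog severities for the level codes used in Table 45.
SEVERITY = {"LOG_ALERT": 1, "LOG_ERROR": 3, "LOG_INFO": 6, "LOG_DEBUG": 7}

# Table 45: notification type -> syslog level code.
LEVEL_FOR_TYPE = {
    "ERROR": "LOG_ALERT",
    "WARNING": "LOG_ERROR",
    "INFORMATIONAL": "LOG_INFO",
    "TEST": "LOG_DEBUG",
}


def syslog_severity(notification_type: str) -> tuple:
    """Return (level code, numeric severity) for a notification type."""
    level = LEVEL_FOR_TYPE[notification_type.upper()]
    return level, SEVERITY[level]


print(syslog_severity("WARNING"))  # ('LOG_ERROR', 3)
```

Note that a WARNING notification arrives at error severity and an ERROR notification at alert severity, so receiver-side filters should be written against the syslog level, not the notification type name.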

Table 46 on page 130 shows how SAN Volume Controller values of user-defined
message origin identifiers map to syslog facility codes.

Table 46. SAN Volume Controller values of user-defined message origin identifiers and
syslog facility codes

SAN Volume Controller value  Syslog value  Syslog facility code  Message format
0                            16            LOG_LOCAL0            Full
1                            17            LOG_LOCAL1            Full
2                            18            LOG_LOCAL2            Full
3                            19            LOG_LOCAL3            Full
4                            20            LOG_LOCAL4            Concise
5                            21            LOG_LOCAL5            Concise
6                            22            LOG_LOCAL6            Concise
7                            23            LOG_LOCAL7            Concise
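
Table 46 follows a regular pattern: the syslog value is 16 plus the SAN Volume Controller value, and values 0 through 3 use the full format while values 4 through 7 use the concise format. The Python sketch below captures that pattern. The function names are hypothetical. The priority calculation (facility multiplied by 8, plus severity) is the standard syslog rule; that the system's messages carry exactly this value is an assumption.

```python
def syslog_facility(origin_id: int) -> int:
    """Return the syslog facility value (16..23) for an origin identifier 0..7."""
    if not 0 <= origin_id <= 7:
        raise ValueError("origin identifier must be in the range 0-7")
    return 16 + origin_id  # LOG_LOCAL0 is facility 16


def message_format(origin_id: int) -> str:
    """Return the message format that Table 46 lists for an origin identifier."""
    syslog_facility(origin_id)  # reuse the range check
    return "Full" if origin_id <= 3 else "Concise"


def priority(origin_id: int, severity: int) -> int:
    """Return the standard syslog priority value: facility * 8 + severity."""
    return syslog_facility(origin_id) * 8 + severity


# Origin identifier 4 with alert severity (1).
print(syslog_facility(4), message_format(4), priority(4, 1))  # 20 Concise 161
```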

Call Home email

The Call Home feature transmits operational and event-related data to you and
IBM through a Simple Mail Transfer Protocol (SMTP) server connection in the form
of an event notification email. When configured, this function alerts IBM service
personnel about hardware failures and potentially serious configuration or
environmental issues.

To send email, you must configure at least one SMTP server. You can specify as
many as five additional SMTP servers for backup purposes. The SMTP server must
accept the relaying of email from the SAN Volume Controller management IP
address. You can then use the management GUI or the SAN Volume Controller
command-line interface to configure the email settings, including contact
information and email recipients. Set the reply address to a valid email address.
Send a test email to check that all connections and infrastructure are set up
correctly. You can disable the Call Home function at any time using the
management GUI or the SAN Volume Controller command-line interface.

Data that is sent with notifications

Notifications can be sent using email, SNMP, or syslog. The data sent for each type
of notification is the same. It includes:
v Record type
v Machine type
v Machine serial number
v Error ID
v Error code
v Software version
v FRU part number
v Cluster (system) name
v Node ID
v Error sequence number
v Time stamp
v Object type
v Object ID
v Problem data

Emails contain the following additional information that allows the Support Center
to contact you:
v Contact names for first and second contacts
v Contact phone numbers for first and second contacts
v Alternate contact numbers for first and second contacts
v Offshift phone number
v Contact email address
v Machine location

To send data and notifications to IBM service personnel, use one of the following
email addresses:
v For SAN Volume Controller nodes located in North America, Latin America,
South America or the Caribbean Islands, use [email protected]
v For SAN Volume Controller nodes located anywhere else in the world, use
[email protected]

Inventory information email


An inventory information email summarizes the hardware components and
configuration of a system. IBM service personnel can use this information to
contact you when relevant software upgrades are available or when an issue that
can affect your configuration is discovered. It is a good practice to enable
inventory reporting.

Because inventory information is sent using the Call Home email function, you
must meet the Call Home function requirements and enable the Call Home email
function before you can attempt to send inventory information email. You can
adjust the contact information, adjust the frequency of inventory email, or
manually send an inventory email using the management GUI or the SAN Volume
Controller command-line interface.

Inventory information that is sent to IBM includes the following information about
the clustered system on which the Call Home function is enabled. Sensitive
information such as IP addresses is not included.
v Licensing information
v Details about the following objects and functions:
    Drives
    External storage systems
    Hosts
    MDisks
    Volumes
    RAID types
    Easy Tier
    FlashCopy
    Metro Mirror and Global Mirror

For detailed information about what is included in the Call Home inventory
information, configure the system to send an inventory email to yourself.

Understanding the error codes


Error codes are generated by the event-log analysis and system configuration code.

Error codes help you to identify the cause of a problem, the failing
field-replaceable units (FRUs), and the service actions that might be needed to
solve the problem.

Note: If more than one error occurs during an operation, the highest priority error
code displays on the front panel. The lower the number for the error code, the
higher the priority. For example, error code 1020 has a higher priority than error
code 1370.
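
Because a lower number means a higher priority, choosing the code that the front panel would show reduces to taking the minimum. A minimal Python sketch, with a hypothetical function name:

```python
def highest_priority(error_codes):
    """Return the highest-priority error code, which is the lowest number."""
    return min(error_codes)


print(highest_priority([1370, 1020, 1195]))  # 1020
```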

Using the error code tables


The error code tables list the various error codes and describe the actions that you
can take.

About this task

Perform the following steps to use the error code tables:

Procedure
1. Locate the error code in one of the tables. If you cannot find a particular code
in any table, call IBM Support Center for assistance.
2. Read about the action you must perform to correct the problem. Do not
exchange field replaceable units (FRUs) unless you are instructed to do so.
3. Normally, exchange only one FRU at a time, starting from the top of the FRU
list for that error code.

Event IDs
The SAN Volume Controller software generates events, such as informational
events and error events. An event ID or number is associated with the event and
indicates the reason for the event.

Informational events provide information about the status of an operation.
Informational events are recorded in the event log, and depending on the
configuration, can be notified through email, SNMP, or syslog.

Error events are generated when a service action is required. An error event maps
to an alert with an associated error code. Depending on the configuration, error
events can be notified through email, SNMP, or syslog.

Informational events
The informational events provide information about the status of an operation.

Informational events are recorded in the event log and, depending on the
configuration, can be notified through email, SNMP, or syslog.

Informational events can be either notification type I (information) or notification
type W (warning). An informational event report of type (W) might require user
attention. Table 47 provides a list of informational events, the notification type, and
the reason for the event.
Table 47. Informational events
Event ID  Notification type  Description
980221 I The error log is cleared.
980230 I The SSH key was discarded for the service login user.
980231 I User name has changed.
980301 I Degraded or offline managed disk is now online.
980310 I A degraded or offline storage pool is now online.
980320 I Offline volume is now online.
980321 W Volume is offline because of degraded or offline
storage pool.
980330 I All nodes can see the port.
980340 I All ports in this host are now logged in.
980341 W One or more ports in this host is now degraded.
980342 W One or more ports in this host is now offline.
980343 W All ports in this host are now offline.
980349 I A node has been successfully added to the cluster
(system).
980350 I The node is now a functional member of the cluster
(system).
980351 I A noncritical hardware error occurred.
980352 I Attempt to automatically recover offline node
starting.
980370 I Both nodes in the I/O group are available.
980371 I One node in the I/O group is unavailable.
980372 W Both nodes in the I/O group are unavailable.
980392 I Cluster (system) recovery completed.
980435 W Failed to obtain directory listing from remote node.
980440 W Failed to transfer file from remote node.
980445 I The migration is complete.
980446 I The secure delete is complete.
980501 W The virtualization amount is close to the limit that is
licensed.
980502 W The FlashCopy feature is close to the limit that is
licensed.
980503 W The Metro Mirror or Global Mirror feature is close to
the limit that is licensed.
981002 I Fibre Channel discovery occurred; configuration
changes are pending.
981003 I Fibre Channel discovery occurred; configuration
changes are complete.
981004 I Fibre Channel discovery occurred; no configuration
changes were detected.
981007 W The managed disk is not on the preferred path.
981009 W The initialization for the managed disk failed.
981014 W The LUN discovery has failed. The cluster (system)
has a connection to a device through this node but
this node cannot discover the unmanaged or
managed disk that is associated with this LUN.
981015 W The LUN capacity equals or exceeds the maximum.
Only part of the disk can be accessed.
981020 W The managed disk error count warning threshold has
been met.
981022 I Managed disk offline imminent, offline prevention
started
981025 I Drive firmware download started
981026 I Drive FPGA download started
981101 I SAS discovery occurred; no configuration changes
were detected.
981102 I SAS discovery occurred; configuration changes are
pending.
981103 I SAS discovery occurred; configuration changes are
complete.
981104 W The LUN capacity equals or exceeds the maximum
capacity. Only the first 1 PB of disk will be accessed.
981105 I The drive format has started.
981106 I The drive recovery was started.
982003 W Insufficient virtual extents.
982004 W The migration suspended because of insufficient
virtual extents or too many media errors on the
source managed disk.
982007 W Migration has stopped.
982009 I Migration is complete.
982010 W Copied disk I/O medium error.
983001 I The FlashCopy operation is prepared.
983002 I The FlashCopy operation is complete.
983003 W The FlashCopy operation has stopped.
984001 W First customer data being pinned in a virtual disk
working set.
984002 I All customer data in a virtual disk working set is
now unpinned.
984003 W The volume working set cache mode is in the process
of changing to synchronous destage because the
volume working set has too much pinned data.
984004 I Volume working set cache mode updated to allow
asynchronous destage because enough customer data
has been unpinned for the volume working set.
984506 I The debug from an IERR was extracted to disk.
984507 I An attempt was made to power on the slots.
984508 I All the expanders on the strand were reset.
984509 I The component firmware update paused to allow the
battery charging to finish.
984511 I The update for the component firmware paused
because the system was put into maintenance mode.
984512 I A component firmware update is needed but is
prevented from running.
985001 I The Metro Mirror or Global Mirror background copy
is complete.
985002 I The Metro Mirror or Global Mirror is ready to restart.
985003 W Unable to find path to disk in the remote cluster
(system) within the timeout period.
986001 W The thin-provisioned volume copy data in a node is
pinned.
986002 I All thin-provisioned volume copy data in a node is
unpinned.
986010 I The thin-provisioned volume copy import has failed
and the new volume is offline; either upgrade the
SAN Volume Controller software to the required
version or delete the volume.
986011 I The thin-provisioned volume copy import is
successful.
986020 W A thin-provisioned volume copy space warning has
occurred.
986030 I A thin-provisioned volume copy repair has started.
986031 I A thin-provisioned volume copy repair is successful.
986032 I A thin-provisioned volume copy validation is started.
986033 I A thin-provisioned volume copy validation is
successful.
986034 I The import of the compressed-virtual volume copy
was successful.
986035 W A compressed-virtual volume copy space warning
has occurred.
986036 I A compressed-virtual volume copy repair has started.
986037 I A compressed-virtual volume copy repair is
successful.
986038 I A compressed-virtual volume copy has too many bad
blocks.
986201 I A medium error has been repaired for the mirrored
copy.
986203 W A mirror copy repair, using the validate option,
cannot complete.
986204 I A mirror disk repair is complete and no differences
are found.
986205 I A mirror disk repair is complete and the differences
are resolved.
986206 W A mirror disk repair is complete and the differences
are marked as medium errors.
986207 I The mirror disk repair has been started.
986208 W A mirror copy repair, using the set medium error
option, cannot complete.
986209 W A mirror copy repair, using the resync option, cannot
complete.
987102 W Node coldstarted.
987103 W A node power-off has been requested from the power
switch.
987104 I Additional Fibre Channel ports were connected.
987301 W The connection to a configured remote cluster
(system) has been lost.
987400 W The node unexpectedly lost power but has now been
restored to the cluster (system).
988100 W An overnight maintenance procedure has failed to
complete. Resolve any hardware and configuration
problems that you are experiencing on the cluster
(system). If the problem persists, contact your IBM
service representative for assistance.
988300 W An array MDisk is offline because it has too many
missing members.
988301 I The rebuild for an array MDisk was started.
988302 I The rebuild for an array MDisk has finished.
988304 I A RAID array has started exchanging an array
member.
988305 I A RAID array has completed exchanging an array
member.
988306 I A RAID array needs resynchronization.
989001 W A managed disk group space warning has occurred.

Configuration event IDs


Configuration event IDs are generated when configuration parameters are set.

Configuration event IDs are recorded in a separate log. They do not raise
notification types or send emails. Their error fixed flags are ignored. Table 48
provides a list of the configuration event IDs and their meanings.
Table 48. Configuration event IDs
Event ID Description
990101 Modify cluster (system) (attributes in the chcluster command)
990102 The email test completed successfully
990103 The email test failed
990105 Delete node from cluster (system) (attributes in the rmnode
command)

990106 Create host (attributes in the mkhost command)
990112 Cluster (system) configuration dumped to file (attributes from the
svcluster -x dumpconfig command)
990117 Create cluster (system) (attributes in the mkcluster command)
990118 Modify node (attributes in the chnode command)
990119 Configure set controller name
990120 Shut down node (attributes in the stopcluster command)
990128 Modify host (attributes in the chhost command)
990129 Delete node (attributes in the rmnode command)
990138 Volume modify (attributes in the chvdisk command)
990140 Volume delete (attributes in the rmvdisk command)
990144 Modify storage pool (attributes in the chmdiskgrp command)
990145 Delete storage pool (attributes in the rmmdiskgrp command)
990148 Create storage pool (attributes in the mkmdiskgrp command)
990149 Modify managed disk (attributes in the chmdisk command)
990150 Modify managed disk
990158 Managed disk included
990159 Quorum created
990160 Quorum destroy
990168 Modify the I/O group a volume is assigned to
990169 Create a new volume (attributes in the mkvdisk command)
990173 Add a managed disk to storage pool (attributes in the addmdisk
command)
990174 Delete a managed disk from storage pool (attributes in the
rmmdisk command)
990178 Add a port to a Host (attributes in the addhostport command)
990179 Delete a port from a host (attributes in the rmhostport command)
990182 Create a host mapping (attributes in the mkvdiskhostmap
command)
990183 Delete a host mapping (attributes in the rmvdiskhostmap
command)
990184 Create a FlashCopy mapping (attributes in the mkfcmap command)
990185 Modify a FlashCopy mapping (attributes in the chfcmap
command)
990186 Delete a FlashCopy mapping (attributes in the rmfcmap command)
990187 Prepare a FlashCopy mapping (attributes in the prestartfcmap
command)
990188 Prepare a FlashCopy consistency group (attributes in the
prestartfcconsistgrp command)
990189 Trigger a FlashCopy mapping (attributes in the startfcmap
command)
990190 Trigger a FlashCopy consistency group (attributes in the
startfcconsistgrp command)

990191 Stop a FlashCopy mapping (attributes in the stopfcmap
command)
990192 Stop a FlashCopy consistency group (attributes in the
stopfcconsistgrp command)
990193 FlashCopy set name
990194 Delete a list of ports from a Host (attributes in the rmhostport
command)
990196 Shrink a volume.
990197 Expand a volume (attributes in the expandvdisksize command)
990198 Volume expanded by a single extent.
990199 Modify the I/O governing rate for a volume
990203 Initiate manual managed disk discovery (attributes in the
detectmdisk command)
990204 Create FlashCopy consistency group (attributes in the
mkfcconsistgrp command)
990205 Modify FlashCopy consistency group (attributes in the
chfcconsistgrp command)
990206 Delete FlashCopy consistency group (attributes in the
rmfcconsistgrp command)
990207 Delete a list of hosts (attributes in the rmhost command)
990213 Change the I/O group a node belongs to (attributes in the
chiogrp command)
990216 Apply software upgrade (attributes in the satask
installsoftware command)
990219 Analyze event log (attributes in the finderr command)
990220 Dump event log (attributes in the satask snap command)
990222 Fix event log entry (attributes in the cherrstate command)
990223 Migrate a single extent (attributes in the migrateexts command)
990224 Migrate a number of extents
990225 Create a Metro Mirror or Global Mirror relationship
(attributes in the mkrcrelationship command)
990226 Modify a Metro Mirror or Global Mirror relationship (attributes in
the chrcrelationship command)
990227 Delete a Metro Mirror or Global Mirror relationship (attributes in
the rmrcrelationship command)
990229 Start a Metro Mirror or Global Mirror relationship (attributes in
the startrcrelationship command)
990230 Stop a Metro Mirror or Global Mirror relationship (attributes in
the stoprcrelationship command)
990231 Switch a Metro Mirror or Global Mirror relationship (attributes in
the switchrcrelationship command)
990232 Start a Metro Mirror or Global Mirror consistency group
(attributes in the startrcconsistgrp command)
990233 Stop a Metro Mirror or Global Mirror consistency group
(attributes in the stoprcconsistgrp command)

990234 Switch a Metro Mirror or Global Mirror consistency group
(attributes in the switchrcconsistgrp command)
990235 Managed disk migrated to a storage pool
990236 Volume migrated to a new managed disk
990237 Create partnership with remote cluster (system) (attributes in the
mkpartnership command)
990238 Modify partnership with remote cluster (system) (attributes in the
chpartnership command)
990239 Delete partnership with remote cluster (system) (attributes in the
rmpartnership command)
990240 Create a Metro Mirror or Global Mirror consistency group
(attributes in the mkrcconsistgrp command)
990241 Modify a Metro Mirror or Global Mirror consistency group
(attributes in the chrcconsistgrp command)
990242 Delete a Metro Mirror or Global Mirror consistency group
(attributes in the rmrcconsistgrp command)
990245 Node shutdown imminent
990246 Node remove
990247 Node unpend
990380 Time zone changed (attributes in the settimezone command)
990383 Change cluster (system) time (attributes in the setclustertime
command)
990385 System time changed
990386 SSH key added (attributes in the addsshkey command)
990387 SSH key removed (attributes in the rmsshkey command)
990388 All SSH keys removed (attributes in the rmallsshkeys command)
990390 Add node to the cluster (system)
990395 Shutdown or reset node
990410 The software installation has started.
990415 The software installation has completed.
990420 The software installation has failed.
990423 The software installation has stalled.
990425 The software installation has stopped.
990430 The Planar Serial Number has changed.
990501 The licensed feature has changed. See the license settings log for
details.
990510 The configuration limits have been changed.
991024 I/O tracing has finished and the managed disk has been
triggered.
991025 The autoexpand setting of the volume has been modified.
991026 The primary copy of the volume has been modified.
991027 The volume synchronization rate has been modified.

991028 The thin-provisioned volume warning capacity has been
modified.
991029 A mirrored copy has been added to a volume.
991030 A repair of mirrored volume copies has started.
991031 A volume copy has been split from a mirrored volume.
991032 A volume copy has been removed from a mirrored volume.

SCSI event reporting


Nodes can notify their hosts of events for SCSI commands that are issued.

SCSI status

Some events are part of the SCSI architecture and are handled by the host
application or device drivers without reporting an event. Some events, such as
read and write I/O events and events that are associated with the loss of nodes or
loss of access to backend devices, cause application I/O to fail. To help
troubleshoot these events, SCSI commands are returned with the Check Condition
status and a 32-bit event identifier is included with the sense information. The
identifier relates to a specific event in the event log.

If the host application or device driver captures and stores this information, you
can relate the application failure to the event log.

Table 49 describes the SCSI status and codes that are returned by the nodes.
Table 49. SCSI status
Status Code Description
Good 00h The command was successful.
Check condition 02h The command failed and sense data is available.
Condition met 04h N/A
Busy 08h An Auto-Contingent Allegiance condition exists
and the command specified NACA=0.
Intermediate 10h N/A
Intermediate - condition met 14h N/A
Reservation conflict 18h Returned as specified in SPC2 and SAM-2 where
a reserve or persistent reserve condition exists.
Task set full 28h The initiator has at least one task queued for that
LUN on this port.
ACA active 30h This code is reported as specified in SAM-2.
Task aborted 40h This code is returned if TAS is set in the control
mode page 0Ch. The node has a default setting of
TAS=0, which cannot be changed; therefore, the
node does not report this status.
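
As an illustration of how a host-side tool might use Table 49, the following Python sketch maps a status byte to its name. The status values and names are copied from the table; the function names describe_status and has_sense_data are hypothetical and are not part of any SAN Volume Controller interface.

```python
# Status codes and names copied from Table 49.
SCSI_STATUS = {
    0x00: "Good",
    0x02: "Check condition",
    0x04: "Condition met",
    0x08: "Busy",
    0x10: "Intermediate",
    0x14: "Intermediate - condition met",
    0x18: "Reservation conflict",
    0x28: "Task set full",
    0x30: "ACA active",
    0x40: "Task aborted",
}


def describe_status(status_byte: int) -> str:
    """Return the Table 49 name for a status byte (hypothetical helper)."""
    return SCSI_STATUS.get(status_byte, f"Unknown status {status_byte:02X}h")


def has_sense_data(status_byte: int) -> bool:
    """Per Table 49, only Check condition (02h) means sense data is available."""
    return status_byte == 0x02
```

A caller would typically check has_sense_data first and decode the sense information only when it returns True.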


SCSI Sense

Nodes notify the hosts of events on SCSI commands. Table 50 defines the SCSI
sense keys, codes and qualifiers that are returned by the nodes.
Table 50. SCSI sense keys, codes, and qualifiers
Key Code Qualifier Definition Description
2h 04h 01h
    Definition: Not Ready. The logical unit is in the process of becoming
    ready.
    Description: The node lost sight of the system and cannot perform I/O
    operations. The additional sense does not have additional information.
2h 04h 0Ch
    Definition: Not Ready. The target port is in the state of unavailable.
    Description: The following conditions are possible:
    - The node lost sight of the system and cannot perform I/O operations.
      The additional sense does not have additional information.
    - The node is in contact with the system but cannot perform I/O
      operations to the specified logical unit because of either a loss of
      connectivity to the backend controller or some algorithmic problem.
      This sense is returned for offline volumes.
3h 00h 00h
    Definition: Medium event
    Description: This is only returned for read or write I/Os. The I/O
    suffered an event at a specific LBA within its scope. The location of the
    event is reported within the sense data. The additional sense also
    includes a reason code that relates the event to the corresponding event
    log entry. For example, a RAID controller event or a migrated medium
    event.
4h 08h 00h
    Definition: Hardware event. A command to logical unit communication
    failure has occurred.
    Description: The I/O suffered an event that is associated with an I/O
    event that is returned by a RAID controller. The additional sense
    includes a reason code that points to the sense data that is returned by
    the controller. This is only returned for I/O type commands. This event
    is also returned from FlashCopy target volumes in the prepared and
    preparing state.
5h 25h 00h
    Definition: Illegal request. The logical unit is not supported.
    Description: The logical unit does not exist or is not mapped to the
    sender of the command.
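
The combinations in Table 50 can be treated as a lookup keyed on the sense key, code, and qualifier. The following Python sketch shows one way to do that. The definitions are copied from the table; the helper name define_sense is hypothetical, and the sketch assumes that the caller has already extracted the three values from the sense data.

```python
from typing import Optional

# (key, code, qualifier) -> definition, copied from Table 50.
SENSE_DEFINITIONS = {
    (0x2, 0x04, 0x01): "Not Ready. The logical unit is in the process of "
                       "becoming ready.",
    (0x2, 0x04, 0x0C): "Not Ready. The target port is in the state of "
                       "unavailable.",
    (0x3, 0x00, 0x00): "Medium event",
    (0x4, 0x08, 0x00): "Hardware event. A command to logical unit "
                       "communication failure has occurred.",
    (0x5, 0x25, 0x00): "Illegal request. The logical unit is not supported.",
}


def define_sense(key: int, code: int, qualifier: int) -> Optional[str]:
    """Return the Table 50 definition, or None if the combination is not listed."""
    return SENSE_DEFINITIONS.get((key, code, qualifier))
```

A result of None means only that the combination is not listed in Table 50; it does not imply that the sense data is invalid.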


Reason codes

The reason code appears in bytes 20-23 of the sense data. The reason code relates
the event to a specific event log entry. The field is a 32-bit unsigned number that
is presented with the most significant byte first. Table 51 lists the reason codes
and their definitions.

If the reason code is not listed in Table 51, the code refers to a specific event in the
event log that corresponds to the sequence number of the relevant event log entry.
Table 51. Reason codes
Reason code
(decimal) Description
40 The resource is part of a stopped FlashCopy mapping.
50 The resource is part of a Metro Mirror or Global Mirror relationship
and the secondary LUN is offline.
51 The resource is part of a Metro Mirror or Global Mirror relationship and the
secondary LUN is read only.
60 The node is offline.
71 The resource is not bound to any domain.
72 The resource is bound to a domain that has been recreated.
73 Running on a node that has been contracted out for some reason
that is not attributable to any path going offline.
80 Wait for the repair to complete, or delete the volume.
81 Wait for the validation to complete, or delete the volume.
82 An offline thin-provisioned volume has caused data to be pinned in
the directory cache. Adequate performance cannot be achieved for
other thin-provisioned volumes, so they have been taken offline.
85 The volume has been taken offline because checkpointing to the
quorum disk failed.
86 The repairvdiskcopy -medium command has created a virtual
medium error where the copies differed.
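
The following Python sketch shows how a host-side tool might extract and interpret the reason code. It assumes that "bytes 20-23" are zero-based offsets into the raw sense buffer that the host captured; the function names are hypothetical. The descriptions are taken from Table 51, and any value that is not in the table is reported as an event log sequence number, as described above.

```python
import struct

# Reason codes (decimal) and descriptions from Table 51.
REASON_CODES = {
    40: "The resource is part of a stopped FlashCopy mapping.",
    50: "The resource is part of a Metro Mirror or Global Mirror "
        "relationship and the secondary LUN is offline.",
    51: "The resource is part of a Metro Mirror or Global Mirror "
        "relationship and the secondary LUN is read only.",
    60: "The node is offline.",
    71: "The resource is not bound to any domain.",
    72: "The resource is bound to a domain that has been recreated.",
    73: "Running on a node that has been contracted out for some reason "
        "that is not attributable to any path going offline.",
    80: "Wait for the repair to complete, or delete the volume.",
    81: "Wait for the validation to complete, or delete the volume.",
    82: "An offline thin-provisioned volume has caused data to be pinned "
        "in the directory cache.",
    85: "The volume has been taken offline because checkpointing to the "
        "quorum disk failed.",
    86: "The repairvdiskcopy -medium command has created a virtual medium "
        "error where the copies differed.",
}


def extract_reason_code(sense: bytes) -> int:
    """Read bytes 20-23 as a 32-bit unsigned number, most significant byte first."""
    if len(sense) < 24:
        raise ValueError("sense data is too short to contain bytes 20-23")
    (reason,) = struct.unpack_from(">I", sense, 20)
    return reason


def interpret_reason_code(reason: int) -> str:
    """Return the Table 51 description, or treat the value as a sequence number."""
    if reason in REASON_CODES:
        return REASON_CODES[reason]
    return f"event log sequence number {reason}"
```

For example, a sense buffer whose bytes 20-23 are 00 00 00 3C yields reason code 60, which Table 51 defines as the node being offline.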

Object types
You can use the object code to determine the object type.

Table 52 lists the object codes and corresponding object types.


Table 52. Object types
Object code Object type
1 mdisk
2 mdiskgrp
3 vdisk
4 node
5 host
7 iogroup
8 fcgrp
9 rcgrp

10 fcmap
11 rcmap
12 wwpn
13 cluster (system)
16 device
17 SCSI lun
18 quorum
34 Fibre Channel adapter
38 VDisk copy
39 Syslog server
40 SNMP server
41 Email server
42 User group
44 Cluster (management) IP
46 SAS adapter
Fibre Channel adapter
SAS adapter
Ethernet adapter
Bus adapter
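
A script that processes event log output could resolve object codes with a simple lookup, as in the following Python sketch. The codes and types are copied from Table 52; the object types that are listed without an object code are omitted, and the helper name object_type is hypothetical.

```python
# Object codes and types copied from Table 52.
# Object types that the table lists without a code are intentionally omitted.
OBJECT_TYPES = {
    1: "mdisk",
    2: "mdiskgrp",
    3: "vdisk",
    4: "node",
    5: "host",
    7: "iogroup",
    8: "fcgrp",
    9: "rcgrp",
    10: "fcmap",
    11: "rcmap",
    12: "wwpn",
    13: "cluster (system)",
    16: "device",
    17: "SCSI lun",
    18: "quorum",
    34: "Fibre Channel adapter",
    38: "VDisk copy",
    39: "Syslog server",
    40: "SNMP server",
    41: "Email server",
    42: "User group",
    44: "Cluster (management) IP",
    46: "SAS adapter",
}


def object_type(code: int) -> str:
    """Return the object type for an object code, or 'unknown' if not listed."""
    return OBJECT_TYPES.get(code, "unknown")
```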

Error event IDs and error codes


Error codes describe a service procedure that must be followed. Each event ID that
requires service has an associated error code.

Table 53 lists the event IDs and corresponding error codes.
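
The relationship between an event ID and its error code can be expressed as a lookup, as in the following Python sketch. Only the first few rows of Table 53 are included as sample data; the helper name lookup_error_code is hypothetical. Event IDs are stored as six-digit strings so that leading zeros are preserved.

```python
from typing import Optional, Tuple, Union

# Sample rows from Table 53: event ID -> (notification type, error code).
ERROR_EVENTS = {
    "009020": ("E", 1001),
    "009040": ("E", 1002),
    "009052": ("W", 1196),
    "009053": ("E", 1195),
    "009100": ("W", 2010),
    "009101": ("W", 2010),
}


def lookup_error_code(event_id: Union[int, str]) -> Optional[Tuple[str, int]]:
    """Return (notification type, error code) for an event ID, or None."""
    # Normalize to the six-digit form that Table 53 uses.
    key = str(event_id).zfill(6)
    return ERROR_EVENTS.get(key)
```

Note that several event IDs can share one error code (for example, 009100 and 009101 both map to 2010), so the mapping cannot be reversed without returning a list.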


Table 53. Error event IDs and error codes
Event Notification Error
ID type Condition code
009020 E An automatic system recovery has started. All 1001
configuration commands are blocked.
009040 E The error event log is full. 1002
009052 W The following causes are possible: 1196
- The node is missing.
- The node is no longer a functional member of the
  system.
009053 E A node has been missing for 30 minutes. 1195
009100 W The software install process has failed. 2010
009101 W The software upgrade package delivery has failed. 2010
009150 W Unable to connect to the SMTP (email) server 2600
009151 W Unable to send mail through the SMTP (email) server 2601
009170 W The Metro Mirror or Global Mirror feature capacity is 3030
not set.

009171 W The FlashCopy feature capacity is not set. 3031
009172 W The Virtualization feature has exceeded the amount that 3032
is licensed.
009173 W The FlashCopy feature has exceeded the amount that is 3032
licensed.
009174 W The Metro Mirror or Global Mirror feature has exceeded 3032
the amount that is licensed.
009175 W The usage for the thin-provisioned volume is not 3033
licensed.
009176 W The value set for the virtualization feature capacity is not 3029
valid.
009177 E A physical disk FlashCopy feature license is required. 3035
009178 E A physical disk Metro Mirror and Global Mirror feature 3036
license is required.
009179 E A virtualization feature license is required. 3025
009180 E Automatic recovery of offline node failed. 1194
009181 W Unable to send email to any of the configured email 3081
servers.
009182 W The external virtualization feature license limit was 3032
exceeded.
009183 W Unable to connect to LDAP server. 2251
009184 W The LDAP configuration is not valid. 2250
009185 E The limit for the compression feature license was 3032
exceeded.
009186 E The limit for the compression feature license was 3032
exceeded.
010002 E The node ran out of base event sources. As a result, the 2030
node has stopped and exited the system.
010003 W The number of device logins has reduced. 1630
010006 E A software error has occurred. 2030
010008 E The block size is invalid, the capacity or LUN identity 1660
has changed during the managed disk initialization.
010010 E The managed disk is excluded because of excessive 1310
errors.
010011 E The remote port is excluded for a managed disk and 1220
node.
010012 E The local port is excluded. 1210
010013 E The login is excluded. 1230
010017 E A timeout has occurred as a result of excessive 1340
processing time.
010018 E An error recovery procedure has occurred. 1370
010019 E A managed disk I/O error has occurred. 1310
010020 E The managed disk error count threshold has been exceeded. 1310

010021 W There are too many devices presented to the cluster 1200
(system).
010022 W There are too many managed disks presented to the 1200
cluster (system).
010023 W There are too many LUNs presented to a node. 1200
010024 W There are too many drives presented to a cluster 1200
(system).
010025 W A disk I/O medium error has occurred. 1320
010026 W A suitable MDisk or drive for use as a quorum disk was 1330
not found.
010027 W The quorum disk is not available. 1335
010028 W A controller configuration is not supported. 1625
010029 E A login transport fault has occurred. 1360
010030 E A managed disk error recovery procedure (ERP) has 1370
occurred. The node or controller reported the following:
- Sense
- Key
- Code
- Qualifier
010031 E One or more MDisks on a controller are degraded. 1623
010032 W The controller configuration limits failover. 1625
010033 E The controller configuration uses the RDAC mode; this is 1624
not supported.
010034 E Persistent unsupported controller configuration. 1695
010040 E The controller system device is only connected to the 1627
node through a single initiator port.
010041 E The controller system device is only connected to the 1627
node through a single target port.
010042 E The controller system device is only connected to the 1627
cluster (system) nodes through a single target port.
010043 E The controller system device is only connected to the 1627
cluster (system) nodes through half of the expected
target ports.
010044 E The controller system device has disconnected all target 1627
ports to the cluster (system) nodes.
010055 W An unrecognized SAS device. 1665
010056 E SAS error counts exceeded the warning thresholds. 1216
010057 E SAS errors exceeded critical thresholds. 1216
010066 W Controller indicates that it does not support descriptor 1625
sense for LUNs that are greater than 2 TB.
010067 W Too many enclosures were presented to a cluster 1200
(system).
010070 W Too many controller target ports were presented to the 1200
cluster (system).

010071 W Too many target ports were presented to the cluster 1200
(system) from a single controller.
010098 W There are too many drives presented to a cluster 1200
(system).
020001 E There are too many medium errors on the managed disk. 1610
020002 E A managed disk group is offline. 1620
020003 W There are insufficient virtual extents. 2030
029001 W The managed disk has bad blocks. 1840
029002 E The system failed to create a bad block because MDisk 1226
already has the maximum number of allowed bad
blocks.
029003 E The system failed to create a bad block because the 1225
clustered system already has the maximum number of
allowed bad blocks.
030000 W The trigger prepare command has failed because of a 1900
cache flush failure.
030010 W The mapping is stopped because of the error that is 1910
indicated in the data.
030020 W The mapping is stopped because of a clustered system or 1895
complete I/O group failure, and the current state of the
mapping could not be recovered.
050001 W The relationship is stopped because of a clustered system 1700
or complete I/O group failure, and the current state of
the relationship could not be recovered.
050002 W A Metro Mirror or Global Mirror relationship or 3080
consistency group exists within a clustered system, but
its partnership has been deleted.
050010 W A Global Mirror relationship has stopped because of a 1920
persistent I/O error.
050011 W A remote copy has stopped because of a persistent I/O 1915
error.
050020 W Remote copy has stopped. 1720
050030 W There are too many cluster (system) partnerships. The 1710
number of partnerships has been reduced.
050031 W There are too many cluster (system) partnerships. The 1710
system has been excluded.
050040 W Background copy process for the Remote Copy was 1960
blocked.
060001 W The thin-provisioned volume copy is offline because 1865
there is insufficient space.
060002 W The thin-provisioned volume copy is offline because the 1862
metadata is corrupt.
060003 W The thin-provisioned volume copy is offline because the 1860
repair has failed.
060004 W The compressed volume copy is offline because there is 1865
insufficient space.

060005 W The compressed volume copy is offline because the 1862
metadata is corrupt.
060006 W The compressed volume copy is offline because the 1860
repair has failed.
060007 W The compressed volume copy has bad blocks. 1850
062001 W Unable to mirror medium error during volume copy 1950
synchronization
062002 W The mirrored volume is offline because the data cannot 1870
be synchronized.
062003 W The repair process for the mirrored disk has stopped 1600
because there is a difference between the copies.
070000 E Unrecognized node error. 1083
070510 E Detected memory size does not match the expected 1022
memory size.
070517 E The WWNN that is stored on the service controller and 1192
the WWNN that is stored on the drive do not match.
070521 E Unable to detect any Fibre Channel adapter. 1016
070522 E The system board processor has failed. 1020
070523 W The internal disk file system of the node is damaged. 1187
070524 E Unable to update BIOS settings. 1027
070525 E Unable to update the service processor firmware for the 1020
system board.
070528 W The ambient temperature is too high while the system is 1182
starting.
070550 E Cannot form cluster (system) due to lack of resources. 1192
070556 E Duplicate WWNN detected on the SAN. 1192
070558 E A node is unable to communicate with other nodes. 1192
070562 E The node hardware does not meet minimum 1183
requirements.
070564 E Too many software failures. 1188
070565 E The internal drive of the node is failing. 1030
070574 E The node software is damaged. 1187
070576 E The cluster (system) data cannot be read. 1030
070578 E The cluster (system) data was not saved when power 1194
was lost.
070580 E Unable to read the service controller ID. 1044
070581 E 2145 UPS-1U serial link error. 1181
070582 E 2145 UPS-1U battery error. 1181
070583 E 2145 UPS-1U electronics error. 1171
070584 E 2145 UPS-1U overloaded. 1166
070585 E 2145 UPS-1U failure 1171
070586 E Power supply to 2145 UPS-1U does not meet 1141
requirements.

070587 E Incorrect type of uninterruptible power supply detected. 1152
070588 E 2145 UPS-1U is not cabled correctly. 1151
070589 E The ambient temperature limit for the 2145 UPS-1U was 1136
exceeded.
070590 E Repeated node restarts because of 2145 UPS-1U errors. 1186
070670 W Insufficient uninterruptible power supply charge to allow 1193
node to start.
070690 W Node held in service state. 1189
070710 E High-speed SAS adapter is missing. This error applies to 1120
only the SAN Volume Controller 2145-CG8 model.
070720 E Ethernet adapter is missing. This error applies to only 1072
the SAN Volume Controller 2145-CG8 model.
070840 W Detected hardware is not a valid configuration. 1198
070841 W Detected hardware needs activation. 1199
072004 E A CMOS battery failure has occurred. This error applies 1670
to the SAN Volume Controller 2145-8F2 and the SAN
Volume Controller 2145-8F4 models.
072005 E A CMOS battery failure has occurred. This error applies 1670
to only the SAN Volume Controller 2145-8G4 model.
072006 E A CMOS battery failure has occurred. This error applies 1670
to only the SAN Volume Controller 2145-8A4 model.
072007 E A CMOS battery failure has occurred. This error applies 1670
to the SAN Volume Controller 2145-CF8 and the SAN
Volume Controller 2145-CG8 models.
073003 E The Fibre Channel ports are not operational. 1060
073005 E Cluster (system) path failure. 1550
073006 W The SAN is not correctly zoned. As a result, more than 1800
512 ports on the SAN have logged into one SAN Volume
Controller port.
073101 E The 2-port Fibre Channel adapter card in slot 1 is 1014
missing. This error applies to only the SAN Volume
Controller 2145-8F2 model.
073102 E The 2-port Fibre Channel adapter in slot 1 has failed. 1054
This error applies to only the SAN Volume Controller
2145-8F2 model.
073104 E The 2-port Fibre Channel adapter in slot 1 has detected a 1017
PCI bus error. This error applies to only the SAN Volume
Controller 2145-8F2 model.
073201 E The 2-port Fibre Channel adapter in slot 2 is missing. 1015
This error applies to only the SAN Volume Controller
2145-8F2 model.
073202 E The 2-port Fibre Channel adapter in slot 2 has failed. 1056
This error applies to only the SAN Volume Controller
2145-8F2 model.
073204 E The 2-port Fibre Channel adapter in slot 2 has detected a 1018
PCI bus error. This error applies to only the SAN Volume
Controller 2145-8F2 model.

073251 E The 4-port Fibre Channel adapter in slot 1 is missing. 1011
This error applies to only the SAN Volume Controller
2145-8G4 model.
073252 E The 4-port Fibre Channel adapter in slot 1 has failed. 1055
This error applies to only the SAN Volume Controller
2145-8G4 model.
073258 E The 4-port Fibre Channel adapter in slot 1 has detected a 1013
PCI bus error. This error applies to only the SAN Volume
Controller 2145-8G4 model.
073261 E The 4-port Fibre Channel adapter in slot 1 is missing. 1011
This error applies to only the SAN Volume Controller
2145-8A4 model.
073262 E The 4-port Fibre Channel adapter in slot 1 has failed. 1055
This error applies to only the SAN Volume Controller
2145-8A4 model.
073268 E The 4-port Fibre Channel adapter in slot 1 has detected a 1013
PCI bus error. This error applies to only the SAN Volume
Controller 2145-8A4 model.
073271 E The 4-port Fibre Channel adapter in slot 1 is missing. 1011
This error applies to the SAN Volume Controller
2145-CF8 and the SAN Volume Controller 2145-CG8
models.
073272 E The 4-port Fibre Channel adapter in slot 1 has failed. 1055
This error applies to the SAN Volume Controller
2145-CF8 and the SAN Volume Controller 2145-CG8
models.
073278 E The 4-port Fibre Channel adapter in slot 1 has detected a 1013
PCI bus error. This error applies to the SAN Volume
Controller 2145-CF8 and the SAN Volume Controller
2145-CG8 models.
073301 E The 4-port Fibre Channel adapter in slot 2 is missing. 1016
This error applies to only the SAN Volume Controller
2145-8F4 model.
073302 E The 4-port Fibre Channel adapter in slot 2 has failed. 1057
This error applies to only the SAN Volume Controller
2145-8F4 model.
073304 E The 4-port Fibre Channel adapter in slot 2 has detected a 1019
PCI bus error. This error applies to only the SAN Volume
Controller 2145-8F4 model.
073305 W One or more Fibre Channel ports are running at a speed 1065
that is lower than the last saved speed.
073310 E A duplicate Fibre Channel frame has been detected, 1203
which indicates that there is an issue with the Fibre
Channel fabric. Other Fibre Channel errors might also be
generated.
074001 W Unable to determine the vital product data (VPD) for an 2040
FRU. This is probably because a new FRU has been
installed and the software does not recognize that FRU.
The cluster (system) continues to operate; however, you
must upgrade the software to fix this warning.

074002 E The node warm started after a software error. 2030
074003 W A connection to a configured remote system has been 1715
lost because of a connectivity problem.
074004 W A connection to a configured remote system has been 1716
lost because of too many minor errors.
075001 E The flash boot device has failed. This error applies to the 1040
SAN Volume Controller 2145-8F2 and the SAN Volume
Controller 2145-8F4 models.
075002 E The flash boot device has recovered. This error applies to 1040
the SAN Volume Controller 2145-8F2 and the SAN
Volume Controller 2145-8F4 models.
075005 E A service controller read failure has occurred. This error 1044
applies to the SAN Volume Controller 2145-8F2 and the
SAN Volume Controller 2145-8F4 models.
075011 E The flash boot device has failed. This error applies to 1040
only the SAN Volume Controller 2145-8G4 model.
075012 E The flash boot device has recovered. This error applies to 1040
only the SAN Volume Controller 2145-8G4 model.
075015 E A service controller read failure has occurred. This error 1044
applies to only the SAN Volume Controller 2145-8G4
model.
075021 E The flash boot device has failed. This error applies to 1040
only the SAN Volume Controller 2145-8A4 model.
075022 E The flash boot device has recovered. This error applies to 1040
only the SAN Volume Controller 2145-8A4 model.
075025 E A service controller read failure has occurred. This error 1044
applies to only the SAN Volume Controller 2145-8A4
model.
075031 E The flash boot device has failed. This error applies to the 1040
SAN Volume Controller 2145-CF8 and the SAN Volume
Controller 2145-CG8 models.
075032 E The flash boot device has recovered. This error applies to 1040
the SAN Volume Controller 2145-CF8 and the SAN
Volume Controller 2145-CG8 models.
075035 E A service controller read failure has occurred. This error 1044
applies to only the SAN Volume Controller 2145-CF8 and
the SAN Volume Controller 2145-CG8 models.
076001 E The internal disk for a node has failed. 1030
076002 E The hard disk is full and cannot capture any more 2030
output.
076401 E One of the two power supply units in the node has 1096
failed.
076402 E One of the two power supply units in the node cannot 1096
be detected.
076403 E One of the two power supply units in the node is 1097
without power.

076501 E A high-speed SAS adapter is missing. This error applies 1120
to only the SAN Volume Controller 2145-CF8 model.
076502 E Degraded PCIe lanes on a high-speed SAS adapter. 1121
076503 E A PCI bus error occurred on a high-speed SAS adapter. 1121
076504 E A high-speed SAS adapter requires a PCI bus reset. 1122
076505 E Vital product data (VPD) is corrupt on high-speed SAS 1121
adapter.
077101 E The service processor shows a fan 40×40×28 failure. This 1090
error applies to both the SAN Volume Controller
2145-8F2 and the SAN Volume Controller 2145-8F4
models.
077102 E The service processor shows a fan 40×40×56 failure. This 1091
error applies to both the SAN Volume Controller
2145-8F2 and the SAN Volume Controller 2145-8F4
models.
077105 E The service processor shows a fan failure. This error 1089
applies to only the SAN Volume Controller 2145-8G4
model.
077106 E The service processor shows a fan failure. This error 1089
applies to only the SAN Volume Controller 2145-8A4
model.
077107 E The service processor shows a fan failure. This error 1089
applies to the SAN Volume Controller 2145-CF8 and the
SAN Volume Controller 2145-CG8 models.
077111 E The node ambient temperature threshold has been exceeded. 1094
This error applies to both the SAN Volume Controller
2145-8F2 and the SAN Volume Controller 2145-8F4
models.
077112 E The node processor warning temperature threshold has 1093
been exceeded. This error applies to both the SAN Volume
Controller 2145-8F2 and the SAN Volume Controller
2145-8F4 models.
077113 E The node processor or ambient critical threshold has 1092
been exceeded. This error applies to both the SAN Volume
Controller 2145-8F2 and the SAN Volume Controller
2145-8F4 models.
077121 E System board - any voltage high. This error applies to 1100
both the SAN Volume Controller 2145-8F2 and the SAN
Volume Controller 2145-8F4 models.
077124 E System board - any voltage low. This error applies to 1105
both the SAN Volume Controller 2145-8F2 and the SAN
Volume Controller 2145-8F4 models.
077128 E A power management board voltage failure has 1110
occurred. This error applies to both the SAN Volume
Controller 2145-8F2 and the SAN Volume Controller
2145-8F4 models.
077161 E The node ambient temperature threshold has been exceeded. 1094
This error applies to only the SAN Volume Controller
2145-8G4 model.

077162 E The node processor warning temperature threshold has 1093
been exceeded. This error applies to only the SAN Volume
Controller 2145-8G4 model.
077163 E The node processor or ambient critical threshold has 1092
been exceeded. This error applies to only the SAN Volume
Controller 2145-8G4 model.
077165 E The node ambient temperature threshold has been exceeded. 1094
This error applies to only the SAN Volume Controller
2145-8A4 model.
077166 E The node processor warning temperature threshold has 1093
been exceeded. This error applies to only the SAN Volume
Controller 2145-8A4 model.
077167 E The node processor or ambient critical threshold has 1092
been exceeded. This error applies to only the SAN Volume
Controller 2145-8A4 model.
077171 E System board - any voltage high. This error applies to 1101
only the SAN Volume Controller 2145-8G4 model.
077172 E System board - any voltage high. This error applies to 1101
only the SAN Volume Controller 2145-8A4 model.
077173 E System board - any voltage high. This error applies to 1101
the SAN Volume Controller 2145-CF8 and the SAN
Volume Controller 2145-CG8 models.
077174 E System board - any voltage low. This error applies to 1106
only the SAN Volume Controller 2145-8G4 model.
077175 E System board - any voltage low. This error applies to 1106
only the SAN Volume Controller 2145-8A4 model.
077176 E System board - any voltage low. This error applies to 1106
only the SAN Volume Controller 2145-CF8 model.
077178 E A power management board voltage failure has 1110
occurred. This error applies to only the SAN Volume
Controller 2145-8G4 model.
077185 E The node ambient temperature threshold has been exceeded. 1094
This error applies to the SAN Volume Controller
2145-CF8 and the SAN Volume Controller 2145-CG8
models.
077186 E The node processor warning temperature threshold has 1093
been exceeded. This error applies to the SAN Volume
Controller 2145-CF8 and the SAN Volume Controller
2145-CG8 models.
077187 E The node processor or ambient critical threshold has 1092
been exceeded. This error applies to the SAN Volume
Controller 2145-CF8 and the SAN Volume Controller
2145-CG8 models.
077188 E A power management board voltage failure has 1110
occurred. This error applies to the SAN Volume
Controller 2145-CF8 and the SAN Volume Controller
2145-CG8 models.

078001 E A power domain error has occurred. Both nodes in a 1155
pair are powered by the same uninterruptible power
supply.
079500 W The limit on the number of cluster (system) secure shell 2500
(SSH) sessions has been reached.
079501 I Unable to access the Network Time Protocol (NTP) 2700
network time server.
081001 E An Ethernet port failure has occurred. 1400
082001 E A server error has occurred. 2100
083101 E An uninterruptible power supply communications failure 1146
has occurred. The RS232 connection between a node and
its uninterruptible power supply is faulty. This error
applies to only the 2145 UPS-1U model.
083102 E The uninterruptible power supply output is 1166
unexpectedly high. The uninterruptible power supply is
probably connected to a non-SAN Volume Controller
load. This error applies to only the 2145 UPS-1U model.
083103 E The uninterruptible power supply battery has reached 1191
end of life. This error applies to only the 2145 UPS-1U
model.
083104 E An uninterruptible power supply battery failure has 1181
occurred. This error applies to only the 2145 UPS-1U
model.
083105 E An uninterruptible power supply electronics failure has 1171
occurred. This error applies to only the 2145 UPS-1U
model.
083107 E Uninterruptible power supply overcurrent. This error 1161
applies to only the 2145 UPS-1U model.
083108 E An uninterruptible power supply failure has occurred. 1186
This error applies to only the 2145 UPS-1U model.
083109 E Uninterruptible power supply ac input power fault. This 1141
error applies to only the 2145 UPS-1U model.
083110 E An uninterruptible power supply configuration error has 1151
occurred. This error applies to only the 2145 UPS-1U
model.
083111 E Uninterruptible power supply ambient over temperature. 1136
This error applies to only the 2145 UPS-1U model.
083112 E Uninterruptible power supply over temperature warning. 3001
This error applies to only the 2145 UPS-1U model.
083113 E An uninterruptible power supply software error has 3011
occurred. This error applies to only the 2145 UPS-1U
model.
084000 W An array MDisk has deconfigured members and has lost 1689
redundancy.
084100 W An array MDisk is corrupt because of lost metadata. 1240
084200 W An array MDisk has taken a spare member that is not an 1692
exact match to the array goals.

084201 W An array has members that are located in a different I/O 1688
group.
084300 W An array MDisk is no longer protected by an appropriate 1690
number of suitable spares.
084500 W An array MDisk is offline. The metadata for the inflight 1243
writes is on a missing node.
084600 W An array MDisk is offline. Metadata on the missing node 1243
contains needed state information.
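
The mapping from event ID to error code in Table 53 lends itself to a simple lookup. The following is a minimal sketch, not part of the product: the names `EventEntry`, `EVENT_TABLE`, and `error_code_for_event` are hypothetical, and only a handful of rows from the table are copied in for illustration.

```python
from typing import NamedTuple, Optional


class EventEntry(NamedTuple):
    notification_type: str  # letter from the "Notification type" column
    error_code: int         # value from the "Error code" column


# A handful of rows copied from Table 53; a complete table would list them all.
EVENT_TABLE = {
    "076001": EventEntry("E", 1030),  # internal disk for a node has failed
    "076401": EventEntry("E", 1096),  # one power supply unit has failed
    "076402": EventEntry("E", 1096),  # one power supply unit cannot be detected
    "076403": EventEntry("E", 1097),  # one power supply unit is without power
    "079500": EventEntry("W", 2500),  # SSH session limit reached
    "079501": EventEntry("I", 2700),  # NTP network time server not accessible
    "084100": EventEntry("W", 1240),  # array MDisk corrupt, lost metadata
}


def error_code_for_event(event_id: str) -> Optional[int]:
    """Return the error code for an event ID, or None if it is not in the table."""
    # Event IDs are six digits; restore leading zeros that a tool may have dropped.
    key = event_id.strip().zfill(6)
    entry = EVENT_TABLE.get(key)
    return entry.error_code if entry is not None else None
```

Keying the table by the six-digit string rather than by an integer keeps the leading zero that the event IDs are printed with.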

Determining a hardware boot failure


During the hardware boot, you see progress messages. If the boot detects a
situation where it cannot continue, it fails. The cause might be that the software on
the hard disk drive is missing or damaged. If possible, the boot sequence loads
and starts the SAN Volume Controller software. Any faults that are detected are
reported as a node error.

Before you begin

Line 1 of the front panel displays the message Booting that is followed by the boot
code. Line 2 of the display shows a boot progress indicator. If the boot code detects
an error that makes it impossible to continue, Failed is displayed. You can use the
code to isolate the fault.

The following figure shows an example of a hardware boot display.

Failed 120

Figure 66. Example of a boot error code

About this task

Perform the following steps to determine a boot failure:

Procedure
1. Attempt to restore the software by using the node rescue procedure.
2. If node rescue fails, perform the actions that are described for any failing node
rescue code or procedure.

Boot code reference


Boot codes are displayed on the screen when a node is booting.

The codes indicate the progress of the boot operation. Line 1 of the front panel
displays the message Booting that is followed by the boot code. Line 2 of the
display shows a boot progress indicator. Figure 67 provides a view of the boot
progress display.


Booting 130

Figure 67. Example of a boot progress display

Node error code overview


Node error codes describe failures that relate to a specific node. Node rescue codes
are displayed on the menu screen during node rescue.

Because node errors are specific to a node (for example, a memory failure), they
are reported only on that node.

Each code indicates that a critical error was detected that prevents the node from
becoming a member of a clustered system. Line 1 of the menu screen contains the
message Node Error.

Line 2 contains either the error code or the error code and additional data. In
errors that involve a node with more than one power supply, the error code is
followed by two numbers. The first number indicates the power supply that has a
problem (either a 1 or a 2). The second number indicates the problem that has been
detected.
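
As an illustration of the two-number form, the sketch below decodes Line 2 for node error 530, using the power supply and reason numbers that this guide lists for that error. It assumes that the error code and the two numbers are separated by white space, which this guide does not state explicitly; the function name `describe_power_supply_error` is hypothetical.

```python
# Reason numbers that this guide documents for node error 530.
REASONS_530 = {
    1: "power supply is not detected",
    2: "power supply has failed",
    3: "no input power to the power supply",
}


def describe_power_supply_error(line2: str) -> str:
    """Decode Line 2 of a node error 530 display, for example '530 2 3'."""
    # Assumption: fields are separated by white space.
    fields = line2.split()
    if len(fields) != 3 or fields[0] != "530":
        raise ValueError(f"not a node error 530 with two numbers: {line2!r}")
    supply, reason = int(fields[1]), int(fields[2])
    # The first number is 1 or 2; the second number is 1, 2, or 3.
    if supply not in (1, 2) or reason not in REASONS_530:
        raise ValueError(f"unexpected additional data: {line2!r}")
    return f"Power supply {supply}: {REASONS_530[reason]}"
```

The function deliberately rejects other error codes, because additional data has a different meaning for each code (for example, memory sizes for node error 510).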

Figure 68 provides an example of a node error code. The additional data might
exceed the maximum width of the menu screen. Press the right button to scroll
the display.

Figure 68. Example of a displayed node error code

The additional data is unique for any error code. It provides necessary information
that enables you to isolate the problem in an offline environment. Examples of
additional data are disk serial numbers and field replaceable unit (FRU) location
codes. When these codes are displayed, you can do additional fault isolation by
navigating the default menu to determine the node and Fibre Channel port status.

There are two types of node errors: critical node errors and noncritical node errors.

Critical errors

A critical error means that the node is not able to participate in a clustered system
until the issue that is preventing it from joining a clustered system is resolved. This
error occurs because part of the hardware has failed or the system detects that the
software is corrupt. If a node has a critical node error, it is in service state, and the
fault LED on the node is on. The exception is when the node cannot connect to
enough resources to form a clustered system. It shows a critical node error but is
in the starting state. Resolve the errors in priority order. Error codes 500 - 699
are reserved for critical errors.


Noncritical errors

A noncritical error code is logged when there is a hardware or software failure that
is related to just one specific node. These errors do not stop the node from entering
active state and joining a clustered system. If the node is part of a clustered
system, there is also an alert that describes the error condition. Error codes
800 - 899 are reserved for noncritical errors.

Node rescue codes

To start node rescue, press and hold the left and right buttons on the front panel
during a power-on cycle. The menu screen displays the Node rescue request. See
the node rescue request topic. The hard disk is formatted and, if the format
completes without error, the software image is downloaded from any available
node. During node rescue, Line 1 of the menu screen displays the message
Booting followed by one of the node rescue codes. Line 2 of the menu screen
displays a boot progress indicator. Figure 69 shows an example of a displayed
node rescue code.

Booting 300

Figure 69. Example of a node-rescue error code

The three-digit code that is shown in Figure 69 represents a node rescue code.

Note: The 2145 UPS-1U will not power off following a node rescue failure.

Clustered-system code overview


The error codes for creating a clustered system are displayed on the menu screen
when you are using the front panel to create a new system, but the create
operation fails. Recovery codes for clustered systems indicate that a critical
software error has occurred that might corrupt your system. Error codes for
clustered systems describe errors other than creation and recovery errors. Each
error-code topic includes an error code number, a description, action, and possible
field-replaceable units (FRUs).

Error codes for creating a clustered system

Figure 70 provides an example of a create error code.

Figure 70. Example of a create error code for a clustered system

Line 1 of the menu screen contains the message Create Failed. Line 2 shows the
error code and, where necessary, additional data.

Error codes for recovering a clustered system

You must perform software problem analysis before you can perform further
operations to avoid the possibility of corrupting your configuration.

Figure 71 provides an example of a recovery error code.

Figure 71. Example of a recovery error code

Error codes for clustered systems

Error codes for clustered systems describe errors other than recovery errors.

Figure 72 provides an example of a clustered-system error code.


Figure 72. Example of an error code for a clustered system

Error code range


Table 54 lists the number range for each message classification.


Table 54. Message classification number range
Message classification                            Range
Booting codes                                     100-299
Node errors: Node rescue errors                   300-399
Node errors: Log-only node errors                 400-499
Node errors: Critical node errors                 500-699
Node errors: Noncritical node errors              800-899
Error codes when creating a clustered system      700, 710
Error codes when recovering a clustered system    920, 990
Error codes for a clustered system                1001-3081
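
The ranges in Table 54 can be applied mechanically to decide which kind of message a displayed number belongs to. The following is a minimal sketch under that reading of the table; the function name `classify_code` and the label strings are hypothetical, and codes that fall outside every listed range are reported as unclassified rather than guessed.

```python
# Inclusive ranges taken from Table 54.
RANGES = [
    (100, 299, "Booting code"),
    (300, 399, "Node rescue error"),
    (400, 499, "Log-only node error"),
    (500, 699, "Critical node error"),
    (800, 899, "Noncritical node error"),
    (1001, 3081, "Error code for a clustered system"),
]

# Individual codes that Table 54 lists instead of a range.
CREATE_CODES = {700, 710}
RECOVERY_CODES = {920, 990}


def classify_code(code: int) -> str:
    """Return the Table 54 message classification for a numeric code."""
    if code in CREATE_CODES:
        return "Error code when creating a clustered system"
    if code in RECOVERY_CODES:
        return "Error code when recovering a clustered system"
    for low, high, name in RANGES:
        if low <= code <= high:
            return name
    return "Unclassified"
```

For example, `classify_code(530)` returns "Critical node error" and `classify_code(1096)` returns "Error code for a clustered system".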

Booting codes

100 Boot is running
Explanation: The SAN Volume Controller node has started. It is running
diagnostics and loading the runtime code.
User response: Go to the hardware boot MAP to resolve the problem.
Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
v Service controller (47%)
v Service controller cable (47%)
v System board assembly (6%)

2145-8G4 or 2145-8A4
v Service controller (95%)
v System board (5%)

2145-8F2 or 2145-8F4
v Service controller (95%)
v Frame assembly (5%)

120 Disk drive hardware error
Explanation: The internal disk drive of the node has reported an error. The
node is unable to start.
User response: Ensure that the boot disk drive and all related cabling is
properly connected, then exchange the FRU for a new FRU. (See “Possible
Cause-FRUs or other.”)
Possible Cause-FRUs or other:

2145-CF8 or 2145-CG8
v Disk drive (50%)
v Disk controller (30%)
v Disk backplane (10%)
v Disk signal cable (8%)
v Disk power cable (1%)
v System board (1%)

2145-8G4 or 2145-8A4
v Disk drive assembly (95%)
v Disk cable assembly (4%)
v System board (1%)

2145-8F2 or 2145-8F4
v Disk drive assembly (98%)
v Frame assembly (2%)

130 Checking the internal disk file system
Explanation: The file system on the internal disk drive of the node is being
checked for inconsistencies.
User response: If the progress bar has been stopped for at least five minutes,
power off the node and then power on the node. If the boot process stops again
at this point, run the node rescue procedure.
Possible Cause-FRUs or other:
v None.

132 Updating BIOS settings of the node
Explanation: The system has found that changes are required to the BIOS
settings of the node. These changes are being made. The node will restart once
the changes are complete.
User response: If the progress bar has stopped for more than 10 minutes, or if
the display has shown codes 100 and 132 three times or more, go to MAP 5900:
Hardware boot to resolve the problem.

135 Verifying the software
Explanation: The software packages of the node are being checked for
integrity.
User response: Allow the verification process to complete.

137 Updating system board service processor firmware
Explanation: The service processor firmware of the node is being updated to a
new level. This process can take 90 minutes. Do not restart the node while this
is in progress.
User response: Allow the updating process to complete.

150 Loading cluster code
Explanation: The SAN Volume Controller code is being loaded.
User response: If the progress bar has been stopped for at least 90 seconds,
power off the node and then power on the node. If the boot process stops again
at this point, run the node rescue procedure.
Possible Cause-FRUs or other:
v None.

155 Loading cluster data
Explanation: The saved cluster state and cache data is being loaded.
User response: If the progress bar has been stopped for at least 5 minutes,
power off the node and then power on the node. If the boot process stops again
at this point, run the node rescue procedure.
Possible Cause-FRUs or other:
v None.

160 Updating the service controller
Explanation: The firmware on the service controller is being updated. This can
take 30 minutes.
User response: When a node rescue is occurring, if the progress bar has been
stopped for at least 30 minutes, exchange the FRU for a new FRU. When a node
rescue is not occurring, if the progress bar has been stopped for at least 15
minutes, exchange the FRU for a new FRU.
Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
v Service controller (95%)
v Service controller cable (5%)

All previous 2145 models
v Service controller (100%)

170 A flash module hardware error has occurred.
Explanation: A flash module hardware error has occurred.
User response: Exchange the FRU for a new FRU.
Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
v Service controller (95%)
v Service controller cable (5%)

All previous 2145 models
v Service controller (100%)

182 Checking uninterruptible power supply
Explanation: The node is checking whether the uninterruptible power supply is
operating correctly.
User response: Allow the checking process to complete.

232 Checking uninterruptible power supply connections
Explanation: The node is checking whether the power and signal cable
connections to the uninterruptible power supply are correct.
User response: Allow the checking process to complete.
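
Several booting codes tell you to act only after the progress bar has been stopped for a stated time. The sketch below collects those times from the user responses for codes 130, 132, 150, 155, and 160 so that a stalled boot can be checked consistently. It is an illustration only; the names `STALL_LIMITS` and `boot_stalled` are hypothetical, and how the stopped time is measured is left to the caller.

```python
from datetime import timedelta

# (limit, inclusive) per booting code. "At least N" is inclusive;
# "more than N" (code 132) is exclusive.
STALL_LIMITS = {
    130: (timedelta(minutes=5), True),
    132: (timedelta(minutes=10), False),
    150: (timedelta(seconds=90), True),
    155: (timedelta(minutes=5), True),
}


def boot_stalled(code: int, stopped_for: timedelta, node_rescue: bool = False) -> bool:
    """Return True if the progress bar has been stopped long enough to act on."""
    if code == 160:
        # Code 160 allows 30 minutes during node rescue, otherwise 15 minutes.
        limit, inclusive = timedelta(minutes=30 if node_rescue else 15), True
    elif code in STALL_LIMITS:
        limit, inclusive = STALL_LIMITS[code]
    else:
        # No stall limit is documented for this code; let the process complete.
        return False
    return stopped_for >= limit if inclusive else stopped_for > limit
```

Codes such as 135, 137, 182, and 232 are intentionally absent from the table because their user response is to allow the process to complete.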

Create cluster errors


870 The cluster cannot be created because the counter maximum has been
reached.
Explanation: Each time a node creates a new cluster, a unique ID is generated
by the service controller of the node. Once 255 clusters have been created, the
service controller must be replaced.
User response: Use a different node to create the cluster.

871 The cluster cannot be created because the counter increment failed.
Explanation: When a new cluster ID is requested from the service controller,
the service controller must increase the ID counter. The new ID is returned for
verification. If the ID counter has not been increased, this error code is
displayed. This error has occurred because the service controller failed.
User response: Exchange the FRU for a new FRU.

Node errors
300 The 2145 is running node rescue.
Explanation: The 2145 is running node rescue.
User response: If the progress bar has been stopped for at least two minutes,
exchange the FRU for a new FRU.
Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
v Service controller (95%)
v Service controller cable (5%)

2145-8F2 or 2145-8F4 or 2145-8G4 or 2145-8A4
v Service controller (100%)

310 The 2145 is running a format operation.
Explanation: The 2145 is running a format operation.
User response: If the progress bar has been stopped for two minutes, exchange
the FRU for a new FRU.
Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
v Disk drive (50%)
v Disk controller (30%)
v Disk backplane (10%)
v Disk signal cable (8%)
v Disk power cable (1%)
v System board (1%)

2145-8G4 or 2145-8A4
v Disk drive assembly (90%)
v Disk cable assembly (10%)

2145-8F2 or 2145-8F4
v Disk drive assembly (100%)

320 A 2145 format operation has failed.
Explanation: A 2145 format operation has failed.
User response: Exchange the FRU for a new FRU.
Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
v Disk drive (50%)
v Disk controller (30%)
v Disk backplane (10%)
v Disk signal cable (8%)
v Disk power cable (1%)
v System board (1%)

2145-8G4 or 2145-8A4
v Disk drive assembly (90%)
v Disk cable assembly (10%)

2145-8F2 or 2145-8F4
v Disk drive assembly (95%)
v Frame assembly (5%)

330 The 2145 is partitioning its disk drive.
Explanation: The 2145 is partitioning its disk drive.
User response: If the progress bar has been stopped for two minutes, exchange
the FRU for a new FRU.
Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
v Disk drive (50%)
v Disk controller (30%)
v Disk backplane (10%)
v Disk signal cable (8%)
v Disk power cable (1%)
v System board (1%)

2145-8G4 or 2145-8A4
v Disk drive assembly (90%)
v Disk cable assembly (10%)

2145-8F2 or 2145-8F4
v Disk drive assembly (95%)
v Frame assembly (5%)

Other:
v Configuration problem
v Software error

340 The 2145 is searching for donor node.
Explanation: The 2145 is searching for donor node.
User response: If the progress bar has been stopped for more than two minutes,
exchange the FRU for a new FRU.
Possible Cause-FRUs or other:
v Fibre Channel adapter (100%)

345 The 2145 is searching for a donor node from which to copy the software.
Explanation: The node is searching at 1 Gb/s for a donor node.
User response: If the progress bar has stopped for more than two minutes,
exchange the FRU for a new FRU.
Possible Cause-FRUs or other:
v Fibre Channel adapter (100%)

350 The 2145 cannot find a donor node.
Explanation: The 2145 cannot find a donor node.
User response: If the progress bar has stopped for more than two minutes,
perform the following steps:
1. Ensure that all of the Fibre Channel cables are connected correctly and
   securely to the cluster.
2. Ensure that at least one other node is operational, is connected to the
   same Fibre Channel network, and is a donor node candidate. A node is a donor
   node candidate if the version of software that is installed on that node
   supports the model type of the node that is being rescued.
3. Ensure that the Fibre Channel zoning allows a connection between the node
   that is being rescued and the donor node candidate.
4. Perform the problem determination procedures for the network.
Possible Cause-FRUs or other:
v None

Other:
v Fibre Channel network problem

360 The 2145 is loading software from the donor.
Explanation: The 2145 is loading software from the donor.
User response: If the progress bar has been stopped for at least two minutes,
restart the node rescue procedure.
Possible Cause-FRUs or other:
v None

365 Cannot load SW from donor
Explanation: None.
User response: None.

370 Installing software
Explanation: The 2145 is installing software.
User response:
1. If this code is displayed and the progress bar has been stopped for at least
   ten minutes, the software install process has failed with an unexpected
   software error.
2. Power off the 2145 and wait for 60 seconds.
3. Power on the 2145. The software upgrade operation continues.
4. Report this problem immediately to your Software Support Center.
Possible Cause-FRUs or other:
v None

510 The detected memory size does not match the expected memory size.
Explanation: The detected memory size, in MB, is the first number following
the error code. The expected memory size for the cluster is the second number
following the error code. This problem might have occurred because a memory
module has failed or because failing memory modules were exchanged and the
wrong size modules were installed.
User response: Check the memory size of another 2145 that is in the same
cluster. For the 2145-8F2, 2145-8F4, 2145-8G4, 2145-8A4, 2145-CF8, and
2145-CG8, if you have just replaced a memory module, check that the module that
you have installed is the correct size, then go to the light path MAP to
isolate any possible failed memory modules.
Possible Cause-FRUs or other:
v Memory module (100%)

511 Memory bank 1 of the 2145 is failing.
Explanation: Memory bank 1 of the 2145 is failing.
User response: For the 2145-8F2, 2145-8F4, 2145-8G4 and 2145-8A4, go to the
light path MAP to resolve this problem.
Possible Cause-FRUs or other:
v Memory module (100%)

513 Memory bank 2 of the 2145 is failing.
Explanation: Memory bank 2 of the 2145 is failing.
User response: For the 2145-8F2, 2145-8F4, 2145-8G4 and 2145-8A4, go to the
light path MAP to resolve this problem.
Possible Cause-FRUs or other:
v Memory module (100%)

514 Memory bank 3 of the 2145 is failing.
Explanation: Memory bank 3 of the 2145 is failing.
User response: For the 2145-8F2, 2145-8F4, 2145-8G4 and 2145-8A4, go to the
light path MAP to resolve this problem.
Possible Cause-FRUs or other:
v Memory module (100%)

515 Memory bank 4 of the 2145 is failing.
Explanation: Memory bank 4 of the 2145 is failing.
User response: For the 2145-8F2, 2145-8F4, 2145-8G4 and 2145-8A4, go to the
light path MAP to resolve this problem.
Possible Cause-FRUs or other:
v Memory module (100%)

517 The WWNNs of the service controller and the disk do not match.
Explanation: The node is unable to determine the WWNN that it should use. This
is because the service controller or the node's internal drive has been
replaced.
User response: Follow troubleshooting procedures to configure the WWNN of the
node.
1. Continue to follow the hardware remove and replace procedure for the service
   controller or disk; these procedures explain the service actions.
2. If you have not followed the hardware remove and replace procedures, you
   should determine the correct WWNN. If you do not have this information
   recorded, examine your Fibre Channel switch configuration to see whether it
   is listed there. Follow the procedures to change the WWNN of a node.
Possible Cause-FRUs or other:
v None

521 Unable to detect a Fibre Channel adapter
Explanation: The 2145 cannot detect any Fibre Channel adapter cards.
User response: Ensure that a Fibre Channel adapter card has been installed.
Ensure that the Fibre Channel card is seated correctly in the riser card.
Ensure that the riser card is seated correctly on the system board. If the
problem persists, exchange FRUs for new FRUs in the order shown.
Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
v 4-port Fibre Channel host bus adapter assembly (95%)
v System board assembly (5%)

2145-8G4 or 2145-8A4
v 4-port Fibre Channel host bus adapter (80%)
v Riser card (19%)
v System board (1%)

2145-8F4
v 4-port Fibre Channel host bus adapter (99%)
v Frame assembly (1%)

2145-8F2
v Fibre Channel host bus adapter (full height) (40%)
v Fibre Channel host bus adapter (low profile) (40%)
v Riser card, PCI (full height) (9%)
v Riser card, PCI (low profile) (9%)
v Frame assembly (2%)

522 The system board service processor has failed.
Explanation: The service processor on the system board has failed.
User response: Exchange the FRU for a new FRU. (See “Possible Cause-FRUs or
other.”)
Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
v System board assembly (100%)

2145-8F2 or 2145-8F4
v Frame assembly (100%)

523 The internal disk file system is damaged.
Explanation: The node startup procedures have found problems with the file
system on the internal disk of the node.
User response: Follow troubleshooting procedures to reload the software.
1. Follow the procedures to rescue the software of a node from another node.
2. If the node rescue does not succeed, use the hardware remove and replace
   procedures.
Possible Cause-FRUs or other:
v Disk drive (100%)

524 Unable to update BIOS settings.
Explanation: Unable to update BIOS settings.
User response: Power off the node, wait 30 seconds, and then power on again.
If the error code is still reported, replace the system board.
Possible Cause-FRUs or other:
v System board (100%)

525 Unable to update system board service processor firmware.
Explanation: The process of updating the system board service processor
firmware might take up to 90 minutes.
User response: If the progress bar has been stopped for more than 90 minutes,
power off and reboot the node. If the boot progress bar stops again on this
code, replace the FRU shown.
Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
v System board (100%)

2145-8F2 or 2145-8F4
v Frame assembly (100%)

528 Ambient temperature is too high during system startup.
Explanation: The ambient temperature read during the node startup procedures
is too high for the node to continue. The startup procedure will continue when
the temperature is within range.
User response: Reduce the temperature around the system.
1. Resolve the issue with the ambient temperature, by checking and correcting:
   a. Room temperature and air conditioning
   b. Ventilation around the rack
   c. Airflow within the rack
Possible Cause-FRUs or other:
v Environment issue (100%)

530 A problem with one of the node's power supplies has been detected.
Explanation: The 530 error code is followed by two numbers. The first number
is either 1 or 2 to indicate which power supply has the problem.
The second number is either 1, 2 or 3 to indicate the reason. 1 indicates that
the power supply is not detected. 2 indicates that the power supply has failed.
3 indicates that there is no input power to the power supply.

162 SAN Volume Controller: Troubleshooting Guide


550 • 556

If the node is a member of a cluster, the cluster will report error
code 1096 or 1097, depending on the error reason.

The error will automatically clear when the problem is fixed.

User response:
1. Ensure that the power supply is seated correctly and that the power
   cable is attached correctly to both the node and to the 2145 UPS-1U.
2. If the error has not been automatically marked fixed after two
   minutes, note the status of the three LEDs on the back of the power
   supply. For the 2145-CG8 or 2145-CF8, the AC LED is the top green
   LED, the DC LED is the middle green LED and the error LED is the
   bottom amber LED.
3. If the power supply error LED is off and the AC and DC power LEDs
   are both on, this is the normal condition. If the error has not been
   automatically fixed after two minutes, replace the system board.
4. Follow the action specified for the LED states noted in the table
   below.
5. If the error has not been automatically fixed after two minutes,
   contact support.

Error LED   AC LED      DC LED      Action
ON          ON or OFF   ON or OFF   The power supply has a fault. Replace
                                    the power supply.
OFF         OFF         OFF         There is no power detected. Ensure that
                                    the power cable is connected at the node
                                    and 2145 UPS-1U. If the AC LED does not
                                    light, check whether the 2145 UPS-1U is
                                    showing any errors. Follow MAP 5150 2145
                                    UPS-1U if the UPS-1U is showing an error;
                                    otherwise, replace the power cable. If
                                    the AC LED still does not light, replace
                                    the power supply.
OFF         OFF         ON          The power supply has a fault. Replace
                                    the power supply.
OFF         ON          OFF         Ensure that the power supply is
                                    installed correctly. If the DC LED does
                                    not light, replace the power supply.

Possible Cause-FRUs or other:

Reason 1: A power supply is not detected.
• Power supply (19%)
• System board (1%)
• Other: Power supply is not installed correctly (80%)

Reason 2: The power supply has failed.
• Power supply (90%)
• Power cable assembly (5%)
• System board (5%)

Reason 3: There is no input power to the power supply.
• Power cable assembly (25%)
• UPS-1U assembly (4%)
• System board (1%)
• Other: Power supply is not installed correctly (70%)

550 A cluster cannot be formed because of a lack of cluster resources.

Explanation: Supplemental data displayed with this error code lists the
missing IDs for the 2145s and the quorum disk controller. Each missing
node is listed by its node ID. A missing quorum disk is listed as
WWWWWWWWWWWWWWWW/LL, where WWWWWWWWWWWWWWWW is a worldwide port name
(WWPN) on the disk controller that contains the missing quorum disk and
LL is the Logical Unit Number (LUN) of the missing quorum disk on that
controller.

User response: Follow troubleshooting procedures to correct
connectivity issues between the cluster nodes and the quorum devices.
1. Ensure that the other 2145s in the cluster are powered on and
   operational.
2. From the front panel, display the Fibre Channel port status. If any
   port is not active, perform the Fibre Channel port problem
   determination procedures.
3. Ensure that Fibre Channel network zoning changes have not restricted
   communication between nodes, or between the nodes and the quorum
   disk.
4. Do the problem determination procedures for the network.
5. The quorum disk failed or cannot be accessed. Perform the problem
   determination procedures for the disk controller.

555 Power Domain error

Explanation: Both 2145s in an I/O group are being powered by the same
uninterruptible power supply. The ID of the other 2145 is displayed
with the node error code on the front panel.

User response: Ensure that the configuration is correct and that each
2145 in an I/O group is connected to a separate uninterruptible power
supply.

556 A duplicate WWNN has been detected.

Explanation: The node has detected another device that has the same
World Wide Node Name (WWNN) on the Fibre Channel network. A WWNN is 16
hexadecimal digits long. For a cluster, the first 11 digits are always
50050768010. The last 5 digits of the WWNN are given in the additional
data of the error and appear on the front panel displays. The Fibre
Channel ports of the node are disabled to prevent

disruption of the Fibre Channel network. One or both nodes with the
same WWNN can show the error. Because of the way WWNNs are allocated, a
device with a duplicate WWNN is normally another cluster node.

User response: Follow troubleshooting procedures to configure the WWNN
of the node:
1. Find the cluster node with the same WWNN as the node reporting the
   error. The WWNN for a cluster node can be found from the node Vital
   Product Data (VPD) or from the Node menu on the front panel. The
   node with the duplicate WWNN need not be part of the same cluster as
   the node reporting the error; it could be remote from the node
   reporting the error on a part of the fabric connected through an
   inter-switch link. The WWNN of the node is stored within the service
   controller, so the duplication is most likely caused by the
   replacement of a service controller.
2. If a cluster node with a duplicate WWNN is found, determine whether
   it, or the node reporting the error, has the incorrect WWNN.
   Generally, it is the node whose service controller was recently
   replaced or whose WWNN was changed incorrectly. Also consider how
   the SAN is zoned when making your decision.
3. Determine the correct WWNN for the node with the incorrect WWNN. If
   the service controller has been replaced as part of a service
   action, the WWNN for the node should have been written down. If the
   correct WWNN cannot be determined, contact your support center for
   assistance.
4. Use the front panel menus to modify the incorrect WWNN. If it is the
   node showing the error that should be modified, this can safely be
   done immediately. If it is an active node that should be modified,
   use caution because the node will restart when the WWNN is changed.
   If this node is the only operational node in an enclosure, access to
   the volumes that it is managing will be lost. You should ensure that
   the host systems are in the correct state before you change the
   WWNN.
5. If the node showing the error had the correct WWNN, it can be
   restarted, using the front panel power control button, after the
   node with the duplicate WWNN is updated.
6. If you are unable to find a cluster node with the same WWNN as the
   node showing the error, use the SAN monitoring tools to determine
   whether there is another device on the SAN with the same WWNN. This
   device should not be using a WWNN assigned to a cluster, so you
   should follow the service procedures for the device to change its
   WWNN. Once the duplicate has been removed, restart the node
   canister.

Possible Cause-FRUs or other:
• None

558 The node is unable to communicate with other nodes.

Explanation: The 2145 cannot see the Fibre Channel fabric or the Fibre
Channel card port speed might be set to a different speed than the
Fibre Channel fabric.

User response: Ensure that:
1. The Fibre Channel network fabric switch is powered on.
2. At least one Fibre Channel cable connects the 2145 to the Fibre
   Channel network fabric.
3. The Fibre Channel card port speed is equal to the Fibre Channel
   fabric speed.
4. At least one Fibre Channel adapter is installed in the 2145.
5. Go to the Fibre Channel MAP.

Possible Cause-FRUs or other:
• None

562 The nodes hardware configuration does not meet the minimum
    requirements.

Explanation: The node hardware is not at the minimum specification for
the node to become active in a cluster. This may be because of hardware
failure, but is also possible after a service action has used an
incorrect replacement part.

User response: Follow troubleshooting procedures to fix the hardware:
1. View node VPD information, to see whether anything looks
   inconsistent. Compare the failing node VPD with the VPD of a working
   node of the same type. Pay particular attention to the number and
   type of CPUs and memory.
2. Replace any incorrect parts.

564 Too many software crashes have occurred.

Explanation: The node has been determined to be unstable because of
multiple resets. The cause of the resets can be that the system
encountered an unexpected state or has executed instructions that were
not valid. The node has entered the service state so that diagnostic
data can be recovered.

The node error does not persist across restarts of the node software
and operating system.

User response: Follow troubleshooting procedures to reload the
software:
1. Get a support package (snap), including dumps, from the node using
   the management GUI or the service assistant.
2. If more than one node is reporting this error, contact IBM technical
   support for assistance. The support package from each node will be
   required.

3. Check the support site to see whether the issue is known and whether
   a software upgrade exists to resolve the issue. Update the cluster
   software if a resolution is available. Use the manual upgrade
   process on the node that reported the error first.
4. If the problem remains unresolved, contact IBM technical support and
   send them the support package.

Possible Cause-FRUs or other:
• None

565 The internal drive of the node is failing.

Explanation: The internal drive within the node is reporting too many
errors. It is no longer safe to rely on the integrity of the drive.
Replacement is recommended.

User response: Follow troubleshooting procedures to fix the hardware:
1. View hardware information.
2. Replace parts (canister or disk).

Possible Cause-FRUs or other:
• 2145-8G4 or 2145-8A4
  – Disk drive assembly (95%)
  – Disk drive cables (5%)
• 2145-8F2 or 2145-8F4
  – Disk drive assembly (100%)

573 The node software is inconsistent.

Explanation: Parts of the node software package are receiving
unexpected results; there may be an inconsistent set of subpackages
installed, or one subpackage may be damaged.

User response: Follow troubleshooting procedures to reload the
software.
1. Follow the procedure to run a node rescue.
2. If the error occurs again, contact IBM technical support.

Possible Cause-FRUs or other:
• None

574 The node software is damaged.

Explanation: A checksum failure has indicated that the node software is
damaged and needs to be reinstalled.

User response: If the other node canister is operational, run node
rescue. Otherwise, install new software using the service assistant.
Node rescue failures or the repeated return of this node error after
reinstallation is symptomatic of a hardware fault with the node
canister.

Possible Cause-FRUs or other:
• None

576 The cluster state and configuration data cannot be read.

Explanation: The node has been unable to read the saved cluster state
and configuration data from its internal drive because of a read or
medium error.

User response: In the sequence shown, exchange the FRUs for new FRUs.

Possible Cause-FRUs or other:
• 2145-CG8 or 2145-CF8
  – Disk drive (50%)
  – Disk controller (30%)
  – Disk backplane (10%)
  – Disk signal cable (8%)
  – Disk power cable (1%)
  – System board (1%)
• 2145-8A4
  – Disk drive assembly (80%)
  – Disk cable assembly (15%)
  – System board (5%)
• 2145-8G4
  – Disk drive assembly (80%)
  – Disk drive cables (10%)
  – System board (10%)
• 2145-8F2 or 2145-8F4
  – Disk drive assembly (90%)
  – Frame assembly (10%)

578 The state data was not saved following a power loss.

Explanation: On startup, the node was unable to read its state data.
When this happens, it expects to be automatically added back into a
cluster. However, if it has not joined a cluster in 60 seconds, it
raises this node error. This is a critical node error and user action
is required before the node can become a candidate to join a cluster.

User response: Follow troubleshooting procedures to correct
connectivity issues between the cluster nodes and the quorum devices.
1. Manual intervention is required once the node reports this error.
2. Attempt to reestablish the cluster using other nodes. This may
   involve fixing hardware issues on other nodes or fixing connectivity
   issues between nodes.
3. If you are able to reestablish the cluster, remove the cluster data
   from the node showing 578 so that it goes to candidate state; it
   will then be automatically added back to the cluster. If the node
   does not

   automatically add back to the cluster, note the name and I/O group
   of the node, then delete the node from the cluster configuration (if
   this has not already happened) and then add the node back to the
   cluster using the same name and I/O group.
4. If all nodes have either node error 578 or 550, follow the cluster
   recovery procedures.
5. Attempt to determine what caused the nodes to shut down.

Possible Cause-FRUs or other:
• None

580 The service controller ID could not be read.

Explanation: The 2145 cannot read the unique ID from the service
controller, so the Fibre Channel adapters cannot be started.

User response: In the sequence shown, exchange the following FRUs for
new FRUs.

Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
• Service controller (70%)
• Service controller cable (30%)

2145-8F2 or 2145-8F4 or 2145-8G4 or 2145-8A4
• Service controller (100%)

Other:
• None

581 A serial link error in the 2145 UPS-1U has occurred.

Explanation: There is a fault in the communications cable, the serial
interface in the uninterruptible power supply 2145 UPS-1U, or 2145.

User response: Check that the communications cable is correctly plugged
in to the 2145 and the 2145 UPS-1U. If the cable is plugged in
correctly, replace the FRUs in the order shown.

Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
• 2145 power cable assembly (40%)
• 2145 UPS-1U assembly (30%)
• 2145 system board (30%)

2145-8F2 or 2145-8F4
• 2145 power cable assembly (40%)
• 2145 UPS-1U assembly (30%)
• 2145 frame assembly (30%)

582 A battery error in the 2145 UPS-1U has occurred.

Explanation: A problem has occurred with the uninterruptible power
supply 2145 UPS-1U battery.

User response: Exchange the FRU for a new FRU. After replacing the
battery assembly, if the 2145 UPS-1U service indicator is on, press and
hold the 2145 UPS-1U Test button for three seconds to start the
self-test and verify the repair. During the self-test, the rightmost
four LEDs on the 2145 UPS-1U front-panel assembly flash in sequence.

Possible Cause-FRUs or other:
• UPS-1U battery assembly (50%)
• UPS-1U assembly (50%)

583 An electronics error in the 2145 UPS-1U has occurred.

Explanation: A problem has occurred with the 2145 UPS-1U electronics.

User response: Exchange the FRU for a new FRU.

Possible Cause-FRUs or other:
• 2145 UPS-1U assembly

584 The 2145 UPS-1U is overloaded.

Explanation: A problem with output overload has been reported by the
uninterruptible power supply 2145 UPS-1U. The Overload Indicator on the
2145 UPS-1U front panel is illuminated red.

User response:
1. Ensure that only one 2145 is receiving power from the 2145 UPS-1U.
   Also ensure that no other devices are connected to the 2145 UPS-1U.
2. Disconnect the 2145 from the 2145 UPS-1U. If the Overload Indicator
   is still illuminated with the 2145 disconnected, replace the 2145
   UPS-1U.
3. If the Overload Indicator is now off, and the node is a 2145-8F2,
   2145-8F4, 2145-8G4 or 2145-8A4, on the disconnected 2145, with all
   outputs disconnected, in the sequence shown, exchange the FRUs for
   new FRUs.
4. If the Overload Indicator is now off, and the node is a 2145-CG8 or
   2145-CF8, on the disconnected 2145, with all outputs disconnected,
   determine whether it is one of the two power supplies or the power
   cable assembly that must be replaced. Plug just one power cable into
   the left-hand power supply and start the node and see whether the
   error is reported. Then shut down the node and connect the other
   power cable into the left-hand power supply and start the node and
   see whether the error is repeated. Then repeat the two tests for the
   right-hand power supply. If the error is repeated for both cables on
   one power supply but not the other, replace the

   power supply that showed the error; otherwise, replace the power
   cable assembly.

Possible Cause-FRUs or other:
• Power cable assembly (45%)
• Power supply assembly (45%)
• UPS-1U assembly (10%)

586 The power supply to the 2145 UPS-1U does not meet requirements.

Explanation: None.

User response: Follow troubleshooting procedures to fix the hardware.

587 An incorrect type of uninterruptible power supply has been
    detected.

Explanation: An incorrect type of 2145 UPS-1U was installed.

User response: Exchange the 2145 UPS-1U for one of the correct type.

Possible Cause-FRUs or other:
• 2145 UPS-1U (100%)

588 The 2145 UPS-1U is not cabled correctly.

Explanation: The signal cable or the 2145 power cables are probably not
connected correctly. The power cable and signal cable might be
connected to different 2145 UPS-1U assemblies.

User response:
1. Connect the cables correctly.
2. Restart the node.

Possible Cause-FRUs or other:
• None.

Other:
• Cabling error (100%)

589 The 2145 UPS-1U ambient temperature limit has been exceeded.

Explanation: The ambient temperature threshold for the 2145 UPS-1U has
been exceeded.

User response: Reduce the temperature around the system:
1. Turn off the 2145 UPS-1U and unplug it from the power source.
2. Clear the vents and remove any heat sources.
3. Ensure that the air flow around the 2145 UPS-1U is not restricted.
4. Wait at least five minutes, and then restart the 2145 UPS-1U. If the
   problem remains, exchange the 2145 UPS-1U assembly.

590 Repetitive node restarts have occurred because of errors from the
    2145 UPS-1U.

Explanation: Multiple node restarts have occurred because of 2145
UPS-1U errors.

User response: Follow troubleshooting procedures to fix the hardware:
1. Verify that the room temperature is within specified limits and that
   the input power is stable.
2. Verify that the 2145 UPS-1U signal cable is fastened securely at
   both ends.

Note: The condition will be reset by powering off the node from the
node front panel.

670 The UPS battery charge is not enough to allow the node to start.

Explanation: The uninterruptible power supply connected to the node
does not have sufficient battery charge for the node to safely become
active in a cluster. The node will not start until a sufficient charge
exists to store the state and configuration data held in the node
memory if power were to fail. The front panel of the node will show
"charging".

User response: Wait for sufficient battery charge for the enclosure to
start:
1. Wait for the node to automatically fix the error when there is
   sufficient charge.
2. Ensure that no error conditions are indicated on the uninterruptible
   power supply.

690 The node is held in the service state.

Explanation: The node is in service state and has been instructed to
remain in service state. While in service state, the node will not run
as part of a cluster. A node must not be in service state for longer
than necessary while the cluster is online because a loss of redundancy
will result. A node can be set to remain in service state either
because of a service assistant user action or because the node was
deleted from the cluster.

User response: When it is no longer necessary to hold the node in the
service state, exit the service state to allow the node to run:
1. Use the service assistant action or use the front panel Exit Service
   action to release the service state.

Possible Cause-FRUs or other:
• None

710 The high speed SAS adapter that was previously present has not been
    detected.

Explanation: The 2145 could not detect the high speed SAS adapter.

User response: This non-critical node error should be serviced by using
the management GUI and running the recommended actions for the alert
with error code 1120.

720 The 10 Gbps Ethernet adapter that was previously present has not
    been detected.

Explanation: The 2145 could not detect the 10 Gbps Ethernet adapter.

User response: This non-critical node error should be serviced by using
the management GUI and running the recommended actions for the alert
with error code 1072.

801 Memory reduced.

Explanation: Memory is reduced but sufficient memory exists to run I/O
operations.

User response: Follow troubleshooting procedures to fix the hardware.

803 One or more Fibre Channel ports are not operational.

Explanation: One or more Fibre Channel ports are not operational.

User response: Follow troubleshooting procedures to fix the hardware.

805 One or more configured Ethernet ports are not operational.

Explanation: One or more configured Ethernet ports are not operational.

User response: Follow troubleshooting procedures to fix the hardware.

815 Cannot determine the VPD for a component.

Explanation: An FRU in the system has been changed, and the VPD is
unreadable or unrecognized.

User response:
1. Check whether the replacement part that you have installed is the
   correct part.
2. See whether there is an updated software package that correctly
   supports the part that was used. If an updated software package
   exists, upgrade to that software version. Otherwise, obtain the
   correct replacement part for the enclosure model and software
   version that you are operating.

818 Unable to recover the service controller flash disk.

Explanation: Unable to recover the service controller flash disk.

User response: Follow troubleshooting procedures to fix the hardware.

840 A hardware change has been made to this node that is not supported
    by its software. User action is required to repair the hardware or
    update the software. This non-critical node error can only be
    reported when the node is active in a cluster and its configuration
    is stored. The detected hardware is not being used.

Explanation: This is a non-critical node error. The node will continue
to operate but only the first 1024 Fibre Channel logins will be used.
Connectivity problems to the controllers, hosts, or other nodes could
exist.

User response: Confirm that the required software version supporting
any recently installed hardware is running on the system. Upgrade the
system to the correct level. If the recently installed hardware was not
received as a feature code enhancement or as a part replacement, it
should be removed. If the recently installed hardware was received as a
feature code enhancement or as a part replacement, and you have a level
of software that supports the installed part, contact IBM technical
support.

841 A supported hardware change has been made to this node. User action
    is required to activate the new hardware. This non-critical node
    error can only be reported when the node is active in a cluster and
    its configuration is stored.

Explanation: This is a non-critical node error. The node will continue
to operate but only the first 1024 Fibre Channel logins will be used.
Connectivity problems to the controllers, hosts, or other nodes could
exist.

User response: Use the management GUI recommended actions for the alert
with error code 1199 to confirm the hardware configuration change.

860 The Fibre Channel network fabric is too large.

Explanation: This is a non-critical node error. The node will continue
to operate but only the first 1024 Fibre Channel logins will be used.
Connectivity

problems to the controllers, hosts, or other nodes could exist.

User response: Fix the Fibre Channel network configuration:
1. View hardware WWNN information.
2. Reconfigure your SAN zoning.

878 Attempting recovery after loss of state data

Explanation: During startup, the node was unable to read its state
data. It expects to be added back into a cluster, and reports this
error while it is waiting.

User response: Allow time for recovery. No further action is required.
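
Two identifier formats are described earlier in this list of node errors: node error 556 states that a cluster WWNN is 16 hexadecimal digits beginning with 50050768010, with the last 5 digits given in the additional data, and node error 550 lists a missing quorum disk as a 16-digit WWPN and a LUN separated by a slash. The following is a minimal sketch of how a script might check and split such values. The function names are hypothetical, the sample values are invented for illustration, and nothing here is taken from an IBM tool.

```python
import re
from typing import Tuple

# First 11 digits of a cluster WWNN, as stated under node error 556.
CLUSTER_WWNN_PREFIX = "50050768010"

_HEX16 = re.compile(r"[0-9A-Fa-f]{16}")
_HEX5 = re.compile(r"[0-9A-Fa-f]{5}")


def is_cluster_wwnn(wwnn: str) -> bool:
    """True if wwnn is 16 hex digits and starts with the cluster prefix."""
    return bool(_HEX16.fullmatch(wwnn)) and wwnn.startswith(CLUSTER_WWNN_PREFIX)


def wwnn_from_last_five(last_five: str) -> str:
    """Rebuild a full WWNN from the 5 digits shown in the additional data."""
    if not _HEX5.fullmatch(last_five):
        raise ValueError("expected exactly 5 hexadecimal digits")
    return CLUSTER_WWNN_PREFIX + last_five.upper()


def parse_missing_quorum(entry: str) -> Tuple[str, str]:
    """Split a node error 550 quorum entry 'WWPN/LUN' into its two parts."""
    wwpn, sep, lun = entry.partition("/")
    if not sep or not _HEX16.fullmatch(wwpn) or not lun:
        raise ValueError("expected a 16-digit WWPN, a slash, and a LUN")
    return wwpn.upper(), lun


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    print(wwnn_from_last_five("0a1b2"))
    print(parse_missing_quorum("5005076801234567/03"))
```

The LUN part is returned as text because the guide does not state how it is encoded.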

Cluster recovery and states


920 Unable to perform cluster recovery because of a lack of cluster
    resources.

Explanation: The node is looking for a quorum of resources which also
require cluster recovery.

User response: Contact IBM technical support.

950 Special upgrade mode.

Explanation: Special upgrade mode.

User response: None.

990 Cluster recovery has failed.

Explanation: Cluster recovery has failed.

User response: Contact IBM technical support.

Cluster error codes


1001 Automatic cluster recovery has run.

Explanation: All cluster configuration commands are blocked.

User response: Call your software support center.

Caution: You can unblock the configuration commands through the cluster
GUI, but you must first consult with your software support to avoid
corrupting your cluster configuration.

Possible Cause-FRUs or other:
• None

1002 Event log full.

Explanation: Event log full.

User response: To fix the errors in the event log, go to the start MAP.

Possible Cause-FRUs or other:
• Unfixed errors in the log.

1011 Fibre Channel adapter (4 port) in slot 1 is missing.

Explanation: Fibre Channel adapter (4 port) in slot 1 is missing.

User response:
1. In the sequence shown, exchange the FRUs for new FRUs.
2. Check node status. If all nodes show a status of “online”, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of “online”, go to start MAP. If you return to this
   step, contact your support center to resolve the problem with the
   2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
• 4-port Fibre Channel host bus adapter (98%)
• System board (2%)

2145-8G4 or 2145-8A4
• 4-port Fibre Channel host bus adapter (90%)
• PCI Express riser card (8%)
• System board (2%)

2145-8F4
N/A

2145-8F2
N/A

1013 Fibre Channel adapter (4-port) in slot 1 PCI fault.

Explanation: Fibre Channel adapter (4-port) in slot 1 PCI fault.

User response:
1. In the sequence shown, exchange the FRUs for new FRUs.

2. Check node status. If all nodes show a status of “online”, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of “online”, go to start MAP. If you return to this
   step, contact your support center to resolve the problem with the
   2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
• 4-port Fibre Channel host bus adapter (98%)
• System board (2%)

2145-8G4 or 2145-8A4
• 4-port Fibre Channel host bus adapter (80%)
• PCI Express riser card (10%)
• System board (10%)

2145-8F4
N/A

2145-8F2
N/A

1014 Fibre Channel adapter in slot 1 is missing.

Explanation: Fibre Channel adapter in slot 1 is missing.

User response:
1. In the sequence shown, exchange the FRUs for new FRUs.
2. Check node status. If all nodes show a status of “online”, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of “online”, go to start MAP. If you return to this
   step, contact your support center to resolve the problem with the
   2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2
• Dual port Fibre Channel HBA - low profile (90%)
• PCI riser card - low profile (8%)
• Frame assembly (2%)

2145-8G4
N/A

2145-8F4
N/A

1015 Fibre Channel adapter in slot 2 is missing.

Explanation: Fibre Channel adapter in slot 2 is missing.

User response:
1. In the sequence shown, exchange the FRUs for new FRUs.
2. Check node status. If all nodes show a status of “online”, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of “online”, go to start MAP. If you return to this
   step, contact your support center to resolve the problem with the
   2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2
• Dual port Fibre Channel host bus adapter - full height (90%)
• PCI riser card (8%)
• Frame assembly (2%)

2145-8G4
N/A

2145-8F4
N/A

1016 Fibre Channel adapter (4 port) in slot 2 is missing.

Explanation: Fibre Channel adapter (4 port) in slot 2 is missing.

User response:
1. In the sequence shown, exchange the FRUs for new FRUs.
2. Check node status. If all nodes show a status of “online”, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of “online”, go to start MAP. If you return to this
   step, contact your support center to resolve the problem with the
   2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F4
• 4-port Fibre Channel host bus adapter (90%)
• PCI Express riser card (8%)
• Frame assembly (2%)

2145-8G4
N/A

2145-8F2
N/A

1017 Fibre Channel adapter in slot 1 PCI bus error.

Explanation: Fibre Channel adapter in slot 1 PCI bus error.

User response:
1. In the sequence shown, exchange the FRUs for new FRUs.
2. Check node status. If all nodes show a status of “online”, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of “online”, go to start MAP. If you return to this
   step, contact your support center to resolve the problem with the
   2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2
• Dual port Fibre Channel host bus adapter - low profile (80%)
• PCI riser card (10%)
• Frame assembly (10%)

2145-8G4
N/A

2145-8F4
N/A

1018 Fibre Channel adapter in slot 2 PCI fault.

Explanation: Fibre Channel adapter in slot 2 PCI fault.

User response:
1. In the sequence shown, exchange the FRUs for new FRUs.
2. Check node status. If all nodes show a status of “online”, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of “online”, go to start MAP. If you return to this
   step, contact your support center to resolve the problem with the
   2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2
• Dual port Fibre Channel host bus adapter - full height (80%)
• PCI riser card (10%)
• Frame assembly (10%)

2145-8G4
N/A

2145-8F4
N/A

1019 Fibre Channel adapter (4-port) in slot 2 PCI fault.

Explanation: Fibre Channel adapter (4-port) in slot 2 PCI fault.

User response:
1. In the sequence shown, exchange the FRUs for new FRUs.
2. Check node status. If all nodes show a status of “online”, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of “online”, go to start MAP. If you return to this
   step, contact your support center to resolve the problem with the
   2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F4
• 4-port Fibre Channel host bus adapter (80%)
• PCI Express riser card (10%)
• Frame assembly (10%)

2145-8G4
N/A

2145-8F2
N/A

1020 The system board service processor has failed.

Explanation: The cluster is reporting that a node is not operational
because of critical node error 522. See the details of node error 522
for more information.

User response: See node error 522.
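
Cluster error 1020 refers the reader to critical node error 522, and cluster errors 1022, 1027 and 1052 in this list follow the same pattern with node errors 510, 524 and 587. As a hedged sketch, a monitoring script could hold these cross-references in a lookup table. The names `CLUSTER_TO_NODE_ERROR` and `related_node_error` are hypothetical, and the table covers only the pairs stated in this part of the guide.

```python
from typing import Optional

# Cluster error code -> critical node error code, as cross-referenced in
# this list of cluster error codes. Only these four pairs are covered.
CLUSTER_TO_NODE_ERROR = {
    1020: 522,  # The system board service processor has failed.
    1022: 510,  # The detected memory size does not match the expected size.
    1027: 524,  # Unable to update BIOS settings.
    1052: 587,  # Incorrect type of uninterruptible power supply detected.
}


def related_node_error(cluster_code: int) -> Optional[int]:
    """Return the node error to consult, or None if no pair is listed."""
    return CLUSTER_TO_NODE_ERROR.get(cluster_code)


if __name__ == "__main__":
    for code in (1020, 1011):
        print(code, "->", related_node_error(code))
```

Codes with their own repair steps, such as 1011, return `None`, which signals that the cluster error entry itself should be followed.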

1022 The detected memory size does not match the expected memory size.

Explanation: The cluster is reporting that a node is not operational
because of critical node error 510. See the details of node error 510
for more information.

User response: See node error 510.

1025 The 2145 system assembly is failing.

Explanation: The 2145 system assembly is failing.

User response:
1. Go to the light path diagnostic MAP and perform the light path
   diagnostic procedures.
2. If the light path diagnostic procedure isolates the FRU, mark this
   error as “fixed” and go to the repair verification MAP. If you have
   just replaced a FRU but it has not corrected the problem, ensure
   that the FRU is installed correctly and go to the next step.
3. Replace the system board or frame assembly as indicated in the
   Possible Cause list below.
4. Check node status. If all nodes show a status of “online”, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of “online”, go to the start MAP. If you return to
   this step, contact your support center to resolve the problem with
   the 2145.
5. Go to the repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4, 2145-CF8, or 2145-CG8
• The FRUs that are indicated by the Light path diagnostics (98%)
• System board (2%)

2145-8F2 or 2145-8F4
• The FRUs that are indicated by the Light path diagnostics (98%)
• Frame assembly (2%)

1027 Unable to update BIOS settings.

Explanation: The cluster is reporting that a node is not operational
because of critical node error 524. See the details of node error 524
for more information.

User response: See node error 524.

1030 The internal disk of a node has failed.

Explanation: An error has occurred while attempting to read or write
data to the internal disk of one of the nodes in the cluster. The disk
has failed.

User response: Determine which node's internal disk has failed using
the node information in the error. Replace the FRUs in the order shown.
Mark the error as fixed.

Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
• Disk drive (50%)
• Disk controller (30%)
• Disk backplane (10%)
• Disk signal cable (8%)
• Disk power cable (1%)
• System board (1%)

2145-8A4
• Disk drive (90%)
• Disk cable assembly (10%)

2145-8G4
• Disk drive assembly (90%)
• Disk drive cables (10%)

2145-8F4 or 2145-8F2
• Disk drive assembly (100%)

1040 A flash module error has occurred after a successful start of a
     2145.

Explanation: Note: The node containing the flash module has not been
rejected by the cluster.

User response:
1. Replace the FRUs below in the order listed.
2. Check node status. If all nodes show a status of Online, mark the
   error that you have just repaired as “fixed”. If any nodes do not
   show a status of Online, go to start MAP. If you return to this
   step, contact your support center to resolve the problem with the
   2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
• Service controller (50%)
• Service controller cable (50%)

2145-8F2 or 2145-8F4 or 2145-8G4 or 2145-8A4
• Service controller (100%)

1044 A service controller read failure occurred.

Explanation: A service controller read failure occurred.

User response:
1. Replace the FRUs below in the order listed.

172 SAN Volume Controller: Troubleshooting Guide


1052 • 1056

2. Check node status. If all nodes show a status of
Online, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
Online, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
• Service controller (50%)
• Service controller cable (50%)

2145-8F2 or 2145-8F4 or 2145-8G4 or 2145-8A4
Service controller (100%)

1052 Incorrect type of uninterruptible power
supply detected
Explanation: The cluster is reporting that a node is
not operational because of critical node error 587. See
the details of node error 587 for more information.
User response: See node error 587.

1054 Fibre Channel adapter in slot 1 adapter
present but failed.
Explanation: Fibre Channel adapter in slot 1 adapter
present but failed.
User response:
1. Replace the Fibre Channel adapter.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2
Dual port Fibre Channel host bus adapter - low profile (100%)

2145-8G4
N/A

2145-8F4
N/A

1055 Fibre Channel adapter (4 port) in slot 1
adapter present but failed.
Explanation: Fibre Channel adapter (4 port) in slot 1
adapter present but failed.
User response:
1. Exchange the FRU for new FRU.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
• 4-port Fibre Channel host bus adapter (100%)

2145-8F4
N/A

2145-8F2
N/A

1056 Fibre Channel adapter in slot 2 adapter
present but failed.
Explanation: Fibre Channel adapter in slot 2 adapter
present but failed.
User response:
1. Replace the Fibre Channel adapter.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2
Dual port Fibre Channel host bus adapter - full height (100%)

2145-8G4
N/A

2145-8F4
N/A
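The "Check node status" step that recurs in the user responses above reduces to one rule. The following Python sketch is illustrative only: the function name `next_action` and the representation of node status as a list of strings are assumptions made for this example and are not part of the product.

```python
def next_action(node_statuses):
    """Return the next step after a repair, given each node's reported status.

    node_statuses: iterable of status strings (hypothetical representation).
    An empty list is treated as "not all online" (an assumption).
    """
    statuses = [s.strip().lower() for s in node_statuses]
    if statuses and all(s == "online" for s in statuses):
        # Every node is online: the repaired error can be marked as fixed.
        return "mark the error as fixed"
    # At least one node is not online: restart problem determination.
    return "go to the start MAP"
```

For example, `next_action(["online", "offline"])` selects the start MAP, which matches the wording of the step.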

1057 Fibre Channel adapter (4 port) in slot 2
adapter present but failed.
Explanation: Fibre Channel adapter (4 port) in slot 2
adapter present but failed.
User response:
1. Exchange the FRU for new FRU.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F4
• 4-port Fibre Channel host bus adapter (100%)

2145-8G4
N/A

2145-8F2
N/A

1060 One or more Fibre Channel ports on the
2145 are not operational.
Explanation: One or more Fibre Channel ports on the
2145 are not operational.
User response:
1. Go to MAP 5600: Fibre Channel to isolate and
repair the problem.
2. Go to the repair verification MAP.

Possible Cause-FRUs or other:

2145-8F4, 2145-8G4, 2145-CF8, or 2145-CG8
• Fibre Channel cable (80%)
• Small Form-factor Pluggable (SFP) connector (5%)
• 4-port Fibre Channel host bus adapter (5%)

2145-8F2
• Fibre Channel cable (80%)
• Small Form-factor Pluggable (SFP) connector (5%)
• Dual port Fibre Channel host bus adapter (Fibre Channel MAP isolates to the correct type) (5%)

Other:
• Fibre Channel network fabric (10%)

1065 One or more Fibre Channel ports are
running at lower than the previously
saved speed.
Explanation: The Fibre Channel ports will normally
operate at the highest speed permitted by the Fibre
Channel switch, but this speed might be reduced if the
signal quality on the Fibre Channel connection is poor.
The Fibre Channel switch could have been set to
operate at a lower speed by the user, or the quality of
the Fibre Channel signal has deteriorated.
User response:
• Go to MAP 5600: Fibre Channel to resolve the problem.

Possible Cause-FRUs or other:

2145-8F4, 2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
• Fibre Channel cable (50%)
• Small Form-factor Pluggable (SFP) connector (20%)
• 4-port Fibre Channel host bus adapter (5%)

Other:
• Fibre Channel switch, SFP connector, or GBIC (25%)

1083 Unrecognized node error
Explanation: The cluster is reporting that a node is
not operational because of critical node error 562. See
the details of node error 562 for more information.
User response: See node error 562.

1089 One or more fans are failing.
Explanation: One or more fans are failing.
User response:
1. Determine the failing fan(s) from the fan indicator
on the system board or from the text of the error
data in the log. The reported fan for the 2145-8A4,
2145-CF8, or 2145-CG8 matches the fan assembly
position. For the 2145-8G4, if you have determined
the failing fan number from the error data in the
log, use the following list to determine the position
of the fan assembly to replace. Each fan assembly
contains two fans.
2. Exchange the FRU for a new FRU.
3. Go to repair verification MAP.

Fan number: Fan assembly position
• 1 or 2: 1
• 3 or 4: 2
• 5 or 6: 3
• 7 or 8: 4
• 9 or 10: 5
• 11 or 12: 6
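The fan list given for error 1089 on the 2145-8G4 pairs consecutive fan numbers with one fan assembly position. As a minimal sketch, the mapping can be computed rather than looked up; the function name `fan_assembly_position` is hypothetical, and the range check assumes only the twelve fan numbers shown in the list.

```python
def fan_assembly_position(fan_number: int) -> int:
    """Map a 2145-8G4 fan number (1-12) to its fan assembly position (1-6).

    Each fan assembly contains two fans, so fans 1 and 2 map to
    position 1, fans 3 and 4 to position 2, and so on.
    """
    if not 1 <= fan_number <= 12:
        raise ValueError("fan number must be between 1 and 12")
    return (fan_number + 1) // 2
```

A failing fan 9 or 10, for instance, points to fan assembly position 5, as in the list.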

Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
• Fan assembly (100%)

2145-8F4
N/A

1090 One or more fans (40x40x28) are failing.
Explanation: One or more fans (40x40x28) are failing.
User response:
1. Determine the failing fan(s) from the fan indicator
on the system board or from the text of the error
data in the log.
2. If all fans on the fan backplane are failing or if no
fan fault lights are illuminated, verify that the cable
between the fan backplane and the system board is
connected.
3. Exchange the FRU for a new FRU.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2 or 2145-8F4
• Fan 40x40x28 (98%)
• Fan power cable assembly (2%)

2145-8G4
N/A

1091 One or more fans (40x40x56) are failing.
Explanation: One or more fans (40x40x56) are failing.
User response:
1. Determine the failing fan(s) from the fan indicator
on the system board or from the text of the error
data in the log.
2. If all fans on the fan backplane are failing or if no
fan fault lights are illuminated, verify that the cable
between the fan backplane and the system board is
connected.
3. Exchange the FRU for a new FRU.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2 or 2145-8F4
• Fan 40x40x56 (98%)
• Fan power cable assembly (2%)

2145-8G4
N/A

1092 The temperature soft or hard shutdown
threshold of the 2145 has been exceeded.
The 2145 has automatically powered off.
Explanation: The temperature soft or hard shutdown
threshold of the 2145 has been exceeded. The 2145 has
automatically powered off.
User response:
1. Ensure that the operating environment meets
specifications.
2. Ensure that the airflow is not obstructed.
3. Ensure that the fans are operational.
4. Go to the light path diagnostic MAP and perform
the light path diagnostic procedures.
5. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
as “fixed”. If any nodes do not show a status of
“online”, go to the start MAP. If you return to this
step, contact your support center to resolve the
problem with the 2145.
6. Go to the repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
• The FRU that is indicated by the Light path diagnostics (25%)
• System board (5%)

2145-8F2 or 2145-8F4
• The FRU that is indicated by the Light path diagnostics (25%)
• Frame assembly (5%)

Other:
System environment or airflow blockage (70%)

1093 The internal temperature sensor of the
2145 has reported that the temperature
warning threshold has been exceeded.
Explanation: The internal temperature sensor of the
2145 has reported that the temperature warning
threshold has been exceeded.
User response:
1. Ensure that the internal airflow of the node has not
been obstructed.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to the start MAP. If you return to this
step, contact your support center to resolve the
problem with the 2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
• Fan assembly (25%)
• System board (5%)

2145-8F2 or 2145-8F4
• Fan assembly (25%)
• Frame assembly (5%)

Other:
Airflow blockage (70%)

1094 The ambient temperature threshold has
been exceeded.
Explanation: The ambient temperature threshold has
been exceeded.
User response:
1. Check that the room temperature is within the
limits allowed.
2. Check for obstructions in the air flow.
3. Mark the errors as fixed.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:

None

Other:
System environment (100%)

1096 A Power Supply Unit is missing or has
failed.
Explanation: One of the two power supply units in
the node is either missing or has failed.
NOTE: This error is reported when a hot-swap power
supply is removed from an active node, so it might be
reported when a faulty power supply is removed for
replacement. Both the missing and faulty conditions
report this error code.
User response: Error code 1096 is reported when the
power supply either cannot be detected or reports an
error.
1. Ensure that the power supply is seated correctly
and that the power cable is attached correctly to
both the node and to the 2145 UPS-1U.
2. If the error has not been automatically marked fixed
after two minutes, note the status of the three LEDs
on the back of the power supply. For the 2145-CG8
or 2145-CF8, the AC LED is the top green LED, the
DC LED is the middle green LED and the error
LED is the bottom amber LED.
3. If the power supply error LED is off and the AC
and DC power LEDs are both on, this is the normal
condition. If the error has not been automatically
fixed after two minutes, replace the system board.
4. Follow the action specified for the LED states noted
in the table below.
5. If the error has not been automatically fixed after
two minutes, contact support.
6. Go to repair verification MAP.

Error, AC, DC: Action

ON, ON or OFF, ON or OFF: The power supply has a
fault. Replace the power supply.

OFF, OFF, OFF: There is no power detected. Ensure that
the power cable is connected at the node and 2145
UPS-1U. If the AC LED does not light, check the status
of the 2145 UPS-1U to which the power supply is
connected. Follow MAP 5150 2145 UPS-1U if the
UPS-1U is showing no power or an error; otherwise,
replace the power cable. If the AC LED still does not
light, replace the power supply.

OFF, OFF, ON: The power supply has a fault. Replace the
power supply.

OFF, ON, OFF: Ensure that the power supply is installed
correctly. If the DC LED does not light, replace the
power supply.

Possible Cause-FRUs or other:

Failed PSU:
• Power supply (90%)
• Power cable assembly (5%)
• System board (5%)

Missing PSU:
• Power supply (19%)
• System board (1%)
• Other: Power supply not correctly installed (80%)

1097 A Power Supply Unit reports no A/C
power.
Explanation: One of the two power supply units in
the node is reporting that no main power is detected.
User response:
1. Ensure that the power supply is attached correctly
to both the node and to the 2145 UPS-1U.
2. If the error has not been automatically marked fixed
after two minutes, note the status of the three LEDs
on the back of the power supply. For the 2145-CG8
or 2145-CF8, the AC LED is the top green LED, the
DC LED is the middle green LED and the error
LED is the bottom amber LED.
3. If the power supply error LED is off and the AC
and DC power LEDs are both on, this is the normal
condition. If the error has not been automatically
fixed after two minutes, replace the system board.
4. Follow the action specified for the LED states noted
in the table below.
5. If the error has not been automatically fixed after
two minutes, contact support.
6. Go to repair verification MAP.

Error, AC, DC: Action

ON, ON or OFF, ON or OFF: The power supply has a
fault. Replace the power supply.

OFF, OFF, OFF: There is no power detected. Ensure that
the power cable is connected at the node and 2145
UPS-1U. If the AC LED does not light, check whether
the 2145 UPS-1U is showing any errors. Follow MAP
5150 2145 UPS-1U if the UPS-1U is showing an error;
otherwise, replace the power cable. If the AC LED still
does not light, replace the power supply.

OFF, OFF, ON: The power supply has a fault. Replace the
power supply.

OFF, ON, OFF: Ensure that the power supply is installed
correctly. If the DC LED does not light, replace the
power supply.

Possible Cause-FRUs or other:
• Power cable assembly (85%)
• UPS-1U assembly (10%)
• System board (5%)

1100 One of the voltages that is monitored on
the system board is over the set
threshold.
Explanation: One of the voltages that is monitored on
the system board is over the set threshold.
User response:
1. See the light path diagnostic MAP.
2. If the light path diagnostic MAP does not resolve
the issue, exchange the frame assembly.
3. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
as “fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2 or 2145-8F4
• Light path diagnostic MAP FRUs (98%)
• Frame assembly (2%)

1101 One of the voltages that is monitored on
the system board is over the set
threshold.
Explanation: One of the voltages that is monitored on
the system board is over the set threshold.
User response:
1. See the light path diagnostic MAP.
2. If the light path diagnostic MAP does not resolve
the issue, exchange the system board assembly.
3. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
as “fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
• Light path diagnostic MAP FRUs (98%)
• System board (2%)

1105 One of the voltages that is monitored on
the system board is under the set
threshold.
Explanation: One of the voltages that is monitored on
the system board is under the set threshold.
User response:
1. Check the cable connections.
2. See the light path diagnostic MAP.
3. If the light path diagnostic MAP does not resolve
the issue, exchange the frame assembly.
4. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
as “fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2 or 2145-8F4
• Light path diagnostic MAP FRUs (98%)
• Frame assembly (2%)
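The LED-state table given for errors 1096 and 1097 can be read as a small decision function. The sketch below is an illustration under stated assumptions: the function name `psu_led_action` and the shortened action strings are invented for this example, and the "normal condition" case comes from step 3 of the user response rather than from the table itself.

```python
def psu_led_action(error_on: bool, ac_on: bool, dc_on: bool) -> str:
    """Summarize the action for one power supply from its three LEDs."""
    if error_on:
        # Error LED on: fault, regardless of the AC and DC LEDs.
        return "replace the power supply"
    if ac_on and dc_on:
        # Error off, AC and DC on: described in step 3 as the normal condition.
        return "normal condition"
    if not ac_on and not dc_on:
        # No power detected: start with the cable and the 2145 UPS-1U.
        return "check the power cable and the 2145 UPS-1U"
    if dc_on:
        # AC off but DC on: the table treats this as a power supply fault.
        return "replace the power supply"
    # AC on, DC off: reseat first; replace if the DC LED stays off.
    return "ensure that the power supply is installed correctly"
```

The strings only name the first action in each row; the full rows above also say what to do when that first action does not clear the condition.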

1106 One of the voltages that is monitored on
the system board is under the set
threshold.
Explanation: One of the voltages that is monitored on
the system board is under the set threshold.
User response:
1. Check the cable connections.
2. See the light path diagnostic MAP.
3. If the light path diagnostic MAP does not resolve
the issue, exchange the system board assembly.
4. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
as “fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
• Light path diagnostic MAP FRUs (98%)
• System board (2%)

1110 The power management board detected
a voltage that is outside of the set
thresholds.
Explanation: The power management board detected
a voltage that is outside of the set thresholds.
User response:
1. In the sequence shown, exchange the FRUs for new
FRUs.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
as “fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-CG8 or 2145-CF8
• Power supply unit (50%)
• System board (50%)

2145-8G4
• Power backplane (90%)
• Power supply assembly (5%)
• System board (5%)

2145-8F2 or 2145-8F4
• Power backplane (90%)
• Power supply assembly (5%)
• Frame assembly (5%)

1120 A high speed SAS adapter is missing.
Explanation: This node has detected that a high speed
SAS adapter that was previously installed is no longer
present.
User response: If the high speed SAS adapter was
deliberately removed, mark the error “fixed.”
Otherwise, the high speed SAS adapter has failed and
must be replaced. In the sequence shown, exchange the
FRUs for new FRUs.
Go to the repair verification MAP.

Possible Cause-FRUs or other:
1. High speed SAS adapter (90%)
2. System board (10%)

1121 A high speed SAS adapter has failed.
Explanation: A fault has been detected on a high
speed SAS adapter.
User response: In the sequence shown, exchange the
FRUs for new FRUs.
Go to the repair verification MAP.

Possible Cause-FRUs or other:
1. High speed SAS adapter (90%)
2. System board (10%)

1122 A high speed SAS adapter error has
occurred.
Explanation: The high speed SAS adapter has
detected a PCI bus error and requires service before it
can be restarted. The high speed SAS adapter failure
has caused all of the solid-state drives that were being
accessed through this adapter to go Offline.
User response: If this is the first time that this error
has occurred on this node, complete the following
steps:
1. Power off the node.
2. Reseat the high speed SAS adapter card.
3. Power on the node.
4. Submit the lsmdisk task and ensure that all of the
solid-state drive managed disks that are located in
this node have a status of Online.

If the sequence of actions above has not resolved the
problem or the error occurs again on the same node,
complete the following steps:
1. In the sequence shown, exchange the FRUs for new
FRUs.
2. Submit the lsmdisk task and ensure that all of the
solid-state drive managed disks that are located in
this node have a status of Online.
3. Go to the repair verification MAP.

Possible Cause-FRUs or other:
1. High speed SAS adapter (90%)
2. System board (10%)

1133 A duplicate WWNN has been detected.
Explanation: The cluster is reporting that a node is
not operational because of critical node error 556. See
the details of node error 556 for more information.
User response: See node error 556.

1135 The 2145 UPS has reported an ambient
over temperature.
Explanation: The 2145 UPS has reported an ambient
over temperature. The uninterruptible power supply
switches to Bypass mode to allow the 2145 UPS to cool.
User response:
1. Power off the nodes attached to the 2145 UPS.
2. Turn off the 2145 UPS, and then unplug the 2145
UPS from the main power source.
3. Ensure that the air vents of the 2145 UPS are not
obstructed.
4. Ensure that the air flow around the 2145 UPS is not
restricted.
5. Wait for at least five minutes, and then restart the
2145 UPS. If the problem remains, check the
ambient temperature. Correct the problem.
Otherwise, exchange the FRU for a new FRU.
6. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145 UPS electronics unit (50%)

Other:
The system ambient temperature is outside the
specification (50%)

1136 The 2145 UPS-1U has reported an
ambient over temperature.
Explanation: The 2145 UPS-1U has reported an
ambient over temperature.
User response:
1. Power off the node attached to the 2145 UPS-1U.
2. Turn off the 2145 UPS-1U, and then unplug the 2145
UPS-1U from the main power source.
3. Ensure that the air vents of the 2145 UPS-1U are not
obstructed.
4. Ensure that the air flow around the 2145 UPS-1U is
not restricted.
5. Wait for at least five minutes, and then restart the
2145 UPS-1U. If the problem remains, check the
ambient temperature. Correct the problem.
Otherwise, exchange the FRU for a new FRU.
6. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145 UPS-1U assembly (50%)

Other:
The system ambient temperature is outside the
specification (50%)

1140 The 2145 UPS has reported that it has a
problem with the input AC power.
Explanation: The 2145 UPS has reported that it has a
problem with the input AC power.
User response:
1. Check the input AC power, whether it is missing or
out of specification. Correct if necessary. Otherwise,
exchange the FRU for a new FRU.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
• 2145 UPS input power cable (10%)
• Electronics assembly (10%)

Other:
• The input AC power is missing (40%)
• The input AC power is not in specification (40%)
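The lsmdisk check called for in the user response to error 1122 can be scripted once the listing has been captured as text. The following Python sketch makes several assumptions that should be verified against the real output: that the listing is colon-delimited, that its first row is a header containing `name` and `status` columns, and that the sample data shown is purely illustrative. The helper name `offline_mdisks` is hypothetical, and restricting the listing to the solid-state drive managed disks of the affected node is left to the caller.

```python
def offline_mdisks(listing: str, delim: str = ":") -> list[str]:
    """Return the names of managed disks whose status is not 'online'.

    listing: captured text with a header row (assumed format).
    """
    lines = [ln for ln in listing.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split(delim)
    name_col = header.index("name")
    status_col = header.index("status")
    offline = []
    for ln in lines[1:]:
        fields = ln.split(delim)
        if fields[status_col].strip().lower() != "online":
            offline.append(fields[name_col])
    return offline


# Illustrative data only; real listings contain more columns.
SAMPLE = """id:name:status
0:mdisk0:online
1:mdisk1:offline
2:mdisk2:online"""
```

An empty result means that every listed managed disk reports a status of online, which is the condition the step asks for.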

1141 The 2145 UPS-1U has reported that it
has a problem with the input AC power.
Explanation: The 2145 UPS-1U has reported that it has
a problem with the input AC power.
User response:
1. Check the input AC power, whether it is missing or
out of specification. Correct if necessary. Otherwise,
exchange the FRU for a new FRU.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
• 2145 UPS-1U input power cable (10%)
• 2145 UPS-1U assembly (10%)

Other:
• The input AC power is missing (40%)
• The input AC power is not in specification (40%)

1145 The signal connection between a 2145
and its 2145 UPS is failing.
Explanation: The signal connection between a 2145
and its 2145 UPS is failing.
User response:
1. If other 2145s that are using this uninterruptible
power supply are reporting this error, exchange the
2145 UPS electronics unit for a new one.
2. If only this 2145 is reporting the problem, check the
signal cable, exchange the FRUs for new FRUs in
the sequence shown.
3. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8F2 or 2145-8F4 or 2145-8G4
N/A

1146 The signal connection between a 2145
and its 2145 UPS-1U is failing.
Explanation: The signal connection between a 2145
and its 2145 UPS-1U is failing.
User response:
1. Exchange the FRUs for new FRUs in the sequence
shown.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
as “fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4
• Power cable assembly (40%)
• 2145 UPS-1U assembly (30%)
• System board (30%)

2145-8F2 or 2145-8F4
• Power cable assembly (40%)
• 2145 UPS-1U assembly (30%)
• Frame assembly (30%)

1150 Data that the 2145 has received from the
2145 UPS suggests the 2145 UPS power
cable, the signal cable, or both, are not
connected correctly.
Explanation: Data that the 2145 has received from the
2145 UPS suggests the 2145 UPS power cable, the
signal cable, or both, are not connected correctly.
User response:
1. Connect the cables correctly. See your product's
installation guide.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
• None

Other:
• Configuration error
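The first two steps of the user response to error 1145 choose between two repairs, depending on whether other 2145s on the same uninterruptible power supply report the same error. A minimal sketch of that choice follows; the function name `action_for_error_1145`, its parameters, and the node names used in the usage note are all hypothetical.

```python
def action_for_error_1145(this_node, reporting_nodes, nodes_on_ups):
    """Pick the first repair action for error 1145.

    this_node: the 2145 whose error is being serviced.
    reporting_nodes: all 2145s currently reporting error 1145.
    nodes_on_ups: all 2145s powered by the same 2145 UPS.
    """
    others_reporting = (set(nodes_on_ups) & set(reporting_nodes)) - {this_node}
    if others_reporting:
        # Several 2145s on the same UPS are affected: suspect the UPS itself.
        return "exchange the 2145 UPS electronics unit"
    # Only this 2145 is affected: suspect its own signal connection.
    return "check the signal cable, then exchange FRUs in sequence"
```

For instance, `action_for_error_1145("node1", ["node1", "node2"], ["node1", "node2"])` selects the UPS electronics unit, because a second node on the same UPS reports the error.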

1151 Data that the 2145 has received from the
2145 UPS-1U suggests the 2145 UPS-1U
power cable, the signal cable, or both,
are not connected correctly.
Explanation: Data that the 2145 has received from the
2145 UPS-1U suggests the 2145 UPS-1U power cable,
the signal cable, or both, are not connected correctly.
User response:
1. Connect the cables correctly. See your product's
installation guide.
2. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the uninterruptible power supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
• None

Other:
• Configuration error

1152 Incorrect type of uninterruptible power
supply detected.
Explanation: The cluster is reporting that a node is
not operational because of critical node error 587. See
the details of node error 587 for more information.
User response: See node error 587.

1155 A power domain error has occurred.
Explanation: Both 2145s of a pair are powered by the
same uninterruptible power supply.
User response:
1. List the 2145s of the cluster and check that 2145s in
the same I/O group are connected to a different
uninterruptible power supply.
2. Connect one of the 2145s as identified in step 1 to a
different uninterruptible power supply.
3. Mark the error that you have just repaired, “fixed”.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
• None

Other:
• Configuration error

1160 The output load on the 2145 UPS
exceeds the specification.
Explanation: The 2145 UPS is reporting that too much
power is being drawn from it. The power overload
warning LED, which is above the load level indicators,
on the 2145 UPS will be on.
User response:
1. Determine the 2145 UPS that is reporting the error
from the error event data. Perform the following
steps on just this uninterruptible power supply.
2. Check that the 2145 UPS is still reporting the error.
If the power overload warning LED is no longer on,
go to step 6.
3. Ensure that only 2145s are receiving power from the
uninterruptible power supply. Ensure that there are
no switches or disk controllers that are connected to
the 2145 UPS.
4. Remove each connected 2145 input power in turn,
until the output overload is removed.
5. Exchange the FRUs for new FRUs in the sequence
shown, on the overcurrent 2145.
6. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145 UPS.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:
• Power cable assembly (50%)
• Power supply assembly (40%)
• 2145 UPS electronics assembly (10%)

1161 The output load on the 2145 UPS-1U
exceeds the specifications (reported by
2145 UPS-1U alarm bits).
Explanation: The output load on the 2145 UPS-1U
exceeds the specifications (reported by 2145 UPS-1U
alarm bits).
User response:
1. Ensure that only 2145s are receiving power from the
uninterruptible power supply. Also, ensure that no
other devices are connected to the 2145 UPS-1U.
2. Exchange, in the sequence shown, the FRUs for new
FRUs. If the Overload Indicator is still illuminated
with all outputs disconnected, replace the 2145
UPS-1U.
3. Check node status. If all nodes show a status of
“online”, mark the error that you have just repaired
“fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step,
contact your support center to resolve the problem
with the 2145 UPS-1U.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v Power cable assembly (50%)
v Power supply assembly (40%)
v 2145 UPS-1U assembly (10%)

1165 The 2145 UPS output load is unexpectedly high. The 2145 UPS output
is possibly connected to an extra non-2145 load.

Explanation: The 2145 UPS output load is unexpectedly high. The 2145 UPS
output is possibly connected to an extra non-2145 load.

User response:
1. Ensure that only 2145s are receiving power from the uninterruptible
power supply. Ensure that there are no switches or disk controllers that
are connected to the 2145 UPS.
2. Check node status. If all nodes show a status of “online”, the problem
no longer exists. Mark the error that you have just repaired “fixed” and
go to the repair verification MAP.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

None

Other:
v Configuration error

1166 The 2145 UPS-1U output load is unexpectedly high.

Explanation: The uninterruptible power supply output is possibly
connected to an extra non-2145 load.

User response:
1. Ensure that there are no other devices that are connected to the
2145 UPS-1U.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the 2145 UPS-1U.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v 2145 UPS-1U assembly (5%)

Other:
v Configuration error (95%)

1170 2145 UPS electronics fault (reported by the 2145 UPS alarm bits).

Explanation: 2145 UPS electronics fault (reported by the 2145 UPS alarm
bits).

User response:
1. Replace the uninterruptible power supply electronics assembly.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the UPS.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145 UPS electronics assembly (100%)

1171 2145 UPS-1U electronics fault (reported by the 2145 UPS-1U alarm
bits).

Explanation: 2145 UPS-1U electronics fault (reported by the 2145 UPS-1U
alarm bits).

User response:
1. Replace the uninterruptible power supply assembly.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the 2145 UPS-1U.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145 UPS-1U assembly (100%)

1175 A problem has occurred with the uninterruptible power supply frame
fault (reported by uninterruptible power supply alarm bits).

Explanation: A problem has occurred with the uninterruptible power supply
frame fault (reported by the uninterruptible power supply alarm bits).

User response:
1. Replace the uninterruptible power supply assembly.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the uninterruptible power
supply.
3. Go to repair verification MAP.


Possible Cause-FRUs or other:

Uninterruptible power supply assembly (100%)

1180 2145 UPS battery fault (reported by 2145 UPS alarm bits).

Explanation: 2145 UPS battery fault (reported by 2145 UPS alarm bits).

User response:
1. Replace the 2145 UPS battery assembly.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the uninterruptible power
supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145 UPS battery assembly (100%)

1181 2145 UPS-1U battery fault (reported by 2145 UPS-1U alarm bits).

Explanation: 2145 UPS-1U battery fault (reported by 2145 UPS-1U alarm
bits).

User response:
1. Replace the 2145 UPS-1U battery assembly.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the uninterruptible power
supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145 UPS-1U battery assembly (100%)

1182 Ambient temperature is too high during system startup.

Explanation: The cluster is reporting that a node is not operational
because of critical node error 528. See the details of node error 528 for
more information.

User response: See node error 528.

1183 The nodes hardware configuration does not meet the minimum
requirements.

Explanation: The cluster is reporting that a node is not operational
because of critical node error 562. See the details of node error 562 for
more information.

User response: See node error 562.

1185 2145 UPS fault, with no specific FRU identified (reported by
uninterruptible power supply alarm bits).

Explanation: 2145 UPS fault, with no specific FRU identified (reported by
2145 UPS alarm bits).

User response:
1. In the sequence shown, exchange the FRU for a new FRU.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the 2145 UPS.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v 2145 UPS electronics assembly (60%)
v 2145 UPS battery assembly (20%)
v 2145 UPS assembly (20%)

1186 A problem has occurred in the 2145 UPS-1U, with no specific FRU
identified (reported by 2145 UPS-1U alarm bits).

Explanation: A problem has occurred in the 2145 UPS-1U, with no specific
FRU identified (reported by 2145 UPS-1U alarm bits).

User response:
1. In the sequence shown, exchange the FRU for a new FRU.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the uninterruptible power
supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145 UPS-1U assembly (100%)

1187 Node software is inconsistent or damaged

Explanation: The cluster is reporting that a node is not operational
because of critical node errors 523, 573, 574. See the details of node
errors 523, 573, 574 for more information.

User response: See node errors 523, 573, 574.
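
Most of the uninterruptible power supply entries in this range (for example 1170, 1171, 1180, 1185, and 1186) end with the same node-status check. As an illustration only, that recurring decision can be sketched in Python. The function name, the parameter names, and the returned labels are hypothetical; they are not part of the SAN Volume Controller software or its command-line interface.

```python
def next_step(node_statuses, returned_to_this_step=False):
    """Choose the action that follows the recurring 'Check node status' step.

    node_statuses: list of status strings, one per node (hypothetical input).
    returned_to_this_step: True if the start MAP has already led back here.
    """
    # All nodes online: the repair worked, so mark the error and verify.
    if node_statuses and all(status == "online" for status in node_statuses):
        return "mark fixed, then go to repair verification MAP"
    # At least one node is not online and the start MAP was already tried.
    if returned_to_this_step:
        return "contact your support center"
    # At least one node is not online: begin problem determination.
    return "go to start MAP"
```

The sketch treats an empty status list as "not all online", which is an assumption made here so that missing data never leads to an error being marked fixed.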


1188 Too many software crashes have occurred.

Explanation: The cluster is reporting that a node is not operational
because of critical node error 564. See the details of node error 564 for
more information.

User response: See node error 564.

1189 The node is held in the service state.

Explanation: The cluster is reporting that a node is not operational
because of critical node error 690. See the details of node error 690 for
more information.

User response: See node error 690.

1190 The 2145 UPS battery has reached its end of life.

Explanation: The 2145 UPS battery has reached its end of life.

User response:
1. Replace the 2145 UPS battery assembly.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the uninterruptible power
supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145 UPS battery assembly (100%)

1191 The 2145 UPS-1U battery has reached its end of life.

Explanation: The 2145 UPS-1U battery has reached its end of life.

User response:
1. Replace the 2145 UPS-1U battery assembly.
2. Check node status. If all nodes show a status of “online”, mark the
error that you have just repaired “fixed”. If any nodes do not show a
status of “online”, go to start MAP. If you return to this step, contact
your support center to resolve the problem with the uninterruptible power
supply.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:

2145 UPS-1U battery assembly (100%)

1192 Unexpected node error

Explanation: A node is missing from the cluster. The error that it is
reporting is not recognized by the system.

User response: Find the node that is in service state and use the service
assistant to determine why it is not active.

1193 The UPS battery charge is not enough to allow the node to start.

Explanation: The cluster is reporting that a node is not operational
because of critical node error 587. See the details of node error 587 for
more information.

User response:

1194 Automatic recovery of offline node has failed.

Explanation: The cluster has an offline node and has determined that one
of the candidate nodes matches the characteristics of the offline node.
The cluster has attempted but failed to add the node back into the
cluster. The cluster has stopped attempting to automatically add the node
back into the cluster.

If a node has incomplete state data, it remains offline after it starts.
This occurs if the node has had a loss of power or a hardware failure that
prevented it from completing the writing of all of the state data to disk.
The node reports a node error 578 when it is in this state.

If three attempts to automatically add a matching candidate node to a
cluster have been made, but the node has not returned online for 24 hours,
the cluster stops automatic attempts to add the node and logs error code
1194 “Automatic recovery of offline node failed”.

Two possible scenarios when this error event is logged are:
1. The node has failed without saving all of its state data. The node has
restarted, possibly after a repair, and shows node error 578 and is a
candidate node for joining the cluster. The cluster attempts to add the
node into the cluster but does not succeed. After 15 minutes, the cluster
makes a second attempt to add the node into the cluster and again does not
succeed. After another 15 minutes, the cluster makes a third attempt to
add the node into the cluster and again does not succeed. After another 15
minutes, the cluster logs error code 1194. The node never came online
during the attempts to add it to the cluster.
2. The node has failed without saving all of its state data. The node has
restarted, possibly after a repair, and shows node error 578 and is a
candidate node for joining the cluster. The cluster attempts to add the
node into the cluster and succeeds and the node
becomes online. Within 24 hours the node fails again without saving its
state data. The node restarts and shows node error 578 and is a candidate
node for joining the cluster. The cluster again attempts to add the node
into the cluster, succeeds, and the node becomes online; however, the node
again fails within 24 hours. The cluster attempts a third time to add the
node into the cluster, succeeds, and the node becomes online; however, the
node again fails within 24 hours. After another 15 minutes, the cluster
logs error code 1194.

A combination of these scenarios is also possible.

Note: If the node is manually removed from the cluster, the count of
automatic recovery attempts is reset to zero.

User response:
1. If the node has been continuously online in the cluster for more than
24 hours, mark the error as fixed and go to the Repair Verification MAP.
2. Determine the history of events for this node by locating events for
this node name in the event log. Note that the node ID will change, so
match on the WWNN and node name. Also, check the service records.
Specifically, note entries indicating one of three events: 1) the node is
missing from the cluster (cluster error 1195 event 009052), 2) an attempt
to automatically recover the offline node is starting (event 980352), 3)
the node has been added to the cluster (event 980349).
3. If the node has not been added to the cluster since the recovery
process started, there is probably a hardware problem. The node's internal
disk might be failing in a manner that it is unable to modify its software
level to match the software level of the cluster. If you have not yet
determined the root cause of the problem, you can attempt to manually
remove the node from the cluster and add the node back into the cluster.
Continuously monitor the status of the nodes in the cluster while the
cluster is attempting to add the node. Note: If the node type is not
supported by the software version of the cluster, the node will not appear
as a candidate node. Therefore, incompatible hardware is not a potential
root cause of this error.
4. If the node was added to the cluster but failed again before it has
been online for 24 hours, investigate the root cause of the failure. If no
events in the event log indicate the reason for the node failure, collect
dumps and contact IBM technical support for assistance.
5. When you have fixed the problem with the node, you must use either the
cluster console or the command line interface to manually remove the node
from the cluster and add the node into the cluster.
6. Mark the error as fixed and go to the verification MAP.

Possible Cause-FRUs or other:

None, although investigation might indicate a hardware failure.

1195 A 2145 is missing from the cluster.

Explanation: You can resolve this problem by repairing the failure on the
missing 2145.

User response:
1. If it is not obvious which node in the cluster has failed, check the
status of the nodes and find the 2145 with a status of offline.
2. Go to the Start MAP and perform the repair on the failing node.
3. When the repair has been completed, this error is automatically marked
as fixed.
4. Check node status. If all nodes show a status of “online”, but the
error in the log has not been marked as fixed, manually mark the error
that you have just repaired “fixed”. If any nodes do not show a status of
“online”, go to start MAP. If you return to this step, contact your
support center to resolve the problem with the 2145.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

1200 The configuration is not valid. Too many devices, MDisks, or targets
have been presented to the system.

Explanation: The configuration is not valid. Too many devices, MDisks, or
targets have been presented to the system.

User response:
1. Remove unwanted devices from the Fibre Channel network fabric.
2. Start a cluster discovery operation to find devices/disks by rescanning
the Fibre Channel network.
3. List all connected managed disks. Check with the customer that the
configuration is as expected. Mark the error that you have just repaired
fixed.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Fibre Channel network fabric fault (100%)
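
The rule that leads to error 1194 (three automatic add attempts without the node staying online for 24 hours) can be sketched as follows. This is a minimal model, not the cluster's implementation: the function name and its input are hypothetical, and it assumes that a node that stays online for 24 hours or more resets the count of attempts. The guide states such a reset only for manual removal of the node.

```python
STABLE_HOURS = 24   # a node must stay online this long to count as recovered
MAX_ATTEMPTS = 3    # automatic add attempts made before error 1194 is logged


def should_log_1194(hours_online_after_each_attempt):
    """Return True if the attempts seen so far would lead to error 1194.

    hours_online_after_each_attempt: one number per automatic add attempt,
    giving how long the node stayed online afterwards (0 if the add failed).
    """
    failed_in_a_row = 0
    for hours in hours_online_after_each_attempt:
        if hours >= STABLE_HOURS:
            # Assumption: a stable node clears the history of failed attempts.
            failed_in_a_row = 0
        else:
            failed_in_a_row += 1
    return failed_in_a_row >= MAX_ATTEMPTS
```

Scenario 1 from the explanation corresponds to an input such as `[0, 0, 0]` (the node never came online); scenario 2 corresponds to an input such as `[5, 10, 2]` (the node came online each time but failed again within 24 hours).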


1201 A solid-state drive requires a recovery.

Explanation: The solid-state drive that is identified by this error needs
to be recovered.

User response: To recover this SSD drive, submit the following command:
chdrive -task recover drive_id
where drive_id is the identity of the drive that needs to be recovered.

1202 A solid-state drive is missing from the configuration.

Explanation: The offline solid-state drive (SSD) identified by this error
must be repaired.

User response: In the management GUI, click Troubleshooting > Recommended
Actions to run the recommended action for this error. Otherwise, use MAP
6000 to replace the drive.

1203 A duplicate Fibre Channel frame has been received.

Explanation: A duplicate Fibre Channel frame should never be detected.
Receiving a duplicate Fibre Channel frame indicates that there is a
problem with the Fibre Channel fabric. Other errors related to the Fibre
Channel fabric might be generated.

User response:
1. Use the transmitting and receiving WWPNs indicated in the error data to
determine the section of the Fibre Channel fabric that has generated the
duplicate frame. Search for the cause of the problem by using fabric
monitoring tools. The duplicate frame might be caused by a design error in
the topology of the fabric, by a configuration error, or by a software or
hardware fault in one of the components of the Fibre Channel fabric,
including inter-switch links.
2. When you are satisfied that the problem has been corrected, mark the
error that you have just repaired “fixed”.
3. Go to MAP 5700: Repair verification.

Possible Cause-FRUs or other:
v Fibre Channel cable assembly (1%)
v Fibre Channel adapter (1%)

Other:
v Fibre Channel network fabric fault (98%)

1210 A local Fibre Channel port has been excluded.

Explanation: A local Fibre Channel port has been excluded.

User response:
1. Repair faults in the order shown.
2. Check the status of the disk controllers. If all disk controllers show
a “good” status, mark the error that you just repaired as “fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v Fibre Channel cable assembly (75%)
v Small Form-factor Pluggable (SFP) connector (10%)
v Fibre Channel adapter (5%)

Other:
v Fibre Channel network fabric fault (10%)

1215 A solid-state drive is failing.

Explanation: The solid-state drive has detected faults that indicate that
the drive is likely to fail soon. The drive should be replaced. The
cluster event log will identify a drive ID for the solid-state drive that
caused the error.

User response: In the management GUI, click Troubleshooting > Recommended
Actions to run the recommended action for this error. If this does not
resolve the issue, contact your next level of support.

1216 SAS errors have exceeded thresholds.

Explanation: The cluster has experienced a large number of SAS
communication errors, which indicates a faulty SAS component that must be
replaced.

User response: In the sequence shown, exchange the FRUs for new FRUs.

Go to the repair verification MAP.

Possible Cause-FRUs or other:
1. SAS Cable (70%)
2. High speed SAS adapter (20%)
3. SAS drive backplane (5%)
4. solid-state drive (5%)

1217 A solid-state drive has exceeded the temperature warning threshold.

Explanation: The solid-state drive identified by this error has reported
that its temperature is higher than the warning threshold.

User response: Take steps to reduce the temperature of the drive.
1. Determine the temperature of the room, and reduce the room temperature
if this action is appropriate.
2. Replace any failed fans.
3. Ensure that there are no obstructions to air flow for the node.
4. Mark the error as fixed. If the error recurs, contact hardware support
for further investigation.


Possible Cause-FRUs or other:
v Solid-state drive (10%)

Other:
v System environment or airflow blockage (90%)

1220 A remote Fibre Channel port has been excluded.

Explanation: A remote Fibre Channel port has been excluded.

User response:
1. View the event log. Note the MDisk ID associated with the error code.
2. From the MDisk, determine the failing disk controller ID.
3. Refer to the service documentation for the disk controller and the
Fibre Channel network to resolve the reported problem.
4. After the disk drive is repaired, start a cluster discovery operation
to recover the excluded Fibre Channel port by rescanning the Fibre Channel
network.
5. To restore MDisk online status, include the managed disk that you noted
in step 1.
6. Check the status of the disk controller. If all disk controllers show a
“good” status, mark the error that you have just repaired, “fixed”.
7. If all disk controllers do not show a good status, contact your support
center to resolve the problem with the disk controller.
8. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Enclosure/controller fault (50%)
v Fibre Channel network fabric (50%)

1230 A login has been excluded.

Explanation: A port to port fabric connection, or login, between the
cluster node and either a controller or another cluster has had excessive
errors. The login has therefore been excluded, and will not be used for
I/O operations.

User response: Determine the remote system, which might be either a
controller or a SAN Volume Controller cluster. Check the event log for
other 1230 errors. Ensure that all higher priority errors are fixed.

This error event is usually caused by a fabric problem. If possible, use
the fabric switch or other fabric diagnostic tools to determine which link
or port is reporting the errors. If there are error events for links from
this node to a number of different controllers or clusters, then it is
probably the node to switch link that is causing the errors. Unless there
are other contrary indications, first replace the cable between the switch
and the remote system.
1. From the fabric analysis, determine the FRU that is most likely causing
the error. If this FRU has recently been replaced while resolving a 1230
error, choose the next most likely FRU that has not been replaced
recently. Exchange the FRU for a new FRU.
2. Mark the error as fixed. If the FRU replacement has not fixed the
problem, the error will be logged again; however, depending on the
severity of the problem, the error might not be logged again immediately.
3. Start a cluster discovery operation to recover the login by re-scanning
the Fibre Channel network.
4. Check the status of the disk controller or remote cluster. If the
status is not “good”, go to the Start MAP.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:
v Fibre Channel cable, switch to remote port, (30%)
v Switch or remote device SFP connector or adapter, (30%)
v Fibre Channel cable, local port to switch, (30%)
v Cluster SFP connector, (9%)
v Cluster Fibre Channel adapter, (1%)

Note: The first two FRUs are not cluster FRUs.

1310 A managed disk is reporting excessive errors.

Explanation: A managed disk is reporting excessive errors.

User response:
1. Repair the enclosure/controller fault.
2. Check the managed disk status. If all managed disks show a status of
“online”, mark the error that you have just repaired as “fixed”. If any
managed disks show a status of “excluded”, include the excluded managed
disks and then mark the error as “fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Enclosure/controller fault (100%)
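
Step 1 of the user response for error 1230 asks for the most likely FRU that has not been replaced recently. The selection can be sketched as below. This is an illustration under assumptions: the guide bases likelihood on fabric analysis, whereas the sketch simply reuses the percentages from the 1230 FRU list as a default ranking, and the names `FRUS_1230` and `next_fru` are hypothetical.

```python
# (FRU description, likelihood in percent), copied from the 1230 FRU list.
FRUS_1230 = [
    ("Fibre Channel cable, switch to remote port", 30),
    ("Switch or remote device SFP connector or adapter", 30),
    ("Fibre Channel cable, local port to switch", 30),
    ("Cluster SFP connector", 9),
    ("Cluster Fibre Channel adapter", 1),
]


def next_fru(candidates, recently_replaced):
    """Return the most likely FRU that was not replaced recently, or None.

    candidates: list of (description, likelihood) pairs.
    recently_replaced: set of descriptions already exchanged for this error.
    """
    remaining = [fru for fru in candidates if fru[0] not in recently_replaced]
    if not remaining:
        return None  # every listed FRU has already been exchanged
    # max() keeps the first entry among equal likelihoods, so list order
    # breaks ties.
    return max(remaining, key=lambda fru: fru[1])[0]
```

For example, `next_fru(FRUS_1230, {"Fibre Channel cable, switch to remote port"})` moves on to the switch or remote device SFP connector or adapter.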


1311 A solid-state drive is offline due to excessive errors.

Explanation: The drive that is reporting excessive errors has been taken
offline.

User response: In the management GUI, click Troubleshooting > Recommended
Actions to run the recommended action for this error. If this does not
resolve the issue, contact your next level of support.

1320 A disk I/O medium error has occurred.

Explanation: A disk I/O medium error has occurred.

User response:
1. Check whether the volume the error is reported against is mirrored. If
it is, check if there is a “1870 Mirrored volume offline because a
hardware read error has occurred” error relating to this volume in the
event log. Also check if one of the mirror copies is synchronizing. If all
these tests are true then you must delete the volume copy that is not
synchronized from the volume. Check that the volume is online before
continuing with the following actions. Wait until the medium error is
corrected before trying to re-create the volume mirror.
2. If the medium error was detected by a read from a host, ask the
customer to rewrite the incorrect data to the block logical block address
(LBA) that is reported in the host systems SCSI sense data. If an
individual block cannot be recovered it will be necessary to restore the
volume from backup. (If this error has occurred during a migration, the
host system does not notice the error until the target device is
accessed.)
3. If the medium error was detected during a mirrored volume
synchronization, the block might not be being used for host data. The
medium error must still be corrected before the mirror can be established.
It may be possible to fix the block that is in error using the disk
controller or host tools. Otherwise, it will be necessary to use the host
tools to copy the volume content that is being used to a new volume.
Depending on the circumstances, this new volume can be kept and mirrored,
or the original volume can be repaired and the data copied back again.
4. Check managed disk status. If all managed disks show a status of
“online”, mark the error that you have just repaired as “fixed”. If any
managed disks do not show a status of “online”, go to start MAP. If you
return to this step, contact your support center to resolve the problem
with the disk controller.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Enclosure/controller fault (100%)

1330 A suitable managed disk (MDisk) or drive for use as a quorum disk was
not found.

Explanation: A quorum disk is needed to enable a tie-break when some
cluster members are missing. Three quorum disks are usually defined. By
default, the cluster automatically allocates quorum disks when managed
disks are created; however, the option exists to manually assign quorum
disks. This error is reported when there are managed disks or image mode
disks but no quorum disks.

To become a quorum disk:
v The MDisk must be accessible by all nodes in the cluster.
v The MDisk must be managed; that is, it must be a member of a storage
pool.
v The MDisk must have free extents.
v The MDisk must be associated with a controller that is enabled for
quorum support. If the controller has multiple WWNNs, all of the
controller components must be enabled for quorum support.

A quorum disk might not be available because of a Fibre Channel network
failure or because of a Fibre Channel switch zoning problem.

User response:
1. Resolve any known Fibre Channel network problems.
2. Ask the customer to confirm that MDisks have been added to storage
pools and that those MDisks have free extents and are on a controller that
is enabled for use as a provider of quorum disks. Ensure that any
controller with multiple WWNNs has all of its components enabled to
provide quorum disks. Either create a suitable MDisk or if possible enable
quorum support on controllers with which existing MDisks are associated.
If at least one managed disk shows a mode of managed and has a non-zero
quorum index, mark the error that you have just repaired as “fixed”.
3. If the customer is unable to make the appropriate changes, ask your
software support center for assistance.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Configuration error (100%)
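
The four conditions that an MDisk must meet to become a quorum disk (error 1330) can be written as a checklist. The sketch below is illustrative only: the `MDisk` class, its field names, and the mode strings are hypothetical stand-ins for information that would be gathered from the system, not an actual API of the product.

```python
from dataclasses import dataclass, field


@dataclass
class MDisk:
    """Hypothetical summary of the MDisk properties relevant to quorum."""
    accessible_by_all_nodes: bool
    mode: str                  # for example "managed", "image", "unmanaged"
    free_extents: int
    # One flag per controller component (WWNN): True if quorum is allowed.
    controller_components_allow_quorum: list = field(default_factory=list)


def quorum_blockers(mdisk):
    """Return the reasons why this MDisk cannot be used as a quorum disk."""
    reasons = []
    if not mdisk.accessible_by_all_nodes:
        reasons.append("not accessible by all nodes in the cluster")
    if mdisk.mode != "managed":
        reasons.append("not managed (not a member of a storage pool)")
    if mdisk.free_extents <= 0:
        reasons.append("no free extents")
    flags = mdisk.controller_components_allow_quorum
    # Every component of a multi-WWNN controller must allow quorum.
    if not flags or not all(flags):
        reasons.append("controller not enabled for quorum on all components")
    return reasons
```

An empty list means that the MDisk passes all four checks. A controller with no known components is treated as not enabled, which is a conservative assumption made for this sketch.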


1335 Quorum disk not available.

Explanation: Quorum disk not available.

User response:
1. View the event log entry to identify the managed disk (MDisk) being
used as a quorum disk, that is no longer available.
2. Perform the disk controller problem determination and repair procedures
for the MDisk identified in step 1.
3. Include the MDisks into the cluster.
4. Check the managed disk status. If the managed disk identified in step 1
shows a status of “online”, mark the error that you have just repaired as
“fixed”. If the managed disk does not show a status of “online”, go to
start MAP. If you return to this step, contact your support center to
resolve the problem with the disk controller.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Enclosure/controller fault (100%)

1340 A managed disk has timed out.

Explanation: This error was reported because a large number of disk
timeout conditions have been detected. The problem is probably caused by a
failure of some other component on the SAN.

User response:
1. Repair problems on all enclosures/controllers and switches on the same
SAN as this 2145 cluster.
2. If problems are found, mark this error as “fixed”.
3. If no switch or disk controller failures can be found, take an event
log dump and call your hardware support center.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Enclosure/controller fault
v Fibre Channel switch

1360 A SAN transport error occurred.

Explanation: This error has been reported because the 2145 performed error
recovery procedures in response to SAN component associated transport
errors. The problem is probably caused by a failure of a component of the
SAN.

User response:
1. View the event log entry to determine the node that logged the problem.
Determine the 2145 node or controller that the problem was logged against.
2. Perform Fibre Channel switch problem determination and repair
procedures for the switches connected to the 2145 node or controller.
3. Perform Fibre Channel cabling problem determination and repair
procedures for the cables connected to the 2145 node or controller.
4. If any problems are found and resolved in step 2 and 3, mark this error
as “fixed”.
5. If no switch or cable failures were found in steps 2 and 3, take an
event log dump. Call your hardware support center.
6. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Fibre Channel switch
v Fibre Channel cabling

1370 A managed disk error recovery procedure (ERP) has occurred.

Explanation: This error was reported because a large number of disk error
recovery procedures have been performed by the disk controller. The
problem is probably caused by a failure of some other component on the
SAN.

User response:
1. View the event log entry and determine the managed disk that was being
accessed when the problem was detected.
2. Perform the disk controller problem determination and repair procedures
for the MDisk determined in step 1.
3. Perform problem determination and repair procedures for the fibre
channel switches connected to the 2145 and any other Fibre Channel network
components.
4. If any problems are found and resolved in steps 2 and 3, mark this
error as “fixed”.
5. If no switch or disk controller failures were found in steps 2 and 3,
take an event log dump. Call your hardware support center.
6. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Enclosure/controller fault
v Fibre Channel switch

1400 The 2145 cannot detect an Ethernet connection.

Explanation: The 2145 cannot detect an Ethernet connection.

User response:
1. Go to the Ethernet MAP.
2. Go to the repair verification MAP.

Possible Cause-FRUs or other:

2145-8G4, 2145-8A4, 2145-CF8, or 2145-CG8
v Ethernet cable (25%)
v System board (25%)

2145-8F2 or 2145-8F4
v Ethernet cable (25%)
v Frame assembly (25%)

Other:
v Ethernet cable is disconnected or damaged (25%)
v Ethernet hub fault (25%)

1550 A cluster path has failed.

Explanation: One of the 2145 Fibre Channel ports is unable to communicate
with all the other 2145s in the cluster.

User response:
1. Check for incorrect switch zoning.
2. Repair the fault in the Fibre Channel network fabric.
3. Check the status of the node ports. If the status of the node ports
shows as active, mark the error that you have just repaired as “fixed”. If
any node ports do not show a status of active, go to start MAP. If you
return to this step contact your support center to resolve the problem
with the 2145.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:

Fibre Channel network fabric fault (100%)

1570 Quorum disk configured on controller that has quorum disabled

Explanation: This error can occur with a storage controller that can be
accessed through multiple WWNNs and have a default setting of not allowing
quorum disks. When these controllers are detected by a cluster, although
multiple component controller definitions are created, the cluster
recognizes that all of the component controllers belong to the same
storage system. To enable the creation of a quorum disk on this storage
system, all of the controller components must be configured to allow
quorum.

A configuration change to the SAN, or to a storage system with multiple
WWNNs, might result in the cluster discovering new component controllers
for the storage system. These components will take the default setting for
allowing quorum. This error is reported if there is a quorum disk
associated with the controller and the default setting is not to allow
quorum.

User response:
v Determine if there should be a quorum disk on this storage system.
Ensure that the controller supports quorum before you allow quorum disks
on any disk controller. You can check the support website
www.ibm.com/storage/support/2145 for more information.
v If a quorum disk is required on this storage system, allow quorum on the
controller component that is reported in the error. If the quorum disk
should not be on this storage system, move it elsewhere.
v Mark the error as “fixed”.

Possible Cause-FRUs or other:
v None

Other:

Fibre Channel network fabric fault (100%)

1600 Mirrored disk repair halted because of difference.

Explanation: During the repair of a mirrored volume two copy disks were
found to contain different data for the same logical block address (LBA).
The validate option was used, so the repair process has halted.

Read operations to the LBAs that differ might return the data of either
volume copy. Therefore it is important not to use the volume unless you
are sure that the host applications will not read the LBAs that differ or
can manage the different data that potentially can be returned.

User response: Perform one of the following actions:
v Continue the repair starting with the next LBA after the difference to
see how many differences there are for the whole mirrored volume. This can
help you decide which of the following actions to take.
v Choose a primary disk and run repair resynchronizing differences.
v Run a repair and create medium errors for differences.
v Restore all or part of the volume from a backup.

v Decide which disk has correct data, then delete the
copy that is different and re-create it allowing it to be
synchronized.

Then mark the error as “fixed”.

Possible Cause-FRUs or other:
v None

1610 There are too many copied media errors on a managed disk.

Explanation: The cluster maintains a virtual medium
error table for each MDisk. This table is a list of logical
block addresses on the managed disk that contain data
that is not valid and cannot be read. The virtual
medium error table has a fixed length. This error event
indicates that the system has attempted to add an entry
to the table, but the attempt has failed because the table
is already full.

There are two circumstances that will cause an entry to
be added to the virtual medium error table:
1. FlashCopy, data migration and mirrored volume
synchronization operations copy data from one
managed disk extent to another. If the source extent
contains either a virtual medium error or the RAID
controller reports a real medium error, the system
creates a matching virtual medium error on the
target extent.
2. The mirrored volume validate and repair process
has the option to create virtual medium errors on
sectors that do not match on all volume copies.
Normally zero, or very few, differences are
expected; however, if the copies have been marked
as synchronized inappropriately, then a large
number of virtual medium errors could be created.

User response: Ensure that all higher priority errors
are fixed before you attempt to resolve this error.

Determine whether the excessive number of virtual
medium errors occurred because of a mirrored disk
validate and repair operation that created errors for
differences, or whether the errors were created because
of a copy operation. Follow the corresponding option
shown below.
1. If the virtual medium errors occurred because of a
mirrored disk validate and repair operation that
created medium errors for differences, then also
ensure that the volume copies had been fully
synchronized prior to starting the operation. If the
copies had been synchronized, there should be only
a few virtual medium errors created by the validate
and repair operation. In this case, it might be
possible to rewrite only the data that was not
consistent on the copies using the local data
recovery process. If the copies had not been
synchronized, it is likely that there are now a large
number of medium errors on all of the volume
copies. Even if the virtual medium errors are
expected to be only for blocks that have never been
written, it is important to clear the virtual medium
errors to avoid inhibition of other operations. To
recover the data for all of these virtual medium
errors it is likely that the volume will have to be
recovered from a backup using a process that
rewrites all sectors of the volume.
2. If the virtual medium errors have been created by a
copy operation, it is best practice to correct any
medium errors on the source volume and to not
propagate the medium errors to copies of the
volume. Fixing higher priority errors in the event
log would have corrected the medium error on the
source volume. Once the medium errors have been
fixed, you must run the copy operation again to
clear the virtual medium errors from the target
volume. It might be necessary to repeat a sequence
of copy operations if copies have been made of
already copied medium errors.

An alternative that does not address the root cause is to
delete volumes on the target managed disk that have
the virtual medium errors. This volume deletion
reduces the number of virtual medium error entries in
the MDisk table. Migrating the volume to a different
managed disk will also delete entries in the MDisk
table, but will create more entries on the MDisk table of
the MDisk to which the volume is migrated.

Possible Cause-FRUs or other:
v None

1620 A storage pool is offline.

Explanation: A storage pool is offline.

User response:
1. Repair the faults in the order shown.
2. Start a cluster discovery operation by rescanning the
Fibre Channel network.
3. Check managed disk (MDisk) status. If all MDisks
show a status of “online”, mark the error that you
have just repaired as “fixed”. If any MDisks do not
show a status of “online”, go to start MAP. If you
return to this step, contact your support center to
resolve the problem with the disk controller.
4. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Fibre Channel network fabric fault (50%)
v Enclosure/controller fault (50%)

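As an illustration of the explanation given for error 1610 earlier in this section, the following minimal Python sketch models a fixed-length virtual medium error table for each MDisk. The names MDisk, TableFullError, copy_extent, and the capacity TABLE_CAPACITY are hypothetical; this guide states only that the table has a fixed length. The sketch also assumes, for simplicity, that a block keeps the same LBA on the source and the target. It is a model of the described behavior, not the product's implementation.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical capacity; the guide says only that the table length is fixed.
TABLE_CAPACITY = 4


class TableFullError(Exception):
    """Stands in for the condition that error 1610 reports."""


@dataclass
class MDisk:
    name: str
    # LBAs for which the RAID controller reports a real medium error.
    real_medium_errors: Set[int] = field(default_factory=set)
    # The per-MDisk virtual medium error table.
    virtual_medium_errors: Set[int] = field(default_factory=set)

    def add_virtual_error(self, lba: int) -> None:
        """Add an entry, failing when the fixed-length table is full."""
        if lba in self.virtual_medium_errors:
            return
        if len(self.virtual_medium_errors) >= TABLE_CAPACITY:
            raise TableFullError(f"virtual medium error table full on {self.name}")
        self.virtual_medium_errors.add(lba)


def copy_extent(source: MDisk, target: MDisk, lbas: range) -> None:
    """Copy blocks; unreadable source blocks become virtual errors on the target.

    Copying a readable block clears any earlier virtual error for that LBA,
    which models running the copy operation again after the source is fixed.
    """
    for lba in lbas:
        unreadable = (
            lba in source.real_medium_errors
            or lba in source.virtual_medium_errors
        )
        if unreadable:
            target.add_virtual_error(lba)
        else:
            target.virtual_medium_errors.discard(lba)
```

In this model, copying from a source with more unreadable blocks than TABLE_CAPACITY raises TableFullError, and repeating the copy after the source errors are corrected empties the target table again.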

1623 One or more MDisks on a controller are degraded.

Explanation: At least one MDisk on a controller is
degraded because the MDisk is not available through
one or more nodes. The MDisk is available through at
least one node. Access to data might be lost if another
failure occurs.

In a correctly configured system, each node accesses all
of the MDisks on a controller through all of the
controller's ports.

This error is only logged once per controller. There
might be more than one MDisk on this controller that
has been configured incorrectly, but the error is only
logged for one MDisk.

To prevent this error from being logged because of
short-term fabric maintenance activities, this error
condition must have existed for one hour before the
error is logged.

User response:
1. Determine which MDisks are degraded. Look for
MDisks with a path count lower than the number of
nodes. Do not use only the MDisk status, since
other errors can also cause degraded MDisks.
2. Ensure that the controller is zoned correctly with all
of the nodes.
3. Ensure that the logical unit is mapped to all of the
nodes.
4. Ensure that the logical unit is mapped to all of the
nodes using the same LUN.
5. Run the console or CLI command to discover
MDisks and ensure that the command completes.
6. Mark the error that you have just repaired as
“fixed”. When you mark the error as “fixed”, the
controller's MDisk availability is tested and the
error will be logged again immediately if the error
persists for any MDisks. It is possible that the new
error will report a different MDisk.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Fibre Channel network fabric fault (50%)
v Enclosure/controller fault (50%)

1624 Controller configuration has unsupported RDAC mode.

Explanation: The cluster has detected that an IBM DS
series disk controller's configuration is not supported
by the cluster. The disk controller is operating in RDAC
mode. The disk controller might appear to be operating
with the cluster; however, the configuration is
unsupported because it is known to not work with the
cluster.

User response:
1. Using the IBM DS series console, ensure that the
host type is set to 'IBM TS SAN VCE' and that the
AVT option is enabled. (The AVT and RDAC
options are mutually exclusive).
2. Mark the error that you have just repaired as
“fixed”. If the problem has not been fixed it will be
logged again; this could take a few minutes.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Enclosure/controller fault

1625 Incorrect disk controller configuration.

Explanation: While running an MDisk discovery, the
cluster has detected that a disk controller's
configuration is not supported by the cluster. The disk
controller might appear to be operating with the
cluster; however, the configuration detected can
potentially cause issues and should not be used. The
unsupported configuration is shown in the event data.

User response:
1. Use the event data to determine changes required
on the disk controller and reconfigure the disk
controller to use a supported configuration.
2. Mark the error that you have just repaired as
“fixed”. If the problem has not been fixed it will be
logged again by the managed disk discovery that
automatically runs at this time; this could take a
few minutes.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Enclosure/controller fault

1627 The cluster has insufficient redundancy in its controller connectivity.

Explanation: The cluster has detected that it does not
have sufficient redundancy in its connections to the
disk controllers. This means that another failure in the
SAN could result in loss of access to the application
data. The cluster SAN environment should have
redundant connections to every disk controller. This
redundancy allows for continued operation when there
is a failure in one of the SAN components.

To provide recommended redundancy, a cluster should
be configured so that:
v each node can access each disk controller through
two or more different initiator ports on the node.
v each node can access each disk controller through
two or more different controller target ports. Note:
Some disk controllers only provide a single target
port.
v each node can access each disk controller target port
through at least one initiator port on the node.

If there are no higher-priority errors being reported,
this error usually indicates a problem with the SAN
design, a problem with the SAN zoning or a problem
with the disk controller.

If there are unfixed higher-priority errors that relate to
the SAN or to disk controllers, those errors should be
fixed before resolving this error because they might
indicate the reason for the lack of redundancy. Error
codes that must be fixed first are:
v 1210 Local FC port excluded
v 1230 Login has been excluded

Note: This error can be reported if the required action,
to rescan the Fibre Channel network for new MDisks,
has not been performed after a deliberate
reconfiguration of a disk controller or after SAN
rezoning.

The 1627 error code is reported for a number of
different error IDs. The error ID indicates the area
where there is a lack of redundancy. The data reported
in an event log entry indicates where the condition was
found.

The meaning of the error IDs is shown below. For each
error ID the most likely reason for the condition is
given. If the problem is not found in the suggested
areas, check the configuration and state of all of the
SAN components (switches, controllers, disks, cables
and cluster) to determine where there is a single point
of failure.

010040 A disk controller is only accessible from a single
node port.
v A node has detected that it only has a connection to
the disk controller through exactly one initiator port,
and more than one initiator port is operational.
v The error data indicates the device WWNN and the
WWPN of the connected port.
v A zoning issue or a Fibre Channel connection
hardware fault might cause this condition.

010041 A disk controller is only accessible from a single
port on the controller.
v A node has detected that it is only connected to
exactly one target port on a disk controller, and more
than one target port connection is expected.
v The error data indicates the WWPN of the disk
controller port that is connected.
v A zoning issue or a Fibre Channel connection
hardware fault might cause this condition.

010042 Only a single port on a disk controller is
accessible from every node in the cluster.
v Only a single port on a disk controller is accessible to
every node when there are multiple ports on the
controller that could be connected.
v The error data indicates the WWPN of the disk
controller port that is connected.
v A zoning issue or a Fibre Channel connection
hardware fault might cause this condition.

010043 A disk controller is accessible through only half,
or less, of the previously configured controller ports.
v Although there might still be multiple ports that are
accessible on the disk controller, a hardware
component of the controller might have failed or one
of the SAN fabrics has failed such that the
operational system configuration has been reduced to
a single point of failure.
v The error data indicates a port on the disk controller
that is still connected, and also lists controller ports
that are expected but that are not connected.
v A disk controller issue, switch hardware issue,
zoning issue or cable fault might cause this
condition.

010044 A disk controller is not accessible from a node.
v A node has detected that it has no access to a disk
controller. The controller is still accessible from the
partner node in the I/O group, so its data is still
accessible to the host applications.
v The error data indicates the WWPN of the missing
disk controller.
v A zoning issue or a cabling error might cause this
condition.

User response:
1. Check the error ID and data for a more detailed
description of the error.
2. Determine if there has been an intentional change
to the SAN zoning or to a disk controller
configuration that reduces the cluster's access to
the indicated disk controller. If either action has
occurred, continue with step 8.
3. Use the GUI or the CLI command lsfabric to
ensure that all disk controller WWPNs are
reported as expected.
4. Ensure that all disk controller WWPNs are zoned
appropriately for use by the cluster.
5. Check for any unfixed errors on the disk
controllers.
6. Ensure that all of the Fibre Channel cables are
connected to the correct ports at each end.
7. Check for failures in the Fibre Channel cables and
connectors.
8. When you have resolved the issues, use the GUI
or the CLI command detectmdisk to rescan the
Fibre Channel network for changes to the MDisks.
Note: Do not attempt to detect MDisks unless you
are sure that all problems have been fixed.
Detecting MDisks prematurely might mask an
issue.
9. Mark the error that you have just repaired as
fixed. The cluster will revalidate the redundancy
and will report another error if there is still not
sufficient redundancy.
10. Go to MAP 5700: Repair verification.

Possible Cause-FRUs or other:
v None

1630 The number of device logins was reduced.

Explanation: The number of port to port fabric
connections, or logins, between the node and a storage
controller has decreased. This might be caused by a
problem on the SAN or by a deliberate reconfiguration
of the SAN.

User response:
1. Check the error in the cluster event log to identify
the object ID associated with the error.
2. Check the availability of the failing device using the
following command line: lscontroller object_ID.
If the command fails with the message
“CMMVC6014E The command failed because the
requested object is either unavailable or does not
exist,” ask the customer if this device was removed
from the system.
v If “yes”, mark the error as fixed in the cluster
event log and continue with the repair
verification MAP.
v If “no” or if the command lists details of the
failing controller, continue with the next step.
3. Check whether the device has regained connectivity.
If it has not, check the cable connection to the
remote-device port.
4. If all attempts to log in to a remote-device port have
failed and you cannot solve the problem by
changing cables, check the condition of the
remote-device port and the condition of the remote
device.
5. Start a cluster discovery operation by rescanning the
Fibre Channel network.
6. Check the status of the disk controller. If all disk
controllers show a “good” status, mark the error
that you have just repaired as “fixed”. If any disk
controllers do not show “good” status, go to start
MAP. If you return to this step, contact the support
center to resolve the problem with the disk
controller.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Fibre Channel network fabric fault (50%)
v Enclosure/controller fault (50%)

1660 The initialization of the managed disk has failed.

Explanation: The initialization of the managed disk
has failed.

User response:
1. View the event log entry to identify the managed
disk (MDisk) that was being accessed when the
problem was detected.
2. Perform the disk controller problem determination
and repair procedures for the MDisk identified in
step 1.
3. Include the MDisk into the cluster.
4. Check the managed disk status. If all managed
disks show a status of “online”, mark the error that
you have just repaired as “fixed”. If any managed
disks do not show a status of “online”, go to the
start MAP. If you return to this step, contact your
support center to resolve the problem with the disk
controller.
5. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
Enclosure/controller fault (100%)

1670 The CMOS battery on the 2145 system board failed.

Explanation: The CMOS battery on the 2145 system
board failed.

User response:
1. Replace the CMOS battery.
2. Mark the error that you have just repaired as
“fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
CMOS battery (100%)

1695 Persistent unsupported disk controller configuration.

Explanation: A disk controller configuration that
might prevent failover for the cluster has persisted for
more than four hours. The problem was originally
logged through a 010032 event, service error code 1625.

User response:
1. Fix any higher priority error. In particular, follow
the service actions to fix the 1625 error indicated by
this error's root event. This error will be marked as
“fixed” when the root event is marked as “fixed”.
2. If the root event cannot be found, or is marked as
“fixed”, perform an MDisk discovery and mark this
error as “fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Enclosure/controller fault

1700 Unrecovered Metro Mirror or Global Mirror relationship

Explanation: This error might be reported after the
recovery action for a cluster failure or a complete I/O
group failure. The error is reported because some
Metro Mirror or Global Mirror relationships, whose
control data is stored by the I/O group, were active at
the time of the failure and the current state of the
relationship could not be recovered.

User response: To fix this error it is necessary to
delete all of the relationships that could not be
recovered and then re-create the relationships.
1. Note the I/O group index against which the error is
logged.
2. List all of the Metro Mirror and Global Mirror
relationships that have either a master or an
auxiliary volume in this I/O group. Use the volume
view to determine which volumes in the I/O group
you noted have a relationship defined.
3. Note the details of the Metro Mirror and Global
Mirror relationships that are listed so that they can
be re-created.
4. Delete all of the Metro Mirror and Global Mirror
relationships that are listed. Note: The error will
automatically be marked as “fixed” once the last
relationship on the I/O group is deleted. New
relationships should not be created until the error is
fixed.
5. Using the details noted in step 3, re-create all of the
Metro Mirror and Global Mirror relationships that
you just deleted. Note: You are able to delete a
Metro Mirror or Global Mirror relationship from
either the master or auxiliary cluster; however, you
must re-create the relationship on the master cluster.
Therefore, it might be necessary to go to another
cluster to complete this service action.

Possible Cause-FRUs or other:
v None

1710 There are too many cluster partnerships. The number of cluster partnerships has been reduced.

Explanation: A cluster can have a Metro Mirror and
Global Mirror cluster partnership with one or more
other clusters. Partnership sets consist of clusters that
are either in direct partnership with each other or are
in indirect partnership by having a partnership with
the same intermediate cluster. The topology of the
partnership set is not fixed; the topology might be a
star, a loop, a chain or a mesh. The maximum
supported number of clusters in a partnership set is
four. A cluster is a member of a partnership set if it has
a partnership with another cluster in the set, regardless
of whether that partnership has any defined
consistency groups or relationships.

The following are examples of valid partnership sets
for five unique clusters labelled A, B, C, D, and E
where a partnership is indicated by a dash between
two cluster names:
v A-B, A-C, A-D. E has no partnerships defined and
therefore is not a member of the set.
v A-B, A-D, B-C, C-D. E has no partnerships defined
and therefore is not a member of the set.
v A-B, B-C, C-D. E has no partnerships defined and
therefore is not a member of the set.
v A-B, A-C, A-D, B-C, B-D, C-D. E has no partnerships
defined and therefore is not a member of the set.
v A-B, A-C, B-C, D-E. There are two partnership sets.
One contains clusters A, B, and C. The other contains
clusters D and E.

The following are examples of unsupported
configurations because the number of clusters in the set
is five, which exceeds the supported maximum of four
clusters:
v A-B, A-C, A-D, A-E.
v A-B, A-D, B-C, C-D, C-E.
v A-B, B-C, C-D, D-E.

The cluster prevents you from creating a new Metro
Mirror and Global Mirror cluster partnership if a
resulting partnership set would exceed the maximum
of four clusters. However, if you restore a broken link
between two clusters that have a partnership, the
number of clusters in the set might exceed four. If this
occurs, Metro Mirror and Global Mirror cluster
partnerships are excluded from the set until only four
clusters remain in the set. A cluster partnership that is
excluded from a set has all of its Metro Mirror and
Global Mirror cluster partnerships excluded.

Event ID 0x050030 is reported if the cluster is retained
in the partnership set. Event ID 0x050031 is reported if
the cluster is excluded from the partnership set. All
clusters that were in the partnership set report error
1710.

All inter-cluster Metro Mirror or Global Mirror
relationships that involve an excluded cluster will lose
connectivity. If any of these relationships are in the
consistent_synchronized state and they receive a write
I/O, they will stop with error code 1720.

User response: To fix this error it is necessary to
delete all of the relationships that could not be
recovered and then re-create the relationships.
1. Determine which clusters are still connected and
members of the partnership set, and which clusters
have been excluded.
2. Determine the Metro Mirror and Global Mirror
relationships that exist on those clusters.
3. Determine which of the Metro Mirror and Global
Mirror relationships you want to maintain, which
determines which cluster partnerships you want to
maintain. Ensure that the partnership set or sets
that would result from configuring the cluster
partnerships that you want contain no more than
four clusters in each set. NOTE: The reduced
partnership set created by the cluster might not
contain the clusters that you want in the set.
4. Remove all of the Metro Mirror and Global Mirror
relationships that you do not want to retain.
5. Remove all of the Metro Mirror and Global Mirror
cluster partnerships that you do not want to retain.
6. Restart all relationships and consistency groups that
were stopped.
7. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

1720 In a Metro Mirror or Global Mirror operation, the relationship has stopped and lost synchronization, for a reason other than a persistent I/O error.

Explanation: In a Metro Mirror or Global Mirror
operation, the relationship has stopped and lost
synchronization, for a reason other than a persistent
I/O error.

User response:
1. Restart the relationship after fixing errors of higher
priority.
2. Mark the error that you have just repaired as
“fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

1800 The SAN has been zoned incorrectly.

Explanation: This has resulted in more than 512 other
ports on the SAN logging into one port of a 2145 node.

User response:
1. Ask the user to reconfigure the SAN.
2. Mark the error as “fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
v Fibre Channel switch configuration error
v Fibre Channel switch

1850 A cluster recovery operation was performed but data on one or more volumes has not been recovered.

Explanation: A cluster recovery operation was
performed but data on one or more volumes has not
been recovered.

User response:
1. The support center will direct the user to restore the
data on the affected volumes.
2. When the volume data has been restored or the
user has chosen not to restore the data, mark the
error as “fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

1860 Thin-provisioned volume copy offline because of failed repair.

Explanation: The attempt to repair the metadata of a
thin-provisioned volume that describes the disk
contents has failed because of problems with the
automatically maintained backup copy of this data. The
error event data describes the problem.

User response: Delete the thin-provisioned volume
and reconstruct a new one from a backup or mirror
copy. Mark the error as “fixed”. Also mark the original
1862 error as “fixed”.

Possible Cause-FRUs or other:
v None

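The partnership-set rule described for error 1710 earlier in this section can be checked mechanically when planning which partnerships to keep. The following minimal Python sketch treats each partnership as an edge between two clusters and groups clusters that are linked directly or indirectly. The function names partnership_sets and is_supported and the "A-B" string notation are illustrative assumptions taken from the examples in the explanation; they are not part of the product CLI.

```python
from collections import defaultdict

# Maximum number of clusters in one partnership set, as stated for error 1710.
MAX_CLUSTERS_PER_SET = 4


def partnership_sets(partnerships):
    """Group clusters into partnership sets.

    Each partnership is a string such as "A-B". Clusters that are linked
    directly, or indirectly through intermediate clusters, share a set.
    """
    neighbours = defaultdict(set)
    for pair in partnerships:
        first, second = pair.split("-")
        neighbours[first].add(second)
        neighbours[second].add(first)

    seen = set()
    sets = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk that collects every cluster reachable from start.
        component = set()
        stack = [start]
        while stack:
            cluster = stack.pop()
            if cluster in component:
                continue
            component.add(cluster)
            stack.extend(neighbours[cluster] - component)
        seen |= component
        sets.append(component)
    return sets


def is_supported(partnerships):
    """Return True if no partnership set exceeds the supported maximum."""
    return all(
        len(members) <= MAX_CLUSTERS_PER_SET
        for members in partnership_sets(partnerships)
    )
```

For example, is_supported(["A-B", "A-C", "B-C", "D-E"]) evaluates the two sets {A, B, C} and {D, E} separately, while a chain such as ["A-B", "B-C", "C-D", "D-E"] forms a single set of five clusters and is rejected. A cluster with no partnerships does not appear in any set.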

1862 Thin-provisioned volume copy offline because of corrupt metadata.

Explanation: A thin-provisioned volume has been
taken offline because there is an inconsistency in the
cluster metadata that describes the disk contents. This
might occur because of corruption of data on the
physical disk (e.g., medium error or data miscompare),
the loss of cached metadata (because of a cluster
recovery) or because of a software error. The event data
gives information on the reason.

The cluster maintains backup copies of the metadata
and it might be possible to repair the thin-provisioned
volume using this data.

User response: The cluster is able to repair the
inconsistency in some circumstances. Run the repair
volume option to start the repair process. This repair
process, however, can take some time. In some
situations it might be more appropriate to delete the
thin-provisioned volume and reconstruct a new one
from a backup or mirror copy.

If you run the repair procedure and it completes, this
error is automatically marked as “fixed”; otherwise,
another error event (error code 1860) is logged to
indicate that the repair action has failed.

Possible Cause-FRUs or other:
v None

1865 Thin-provisioned volume copy offline because of insufficient space.

Explanation: A thin-provisioned volume has been
taken offline because there is insufficient allocated real
capacity available on the volume for the used space to
increase further. If the thin-provisioned volume is
auto-expand enabled, then the storage pool it is in also
has no free space.

User response: The service action differs depending
on whether the thin-provisioned volume copy is
auto-expand enabled or not. Whether the disk is
auto-expand enabled or not is indicated in the error
event data.

If the volume copy is auto-expand enabled, perform
one or more of the following actions. When you have
performed all of the actions that you intend to perform,
mark the error as “fixed”; the volume copy will then
return online.
v Determine why the storage pool free space has been
depleted. Any of the thin-provisioned volume copies,
with auto-expand enabled, in this storage pool might
have expanded at an unexpected rate; this could
indicate an application error. New volume copies
might have been created in, or migrated to, the
storage pool.
v Increase the capacity of the storage pool that is
associated with the thin-provisioned volume copy by
adding more MDisks to the group.
v Provide some free capacity in the storage pool by
reducing the used space. Volume copies that are no
longer required can be deleted, the size of volume
copies can be reduced or volume copies can be
migrated to a different storage pool.
v Migrate the thin-provisioned volume copy to a
storage pool that has sufficient unused capacity.
v Consider reducing the value of the storage pool
warning threshold to give more time to allocate extra
space.

If the volume copy is not auto-expand enabled,
perform one or more of the following actions. In this
case the error will automatically be marked as “fixed”,
and the volume copy will return online when space is
available.
v Determine why the thin-provisioned volume copy
used space has grown at the rate that it has. There
might be an application error.
v Increase the real capacity of the volume copy.
v Enable auto-expand for the thin-provisioned volume
copy.
v Consider reducing the value of the thin-provisioned
volume copy warning threshold to give more time to
allocate more real space.

Possible Cause-FRUs or other:
v None

1870 Mirrored volume offline because a hardware read error has occurred.

Explanation: While attempting to maintain the volume
mirror, a hardware read error occurred on all of the
synchronized volume copies.

The volume copies might be inconsistent, so the
volume is now offline.

User response:
v Fix all higher priority errors. In particular, fix any
read errors that are listed in the sense data. This
error event will automatically be fixed when the root
event is marked as “fixed”.
v If you cannot fix the root error, but the read errors
on some of the volume copies have been fixed, mark
this error as “fixed” to run without the mirror. You
can then delete the volume copy that cannot read
data and re-create it on different MDisks.

Possible Cause-FRUs or other:
v None

1895 Unrecovered FlashCopy mappings

Explanation: This error might be reported after the
recovery action for a cluster failure or a complete I/O
group failure. The error is reported because some
FlashCopies, whose control data is stored by the I/O
group, were active at the time of the failure and the
current state of the mapping could not be recovered.

User response: To fix this error it is necessary to
delete all of the FlashCopy mappings on the I/O group
that failed.
1. Note the I/O group index against which the error is
logged.
2. List all of the FlashCopy mappings that are using
this I/O group for their bitmaps. You should get the
detailed view of every possible FlashCopy ID. Note
the IDs of the mappings whose IO_group_id
matches the ID of the I/O group against which this
error is logged.
3. Note the details of the FlashCopy mappings that are
listed so that they can be re-created.
4. Delete all of the FlashCopy mappings that are
listed. Note: The error will automatically be marked
as “fixed” once the last mapping on the I/O group
is deleted. New mappings cannot be created until
the error is fixed.
5. Using the details noted in step 3, re-create all of the
FlashCopy mappings that you just deleted.

Possible Cause-FRUs or other:
v None

1900 A FlashCopy, Trigger Prepare command has failed because a cache flush has failed.

Explanation: A FlashCopy, Trigger Prepare command
has failed because a cache flush has failed.

User response:
1. Correct higher priority errors, and then try the
Trigger Prepare command again.
2. Mark the error that you have just repaired as
“fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

Other:
Cache flush error (100%)

1. Correct higher priority errors, and then prepare and
start the FlashCopy task again.
2. Mark the error that you have just repaired as
“fixed”.
3. Go to repair verification MAP.

Possible Cause-FRUs or other:
v None

1920 Global and Metro Mirror persistent error.

Explanation: This error might be caused by a problem
on the primary cluster, a problem on the secondary
cluster, or a problem on the inter-cluster link. The
problem might be a failure of a component, a
component becoming unavailable or having reduced
performance because of a service action or it might be
that the performance of a component has dropped to a
level where the Metro Mirror or Global Mirror
relationship cannot be maintained. Alternatively the
error might be caused by a change in the performance
requirements of the applications using Metro Mirror or
Global Mirror.

This error is reported on the primary cluster when the
copy relationship has not progressed sufficiently over a
period of time. Therefore, if the relationship is restarted
before all of the problems are fixed, the error might be
reported again when the time period next expires (the
default period is five minutes).

This error might also be reported because the primary
cluster has encountered read errors.

You might need to refer to the Copy Services features
information in the software installation and
configuration documentation while diagnosing this
error.

User response:
1. If the 1920 error has occurred previously on Metro
Mirror or Global Mirror between the same clusters
and all the following actions have been attempted,
contact your product support center to resolve the
problem.
2. On the primary cluster reporting the error, correct
any higher priority errors.
3. On the secondary cluster, review the maintenance
logs to determine if the cluster was operating with
reduced capability at the time the error was
1910 A FlashCopy mapping task was stopped reported. The reduced capability might be because
because of the error that is indicated in of a software upgrade, hardware maintenance to a
the sense data. 2145 node, maintenance to a backend disk system
Explanation: A stopped FlashCopy might affect the or maintenance to the SAN.
status of other volumes in the same I/O group. 4. On the secondary 2145 cluster, correct any errors
Preparing the stopped FlashCopy operations as soon as that are not fixed.
possible is advised. 5. On the intercluster link, review the logs of each
User response: link component for any incidents that would cause
reduced capability at the time of the error. Ensure
the problems are fixed.


6. If a reason for the error has been found and Other:


corrected, go to step 10. v Primary 2145 cluster or SAN fabric problem (10%)
7. On the primary cluster reporting the error, v Primary 2145 cluster or SAN fabric configuration
examine the 2145 statistics using a SAN (10%)
productivity monitoring tool and confirm that all
v Secondary 2145 cluster or SAN fabric problem (15%)
the Metro Mirror and Global Mirror requirements
described in the planning documentation are met. v Secondary 2145 cluster or SAN fabric configuration
Ensure that any changes to the applications using (25%)
Metro Mirror or Global Mirror have been taken v Intercluster link problem (15%)
into account. Resolve any issues. v Intercluster link configuration (25%)
8. On the secondary cluster, examine the 2145
statistics using a SAN productivity monitoring tool
and confirm that all the Metro Mirror and Global 1930 Migration suspended.
Mirror requirements described in the software Explanation: Migration suspended.
installation and configuration documentation are
met. Resolve any issues. User response:
9. On the intercluster link, examine the performance 1. Ensure that all error codes of a higher priority have
of each component using an appropriate SAN already been fixed.
productivity monitoring tool to ensure that they 2. Ask the customer to ensure that all storage pools
are operating as expected. Resolve any issues. that are the destination of suspended migrate
10. Mark the error as “fixed” and restart the Metro operations have available free extents.
Mirror or Global Mirror relationship. 3. Mark this error as “fixed”. This causes the migrate
operation to be restarted. If the restart fails, a new
When you restart the Metro Mirror or Global Mirror error is logged.
relationship there will be an initial period during which 4. Go to repair verification MAP.
Metro Mirror or Global Mirror performs a background
copy to resynchronize the volume data on the primary Possible Cause-FRUs or other:
and secondary clusters. During this period the data on
v None
the Metro Mirror or Global Mirror auxiliary volumes
on the secondary cluster is inconsistent and the
volumes cannot be used as backup disks by your 1950 Unable to mirror medium error.
applications.
Explanation: During the synchronization of a mirrored
volume copy it was necessary to duplicate the record of
Note: To ensure the system has the capacity to handle
a medium error onto the volume copy, creating a
the background copy load you may want to delay
virtual medium error. Each managed disk has a table of
restarting the Metro Mirror or Global Mirror
virtual medium errors. The virtual medium error could
relationship until there is a quiet period when the
not be created because the table is full. The volume
secondary cluster and the SAN fabric (including the
copy is in an inconsistent state and has been taken
intercluster link) have the required capacity. If the
offline.
required capacity is not available you might experience
another 1920 error and the Metro Mirror or Global User response: Three different approaches can be
Mirror relationship will stop in an inconsistent state. taken to resolving this problem: 1) the source volume
copy can be fixed so that it does not contain medium
Note: If the Metro Mirror or Global Mirror relationship errors, 2) the number of virtual medium errors on the
has stopped in a consistent state (“consistent-stopped”) target managed disk can be reduced or 3) the target
it is possible to use the data on the Metro Mirror or volume copy can be moved to a managed disk with
Global Mirror auxiliary volumes on the secondary more free virtual medium error entries.
cluster as backup disks by your applications. You might
The managed disk with a full medium error table can
therefore want to start a FlashCopy of your Metro
be determined from the data of the root event.
Mirror or Global Mirror auxiliary disks on the
secondary system before restarting the Metro Mirror or Approach 1) - This is the preferred procedure because
Global Mirror relationship. This means you maintain it restores the source volume copy to a state where all
the current, consistent, image until the time when the of the data can be read. Use the normal service
Metro Mirror or Global Mirror relationship is again procedures for fixing a medium error (rewrite block or
synchronized and in a consistent state. volume from backup or regenerate the data using local
procedures).
Possible Cause-FRUs or other:
Approach 2) - This method can be used if the majority
v None of the virtual medium errors on the target managed
disk do not relate to the volume copy. Determine where


the virtual medium errors are using the event log 3. Ensure that the software is at the latest level on the
events and re-write the block or volume from backup. cluster and on the disk systems.
Approach 3) - Delete the offline volume copy and 4. Use the available SAN monitoring tools to check for
create a new one either forcing the use of different any problems on the fabric.
MDisks in the storage pool or using a completely 5. Mark the error that you have just repaired as
different storage pool. “fixed”.
Follow your selection option(s) and then mark the error 6. Go to repair verification MAP.
as “fixed”.
Possible Cause-FRUs or other:
Possible Cause-FRUs or other:
v Your support center might indicate a FRU based on
v None their problem analysis (2%)

2008 A software downgrade has failed. Other:


v 2145 software (48%)
Explanation: Cluster configuration changes are
restricted until the downgrade is completed. The cluster v Enclosure/controller software (25%)
downgrade process waits for user intervention when v Fibre Channel switch or switch configuration (25%)
this error is logged.
User response: The action required to recover from a 2040 A software upgrade is required.
stalled downgrade depends on the current state of the
cluster being downgraded. Call IBM Support for an Explanation: The software cannot determine the VPD
action plan to resolve this problem. for a FRU. Probably, a new FRU has been installed and
the software does not recognize that FRU.
Possible Cause-FRUs or other:
User response:
v None
1. If a FRU has been replaced, ensure that the correct
replacement part was used. The node VPD indicates
Other:
which part is not recognized.
2145 software (100%) 2. Ensure that the cluster software is at the latest level.
3. Save dump data with configuration dump and
logged data dump.
2010 A software upgrade has failed.
4. Contact your product support center to resolve the
Explanation: Cluster configuration changes are problem.
restricted until the upgrade is completed or rolled back.
5. Mark the error that you have just repaired as
The cluster upgrade process waits for user intervention
“fixed”.
when this error is logged.
6. Go to repair verification MAP.
User response: The action required to recover from a
stalled upgrade depends on the current state of the Possible Cause-FRUs or other:
cluster being upgraded. Call IBM technical support for
v None
an action plan to resolve this problem.
Possible Cause-FRUs or other: Other:
v None
2145 software (100%)
Other:
2100 A software error has occurred.
2145 software (100%)
Explanation: One of the 2145 server software
components (sshd, crond, or httpd) has failed and
2030 Software error. reported an error.
Explanation: The 2145 software has restarted because User response:
of a problem in the cluster, on a disk system or on the
1. Ensure that the software is at the latest level on the
Fibre Channel fabric.
cluster.
User response: 2. Save dump data with configuration dump and
1. Collect the software dump file(s) generated at the logged data dump.
time the error was logged on the cluster. 3. Contact your product support center to resolve the
2. Contact your product support center to investigate problem.
and resolve the problem.


4. Mark the error that you have just repaired as v Send a test email and validate that the change has
“fixed”. corrected the issue.
5. Go to repair verification MAP. v Mark the error that you have just repaired as fixed.
v Go to MAP 5700: Repair verification.
Possible Cause-FRUs or other:
v None Possible Cause-FRUs or other:
v None
Other:

2145 software (100%) 2601 Error detected while sending an email.


Explanation: An error has occurred while the cluster
2500 A secure shell (SSH) session limit for was attempting to send an email in response to an
the cluster has been reached. event. The cluster is unable to determine if the email
has been sent and will attempt to resend it. The
Explanation: Secure Shell (SSH) sessions are used by problem might be with the SMTP server or with the
applications that manage the cluster. An example of cluster email configuration. The problem might also be
such an application is the command-line interface caused by a failover of the configuration node. This
(CLI). An application must initially log in to the cluster error is not logged by the test email function because it
to create an SSH session. The cluster imposes a limit on responds immediately with a result code.
the number of SSH sessions that can be open at one
time. This error indicates that the limit on the number User response:
of SSH sessions has been reached and that no more v If there are higher-priority unfixed errors in the log,
logins can be accepted until a current session logs out. fix those errors first.
The limit on the number of SSH sessions is usually v Ensure that the SMTP email server is active.
reached because multiple users have opened an SSH v Ensure that the SMTP server TCP/IP address and
session but have forgotten to close the SSH session port are correctly configured in the cluster email
when they are no longer using the application. configuration.
User response: v Send a test email and validate that the change has
corrected the issue.
v Because this error indicates a problem with the
number of sessions that are attempting external v Mark the error that you have just repaired as fixed.
access to the cluster, determine the reason that so v Go to MAP 5700: Repair verification.
many SSH sessions have been opened.
v Run the Fix Procedure for this error on the panel at Possible Cause-FRUs or other:
Management GUI Troubleshooting > v None
Recommended Actions to view and manage the
open SSH sessions.
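
The first two checks in the user responses for errors 2600 and 2601 (that the SMTP server is active, and that its address and port are correct) can also be tried from a workstation on the same network. The following Python sketch is an illustration only and is not part of the SAN Volume Controller software. The function name smtp_server_responds is hypothetical, and port 25 is assumed only because it is the conventional SMTP port; use the address and port that are configured in the cluster email settings.

```python
import smtplib


def smtp_server_responds(host, port=25, timeout=5.0):
    """Return True if an SMTP server at host:port accepts a connection
    and answers a NOOP command with reply code 250."""
    try:
        # The constructor connects immediately when a host is given.
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            code, _ = server.noop()
            return code == 250
    except (OSError, smtplib.SMTPException):
        # Connection refused, timeout, or a protocol-level failure.
        return False
```

A result of False from the workstation suggests a server or network problem rather than a cluster configuration problem. A result of True does not prove that the cluster itself can reach the server, so still send a test email from the cluster as described in the user response.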
2700 Unable to access NTP network time
server
2600 The cluster was unable to send an
Explanation: Cluster time cannot be synchronized
email.
with the NTP network time server that is configured.
Explanation: The cluster has attempted to send an
User response: There are three main causes to
email in response to an event, but there was no
examine:
acknowledgement that it was successfully received by
the SMTP mail server. It might have failed because the v The cluster NTP network time server configuration is
cluster was unable to connect to the configured SMTP incorrect. Ensure that the configured IP address
server, the email might have been rejected by the matches that of the NTP network time server.
server, or a timeout might have occurred. The SMTP v The NTP network time server is not operational.
server might not be running or might not be correctly Check the status of the NTP network time server.
configured, or the cluster might not be correctly v The TCP/IP network is not configured correctly.
configured. This error is not logged by the test email Check the configuration of the routers, gateways and
function because it responds immediately with a result firewalls. Ensure that the cluster can access the NTP
code. network time server and that the NTP protocol is
User response: permitted.
v Ensure that the SMTP email server is active.
The error will automatically fix when the cluster is able
v Ensure that the SMTP server TCP/IP address and to synchronize its time with the NTP network time
port are correctly configured in the cluster email server.
configuration.


Possible Cause-FRUs or other: node has determined that the uninterruptible power
v None supply is functioning sufficiently for the node to
continue operations. The operation of the cluster is not
affected by this error. This error is usually resolved by
3000 The 2145 UPS temperature is close to its power cycling the uninterruptible power supply.
upper limit. If the temperature
continues to rise the 2145 UPS will User response:
power off. 1. Power cycle the uninterruptible power supply at a
convenient time. The one or two nodes attached to
Explanation: The temperature sensor in the 2145 UPS
the uninterruptible power supply should be
is reporting a temperature that is close to the
powered off before powering off the uninterruptible
operational limit of the unit. If the temperature
power supply. Once the nodes have powered down,
continues to rise the 2145 UPS will power off for safety
wait 5 minutes for the uninterruptible power supply
reasons. The sensor is probably reporting an excessively
to go into standby mode (flashing green AC LED).
high temperature because the environment in which
If this does not happen automatically then check the
the 2145 UPS is operating is too hot.
cabling to confirm that all nodes powered by this
User response: uninterruptible power supply have been powered
1. Ensure that the room ambient temperature is within off. Remove the power input cable from the
the permitted limits. uninterruptible power supply and wait at least 2
minutes for the uninterruptible power supply to
2. Ensure that the air vents at the front and back of clear its internal state. Reconnect the uninterruptible
the 2145 UPS are not obstructed. power supply power input cable. Press the
3. Ensure that other devices in the same rack are not uninterruptible power supply ON button. Power on
overheating. the nodes connected to this uninterruptible power
4. When you are satisfied that the cause of the supply.
overheating has been resolved, mark the error 2. If the error is reported again after the nodes are
“fixed”. restarted replace the 2145 UPS electronics assembly.

3001 The 2145 UPS-1U temperature is close to Possible Cause-FRUs or other:


its upper limit. If the temperature v 2145 UPS electronics assembly (5%)
continues to rise the 2145 UPS-1U will
power off. Other:
Explanation: The temperature sensor in the 2145 v Transient 2145 UPS error (95%)
UPS-1U is reporting a temperature that is close to the
operational limit of the unit. If the temperature 3025 A virtualization feature license is
continues to rise the 2145 UPS-1U will power off for required.
safety reasons. The sensor is probably reporting an
excessively high temperature because the environment Explanation: The cluster has no virtualization feature
in which the 2145 UPS-1U is operating is too hot. license registered. You should have either an Entry
Edition Physical Disk virtualization feature license or a
User response: Capacity virtualization feature license that covers the
1. Ensure that the room ambient temperature is within cluster.
the permitted limits.
The cluster will continue to operate, but it might be
2. Ensure that the air vents at the front and back of violating the license conditions.
the 2145 UPS-1U are not obstructed.
User response:
3. Ensure that other devices in the same rack are not
overheating. v If you do not have a virtualization feature license
that is valid and sufficient for this cluster, contact
4. When you are satisfied that the cause of the
your IBM sales representative, arrange a license and
overheating has been resolved, mark the error
change the license settings for the cluster to register
“fixed”.
the license.
v The error will automatically fix when the situation is
3010 Internal uninterruptible power supply resolved.
software error detected.
Explanation: Some of the tests that are performed Possible Cause-FRUs or other:
during node startup did not complete because some of v None
the data reported by the uninterruptible power supply
during node startup is inconsistent because of a
software error in the uninterruptible power supply. The


your IBM sales representative if you want to change
the licensed Global and Metro Mirror capacity.

3029 Virtualization feature capacity is not
valid.
v The error will automatically be fixed when a valid
Explanation: The setting for the amount of space that configuration is entered.
can be virtualized is not valid. The value must be an
integer number of terabytes. Possible Cause-FRUs or other:
This error event is created when a cluster is upgraded v None
from a version prior to 4.3.0 to version 4.3.0 or later.
Prior to version 4.3.0 the virtualization feature capacity
value was in gigabytes and therefore could be set to a 3031 FlashCopy feature capacity not set.
fraction of a terabyte. With version 4.3.0 and later the Explanation: The FlashCopy feature is set to On for
licensed capacity for the virtualization feature must be the cluster, but the capacity has not been set.
an integer number of terabytes.
This error event is created when a cluster is upgraded
User response: from a version prior to 4.3.0 to version 4.3.0 or later.
v Review the license conditions for the virtualization Prior to version 4.3.0 the feature can only be set to On
feature. If you have one cluster, change the license or Off; with version 4.3.0 and later the licensed capacity
settings for the cluster to match the capacity that is for the feature must also be set.
licensed. If your license covers more than one cluster,
User response: Perform one of the following actions:
apportion an integer number of terabytes to each
cluster. You might have to change the virtualization v Change the FlashCopy license settings for the cluster
capacity that is set on the other clusters to ensure either to the licensed FlashCopy capacity, or if the
that the sum of the capacities for all of the clusters license applies to more than one cluster, to the
does not exceed the licensed capacity. portion of the license allocated to this cluster. Set the
licensed FlashCopy capacity to zero if it is no longer
v You can view the event data or the feature log to
being used.
ensure that the licensed capacity is sufficient for the
space that is actually being used. Contact your IBM v View the event data or the feature log to ensure that
sales representative if you want to change the the licensed FlashCopy capacity is sufficient for the
capacity of the license. space actually being used. Contact your IBM sales
representative if you want to change the licensed
v This error will automatically be fixed when a valid
FlashCopy capacity.
configuration is entered.
v The error will automatically be fixed when a valid
Possible Cause-FRUs or other: configuration is entered.
v None
Possible Cause-FRUs or other:
v None
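
The apportioning rule in the user response for error 3029 (each cluster is set to an integer number of terabytes, and the sum over all clusters does not exceed the licensed capacity) can be checked before the license settings are changed. The following Python sketch is an illustration only; the function name check_apportionment and the cluster names in the usage note are hypothetical.

```python
def check_apportionment(licensed_tb, per_cluster_tb):
    """Return a list of problems found in a proposed split of one
    license across clusters; an empty list means the split is valid."""
    problems = []
    for name, capacity in per_cluster_tb.items():
        # Each cluster must be set to a whole number of terabytes.
        if capacity != int(capacity):
            problems.append(f"{name}: {capacity} TB is not a whole number of terabytes")
    # The capacities set on all clusters together must fit within the license.
    total = sum(per_cluster_tb.values())
    if total > licensed_tb:
        problems.append(f"total {total} TB exceeds the licensed {licensed_tb} TB")
    return problems
```

For example, check_apportionment(10, {"cluster_a": 6, "cluster_b": 4}) returns an empty list, while a split of 6.5 TB and 4 TB is reported both for the fractional value and for exceeding the 10 TB license.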
3030 Global and Metro Mirror feature
capacity not set.
3032 Feature license limit exceeded.
Explanation: The Global and Metro Mirror feature is
set to On for the cluster, but the capacity has not been Explanation: The amount of space that is licensed for
set. a cluster feature is being exceeded.

This error event is created when a cluster is upgraded The feature that is being exceeded might be:
from a version prior to 4.3.0 to version 4.3.0 or later. v Virtualization feature - event identifier 009172
Prior to version 4.3.0 the feature can only be set to On v FlashCopy feature - event identifier 009173
or Off; with version 4.3.0 and later the licensed capacity
v Global and Metro Mirror feature - event identifier
for the feature must also be set.
009174
User response: Perform one of the following actions:
v Change the Global and Metro Mirror license settings The cluster will continue to operate, but it might be
for the cluster either to the licensed Global and violating the license conditions.
Metro Mirror capacity, or if the license applies to User response:
more than one cluster, to the portion of the license
allocated to this cluster. Set the licensed Global and v Determine which feature license limit has been
Metro Mirror capacity to zero if it is no longer being exceeded. This might be:
used. v Virtualization feature - event identifier 009172
v View the event data or the feature log to ensure that v FlashCopy feature - event identifier 009173
the licensed Global and Metro Mirror capacity is v Global and Metro Mirror feature - event identifier
sufficient for the space actually being used. Contact 009174


v Ensure that the feature capacity that is reported by v If you do not want to use the FlashCopy feature, you
the cluster has been set to match either the licensed must delete all of the FlashCopy mappings.
size, or if the license applies to more than one v The error will automatically fix when the situation is
cluster, to the portion of the license that is allocated resolved.
to this cluster.
v Decide whether to increase the feature capacity or to Possible Cause-FRUs or other:
reduce the space that is being used by this feature. v None
v To increase the feature capacity, contact your IBM
sales representative and arrange an increased license
capacity. Change the license settings for the cluster to 3036 Physical Disk Global and Metro Mirror
set the new licensed capacity. Alternatively, if the feature license required
license applies to more than one cluster modify how Explanation: The Entry Edition cluster has some
the licensed capacity is apportioned between the Global Mirror or Metro Mirror relationships defined.
clusters. Update every cluster so that the sum of the There is, however, no Physical Disk Global and Metro
license capacity for all of the clusters does not exceed Mirror license registered on the cluster. The cluster will
the licensed capacity for the location. continue to operate, but it might be violating the
v To reduce the amount of disk space that is license conditions.
virtualized, delete some of the managed disks or
User response:
image mode volumes. The used virtualization size is
the sum of the capacities of all of the managed disks v Check if you have an Entry Edition Physical Disk
and image mode disks. Global and Metro Mirror license for this cluster that
you have not registered on the cluster. Update the
v To reduce the FlashCopy capacity delete some
cluster license configuration if you have a license.
FlashCopy mappings. The used FlashCopy size is the
sum of all of the volumes that are the source volume v Decide whether you want to continue to use the
of a FlashCopy mapping. Global Mirror or Metro Mirror features or not.
v To reduce Global and Metro Mirror capacity delete v If you want to use either the Global Mirror or Metro
some Global Mirror or Metro Mirror relationships. Mirror feature contact your IBM sales representative,
The used Global and Metro Mirror size is the sum of arrange a license and change the license settings for
the capacities of all of the volumes that are in a the cluster to register the license.
Metro Mirror or Global Mirror relationship; both v If you do not want to use both the Global Mirror and
master and auxiliary volumes are counted. Metro Mirror features, you must delete all of the
v The error will automatically be fixed when the Global Mirror and Metro Mirror relationships.
licensed capacity is greater than the capacity that is v The error will automatically fix when the situation is
being used. resolved.

Possible Cause-FRUs or other: Possible Cause-FRUs or other:


v None v None
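
The used-capacity definitions given for error 3032 can be written out as simple sums. The following Python sketch is an illustration only: the function names and the dictionary layout are hypothetical, capacities for every volume involved are assumed to be supplied in terabytes, and counting a volume once when it is the source of several FlashCopy mappings is an assumed reading of the text.

```python
def used_flashcopy_tb(volume_tb, mappings):
    """Sum the capacities of volumes that are a FlashCopy source."""
    # Assumption: a volume that is the source of several mappings counts once.
    sources = {m["source"] for m in mappings}
    return sum(volume_tb[v] for v in sources)


def used_mirror_tb(volume_tb, relationships):
    """Sum the capacities of volumes in Metro Mirror or Global Mirror
    relationships; master and auxiliary volumes are both counted."""
    return sum(volume_tb[r["master"]] + volume_tb[r["auxiliary"]]
               for r in relationships)


def clears_error(licensed_tb, used_tb):
    """The error is fixed automatically when the licensed capacity is
    greater than the capacity that is being used."""
    return licensed_tb > used_tb
```

Comparing the computed value with the licensed capacity shows how much capacity must be released, or how much additional license is needed, before the error is fixed automatically.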

3035 Physical Disk FlashCopy feature license 3080 Global or Metro Mirror relationship or
required consistency group with deleted
partnership
Explanation: The Entry Edition cluster has some
FlashCopy mappings defined. There is, however, no Explanation: A Global Mirror or Metro Mirror
Physical Disk FlashCopy license registered on the relationship or consistency group exists with a cluster
cluster. The cluster will continue to operate, but it whose partnership is deleted.
might be violating the license conditions.
Beginning with SAN Volume Controller version 4.3.1
User response: this configuration is not supported and should be
v Check whether you have an Entry Edition Physical resolved. This condition can occur as a result of an
Disk FlashCopy license for this cluster that you have upgrade to SAN Volume Controller version 4.3.1 or
not registered on the cluster. Update the cluster later.
license configuration if you have a license. User response: The issue can be resolved either by
v Decide whether you want to continue to use the deleting all of the Global Mirror or Metro Mirror
FlashCopy feature or not. relationships or consistency groups that exist with a
v If you want to use the FlashCopy feature contact cluster whose partnership is deleted, or by recreating
your IBM sales representative, arrange a license and all of the partnerships that they were using.
change the license settings for the cluster to register The error will automatically fix when the situation is
the license. resolved.


v List all of the Global Mirror and Metro Mirror determine which Global Mirror or Metro Mirror
relationships and note those where the master cluster relationships or consistency groups are still causing
name or the auxiliary cluster name is blank. For each the issue.
of these relationships, also note the cluster ID of the
remote cluster. Possible Cause-FRUs or other:
v List all of the Global Mirror and Metro Mirror v None
consistency groups and note those where the master
cluster name or the auxiliary cluster name is blank.
For each of these consistency groups, also note the 3081 Unable to send email to any of the
cluster ID of the remote cluster. configured email servers.
v Determine how many unique remote cluster IDs Explanation: Either the system was not able to
there are among all of the Global Mirror and Metro connect to any of the SMTP email servers, or the email
Mirror relationships and consistency groups that you transmission has failed. A maximum of six email
have identified in the first two steps. For each of servers can be configured. Error event 2600 or 2601 is
these remote clusters, decide if you want to raised when an individual email server is found to be
re-establish the partnership with that cluster. Ensure not working. This error indicates that all of the email
that the total number of partnerships that you want servers were found to be not working.
to have with remote clusters does not exceed the
User response:
cluster limit. In version 4.3.1 this limit is 1. If you
re-establish a partnership, you will not have to delete v Check the event log for all unresolved 2600 and 2601
the Global Mirror and Metro Mirror relationships errors and fix those problems.
and consistency groups that use the partnership. v If this error has not already been automatically
v Re-establish any selected partnerships. marked fixed, mark this error as fixed.
v Delete all of the Global Mirror and Metro Mirror v Perform the check email function to test that an
relationships and consistency groups that you listed email server is operating properly.
in either of the first two steps whose remote cluster
partnership has not been re-established. Possible Cause-FRUs or other:
v Check that the error has been marked as fixed by the v None
system. If it has not, return to the first step and
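
The first three steps of the user response for error 3080 (note the relationships and consistency groups that have a blank master or auxiliary cluster name, then determine the unique remote cluster IDs) amount to a filter followed by a de-duplication. The following Python sketch is an illustration only. The record layout and field names are hypothetical stand-ins for whatever listing you captured, and treating the blank-named side as the remote cluster is an assumption.

```python
def remote_cluster_ids(records):
    """Return the sorted unique IDs of remote clusters whose name is
    blank in the relationship or consistency group records."""
    ids = set()
    for rec in records:
        for side in ("master", "aux"):
            # Assumption: a blank name marks the side whose partnership
            # was deleted, so that side's ID is the remote cluster ID.
            if not rec[f"{side}_cluster_name"].strip():
                ids.add(rec[f"{side}_cluster_id"])
    return sorted(ids)
```

Pass the relationship records and the consistency group records together in one list. The length of the result is the number of remote clusters to decide on, and it can be compared with the partnership limit for the cluster.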

SAN problem determination


The procedures that are provided here help you solve problems on the SAN
Volume Controller system and its connection to the storage area network (SAN).

About this task

SAN failures might cause SAN Volume Controller drives to be inaccessible to host
systems. Failures can be caused by SAN configuration changes or by hardware
failures in SAN components.

The following list identifies some of the hardware that might cause failures:
v Power, fan, or cooling switch
v Application-specific integrated circuits
v Installed small form-factor pluggable (SFP) transceiver
v Fiber-optic cables

Perform the following steps if you were sent here from either the maintenance
analysis procedures or the error codes:

Procedure
1. If the customer has changed the SAN configuration by changing the Fibre
Channel cable connections or switch zoning, ask the customer to verify that the
changes were correct and, if necessary, reverse those changes.

2. Verify that the power is turned on to all switches and storage controllers that
the SAN Volume Controller system uses, and that they are not reporting any
hardware failures. If problems are found, resolve those problems before
proceeding further.
3. Verify that the Fibre Channel cables that connect the systems to the switches
are securely connected.
4. If the customer is running a SAN management tool that you are familiar with
and that you have access to, you can use that tool to view the SAN topology
and isolate the failing component.

Fibre Channel and 10G Ethernet link failures


This procedure is applicable to a 10G Ethernet link that has the Fibre Channel over
Ethernet personality enabled. When a failure occurs on a single Fibre Channel or 10G
Ethernet link, the small form-factor pluggable (SFP) transceiver might need to be replaced.

Before you begin

The following items can indicate that a single Fibre Channel or 10G Ethernet link
has failed:
v The customer's SAN monitoring tools
v The Fibre Channel port status on the front panel of the node
v The Fibre Channel status LEDs at the rear of the node
v An error that indicates that a single port has failed (703, 723)

Attempt each of the following actions, in the following order, until the failure is
fixed:
1. Ensure that the Fibre Channel or 10G Ethernet cable is securely connected at
each end.
2. Replace the Fibre Channel or 10G Ethernet cable.
3. Replace the SFP transceiver for the failing port on the SAN Volume Controller
node.

Note: SAN Volume Controller nodes are supported with both longwave SFP
transceivers and shortwave SFP transceivers. You must replace an SFP
transceiver with the same type of SFP transceiver. If the SFP transceiver to
replace is a longwave SFP transceiver, for example, you must provide a suitable
replacement. Removing the wrong SFP transceiver could result in loss of data
access.
4. Perform the Fibre Channel switch or FCF service procedures for a failing Fibre
Channel link or a failing 10G Ethernet link that has the Fibre Channel over
Ethernet personality enabled. This might involve replacing the SFP transceiver
at the switch.
5. Replace the Fibre Channel adapter or Fibre Channel over Ethernet adapter on
the node.

Ethernet iSCSI host-link problems


If you are having problems attaching to the Ethernet hosts, your problem might be
related to the network, the SAN Volume Controller system, or the host.

Before you begin

For network problems, you can attempt any of the following actions:
v Test your connectivity between the host and SAN Volume Controller ports.
v Try to ping the SAN Volume Controller system from the host.
v Ask the Ethernet network administrator to check the firewall and router settings.
v Check that the subnet mask and gateway are correct for the SAN Volume
Controller host configuration.
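
The connectivity test in the preceding list can be scripted from the host. The
following Python sketch is an illustration only, not an IBM-supplied tool. It
assumes that the node port accepts iSCSI connections on the standard iSCSI TCP
port 3260; the function name iscsi_port_reachable is hypothetical.

```python
import socket


def iscsi_port_reachable(address: str, port: int = 3260, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to address:port can be opened.

    3260 is the standard iSCSI port; that the node port listens on it is an
    assumption of this sketch.
    """
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

Call the function with a node port IP address that is reported by the lsportip
command. A result of False points toward a network, firewall, or port
configuration problem rather than a host driver problem.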

For SAN Volume Controller problems, you can attempt any of the following
actions:
v View the configured node port IP addresses by using the lsportip CLI
command.
v View the list of volumes that are mapped to a host by using the lshostvdiskmap
command to ensure that the volume host mappings are correct.
v Verify that the volume is online by using the lsvdisk command.

For host problems, you can attempt any of the following actions:
v Verify that the host iSCSI qualified name (IQN) is correctly configured.
v Use operating system utilities (such as Windows device manager) to verify that
the device driver is installed, loaded, and operating correctly.

Fibre Channel over Ethernet host-link problems


Problems attaching to the Fibre Channel over Ethernet hosts might be related to
the network, the SAN Volume Controller system, or the host.

Before you begin

If error code 705 is displayed on the node, the FC I/O port is inactive.
Fibre Channel over Ethernet uses Fibre Channel as a protocol and Ethernet as an
interconnect.

Note: For a Fibre Channel over Ethernet enabled port, this error means that either
the Fibre Channel forwarder (FCF) is not seen or the Fibre Channel over Ethernet
feature is not configured on the switch.
v Verify that the Fibre Channel over Ethernet feature is enabled on the FCF.
v Verify the remote port (switch port) properties on the FCF.

If connecting the host through a Converged Enhanced Ethernet (CEE) Switch:


v Test your connectivity between the host and CEE Switch.
v Ask the Ethernet network administrator to check the firewall and router settings.

Run lsfabric, and verify that the host is seen as a remote port in the output. If
the host is not seen, perform the following checks in order:
v Verify that the SAN Volume Controller and the host get a Fibre Channel ID (FCID)
on the FCF. If you are unable to verify this, check the VLAN configuration.
v Verify that the SAN Volume Controller port and the host port are part of a zone
and that the zone is currently in force.
v Verify that the volumes are mapped to the host and are online. For more
information, see the descriptions of lshostvdiskmap and lsvdisk in the SAN
Volume Controller Information Center.

What to do next

If the problem is not resolved, verify the state of the host adapter.
v Unload and reload the device driver.
v Use the operating system utilities (for example, Windows Device Manager) to
verify that the device driver is installed, loaded, and operating correctly.

Servicing storage systems


Storage systems that are supported for attachment to the SAN Volume Controller
system are designed with redundant components and access paths to enable
concurrent maintenance. Hosts have continuous access to their data during
component failure and replacement.

The following guidelines apply to all storage systems that are attached to the SAN
Volume Controller system:
v Always follow the service instructions that are provided in the documentation
for your storage system.
v Ensure that there are no unfixed errors in the event log before you perform any
service procedures.
v After you perform a service procedure, check the event log and fix any errors.
Expect to see the following types of errors:
– MDisk error recovery procedures (ERPs)
– Reduced paths

The following categories represent the types of service actions for storage systems:
v Controller code upgrade
v Field replaceable unit (FRU) replacement

Controller code upgrade

Ensure that you are familiar with the following guidelines for upgrading controller
code:
v Check to see if the SAN Volume Controller supports concurrent maintenance for
your storage system.
v Allow the storage system to coordinate the entire upgrade process.
v If it is not possible to allow the storage system to coordinate the entire upgrade
process, perform the following steps:
1. Reduce the storage system workload by 50%.
2. Use the configuration tools for the storage system to manually failover all
logical units (LUs) from the controller that you want to upgrade.
3. Upgrade the controller code.
4. Restart the controller.
5. Manually failback the LUs to their original controller.
6. Repeat for all controllers.

FRU replacement

Ensure that you are familiar with the following guidelines for replacing FRUs:
v If the component that you want to replace is directly in the host-side data path
(for example, cable, Fibre Channel port, or controller), disable the external data
paths to prepare for upgrade. To disable external data paths, disconnect or
disable the appropriate ports on the fabric switch. The SAN Volume Controller
ERPs reroute access over the alternate path.
v If the component that you want to replace is in the internal data path (for
example, cache, or drive) and did not completely fail, ensure that the data is
backed up before you attempt to replace the component.
v If the component that you want to replace is not in the data path, for example,
uninterruptible power supply units, fans, or batteries, the component is
generally dual-redundant and can be replaced without additional steps.

Chapter 8. Recovery procedures
This topic describes these recovery procedures: recover a system and back up and
restore a system configuration. This topic also contains information about
performing the node rescue.

Recover system procedure


The recover system procedure recovers the entire system if the data has been lost
from all nodes. The procedure re-creates the storage system by using saved
configuration data. The recovery might not be able to restore all volume data. This
procedure is also known as Tier 3 (T3) recovery.

Attention: Perform service actions only when directed by the fix procedures. If
used inappropriately, service actions can cause loss of access to data or even data
loss. Before attempting to recover a storage system, investigate the cause of the
failure and attempt to resolve those issues by using other fix procedures. Read and
understand all of the instructions before performing any action.

Attention: Do not attempt the recovery procedure unless the following conditions
are met:
v All hardware errors are fixed.
v All nodes have candidate status.

The system recovery procedure is one of several tasks that must be performed. The
following list is an overview of the tasks and the order in which they must be
performed:
1. Preparing for system recovery
a. Review the information regarding when to run the recover system
procedure
b. Fix your hardware errors
c. Remove the system information for node canisters with error code 550 or
error code 578 by using the service assistant.
2. Performing the system recovery. After you prepared the system for recovery
and met all the pre-conditions, run the system recovery.

Note: Run the procedure on one system in a fabric at a time. Do not perform
the procedure on different nodes in the same system. This restriction also
applies to remote systems.
3. Performing actions to get your environment operational
v Recovering from offline VDisks (volumes) by using the CLI
v Checking your system, for example, to ensure that all mapped volumes can be
accessed by the host.

You can run the recovery procedure by using the front panel or the service
assistant.

When to run the recover system procedure
Attempt the recover system procedure only after a complete and thorough
investigation of the cause of the system failure. First attempt to resolve the
underlying issues by using other service procedures.

Attention: If you experience failures at any time while you are running the
recover system procedure, call the IBM Support Center. Do not attempt to do
further recovery actions because these actions might prevent IBM Support from
restoring the system to an operational status.

Certain conditions must be met before you run the recovery procedure. Use the
following items to help you determine when to run the recovery procedure:
v Check to see if any node in the system has a node status of active. This status
means that the system is still available. In this case, recovery is not necessary.
v Do not recover the system if the management IP address is available from
another node. Ensure that all service procedures have been run.
v Check the node status of every node that is a member of the system. Resolve all
errors.
– All nodes must be reporting either a node error 578 or a Cluster: error. These
error codes indicate that the system has lost its configuration data. If any
nodes report anything other than these error codes, do not perform a
recovery. You can encounter situations where non-configuration nodes report
other node errors, such as a node error 550. The 550 error can also indicate
that a node is not able to join a system.

Note: If any of the buttons on the front panel have been pressed after these
two error codes are reported, the report for the node returns to the 578 node
error. The change in the report happens after approximately 60 seconds. Also,
if the node was rebooted or if hardware service actions were taken, the node
might show only the Cluster: error.
– If any nodes show Node Error: 550, record the data from the second line of
the display. If the last character on the second line of the display is >, use the
right button to scroll the display to the right.
- In addition to the Node Error: 550, the second line of the display can show
a list of node front panel IDs (seven digits) that are separated by spaces.
The list can also show the WWPN/LUN ID (16 hexadecimal digits
followed by a forward slash and a decimal number).
- If the error data contains any front panel IDs, ensure that the node referred
to by that front panel ID is showing Node Error: 578. If it is not reporting
node error 578, ensure that the two nodes can communicate with each
other. Verify the SAN connectivity and restart one of the two nodes by
pressing the front panel power button twice.
- If the error data contains a WWPN/LUN ID, verify the SAN connectivity
between this node and that WWPN. Check the storage system to ensure
that the LUN referred to is online. After verifying these items, restart the
node by pressing the front panel power button twice.

Note: If, after resolving all these scenarios, half or more of the nodes are
reporting Node Error: 578, it is appropriate to run the recovery
procedure.
– For any nodes that are reporting a node error 550, ensure that all the missing
hardware that is identified by these errors is powered on and connected
without faults.

– If you have not been able to restart the system and if any node other than the
current node is reporting node error 550 or 578, you must remove system
data from those nodes. This action acknowledges the data loss and puts the
nodes into the required candidate state.
v Do not attempt to recover the system if you have been able to restart it.
v If back-end MDisks are removed from the configuration, those volumes that
depended on that hardware cannot be recovered. All previously configured
back-end hardware must be present for a successful recovery.
v Any nodes that were replaced must have the same WWNN as the nodes that
they replaced.
v The configuration backup file must be up to date. If any configuration changes
had been made since the backup was taken, the data is inconsistent and further
investigation is needed. Manual changes are required after the system is
recovered.
v Any data that was in the cache at the point of failure is lost. The loss of data can
result in data corruption on the affected volumes. If the volumes are corrupted,
call the IBM Support Center.
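
The error data that is recorded from the second line of a Node Error: 550 display
can be sorted into the two kinds of entries described in this list. The
following Python sketch assumes that the data was transcribed by hand into a
string with the entries separated by spaces; the function name parse_550_data and
the handling of unrecognized entries are illustrative assumptions, not product
behavior.

```python
import re

# Seven decimal digits: a node front panel ID.
FRONT_PANEL_ID = re.compile(r"\d{7}")
# 16 hexadecimal digits, a forward slash, and a decimal number: a WWPN/LUN ID.
WWPN_LUN_ID = re.compile(r"([0-9A-Fa-f]{16})/(\d+)")


def parse_550_data(second_line: str):
    """Split transcribed 550 error data into its recognizable parts.

    Returns (front_panel_ids, wwpn_lun_pairs, unrecognized_entries).
    """
    panel_ids, wwpn_luns, unrecognized = [], [], []
    for entry in second_line.split():
        if FRONT_PANEL_ID.fullmatch(entry):
            panel_ids.append(entry)
            continue
        match = WWPN_LUN_ID.fullmatch(entry)
        if match:
            wwpn_luns.append((match.group(1).upper(), int(match.group(2))))
        else:
            # Keep anything else so that a transcription error is not hidden.
            unrecognized.append(entry)
    return panel_ids, wwpn_luns, unrecognized
```

Each front panel ID that is returned identifies a node to check for node error
578, and each WWPN and LUN pair identifies SAN connectivity and a storage system
LUN to verify.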

Fix hardware errors


Before you can run a system recovery procedure, it is important that the root cause
of the hardware issues be identified and fixed.

Obtain a basic understanding of the hardware failure. In most situations when
there is no clustered system, a power issue is the cause. For example:
v The node has been powered off or the power cords were unplugged.
v A 2145 UPS-1U might have failed and shut down one or more nodes because of
the failure. In general, this cause is unlikely because of the redundancy
provided by the second 2145 UPS-1U.

Removing clustered-system information for nodes with error code 550 or error code 578 using the front panel
The recovery procedure for clustered systems works only when all nodes are in
candidate status. If there are any nodes that display error code 550 or error code
578, you must remove their system data.

About this task


To remove clustered-system information from a node with an error 550 or 578,
follow this procedure using the front panel:

Procedure
1. Press and release the up or down button until the Actions menu option is
displayed.
2. Press and release the select button.
3. Press and release the up or down button until Remove Cluster? option is
displayed.
4. Press and release the select button.
5. The node displays Confirm Remove?.
6. Press and release the select button.
7. The node displays Cluster:.

Results

When all nodes show Cluster: on the top line and blank on the second line, the
nodes are in candidate status. The 550 or 578 error has been removed. You can
now run the recovery procedure.

Removing system information for nodes with error code 550 or error code 578 using the service assistant
The system recovery procedure works only when all nodes are in candidate status.
If there are any nodes that display error code 550 or error code 578, you must
remove their data.

About this task

Before performing this task, ensure that you have read the introductory
information in the overall recover system procedure.

To remove system information from a node with an error 550 or 578, follow this
procedure using the service assistant:

Procedure
1. Point your browser to the service IP address of one of the nodes, for example,
https://node_service_ip_address/service/.
If you do not know the IP address or if it has not been configured, use the
front panel menu to configure a service address on the node.
2. Log on to the service assistant.
3. Select Manage System.
4. Click Remove System Data.
5. Confirm that you want to remove the system data when prompted.
6. Remove the system data for the other nodes that display a 550 or a 578 error.
All nodes previously in this system must have a node status of Candidate and
have no errors listed against them.
7. Resolve any hardware errors until the error condition for all nodes in the
system is None.
8. Ensure that all nodes in the system display a status of candidate.

Results

When all nodes display a status of candidate and all error conditions are None,
you can run the recovery procedure.
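
The readiness condition can be written as a simple check. The following Python
sketch assumes that the node name, node status, and error condition of each node
were copied by hand from the service assistant into tuples; no programmatic
interface to the service assistant is implied, and the function name is
hypothetical.

```python
from typing import Iterable, Tuple


def ready_for_recovery(nodes: Iterable[Tuple[str, str, str]]) -> bool:
    """Return True only if every node is a candidate with no error condition.

    Each tuple is (node_name, node_status, error_condition), for example
    ("node1", "Candidate", "None").
    """
    nodes = list(nodes)
    # An empty list is treated as "not ready" so that missing data never
    # looks like a pass.
    return bool(nodes) and all(
        status.strip().lower() == "candidate" and error.strip().lower() == "none"
        for _name, status, error in nodes
    )
```

If the function returns False, continue removing system data or resolving
hardware errors before you start the recovery.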

Performing recovery procedure for clustered systems using the front panel
Start recovery when all nodes that were members of the system are online and are
in candidate status. If there are any nodes that display error code 550 or error code
578, remove their system data to place them into candidate status. Do not run the
recovery procedure on different nodes in the same system; this restriction includes
remote clustered systems.

About this task

Attention: This service action has serious implications if not performed properly.
If at any time an error is encountered not covered by this procedure, stop and call
IBM Support.

Any one of the following categories of messages may be displayed:


v T3 successful

The volumes are online. Use the final checks to make the environment
operational; see “What to check after running the system recovery” on page 218.
v T3 incomplete

One or more of the volumes is offline because there was fast write data in the
cache. Further actions are required to bring the volumes online; see “Recovering
from offline VDisks using the CLI” on page 218 for details.
v T3 failed

Call IBM Support. Do not attempt any further action.

Start the recovery procedure from any node in the system; the node must not have
participated in any other system. To receive optimal results in maintaining the I/O
group ordering, run the recovery from a node that was in I/O group 0.

Note: Each individual stage of the recovery procedure might take significant time
to complete, depending on the specific configuration.

Procedure
1. Press the up or down button until the Actions menu option is displayed; then
press the select button.
2. Press the up or down button until the Recover Cluster? option is displayed,
and then press the select button; the node displays Confirm Recover?.
3. Press the select button; the node displays Retrieving.
After a short delay, the second line displays a sequence of progress messages
indicating the actions taking place; for example, Finding qdisks. The
backup files are scanned to find the most recent configuration backup data.
After the file and quorum data retrieval is complete, the node displays T3
data: on the top line.
4. Verify the date and time on the second line of the display. The time stamp
shown is the date and time of the last quorum update and must be less than 30
minutes before the failure. The time stamp format is YYYYMMDD hh:mm,
where YYYY is the year, MM is the month, DD is the day, hh is the hour, and
mm is the minute.
Attention: If the time stamp is not less than 30 minutes before the failure, call
IBM Support.
5. After verifying that the time stamp is correct, press and hold the up button
and press the select button.
The node displays Backup file on the top line.
6. Verify the date and time on the second line of the display. The time stamp
shown is the date and time of the last configuration backup and must be less
than 24 hours before the failure. The time stamp format is YYYYMMDD hh:mm,
where YYYY is the year, MM is the month, DD is the day, hh is the hour, and
mm is the minute.
Attention: If the time stamp is not less than 24 hours before the failure, call
IBM Support.

Note: Changes made after the time of this configuration backup might not be
restored.
7. After verifying that the time stamp is correct, press and hold the up button
and press the select button.
The node displays Restoring. After a short delay, the second line displays a
sequence of progress messages indicating the actions taking place; then the
software on the node restarts.
The node displays Cluster on the top line and a management IP address on the
second line. After a few moments, the node displays T3 Completing.

Note: Any system errors logged at this time might temporarily overwrite the
display; ignore the message: Cluster Error: 3025. After a short delay, the
second line displays a sequence of progress messages indicating the actions
taking place.
When each node is added to the system, the display shows Cluster: on the top
line, and the cluster (system) name on the second line.
Attention: After the last node is added to the system, there is a short delay to
allow the system to stabilize. Do not attempt to use the system. The recovery is
still in progress. Once recovery is complete, the node displays T3 Succeeded on
the top line.
8. Press the select button to return the node to normal display.
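
The time stamp checks in steps 4 and 6 are simple date arithmetic. The following
Python sketch shows one way to perform them, assuming that the time stamp was
copied from the display as text and that the time of the failure is known. The
function and constant names are hypothetical, and rejecting a time stamp that is
later than the failure is an assumption of this sketch.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y%m%d %H:%M"          # YYYYMMDD hh:mm, as shown on the display
QUORUM_LIMIT = timedelta(minutes=30)   # limit for the last quorum update
BACKUP_LIMIT = timedelta(hours=24)     # limit for the last configuration backup


def stamp_is_recent_enough(stamp: str, failure_time: datetime, limit: timedelta) -> bool:
    """Return True if stamp is less than limit before failure_time."""
    taken = datetime.strptime(stamp, STAMP_FORMAT)
    age = failure_time - taken
    # A negative age means the stamp is later than the failure; this sketch
    # treats that as a failed check rather than guessing what it means.
    return timedelta(0) <= age < limit
```

For example, with a failure at 12:00 on 1 June 2012, a quorum time stamp of
20120601 11:45 passes the 30-minute check and a time stamp of 20120601 11:30
does not. If either check fails, call IBM Support as the procedure directs.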

Results

Recovery is complete when the node displays T3 Succeeded. Verify the
environment is operational by performing the checks provided in “What to check
after running the system recovery” on page 218.

Performing system recovery using the service assistant


Start recovery when all nodes that were members of the system are online and are
in candidate status. If any nodes display error code 550 or 578, remove their
system data to place them into candidate status. Do not run the recovery
procedure on different nodes in the same system; this restriction includes remote
systems.

About this task

Attention: This service action has serious implications if not performed properly.
If at any time an error is encountered not covered by this procedure, stop and call
IBM Support.

Note: The web browser must not block pop-up windows, otherwise progress
windows cannot open.

Any one of the following categories of messages may be displayed:


v T3 successful

The volumes are online. Use the final checks to make the environment
operational; see “What to check after running the system recovery” on page 218.
v T3 incomplete

One or more of the volumes is offline because there was fast write data in the
cache. Further actions are required to bring the volumes online; see “Recovering
from offline VDisks using the CLI” on page 218 for details.
v T3 failed

Call IBM Support. Do not attempt any further action.

Run the recovery from any node in the system; the node must not have
participated in any other system.

Note: Each individual stage of the recovery procedure might take significant time
to complete, depending on the specific configuration.

Before performing this procedure, read the recover system procedure introductory
information; see “Recover system procedure” on page 211.

Procedure
1. Point your browser to the service IP address of one of the nodes.
If the IP address is unknown or has not been configured, assign an IP address
using the initialization tool.
2. Log on to the service assistant.
3. Select Recover System from the navigation.
4. Follow the online instructions to complete the recovery procedure.
a. Verify the date and time of the last quorum time. The time stamp must be
less than 30 minutes before the failure. The time stamp format is
YYYYMMDD hh:mm, where YYYY is the year, MM is the month, DD is the
day, hh is the hour, and mm is the minute.

Attention: If the time stamp is not less than 30 minutes before the failure, call
IBM Support.
b. Verify the date and time of the last backup date. The time stamp must be
less than 24 hours before the failure. The time stamp format is YYYYMMDD
hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the
hour, and mm is the minute.

Attention: If the time stamp is not less than 24 hours before the failure, call
IBM Support.
Changes made after the time of this backup date might not be restored.

Results

Verify the environment is operational by performing the checks provided in “What
to check after running the system recovery” on page 218.

If any errors are logged in the error log after the system recovery procedure
completes, use the fix procedures to resolve these errors, especially the errors
related to offline arrays.

If the recovery completes with offline volumes, go to “Recovering from offline
VDisks using the CLI” on page 218.

Recovering from offline VDisks using the CLI
If a recovery procedure (T3 procedure) completes with offline volumes, you can
use the command-line interface (CLI) to access the volumes.

About this task

If you have performed the recovery procedure, and it has completed successfully
but there are offline volumes, you can perform the following steps to bring the
volumes back online. Any volumes that are offline and are not thin-provisioned
volumes are offline because of the loss of write-cache data during the event that
led both nodes to lose their hardened data. These volumes might need additional
recovery steps after the volume is brought back online.

Note: If you encounter errors in the error log after running the recovery procedure
that are related to offline arrays, use the fix procedures to resolve the offline array
errors before fixing the offline volume (VDisk) errors.

Example

Perform the following steps to recover an offline volume after the recovery
procedure has completed:
1. Delete all IBM FlashCopy function mappings and Metro Mirror or Global
Mirror relationships that use the offline volumes.
2. Run the recovervdisk, recovervdiskbyiogrp, or recovervdiskbysystem
command.
You can recover individual volumes by using the recovervdisk command. You
can recover all the volumes in a clustered system by using the
recovervdiskbysystem command.
3. Re-create all FlashCopy mappings and Metro Mirror or Global Mirror
relationships that use the volumes.
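
The choice among the three commands in step 2 depends on the scope of the
recovery. The following Python sketch only builds the command text; it does not
run anything on the system. The argument forms (a volume name or ID after
recovervdisk, an I/O group name or ID after recovervdiskbyiogrp) are assumptions
to confirm against the CLI reference, and the function name is hypothetical.

```python
from typing import Optional


def recovery_command(vdisk: Optional[str] = None, io_group: Optional[str] = None) -> str:
    """Return the CLI command text for the requested recovery scope.

    vdisk     -- recover one volume (assumed form: recovervdisk <name or ID>)
    io_group  -- recover all volumes in one I/O group
                 (assumed form: recovervdiskbyiogrp <name or ID>)
    neither   -- recover all volumes in the clustered system
    """
    if vdisk is not None and io_group is not None:
        raise ValueError("specify a volume or an I/O group, not both")
    if vdisk is not None:
        return f"recovervdisk {vdisk}"
    if io_group is not None:
        return f"recovervdiskbyiogrp {io_group}"
    return "recovervdiskbysystem"
```

Review the generated text before you issue it, and run it only after the
FlashCopy mappings and Metro Mirror or Global Mirror relationships in step 1 are
deleted.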

What to check after running the system recovery


Several tasks must be performed before you use the volumes.

Be aware of the following differences regarding the recovered configuration:


v FlashCopy mappings are restored as “idle_or_copied” with 0% progress. Both
volumes must have been restored to their original I/O groups.
v The management ID is different. Any scripts or associated programs that refer to
the system-management ID of the clustered system must be changed.
v Any FlashCopy mappings that were not in the “idle_or_copied” state with 100%
progress at the point of disaster have inconsistent data on their target disks.
These mappings must be restarted.
v Intersystem remote copy partnerships and relationships are not restored and
must be re-created manually.
v Consistency groups are not restored and must be re-created manually.
v Intrasystem remote copy relationships are restored if all dependencies were
successfully restored to their original I/O groups.
v The system time zone might not have been restored.
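
The FlashCopy rule in the preceding list can be applied to a record of the
mapping states from before the failure. The following Python sketch assumes that
such a record exists as (name, status, progress) tuples, for example from saved
command output; the recovery procedure does not produce this record, and the
function name is hypothetical.

```python
def mappings_to_restart(mappings):
    """Return the names of mappings whose targets hold inconsistent data.

    mappings -- iterable of (name, status, progress_percent) as recorded at
                the point of disaster. Only a mapping that was idle_or_copied
                with 100% progress is considered consistent.
    """
    return [
        name
        for name, status, progress in mappings
        if not (status == "idle_or_copied" and progress == 100)
    ]
```

Every mapping that the function returns must be restarted before the data on its
target disk is used.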

Before using the volumes, perform the following tasks:


v Start the host systems.

v Manual actions might be necessary on the hosts to trigger them to rescan for
devices. You can perform this task by disconnecting and reconnecting the Fibre
Channel cables to each host bus adapter (HBA) port.
v Verify that all mapped volumes can be accessed by the hosts.
v Run file system consistency checks.

Note: Any data that was in the SAN Volume Controller write cache at the time
of the failure is lost.
v Run the application consistency checks.

Backing up and restoring the system configuration


You can back up and restore the configuration data for the system after
preliminary tasks are completed.

Configuration data for the system provides information about your system and the
objects that are defined in it. The backup and restore functions of the svcconfig
command can back up and restore only your configuration data for the SAN
Volume Controller system. You must regularly back up your application data by
using the appropriate backup methods.

You can maintain your configuration data for the system by completing the
following tasks:
v Backing up the configuration data
v Restoring the configuration data
v Deleting unwanted backup configuration data files

Before you back up your configuration data, the following prerequisites must be
met:
v No independent operations that change the configuration for the system can be
running while the backup command is running.
v No object name can begin with an underscore character (_).

Note:
v The default object names for controllers, I/O groups, and managed disks
(MDisks) do not restore correctly if the ID of the object is different from what is
recorded in the current configuration data file.
v All other objects with default names are renamed during the restore process. The
new names appear in the format name_r where name is the name of the object in
your system.

Before you restore your configuration data, the following prerequisites must be
met:
v You have the Security Administrator role associated with your user name and
password.
v You have a copy of your backup configuration files on a server that is accessible
to the system.
v You have a backup copy of your application data that is ready to load on your
system after the restore configuration operation is complete.
v You know the current license settings for your system.

v You did not remove any hardware since the last backup of your system
configuration. If you had to replace a faulty node, the new node must use the
same worldwide node name (WWNN) as the faulty node that it replaced.

Note: You can add new hardware, but you must not remove any hardware
because the removal can cause the restore process to fail.
v No zoning changes were made on the Fibre Channel fabric that would prevent
communication between the SAN Volume Controller and any storage controllers
that are present in the configuration.

You can restore the configuration by using any node as the configuration node.
However, if you do not use the node that was the configuration node when the
system was first created, the unique identifier (UID) of the volumes that are within
the I/O groups can change. This action can affect IBM Tivoli Storage Productivity
Center for Fabric, VERITAS Volume Manager, and any other programs that record
this information.

The SAN Volume Controller analyzes the backup configuration data file and the
system to verify that the required disk controller system nodes are available.

Before you begin, hardware recovery must be complete. The following hardware
must be operational: hosts, SAN Volume Controller, drives, the Ethernet network,
and the SAN fabric.

Backing up the system configuration using the CLI


You can back up your configuration data using the command-line interface (CLI).

Before you begin

Before you back up your configuration data, the following prerequisites must be
met:
v No independent operations that change the configuration can be running while
the backup command is running.
v No object name can begin with an underscore character (_).
v If the ID of the object is different from what is recorded in the current
configuration data file, the default object names for controllers, I/O groups, and
managed disks (MDisks) do not restore correctly.
v All other objects with default names are renamed during the restore process. The
new names appear in the format name_r, where name is the name of the object in
your system.
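
The object name prerequisite can be checked before the backup is started. The
following Python sketch assumes that the object names were already gathered into
a list, for example from the output of the listing commands; how the list is
gathered is outside the sketch, and the function name is hypothetical.

```python
def names_blocking_backup(object_names):
    """Return the object names that begin with an underscore character (_)."""
    return [name for name in object_names if name.startswith("_")]
```

Rename every object that the function returns before you run the backup command.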

About this task

The backup feature of the svcconfig CLI command is designed to back up
information about your system configuration, such as volumes, local Metro Mirror
information, local Global Mirror information, managed disk (MDisk) groups, and
nodes. All other data that you wrote to the volumes is not backed up. Any
application that uses the volumes on the system as storage must back up its
application data using the appropriate backup methods.

You must regularly back up your configuration data and your application data to
avoid data loss. If a system is lost after a severe failure occurs, both the
system configuration and the application data are lost. You must reinstate the
system to the exact state it was in before the failure, and then recover the
application data.

The SSH coding examples that are provided are samples using the PuTTY scp
(pscp) application code. The pscp application is available when you install an SSH
client on your host system. You can access the pscp application through a
Microsoft Windows command prompt.

Perform the following steps to back up your configuration data:

Procedure
1. Back up all of the application data that you stored on your volumes using your
preferred backup method.
2. Open a command prompt.
3. Using the command-line interface, issue the following command to log on to
the system:
plink -i ssh_private_key_file superuser@cluster_ip
where ssh_private_key_file is the name of the SSH private key file for the
superuser and cluster_ip is the IP address or DNS name of the clustered system
for which you want to back up the configuration.
4. Issue the following CLI command to remove all of the existing configuration
backup and restore files that are on your configuration node in the /tmp
directory.
svcconfig clear -all
5. Issue the following CLI command to back up your configuration:
svcconfig backup

The following output is an example of the messages that are displayed during
the backup process:
CMMVC6112W io_grp io_grp1 has a default name
CMMVC6112W io_grp io_grp2 has a default name
CMMVC6112W mdisk mdisk14 ...
CMMVC6112W node node1 ...
CMMVC6112W node node2 ...
....................................................

The svcconfig backup CLI command creates three files that provide
information about the backup process and the configuration. These files are
created in the /tmp directory of the configuration node.
The following table describes the three files that are created by the backup
process:

File name               Description
svc.config.backup.xml   This file contains your configuration data.
svc.config.backup.sh    This file contains the names of the commands that
                        were issued to create the backup of the system.
svc.config.backup.log   This file contains details about the backup,
                        including any error information that was reported.

6. Check that the svcconfig backup command completes successfully. The
following output is an example of the message that is displayed when the
backup process is successful:
CMMVC6155I SVCCONFIG processing completed successfully.

If the process fails, resolve the errors, and run the process again.
7. Issue the following command to exit the system:
exit
8. Issue the following command to copy the backup files to a location that is not
in your system:
pscp -i ssh_private_key_file superuser@cluster_ip:/tmp/svc.config.backup.* /offclusterstorage/
where ssh_private_key_file is the name of the SSH private key file for the
superuser, cluster_ip is the IP address or DNS name of the system, and
offclusterstorage is the location where you want to store the backup files.
You must copy these files to a location outside of your system because the
/tmp directory on the configuration node becomes inaccessible if the
configuration node changes. The configuration node might change in response
to an error recovery action or to a user maintenance activity.

Tip: To maintain controlled access to your configuration data, copy the backup
files to a location that is password-protected.
9. Ensure that the copies of the backup files are stored in the location that you
specified in step 8.
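
The command sequence in this procedure can be sketched as a short script. The following Python example only builds the command lines; it does not run them. It assumes that each CLI command is passed to plink as a trailing argument instead of being typed in an interactive session, and backup_commands, key.ppk, and the IP address are hypothetical names chosen for illustration.

```python
def backup_commands(key_file, cluster_ip, dest_dir):
    """Return the backup steps as argument lists, in the order they are run."""
    target = f"superuser@{cluster_ip}"
    return [
        # Step 4: remove existing backup and restore files from /tmp.
        ["plink", "-i", key_file, target, "svcconfig clear -all"],
        # Step 5: create the three svc.config.backup.* files.
        ["plink", "-i", key_file, target, "svcconfig backup"],
        # Step 8: copy the backup files off the system.
        ["pscp", "-i", key_file, f"{target}:/tmp/svc.config.backup.*", dest_dir],
    ]


if __name__ == "__main__":
    # Hypothetical key file, address, and destination directory.
    for argv in backup_commands("key.ppk", "192.0.2.10", "/offclusterstorage/"):
        print(" ".join(argv))
```

Review the svcconfig backup output for the completion message before you run the copy step, as described in step 6.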

What to do next

You can rename the backup files to include the configuration node name either at
the start or end of the file names so that you can easily identify these files when
you are ready to restore your configuration.

Issue the following command to rename the backup files that are stored on a Linux
or IBM AIX host:
mv /offclusterstorage/svc.config.backup.xml /offclusterstorage/svc.config.backup.xml_myconfignode

where offclusterstorage is the name of the directory where the backup files are
stored and myconfignode is the name of your configuration node.
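
If you keep several backup files, the rename can be scripted. The following Python sketch appends the configuration node name to every svc.config.backup.* file in a directory. The function name tag_backup_files is hypothetical, and appending the name at the end of the file name is only one of the two placements described above.

```python
from pathlib import Path


def tag_backup_files(directory, node_name):
    """Append _node_name to each backup file; return the new file names."""
    renamed = []
    for path in sorted(Path(directory).glob("svc.config.backup.*")):
        if path.name.endswith(f"_{node_name}"):
            continue  # this file was already tagged on an earlier run
        target = path.with_name(f"{path.name}_{node_name}")
        path.rename(target)
        renamed.append(target.name)
    return renamed


if __name__ == "__main__":
    # Hypothetical directory and node name.
    print(tag_backup_files("/offclusterstorage", "myconfignode"))
```

Before you restore, rename the chosen XML file back to svc.config.backup.xml, because the restore procedure expects that name.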

To rename the backup files that are stored on a Windows host, right-click the name
of the file and select Rename.

Restoring the system configuration


For directions on the recover procedure, see “Recover system procedure” on page
211.

Before you begin

This configuration restore procedure is designed to restore information about your
configuration, such as volumes, local Metro Mirror information, local Global Mirror
information, storage pools, and nodes. None of the data that you have written to
the volumes is restored. To restore the data on the volumes, you must separately
restore the application data from any application that uses the volumes on the
clustered system as storage. Therefore, you must have a backup of this data before
you follow the configuration recovery process.

About this task

You must regularly back up your configuration data and your application data to
avoid data loss. If a system is lost after a severe failure occurs, both the
system configuration and the application data are lost. You must reinstate the
system to the exact state it was in before the failure, and then recover the
application data.

Important:
1. There are two phases during the restore process: prepare and execute. You must
not change the fabric or system between these two phases.
2. For a SAN Volume Controller with internal solid-state drives (SSDs), all nodes
must be added into the system before restoring your data. See step 9 on page
224.

If you do not understand the instructions to run the CLI commands, see the
command-line interface reference information.

To restore your configuration data, follow these steps:

Procedure
1. Verify that all nodes are available as candidate nodes before you run this
recovery procedure. You must remove errors 550 or 578 to put the node in
candidate state.
2. Create a new system from the front panel. If possible, use the node that was
originally in I/O group 0.
3. From the management GUI, click Access > Users to set up your system and
configure an SSH key for the superuser. This allows access to the CLI.
4. Using the command-line interface, issue the following command to log on to
the system:
plink -i ssh_private_key_file superuser@cluster_ip
where ssh_private_key_file is the name of the SSH private key file for the
superuser and cluster_ip is the IP address or DNS name of the system for
which you want to restore the configuration.

Note: Because the RSA host key has changed, a warning message might
display when you connect to the system using SSH.
5. Issue the following CLI command to ensure that only the configuration node
is online:
lsnode
The following output is an example of what is displayed:
id name  status IO_group_id IO_group_name config_node
1  node1 online 0           io_grp0       yes
6. Identify the configuration backup file that you want to restore from.
The file can be either a local copy of the configuration backup XML file that
you saved when backing up the configuration or an up-to-date file on one of
the nodes.
Configuration data is automatically backed up daily at 01:00 system time on
the configuration node.
Attention: You must copy the required backup file to another computer
before you continue. To save a copy of the data, perform the following steps
to check for backup files on both nodes:
a. From the management GUI, click Settings > Support.
b. Click Show full log listing.
c. Find the file name that begins with svc.config.cron.xml.
d. Double-click the file to download the file to your computer.
e. If a recent configuration file is not present on this node, configure service
IP addresses for other nodes and connect to the service assistant to look
for configuration files on other nodes. For details on how to do this, see
the information regarding service IPv4 or service IPv6 at “Service IPv4 or
Service IPv6 options” on page 118.
7. Issue the following CLI command to remove all of the existing backup and
restore configuration files that are located on your configuration node in the
/tmp directory:
svcconfig clear -all
8. The XML files contain a date and time that can be used to identify the most
recent backup. After you identify the backup XML file that is to be used when
you restore the system, rename the file to svc.config.backup.xml. From your
desktop, issue the following command to copy the file back on to the system.
pscp -i ssh_private_key_file full_path_to_identified_svc.config.backup.xml superuser@cluster_ip:/tmp/
9. If the system contains any nodes with internal solid-state drives (SSDs), these
nodes must be added to the system now. To add these nodes, determine the
panel name, node name, and I/O groups of any such nodes from the
configuration backup file. To add the nodes to the system, issue this
command:
source addnode -panelname panel_name -iogrp iogrp_name_or_id -name node_name
where panel_name is the name that is displayed on the panel, iogrp_name_or_id
is the name or ID of the I/O group to which you want to add this node, and
node_name is the name of the node.
10. Issue the following CLI command to compare the current configuration with
the backup configuration data file:
svcconfig restore -prepare
This CLI command creates a log file in the /tmp directory of the configuration
node. The name of the log file is svc.config.restore.prepare.log.

Note: It can take up to a minute for each 256-MDisk batch to be discovered. If
you receive error message CMMVC6200W for an MDisk after you enter this
command, all the managed disks (MDisks) might not have been discovered
yet. Allow a suitable time to elapse and try the svcconfig restore -prepare
command again.
11. Issue the following command to copy the log file to another server that is
accessible to the system:
pscp -i ssh_private_key_file superuser@cluster_ip:/tmp/svc.config.restore.prepare.log full_path_for_where_to_copy_log_files
12. Open the log file from the server where the copy is now stored.
13. Check the log file for errors.
v If there are errors, correct the condition that caused the errors and reissue
the command. You must correct all errors before you can proceed to step 14.
v If you need assistance, contact the IBM Support Center.
14. Issue the following CLI command to restore the configuration:
svcconfig restore -execute

Note: Issuing this CLI command on a single node system adds the other
nodes to the system.
This CLI command creates a log file in the /tmp directory of the configuration
node. The name of the log file is svc.config.restore.execute.log.
15. Issue the following command to copy the log file to another server that is
accessible to the system:
pscp -i ssh_private_key_file superuser@cluster_ip:/tmp/svc.config.restore.execute.log full_path_for_where_to_copy_log_files
16. Open the log file from the server where the copy is now stored.
17. Check the log file to ensure that no errors or warnings have occurred.

Note: You might receive a warning stating that a licensed feature is not
enabled. This message means that after the recovery process, the current
license settings do not match the previous license settings. The recovery
process continues normally and you can enter the correct license settings in
the management GUI at a later time.
When you log into the CLI again over SSH, you see this output:
IBM_2145:your_cluster_name:superuser>

18. After the configuration is restored, perform the following actions:
a. Verify that the quorum disks are restored to the MDisks that you want by
using the lsquorum command. To restore the quorum disks to the correct
MDisks, issue the appropriate chquorum CLI commands.
b. Reset the superuser password. The superuser password is not restored as
part of the process.
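
The log checks in steps 13 and 17 can be partly automated after the log files are copied off the system. The following Python sketch scans log text for message IDs of the form shown in this procedure (for example, CMMVC6200W and CMMVC6155I). Treating the final letter as E for error, W for warning, and I for information is an assumption of this sketch, and summarize_log and ready_to_proceed are hypothetical helper names.

```python
import re

# A message ID is CMMVC, four digits, and a one-letter suffix.
MESSAGE_ID = re.compile(r"\bCMMVC(\d{4})([EWI])\b")


def summarize_log(text):
    """Group the log lines that carry a message ID by the ID suffix."""
    found = {"E": [], "W": [], "I": []}
    for line in text.splitlines():
        match = MESSAGE_ID.search(line)
        if match:
            found[match.group(2)].append(line.strip())
    return found


def ready_to_proceed(text):
    """Return True if the log text contains no error (E) messages."""
    return not summarize_log(text)["E"]


if __name__ == "__main__":
    sample = (
        "CMMVC6112W io_grp io_grp1 has a default name\n"
        "CMMVC6155I SVCCONFIG processing completed successfully.\n"
    )
    print(summarize_log(sample)["W"])
    print(ready_to_proceed(sample))
```

A script like this does not replace reading the log. Warnings such as the licensed feature message in step 17 still need a decision by the person who runs the restore.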

What to do next

You can remove any unwanted configuration backup and restore files from the
/tmp directory on your configuration node by issuing the following CLI command:

svcconfig clear -all

Deleting backup configuration files using the CLI


You can use the command-line interface (CLI) to delete backup configuration files.

About this task

Perform the following steps to delete backup configuration files:

Procedure
1. Issue the following command to log on to the system:
plink -i ssh_private_key_file superuser@cluster_ip
where ssh_private_key_file is the name of the SSH private key file for the
superuser and cluster_ip is the IP address or DNS name of the clustered system
from which you want to delete the configuration.
2. Issue the following CLI command to erase all of the configuration backup and
restore files that are stored in the /tmp directory:
svcconfig clear -all

Performing the node rescue when the node boots
If it is necessary to replace the hard disk drive or if the software on the hard disk
drive is corrupted, you can use the node rescue procedure to reinstall the SAN
Volume Controller software.

Before you begin

Similarly, if you have replaced the service controller, use the node rescue procedure
to ensure that the service controller has the correct software.

About this task

Attention: If you recently replaced both the service controller and the disk drive
as part of the same repair operation, node rescue fails.

Node rescue works by booting the operating system from the service controller
and running a program that copies all the SAN Volume Controller software from
any other node that can be found on the Fibre Channel fabric.

Attention: When running node rescue operations, run only one node rescue
operation on the same SAN, at any one time. Wait for one node rescue operation
to complete before starting another.

Perform the following steps to complete the node rescue:

Procedure
1. Ensure that the Fibre Channel cables are connected.
2. Ensure that at least one other node is connected to the Fibre Channel fabric.
3. Ensure that the SAN zoning allows a connection between at least one port of
this node and one port of another node. It is better if multiple ports can
connect. This is particularly important if the zoning is by worldwide port name
(WWPN) and you are using a new service controller. In this case, you might
need to use SAN monitoring tools to determine the WWPNs of the node. If you
need to change the zoning, remember to set it back when the service procedure
is complete.
4. Turn off the node.
5. Press and hold the left and right buttons on the front panel.
6. Press the power button.
7. Continue to hold the left and right buttons until the node-rescue-request
symbol is displayed on the front panel (Figure 73).

Results

Figure 73. Node rescue display

The node rescue request symbol displays on the front panel display until the node
starts to boot from the service controller. If the node rescue request symbol
displays for more than two minutes, go to the hardware boot MAP to resolve the
problem. When the node rescue starts, the service display shows the progress or
failure of the node rescue operation.

Note: If the recovered node was part of a clustered system, the node is now
offline. Delete the offline node from the system and then add the node back into
the system. If node recovery was used to recover a node that failed during a
software upgrade process, it is not possible to add the node back into the system
until the upgrade or downgrade process has completed. This can take up to four
hours for an eight-node clustered system.

Chapter 9. Understanding the medium errors and bad blocks
A storage system returns a medium error response to a host when it is unable to
successfully read a block. The SAN Volume Controller response to a host read
follows this behavior.

The volume virtualization that is provided extends the time when a medium error
is returned to a host. Because of this difference to non-virtualized systems, the
SAN Volume Controller uses the term bad blocks rather than medium errors.

The SAN Volume Controller allocates volumes from the extents that are on the
managed disks (MDisks). The MDisk can be a volume on an external storage
controller or a RAID array that is created from internal drives. In either case,
depending on the RAID level used, there is normally protection against a read
error on a single drive. However, it is still possible to get a medium error on a
read request if multiple drives have errors or if the drives are rebuilding or are
offline due to other issues.

The SAN Volume Controller provides migration facilities to move a volume from
one underlying set of physical storage to another or to replicate a volume that uses
FlashCopy or Metro Mirror or Global Mirror. In all these cases, the migrated
volume or the replicated volume returns a medium error to the host when the
logical block address on the original volume is read. The system maintains tables
of bad blocks to record the logical block addresses that cannot be read.
These tables are associated with the MDisks that are providing storage for the
volumes.

The dumpmdiskbadblocks command and the dumpallmdiskbadblocks command are
available to query the location of bad blocks.

It is possible that the tables that are used to record bad block locations can fill up.
The table can fill either on an MDisk or on the system as a whole. If a table does
fill up, the migration or replication that was creating the bad block fails because it
was not possible to create an exact image of the source volume.

The system creates alerts in the event log for the following situations:
v When it detects medium errors and creates a bad block
v When the bad block tables fill up

The following errors are identified:


Table 55. Bad block errors
Error code   Description
1840         The managed disk has bad blocks.
1226         The system has failed to create a bad block because the MDisk
             already has the maximum number of allowed bad blocks.
1225         The system has failed to create a bad block because the system
             already has the maximum number of allowed bad blocks.

The recommended actions for these alerts guide you in correcting the situation.

Bad blocks are cleared either by deallocating the volume disk extent (by deleting
the volume) or by issuing write I/O to the block. It is good practice to correct bad
blocks as soon as they are detected. This action prevents the bad block from being
propagated when the volume is replicated or migrated. It is possible, however, for
the bad block to be on part of the volume that is not used by the application. For
example, it can be in part of a database that has not been initialized. These bad
blocks are corrected when the application writes data to these areas. Before the
correction happens, the bad block records continue to use up the available bad
block space.
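
The way that the bad block tables fill and are cleared can be pictured with a small model. The following Python sketch is a toy illustration, not the product's implementation. The class name, the limit values, and the order in which the two limits are checked are assumptions; only the error codes are taken from Table 55.

```python
class BadBlockTable:
    """Toy model of per-MDisk and system-wide bad block limits."""

    def __init__(self, per_mdisk_limit, system_limit):
        self.per_mdisk_limit = per_mdisk_limit  # hypothetical value
        self.system_limit = system_limit        # hypothetical value
        self.blocks = {}                        # MDisk name -> set of LBAs

    def total(self):
        return sum(len(lbas) for lbas in self.blocks.values())

    def record(self, mdisk, lba):
        """Try to record a bad block; return the error code that is raised."""
        lbas = self.blocks.setdefault(mdisk, set())
        if lba in lbas:
            return 1840  # already recorded; the MDisk has bad blocks
        if len(lbas) >= self.per_mdisk_limit:
            return 1226  # the MDisk table is full
        if self.total() >= self.system_limit:
            return 1225  # the system table is full
        lbas.add(lba)
        return 1840

    def write(self, mdisk, lba):
        """Model write I/O to a block, which clears its bad block record."""
        self.blocks.get(mdisk, set()).discard(lba)


if __name__ == "__main__":
    table = BadBlockTable(per_mdisk_limit=2, system_limit=3)
    print(table.record("mdisk0", 10))
    table.write("mdisk0", 10)
    print(table.total())
```

The model shows why uncorrected bad blocks matter: each record that is not cleared by a write continues to count against both limits.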

Chapter 10. Using the maintenance analysis procedures
The maintenance analysis procedures (MAPs) inform you how to analyze a failure
that occurs with a SAN Volume Controller node.

About this task

SAN Volume Controller nodes must be configured in pairs so you can perform
concurrent maintenance.

When you service one node, the other node keeps the storage area network (SAN)
operational. With concurrent maintenance, you can remove, replace, and test all
field replaceable units (FRUs) on one node while the SAN and host systems are
powered on and doing productive work.

Note: Do not remove the power from both nodes unless you have a particular
reason or are instructed to do so. When you need to remove power, see “MAP
5350: Powering off a SAN Volume Controller node” on page 258.

Procedure
v To isolate the FRUs in the failing node, complete the actions and answer the
questions given in these maintenance analysis procedures (MAPs).
v When instructed to exchange two or more FRUs in sequence:
1. Exchange the first FRU in the list for a new one.
2. Verify that the problem is solved.
3. If the problem remains:
a. Reinstall the original FRU.
b. Exchange the next FRU in the list for a new one.
4. Repeat steps 2 and 3 until either the problem is solved, or all the related
FRUs have been exchanged.
5. Complete the next action indicated by the MAP.
6. If you are using one or more MAPs because of a system error code, mark the
error as fixed in the event log after the repair, but before you verify the
repair.
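
The FRU exchange sequence above is a simple loop, sketched here in Python. The function name exchange_frus and the FRU names are hypothetical, and the problem_solved_with callback stands in for the manual verification in step 2.

```python
def exchange_frus(fru_list, problem_solved_with):
    """Exchange FRUs one at a time until the problem is solved.

    fru_list holds the FRUs in the order that the MAP gives them.
    problem_solved_with(fru) returns True if the problem is solved while
    the new part for that FRU is installed.
    Returns the FRU whose exchange solved the problem, or None.
    """
    for fru in fru_list:
        # Exchange this FRU for a new one, then verify the result.
        if problem_solved_with(fru):
            return fru
        # The problem remains: the original FRU is reinstalled before
        # the next FRU in the list is exchanged.
    return None


if __name__ == "__main__":
    frus = ["part A", "part B", "part C"]  # hypothetical FRU list
    print(exchange_frus(frus, lambda fru: fru == "part B"))
```

A return value of None corresponds to the case where all the related FRUs have been exchanged without solving the problem, and the next action indicated by the MAP applies.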

Note: Start all problem determination procedures and repair procedures with
“MAP 5000: Start.”

MAP 5000: Start


MAP 5000: Start is an entry point to the maintenance analysis procedures (MAPs)
for the SAN Volume Controller.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures.”

This MAP applies to all SAN Volume Controller models. Be sure that you know
which model you are using before you start this procedure. To determine which
model you are working with, look for the label that identifies the model type on
the front of the node.

You might have been sent here for one of the following reasons:
v The fix procedures sent you here
v A problem occurred during the installation of a SAN Volume Controller
v Another MAP sent you here
v A user observed a problem that was not detected by the system

SAN Volume Controller nodes are configured in pairs. While you service one node,
you can access all the storage managed by the pair from the other node. With
concurrent maintenance, you can remove, replace, and test all FRUs on one SAN
Volume Controller while the SAN and host systems are powered on and doing
productive work.

Notes:
v Do not remove the power from both nodes unless you have a particular
reason or are instructed to do so.
v If a recommended action in these procedures involves removing or replacing a
part, use the applicable procedure.
v If the problem persists after performing the recommended actions in this
procedure, return to step 1 of the MAP to try again to fix the problem.

About this task

Perform the following steps:

Procedure
1. Were you sent here from a fix procedure?
NO Go to step 2.
YES Go to step 8 on page 233.
2. (from step 1)
Find the IBM System Storage Productivity Center (SSPC) that is close to and is
set up to manage the SAN Volume Controller system. The SSPC is normally
located in the same rack as the SAN Volume Controller system.
3. (from step 2)
Log in to the SSPC using the user ID and password that are provided by the
user.
4. (from step 3)
Log in to the management GUI using the user ID and password that are
provided by the user and launch the management GUI for the system that
you are repairing.
5. (from step 4)
Does the management GUI start?
NO Go to step 8 on page 233.
YES Go to step 6.
6. (from step 5)
When the SAN Volume Controller system that you want to service is
selected, is the Welcome panel displayed?
NO Go to step 8.
YES Go to step 7.
7. (from step 6 on page 232)
Start the fix procedures.
Did the fix procedures find an error that needs to be fixed?
NO Go to step 8.
YES Follow the fix procedures.
8. (from steps 1 on page 232, 5 on page 232, 6 on page 232, and 7)
Is the power indicator on the front panel off? Check to see if the power LED
on the operator-information panel is off.
NO Go to step 9.
YES Try to turn on the nodes. See “Using the power control for the SAN
Volume Controller node” on page 123.

Note: The uninterruptible power supply unit that supplies power to
the node might also be turned off. The uninterruptible power supply
must be turned on before the node is turned on.
If the nodes are turned on, go to step 9; otherwise, go to the
appropriate Power MAP: “MAP 5060: Power 2145-8A4” on page 245
or “MAP 5050: Power 2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, and
2145-8F2” on page 238.
9. (from step 8)
Does the front panel of the node show a hardware error? There is a
hardware error if any of the following conditions are true for the node:
v None of the LEDs on the front panel are on and the front-panel display is
blank.
v The error LED 1, which is the bottom LED on the front panel, is on.
Figure 74 shows the location of the service controller error light.

Figure 74. SAN Volume Controller service controller error light

NO Go to step 10.
YES The service controller for the SAN Volume Controller has failed.
a. Check that the service controller that is indicating an error is
correctly installed. If it is, replace the service controller.
b. Go to “MAP 5700: Repair verification” on page 278.
10. (from step 9)
Is the operator-information panel error LED 1 that you see in Figure 75
on page 234 illuminated or flashing?

Figure 75. Error LED on the SAN Volume Controller models (panels shown:
2145-CF8, 2145-CG8, 2145-8A4, 2145-8G4, 2145-8F4, and 2145-8F2)

NO Go to step 11.
YES Go to “MAP 5800: Light path” on page 279.
11. (from step 10 on page 233)
Is the hardware boot display that you see in Figure 76 displayed on the
node?

Figure 76. Hardware boot display

NO Go to step 13.
YES Go to step 12.
12. (from step 11)
Has the hardware boot display that you see in Figure 76 displayed for more
than three minutes?
NO Go to step 13.
YES Perform the following:
a. Go to “MAP 5900: Hardware boot” on page 302.
b. Go to “MAP 5700: Repair verification” on page 278.
13. (from step 11)
Is Failed displayed on the top line of the front-panel display of the node?
NO Go to step 14.
YES Perform the following:
a. Note the failure code and go to “Boot code reference” on page 154
to perform the repair actions.
b. Go to “MAP 5700: Repair verification” on page 278.
14. (from step 13)
Is Booting displayed on the top line of the front-panel display of the node?
NO Go to step 16 on page 235.
YES Go to step 15.
15. (from step 14)
A progress bar and a boot code are displayed. If the progress bar does not
advance for more than three minutes, it has stalled.
Has the progress bar stalled?
NO Go to step 16.
YES Perform the following:
a. Note the failure code and go to “Boot code reference” on page 154
to perform the repair actions.
b. Go to “MAP 5700: Repair verification” on page 278.
16. (from step 14 on page 234 and step 15 on page 234)
If you pressed any of the navigation buttons on the front panel, wait for 60
seconds to ensure that the display has switched to its default display.
Is Node Error displayed on the top line of the front-panel display of the
node?
NO Go to step 17.
YES Perform the following steps:
a. Note the failure code and go to “Node error code overview” on
page 155 to perform the repair actions.
b. Go to “MAP 5700: Repair verification” on page 278.
17. (from step 16)
Is Cluster Error displayed on the top line of the front-panel display of the
node?
NO Go to step 18.
YES A cluster error was detected. This error code is displayed on all the
operational nodes in the system. This type of error is normally
repaired using the fix procedures. Perform the following steps:
a. Go to step 2 on page 232 to perform the fix procedure. If you
return here, go to “Clustered-system code overview” on page 156
to perform the repair actions.
b. Go to “MAP 5700: Repair verification” on page 278.
18. (from step 17)
Is Powering Off, Restarting, Shutting Down, or Power Failure displayed in
the top line of the front-panel display?
NO Go to step 20 on page 236.
YES The progress bar moves every few seconds. Wait for the operation to
complete and then return to step 1 on page 232 in this MAP. If the
progress bar does not move for three minutes, press the power button
and go to step 19.
19. (from step 18)
Did the node power off?
NO Perform the following steps:
a. Remove the power cord from the rear of the box.
b. Wait 60 seconds.
c. Replace the power cord.
d. If the node does not power on, press the power button to
power-on the node and then return to step 1 on page 232 in this
MAP.
YES Perform the following steps:
a. Wait 60 seconds.
b. Press the power button to turn on the node and then return to step
1 on page 232 in this MAP.

Note: The 2145 UPS-1U turns off only when its power button is
pressed, input power has been lost for more than five minutes, or the
SAN Volume Controller node has shut it down following a reported
loss of input power.
20. (from step 18 on page 235)
Is Charging or Recovering displayed in the top line of the front-panel
display of the node?
NO Go to step 21.
YES
v If Charging is displayed, the uninterruptible power supply battery is
not yet charged sufficiently to support the node. If Charging is
displayed for more than two hours, go to “MAP 5150: 2145
UPS-1U” on page 248.
v If Recovering is displayed, the uninterruptible power supply battery
is not yet charged sufficiently to be able to support the node
immediately following a power supply failure. However, if
Recovering is displayed, the node can be used normally.
v If Recovering is displayed for more than two hours, go to “MAP
5150: 2145 UPS-1U” on page 248.
21. (from step 20)
Is Validate WWNN? displayed on the front-panel display of the node?
NO Go to step 22 on page 237.
YES The node is indicating that its WWNN might need changing. It enters
this mode when the node service controller or disk has been changed
but the required service procedures have not been followed.

Note: Do not validate the WWNN until you read the following
information to ensure that you choose the correct value. If you choose
an incorrect value, you might find that the SAN zoning for the node is
also not correct and more than one node is using the same WWNN.
Therefore, it is important to establish the correct WWNN before you
continue.
a. Determine which WWNN that you want to use.
v If the service controller has been replaced, the correct value is
probably the WWNN that is stored on disk (the disk WWNN).
v If the disk has been replaced, perhaps as part of a frame
replacement procedure, but has not been re-initialized, the
correct value is probably the WWNN that is stored on the
service controller (the panel WWNN).
b. Select the stored WWNN that you want this node to use:
v To use the WWNN that is stored on the disk, perform the
following steps:
1) From the Validate WWNN? panel, press and release the
select button. The Disk WWNN: panel is displayed and
shows the last five digits of the WWNN that is stored on the
disk.
2) From the Disk WWNN: panel, press and release the down
button. The Use Disk WWNN? panel is displayed.
3) Press and release the select button.
v To use the WWNN that is stored on the service controller,
perform the following steps:
1) From the Validate WWNN? panel, press and release the
select button. The Disk WWNN: panel is displayed.
2) From the Disk WWNN: panel, press and release the right
button. The Panel WWNN: panel is displayed and shows the
last five numbers of the WWNN that is stored on the service
controller.
3) From the Panel WWNN: panel, press and release the down
button. The Use Panel WWNN? panel is displayed.
4) Press and release the select button.
c. After you set the WWNN, check the front-panel display:
• If the Node WWNN: panel is displayed on the front panel, the
node is now using the selected WWNN. The Node WWNN:
panel shows the last five digits of the WWNN that you
selected.
• If the front panel shows Cluster: but does not show a system
name, you must use the recover procedure for a clustered
system to delete the node from the system and add the node
back into the system.
22. (from step 21 on page 236)
Is there a node that is not a member of a clustered system? You can tell if a
node is not a member of a system by checking the front panel menu. If
Cluster: is displayed but no system name is shown, the node is not a
member of a system. (The name is on the second line of the front-panel
display if the current language font allows a two-line display. Otherwise, you
can press the select button to display the name.)
NO Go to step 23.
YES The node is not a member of a system. The node might have been
deleted during a maintenance procedure and has not been added back
into the system. Make sure that each I/O group in the system contains
two nodes. If an I/O group has only one node, add the node back
into that system and ensure that the node is restored to the same I/O
group that it was deleted from.
23. (from step 22)
Is the front-panel display unreadable?
NO Go to step 24.
YES Perform the following steps:
a. Check the language. The display might be set to another language.
b. If the language is set correctly, go to “MAP 5400: Front panel” on
page 263.
24. (from step 23)
No errors were detected by the SAN Volume Controller. If you suspect that
the problem that is reported by the customer is a hardware problem, perform
the following tasks:

a. Perform Problem Determination procedures on your host systems, disk
controllers, and Fibre Channel switches.
b. Ask your hardware support center for assistance.

Results

If you suspect that the problem is a software problem, see the “Upgrading the system”
documentation for details about how to upgrade your entire SAN Volume
Controller environment.

If the problem is still not fixed, collect diagnostic information and contact the IBM
support center.
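
The WWNN choice in step 21 of this MAP can be summarized as a small lookup. The following is a minimal sketch only; the names `PROBABLE_WWNN`, `BUTTON_PRESSES`, and `wwnn_plan` are hypothetical and are not part of any SAN Volume Controller interface. It records which stored WWNN is probably correct for each replaced part, and the front-panel button presses that select it, starting from the Validate WWNN? panel.

```python
# Which stored WWNN is probably correct, keyed by the part that was replaced.
PROBABLE_WWNN = {
    "service controller": "disk",  # use the WWNN stored on the disk
    "disk": "panel",               # use the WWNN stored on the service controller
}

# Front-panel button presses, starting from the Validate WWNN? panel.
BUTTON_PRESSES = {
    "disk": ("select", "down", "select"),
    "panel": ("select", "right", "down", "select"),
}


def wwnn_plan(replaced_part):
    """Return (wwnn_source, button_presses) for the replaced part."""
    source = PROBABLE_WWNN[replaced_part]
    return source, BUTTON_PRESSES[source]
```

For example, `wwnn_plan("service controller")` selects the disk WWNN. The word "probably" in the procedure still applies: confirm the value against the SAN zoning before you validate it.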

MAP 5050: Power 2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, and 2145-8F2
MAP 5050: Power 2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, and 2145-8F2 helps you
to solve power problems that have occurred on SAN Volume Controller models
2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, and 2145-8F2. If you are using a SAN
Volume Controller 2145-8A4, see the Power MAP for that SAN Volume Controller
model.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

You might have been sent here for one of the following reasons:
• A problem occurred during the installation of a SAN Volume Controller
2145-CG8, 2145-CF8, 2145-8G4, 2145-8F4, or 2145-8F2 node.
• The power switch failed to turn the node on.
• The power switch failed to turn the node off.
• Another MAP sent you here.

About this task

Perform the following steps:

Procedure
1. Are you here because the node is not powered on?
NO Go to step 11 on page 244.
YES Go to step 2.
2. (from step 1)
Is the power LED on the operator-information panel continuously
illuminated? Figure 77 on page 239 shows the location of the power LED 1
on the operator-information panel.

Figure 77. Power LED on the SAN Volume Controller models 2145-CG8, 2145-CF8,
2145-8G4, and 2145-8F4 or 2145-8F2 operator-information panel

NO Go to step 3.
YES The node is powered on correctly. Reassess the symptoms and return
to “MAP 5000: Start” on page 231 or go to “MAP 5700: Repair
verification” on page 278 to verify the correct operation.
3. (from step 2 on page 238)
Is the power LED on the operator-information panel flashing approximately
four times per second?
NO Go to step 4.
YES The node is turned off and is not ready to be turned on. Wait until the
power LED flashes at a rate of approximately once per second, then
go to step 5.
If this behavior persists for more than three minutes, perform the
following procedure:
a. Remove all input power from the SAN Volume Controller node by
removing the power retention brackets and the power cords from
the back of the node. See “Removing the cable-retention brackets”
to see how to remove the cable-retention brackets when removing
the power cords from the node.
b. Wait one minute and then verify that all power LEDs on the node
are extinguished.
c. Reinsert the power cords and power retention brackets.
d. Wait for the flashing rate of the power LED to slow down to one
flash per second. Go to step 5.
e. If the power LED keeps flashing at a rate of four flashes per
second for a second time, replace the parts in the following
sequence:
• System board
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
4. (from step 3)
Is the Power LED on the operator-information panel flashing approximately
once per second?
YES The node is in standby mode. Input power is present. Go to step 5.
NO Go to step 6 on page 240.
5. (from step 3 and step 4)
Press the power-on button on the operator-information panel of the node.

Is the Power LED on the operator-information panel illuminated a solid
green?
NO Verify that the operator-information panel cable is correctly seated at
both ends.
If you are working on a SAN Volume Controller 2145-CG8 or a SAN
Volume Controller 2145-CF8, and the node still fails to power on,
replace parts in the following sequence:
a. Operator-information panel assembly
b. System board
If you are working on a SAN Volume Controller 2145-8G4, verify that
the operator-information panel cable is correctly seated on the system
board. If the node still fails to power on, replace parts in the following
sequence:
a. Operator-information panel assembly
b. System board
If the SAN Volume Controller 2145-8F4 or SAN Volume Controller
2145-8F2 node still fails to power on, replace parts in the following
sequence:
a. Operator-information panel
b. Cable, signal, front panel
c. Frame assembly
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES The power-on indicator on the operator-information panel shows that
the node has successfully powered on. Continue with “MAP 5700:
Repair verification” on page 278 to verify the correct operation.
6. (from step 4 on page 239)
Is the rear panel power LED on or flashing? Figure 78 on page 241 shows the
location of the power LED 1 on the rear panel of the 2145-8G4, 2145-8F4, or
2145-8F2 nodes. Figure 79 on page 241 shows the location of the power LED
1 on the 2145-CF8 or the 2145-CG8.

Figure 78. Power LED on the SAN Volume Controller models 2145-8G4, 2145-8F4, and
2145-8F2 rear panel

Figure 79. Power LED indicator on the rear panel of the SAN Volume Controller 2145-CG8 or
2145-CF8

NO Go to step 7 on page 242.


YES The operator-information panel is failing.
Verify that the operator-information panel cable is seated on the
system board.
If you are working on a SAN Volume Controller 2145-CG8 or a SAN
Volume Controller 2145-CF8, and the node still fails to power on,
replace parts in the following sequence:
a. Operator-information panel assembly
b. System board
If you are working on a SAN Volume Controller 2145-8G4, verify that
the operator-information panel cable is correctly seated on the system
board. If the SAN Volume Controller 2145-8G4 still fails to power on,
replace parts in the following sequence:
a. Operator-information panel assembly

b. System board
If you are working on a SAN Volume Controller 2145-8F4 or SAN
Volume Controller 2145-8F2, verify that the operator-information panel
cable is correctly seated at both ends. If the cable is correctly seated
and the operator-information panel power light is still not on or
blinking, replace the parts in the following sequence:
a. Operator-information panel
b. Cable, signal, front panel
c. Frame assembly
7. (from step 6 on page 240)
Locate the 2145 UPS-1U that is connected to this node.
Does the 2145 UPS-1U that is powering this node have its power on and is
its load segment 2 indicator a solid green?
NO Go to “MAP 5150: 2145 UPS-1U” on page 248.
YES Go to step 8.
8. (from step 7)
Are the ac LED indicators on the rear of the power supply assemblies
illuminated? Figure 80 shows the location of the ac LED 1 and the dc LED
2 on the rear of the power supply assembly that is on the rear panel of the
2145-8G4, 2145-8F4, or 2145-8F2 nodes. Figure 81 shows the location of the ac
LED 1 and the dc LED 2 on the rear of the power supply assembly that is
on the rear panel of the 2145-CF8 or the 2145-CG8.

Figure 80. SAN Volume Controller models 2145-8G4 and 2145-8F4 or 2145-8F2 ac and dc
LED indicators on the rear panel

Figure 81. Power LED indicator and ac and dc indicators on the rear panel of the SAN
Volume Controller 2145-CG8 or 2145-CF8

NO Verify that the input power cable or cables are securely connected at
both ends and show no sign of damage; otherwise, if the cable or
cables are faulty or damaged, replace them. If the node still fails to
power on, replace the specified parts based on the SAN Volume
Controller model type.

Replace the SAN Volume Controller 2145-CG8 parts or the SAN
Volume Controller 2145-CF8 parts in the following sequence:
a. Power supply 675W
Replace the SAN Volume Controller 2145-8G4 parts in the following
sequence:
a. Power supply 670W
b. Power backplane
Replace the SAN Volume Controller 2145-8F4 or SAN Volume
Controller 2145-8F2 parts in the following sequence:
a. Power supply, 585W
b. Power backplane
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES Go to step 9 for 2145-CG8 and 2145-CF8 models.
Go to step 10 for all other models.
9. (from step 8 on page 242)
Is the power supply error LED on the rear of the SAN Volume Controller
2145-CG8 or SAN Volume Controller 2145-CF8 power supply assemblies
illuminated? Figure 79 on page 241 shows the location of the power LED 1
on the 2145-CF8 or the 2145-CG8.
YES Replace the power supply unit.
NO Go to step 10.
10. (from step 8 on page 242 or step 9)
Are the dc LED indicators on the rear of the power supply assemblies
illuminated?
NO Replace the SAN Volume Controller 2145-CG8 parts or the SAN
Volume Controller 2145-CF8 parts in the following sequence:
a. Power supply 675W
b. System board
Replace the SAN Volume Controller 2145-8G4 parts in the following
sequence:
a. Power backplane
b. Power supply 670W
c. System board
Replace the SAN Volume Controller 2145-8F4 or SAN Volume
Controller 2145-8F2 parts in the following sequence:
a. Power backplane
b. Power supply, 585W
c. Frame assembly
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES Verify that the operator-information panel cable is correctly seated at
both ends. If the node still fails to power on, replace parts in the
following sequence:
a. Operator-information panel

b. Cable, signal, front panel
c. System board (if the node is a SAN Volume Controller 2145-CG8,
SAN Volume Controller 2145-CF8, or a SAN Volume Controller
2145-8G4)
d. Frame assembly (if the node is a SAN Volume Controller 2145-8F4
or SAN Volume Controller 2145-8F2)
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
11. (from step 1 on page 238)
The node will not power off immediately when the power button is pressed.
When the node is fully booted, power-off is performed under the control of
the SAN Volume Controller software. The power-off operation can take up to
five minutes to complete.
Is Powering Off displayed on the front panel?
NO Go to step 12.
YES Wait for the node to power off. If the node fails to power off after 5
minutes, go to step 12.
12. (from step 11)
Attention: Turning off the node by any means other than using the
management GUI might cause a loss of data in the node cache. If you are
performing concurrent maintenance, this node must be deleted from the
system before you proceed. Ask the customer to delete the node from the
system now. If they are unable to delete the node, call your support center for
assistance before you proceed.
The node cannot be turned off either because of a software fault or a
hardware failure. Press and hold the power button. The node should turn off
within five seconds.
Did the node turn off?
NO Turn off the 2145 UPS-1U that is connected to this node.

Attention: Be sure that you are turning off the correct 2145 UPS-1U.
If necessary, trace the cables back to the 2145 UPS-1U assembly.
Turning off the wrong 2145 UPS-1U might cause customer data loss.
Go to step 13.
YES Go to step 13.
13. (from step 12)
If necessary, turn on the 2145 UPS-1U that is connected to this node and then
press the power button to turn the node on.
Did the node turn on and boot correctly?
NO Go to “MAP 5000: Start” on page 231 to resolve the problem.
YES Go to step 14.
14. (from step 13)
The node has probably suffered a software failure. Dump data might have
been captured that will help resolve the problem. Call your support center for
assistance.
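
Steps 2 to 4 of this MAP branch on the state of the front-panel power LED. The following is a minimal sketch of that branching, assuming the observer has already classified the LED into one of four labels. The function name `route_power_led`, the state labels, and the `minutes_fast_flashing` parameter are hypothetical names chosen for illustration.

```python
def route_power_led(led_state, minutes_fast_flashing=0.0):
    """Map the front-panel power LED state to the next action (steps 2 to 4).

    led_state is one of:
      "solid"       continuously illuminated
      "fast_flash"  about four flashes per second
      "slow_flash"  about one flash per second
      "other"       none of the above (for example, the LED is off)
    """
    if led_state == "solid":
        return "Node is powered on; return to MAP 5000 or go to MAP 5700"
    if led_state == "fast_flash":
        # The node is off and not yet ready to be turned on.
        if minutes_fast_flashing > 3:
            return ("Remove input power for one minute, reconnect, "
                    "and wait for the slow flash")
        return "Wait for the slow flash, then go to step 5"
    if led_state == "slow_flash":
        return "Standby; go to step 5 and press the power-on button"
    if led_state == "other":
        return "Go to step 6 and check the rear-panel power LED"
    raise ValueError(f"unknown LED state: {led_state!r}")
```

The three-minute limit comes from step 3. The sketch does not try to measure flash rates, because the procedure gives only approximate rates.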

MAP 5060: Power 2145-8A4
MAP 5060: Power 2145-8A4 helps you to solve power problems that have occurred
on the SAN Volume Controller 2145-8A4 node. If you are using any other SAN
Volume Controller model, see the Power MAP for that SAN Volume Controller
model.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

You might have been sent here for one of the following reasons:
• A problem occurred during the installation of a 2145-8A4 node.
• The power switch failed to turn the node on.
• The power switch failed to turn the node off.
• Another MAP sent you here.

About this task

Perform the following steps:

Procedure
1. Are you here because the node is not turned on?
NO Go to step 9 on page 248.
YES Go to step 2.
2. (from step 1)
Is the power LED on the operator-information panel continuously
illuminated? Figure 82 shows the location of the power LED 1 on the
operator-information panel.

Figure 82. Power LED on the SAN Volume Controller 2145-8A4 operator-information panel

NO Go to step 3.
YES The node turned on correctly. Reassess the symptoms and return to
“MAP 5000: Start” on page 231 or go to “MAP 5700: Repair
verification” on page 278 to verify the correct operation.
3. (from step 2)
Is the power LED on the operator-information panel flashing?
NO Go to step 5 on page 246.
YES The node is in standby mode. Input power is present. Go to step 4.
4. (from step 3)

Press the power-on button on the operator-information panel of the node.
Is the Power LED on the operator-information panel illuminated a solid
green?
NO Verify that the operator-information panel cable is correctly seated at
both ends. If the node still fails to turn on, replace parts in the
following sequence:
a. Operator-information panel
b. Operator-information panel cable
c. System board
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES The power-on indicator on the operator-information panel shows that
the node has successfully turned on. Continue with “MAP 5700:
Repair verification” on page 278 to verify the correct operation.
5. (from step 3 on page 245)
Locate the 2145 UPS-1U that is connected to this node.
Does the 2145 UPS-1U that is powering this node have its power on and is
its load segment 2 indicator a solid green?
NO Go to “MAP 5150: 2145 UPS-1U” on page 248.
YES Verify that the input-power cable is securely connected at both ends
and shows no sign of damage; otherwise, if the cable is faulty or
damaged, replace it. If the node still fails to turn on, go to step 6. If
the node turns on, continue with “MAP 5700: Repair verification” on
page 278.
6. (from step 5)
Remove the node from the rack and remove the top cover. Reconnect the
power cable, which is still connected to the 2145 UPS-1U, to the node. Is the
standby power LED that is on the system board illuminated? Figure 83 on
page 247 shows where the diagnostics LEDs are located on the system board.
NO Go to step 7 on page 247.
YES Replace the SAN Volume Controller 2145-8A4 parts in the following
sequence:
a. Operator-information panel
b. Operator-information panel cable

Figure 83. SAN Volume Controller 2145-8A4 system board LEDs

1 Fan 1 error LED
2 Fan 2 error LED
3 Fan 3 error LED
4 DIMM 1 error LED
5 DIMM 2 error LED
6 DIMM 3 error LED
7 DIMM 4 error LED
8 PCI Express slot 2 error LED
9 PCI Express slot 1 error LED
10 Fan 4 error LED
11 Fan 5 error LED
12 Voltage regulator error LED
13 Standby power LED
14 Power good LED
15 Baseboard management controller heartbeat LED
16 SAS/SATA controller error LED
7. (from step 6 on page 246)
Is the voltage regulator LED that is on the system board illuminated?
NO Go to step 8.
YES Replace the system board.
8. (from step 7)
Replace the SAN Volume Controller 2145-8A4 parts in the following sequence:
a. Input-power cable (or the 2145 UPS-1U to SAN Volume Controller node
power cable)
b. Power supply
Are you now able to turn on the node?
NO Contact your IBM service representative for assistance.
YES The power-on indicator on the front panel shows that the node has
successfully turned on. Continue with “MAP 5700: Repair verification”
on page 278 to verify the correct operation.
9. (from step 1 on page 245)
The node does not turn off when the power button is pressed. When the node
is fully booted, power-off is performed under the control of the SAN Volume
Controller software. The power-off operation can take up to five minutes to
complete.
Is Powering Off displayed on the front panel?
NO Go to step 10.
YES Wait for the node to turn off. If the node fails to turn off after 5
minutes, go to step 10.
10. (from step 9)
Attention: Turning off the node by any means other than using the
management GUI might cause a loss of data in the node cache. If you are
performing concurrent maintenance, this node must be deleted from the
system before you proceed. Ask the customer to delete the node from the
system now. If they are unable to delete the node, contact your IBM service
representative for assistance before you proceed.
The node cannot be turned off either because of a software fault or a
hardware failure. Press and hold the power button. The node should turn off
within five seconds.
Did the node turn off?
NO Turn off the 2145 UPS-1U that is connected to this node.

Attention: Be sure that you are turning off the correct 2145 UPS-1U.
If necessary, trace the cables back to the 2145 UPS-1U assembly.
Turning off the wrong 2145 UPS-1U might cause customer data loss.
Go to step 11.
YES Go to step 11.
11. (from step 10)
If necessary, turn on the 2145 UPS-1U that is connected to this node and then
press the power button to turn on the node.
Did the node turn on and boot correctly?
NO Go to “MAP 5000: Start” on page 231 to resolve the problem.
YES Go to step 12.
12. (from step 11)
The node has probably suffered a software failure. Dump data might have
been captured that will help resolve the problem. Contact your IBM service
representative for assistance.
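
Steps 9 to 11 of this MAP escalate from a software-controlled power-off to a forced power-off. The following is a minimal sketch of that escalation order. The function `forced_power_off`, its parameters, and the action strings are hypothetical and only restate the procedure; they do not control any hardware.

```python
WAIT = "wait up to five minutes for the node to turn off"
HOLD = "press and hold the power button (node should turn off within five seconds)"
UPS_OFF = "turn off the 2145 UPS-1U connected to this node (confirm it is the correct one)"
RESTART = "turn on the 2145 UPS-1U if necessary, then press the power button"


def forced_power_off(powering_off_displayed, off_after_wait, off_after_hold):
    """Return the ordered actions for a node that does not turn off."""
    actions = []
    if powering_off_displayed:
        actions.append(WAIT)
        if off_after_wait:
            return actions  # software-controlled power-off completed
    # Attention from the procedure: this path might lose cached data, and the
    # node must first be deleted from the system during concurrent maintenance.
    actions.append(HOLD)
    if not off_after_hold:
        actions.append(UPS_OFF)
    actions.append(RESTART)
    return actions
```

Turning off the 2145 UPS-1U is the last resort in this order, which is why the procedure warns you to trace the cables before you do it.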

MAP 5150: 2145 UPS-1U
MAP 5150: 2145 UPS-1U helps you solve problems that have occurred in the 2145
UPS-1U systems that are used on a SAN Volume Controller.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

You may have been sent here for one of the following reasons:
• The system problem determination procedures sent you here.
• A problem occurred during the installation of a SAN Volume Controller.
• Another MAP sent you here.
• A customer observed a problem that was not detected by the system problem
determination procedures.

About this task

Figure 84 shows an illustration of the front of the panel for the 2145 UPS-1U.

Figure 84. 2145 UPS-1U front-panel assembly

1 Load segment 2 indicator
2 Load segment 1 indicator
3 Alarm
4 On-battery indicator
5 Overload indicator
6 Power-on indicator
7 On or off button
8 Test and alarm reset button

Table 56 shows how the status and error LEDs on the 2145 UPS-1U
front-panel assembly relate to the specified error conditions. It also lists the
uninterruptible power supply alert-buzzer behavior.
Table 56. 2145 UPS-1U error indicators
[1] Load2 | [2] Load1 | [3] Alarm | [4] Battery | [5] Overload | [6] Power-on | Buzzer | Error condition
Green (see Note 1) | | | | | Green | (see Note 3) | No errors; the 2145 UPS-1U was configured by the SAN Volume Controller
Green | Amber (see Note 2) | | | | Green | | No errors; the 2145 UPS-1U is not yet configured by the SAN Volume Controller
Green | Either on or off | | Amber | | Green | Beeps for two seconds and then stops | The ac power is over or under limit. The uninterruptible power supply has switched to battery mode.
 | | Flashing red | Flashing amber | Flashing red | Flashing green | Three beeps every ten seconds | Battery undervoltage
Green | Either on or off | Flashing red | | | Flashing green | Solid on | Battery overvoltage
 | | Flashing red | Flashing amber | | Flashing green | Solid on | Output wave is abnormal when the charger is open, on battery mode
 | | Flashing red | Flashing amber | | | Solid on | The ac-power output wave is under low limit or above high limit on battery mode
Green | Either on or off | | Amber | | | Beeps for four seconds and then stops | On battery (no ac power)
Green | Either on or off | | Flashing amber | | | Beeps for two seconds and then stops | Low battery (no ac power)
Green | Either on or off | | | Red | Green | Beeps for one second and then stops | Overload while on line
 | | | Amber | Red | | Beeps for one second and then stops | Overload while on battery
Either on or off | Either on or off | Flashing red | | | Green | Solid on | Fan failure
Either on or off | Either on or off | Flashing red | Amber | | | Solid on | Battery test fail
 | | Flashing red | | Red | | Solid on | Overload timeout
 | | Flashing red | Amber | | Green | Solid on | Over temperature
 | | Flashing red | Amber | Red | Green | | Output short circuit
Notes:
1. The green Load2 LED ([1]) indicates that power is being supplied to the right pair of ac-power outlets as seen
from the rear of the 2145 UPS-1U.
2. The amber Load1 LED ([2]) indicates that power is being supplied to the left pair of ac-power outlets as seen
from the rear of the 2145 UPS-1U. These outlets are not used by the SAN Volume Controller.
This LED might be illuminated during power-on sequences, but it is typically extinguished by the SAN Volume
Controller node that is attached to the 2145 UPS-1U.
3. A blank cell indicates that the light or buzzer is off.
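
The buzzer column of Table 56 can be used on its own to narrow the list of possible error conditions before you read the LEDs. The following is a minimal sketch, assuming the buzzer entries are as they read in Table 56. The names `CONDITIONS_BY_BUZZER` and `candidates` are hypothetical, and the key "off" stands for a blank buzzer cell.

```python
# Buzzer behavior -> error conditions, as read from Table 56.
CONDITIONS_BY_BUZZER = {
    "off": [
        "No errors (configured by the SAN Volume Controller)",
        "No errors (not yet configured by the SAN Volume Controller)",
        "Output short circuit",
    ],
    "beeps for two seconds and then stops": [
        "The ac power is over or under limit",
        "Low battery (no ac power)",
    ],
    "three beeps every ten seconds": ["Battery undervoltage"],
    "beeps for four seconds and then stops": ["On battery (no ac power)"],
    "beeps for one second and then stops": [
        "Overload while on line",
        "Overload while on battery",
    ],
    "solid on": [
        "Battery overvoltage",
        "Output wave is abnormal when the charger is open, on battery mode",
        "The ac-power output wave is under low limit or above high limit "
        "on battery mode",
        "Fan failure",
        "Battery test fail",
        "Overload timeout",
        "Over temperature",
    ],
}


def candidates(buzzer):
    """Return the error conditions that match a buzzer behavior."""
    return CONDITIONS_BY_BUZZER.get(buzzer.strip().lower(), [])
```

Several conditions share a buzzer pattern, so the result is a shortlist only; the LED columns of Table 56 are still needed to identify a single condition.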

Procedure
1. Is the power-on indicator for the 2145 UPS-1U that is connected to the failing
SAN Volume Controller off?
NO Go to step 3.
YES Go to step 2.
2. (from step 1)
Are other 2145 UPS-1U units showing the power-on indicator as off?
NO The 2145 UPS-1U might be in standby mode. This can be because the
on or off button on this 2145 UPS-1U was pressed, input power has
been missing for more than five minutes, or because the SAN Volume
Controller shut it down following a reported loss of input power. Press
and hold the on or off button until the 2145 UPS-1U power-on indicator
is illuminated (approximately five seconds). On some versions of the
2145 UPS-1U, you need a pointed device, such as a screwdriver, to
press the on or off button.
Go to step 3.
YES Either main power is missing from the installation or a redundant
ac-power switch has failed. If the 2145 UPS-1U units are connected to a
redundant ac-power switch, go to “MAP 5320: Redundant ac power”
on page 255. Otherwise, complete these steps:
a. Restore main power to the installation.
b. Verify the repair by continuing with “MAP 5250: 2145 UPS-1U
repair verification” on page 254.
3. (from step 1 and step 2)
Are the power-on and load segment 2 indicators for the 2145 UPS-1U
illuminated solid green, with service, on-battery, and overload indicators off?
NO Go to step 4.
YES The 2145 UPS-1U is no longer showing a fault. Verify the repair by
continuing with “MAP 5250: 2145 UPS-1U repair verification” on page
254.
4. (from step 3)
Is the 2145 UPS-1U on-battery indicator illuminated yellow (solid or
flashing), with service and overload indicators off?
NO Go to step 5 on page 252.
YES The input power supply to this 2145 UPS-1U is not working or is not
correctly connected, or the 2145 UPS-1U is receiving input power that
might be unstable or outside the specified voltage or frequency range.
(The voltage should be between 200V and 240V and the frequency
should be either 50 Hz or 60 Hz.) The SAN Volume Controller
automatically adjusts the 2145 UPS-1U voltage range. If the input
voltage has recently changed, the alarm condition might be present
until the SAN Volume Controller has adjusted the alarm setting. Power
on the SAN Volume Controller that is connected to the 2145 UPS-1U. If
the SAN Volume Controller starts, the on-battery indicator should go off
within five minutes. If the SAN Volume Controller powers off again or
if the condition persists for at least five minutes, do the following:
a. Check the input circuit protector on the 2145 UPS-1U rear panel,
and press it, if it is open.
b. If redundant ac power is used for the 2145 UPS-1U, check the
voltage and frequency at the redundant ac-power switch output
receptacle connected to this 2145 UPS-1U. If there is no power, go
to “MAP 5340: Redundant ac power verification” on page 256. If the
power is not within specification, ask the customer to resolve the
issue. If redundant ac power is not used for this uninterruptible
power supply, check the site power outlet for the 2145 UPS-1U
providing power to this SAN Volume Controller. Check the
connection, voltage, and frequency. If the power is not within
specification, ask the customer to resolve the issue.
c. If the input power is within specification and the input circuit
protector is stable, replace the field-replaceable units (FRUs) in the
following sequence:
1) 2145 UPS-1U power cord
2) 2145 UPS-1U
d. Verify the repair by continuing with “MAP 5250: 2145 UPS-1U
repair verification” on page 254.
5. (from step 4 on page 251)
Is the 2145 UPS-1U overload indicator illuminated solid red?
NO Go to step 6 on page 253.
YES The 2145 UPS-1U output power requirement has exceeded the 2145
UPS-1U capacity.
a. Check that only one SAN Volume Controller node is connected to
the 2145 UPS-1U.
b. Check that no other loads are connected to the 2145 UPS-1U.
c. After ensuring that the output loading is correct, turn off the 2145
UPS-1U by pressing the on or off button until the power-on
indicator goes off. Then unplug the input power from the 2145
UPS-1U. Wait at least five seconds until all LEDs are off and restart
the 2145 UPS-1U by reconnecting it to input power and pressing the
on or off button until the 2145 UPS-1U power-on indicator is
illuminated (approximately five seconds). On some versions of the
2145 UPS-1U, you need a pointed device, such as a screwdriver, to
press the on or off button.
d. If the condition persists, replace the 2145 UPS-1U.

Note: If the condition recurs, replace the power supply or power
supplies in the node.
e. Verify the repair by continuing with “MAP 5250: 2145 UPS-1U
repair verification” on page 254.
6. (from step 5 on page 252)
Is the 2145 UPS-1U service indicator illuminated flashing red and the
on-battery indicator illuminated solid yellow, with the power-on and
overload indicators off?
NO Go to step 7.
YES The 2145 UPS-1U battery might be fully discharged or faulty.
a. Check that the 2145 UPS-1U has been connected to a power outlet
for at least two hours to charge the battery. After charging the
battery, press and hold the test or alarm reset button for three
seconds; and then check the service indicator.
b. If the service indicator is still flashing, replace the 2145 UPS-1U.
c. Verify the repair by continuing with “MAP 5250: 2145 UPS-1U
repair verification” on page 254.
7. (from step 6)
Is the 2145 UPS-1U service indicator illuminated flashing red, the on-battery
indicator illuminated solid yellow, and the power-on illuminated solid green,
with the overload indicator off?
NO Go to step 8.
YES The 2145 UPS-1U internal temperature is too high.
a. Turn off the 2145 UPS-1U by pressing the on or off button until the
power-on indicator goes off. Then unplug the 2145 UPS-1U. Clear
vents at the front and rear of the 2145 UPS-1U. Remove any heat
sources. Ensure the airflow around the 2145 UPS-1U is not
restricted.
b. Wait at least five minutes and restart the 2145 UPS-1U by
reconnecting to input power and pressing the on or off button until
the 2145 UPS-1U power-on indicator is illuminated (approximately
five seconds).
c. If the condition persists, replace the 2145 UPS-1U.
d. Verify the repair by continuing with “MAP 5250: 2145 UPS-1U
repair verification” on page 254.
8. (from step 7)
Are the 2145 UPS-1U service, on-battery, overload, and power-on indicators
illuminated and flashing?
NO The 2145 UPS-1U has an internal fault.
a. Replace the 2145 UPS-1U.
b. Verify the repair by continuing with “MAP 5250: 2145 UPS-1U
repair verification” on page 254.
YES The 2145 UPS-1U battery might be fully discharged or faulty.
a. Check that the 2145 UPS-1U has been connected to a power outlet
for at least two hours to charge the battery. After charging the
battery, press and hold the test or alarm reset button for three
seconds and then check the service indicator.
b. If the service indicator is still flashing, replace the 2145 UPS-1U.
c. Verify the repair by continuing with “MAP 5250: 2145 UPS-1U
repair verification” on page 254.
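
Steps 3 to 8 of this MAP test the 2145 UPS-1U indicators in a fixed order, and the first matching pattern decides the diagnosis. The following is a minimal sketch of that order. The `UpsIndicators` class, the `diagnose` function, and the state labels "off", "solid", and "flashing" are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class UpsIndicators:
    power_on: str = "off"
    load2: str = "off"
    service: str = "off"
    on_battery: str = "off"
    overload: str = "off"


def diagnose(u):
    """Apply the checks of steps 3 to 8 in order and return the diagnosis."""
    # Step 3: normal operation.
    if (u.power_on == "solid" and u.load2 == "solid" and u.service == "off"
            and u.on_battery == "off" and u.overload == "off"):
        return "no fault"
    # Step 4: on-battery lit (solid or flashing), service and overload off.
    if u.on_battery != "off" and u.service == "off" and u.overload == "off":
        return "input power problem"
    # Step 5: overload solid red.
    if u.overload == "solid":
        return "output overload"
    # Steps 6 and 7 differ only in the power-on indicator.
    if (u.service == "flashing" and u.on_battery == "solid"
            and u.overload == "off"):
        if u.power_on == "off":
            return "battery discharged or faulty"
        if u.power_on == "solid":
            return "over temperature"
    # Step 8: all four indicators flashing.
    flashing = (u.service, u.on_battery, u.overload, u.power_on)
    if all(state == "flashing" for state in flashing):
        return "battery discharged or faulty"
    return "internal fault"
```

For example, `diagnose(UpsIndicators(power_on="solid", load2="solid"))` returns `"no fault"`. The sketch starts at step 3, so it assumes that steps 1 and 2 have already dealt with a 2145 UPS-1U whose power-on indicator is off.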

MAP 5250: 2145 UPS-1U repair verification
MAP 5250: 2145 UPS-1U repair verification helps you to verify that field
replaceable units (FRUs) that you have exchanged for new FRUs, or repair actions
that were done, have solved all the problems on the SAN Volume Controller 2145
UPS-1U.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

You may have been sent here because you have performed a repair and want to
confirm that no other problems exist on the machine.

About this task

Perform the following steps:

Procedure
1. Are the power-on and load segment 2 indicators for the repaired 2145
UPS-1U illuminated solid green, with service, on-battery, and overload
indicators off?
NO Continue with “MAP 5000: Start” on page 231.
YES Go to step 2.
2. (from step 1)
Is the SAN Volume Controller node powered by this 2145 UPS-1U powered
on?
NO Press power-on on the SAN Volume Controller node that is connected
to this 2145 UPS-1U and is powered off. Go to step 3.
YES Go to step 3.
3. (from step 2)
Is the node that is connected to this 2145 UPS-1U still not powered on or
showing error codes in the front panel display?
NO Go to step 4.
YES Continue with “MAP 5000: Start” on page 231.
4. (from step 3)
Does the SAN Volume Controller node that is connected to this 2145 UPS-1U
show “Charging” on the front panel display?
NO Go to step 5.
YES Wait for the “Charging” display to finish (this might take up to two
hours). Go to step 5.
5. (from step 4)
Press and hold the test/alarm reset button on the repaired 2145 UPS-1U for
three seconds to initiate a self-test. During the test, individual indicators
illuminate as various parts of the 2145 UPS-1U are checked.
Does the 2145 UPS-1U service, on-battery, or overload indicator stay on?
NO 2145 UPS-1U repair verification has completed successfully. Continue
with “MAP 5700: Repair verification” on page 278.
YES Continue with “MAP 5000: Start” on page 231.
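
The branching in MAP 5250 can be summarized in a short Python sketch. It is only an illustration of the steps above, not an IBM tool: the function and parameter names are hypothetical, and each argument stands for an observation that the service engineer makes by hand.

```python
def map_5250_outcome(indicators_normal: bool,
                     node_running_without_errors: bool,
                     fault_indicator_stays_on: bool) -> str:
    """Return the MAP to continue with after 2145 UPS-1U repair verification.

    All parameter names are illustrative assumptions, not product terms.
    """
    # Step 1: power-on and load segment 2 must be solid green, with the
    # service, on-battery, and overload indicators off.
    if not indicators_normal:
        return "MAP 5000: Start"
    # Steps 2 and 3: after pressing power-on if needed, the node must be
    # powered on and must not show error codes on the front panel.
    if not node_running_without_errors:
        return "MAP 5000: Start"
    # Step 4 (waiting for "Charging" to finish) only delays the self-test,
    # so it is not modelled here.
    # Step 5: after the self-test, no fault indicator may stay on.
    if fault_indicator_stays_on:
        return "MAP 5000: Start"
    return "MAP 5700: Repair verification"
```

Only the case where every check passes leads to MAP 5700; any failed check returns to MAP 5000.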

MAP 5320: Redundant ac power


MAP 5320: Redundant ac power helps you solve problems that have occurred in
the redundant ac-power switches used on a SAN Volume Controller. Use this MAP
when a 2145 UPS-1U that is connected to a redundant ac-power switch does not
appear to have input power.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

You might have been sent here for one of the following reasons:
v A problem occurred during the installation of a SAN Volume Controller.
v “MAP 5150: 2145 UPS-1U” on page 248 sent you here.

About this task

Perform the following steps to solve problems that have occurred in the redundant
ac-power switches:

Procedure
1. One or two 2145 UPS-1Us might be connected to the redundant ac-power
switch. Is the power-on indicator on any of the connected 2145 UPS-1Us on?
NO Go to step 3.
YES The redundant ac-power switch is powered. Go to step 2.
2. (from step 1)
Measure the voltage at the redundant ac-power switch output socket connected
to the 2145 UPS-1U that is not showing power-on.
CAUTION:
Ensure that you do not remove the power cable of any powered
uninterruptible power supply units.
Is there power at the output socket?
NO One redundant ac-power switch output is working while the other is
not. Replace the redundant ac-power switch.

CAUTION:
You might need to power-off an operational node to replace the
redundant ac-power switch assembly. If this is the case, consult with
the customer to determine a suitable time to perform the
replacement. See “MAP 5350: Powering off a SAN Volume Controller
node” on page 258. After you replace the redundant ac-power switch,
continue with “MAP 5340: Redundant ac power verification” on page
256.
YES The redundant ac-power switch is working. There is a problem with
the 2145 UPS-1U power cord or the 2145 UPS-1U. Return to the
procedure that called this MAP and continue from where you were
within that procedure. It will help you analyze the problem with the
2145 UPS-1U power cord or the 2145 UPS-1U.
3. (from step 1)
None of the used redundant ac-power switch outputs appears to have power.
Are the two input power cables for the redundant ac-power switches
correctly connected to the redundant ac-power switch and to different mains
circuits?
NO Correctly connect the cables. Go to “MAP 5340: Redundant ac power
verification.”
YES Verify that there is mains power at both of the site's power
distribution units that are providing power to this redundant ac-power
switch. Go to step 4.
4. (from step 3 on page 255)
Is power available at one or more of the site's power distribution units that are
providing power to this redundant ac-power switch?
NO Have the customer fix the mains circuits. Return to the procedure that
called this MAP and continue from where you were within that
procedure.
YES The redundant ac-power switch should operate in this situation.
Replace the redundant ac-power switch assembly. After you replace the
redundant ac-power switch, continue with “MAP 5340: Redundant ac
power verification.”
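
The decision flow of MAP 5320 can be sketched in Python as follows. This is a hedged illustration of steps 1 to 4 only; the function name, parameter names, and returned strings are hypothetical, and every argument represents a manual observation or voltage measurement.

```python
def map_5320_next_action(any_ups_power_on: bool,
                         output_socket_has_power: bool = False,
                         input_cables_correct: bool = False,
                         any_site_pdu_has_power: bool = False) -> str:
    """Return the next action for MAP 5320 from manual observations.

    Parameter names are illustrative assumptions that map to the
    questions asked in steps 1 to 4 of the procedure.
    """
    if any_ups_power_on:
        # Step 2: the switch is powered, so measure the output socket that
        # feeds the 2145 UPS-1U that is not showing power-on.
        if output_socket_has_power:
            return "return to calling MAP: check UPS power cord or UPS"
        return "replace redundant ac-power switch, then MAP 5340"
    # Step 3: no used output appears to have power; check input cabling.
    if not input_cables_correct:
        return "correct input cabling, then MAP 5340"
    # Step 4: cabling is correct, so the result depends on site power.
    if any_site_pdu_has_power:
        return "replace redundant ac-power switch, then MAP 5340"
    return "customer fixes mains circuits, then return to calling MAP"
```

For example, `map_5320_next_action(False, input_cables_correct=True, any_site_pdu_has_power=True)` corresponds to the YES branch of step 4, where the switch should be operating and is therefore replaced.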

MAP 5340: Redundant ac power verification


MAP 5340: Redundant ac power verification helps you verify that a redundant
ac-power switch is functioning correctly.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

You might have been sent here because you have replaced a redundant ac-power
switch or corrected the cabling of a redundant ac-power switch. You can also use
this MAP if you think a redundant ac-power switch might not be working
correctly, because it is connected to nodes that have lost power when only one ac
power circuit lost power.

In this MAP, you will be asked to confirm that power is available at the redundant
ac-power switch output sockets 1 and 2. If the redundant ac-power switch is
connected to nodes that are not powered on, use a voltage meter to confirm that
power is available.

If the redundant ac-power switch is powering nodes that are powered on (so the
nodes are operational), take some precautions before continuing with these tests.
Although you do not have to power off the nodes to conduct the test, the nodes
will power off if the redundant ac-power switch is not functioning correctly.

About this task

For each of the powered-on nodes connected to this redundant ac-power switch,
perform the following steps:
1. Use the management GUI or the command-line interface (CLI) to confirm that
the other node in the same I/O group as this node is online.
2. Use the management GUI or the CLI to confirm that all virtual disks connected
to this I/O group are online.
3. Check the redundant ac-power switch output cables to confirm that the
redundant ac-power switch is not connected to two nodes in the same I/O
group.

If any of these tests fail, correct any failures before continuing with this MAP. If
you are performing the verification using powered-on nodes, understand that
power is no longer available if the following is true:
v The on-battery indicator on the 2145 UPS-1U that connects the redundant
ac-power switch to the node lights for more than five seconds.
v The SAN Volume Controller node display shows Power Failure.

When the instructions say “remove power,” you can switch the power off if the
site power distribution unit has outputs that are individually switched; otherwise,
remove the specified redundant ac-power switch power cable from the site power
distribution unit's outlet.

Perform the following steps:

Procedure
1. Are the two site power distribution units providing power to this redundant
ac-power switch connected to different power circuits?
NO Correct the problem and then return to this MAP.
YES Go to step 2.
2. (from step 1)
Are both of the site power distribution units providing power to this redundant
ac-power switch powered?
NO Correct the problem and then return to the start of this MAP.
YES Go to step 3.
3. (from step 2)
Are the two cables that are connecting the site power distribution units to the
redundant ac-power switch connected?
NO Correct the problem and then return to the start of this MAP.
YES Go to step 4.
4. (from step 3)
Is there power at the redundant ac-power switch output socket 2?
NO Go to step 8 on page 258.
YES Go to step 5.
5. (from step 4)
Is there power at the redundant ac-power switch output socket 1?
NO Go to step 8 on page 258.
YES Go to step 6.
6. (from step 5)
Remove power from the Main power cable to the redundant ac-power switch.
Is there power at the redundant ac-power switch output socket 1?
NO Go to step 8 on page 258.
YES Go to step 7.
7. (from step 6 on page 257)
Reconnect the Main power cable. Remove power from the Backup power cable
to the redundant ac-power switch. Is there power at the redundant ac-power
switch output socket 1?
NO Go to step 8.
YES Reconnect the Backup power cable. The redundant ac power
verification has been successfully completed. Continue with “MAP
5700: Repair verification” on page 278.
8. (from steps 4 on page 257, 5 on page 257, 6 on page 257, and 7)
The redundant ac-power switch has not functioned as expected. Replace the
redundant ac-power switch assembly. Return to the start of this MAP.
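
Steps 4 to 8 of this MAP amount to four power measurements taken under different input conditions. The following Python sketch shows that sequence. It assumes that steps 1 to 3 have already passed, and it models the engineer's measurement as a hypothetical callback `output_has_power(socket, main_connected, backup_connected)`; none of these names come from the product.

```python
from typing import Callable

# (output socket, Main cable powered, Backup cable powered) for steps 4 to 7.
CHECKS = [
    (2, True, True),    # step 4: output socket 2, both inputs powered
    (1, True, True),    # step 5: output socket 1, both inputs powered
    (1, False, True),   # step 6: Main power removed
    (1, True, False),   # step 7: Main reconnected, Backup power removed
]


def verify_redundant_switch(
        output_has_power: Callable[[int, bool, bool], bool]) -> str:
    """Run the measurement sequence of steps 4 to 7 and report the result."""
    for socket, main_connected, backup_connected in CHECKS:
        if not output_has_power(socket, main_connected, backup_connected):
            # Step 8: the switch has not functioned as expected.
            return "replace switch assembly, then restart MAP 5340"
    return "verified: reconnect Backup cable, continue with MAP 5700"


# A healthy switch supplies its outputs when either input has power.
healthy = lambda socket, main, backup: main or backup
# A faulty switch that never fails over to the Backup input.
no_failover = lambda socket, main, backup: main
```

With these two models, `verify_redundant_switch(healthy)` reports a successful verification, while `verify_redundant_switch(no_failover)` fails at step 6 and leads to step 8.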

Results

MAP 5350: Powering off a SAN Volume Controller node


MAP 5350: Powering off a SAN Volume Controller node helps you power off a
single node to perform a service action without disrupting the host's access to
disks.

Before you begin

Powering off a single node will not normally disrupt the operation of a clustered
system. This is because, within a SAN Volume Controller system, nodes operate in
pairs called an I/O group. An I/O group will continue to handle I/O to the disks
it manages with only a single node powered on. There will, however, be degraded
performance and reduced resilience to error.

Care must be taken when powering off a node to ensure the system is not
impacted more than it need be. If the procedures outlined here are not followed, it
is possible your application hosts will lose access to their data or, in the worst case,
data will be lost.

You can use the following preferred methods to power off a node that is a member
of a system and not offline:
1. Use the Shut Down a Node option on the management GUI
2. Use the CLI command stopcluster -node <name>.

It is preferable to use either the management GUI or the command-line interface
(CLI) to power off a node, as these methods provide a controlled handover to the
partner node and provide better resilience to other faults in the system.

If a node is offline or not a member of a system, it must be powered off using the
power button.

About this task

To provide the least disruption when powering off a node, the following should all
apply:
v The other node in the I/O group should be powered on and active in the
system.
v The other node in the I/O group should have SAN Fibre Channel connections to
all the hosts and disk controllers managed by the I/O group.
v All the volumes handled by this I/O group should be online.
v The host multipathing is online to the other node in the I/O group.

In some circumstances, the reason you are powering off the node might make
meeting these conditions impossible; for instance, if you are replacing a broken
Fibre Channel card, the volumes will not be showing an online status. You should
use your judgment to decide when it is safe to proceed when a condition has not
been met. Always check with the system administrator before proceeding with a
power off that you know will disrupt I/O access, as they might prefer to either
wait until a more suitable time or suspend the host applications.
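
The four conditions listed above can be kept as a simple checklist. The Python sketch below is a minimal illustration under the assumption that each condition has already been checked by hand or through the management GUI; the class, field, and function names are hypothetical. It reports which conditions are unmet so that the decision to proceed remains a matter of judgment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PowerOffChecks:
    """Manual observations about the I/O group (illustrative field names)."""
    partner_node_active: bool
    partner_has_fc_connections: bool
    all_volumes_online: bool
    host_multipathing_online: bool


DESCRIPTIONS = {
    "partner_node_active":
        "other node in the I/O group is powered on and active",
    "partner_has_fc_connections":
        "other node has Fibre Channel connections to all hosts and controllers",
    "all_volumes_online":
        "all volumes handled by the I/O group are online",
    "host_multipathing_online":
        "host multipathing is online to the other node",
}


def unmet_conditions(checks: PowerOffChecks) -> List[str]:
    """Return a description of every condition that is not met."""
    return [text for name, text in DESCRIPTIONS.items()
            if not getattr(checks, name)]
```

An empty list means that all four conditions hold. A non-empty list, for example when a failed Fibre Channel card keeps volumes from showing online, names what to discuss with the system administrator before proceeding.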

To ensure a smooth restart, a node must save the data structures it cannot recreate
to its local internal disk drive. The amount of data it saves to local disk can be
high, so this operation might take several minutes. Do not attempt to interrupt the
controlled power off.

Attention: The following actions do not allow the node to save data to its local
disk. Therefore, you should not power off a node using these methods:
v Removing the power cable between the node and the uninterruptible power
supply. Normally the uninterruptible power supply provides sufficient power to
allow the write to local disk in the event of a power failure, but obviously it is
unable to provide power in this case.
v Holding down the power button on the node. When the power button is pressed
and released, the node indicates this to the software and the node can write its
data to local disk before it powers off. If the power button is held down, the
hardware interprets this as an emergency power off and shuts down
immediately without giving you the opportunity to save the data to a local disk.
The emergency power off occurs approximately four seconds after the power
button is pressed and held down.
v Pressing the reset button on the light path diagnostics panel.

Using the management GUI to power off a system


This topic describes how to power off a system using the management GUI.

Before you begin

Perform the following steps to use the management GUI to power off a system:

Procedure
1. Sign on to the IBM System Storage Productivity Center as an administrator and
then launch the management GUI for the system that you are servicing.
2. Find the system that you are about to shut down.
If the nodes that you want to power off are shown as Offline, then the nodes
are not participating in the system. In these circumstances, you must use the
power button on the nodes to power off the nodes.
If the nodes that you want to power off are shown as Online, powering off the
nodes can cause the dependent volumes to go offline as well. Verify whether
the nodes have any dependent volumes.
3. Select the node and click Show Dependent Volumes.
4. Make sure that the status of each volume in the I/O group is Online. You
might need to view more than one page.
If any volumes are shown as degraded, only one node in the I/O group is processing
I/O requests for that volume. If that node is powered off, it impacts all the
hosts that are submitting I/O requests to the degraded volume.
If any volumes are degraded and you believe that this might be because the
partner node in the I/O group has been powered off recently, wait until a
refresh of the screen shows all the volumes online. All the volumes should be
online within 30 minutes of the partner node being powered off.

Note: If, after waiting 30 minutes, you have a degraded volume and all of the
associated nodes and MDisks are online, contact the IBM Support Center for
assistance.
Ensure that all volumes that are being used by hosts are online before you
continue.
5. If possible, check that all the hosts that access the volumes that are managed by
this I/O group are able to fail over to use paths that are provided by the other
node in the group.
Perform this check using the multipathing device driver software of the host
system. The commands to use differ, depending on the multipathing device
driver being used. If you are using the System Storage Multipath Subsystem
Device Driver (SDD), the command to query paths is datapath query device. It
can take some time for the multipathing device drivers to rediscover paths after
a node is powered on. If you are unable to check on the host that all paths to
both nodes in the I/O group are available, do not power off a node within 30
minutes of the partner node being powered on or you might lose access to the
volume.
6. If you have decided it is okay to continue and power off the nodes, select the
system that you want to power off, and then click Shut Down System.
7. Click OK. If you have selected a node that is the last remaining node that
provides access to a volume (for example, a node that contains solid-state
drives (SSDs) with unmirrored volumes), the Shutting Down a Node-Force panel
is displayed with a list of volumes that will go offline if this node is shut down.
8. Check that no host applications are accessing the volumes that will go offline;
only continue with the shut down if the loss of access to these volumes is
acceptable. To continue with shutting down the node, click Force Shutdown.
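
The timing rule in step 5 can be expressed as a small check. The Python sketch below is an illustration only: the function and parameter names are hypothetical, and the 30-minute window is the figure given in step 5 for the case where host paths cannot be verified.

```python
from datetime import datetime, timedelta

# Waiting period from step 5, used when host paths cannot be verified.
PATH_REDISCOVERY_WINDOW = timedelta(minutes=30)


def ok_to_power_off(partner_powered_on_at: datetime,
                    now: datetime,
                    host_paths_verified: bool) -> bool:
    """Apply the step 5 rule for powering off the second node of a pair.

    If the hosts' paths to both nodes were verified with the multipathing
    driver, the waiting period is not needed. Otherwise, wait until the
    partner node has been powered on for at least 30 minutes.
    """
    if host_paths_verified:
        return True
    return now - partner_powered_on_at >= PATH_REDISCOVERY_WINDOW
```

For example, if the partner node was powered on 10 minutes ago and host paths cannot be checked, the function returns `False`, which matches the instruction not to power off the node yet.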

What to do next

During the shut down, the node saves its data structures to its local disk and
destages all the write data held in cache to the SAN disks; this processing can take
several minutes.

At the end of this process, the system powers off.

Using the SAN Volume Controller CLI to power off a node


This topic describes how to power off a node using the CLI.

Procedure
1. Issue the lsnode CLI command to display a list of nodes in the system and
their properties. Find the node that you are about to shut down and write
down the name of the I/O group it belongs to. Confirm that the other node in
the I/O group is online.
lsnode -delim :

id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:config_node:UPS_unique_id
1:group1node1:10L3ASH:500507680100002C:online:0:io_grp0:yes:202378101C0D18D8
2:group1node2:10L3ANF:5005076801000009:online:0:io_grp0:no:202378101C0D1796
3:group2node1:10L3ASH:5005076801000001:online:1:io_grp1:no:202378101C0D18D8
4:group2node2:10L3ANF:50050768010000F4:online:1:io_grp1:no:202378101C0D1796
If the node that you want to power off is shown as Offline, the node is not
participating in the system and is not processing I/O requests. In these
circumstances, you must use the power button on the node to power off the
node.
If the node that you want to power off is shown as Online but the other node
in the I/O group is not online, powering off the node impacts all the hosts that
are submitting I/O requests to the volumes that are managed by the I/O
group. Ensure that the other node in the I/O group is online before you
continue.
2. Issue the lsdependentvdisks CLI command to list the volumes that are
dependent on the status of a specified node.
lsdependentvdisks group1node1

vdisk_id vdisk_name
0 vdisk0
1 vdisk1
If the node goes offline or is removed from the system, the dependent volumes
also go offline. Before taking a node offline or removing it from the system, you
can use the command to ensure that you do not lose access to any volumes.
3. If you have decided that it is okay to continue and that you can power off the
node, issue the stopcluster -node <name> CLI command to power off the
node. Ensure that you use the -node parameter, because you do not want to
power off the whole system:
stopcluster -node group1node1
Are you sure that you want to continue with the shut down? yes

Note: If there are dependent volumes and you want to shut down the node
anyway, add the -force parameter to the stopcluster command. The force
parameter forces continuation of the command even though any
node-dependent volumes will be taken offline. Use the force parameter with
caution; access to data on node-dependent volumes will be lost.
During the shut down, the node saves its data structures to its local disk and
destages all the write data held in the cache to the SAN disks; this process can
take several minutes.
At the end of this process, the node powers off.
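
The partner-node check in step 1 can be automated once the lsnode output has been captured as text. The following Python sketch parses the colon-delimited sample output from step 1 and reports whether the other node in the same I/O group is online. How the text is captured (for example, over an SSH session) is outside this sketch, and the function names are hypothetical.

```python
import csv
import io

# Sample output of "lsnode -delim :" as shown in step 1.
LSNODE_OUTPUT = """\
id:name:UPS_serial_number:WWNN:status:IO_group_id:IO_group_name:config_node:UPS_unique_id
1:group1node1:10L3ASH:500507680100002C:online:0:io_grp0:yes:202378101C0D18D8
2:group1node2:10L3ANF:5005076801000009:online:0:io_grp0:no:202378101C0D1796
3:group2node1:10L3ASH:5005076801000001:online:1:io_grp1:no:202378101C0D18D8
4:group2node2:10L3ANF:50050768010000F4:online:1:io_grp1:no:202378101C0D1796
"""


def parse_lsnode(text):
    """Parse colon-delimited lsnode output into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter=":"))


def partner_is_online(nodes, node_name):
    """Return True if another node in the same I/O group is online."""
    target = next(n for n in nodes if n["name"] == node_name)
    partners = [n for n in nodes
                if n["IO_group_id"] == target["IO_group_id"]
                and n["name"] != node_name]
    return any(n["status"].lower() == "online" for n in partners)


nodes = parse_lsnode(LSNODE_OUTPUT)
print(partner_is_online(nodes, "group1node1"))
```

For the sample output, the partner of group1node1 is group1node2, which is online. The check does not replace the lsdependentvdisks command in step 2, which reports volumes that depend on the node for other reasons.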

Using the SAN Volume Controller Power control button


Do not use the power control button to power off a node unless it is an emergency
or you have been directed to do so by another procedure.

Before you begin

With this method, you cannot check the system status from the front panel, so you
cannot tell if the power off is liable to cause excessive disruption to the system.
Instead, use the management GUI or the CLI commands, described in the previous
topics, to power off an active node.

About this task

If you must use this method, notice in Figure 85 that each model type has a power
control button 1 on the front.

[Illustration: front panels of the 2145-CG8, 2145-CF8, 2145-8A4, 2145-8G4,
2145-8F4, and 2145-8F2, each with the power control button labeled 1]

Figure 85. Power control button on the SAN Volume Controller models

When you have determined it is safe to do so, press and immediately release the
power button. The front panel display changes to display Powering Off, and a
progress bar is displayed.

The 2145-CG8 or the 2145-CF8 requires that you remove a power button cover
before you can press the power button. The 2145-8A4, the 2145-8G4, the 2145-8F4,
or 2145-8F2 might require you to use a pointed device to press the power button.

If you press the power button for too long, the node cannot write all the data to its
local disk. An extended service procedure is required to restart the node, which
involves deleting the node from the system and adding it back into the system.

Results

The node saves its data structures to disk while powering off. The power off
process can take up to five minutes.

When a node is powered off by using the power button (or because of a power
failure), the partner node in its I/O group immediately stops using its cache for
new write data and destages any write data already in its cache to the SAN
attached disks. The time taken by this destage depends on the speed and
utilization of the disk controllers; it should complete in less than 15 minutes, but it
could be longer, and it cannot complete if there is data waiting to be written to a
disk that is offline.

If a node powers off and restarts while its partner node continues to process I/O,
it might not be able to become an active member of the I/O group immediately. It
has to wait until the partner node completes its destage of the cache. If the partner
node is powered off during this period, access to the SAN storage that is managed
by this I/O group is lost. If one of the nodes in the I/O group is unable to service
any I/O, for example, because the partner node in the I/O group is still flushing
its write cache, the volumes that are managed by that I/O group will have a status
of Degraded.

MAP 5400: Front panel


MAP 5400: Front panel helps you to solve problems that have occurred on the
front panel.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

This MAP applies to all SAN Volume Controller models. Be sure that you know
which model you are using before you start this procedure. To determine which
model you are working with, look for the label that identifies the model type on
the front of the node.

You might have been sent here because:


v A problem occurred during the installation of a SAN Volume Controller system,
the front-panel display test failed, or the correct node number failed to be
displayed
v Another MAP sent you here

About this task

Perform the following steps:

Procedure
1. Is the power LED on the operator-information panel illuminated and
showing a solid green?
NO Continue with the power MAP. See “MAP 5050: Power 2145-CG8,
2145-CF8, 2145-8G4, 2145-8F4, and 2145-8F2” on page 238 or “MAP
5060: Power 2145-8A4” on page 245.
YES Go to step 2.
2. (from step 1)
Is the service controller error light 1 that you see in Figure 86 illuminated
and showing a solid amber?

Figure 86. SAN Volume Controller service controller error light

NO Start the front panel tests by pressing and holding the select button for
five seconds. Go to step 3 on page 264.

Attention: Do not start this test until the node has been powered on for
at least two minutes; otherwise, you might receive unexpected results.
YES The SAN Volume Controller service controller has failed.
v Replace the service controller.
v Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
3. (from step 2 on page 263)
The front-panel check light illuminates and the display test runs: all display
bits turn on for 3 seconds and then turn off for 3 seconds, then a vertical line
travels from left to right, followed by a horizontal line that travels from top
to bottom. The test completes with the switch test display of a single rectangle
in the center of the display.
Did the front-panel lights and display operate as described?
NO SAN Volume Controller front panel has failed its display test.
v Replace the service controller.
v Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES Go to step 4.
4. (from step 3)
Figure 87 provides four examples of what the front-panel display shows before
you press any button and then when you press the up button, the left and right
buttons, and the select button. To perform the front panel switch test, press any
button in any sequence or any combination. The display indicates which
buttons you pressed.

Figure 87. Front-panel display when push buttons are pressed

Check each switch in turn. Did the service panel switches and display operate
as described in Figure 87?
NO The SAN Volume Controller front panel has failed its switch test.
v Replace the service controller.
v Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES Press and hold the select button for five seconds to exit the test. Go to
step 5.
5. Is the front-panel display now showing Cluster:?
NO Continue with “MAP 5000: Start” on page 231.
YES Keep pressing and releasing the down button until Node is displayed in
line 1 of the menu screen. Go to step 6 on page 265.
6. (from step 5 on page 264)
Is this MAP being used as part of the installation of a new node?
NO Front-panel tests have completed with no fault found. Verify the repair
by continuing with “MAP 5700: Repair verification” on page 278.
YES Go to step 7.
7. (from step 6)
Is the node number that is displayed in line 2 of the menu screen the same
as the node number that is printed on the front panel of the node?
NO Node number stored in front-panel electronics is not the same as that
printed on the front panel.
v Replace the service controller.
v Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES Front-panel tests have completed with no fault found. Verify the repair
by continuing with “MAP 5700: Repair verification” on page 278.

MAP 5500: Ethernet


MAP 5500: Ethernet helps you solve problems that have occurred on the SAN
Volume Controller Ethernet.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

This MAP applies to all SAN Volume Controller models. Be sure that you know
which model you are using before you start this procedure. To determine which
model you are working with, look for the label that identifies the model type on
the front of the node.

If you encounter problems with the 10 Gbps Ethernet feature on the SAN Volume
Controller 2145-CG8, see “MAP 5550: 10G Ethernet and Fibre Channel over
Ethernet personality enabled Adapter port” on page 268.

You might have been sent here for one of the following reasons:
v A problem occurred during the installation of a SAN Volume Controller system
and the Ethernet checks failed
v Another MAP sent you here
v The customer needs immediate access to the system by using an alternate
configuration node. See “Defining an alternate configuration node” on page 268

About this task

Perform the following steps:

Procedure
1. Is the front panel of any node in the system displaying Node Error with
error code 805?
YES Go to step 6 on page 266.
NO Go to step 2 on page 266.
2. Is the system reporting error 1400 either on the front panel or in the event
log?
YES Go to step 4.
NO Go to step 3.
3. Are you experiencing Ethernet performance issues?
YES Go to step 9 on page 267.
NO Go to step 10 on page 267.
4. (from step 2) On all nodes perform the following actions:
a. Press the down button until the top line of the display shows Ethernet.
b. Press right until the top line displays Ethernet port 1.
c. If the second line of the display shows link offline, record this port as
one that requires fixing.
d. If the system is configured with two Ethernet cables per node, press the
right button until the top line of the display shows Ethernet port 2 and
repeat the previous step.
e. Go to step 5.
5. (from step 4) Are any Ethernet ports that have cables attached to them
reporting link offline?
YES Go to step 6.
NO Go to step 10 on page 267.
6. (from step 5) Do the SAN Volume Controller nodes have one or two cables
connected?
One Go to step 7.
Two Go to step 8 on page 267.
7. (from step 6) Perform the following actions:
a. Plug the Ethernet cable from that node into Ethernet port 2 of a
different node.
b. If the Ethernet link light is illuminated when the cable is plugged into
Ethernet port 2 of the other node, replace the system board of the original
node.

Figure 88. Port 2 Ethernet link LED on the SAN Volume Controller rear panel

1 SAN Volume Controller 2145-CG8 port 2 (upper right) Ethernet link LED
2 SAN Volume Controller 2145-CF8 port 2 (upper right) Ethernet link LED
3 SAN Volume Controller 2145-8F2 or SAN Volume Controller 2145-8F4 port 2
(lower right) Ethernet link LED
4 SAN Volume Controller 2145-8G4 port 2 (center) Ethernet link LED
5 SAN Volume Controller 2145-8A4 port 2 (upper right) Ethernet link LED
c. If the Ethernet link light does not illuminate, check the Ethernet switch or
hub port and cable to resolve the problem.
d. Verify the repair by continuing with “MAP 5700: Repair verification” on
page 278.
8. (from step 5 on page 266 or step 6 on page 266) Perform the following
actions:
a. Plug the Ethernet cable from that node into another device, for example,
the SSPC.
b. If the Ethernet link light is illuminated when the cable is plugged into the
other Ethernet device, replace the system board of the original node.
c. If the Ethernet link light does not illuminate, check the Ethernet
switch/hub port and cable to resolve the problem.
d. Verify the repair by continuing with “MAP 5700: Repair verification” on
page 278.
9. (from step 3 on page 266) Perform the following actions:
a. Check all Speed port 1 and Speed port 2 panels for the speed and duplex
settings. The format is: <Speed>/<Duplex>.
1) Press the down button until the top line of the display shows Ethernet.
2) Press the right button until the top line displays Speed 1.
3) If the second line of the display shows link offline, record this port
as one that requires fixing.
4) If the system is configured with two Ethernet cables per node, press
the right button until the top line of the display shows Speed 2 and
repeat the previous step.
b. Check that the SAN Volume Controller port has negotiated at the highest
speed available on the switch. All nodes have gigabit Ethernet network
ports.
c. If the Duplex setting is half, perform the following steps:
1) There is a known problem with gigabit Ethernet when one side of the
link is set to a fixed speed and duplex and the other side is set to
autonegotiate. The problem can cause the fixed side of the link to run
at full duplex and the negotiated side of the link to run at half duplex.
The duplex mismatch can cause significant Ethernet performance
degradation.
2) If the switch is set to full duplex, set the switch to autonegotiate to
prevent the problem described previously.
3) If the switch is set to half duplex, set it to autonegotiate to allow the
link to run at the higher bandwidth available on the full duplex link.
d. If none of the above are true, call your support center for assistance.
10. (from step 2 on page 266)
A previously reported fault with the Ethernet interface is no longer present. A
problem with the Ethernet might have been fixed, or there might be an
intermittent problem. Check with the customer to determine that the Ethernet
interface has not been intentionally disconnected. Also check that there is no
recent history of fixed Ethernet problems with other components of the
Ethernet network.
Is the Ethernet failure explained by the previous checks?
NO There might be an intermittent Ethernet error. Perform these steps in
the following sequence until the problem is resolved:
a. Use the Ethernet hub problem determination procedure to check
for and resolve an Ethernet network connection problem. If you
resolve a problem, continue with “MAP 5700: Repair verification”
on page 278.
b. Determine if similar Ethernet connection problems have occurred
recently on this node. If they have, replace the system board.
c. Verify the repair by continuing with “MAP 5700: Repair
verification” on page 278.
YES Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
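
The duplex check in step 9c of this MAP reduces to a small decision rule. The following Python sketch is illustrative only; the function name `duplex_action` and the string values are hypothetical and are not part of any SAN Volume Controller interface.

```python
def duplex_action(node_duplex: str, switch_setting: str) -> str:
    """Suggest the switch change for step 9c (hypothetical helper).

    node_duplex: duplex shown for the node port, "half" or "full".
    switch_setting: how the switch port is configured: "full", "half",
        or "auto" (autonegotiate).
    """
    if node_duplex != "half":
        # Step 9c applies only when the node reports half duplex.
        return "step 9c does not apply"
    if switch_setting in ("full", "half"):
        # A fixed switch setting, full or half, is changed to autonegotiate.
        return "set the switch port to autonegotiate"
    # The switch already autonegotiates, so none of the listed cases applies.
    return "call your support center"


print(duplex_action("half", "full"))
```

Both fixed switch settings lead to the same action because autonegotiation on both sides avoids the mismatch described in step 9c.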

Defining an alternate configuration node


A situation can arise where the customer needs immediate access to the system by
using an alternate configuration node.

About this task


If all Ethernet connections to the configuration node have failed, the system is
unable to report failure conditions, and the management GUI is unable to access
the system to perform administrative or service tasks. If this is the case and the
customer needs immediate access to the system, you can make the system use an
alternate configuration node.

If only one node is displaying Node Error 805 on the front panel, perform the
following steps:

Procedure
1. Press and release the power button on the node that is displaying Node Error
805.
2. When Powering off is displayed on the front panel display, press the power
button again.
3. Restarting is displayed.

Results

The system will select a new configuration node. The management GUI is able to
access the system again.

MAP 5550: 10G Ethernet and Fibre Channel over Ethernet personality
enabled adapter port

MAP 5550: 10G Ethernet helps you solve problems that have occurred on a SAN
Volume Controller 2145-CG8 with 10G Ethernet capability, and Fibre Channel over
Ethernet personality enabled.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

This MAP applies to the SAN Volume Controller 2145-CG8 model with the 10G
Ethernet feature installed. Be sure that you know which model you are using
before you start this procedure. To determine which model you are working with,
look for the label that identifies the model type on the front of the node. Check
that the 10G Ethernet adapter is installed and that an optical cable is attached to
each port. Figure 18 on page 20 shows the rear panel of the 2145-CG8 with the 10G
Ethernet ports.

If you experience a problem with error code 805, go to “MAP 5500: Ethernet” on
page 265.

If you experience a problem with error code 703 or 723, go to “Fibre Channel and
10G Ethernet link failures” on page 206.

You might have been sent here for one of the following reasons:
• A problem occurred during the installation of a SAN Volume Controller system
and the Ethernet checks failed
• Another MAP sent you here

About this task

Perform the following steps:

Procedure
1. Is node error 720 or 721 displayed on the front panel of the affected node or
is service error code 1072 shown in the event log?
YES Go to step 11 on page 271.
NO Go to step 2.
2. (from step 1) Perform the following actions from the front panel of the
affected node:
a. Press and release the up or down button until Ethernet is shown.
b. Press and release the left or right button until Ethernet port 3 is shown.
Was Ethernet port 3 found?
NO Go to step 11 on page 271.
YES Go to step 3.
3. (from step 2) Perform the following actions from the front panel of the
affected node:
a. Press and release the up or down button until Ethernet is shown.
b. Press and release the left or right button until Ethernet port 3 is shown.
c. Record whether the second line of the display shows Link offline, Link online,
or Not configured.
d. Press and release the left or right button until Ethernet port 4 is shown.
e. Record if the second line of the display shows Link offline, Link online,
or Not configured.
f. Go to step 4.
4. (from step 3) What was the state of the 10G Ethernet ports that were seen in
step 3?
Both ports show Link online
The 10G link is working now. Verify the repair by continuing with
“MAP 5700: Repair verification” on page 278.

One or more ports show Link offline
Go to step 5.
One or more ports show Not configured
For iSCSI port configuration, see the description of the cfgportip CLI
command in the SAN Volume Controller Information Center.
For Fibre Channel over Ethernet information, see the description of the
lsportfc CLI command in the SAN Volume Controller Information Center.
This command provides connection properties and status to help
determine whether the Fibre Channel over Ethernet port is part of a
correctly configured VLAN.
5. (from step 4 on page 269) Is the amber 10G Ethernet link LED off for the
offline port?
YES Go to step 6.
NO The physical link is operational. The problem might be with the
system configuration. See the configuration topics “iSCSI configuration
details” and “Fibre Channel over Ethernet configuration details” in the
SAN Volume Controller Information Center.
6. (from step 5) Perform the following actions:
a. Check that the 10G Ethernet ports are connected to a 10G Ethernet fabric.
b. Check that the 10G Ethernet fabric is configured.
c. Pull out the small form-factor pluggable (SFP) transceiver and plug it back
in.
d. Pull out the optical cable and plug it back in.
e. Clean contacts with a small blast of air, if available.
f. Go to step 7.
7. (from step 6) Did the amber link LED light?
YES The physical link is operational. Verify the repair by continuing with
“MAP 5700: Repair verification” on page 278.
NO Go to step 8.
8. (from step 7) Swap the 10G SFPs in port 3 and port 4, but keep the optical
cables connected to the same port.
Is the amber link LED on the other port off now?
YES Go to step 10.
NO Go to step 9.
9. (from step 8) Swap the 10G Ethernet optical cables in port 3 and port 4.
Observe how the amber link LED changes. Swap the cables back.
Did the amber link LED on the other port go off?
YES The fault followed the cable, so the problem is either in the optical
cable or in the Ethernet switch. Check the 10G Ethernet optical link and
fabric that is connected to the port that now has the amber LED off.
Check that the Ethernet switch shows that the port is operational. If it
does not, replace the optical cable. Verify the repair by continuing
with “MAP 5700: Repair verification” on page 278.
NO Go to step 11 on page 271.
10. (from step 8) Perform the following actions:
a. Replace the SFP that now has the amber link LED off.
b. Verify the repair by continuing with “MAP 5700: Repair verification” on
page 278.
11. (from steps 1 on page 269, 2 on page 269, and 9 on page 270) Have you
already removed and replaced the 10G Ethernet adapter?
YES Go to step 12.
NO Perform the following actions:
a. Remove and replace the 10G Ethernet adapter card.
b. Verify the repair by continuing with “MAP 5700: Repair
verification” on page 278.
12. (from step 11) Perform the following actions:
a. Replace the 10G Ethernet adapter card with a new one.
b. Verify the repair by continuing with “MAP 5700: Repair verification” on
page 278.
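
The branching in steps 4 and 8 through 11 of this MAP can be summarized in code. This is a minimal Python sketch under stated assumptions: the names `LinkState`, `triage_10g_ports`, and `isolate_by_swap` are hypothetical, and the choice to handle an offline port before an unconfigured one is an assumption, because the procedure does not rank the two cases.

```python
from enum import Enum


class LinkState(Enum):
    """Second-line values recorded from the front panel in step 3."""
    ONLINE = "Link online"
    OFFLINE = "Link offline"
    NOT_CONFIGURED = "Not configured"


def triage_10g_ports(port3: LinkState, port4: LinkState) -> str:
    """Step 4: choose the next action from the two recorded port states."""
    states = (port3, port4)
    if all(state is LinkState.ONLINE for state in states):
        return "verify the repair with MAP 5700"
    if LinkState.OFFLINE in states:
        # Assumption: an offline port is handled before an unconfigured one.
        return "go to step 5"
    return "review the port configuration (cfgportip or lsportfc)"


def isolate_by_swap(fault_follows_sfp: bool, fault_follows_cable: bool) -> str:
    """Steps 8 to 11: interpret the SFP swap first, then the cable swap."""
    if fault_follows_sfp:
        return "replace the SFP (step 10)"
    if fault_follows_cable:
        return "check the optical cable and the Ethernet switch port (step 9)"
    return "reseat or replace the 10G Ethernet adapter (step 11)"


print(triage_10g_ports(LinkState.ONLINE, LinkState.OFFLINE))
print(isolate_by_swap(fault_follows_sfp=False, fault_follows_cable=True))
```

The swap tests work by elimination: whichever component the dark link LED follows is the suspect, and if it follows neither the SFP nor the cable, the adapter remains.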

MAP 5600: Fibre Channel


MAP 5600: Fibre Channel helps you to solve problems that have occurred on the
SAN Volume Controller Fibre Channel ports.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

This MAP applies to all SAN Volume Controller models. Be sure that you know
which model you are using before you start this procedure. To determine which
model you are working with, look for the label that identifies the model type on
the front of the node.

You might have been sent here for one of the following reasons:
• A problem occurred during the installation of a SAN Volume Controller system
and the Fibre Channel checks failed
• Another MAP sent you here

About this task

Perform the following steps to solve problems caused by the Fibre Channel ports:

Procedure
1. Are you here to diagnose a problem on a SAN Volume Controller 2145-8F2?
NO Go to step 2.
YES Go to step 3.
2. Are you trying to resolve a Fibre Channel port speed problem?
NO Go to step 3.
YES Go to step 12 on page 277.
3. (from step 1 and step 2) Display Fibre Channel port 1 status on the SAN
Volume Controller front-panel display. For more information, see Chapter 6,
“Using the front panel of the SAN Volume Controller,” on page 97.

Is the front-panel display on the SAN Volume Controller showing Fibre
Channel port-1 active?
NO A Fibre Channel port is not working correctly. Check the port status
on the second line of the display.
• Inactive: The port is operational but cannot access the Fibre
Channel fabric. The Fibre Channel adapter is not configured
correctly, the Fibre Channel small form-factor pluggable (SFP)
transceiver has failed, the Fibre Channel cable has either failed or is
not installed, or the device at the other end of the cable has failed.
Make a note of port-1. Go to step 8 on page 274.
• Failed: The port is not operational because of a hardware failure.
Make a note of port-1. Go to step 10 on page 276.
• Not installed: This port is not installed. Make a note of port-1. Go
to step 11 on page 276.
YES Press and release the right button to display Fibre Channel port-2. Go
to step 4.
4. (from step 3 on page 271)
Is the front panel display on the SAN Volume Controller showing Fibre
Channel port-2 active?
NO A Fibre Channel port is not working correctly. Check the port status
on the second line of the display.
• Inactive: The port is operational but cannot access the Fibre
Channel fabric. The Fibre Channel adapter is not configured
correctly, the Fibre Channel small form-factor pluggable (SFP)
transceiver has failed, the Fibre Channel cable has either failed or is
not installed, or the device at the other end of the cable has failed.
Make a note of port-2. Go to step 8 on page 274.
• Failed: The port is not operational because of a hardware failure.
Make a note of port-2. Go to step 10 on page 276.
• Not installed: This port is not installed. Make a note of port-2. Go
to step 11 on page 276.
YES Press and release the right button to display Fibre Channel port-3. Go
to step 5.
5. (from step 4)
Is the front panel display on the SAN Volume Controller showing Fibre
Channel port-3 active?
NO A Fibre Channel port is not working correctly. Check the port status
on the second line of the display.
• Inactive: The port is operational but cannot access the Fibre
Channel fabric. The Fibre Channel adapter is not configured
correctly, the Fibre Channel small form-factor pluggable (SFP)
transceiver has failed, the Fibre Channel cable has either failed or is
not installed, or the device at the other end of the cable has failed.
Make a note of port-3. Go to step 8 on page 274.
• Failed: The port is not operational because of a hardware failure.
Make a note of port-3. Go to step 10 on page 276.
• Not installed: This port is not installed. Make a note of port-3. Go
to step 11 on page 276.
YES Press and release the right button to display Fibre Channel port-4. Go
to step 6 on page 273.

6. (from step 5 on page 272)
Is the front panel display on the SAN Volume Controller showing Fibre
Channel port-4 active?
NO A Fibre Channel port is not working correctly. Check the port status
on the second line of the display.
• Inactive: The port is operational but cannot access the Fibre
Channel fabric. The Fibre Channel adapter is not configured
correctly, the Fibre Channel small form-factor pluggable (SFP)
transceiver has failed, the Fibre Channel cable has either failed or is
not installed, or the device at the other end of the cable has failed.
Make a note of port-4. Go to step 8 on page 274.
• Failed: The port is not operational because of a hardware failure.
Make a note of port-4. Go to step 10 on page 276.
• Not installed: This port is not installed. Make a note of port-4. Go
to step 11 on page 276.
YES Go to step 7.
7. (from step 6)
A previously reported fault with a Fibre Channel port is no longer being
shown. A problem with the SAN Fibre Channel fabric might have been fixed
or there might be an intermittent problem.
Check with the customer to see if any Fibre Channel ports have been
disconnected or if any component of the SAN Fibre Channel fabric has failed
and has been fixed recently.
Is the Fibre Channel port failure explained by the previous checks?
NO There might be an intermittent Fibre Channel error.
a. Use the SAN problem determination procedure to check for and
resolve any Fibre Channel fabric connection problems. If you
resolve a problem, continue with “MAP 5700: Repair verification”
on page 278.
b. Check if similar Fibre Channel errors have occurred recently on the
same port on this SAN Volume Controller node. If they have,
replace the Fibre Channel cable, unless it has already been
replaced.
c. Replace the Fibre Channel SFP transceiver, unless it has already
been replaced.

Note: SAN Volume Controller nodes are supported with both
longwave SFP transceivers and shortwave SFP transceivers. You
must replace an SFP transceiver with the same type of SFP
transceiver. If the SFP transceiver to replace is a longwave SFP
transceiver, for example, you must provide a suitable replacement.
Removing the wrong SFP transceiver could result in loss of data
access. See the “Removing and replacing the Fibre Channel SFP
transceiver on a SAN Volume Controller node” documentation to
find out how to replace an SFP transceiver.
d. Replace the Fibre Channel adapter assembly shown in the
following table.

Node                                               Adapter assembly
SAN Volume Controller 2145-CG8 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-CF8 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8A4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8G4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8F4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8F2 port 1 or 2         Dual port Fibre Channel HBA - low profile
SAN Volume Controller 2145-8F2 port 3 or 4         Dual port Fibre Channel HBA - full height

e. Verify the repair by continuing with “MAP 5700: Repair
verification” on page 278.
YES Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
8. (from steps 3 on page 271, 4 on page 272, 5 on page 272, and 6 on page 273)
The noted port on the SAN Volume Controller is showing a status of inactive.
For certain models, this might occur when the Fibre Channel speed is not set
correctly.
Are you diagnosing a problem on a SAN Volume Controller 2145-8F2?
NO Go to step 9 on page 275.
YES All SAN Volume Controller ports on the SAN Volume Controller
2145-8F2 nodes must run at the same speed. This speed is set by one
of the system properties; therefore the system speed must be set to a
speed that all ports can use.
If the node or nodes are currently online in the system, change the
speed property to a speed that all SAN Volume Controller 2145-8F2
ports can use.

Attention: Changing the SAN Volume Controller speed setting causes an
I/O outage on the system. Ensure that all host operations are stopped
before performing these steps.
a. Press the down button until the top line of the display shows
Ethernet.
b. Press the right button until the top line displays Speed 1.
c. If the second line of the display shows link offline, record this
port as one that requires fixing.
d. If the system is configured with two Ethernet cables per node,
press the right button until the top line of the display shows Speed
2 and repeat the previous step.
e. Go to step 9 on page 275.
If the node is not currently online in the system, you might need to
set the speed of the node to a different speed setting before the node
can join the system. To temporarily set the speed of the node, perform
the following steps:

Note: After the node joins the system, the node's Fibre Channel port
speed will be changed to match the system setting. Check the setting
before changing the node.
a. Press and hold the down button.
b. Press and release the select button.
c. Release the down button.
The Fibre Channel speed setting is shown on the display. If this
value does not match the speed of the SAN, use the down and up
buttons to set it correctly.
d. Press the select button to accept any changes and return to the
Fibre Channel status display.
e. If the status shows active, continue with “MAP 5700: Repair
verification” on page 278. Otherwise, go to step 9.
9. (from step 8 on page 274)
The noted port on the SAN Volume Controller displays a status of inactive. If
the noted port still displays a status of inactive, replace the parts that are
associated with the noted port in the following order until the problem is
fixed:
a. Fibre Channel cables from the SAN Volume Controller to Fibre Channel
network.
b. Faulty Fibre Channel fabric connections, particularly the SFP transceiver at
the Fibre Channel switch. Use the SAN problem determination procedure
to resolve any Fibre Channel fabric connection problem.
c. SAN Volume Controller Fibre Channel SFP transceiver.

Note: SAN Volume Controller nodes are supported with both longwave
SFP transceivers and shortwave SFP transceivers. You must replace an SFP
transceiver with the same type of SFP transceiver. If the SFP transceiver to replace is a
longwave SFP transceiver, for example, you must provide a suitable
replacement. Removing the wrong SFP transceiver could result in loss of
data access. See the “Removing and replacing the Fibre Channel SFP
transceiver on a SAN Volume Controller node” documentation to find out
how to replace an SFP transceiver.
d. Replace the Fibre Channel adapter assembly shown in the following table:

Node                                               Adapter assembly
SAN Volume Controller 2145-CG8 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-CF8 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8A4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8G4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8F4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8F2 port 1 or 2         Dual port Fibre Channel HBA - low profile
SAN Volume Controller 2145-8F2 port 3 or 4         Dual port Fibre Channel HBA - full height

e. Verify the repair by continuing with “MAP 5700: Repair verification” on
page 278.
10. (from steps 3 on page 271, 4 on page 272, 5 on page 272, and 6 on page 273)
The noted port on the SAN Volume Controller displays a status of failed.
Verify that the Fibre Channel cables that connect the SAN Volume Controller
nodes to the switches are securely connected. Replace the parts that are
associated with the noted port in the following order until the problem is
fixed:
a. Fibre Channel SFP transceiver.

Note: SAN Volume Controller nodes are supported with both longwave
SFP transceivers and shortwave SFP transceivers. You must replace an SFP
transceiver with the same type of SFP transceiver. If the SFP transceiver to
replace is a longwave SFP transceiver, for example, you must provide a
suitable replacement. Removing the wrong SFP transceiver could result in
loss of data access. See the “Removing and replacing the Fibre Channel
SFP transceiver on a SAN Volume Controller node” documentation to find
out how to replace an SFP transceiver.
b. Replace the Fibre Channel adapter assembly shown in the following table:

Node                                               Adapter assembly
SAN Volume Controller 2145-CG8 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-CF8 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8A4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8G4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8F4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8F2 port 1 or 2         Dual port Fibre Channel HBA - low profile
SAN Volume Controller 2145-8F2 port 3 or 4         Dual port Fibre Channel HBA - full height

c. Verify the repair by continuing with “MAP 5700: Repair verification” on
page 278.
11. (from steps 3 on page 271, 4 on page 272, 5 on page 272, and 6 on page 273)
The noted port on the SAN Volume Controller displays a status of not
installed. If you have just replaced the Fibre Channel adapter, make sure that
it is installed correctly. If you have replaced any other system board
components, make sure that the Fibre Channel adapter has not been disturbed.
Is the Fibre Channel adapter failure explained by the previous checks?
NO
a. Replace the Fibre Channel adapter assembly shown in the
following table:
Table 57. SAN Volume Controller Fibre Channel adapter assemblies
Node                                               Adapter assembly
SAN Volume Controller 2145-CG8 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-CF8 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8A4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8G4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8F4 port 1, 2, 3, or 4  4-port Fibre Channel HBA
SAN Volume Controller 2145-8F2 port 3 or 4         Dual port Fibre Channel HBA - full height
SAN Volume Controller 2145-8F2 port 1 or 2         Dual port Fibre Channel HBA - low profile

b. If the problem is not fixed, replace the Fibre Channel connection
hardware in the order that is shown in Table 58.
Table 58. SAN Volume Controller Fibre Channel adapter connection hardware
Node                                               Adapter connection hardware
SAN Volume Controller 2145-8A4 port 1, 2, 3, or 4  1. Riser card, PCI Express
                                                   2. System board
SAN Volume Controller 2145-8G4 port 1, 2, 3, or 4  1. Riser card, PCI Express
                                                   2. System board
SAN Volume Controller 2145-8F4 port 1, 2, 3, or 4  1. Riser card, PCI Express
                                                   2. Frame assembly
SAN Volume Controller 2145-8F2 port 1 or 2         1. Riser card, PCI low profile
                                                   2. Frame assembly
SAN Volume Controller 2145-8F2 port 3 or 4         1. Riser card, PCI
                                                   2. Frame assembly

c. Verify the repair by continuing with “MAP 5700: Repair
verification” on page 278.
YES Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
12. (from step 2 on page 271)
For the SAN Volume Controller models 2145-8A4, 2145-8G4, and 2145-8F4,
each Fibre Channel port autonegotiates its operating speed with the switch to
which it is connected. If the speed at which it is operating is lower than the
operating speed that is supported by the switch, this indicates that a high
number of link errors are being detected.
To display the current speed of the link, perform the following steps:
a. Press the up or down button on the front panel until FC Port-1 Status: is
displayed.
b. Press and release the select button.
c. Press the left or right button until FC Port-1 Speed: is displayed.
d. Press and release the select button.
e. Press the down button.
The second line of the front-panel display shows the current Fibre Channel
speed of the port.
Is the port operating at lower than the expected speed?
NO Repeat the check with the other Fibre Channel ports until the failing
port is located. If no failing port is located, the problem no longer
exists. Verify the repair by continuing with “MAP 5700: Repair
verification.”
YES Perform the following steps:
a. Check the routing of the Fibre Channel cable to ensure that no
damage exists and that the cable route contains no tight bends.
Any bend must have a radius of at least 3 inches. Reroute or
replace the Fibre Channel cable as needed.
b. Remove the Fibre Channel cable for 2 seconds and then reinsert it.
This will cause the Fibre Channel adapter to renegotiate its
operating speed.
c. Recheck the speed of the Fibre Channel port. If it is now correct,
you have resolved the problem. Otherwise, the problem might be
caused by one of the following:
• 4-port Fibre Channel HBA
• SAN Volume Controller SFP transceiver
• Fibre Channel switch gigabit interface converter (GBIC) or SFP
transceiver
• Fibre Channel switch
Recheck the speed after changing any component until the
problem is resolved and then verify the repair by continuing with
“MAP 5700: Repair verification.”
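
Steps 3 through 6 of this MAP walk the four Fibre Channel ports in order and branch on the first status that is not active. The following Python sketch restates that walk, assuming the statuses have already been read from the front panel; the names `NEXT_STEP` and `first_faulty_port` are hypothetical and exist only for illustration.

```python
from typing import Optional, Sequence, Tuple

# Next step of MAP 5600 for each status that is not "active".
NEXT_STEP = {"inactive": 8, "failed": 10, "not installed": 11}


def first_faulty_port(statuses: Sequence[str]) -> Tuple[Optional[int], int]:
    """Return (noted port, next step) for statuses of ports 1 to 4 in order.

    Returns (None, 7) when every port is active, which corresponds to
    step 7 (the previously reported fault is no longer shown).
    """
    for port, status in enumerate(statuses, start=1):
        key = status.strip().lower()
        if key == "active":
            continue
        if key not in NEXT_STEP:
            raise ValueError(f"unexpected status for port {port}: {status!r}")
        return port, NEXT_STEP[key]
    return None, 7


print(first_faulty_port(["Active", "Active", "Failed", "Active"]))
```

Only the first faulty port is reported because the MAP resolves one noted port at a time and then continues with repair verification.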

MAP 5700: Repair verification


MAP 5700: Repair verification helps you to verify that field-replaceable units
(FRUs) that you have exchanged for new FRUs, or repair actions that you have
performed, have solved all the problems on the SAN Volume Controller.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

You might have been sent here because you performed a repair and want to
confirm that no other problems exist on the machine.

About this task

Perform the following steps to verify your repair:

Procedure
1. Are the Power LEDs on all the nodes on? For more information about this
LED, see “Power LED” on page 18.
NO Go to “MAP 5000: Start” on page 231.
YES Go to step 2.
2. (from step 1)
Are all the nodes displaying Cluster: on the top line of the front panel
display with the second line blank or displaying a system name?
NO Go to “MAP 5000: Start” on page 231.
YES Go to step 3.
3. (from step 2 on page 278)
Using the SAN Volume Controller application for the system you have just
repaired, check the status of all configured managed disks (MDisks).
Do all MDisks have a status of online?
NO If any MDisks have a status of offline, repair the MDisks. Use the
problem determination procedure for the disk controller to repair the
MDisk faults before returning to this MAP.
If any MDisks have a status of degraded paths or degraded ports,
repair any storage area network (SAN) and MDisk faults before
returning to this MAP.
If any MDisks show a status of excluded, include the MDisks before
returning to this MAP.
Go to “MAP 5000: Start” on page 231.
YES Go to step 4.
4. (from step 3)
Using the SAN Volume Controller application for the system you have just
repaired, check the status of all configured volumes. Do all volumes have a
status of online?
NO Go to step 5.
YES Go to step 6.
5. (from step 4)
Following a repair of the SAN Volume Controller, a number of volumes are
showing a status of offline. Volumes will be held offline if SAN Volume
Controller cannot confirm the integrity of the data. The volumes might be the
target of a copy that did not complete, or cache write data that was not written
back to disk might have been lost. Determine why the volume is offline. If the
volume was the target of a copy that did not complete, you can start the copy
again. Otherwise, write data might not have been written to the disk, so its
state cannot be verified. Your site procedures will determine how data is
restored to a known state.
To bring the volume online, you must move all the offline disks to the recovery
I/O group and then move them back to an active I/O group.
Go to “MAP 5000: Start” on page 231.
6. (from step 4)
You have successfully repaired the SAN Volume Controller.
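
The checks in this MAP run in a fixed order, and the first one that fails decides where to go next. A minimal Python sketch of that order follows. It assumes that the LED, front panel, MDisk, and volume states have already been gathered by hand or by other means; the function name `verify_repair` and its parameters are hypothetical.

```python
from typing import Iterable


def verify_repair(
    power_leds_on: bool,
    cluster_displayed: bool,
    mdisk_statuses: Iterable[str],
    volume_statuses: Iterable[str],
) -> str:
    """Walk steps 1 to 4 of MAP 5700 and return the next action."""
    if not power_leds_on or not cluster_displayed:
        # Steps 1 and 2: any node without power or the Cluster: display.
        return "go to MAP 5000: Start"
    if any(status != "online" for status in mdisk_statuses):
        # Step 3: offline, degraded, or excluded MDisks are dealt with first.
        return "repair or include the MDisks, then go to MAP 5000: Start"
    if any(status != "online" for status in volume_statuses):
        # Step 4: offline volumes are investigated in step 5.
        return "go to step 5"
    return "repair verified"


print(verify_repair(True, True, ["online", "online"], ["online", "offline"]))
```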

MAP 5800: Light path


MAP 5800: Light path helps you to solve hardware problems on all the SAN
Volume Controller models that are preventing the node from booting.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

You might have been sent here because of the following:
• The Error LED on the operator-information panel is on or flashing
• Another MAP sent you here

Light path for SAN Volume Controller 2145-CG8


Use the diagnostics LEDs that are located on the system board to solve hardware
problems with the SAN Volume Controller 2145-CG8 node.

About this task

Ensure that the node is turned on, and then perform the following steps to resolve
any hardware errors that are indicated by the Error LED and light path LEDs:

Procedure
1. Is the Error LED, shown in Figure 89, on the SAN Volume Controller
2145-CG8 operator-information panel on or flashing?

Figure 89. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel

1 System error LED
2 Release latch
NO Reassess your symptoms and return to “MAP 5000: Start” on page 231.
YES Go to step 2.
2. (from step 1)
Press the release latch and open the light path diagnostics panel, which is
shown in Figure 90.
Are one or more LEDs on the light path diagnostics panel on or flashing?

Figure 90. SAN Volume Controller 2145-CG8 or 2145-CF8 light path diagnostics panel
(Panel labels: REMIND, OVERSPEC, LOG, LINK, PS, PCI, SP, FAN, TEMP, MEM, NMI,
CNFG, CPU, VRM, DASD, RAID, BRD, RESET)

NO Verify that the operator-information panel cable is correctly seated at
both ends. If the error LED is still illuminated but no LEDs are
illuminated on the light path diagnostics panel, replace parts in the
following sequence:
a. Operator-information panel
b. System board
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES See Table 60 on page 289 and perform the action specified for the
specific light path diagnostics LEDs. Then go to step 3 on page 292.
Some actions will require that you observe the state of LEDs on the
system board. Figure 91 on page 282 shows the location of the system
board LEDs. The fan LEDs are located adjacent to each fan. To view
the LEDs, perform the following actions:
a. Turn off the node while ensuring that its data is mirrored and
synchronized. See “MAP 5350: Powering off a SAN Volume
Controller node” on page 258 for more information.
b. (Optional) Identify and label all the cables that are attached to the
node so that they can be reconnected to the same ports. Remove the
node from the rack and place it on a flat, static-protective surface.
See the “Removing the node from a rack” information to find out
how to perform the procedure.
c. Remove the top cover.
d. See Table 60 on page 289 and perform the action specified for the
specific light path diagnostics LEDs. Then go to step 3 on page 292.

Figure 91. SAN Volume Controller 2145-CG8 system board LEDs diagnostics panel

1 Battery LED
2 IMM heartbeat LED
3 Enclosure management heartbeat LED
4 DIMM 10-18 error LEDs
5 Microprocessor 1 error LED
6 DIMM 1-9 error LEDs
7 Fan 1 error LED
8 Fan 2 error LED
9 Fan 3 error LED
10 Fan 4 error LED
11 Fan 5 error LED
12 Fan 6 error LED
13 SAS RAID riser-card missing LED
14 240 VA error LED
15 Power channel A error LED
16 Power channel B error LED
17 Power channel C error LED
18 Power channel D error LED
19 Power channel E error LED
20 AUX power channel error LED
21 System board error LED
22 Microprocessor 2 error LED
23 Riser 2 missing LED
Table 59. Diagnostics panel LED prescribed actions
Diagnostics
panel LED Action
OVER SPEC The power supplies are using more power than their maximum rating. If
the OVER SPEC LED is lit, one or more of the six 12V channel error LEDs
(A, B, C, D, E, or AUX) is also lit on the system board. Perform the
following actions to resolve the problem:
1. Turn off the node, pull the node forward in the rack, and remove the
cover. Do not disconnect power from the node.
2. Check which 12V channel error LED is lit on the system board, and
remove the components listed for that LED:
• LED A: fans, disk drive, any solid-state drives (SSDs), or disk
backplane
• LED B: Fibre Channel adapter and riser, all memory
• LED C: disk controller, all memory
• LED D: microprocessor
• LED E: High-speed SAS adapter and riser, if installed
• LED AUX: Fibre Channel adapter and high-speed SAS adapter, if
installed
3. Restart the node to see whether the problem remains.
4. Reinstall, one at a time, each device that you removed for the LED
problems. Start the node each time to isolate the failing device.
5. Replace any failing device.
6. If no device was isolated, and if LED C or LED D is lit, turn off the
node and remove the microprocessor. You need alcohol wipes and
thermal grease to replace the microprocessor. Power on the server by
toggling switch block 3 (SW3) bit 6. Restart the server. If the problem
has resolved, replace the microprocessor; otherwise, reinstall the
microprocessor. In either case, toggle switch block 3 (SW3) bit 6 back
to its original position.
7. If no device was isolated, and if LED AUX is lit, turn off the node and
remove the operator-information panel. Power on the server by
toggling switch block 3 (SW3) bit 6. Restart the server. If the problem
was resolved, replace the operator-information panel; otherwise,
reinstall the operator-information panel. In either case, toggle switch
block 3 (SW3) bit 6 back to its original position.
8. If no failing device is isolated, replace the system board.
LOG An error occurred. Connect a keyboard and a monitor. Check the IMM
system event log and the system event log for information about the error.
Replace any components that are identified in the event logs.
LINK This is not used on the SAN Volume Controller 2145-CG8. Replace the
system board.

PS Power supply 1 or power supply 2 has failed. Perform the following
actions to resolve the problem:
1. Check the power supply that has a lit amber LED.
2. Make sure that the power supplies are seated correctly.
3. Remove one of the power supplies to isolate the failed power supply.
4. Replace the failed power supply.
PCI An error occurred on a PCI bus or on the system board. An additional
LED is lit next to a failing PCI slot. Perform the following actions to
resolve the problem:
1. Identify the failing adapter by checking the LEDs on the PCI slots.
2. If PCI slot 1 shows an error, replace the 4-port Fibre Channel adapter
assembly.
3. If PCI slot 2 shows an error, replace the high-speed SAS adapter
assembly.
4. If the error is not resolved, replace the system board.
SP A service processor error was detected. Perform the following actions to
resolve the problem:
1. Remove power from the node. Reconnect the server to the power, and
restart the node.
2. If the problem remains, replace the system board.
FAN A fan has failed, is operating too slowly, or was removed. A failing fan
can also cause the TEMP LED to be lit. Perform the following actions to
resolve the problem:
1. Reseat the failing fan, which is indicated by a lit LED near the fan
connector on the system board.
2. If the problem remains, replace the failing fan.
TEMP The system temperature exceeded a threshold level. A failing fan can
cause the TEMP LED to be lit. Perform the following actions to resolve
the problem:
1. Make sure that the heat sink is seated correctly.
2. Determine whether a fan has failed. If it has, replace it.
3. Verify that the ambient temperature is within normal operating
specifications.
4. Make sure that airflow in and around the SAN Volume Controller
2145-CG8 is not obstructed.
MEM A memory configuration that is not valid or a memory error has occurred.
Both the MEM LED and CNFG LED might be lit. Perform the following
actions to resolve the problem:
1. Check that all the memory DIMMs are correctly installed.
2. If any memory error LEDs are lit, replace the indicated memory
module.
3. If the MEM LED and the CNFG LED are lit, adjust the memory so
that DIMM slots 2, 3, 5, 6, 7, and 8 are the only ones used.
NMI A non-maskable interrupt occurred or the NMI button was pressed. This
situation should not occur. If the NMI button on the light path diagnostic
panel was pressed by mistake, restart the node; otherwise, call your
support center.

CNFG A hardware configuration error occurred. If the MEM LED is also lit,
follow the actions shown for MEM LED. If the CPU LED is lit, check to
see if a microprocessor is installed in CPU 2. If one is installed, remove it
because the configuration is not supported. If no other light path LEDs
are lit, replace the FRUs in the order shown until the problem is resolved:
1. Operator-information panel
2. Operator-information panel cable
3. System board
CPU A microprocessor failed or a microprocessor configuration is not valid.
Both the CPU LED and the CNFG LED might be lit. Perform the
following actions:
1. Check the system board error LEDs.
2. If CPU 1 error LED is lit, check that the microprocessor is correctly
installed.
3. If the error persists, replace the microprocessor.
4. If the error persists, replace the system board.
VRM This is not used on the SAN Volume Controller 2145-CG8.
DASD A disk drive failed or is missing. A SAN Volume Controller 2145-CG8
must have its system hard disk drive installed in drive slot 4. Up to four
optional solid-state drives (SSDs) can be installed in drive slots 0 to 3.

If an SSD has been deliberately removed from a slot, the system error
LED and the DASD diagnostics panel LED will light. The error is
maintained even if the SSD is replaced in a different slot. If an SSD has
been removed or moved, the error is cleared by powering off the node
using MAP 5350, removing both the power cables, replacing the power
cables, and then restarting the node.

Resolve any node or system errors that relate to SSDs or the system disk
drive.

If an error is still shown, power off the node and reseat all the drives.

If the error remains, replace the following components in the order listed:
1. The system disk drive
2. The disk backplane
RAID This is not used on the SAN Volume Controller 2145-CG8.
BRD An error occurred on the system board. Perform the following actions to
resolve the problem:
1. Check the LEDs on the system board to identify the component that
caused the error. The BRD LED can be lit because of any of the
following reasons:
- Battery
- Missing PCI riser-card assembly. There must be a riser card in PCI
slot 2 even if the optional adapter is not present.
- Failed voltage regulator
2. Replace any failed or missing replacement components, such as the
battery or PCI riser-card assembly.
3. If a voltage regulator fails, replace the system board.

3. Continue with “MAP 5700: Repair verification” on page 278 to verify the
correct operation.
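
The OVER SPEC isolation in Table 59 amounts to a lookup from each lit 12V channel error LED to the components to remove before restarting the node. The following Python sketch illustrates that lookup only; the names `CHANNEL_COMPONENTS` and `components_to_remove` are hypothetical and are not part of any SAN Volume Controller tool, and the component lists are copied from the table.

```python
# Hypothetical helper: maps the 12V channel error LEDs named in Table 59 to
# the components that the OVER SPEC procedure says to remove for that LED.
CHANNEL_COMPONENTS = {
    "A": ["fans", "disk drive", "solid-state drives (SSDs)", "disk backplane"],
    "B": ["Fibre Channel adapter and riser", "all memory"],
    "C": ["disk controller", "all memory"],
    "D": ["microprocessor"],
    "E": ["high-speed SAS adapter and riser (if installed)"],
    "AUX": ["Fibre Channel adapter", "high-speed SAS adapter (if installed)"],
}


def components_to_remove(lit_channels):
    """Return the components to remove, without duplicates, in table order."""
    result = []
    for channel, parts in CHANNEL_COMPONENTS.items():  # keeps table order
        if channel in lit_channels:
            for part in parts:
                if part not in result:
                    result.append(part)
    return result


if __name__ == "__main__":
    # Example: channel LEDs B and C are both lit on the system board.
    for part in components_to_remove({"B", "C"}):
        print(part)
```

Because LED B and LED C both list memory, the sketch removes duplicates so that each component appears once in the result.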

Light path for SAN Volume Controller 2145-CF8


Use the diagnostics LEDs that are located on the system board to solve hardware
problems with the SAN Volume Controller 2145-CF8 node.

About this task

Ensure that the node is turned on, and then perform the following steps to resolve
any hardware errors that are indicated by the Error LED and light path LEDs:

Procedure
1. Is the Error LED, shown in Figure 92, on the SAN Volume Controller
2145-CF8 operator-information panel on or flashing?

Figure 92. SAN Volume Controller 2145-CG8 or 2145-CF8 operator-information panel

5 System error LED
6 Release latch
NO Reassess your symptoms and return to “MAP 5000: Start” on page 231.
YES Go to step 2.
2. (from step 1)
Press the release latch and open the light path diagnostics panel, which is
shown in Figure 93.
Are one or more LEDs on the light path diagnostics panel on or flashing?

Figure 93. SAN Volume Controller 2145-CG8 or 2145-CF8 light path diagnostics panel
(Panel labels: REMIND, OVER SPEC, LOG, LINK, PS, PCI, SP, FAN, TEMP, MEM, NMI,
CNFG, CPU, VRM, DASD, RAID, BRD, RESET)

NO Verify that the operator-information panel cable is correctly seated at
both ends. If the error LED is still illuminated but no LEDs are
illuminated on the light path diagnostics panel, replace parts in the
following sequence:
a. Operator-information panel
b. System board
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES See Table 60 on page 289 and perform the action specified for the
specific light path diagnostics LEDs. Then go to step 3 on page 292.
Some actions will require that you observe the state of LEDs on the
system board. Figure 94 on page 288 shows the location of the system
board LEDs. The fan LEDs are located adjacent to each fan. To view
the LEDs, perform the following actions:
a. Turn off the node while ensuring that its data is mirrored and
synchronized. See “MAP 5350: Powering off a SAN Volume
Controller node” on page 258 for more information.
b. (Optional) Identify and label all the cables that are attached to the
node so that they can be replaced in the same port. Remove the
node from the rack and place it on a flat, static-protective surface.
See the “Removing the node from a rack” information to find out
how to perform the procedure.
c. Remove the top cover.
d. See Table 60 on page 289 and perform the action specified for the
specific light path diagnostics LEDs. Then go to step 3 on page 292.

Figure 94. SAN Volume Controller 2145-CF8 system board LEDs diagnostics panel

1 Slot 2 missing PCI riser card LED
2 Enclosure manager heartbeat LED
3 Battery LED
4 IMM heartbeat LED
5 Slot 1 missing PCI riser card LED
6 System error LED
7 Microprocessor 1 error LED
8 DIMM 1-8 error LEDs
9 Fan 1 error LED
10 Fan 2 error LED
11 Fan 3 error LED
12 Fan 4 error LED
13 Fan 5 error LED
14 Fan 6 error LED
15 240 VA error LED
16 Power channel A error LED
17 Power channel B error LED
18 Power channel C error LED
19 Power channel D error LED
20 Power channel E error LED
21 AUX power channel error LED
22 SAS/SATA RAID error LED
23 Microprocessor 2 error LED
24 DIMM 9-16 error LEDs
Table 60. Diagnostics panel LED prescribed actions
Diagnostics panel LED    Action
OVER SPEC The power supplies are using more power than their maximum rating. If
the OVER SPEC LED is lit, one or more of the six 12V channel error LEDs
(A, B, C, D, E, or AUX) is also lit on the system board. Perform the
following actions to resolve the problem:
1. Turn off the node, pull the node forward in the rack, and remove the
cover. Do not disconnect power from the node.
2. Check which 12V channel error LED is lit on the system board, and
remove the components listed for that LED:
- LED A: fans, disk drive, any solid-state drives (SSDs), or disk
backplane
- LED B: Fibre Channel adapter and riser, all memory
- LED C: disk controller, all memory
- LED D: microprocessor
- LED E: High-speed SAS adapter and riser, if installed
- LED AUX: Fibre Channel adapter and high-speed SAS adapter, if
installed
3. Restart the node to see whether the problem remains.
4. Reinstall, one at a time, each device that you removed for the LED
problems. Start the node each time to isolate the failing device.
5. Replace any failing device.
6. If no device was isolated, and if LED C or LED D is lit, turn off the
node and remove the microprocessor. You need alcohol wipes and
thermal grease to replace the microprocessor. Power on the server by
toggling switch block 3 (SW3) bit 6. Restart the server. If the problem
has resolved, replace the microprocessor; otherwise, reinstall the
microprocessor. In either case, toggle switch block 3 (SW3) bit 6 back
to its original position.
7. If no device was isolated, and if LED AUX is lit, turn off the node and
remove the operator-information panel. Power on the server by
toggling switch block 3 (SW3) bit 6. Restart the server. If the problem
was resolved, replace the operator-information panel; otherwise,
reinstall the operator-information panel. In either case, toggle switch
block 3 (SW3) bit 6 back to its original position.
8. If no failing device is isolated, replace the system board.
LOG An error occurred. Connect a keyboard and a monitor. Check the IMM
system event log and the system event log for information about the error.
Replace any components that are identified in the event logs.
LINK This is not used on the SAN Volume Controller 2145-CF8. Replace the
system board.

PS Power supply 1 or power supply 2 has failed. Perform the following
actions to resolve the problem:
1. Check the power supply that has a lit amber LED.
2. Make sure that the power supplies are seated correctly.
3. Remove one of the power supplies to isolate the failed power supply.
4. Replace the failed power supply.
PCI An error occurred on a PCI bus or on the system board. An additional
LED is lit next to a failing PCI slot. Perform the following actions to
resolve the problem:
1. Identify the failing adapter by checking the LEDs on the PCI slots.
2. If PCI slot 1 shows an error, replace the 4-port Fibre Channel adapter
assembly.
3. If PCI slot 2 shows an error, replace the high-speed SAS adapter
assembly.
4. If the error is not resolved, replace the system board.
SP A service processor error was detected. Perform the following actions to
resolve the problem:
1. Remove power from the node. Reconnect the server to the power, and
restart the node.
2. If the problem remains, replace the system board.
FAN A fan has failed, is operating too slowly, or was removed. A failing fan
can also cause the TEMP LED to be lit. Perform the following actions to
resolve the problem:
1. Reseat the failing fan, which is indicated by a lit LED near the fan
connector on the system board.
2. If the problem remains, replace the failing fan.
TEMP The system temperature exceeded a threshold level. A failing fan can
cause the TEMP LED to be lit. Perform the following actions to resolve
the problem:
1. Make sure that the heat sink is seated correctly.
2. Determine whether a fan has failed. If it has, replace it.
3. Verify that the ambient temperature is within normal operating
specifications.
4. Make sure that airflow in and around the SAN Volume Controller
2145-CF8 is not obstructed.
MEM A memory configuration that is not valid or a memory error has occurred.
Both the MEM LED and CNFG LED might be lit. Perform the following
actions to resolve the problem:
1. Check that all the memory DIMMs are correctly installed.
2. If any memory error LEDs are lit, replace the indicated memory
module.
3. If the MEM LED and the CNFG LED are lit, adjust the memory so
that DIMM slots 2, 3, 5, 6, 7, and 8 are the only ones used.
NMI A non-maskable interrupt occurred or the NMI button was pressed. This
situation should not occur. If the NMI button on the light path diagnostic
panel was pressed by mistake, restart the node; otherwise, call your
support center.

CNFG A hardware configuration error occurred. If the MEM LED is also lit,
follow the actions shown for MEM LED. If the CPU LED is lit, check to
see if a microprocessor is installed in CPU 2. If one is installed, remove it
because the configuration is not supported. If no other light path LEDs
are lit, replace the FRUs in the order shown until the problem is resolved:
1. Operator-information panel
2. Operator-information panel cable
3. System board
CPU A microprocessor failed or a microprocessor configuration is not valid.
Both the CPU LED and the CNFG LED might be lit. Perform the
following actions:
1. Check the system board error LEDs.
2. If CPU 1 error LED is lit, check that the microprocessor is correctly
installed.
3. If the error persists, replace the microprocessor.
4. If the error persists, replace the system board.
VRM This is not used on the SAN Volume Controller 2145-CF8.
DASD A disk drive failed or is missing. A SAN Volume Controller 2145-CF8
must have its system hard disk drive installed in drive slot 4. Up to four
optional solid-state drives (SSDs) can be installed in drive slots 0 to 3.

If an SSD has been deliberately removed from a slot, the system error
LED and the DASD diagnostics panel LED will light. The error is
maintained even if the SSD is replaced in a different slot. If an SSD has
been removed or moved, the error is cleared by powering off the node
using MAP 5350, removing both the power cables, replacing the power
cables, and then restarting the node.

Resolve any node or system errors that relate to SSDs or the system disk
drive.

If an error is still shown, power off the node and reseat all the drives.

If the error remains, replace the following components in the order listed:
1. The system disk drive
2. The disk backplane
RAID This is not used on the SAN Volume Controller 2145-CF8.
BRD An error occurred on the system board. Perform the following actions to
resolve the problem:
1. Check the LEDs on the system board to identify the component that
caused the error. The BRD LED can be lit because of any of the
following reasons:
- Battery
- Missing PCI riser-card assembly. There must be a riser card in PCI
slot 2 even if the optional adapter is not present.
- Failed voltage regulator
2. Replace any failed or missing replacement components, such as the
battery or PCI riser-card assembly.
3. If a voltage regulator fails, replace the system board.

3. Continue with “MAP 5700: Repair verification” on page 278 to verify the
correct operation.
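
Steps 1 and 2 of the 2145-CF8 procedure form a short decision flow: check the Error LED first, then the light path diagnostics panel. The Python sketch below summarizes that flow under stated assumptions; the function `next_action` and the set `PANEL_LEDS` are hypothetical names used for illustration, and the LED labels are taken from Figure 93 and Table 60.

```python
# LED labels on the light path diagnostics panel (Figure 93 and Table 60).
PANEL_LEDS = {
    "OVER SPEC", "LOG", "LINK", "PS", "PCI", "SP", "FAN", "TEMP",
    "MEM", "NMI", "CNFG", "CPU", "VRM", "DASD", "RAID", "BRD",
}


def next_action(error_led_on, lit_panel_leds):
    """Hypothetical summary of steps 1 and 2 of the 2145-CF8 procedure."""
    unknown = set(lit_panel_leds) - PANEL_LEDS
    if unknown:
        raise ValueError(f"unknown panel LED label(s): {sorted(unknown)}")
    if not error_led_on:
        # Step 1, NO branch.
        return "Reassess symptoms and return to MAP 5000: Start"
    if not lit_panel_leds:
        # Step 2, NO branch: check the cable, then replace parts in order.
        return ("Check the operator-information panel cable; then replace the "
                "operator-information panel, then the system board")
    # Step 2, YES branch.
    return "Perform the Table 60 action for: " + ", ".join(sorted(lit_panel_leds))


if __name__ == "__main__":
    print(next_action(True, ["MEM", "CNFG"]))
```

In every branch that involves a repair, the procedure ends with MAP 5700: Repair verification, which the sketch does not model.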

Light path for SAN Volume Controller 2145-8A4


Use the diagnostics LEDs that are located on the system board to solve hardware
problems with the SAN Volume Controller 2145-8A4 node.

About this task

Ensure that the node is turned on and then perform the following steps to resolve
any hardware errors that are indicated by the Error LED and light path LEDs:

Procedure
1. Is the Error LED, shown in Figure 95, on the SAN Volume Controller
2145-8A4 operator-information panel on or flashing?

Figure 95. SAN Volume Controller 2145-8A4 operator-information panel

1 Error LED


NO Reassess your symptoms and return to “MAP 5000: Start” on page 231.
YES Go to step 2.
2. (from step 1)
Observe the state of the diagnostic LEDs on the system board. To view the
LEDs, follow these steps:
a. Turn off the node while ensuring that its data is mirrored and synchronized.
See “MAP 5350: Powering off a SAN Volume Controller node” on page 258
for more information.
b. Identify and label all the cables that are attached to the node so that they
can be replaced in the same port. Remove the node from the rack and place
it on a flat, static-protective surface.
c. Remove the top cover.
d. Turn on the node.
3. (from step 2)
Other than the Standby Power, Power good, and the Baseboard management
controller heartbeat LEDs, are one or more LEDs on the system board on or
flashing?
NO Verify that the operator-information panel cable is correctly seated at
both ends. If the error LED is still on but no error LEDs are illuminated
on the system board, replace parts in the following sequence:
a. Operator-information panel
b. Operator-information panel cable
c. System board
Go to step 5 on page 294.
YES Identify any diagnostic LEDs on the system board that are on. Figure 96
shows the location of the system board LEDs. The fan LEDs are located
adjacent to each fan. You can ignore the three LEDs that do not indicate
an error: 13, 14, and 15.

Figure 96. SAN Volume Controller 2145-8A4 system board LEDs

1 Fan 1 error LED
2 Fan 2 error LED
3 Fan 3 error LED
4 DIMM 1 error LED
5 DIMM 2 error LED
6 DIMM 3 error LED
7 DIMM 4 error LED
8 PCI Express slot 2 error LED
9 PCI Express slot 1 error LED
10 Fan 4 error LED
11 Fan 5 error LED
12 Voltage regulator error LED
13 Standby power LED
14 Power good LED
15 Baseboard management controller heartbeat LED
16 SAS/SATA controller error LED
4. (from step 3 on page 292)
Are any diagnostic LEDs other than 13, 14, and 15 on the system board
illuminated?
NO Go to step 5.
YES See Table 61 and replace the parts specified for the specific LEDs
one-at-a-time in the following order until the error is repaired. Then go
to step 5.
Table 61. SAN Volume Controller 2145-8A4 diagnostics panel LED prescribed actions
Diagnostics panel LED    Action
DIMM error LEDs (1 through 4)    Replace parts in the following sequence:
1. Indicated DIMM
2. System board
Fan error LEDs (1 through 5)    Replace parts in the following sequence:
1. Indicated fan
2. System board
PCI Express slot 1 error LED    Replace parts in the following sequence:
1. PCI riser card
2. System board
3. Fibre Channel adapter
PCI Express slot 2 error LED    This is not used on the SAN Volume Controller 2145-8A4. Replace the
system board.
Voltage regulator error LED    Replace the system board.
SAS/SATA controller error LED    This is not used on the SAN Volume Controller 2145-8A4. Replace the
system board.

5. (from step 4 on page 293)
Replace the top cover and place the node in the rack. See the “Removing the
node from a rack” information to find out how to perform the procedure. Then
continue with “MAP 5700: Repair verification” on page 278 to verify the correct
operation.
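
Table 61 prescribes replacing parts one at a time, in a fixed order, until the error is repaired. A minimal Python sketch of that ordering follows; the dictionary `REPLACEMENT_SEQUENCE` and the function `next_part` are hypothetical names introduced for illustration, and the sequences are copied from the table.

```python
# Replacement order for each 2145-8A4 system board error LED (Table 61).
REPLACEMENT_SEQUENCE = {
    "DIMM error": ["Indicated DIMM", "System board"],
    "Fan error": ["Indicated fan", "System board"],
    "PCI Express slot 1 error": [
        "PCI riser card", "System board", "Fibre Channel adapter",
    ],
    "PCI Express slot 2 error": ["System board"],
    "Voltage regulator error": ["System board"],
    "SAS/SATA controller error": ["System board"],
}


def next_part(led, parts_already_replaced):
    """Return the next part to replace for a lit LED, or None if none remain."""
    for part in REPLACEMENT_SEQUENCE[led]:
        if part not in parts_already_replaced:
            return part
    return None


if __name__ == "__main__":
    # The riser card was already replaced and the error is still present.
    print(next_part("PCI Express slot 1 error", ["PCI riser card"]))
```

A return value of `None` means that every part listed for that LED has been replaced; the table gives no further action for that case.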

Light path for SAN Volume Controller 2145-8G4


Use light path diagnostics to solve hardware problems with the SAN Volume
Controller 2145-8G4 node.

About this task

Ensure that the node is turned on and then perform the following steps to resolve
any hardware errors indicated by the Error LED and light path LEDs:

Procedure
1. Is the Error LED, shown in Figure 97 on page 295, on the SAN Volume
Controller 2145-8G4 operator-information panel illuminated or flashing?

Figure 97. SAN Volume Controller 2145-8G4 operator-information panel

1 Release latch
2 Error LED
NO Reassess your symptoms and return to “MAP 5000: Start” on page 231.
YES Go to step 2.
2. (from step 1 on page 294)
Press the release latch and open the light path diagnostics panel, which is
shown in Figure 98.

Figure 98. SAN Volume Controller 2145-8G4 light path diagnostics panel
(Panel labels: OVER SPEC, PS1, PS2, CPU, VRM, CNFG, REMIND, MEM, NMI, S ERR,
SP, DASD, RAID, FAN, TEMP, BRD, PCI)

Are one or more LEDs on the light path diagnostics panel on or flashing?
NO Verify that the operator-information panel cable is correctly seated at
both ends. If the error LED is still illuminated but no LEDs are
illuminated on the light path diagnostics panel, replace parts in the
following sequence:
a. Operator-information panel
b. System board
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES See Table 62 on page 297 and perform the action specified for the
specific light path diagnostics LEDs. Then go to step 3 on page 298.
Some actions will require that you observe the state of LEDs on the
system board. Figure 99 on page 296 shows the location of the system
board LEDs. The fan LEDs are located adjacent to each fan. To view
the LEDs, do the following:
a. Turn off the node while ensuring that its data is mirrored and
synchronized. See “MAP 5350: Powering off a SAN Volume
Controller node” on page 258 for more information.
b. Identify and label all the cables that are attached to the node so that
they can be replaced in the same port. Remove the node from the
rack and place it on a flat, static-protective surface. See the
“Removing the node from a rack” information to find out how to
perform the procedure.
c. Remove the top cover and open the fan doors.
d. Press the light path diagnostics button (7 in Figure 99).

Note: The light path diagnostics button is used to illuminate the
light path diagnostics LEDs when power is disconnected from the
SAN Volume Controller 2145-8G4 node.

Figure 99. SAN Volume Controller 2145-8G4 system board LEDs

1 System-board battery error LED
2 DIMM 5 error LED
3 DIMM 6 error LED
4 DIMM 7 error LED
5 DIMM 8 error LED
6 Light path diagnostics active LED
7 Light path diagnostics button
8 Microprocessor 2 error LED
9 Microprocessor 1 error LED
10 DIMM 1 error LED
11 DIMM 2 error LED
12 DIMM 3 error LED
13 DIMM 4 error LED
14 System-board fault LED
15 Power B error LED
16 Power A error LED
17 Power C error LED
18 Power D error LED
Table 62. Diagnostics panel LED prescribed actions
Diagnostics panel LED    Action
OVER SPEC Replace parts in the following sequence:
1. Power supply
2. Power backplane
3. System board
PS1 If you have just replaced the power supply, check that it is correctly
installed. If it is correctly installed, replace parts in the following
sequence:
1. Power supply
2. Power backplane
PS2 This is not used on the SAN Volume Controller 2145-8G4. This is a false
indication. A sensor has failed or the system board service processor
firmware is not functioning correctly. Contact your support center to see if
a firmware update is available. If not, replace parts in the following
sequence:
1. Power backplane
2. Operator-information panel
3. System board
CPU A microprocessor has failed. Make sure that the failing microprocessor,
which is indicated by a lit LED on the system board, is installed correctly.
If it is installed correctly, replace the microprocessor.
VRM This is not used on the SAN Volume Controller 2145-8G4.
CNFG Microprocessor configuration error. Check the installed microprocessors
for compatibility.
MEM Observe the DIMM LEDs on the system board. If any DIMM LED is
flashing, make sure that the correct type of DIMM is installed in every
slot. Replace parts in the following sequence:
1. Failing DIMM
2. System board
Note: If more than one DIMM is indicated by the light path diagnostics,
replace the DIMMs one-at-a-time, starting at the lowest-numbered DIMM
slot that the diagnostics indicated.
NMI A non-maskable interrupt occurred. Call your support center and check if
any software updates need to be applied to this SAN Volume Controller
2145-8G4. If this node will not join the system, run node recovery. If node
recovery does not resolve the problem, replace the system board assembly.

S ERR A soft error occurred. Call your support center and check if any software
updates need to be applied to this SAN Volume Controller 2145-8G4. If
this node will not join the system, run node recovery. If node recovery
does not resolve the problem, replace the system board assembly.
SP The Service processor has failed. Replace the system board assembly.
DASD This is not used on the SAN Volume Controller 2145-8G4. A sensor has
failed or the system board service processor firmware is not functioning
correctly. Contact your support center to see if a firmware update is
available. If not, replace parts in the following sequence:
1. Operator-information panel
2. System board
BRD Observe the battery LED and the system board LED. If the battery LED is
illuminated, replace the battery. If the system board LED is illuminated,
replace the system board.
FAN A fan has failed, is operating too slowly, or has been removed. A failing
fan can also cause the TEMP LED to be lit. Replace the failing fan, which
is indicated by a lit LED near the fan connector on the system board.
TEMP If any fan failures exist, repair those before attempting this procedure.
Verify that the ambient temperature is within normal operating
specifications. Make sure that airflow in and around the SAN Volume
Controller 2145-8G4 is not obstructed. If the error persists, replace the
system board.
RAID This is not used on the SAN Volume Controller 2145-8G4.
PCI The Fibre Channel card might be failing. Ensure the Fibre Channel card
and the riser card are correctly installed. If the error persists, replace the
Fibre Channel card.

3. Continue with “MAP 5700: Repair verification” on page 278 to verify the
correct operation.
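
The MEM note in Table 62 says that when more than one DIMM is indicated, the DIMMs are replaced one at a time, starting at the lowest-numbered indicated slot. The short Python sketch below shows that ordering; the function name `dimm_replacement_order` is hypothetical, and the default of eight slots is an assumption based on the DIMM 1 through DIMM 8 error LEDs listed for Figure 99.

```python
def dimm_replacement_order(indicated_slots, slot_count=8):
    """Return indicated DIMM slots in replacement order (lowest slot first)."""
    slots = sorted(set(indicated_slots))  # drop duplicates, lowest slot first
    for slot in slots:
        if not 1 <= slot <= slot_count:
            raise ValueError(f"DIMM slot {slot} is outside 1..{slot_count}")
    return slots


if __name__ == "__main__":
    # Light path diagnostics indicated DIMMs 7, 2, and 5.
    for slot in dimm_replacement_order([7, 2, 5]):
        print(f"Replace DIMM {slot}, then check whether the error is repaired")
```

If the error remains after all indicated DIMMs are replaced, Table 62 lists the system board as the next part in the sequence.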

Light path for SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4

Use light path diagnostics to solve hardware problems with the SAN Volume
Controller 2145-8F2 and SAN Volume Controller 2145-8F4 nodes.

About this task

Ensure that the node is turned on and then perform the following steps to resolve
any hardware errors indicated by the Error LED and light path LEDs:

Procedure
1. Is the Error LED, shown in Figure 100 on page 299, on the SAN Volume
Controller 2145-8F2 or the SAN Volume Controller 2145-8F4
operator-information panel illuminated or flashing?

Figure 100. SAN Volume Controller 2145-8F4 operator-information panel

1 Error LED
2 Release latch
NO Reassess your symptoms and return to “MAP 5000: Start” on page 231.
YES Go to step 2.
2. (from step 1 on page 298)
Press the release latch and open the light path diagnostics panel, which is
shown in Figure 101.

Figure 101. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 light
path diagnostics panel
(Panel labels: OVER SPEC, PS1, PS2, CPU, VRM, CNFG, REMIND, MEM, NMI, S ERR,
SP, DASD, FAN, TEMP, BRD, PCI A, PCI B, PCI C)

Are one or more LEDs on the light path diagnostics panel on or flashing?
NO Verify that the operator-information panel cable is correctly seated at
both ends. If the error LED is still illuminated but no LEDs are
illuminated on the light path diagnostics panel, replace parts in the
following sequence:
a. Operator-information panel
b. Cable, signal, front panel
c. Frame assembly
Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES See Table 63 on page 301 and perform the action specified for the
specific light path diagnostics LEDs, then go to step 3 on page 302.
Some actions will require that you observe the state of LEDs on the
system board or on the fan backplanes. The locations of the system
board LEDs are shown in Figure 102 on page 300. The fan LEDs are
located adjacent to each fan. To view the LEDs, do the following:
a. Turn off the node while ensuring that its data is mirrored and
synchronized. See “MAP 5350: Powering off a SAN Volume
Controller node” on page 258 for more information.
b. Identify and label all the cables that are attached to the node so that
they can be replaced in the same port. Remove the node from the
rack and place it on a flat, static-protective surface. See the
“Removing the node from a rack” information to find out how to
perform the procedure.
c. Remove the top cover and open the fan doors.
d. Press the light path diagnostics button 1. See Figure 102.

Note: The light path diagnostics button is used to illuminate the
light path diagnostics LEDs when power is disconnected from the
SAN Volume Controller 2145-8F2 or SAN Volume Controller
2145-8F4 node.

Figure 102. SAN Volume Controller 2145-8F2 and SAN Volume Controller 2145-8F4 system
board LEDs

1 Light path diagnostics button
2 System board fault LED
3 Light path activity LED
4 VRM 2 Error LED
5 CPU 2 Error LED
6 CPU 1 Error LED
7 VRM 1 Error LED
8 Battery LED
9 DIMM 1 error LED
10 DIMM 2 error LED
11 DIMM 3 error LED
12 DIMM 4 error LED
13 DIMM 5 error LED
14 DIMM 6 error LED
15 DIMM 7 error LED
16 DIMM 8 error LED
Table 63. Diagnostics panel LED prescribed actions
Diagnostics panel LED    Action
OVER SPEC Replace the power supply
PS1 If you have just replaced the power supply, check that it is correctly
installed. If it is correctly installed, replace parts in the following
sequence:
1. Power supply
2. Power backplane
PS2 This is not used on the SAN Volume Controller 2145-8F2 or the SAN
Volume Controller 2145-8F4. A sensor has failed or the system board
service processor firmware is not functioning correctly. Contact your
support center to see if a firmware update is available. If not, replace
parts in the following sequence:
1. Power backplane
2. Operator-information panel
3. Frame assembly
CPU Observe the CPU indicators on the system board. The microprocessor
adjacent to the illuminated LED is failing. If you have installed the
incorrect type of microprocessor, the LED will be flashing. Replace parts
in the following sequence:
1. Microprocessor
2. Frame assembly
VRM Observe the VRM indicators on the system board. The VRM adjacent to
the illuminated LED is failing. Verify that the VRM is correctly installed.
Replace parts in the following sequence:
1. VRM
2. Frame assembly
CNFG Observe all system board LEDs. Make sure that DIMMs, microprocessors,
and VRMs are installed correctly and are of the correct type. Replace parts
in the following sequence:
1. Component adjacent to the illuminated LED
2. Frame assembly
MEM Observe the DIMM LEDs on the system board. If any DIMM LED is
flashing, make sure that the correct type of DIMM is installed in every
slot. Replace parts in the following sequence:
1. Failing DIMM
2. Frame assembly
Note: If more than one DIMM is indicated by the light path diagnostics,
replace the DIMMs one-at-a-time, starting at the lowest-numbered DIMM
slot that the diagnostics indicated.
NMI A non-maskable interrupt occurred. Call your support center and check if
any software updates need to be applied to this SAN Volume Controller
2145-8F2 or SAN Volume Controller 2145-8F4. If this node will not join
the system, run node recovery. If node recovery does not resolve the
problem, replace the frame assembly.

S ERR A soft error occurred. Call your support center and check if any software
updates need to be applied to this SAN Volume Controller 2145-8F2 or
SAN Volume Controller 2145-8F4. If this node will not join the system,
run node recovery. If node recovery does not resolve the problem, replace
the frame assembly.
SP The service processor has failed. Replace the frame assembly.
DASD This is not used on the SAN Volume Controller 2145-8F2 or SAN Volume
Controller 2145-8F4. This is a false indication. A sensor has failed or the
system board service processor firmware is not functioning correctly.
Contact your support center to see if a firmware update is available. If
not, replace parts in the following sequence:
1. Operator-information panel
2. Frame assembly
FAN Observe the LEDs on the fan backplanes. The fan adjacent to the
illuminated LED is failing. Replace parts in the following sequence:
1. Fan
2. Fan backplane
TEMP If any fan failures exist, repair those before attempting this procedure.
Verify that the ambient temperature is within normal operating
specifications. Make sure that airflow in and around the SAN Volume
Controller 2145-8F2 or SAN Volume Controller 2145-8F4 is not obstructed.
Replace the frame assembly.
BRD Observe the battery LED and the system board LED. If the battery LED is
illuminated, replace the battery. If the system board LED is illuminated,
replace the frame assembly.
PCI A This is not used on the SAN Volume Controller 2145-8F2 or SAN
Volume Controller 2145-8F4. This is a false indication. A sensor has failed
or the system board service processor firmware is not functioning
correctly. Contact your support center to see if a firmware update is
available. If not, replace parts in the following sequence:
1. Operator-information panel
2. Frame assembly
PCI B One of the Fibre Channel adapter cards connected to this bus might be
failing. Ensure that both adapters are correctly installed and that the riser
card latches are fully closed. If possible, display the Fibre Channel card
status on the SAN Volume Controller 2145-8F2 or SAN Volume Controller
2145-8F4 front panel to determine the failing card. Otherwise, remove the
Fibre Channel cards one-at-a-time to determine the failing card. Replace
parts in the following sequence:
1. Fibre Channel adapter card
2. Frame assembly
PCI C Replace the frame assembly.

3. Continue with “MAP 5700: Repair verification” on page 278 to verify the
correct operation.
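
The fixed part-replacement sequences in Table 63 can be captured as a simple lookup. The following is a minimal sketch in Python and is not part of the product; the names LED_FRU_SEQUENCE and next_fru are hypothetical. It covers only the LEDs for which the table gives an unconditional replacement order, and it does not replace the checks that the table asks you to make first (for example, verifying installation or contacting your support center).

```python
# Hypothetical lookup of the replacement sequences listed in Table 63.
# LEDs whose action depends on further observation (CNFG, NMI, S ERR,
# TEMP, BRD) are deliberately left out of this sketch.
LED_FRU_SEQUENCE = {
    "OVER SPEC": ["Power supply"],
    "PS1": ["Power supply", "Power backplane"],
    "PS2": ["Power backplane", "Operator-information panel", "Frame assembly"],
    "CPU": ["Microprocessor", "Frame assembly"],
    "VRM": ["VRM", "Frame assembly"],
    "MEM": ["Failing DIMM", "Frame assembly"],
    "SP": ["Frame assembly"],
    "DASD": ["Operator-information panel", "Frame assembly"],
    "FAN": ["Fan", "Fan backplane"],
    "PCI A": ["Operator-information panel", "Frame assembly"],
    "PCI B": ["Fibre Channel adapter card", "Frame assembly"],
    "PCI C": ["Frame assembly"],
}


def next_fru(led, already_replaced=0):
    """Return the next part to replace for an LED.

    already_replaced is the number of parts from the sequence that have
    been exchanged so far. Returns None when the sequence is exhausted.
    """
    sequence = LED_FRU_SEQUENCE[led.upper()]
    if already_replaced < len(sequence):
        return sequence[already_replaced]
    return None
```

For example, next_fru("PS1") names the power supply, and next_fru("PS1", 1) names the power backplane as the part to try next.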

MAP 5900: Hardware boot


MAP 5900: Hardware boot helps you solve problems that are preventing the node
from starting its boot sequence.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

This MAP applies to all SAN Volume Controller models. Be sure that you know
which model you are using before you start this procedure. To determine which
model you are working with, look for the label that identifies the model type on
the front of the node.

You might have been sent here for one of the following reasons:
• The hardware boot display, shown in Figure 103, is displayed continuously.

Figure 103. Hardware boot display

• The node rescue display, shown in Figure 104, is displayed continuously.

Figure 104. Node rescue display

• The boot progress is hung and an error is displayed on the front panel.
• Another MAP sent you here.

About this task

Perform the following steps to allow the node to start its boot sequence:

Procedure
1. Is the Error LED on the operator-information panel illuminated or flashing?
NO Go to step 2.
YES Go to “MAP 5800: Light path” on page 279 to resolve the problem.
2. (From step 1)
If you have just installed the SAN Volume Controller node or have just
replaced a field replaceable unit (FRU) inside the node, perform the
following steps:
a. Ensure that the correct power cable assembly from the 2145 UPS-1U to the
node is installed. The correct power cable assembly has tape that binds the
cables together.
b. Identify and label all the cables that are attached to the node so that they
can be replaced in the same port. Remove the node from the rack and place
it on a flat, static-protective surface. See the “Removing the node from a
rack” information to find out how to perform the procedure.
c. Remove the top cover. See the “Removing the top cover” information to
find out how to perform the procedure.

d. If you have just replaced a FRU, ensure that the FRU is correctly placed and
that all connections to the FRU are secure.
e. Ensure that all memory modules are correctly installed and that the latches
are fully closed. See the “Replacing the memory modules (DIMM)”
information to find out how to perform the procedure.
f. Ensure that the Fibre Channel adapter cards are correctly installed. See the
“Replacing the Fibre Channel adapter assembly” information to find out
how to perform the procedure.
g. Ensure that the disk drive and its connectors are correctly installed. See the
“Replacing the disk drive” information to find out how to perform the
procedure.
h. Ensure that the service controller is correctly installed. See the “Replacing
the service controller” information to find out how to perform the
procedure.
i. Replace the top cover. See the “Replacing the top cover” information to find
out how to perform the procedure.
j. Place the node in the rack. See the “Replacing the node in a rack”
information to find out how to perform the procedure.
k. Turn on the node.
Does the boot operation still hang?
NO Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES Go to step 3.
3. (from step 2 on page 303)
Check if the system BIOS is reporting any errors. You need to attach a display
and keyboard to see the BIOS output. The customer should be able to supply a
suitable display and keyboard.
a. Turn off the node while ensuring that its data is mirrored and synchronized.
See “MAP 5350: Powering off a SAN Volume Controller node” on page 258.
b. Connect the keyboard (1) and the display (2). Figure 105 on page 305
shows the location of the keyboard and monitor ports on the 2145-8G4,
2145-8A4, 2145-8F4, and 2145-8F2. Figure 106 on page 305 shows the
location of the keyboard and monitor ports on the 2145-CF8. Figure 107
on page 305 shows the location of the keyboard and monitor ports on the
2145-CG8.

Figure 105. Keyboard and monitor ports on the SAN Volume Controller models 2145-8G4,
2145-8A4, 2145-8F4 and 2145-8F2

Figure 106. Keyboard and monitor ports on the SAN Volume Controller 2145-CF8

Figure 107. Keyboard and monitor ports on the SAN Volume Controller 2145-CG8

c. Turn on the node.
d. Watch the display.
• If the POST sequence indicates an error, or if the BIOS
Configuration/Setup Utility program indicates an error during startup,
you need to resolve the error.
• If it indicates an error with a specific hardware item, power off the node
and remove it from the rack. Ensure the item specified is correctly
installed, replace the node in the rack, and then restart the node. If the
error is still reported, replace the specified item.
• If a configuration error is reported, run the Configuration/Setup Utility
program option to reset the BIOS to its default (factory) settings.
e. Turn off the node and remove the keyboard and display.
f. Turn on the node.
Does the boot operation still hang?

NO Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES Go to step 4.
4. (from step 3 on page 304)
a. Turn off the node while ensuring that its data is mirrored and synchronized.
See “MAP 5350: Powering off a SAN Volume Controller node” on page 258.
b. Identify and label all the cables that are attached to the node so that they
can be replaced in the same port. Remove the node from the rack and place
it on a flat, static-protective surface. See the “Removing the node from a
rack” information to find out how to perform the procedure.
c. Remove the top cover. See the “Removing the top cover” information to
find out how to perform the procedure.
d. Remove some of the memory modules:
• If you are using the SAN Volume Controller 2145-CG8 or the SAN
Volume Controller 2145-CF8, remove the memory modules in slots 2, 5, 7,
and 8.
• If you are using the SAN Volume Controller 2145-8A4, remove the
memory modules in slots 2 through 4.
• If you are using the SAN Volume Controller 2145-8G4, remove the
memory modules in slots 2 and 4 through 8.
• If you are using the SAN Volume Controller 2145-8F4 or the SAN Volume
Controller 2145-8F2, remove the memory modules in slots 3 through 8.
e. Remove all installed Fibre Channel cards.
f. Remove the disk drive.
g. Replace the top cover. See the “Replacing the top cover” information to find
out how to perform the procedure.
h. Place the node in the rack. See the “Replacing the node in a rack”
information to find out how to perform the procedure.
i. Turn on the node.
5. Does the boot operation still hang with the booting display (perform the NO
action) or has the boot operation progressed (perform the YES action)?

Note: With the FRUs removed, the boot will hang with a different boot failure
code.
NO Go to step 6 to replace the FRUs, one-at-a-time, until the failing FRU is
isolated.
YES Go to step 7.
6. (From step 5)
Remove all hardware except the hardware that is necessary to power up.
Continue to add in the FRUs one at a time and power on each time until the
original failure is introduced.
Does the boot operation still hang?
NO Verify the repair by continuing with “MAP 5700: Repair verification”
on page 278.
YES Go to step 7.
7. (from steps 4 and 6)

a. Turn off the node while ensuring that its data is mirrored and synchronized.
See “MAP 5350: Powering off a SAN Volume Controller node” on page 258
for more information.
b. Identify and label all the cables that are attached to the node so that they
can be replaced in the same port. Remove the node from the rack and place
it on a flat, static-protective surface. See the “Removing the node from a rack”
information to find out how to perform the procedure.
c. Remove the top cover. See the “Removing the top cover” information to
find out how to perform the procedure.
d. Replace the Fibre Channel cards and the disk drive.
e. Replace the memory modules:
• If you are using the SAN Volume Controller 2145-CG8 or the SAN
Volume Controller 2145-CF8, replace the memory modules in slots 3 and 6
with any of the removed memory modules from slots 2, 5, 7, and 8.
• If you are using the SAN Volume Controller 2145-8A4, replace the
memory module in slot 1 with any of the removed memory modules
from slots 2 through 4.
• If you are using the SAN Volume Controller 2145-8G4, replace the
memory modules in slots 1 and 3 with any two of the removed memory
modules from slots 2 and 4 through 8.
• If you are using the SAN Volume Controller 2145-8F4 or the SAN Volume
Controller 2145-8F2, replace the memory modules in slots 1 and 2 with
any two of the removed memory modules from slots 3 through 8.
f. Replace the top cover. See the “Replacing the top cover” information to find
out how to perform the procedure.
g. Place the node in the rack. See the “Replacing the node in a rack”
information to find out how to perform the procedure.
h. Turn on the node.
Does the boot operation still hang with the booting display (perform the NO
action) or does the display progress beyond the initial booting panel
(perform the YES action)?
NO Exchange the failing memory modules for new FRUs and verify the
repair by continuing with “MAP 5700: Repair verification” on page 278.
YES Replace the parts in the following sequence:
• For the SAN Volume Controller 2145-CG8 or the SAN Volume
Controller 2145-CF8:
a. Service controller
b. System board
• For the SAN Volume Controller 2145-8A4 and SAN Volume
Controller 2145-8G4:
a. Service controller
b. System board
• For the SAN Volume Controller 2145-8F4 and SAN Volume
Controller 2145-8F2:
a. Service controller
b. Frame assembly
Verify the repair by continuing with “MAP 5700: Repair verification” on page
278.
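
The model-specific memory slot numbers in steps 4d and 7e are easy to mix up, so it can help to keep them in one table. The following is a minimal sketch in Python that only transcribes the slot numbers from those two steps; the names MEMORY_SLOTS and memory_plan are hypothetical and are not part of any SAN Volume Controller tool.

```python
# Hypothetical transcription of steps 4d and 7e of MAP 5900.
# "remove": slots emptied in step 4d.
# "swap":   slots whose modules are exchanged in step 7e for modules
#           that were taken out of the "remove" slots.
MEMORY_SLOTS = {
    "2145-CG8": {"remove": [2, 5, 7, 8], "swap": [3, 6]},
    "2145-CF8": {"remove": [2, 5, 7, 8], "swap": [3, 6]},
    "2145-8A4": {"remove": [2, 3, 4], "swap": [1]},
    "2145-8G4": {"remove": [2, 4, 5, 6, 7, 8], "swap": [1, 3]},
    "2145-8F4": {"remove": [3, 4, 5, 6, 7, 8], "swap": [1, 2]},
    "2145-8F2": {"remove": [3, 4, 5, 6, 7, 8], "swap": [1, 2]},
}


def memory_plan(model):
    """Return the slot plan for a model type such as "2145-8G4".

    Raises KeyError for a model that the MAP does not list.
    """
    return MEMORY_SLOTS[model]
```

For example, memory_plan("2145-8A4") reports that slots 2 through 4 are emptied in step 4d and that the module in slot 1 is exchanged in step 7e.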

MAP 6000: Replace offline SSD
MAP 6000: This procedure replaces a solid-state drive (SSD) that has failed while it
is still a member of a storage pool.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

This MAP applies to models with internal solid-state drives (SSDs). Be sure that
you know which model you are using before you start this procedure. To
determine which model you are working on, look for the label that identifies the
model type on the front of the node.

About this task

Use this MAP to determine which detailed MAP to use for replacing an offline
SSD.

Attention: If the drive use property is member and the drive must be replaced,
contact IBM support before taking any actions.

Procedure

Are you using an SSD in a RAID 0 array and using volume mirroring to provide
redundancy?
Yes Go to “MAP 6001: Replace offline SSD in a RAID 0 array.”
No Go to “MAP 6002: Replace offline SSD in RAID 1 array or RAID 10 array”
on page 310.

MAP 6001: Replace offline SSD in a RAID 0 array


MAP 6001: This procedure replaces a solid-state drive (SSD) that has failed while it
is still a member of a storage pool.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

This MAP applies to models with internal solid-state drives (SSDs). Be sure that
you know which model you are using before you start this procedure. To
determine which model you are working on, look for the label that identifies the
model type on the front of the node.

Attention:
1. Back up your SAN Volume Controller configuration before you begin these
steps.
2. If the drive use property is member and the drive must be replaced, contact IBM
support before taking any actions.

About this task

Perform the following steps only if a drive in a RAID 0 (striped) array has failed:

Procedure
1. Record the properties of all volume copies, MDisks, and storage pools that are
dependent on the failed drive.
a. Using the lsdrive CLI command, identify the drive ID and the error
sequence number of each drive whose status is offline and whose use is failed.
b. Review the offline reason using the lsevent <seq_no> CLI command.
c. Obtain detailed information about the offline drive or drives using the
lsdrive <drive_id> CLI command.
d. Record the mdisk_id, mdisk_name, node_id, node_name, and slot_id for each
offline drive.
e. Obtain the storage pools of the failed drives using the lsmdisk <mdisk_id>
CLI command for each MDisk that was identified in substep 1c.
Continue with the following steps by replacing all the failed drives in one
of the storage pools. Make note of the node, slot, and ID of the selected
drives.
f. Determine all the MDisks in the storage pool using the lsmdisk
-filtervalue mdisk_grp_id=<grp id> CLI command.
g. Identify which MDisks are internal (ctrl_type equals 4) and which
MDisks contain SSDs (ctrl_type equals 6).
h. Find the volumes with extents in the storage pool using the lsmdiskmember
<mdisk_id> CLI command for each MDisk found in substep 1f.
It is likely that the same volumes will be returned for each MDisk.
i. Record all the properties of each volume listed in substep 1h by using the
lsvdisk <vdisk_id> CLI command. For each volume, check whether it has online
volume copies, which indicates that it is mirrored. Use this information in step 9
on page 310.
j. Obtain a list of all the drives in each internal MDisk in the storage pool
using the lsdrive -filtervalue mdisk_id=<mdisk_id> CLI command. Use
this information in step 8 on page 310.
k. Record all the properties of all the MDisks in the storage pool using the
lsmdisk <mdisk_id> CLI command. Use this information in step 8 on page
310.
l. Record all the properties of the storage pool using the lsmdiskgrp
<mdiskgrp_id> CLI command. Use this information in step 7 on page 310.

Note: If a listed volume has a mirrored, online, and in-sync copy, you can
recover the copied volume data from the copy. All the data on the unmirrored
volumes will be lost and will need to be restored from backup.
2. Delete the storage pool using the rmmdiskgrp -force <mdiskgrp id> CLI
command.
All MDisks and volume copies in the storage pool are also deleted. If any of
the volume copies were the last in-sync copy of a volume, all the copies that
are not in sync are also deleted, even if they are not in the storage pool.
3. Using the drive ID that you recorded in substep 1e, set the use property of the
drive to unused using the chdrive command.
chdrive -use unused <id of offline drive>
The drive is removed from the drive listing.
4. Follow the physical instructions to replace or remove a drive. See the
“Replacing a SAN Volume Controller 2145-CG8 solid-state drive (SSD)”
documentation or the “Removing a SAN Volume Controller 2145-CG8
solid-state drive (SSD)” documentation to find out how to perform the
procedures.
5. A new drive object is created with the use attribute set to unused. This action
might take several minutes.
Obtain the ID of the new drive using the lsdrive CLI command.
6. Change the use property for the new drive to candidate.
chdrive -use candidate <drive id of new drive>
7. Create a new storage pool with the same properties as the deleted storage
pool. Use the properties that you recorded in substep 1l.
mkmdiskgrp -name <mdiskgrp name as before> -ext <extent size as before>
8. Re-create all MDisks that were previously in the storage pool using the
information from steps 1j and 1k.
• For internal RAID 0 MDisks, use this command:
mkarray -level raid0 -drive <list of drive IDs> -name
<mdisk_name> <mdiskgrp id or name>
where -name <mdisk_name> is optional, but you can use the parameter to
make the new array have the same MDisk name as the old array.
• For external MDisks, use the addmdisk CLI command.
• For non-RAID 0 MDisks, use the mkarray CLI command.
9. For all the volumes that had online, in sync, mirrored volume copies before
the MDisk group was deleted, add a new volume copy in the new storage
pool to restore redundancy using the following command:
addvdiskcopy -mdiskgrp <mdiskgrp id> -vtype striped -easytier
<on or off as before> <vdisk_id>
10. For any volumes that did not have an online, in sync, mirrored copy, create
the volume again and restore the data from a backup or use other methods.
11. Mark the drive error as fixed using the error sequence number from step 1b.
cherrstate -sequencenumber <error_sequence_number>
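
Substep 1a amounts to filtering the drive listing on two columns. The following is a minimal sketch in Python, assuming that the lsdrive listing has been captured as colon-delimited text with a header row. The function name offline_failed_drives, the column names, and the sample rows are assumptions made for illustration and are not actual system output; check them against the listing that your system produces.

```python
import csv
import io


def offline_failed_drives(lsdrive_output, delim=":"):
    """Return (drive ID, error sequence number) pairs for drives whose
    status is offline and whose use is failed.

    Assumes delimited text with a header row that contains the columns
    id, status, error_sequence_number, and use.
    """
    reader = csv.DictReader(io.StringIO(lsdrive_output), delimiter=delim)
    return [
        (row["id"], row["error_sequence_number"])
        for row in reader
        if row["status"] == "offline" and row["use"] == "failed"
    ]


# Hypothetical sample listing, for illustration only.
SAMPLE = (
    "id:status:error_sequence_number:use:mdisk_id:node_id:slot_id\n"
    "0:online::member:0:1:2\n"
    "1:offline:104:failed:0:1:3\n"
)
```

With the hypothetical sample, offline_failed_drives(SAMPLE) selects only drive 1 together with its error sequence number, which are the values that the later steps of this MAP reuse.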

MAP 6002: Replace offline SSD in RAID 1 array or RAID 10 array
MAP 6002: This procedure replaces a solid-state drive (SSD) that has failed while it
is still a member of a storage pool.

Before you begin

If you are not familiar with these maintenance analysis procedures (MAPs), first
read Chapter 10, “Using the maintenance analysis procedures,” on page 231.

This MAP applies to models with internal solid-state drives (SSDs). Be sure that
you know which model you are using before you start this procedure. To
determine which model you are working on, look for the label that identifies the
model type on the front of the node.

Attention:
1. Back up your SAN Volume Controller configuration before you begin these
steps.
2. If the drive use property is member and the drive must be replaced, contact IBM
support before taking any actions.

About this task

Perform the following steps if a drive fails in a RAID 1 or RAID 10 array:

Procedure
1. Make sure the drive property use is not member.
Use the lsdrive CLI command to determine the use.
2. Record the drive ID and the drive property values of the node ID and the
slot ID for use in steps 4 and 5. The node ID and slot ID identify which
physical drive to remove.
3. Record the error sequence number for use in step 11.
4. Use the drive ID that you recorded in step 2 to set the use property
of the drive to failed and then to unused with the chdrive command.
chdrive -use failed <id of offline drive>
chdrive -use unused <id of offline drive>
The drive is removed from the drive listing.
5. Follow the physical instructions to replace or remove a drive. See the
“Replacing a SAN Volume Controller 2145-CG8 solid-state drive (SSD)”
documentation or the “Removing a SAN Volume Controller 2145-CG8
solid-state drive (SSD)” documentation to find out how to perform the
procedures.
6. A new drive object is created with the use property set to unused.
7. Change the use property for the drive to candidate.
chdrive -use candidate <id of new drive>
8. Change the use property for the drive to spare.
chdrive -use spare <id of new drive>
• If you are using spare drives, perform a member exchange. Move data from
the spare to the newly inserted device.
• If you do not have a spare, when you mark the drive object as spare, the
array starts to build on the newly inserted device.
9. If the spare is not a perfect match for the replaced drive, then the array is
considered unbalanced, and error code 1692 is recorded in the error log.
10. Follow the fix procedure to complete the procedure.
11. Mark the drive error as fixed using the error sequence number from step 3.
cherrstate -sequencenumber <error_sequence_number>
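
The CLI commands in this MAP follow a fixed order of drive use transitions. The following is a minimal sketch in Python that assembles the command strings shown in steps 4, 7, 8, and 11 so that the order can be reviewed before anything is typed. The function name raid1_replacement_commands and its parameters are hypothetical, and the sketch does not run any command or perform the physical replacement in step 5.

```python
def raid1_replacement_commands(offline_drive_id, new_drive_id, error_sequence_number):
    """Return the CLI command strings from MAP 6002 in procedure order.

    The ID of the new drive is only known after the physical replacement
    (steps 5 and 6), so the commands for steps 7 and 8 cannot be issued
    until the new drive object exists.
    """
    return [
        # Step 4: take the offline drive out of use.
        f"chdrive -use failed {offline_drive_id}",
        f"chdrive -use unused {offline_drive_id}",
        # Steps 5 and 6 (physical replacement) happen here.
        # Step 7: make the new drive a candidate.
        f"chdrive -use candidate {new_drive_id}",
        # Step 8: make the new drive a spare.
        f"chdrive -use spare {new_drive_id}",
        # Step 11: mark the drive error as fixed.
        f"cherrstate -sequencenumber {error_sequence_number}",
    ]
```

Keeping the sequence in one place makes it harder to skip the candidate state, which step 7 sets before the drive is marked as a spare in step 8.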

Appendix. Accessibility
Accessibility features help a user who has a physical disability, such as restricted
mobility or limited vision, to use software products successfully.

Features

This list includes the major accessibility features in the management GUI:
• You can use screen-reader software and a digital speech synthesizer to hear what
is displayed on the screen. The following screen reader has been tested: JAWS
11.
• Most of the GUI features are accessible by using the keyboard. For those features
that are not accessible, equivalent function is available by using the
command-line interface (CLI).
• When setting or changing an IP address on the SAN Volume Controller front
panel, you can disable the fast increase function to reduce the address scrolling
speed of the up and down buttons to two seconds. This feature is documented
in the topic that discusses initiating cluster (system) creation from the front
panel, which is located in the IBM System Storage SAN Volume Controller
Information Center and the IBM System Storage SAN Volume Controller Software
Installation and Configuration Guide.

Navigating by keyboard
You can use keys or key combinations to perform operations and initiate many
menu actions that can also be done through mouse actions. You can navigate the
management GUI and help system from the keyboard by using the following key
combinations:
• To navigate between different GUI panels, select the Low-graphics mode option
on the GUI login panel. You can use this option to navigate to all the panels
without manually typing the web addresses.
• To go to the next frame, press Ctrl+Tab.
• To move to the previous frame, press Shift+Ctrl+Tab.
• To navigate to the next link, button, or topic within a panel, press Tab inside a
frame (page).
• To move to the previous link, button, or topic within a panel, press Shift+Tab.
• To select GUI objects, press Enter.
• To print the current page or active frame, press Ctrl+P.
• To expand a tree node, press the Right Arrow key. To collapse a tree node, press
the Left Arrow key.
• To scroll all the way up, press Home; to scroll all the way down, press End.
• To go back, press Alt+Left Arrow key.
• To go forward, press Alt+Right Arrow key.
• For actions menus:
– Press Tab to navigate to the grid header.
– Press the Left or Right Arrow keys to reach the drop-down field.
– Press Enter to open the drop-down menu.
– Press the Up or Down Arrow keys to select the menu items.
– Press Enter to launch the action.
• For filter panes:
– Press Tab to navigate to the filter panes.
– Press the Up or Down Arrow keys to change the filter or navigation for
nonselection.
– Press Tab to navigate to the magnifying glass icon in the filter pane and press
Enter.
– Type the filter text.
– Press Tab to navigate to the red X icon and press Enter to reset the filter.
• For information areas:
– Press Tab to navigate to information areas.
– Press Tab to navigate to the fields that are available for editing.
– Type your edit and press Enter to issue the change command.

Accessing the publications

You can find the HTML version of the IBM System Storage SAN Volume Controller
information at the following website:

publib.boulder.ibm.com/infocenter/svc/ic/index.jsp

You can access this information using screen-reader software and a digital speech
synthesizer to hear what is displayed on the screen. The information was tested
using the following screen reader: JAWS Version 10 or later.

Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in
other countries. Consult your local IBM representative for information on the
products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product,
program, or service that does not infringe any IBM intellectual property right may
be used instead. However, it is the user's responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant you
any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

For license inquiries regarding double-byte character set (DBCS) information,
contact the IBM Intellectual Property Department in your country or send
inquiries, in writing, to:

Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
1623-14, Shimotsuruma, Yamato-shi
Kanagawa 242-8502 Japan

The following paragraph does not apply to the United Kingdom or any other
country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or
implied warranties in certain transactions, therefore, this statement may not apply
to you.

This information could include technical inaccuracies or typographical errors.
Changes are periodically made to the information herein; these changes will be
incorporated in new editions of the publication. IBM may make improvements
and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for
convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM
product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it
believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact:

IBM Corporation
Almaden Research
650 Harry Road
Bldg 80, D3-304, Department 277
San Jose, CA 95120-6099
U.S.A.

Such information may be available, subject to appropriate terms and conditions,
including in some cases, payment of a fee.

The licensed program described in this document and all licensed material
available for it are provided by IBM under terms of the IBM Customer Agreement,
IBM International Program License Agreement or any equivalent agreement
between us.

Any performance data contained herein was determined in a controlled
environment. Therefore, the results obtained in other operating environments may
vary significantly. Some measurements may have been made on development-level
systems and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been
estimated through extrapolation. Actual results may vary. Users of this document
should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of
those products, their published announcements or other publicly available sources.
IBM has not tested those products and cannot confirm the accuracy of
performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the
suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or
withdrawal without notice, and represent goals and objectives only.

This information is for planning purposes only. The information herein is subject to
change before the products described become available.

This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are
fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which
illustrate programming techniques on various operating platforms. You may copy,
modify, and distribute these sample programs in any form without payment to
IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating
platform for which the sample programs are written. These examples have not
been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or
imply reliability, serviceability, or function of these programs. The sample
programs are provided "AS IS", without warranty of any kind. IBM shall not be
liable for any damages arising out of your use of the sample programs.

If you are viewing this information softcopy, the photographs and color
illustrations may not appear.

Trademarks
IBM, the IBM logo, and ibm.com® are trademarks or registered trademarks of
International Business Machines Corp., registered in many jurisdictions worldwide.
Other product and service names might be trademarks of IBM or other companies.
A current list of IBM trademarks is available on the web under "Copyright and
trademark information" at www.ibm.com/legal/copytrade.shtml.

Adobe and the Adobe logo are either registered trademarks or trademarks of
Adobe Systems Incorporated in the United States, and/or other countries.

Intel, Intel logo, Intel Xeon, and Pentium are trademarks or registered trademarks
of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other
countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of
Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other
countries.

Java and all Java-based trademarks and logos are trademarks or registered
trademarks of Oracle and/or its affiliates.

Electronic emission notices


The following electronic emission statements apply to this product. The statements
for other products that are intended for use with this product are included in their
accompanying documentation.

Federal Communications Commission (FCC) statement


This explains the Federal Communications Commission's (FCC) statement.

This equipment has been tested and found to comply with the limits for a Class A
digital device, pursuant to Part 15 of the FCC Rules. These limits are designed to
provide reasonable protection against harmful interference when the equipment is
operated in a commercial environment. This equipment generates, uses, and can
radiate radio frequency energy and, if not installed and used in accordance with
the instruction manual, might cause harmful interference to radio communications.
Operation of this equipment in a residential area is likely to cause harmful
interference, in which case the user will be required to correct the interference at
his own expense.

Properly shielded and grounded cables and connectors must be used in order to
meet FCC emission limits. IBM is not responsible for any radio or television
interference caused by using other than recommended cables and connectors, or by
unauthorized changes or modifications to this equipment. Unauthorized changes
or modifications could void the user's authority to operate the equipment.

This device complies with Part 15 of the FCC Rules. Operation is subject to the
following two conditions: (1) this device may not cause harmful interference, and
(2) this device must accept any interference received, including interference that
might cause undesired operation.

Industry Canada compliance statement


This Class A digital apparatus complies with ICES-003.

Cet appareil numérique de la classe A est conforme à la norme NMB-003 du
Canada.

Avis de conformité à la réglementation d'Industrie Canada


Cet appareil numérique de la classe A est conforme à la norme NMB-003 du
Canada.

Australia and New Zealand Class A Statement


Attention: This is a Class A product. In a domestic environment this product
might cause radio interference in which case the user might be required to take
adequate measures.

European Union Electromagnetic Compatibility Directive


This product is in conformity with the protection requirements of European Union
(EU) Council Directive 2004/108/EC on the approximation of the laws of the
Member States relating to electromagnetic compatibility. IBM cannot accept
responsibility for any failure to satisfy the protection requirements resulting from a
non-recommended modification of the product, including the fitting of non-IBM
option cards.

Attention: This is an EN 55022 Class A product. In a domestic environment this
product might cause radio interference in which case the user might be required to
take adequate measures.

Responsible Manufacturer:

International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
914-499-1900

European community contact:

IBM Deutschland GmbH
Technical Regulations, Department M372
IBM-Allee 1, 71139 Ehningen, Germany
Tele: +49 7032 15 2941
e-mail: mailto:[email protected]

Germany Electromagnetic compatibility directive


Deutschsprachiger EU Hinweis: Hinweis für Geräte der Klasse A
EU-Richtlinie zur Elektromagnetischen Verträglichkeit

Dieses Produkt entspricht den Schutzanforderungen der EU-Richtlinie
2004/108/EG zur Angleichung der Rechtsvorschriften über die elektromagnetische
Verträglichkeit in den EU-Mitgliedsstaaten und hält die Grenzwerte der EN 55022
Klasse A ein.

Um dieses sicherzustellen, sind die Geräte wie in den Handbüchern beschrieben zu
installieren und zu betreiben. Des Weiteren dürfen auch nur von der IBM
empfohlene Kabel angeschlossen werden. IBM übernimmt keine Verantwortung für
die Einhaltung der Schutzanforderungen, wenn das Produkt ohne Zustimmung der
IBM verändert bzw. wenn Erweiterungskomponenten von Fremdherstellern ohne
Empfehlung der IBM gesteckt/eingebaut werden.

EN 55022 Klasse A Geräte müssen mit folgendem Warnhinweis versehen werden:

"Warnung: Dieses ist eine Einrichtung der Klasse A. Diese Einrichtung kann im
Wohnbereich Funk-Störungen verursachen; in diesem Fall kann vom Betreiber
verlangt werden, angemessene Maßnahmen zu ergreifen und dafür
aufzukommen."

Deutschland: Einhaltung des Gesetzes über die
elektromagnetische Verträglichkeit von Geräten

Dieses Produkt entspricht dem "Gesetz über die elektromagnetische Verträglichkeit
von Geräten (EMVG)." Dies ist die Umsetzung der EU-Richtlinie 2004/108/EG in
der Bundesrepublik Deutschland.

Zulassungsbescheinigung laut dem Deutschen Gesetz über die
elektromagnetische Verträglichkeit von Geräten (EMVG) (bzw. der
EMC EG Richtlinie 2004/108/EG) für Geräte der Klasse A

Dieses Gerät ist berechtigt, in Übereinstimmung mit dem Deutschen EMVG das
EG-Konformitätszeichen - CE - zu führen.

Verantwortlich für die Einhaltung der EMV Vorschriften ist der Hersteller:
International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
Tel: 914-499-1900

Der verantwortliche Ansprechpartner des Herstellers in der EU ist:

IBM Deutschland GmbH
Technical Regulations, Department M372
IBM-Allee 1, 71139 Ehningen, Germany
Tele: +49 7032 15 2941
e-mail: mailto:[email protected]

Generelle Informationen: Das Gerät erfüllt die
Schutzanforderungen nach EN 55024 und EN 55022 Klasse A.

Japan VCCI Council Class A statement

People's Republic of China Class A Electronic Emission Statement

International Electrotechnical Commission (IEC) statement


This product has been designed and built to comply with (IEC) Standard 950.

United Kingdom telecommunications requirements


This apparatus is manufactured to the International Safety Standard EN60950 and
as such is approved in the U.K. under approval number NS/G/1234/J/100003 for
indirect connection to public telecommunications systems in the United Kingdom.

Korean Communications Commission (KCC) Class A Statement

Russia Electromagnetic Interference (EMI) Class A Statement

Taiwan Class A compliance statement

European Contact Information


This topic contains the product service contact information for Europe.

European Community contact:


IBM Technical Regulations
Pascalstr. 100, Stuttgart, Germany 70569
Tele: 0049 (0)711 785 1176
Fax: 0049 (0)711 785 1283
Email: tjahn@de.ibm.com

Taiwan Contact Information


This topic contains the product service contact information for Taiwan.
IBM Taiwan Product Service Contact Information:
IBM Taiwan Corporation
3F, No 7, Song Ren Rd., Taipei Taiwan
Tel: 0800-016-888

Index
Numerics active status 104
adding
circuit breakers (continued)
requirements (continued)
10 Gbps Ethernet nodes 71 SAN Volume Controller
link failures 268 address 2145-CG8 37
MAP 5550 268 MAC 106 CLI
10 Gbps Ethernet card Address Resolution Protocol (ARP) 5 cluster (system) commands 75
activity LED 19 addressing service commands 76
10G Ethernet 206, 268 configuration node 5 CLI commands
2145 UPS-1U lssystem
alarm 53 displaying clustered system
circuit breakers 54
connecting 52 B properties 89
cluster (system) CLI
connectors 54 back-panel assembly
accessing 76
controls and indicators on the front SAN Volume Controller 2145-8A4
when to use 75
panel 52 connectors 24
cluster (system) commands
description of parts 54 indicators 24
CLI 75
dip switches 54 SAN Volume Controller 2145-8F2
clustered system
environment 56 connectors 31
restore 218
heat output of node 38 indicators 30
T3 recovery 218
Load segment 1 indicator 53 SAN Volume Controller 2145-8F4
clustered systems
Load segment 2 indicator 53 connectors 28
adding nodes 71
MAP indicators 28
Call Home email 128, 131
5150: 2145 UPS-1U 249 SAN Volume Controller 2145-8G4
deleting nodes 69
5250: repair verification 254 connectors 26
error codes 156
nodes indicators 26
IP address
heat output 38 SAN Volume Controller 2145-CF8
configuration node 5
on or off button 54 connectors 22
IP failover 6
on-battery indicator 53 indicators 22
IPv4 address 104
operation 51 SAN Volume Controller 2145-CG8
IPv6 address 105
overload indicator 54 connectors 20
metadata, saving 99
ports not used 54 indicators 19
options 104
power-on indicator 54 backing up
overview 5
service indicator 53 system configuration files 220
properties 89
test and alarm-reset button 54 backup configuration files
recovery codes 156
unused ports 54 deleting
removing nodes 69
using the CLI 225
restore 212
restoring 222
T3 recovery 212
A bad blocks 229
battery
codes
about this document node error
Charging, front panel display 98
sending comments xvii critical 155
power 99
ac and dc LEDs 33 noncritical 155
boot
ac power switch, cabling 49 node rescue 155
codes, understanding 154
accessibility commands
failed 97
keyboard 313 svcconfig backup 220
progress indicator 97
repeat rate svcconfig restore 222
buttons, navigation 13
up and down buttons 313 comments, sending xvii
repeat rate of up and down configuration
buttons 122 node failover 6
shortcut keys 313 C configuration event IDs 136
accessing Call Home 128, 131 configuration node 5
cluster (system) CLI 76 Canadian electronic emission notice 318 connecting
management GUI 68 charging 98 2145 UPS-1U 52
publications 313 circuit breakers connectors
service assistant 75 2145 UPS-1U 54 2145 UPS-1U 54
service CLI 77 requirements SAN Volume Controller 2145-8A4 24
action menu options SAN Volume Controller SAN Volume Controller 2145-8F2 31
front panel display 108 2145-8A4 42 SAN Volume Controller 2145-8F4 28
sequence 108 SAN Volume Controller SAN Volume Controller 2145-8G4 26
action options 2145-8G4 44 SAN Volume Controller 2145-CF8 22
node SAN Volume Controller SAN Volume Controller 2145-CG8 20
create cluster 113 2145-CF8 39

contact information deleting electronic emission notices (continued)
European 321 backup configuration files New Zealand 318
Taiwan 321 using the CLI 225 People's Republic of China 320
controls and indicators on the front panel nodes 69 Taiwan 321
2145 UPS-1U determining United Kingdom 320
alarm 53 hardware boot failure 154 emails
illustration 52 SAN problem 205 Call Home
Load segment 1 indicator 53 Deutschsprachiger EU Hinweis 319 event notifications 130
Load segment 2 indicator 53 diagnosing problems inventory information 131
on or off button 54 through error codes 125 inventory information 131
on-battery indicator 53 through event logs 125 EMC statement, People's Republic of
overload indicator 54 with SAN Volume Controller 125 China 320
power-on indicator 54 display on front panel Enter Service?
test and alarm-reset button 54 Change WWNN option 120 option 120
front-panel display 13 Enter Service? option 120 error codes 143
SAN Volume Controller Exit Actions option 122 front panel display 98
navigation buttons 13 Exit Service option 120 understanding 132
node status LED 12 IPv6 address 105 error event IDs 143
select button 13 Node WWNN 106 error events 126
SAN Volume Controller 2145-8A4 overview 13 error LED 14
illustration 11 Paced Upgrade option 121 errors
operator-information panel 15 Recover Cluster 121 logs
SAN Volume Controller 2145-8F2 Rescue Node option 122 describing the fields 127
error LED 14 Service Address 106 error events 126
illustration 12 Service DHCPv4 119 managing 127
operator information panel 17 Service DHCPv6 119 understanding 126
SAN Volume Controller 2145-8F4 Set FC Speed option 121 viewing 127
illustration 12 status indicators node 155
operator information panel 17 action menu options 108 Ethernet
SAN Volume Controller 2145-8G4 boot failed 97 activity LED 19, 32
illustration 11 boot progress 97 link failures 6, 265
operator information panel 16 charging 98 link LED 32
SAN Volume Controller 2145-CF8 error codes 98 MAP 5500 265
illustration 10 hardware boot 98 port 106
operator-information panel 15 menu options 102 European contact information 321
SAN Volume Controller 2145-CG8 node rescue request 99 European Union (EU), EMC Directive
illustration 9 power failure 99 conformance statement 318
operator-information panel 14 powering off 99 event IDs 132
status indicators recovering 100 event notifications
action menu options 108 restarting 100 inventory information email 131
boot failed 97 shutting down 100 overview 128
boot progress 97 validate WWNN? 101 events
charging 98 version 106 reporting 125
error codes 98 displaying examples
hardware boot 98 IPv6 address 105 clusters in SAN fabric 7
menu options 102 displaying vital product data 87 redundant ac power switch
node rescue request 99 documentation cabling 49
power failure 99 improvement xvii Exit Actions
powering off 99 option 122
recovering 100 Exit Service
restarting 100
shutting down 100
E option 120
electronic emission notices
create cluster
Avis de conformité à la
action option 113
create clustered system
réglementation d'Industrie F
Canada 318 fabric
error codes 156
Deutschsprachiger EU Hinweis 319 SAN overview 7
critical
European Union (EU) 318 failover, configuration node 5
node errors 155
Federal Communications Commission FCC (Federal Communications
(FCC) 317 Commission) electronic emission
French Canadian 318 notice 317
D Germany 319 Federal Communications Commission
defining FRUs Industry Canada 318 (FCC) electronic emission notice 317
for the redundant ac-power International Electrotechnical Fibre Channel
switch 65 Commission (IEC) 320 LEDs 31
for the SAN Volume Controller 57 Japanese Voluntary Control Council link failures 206
degraded status 104 for Interference (VCCI) 320 MAP 271
Korean 320 port menu option 107

Fibre Channel (continued)
port numbers 35
G indicators and controls on the front
panel (continued)
SFP transceiver 206 gateway status indicators (continued)
field replaceable units menu option 105 charging 98
redundant ac-power switch node option 115, 117 error codes 98
describing 65 Germany electronic emission compliance hardware boot 98
SAN Volume Controller statement 319 menu options 102
describing 57 node rescue request 99
disk drive assembly 57 power failure 99
disk drive cables 57 H powering off 99
Ethernet cable 57 hard-disk drive activity LED 17 recovering 100
fan assembly 57 hardware restarting 100
Fibre Channel cable 57 boot 98, 303 shutting down 100
Fibre Channel SFP transceiver 57 boot failure 154 indicators on the rear panel 32
frame assembly 57 components 9 10 Gbps Ethernet card 19
front panel 57 failure 98 ac and dc LEDs 33, 34, 35
operator-information panel 57 node 9 Ethernet
power cable assembly 57 activity LED 19, 32
service controller 57 link LED 32

fields
system board assembly 57
I Fibre Channel LEDs 31
power-supply error LED 33
description for the node vital product I/O operations, stopped 99 power, location, and system-error
data 90 identification LEDs 33
description for the system vital label, node 14 SAN Volume Controller 2145-CG8
product data 94 name 106 Ethernet activity LED 19
device 90 number 106 information
event log 127 IEC (International Electrotechnical center xiv
fibre-adapter card 90 Commission) electronic emission information, system
front panel 90 notice 320 LED 18
memory module 90 inactive status 104 informational events 132
processor 90 indicators and controls on the front panel International Electrotechnical Commission
processor cache 90 2145 UPS-1U (IEC) electronic emission notice 320
software 90 alarm 53 inventory information
system 94 illustration 52 emails 131
system board 90 Load segment 1 indicator 53 event notifications 128
uninterruptible power supply 90 Load segment 2 indicator 53 IP address
fix on or off button 54 cluster 105
errors 213 on-battery indicator 53 cluster (system) 104
French Canadian electronic emission overload indicator 54 IPv6 105
notice 318 power-on indicator 54 service 118
front panel test and alarm-reset button 54 system 105
2145 UPS-1U 52 SAN Volume Controller IPv4 address 104
action menu options 108 navigation buttons 13 IPv6
booting 123 node status LED 12 address 105
buttons and indicators 97 select button 13 gateway menu option 105
charging 123 SAN Volume Controller 2145-8A4 prefix mask menu option 105
display 13 illustration 11 iSCSI
ID 14 operator-information panel 15 link problems 207
menu options 102 SAN Volume Controller 2145-8F2
Ethernet 106 error LED 14
illustration 12
Fibre Channel port-1 through
port-4 107 operator information panel 17 J
SAN Volume Controller 2145-8F4 Japanese electronic emission notice 320
IPv4 address 104
IPv6 address 105 illustration 12
Language? 122 operator information panel 17
node 106 SAN Volume Controller 2145-8G4 K
version 106 illustration 11 keyboard
power failure 123 operator information panel 16 accessibility 313
powering off the SAN Volume SAN Volume Controller 2145-CF8 Korean electronic emission
Controller 123 illustration 10 statement 320
recovering 123 operator-information panel 15
SAN Volume Controller 97 SAN Volume Controller 2145-CG8
illustration 9
front panel display
node rescue request 226 operator-information panel 14 L
status indicators language menu selection options 122
action menu options 108 LEDs
boot failed 97 ac and dc 33, 34, 35
boot progress 97 diagnostics 279

LEDs (continued) MAP menu options (continued)
Ethernet 5000: Start 231 clusters (continued)
activity 19, 32 5050: Power SAN Volume Controller status 104
link 32 2145-CG8, 2145-CF8, 2145-8G4, Ethernet
Fibre Channel 31 2145-8F4, and 2145-8F2 238 MAC address 106
hard-disk drive activity 17 5060: Power 2145-8A4 245 port 106
location 19, 33 5150: 2145 UPS-1U 249 speed 106
power 18, 33 5250: 2145 UPS-1U repair Fibre Channel port-1 through
power-supply error 33 verification 254 port-4 107
rear-panel indicators 19, 22, 24, 26, 5320: Redundant ac power 255 front panel display 102
28, 30 5340: Redundant ac power IPv4 gateway 105
SAN Volume Controller 2145-8A4 24 verification 256 IPv6 gateway 105
SAN Volume Controller 2145-8F2 30 5400: Front panel 263 IPv6 prefix 105
SAN Volume Controller 2145-8F4 28 5500: Ethernet 265 Language? 122
SAN Volume Controller 2145-8G4 26 5550: 10 Gbps Ethernet 268 node
SAN Volume Controller 2145-CF8 22 5600: Fibre Channel 271 options 106
SAN Volume Controller 2145-CG8 19 5700: Repair verification 278 status 106
system information 18 5800: Light path 279 SAN Volume Controller
system-error 17, 33 5900: Hardware boot 303 active 104
legal notices 6000: Replace offline SSD 308 degraded 104
Notices 315 6001 Replace offline SSD in a RAID 0 inactive 104
trademarks 317 array 308 sequence 102
light path MAP 279 6002: Replace offline SSD in a RAID 1 system
link failures array or RAID 10 array 310 gateway 105
Fibre Channel 206 power off SAN Volume Controller IPv6 prefix 105
link problems node 258 status 106
iSCSI 207 MAPs (maintenance analysis procedures) message classification 157
Load segment 1 indicator 53 10 Gbps Ethernet 268
Load segment 2 indicator 53 2145 UPS-1U 249
locator LED 19
log files
2145 UPS-1U repair verification 254
Ethernet 265
N
navigation
viewing 127 Fibre Channel 271
buttons 13
front panel 263
create cluster 113
hardware boot 303
Language? 122
M light path 279
power
recover cluster 122
MAC address 106 New Zealand electronic emission
SAN Volume Controller
maintenance analysis procedures (MAPs) statement 318
2145-8A4 245
10 Gbps Ethernet 268 node
SAN Volume Controller
2145 UPS-1U 249 create cluster 113
2145-8F2 238
Ethernet 265 options
SAN Volume Controller
Fibre Channel 271 create cluster? 113
2145-8F4 238
front panel 263 gateway 117
SAN Volume Controller
hardware boot 303 IPv4 address 113
2145-8G4 238
light path 279 IPv4 confirm create? 115
SAN Volume Controller
overview 231 IPv4 gateway 115
2145-CF8 238
power IPv4 subnet mask 114
SAN Volume Controller
SAN Volume Controller IPv6 address 116
2145-CG8 238
2145-8A4 245 IPv6 Confirm Create? 117
power off 258
SAN Volume Controller IPv6 prefix 116
redundant ac power 255, 256
2145-8F2 238 Remove Cluster? 121
repair verification 278
SAN Volume Controller status 106
SSD failure 308, 310
2145-8F4 238 subnet mask 114
start 231
SAN Volume Controller rescue request 99
using 231
2145-8G4 238 software failure 238, 245
media access control (MAC) address 106
SAN Volume Controller node canisters
medium errors 229
2145-CG8 238 configuration 5
menu options
repair verification 278 node rescue
clustered system
SSD failure 308, 310 codes 155
IPv4 address 104
start 231 node status LED 12
IPv4 gateway 105
management GUI nodes
IPv4 subnet 105
accessing 68 adding 71
clustered systems
shut down a node 258 cache data, saving 99
IPv6 address 105
management GUI interface configuration 5
clusters
when to use 68 addressing 5
IPv6 address 105
managing failover 5
options 104
event log 127 deleting 69
reset password 122
failover 6

nodes (continued) panel (continued) power (continued)
hard disk drive failure 99 operator information (continued) requirements
identification label 14 SAN Volume Controller SAN Volume Controller
options 2145-8G4 16 2145-8A4 41
main 106 SAN Volume Controller SAN Volume Controller
removing 69 2145-CF8 15 2145-8F2 45
replacing nondisruptively 80 SAN Volume Controller SAN Volume Controller
rescue 2145-CG8 14 2145-8F4 45
performing 226 rear SAN Volume Controller
viewing SAN Volume Controller 2145-8G4 43
general details 88 2145-8A4 24 SAN Volume Controller
vital product data 87 SAN Volume Controller 2145-CF8 39
noncritical 2145-8F2 30 SAN Volume Controller
node errors 155 SAN Volume Controller 2145-CG8 36
not used 2145-8F4 28 restored 99
2145 UPS-1U ports 54 SAN Volume Controller switch, failure 238, 245
location LED 33 2145-8G4 26 uninterruptible power supply 123
notifications SAN Volume Controller power LED 18
Call Home information 131 2145-CF8 22 Power MAP 2145-8A4 245
inventory information 131 SAN Volume Controller Power MAP 2145-CF8, 2145-8G4,
sending 128 2145-CG8 19 2145-8F4, and 2145-8F2 238
number range 157 passwords power off
resetting 122 SAN Volume Controller 258
People's Republic of China, electronic power-supply error LED 33
O emission statement 320
physical characteristics
preparing
SAN Volume Controller
object classes and instances 142
2145 UPS-1U 56 environment 36
object codes 142
redundant ac-power switch 48 uninterruptible power supply
object types 142
SAN Volume Controller 2145-8A4 environment 56
on or off button 54
connectors 24 publications
operator information panel
SAN Volume Controller 2145-8F2 accessing 313
locator LED 19
connectors 31
SAN Volume Controller 2145-8F2 17
SAN Volume Controller 2145-8F4
SAN Volume Controller 2145-8F4 17
SAN Volume Controller 2145-8G4 16
connectors 28
SAN Volume Controller 2145-8G4
R
system-information LED 18 reader feedback, sending xvii
connectors 26
operator-information panel rear-panel indicators
SAN Volume Controller 2145-CF8
hard-disk drive activity LED 17 SAN Volume Controller 2145-8A4 24
connectors 22
power button 17 SAN Volume Controller 2145-8F2 30
service ports 23
power LED 18 SAN Volume Controller 2145-8F4 28
unused ports 24
release latch 18 SAN Volume Controller 2145-8G4 26
SAN Volume Controller 2145-CG8
reset button 17 SAN Volume Controller 2145-CF8 22
connectors 20
SAN Volume Controller 2145-8A4 15 SAN Volume Controller 2145-CG8 19
service ports 21
SAN Volume Controller 2145-CF8 15 Recover Cluster
unused ports 21
SAN Volume Controller 2145-CG8 14 option 121
port speed
system-error LED 17 recovering
Fibre Channel 107
overload indicator 54 front panel display 100
ports
overview offline virtual disks (volumes)
Ethernet 19, 32
product 1 using CLI 218
not used
redundant ac-power switch 47 offline volumes
2145 UPS-1U 54
SAN fabric 7 using CLI 79
SAN Volume Controller
vital product data 87 recovery
2145-8A4 24
system
SAN Volume Controller
when to run 212
2145-8F4 28
P SAN Volume Controller
systems
starting 216
Paced Upgrade 2145-8G4 26
redundant ac power switch
option 121 port names, worldwide 35
cabling 49
panel port numbers, Fibre Channel 35
examples 49
front 13 SAN Volume Controller 2145-CF8 22
redundant ac-power switch
name 14 SAN Volume Controller 2145-CG8 20
environment preparation 48
operator information POST (power-on self-test) 126
field replaceable units 65
SAN Volume Controller power
MAP 255, 256
2145-8A4 15 button 17
overview 47
SAN Volume Controller controls 123
problems 255
2145-8F2 17 failure 99
specifications 48
SAN Volume Controller off
verifying 256
2145-8F4 17 operation 99
related information xiv

release latch 18 SAN Volume Controller (continued) SAN Volume Controller 2145-8F2
removing field replaceable units (continued) (continued)
550 errors 213, 214 microprocessor 57 dimensions and weight 45
578 errors 213, 214 operator-information panel 57 heat output 45
node from a cluster 121 power backplane 57 humidity 45
nodes 69 power cable assembly 57 indicators and controls on the front
Repair verification MAP 278 power supply assembly 57 panel 12
repairing riser card, PCI 57 light path MAP 298
space-efficient volume 78 riser card, PCI Express 57 MAP 5800: Light path 298
replacing nodes service controller 57 operator information panel 17
nondisruptively 80 service controller cable 57 product characteristics 45
reporting system board 57 rear-panel indicators 30
events 125 thermal grease 57 specifications 45
requirements voltage regulator module 57 weight and dimensions 45
2145 UPS-1U 51 front-panel display 97 SAN Volume Controller 2145-8F4
ac voltage 36, 37, 39, 40, 41, 42, 43, hardware 1 air temperature 45
44 hardware components 9 connectors 28
circuit breakers 37, 39, 42, 44 menu options controls and indicators on the front
electrical 36, 39, 41, 43 Language? 122 panel 12
power 36, 39, 41, 43 node 106 dimensions and weight 45
SAN Volume Controller 2145-8A4 41 node 9 heat output 45
SAN Volume Controller 2145-8G4 43 overview 1 humidity 45
SAN Volume Controller 2145-CF8 39 power control 123 indicators and controls on the front
SAN Volume Controller 2145-CG8 36 power off 258 panel 12
Rescue Node power-on self-test 126 light path MAP 298
option 122 preparing environment 36 MAP 5800: Light path 298
rescue nodes properties 88 operator information panel 17
performing 226 software product characteristics 45
reset button 17 overview 1 rear-panel indicators 28
reset password menu option 122 SAN Volume Controller 2145-8A4 specifications 45
navigation 122 additional space requirements 43 weight and dimensions 45
resetting the password 122 air temperature without redundant ac SAN Volume Controller 2145-8G4
resetting passwords 122 power 42 additional space requirements 45
restore circuit breaker requirements 42 air temperature without redundant ac
system 211, 218 connectors 24 power 44
controls and indicators on the front circuit breaker requirements 44
panel 11 connectors 26
S dimensions and weight 43
heat output of node 43
controls and indicators on the front
panel 11
SAN (storage area network)
humidity with redundant ac dimensions and weight 45
fabric overview 7
power 42 heat output of node 45
problem determination 205
humidity without redundant ac humidity with redundant ac
SAN Volume Controller
power 42 power 44
2145 UPS-1U 52
indicators and controls on the front humidity without redundant ac
action options
panel 11 power 44
create cluster 113
input-voltage requirements 41 indicators and controls on the front
field replaceable units
light path MAP 292 panel 11
4-port Fibre Channel adapter 57
MAP 5800: Light path 292 input-voltage requirements 43
40×40×28 fan 57
nodes light path MAP 294
40×40×56 fan 57
heat output 43 MAP 5800: Light path 294
alcohol wipe 57
not used, service ports 24 nodes
CMOS battery 57
operator-information panel 15 heat output 45
disk backplane 57
ports 24 not used, service ports 26
disk controller 57
power requirements for each operator information panel 16
disk drive assembly 57
node 41 power requirements for each
disk drive cables 57
product characteristics 41 node 43
disk power cable 57
rear-panel indicators 24 product characteristics 43
disk signal cable 57
requirements 41 rear-panel indicators 26
Ethernet cable 57
specifications 41 requirements 43
fan assembly 57
temperature with redundant ac specifications 43
fan power cable 57
power 42 temperature with redundant ac
Fibre Channel adapter
weight and dimensions 43 power 44
assembly 57
SAN Volume Controller 2145-8F2 weight and dimensions 45
Fibre Channel cable 57
air temperature 45 SAN Volume Controller 2145-CF8
Fibre Channel HBA 57
connectors 31 additional space requirements 40
frame assembly 57
controls and indicators on the front air temperature without redundant ac
front panel 57
panel 12 power 39
memory module 57

SAN Volume Controller 2145-CF8 (continued)
circuit breaker requirements 39
connectors 22
controls and indicators on the front panel 10
dimensions and weight 40
heat output of node 41
humidity with redundant ac power 40
humidity without redundant ac power 39
indicators and controls on the front panel 10
input-voltage requirements 39
light path MAP 286
MAP 5800: Light path 286
nodes
heat output 41
operator-information panel 15
ports 22
power requirements for each node 39
product characteristics 39
rear-panel indicators 22
requirements 39
service ports 23
specifications 39
temperature with redundant ac power 40
unused ports 24
weight and dimensions 40
SAN Volume Controller 2145-CG8
additional space requirements 38
air temperature without redundant ac power 37
circuit breaker requirements 37
connectors 20
controls and indicators on the front panel 9
dimensions and weight 38
heat output of node 38
humidity with redundant ac power 37
humidity without redundant ac power 37
indicators and controls on the front panel 9
input-voltage requirements 36
light path MAP 280
MAP 5800: Light path 280
nodes
heat output 38
operator-information panel 14
ports 20
power requirements for each node 36
product characteristics 36
rear-panel indicators 19
requirements 36
service ports 21
specifications 36
temperature with redundant ac power 37
unused ports 21
weight and dimensions 38
SAN Volume Controller library
related publications xiv
self-test, power-on 126
sending
comments xvii
serial number 13
service
actions, uninterruptible power supply 51
service address
navigation 118
options 118
Service Address
option 106
service assistant
accessing 75
interface 74
when to use 74
service CLI
accessing 77
when to use 76
service commands
CLI 76
service controller
replacing
validate WWNN 101
Service DHCPv4
option 119
Service DHCPv6
option 119
service ports
SAN Volume Controller 2145-CF8 23
SAN Volume Controller 2145-CG8 21
Set FC Speed
option 121
shortcut keys
accessibility 313
keyboard 313
shutting down
front panel display 100
SNMP traps 128
software
failure, MAP 5050 238
failure, MAP 5060 245
overview 1
version
display 106
space requirements
SAN Volume Controller 2145-8A4 43
SAN Volume Controller 2145-8G4 45
SAN Volume Controller 2145-CF8 40
SAN Volume Controller 2145-CG8 38
specifications
redundant ac-power switch 48
speed
Fibre Channel port 107
Start MAP 231
starting
clustered system recovery 215
system recovery 216
T3 recovery 215
status
active 104
degraded 104
inactive 104
operational 104, 106
storage area network (SAN)
fabric overview 7
problem determination 205
storage systems
restore 211
servicing 208
subnet
menu option 105
subnet mask
node option 114
summary of changes xi, xii
switches
2145 UPS-1U 54
redundant ac power 47
syslog messages 128
system
backing up configuration file using the CLI 220
diagnose failures 106
IPv6 address 105
restoring backup configuration files 222
system-error LED 17

T
T3 recovery
removing
550 errors 213, 214
578 errors 213, 214
restore
clustered system 211
starting 215
what to check 218
when to run 212
Taiwan
contact information 321
electronic emission notice 321
test and alarm-reset button 54
trademarks 317
troubleshooting
event notification email 128, 131
SAN failures 205
using error logs 98
using the front panel 97

U
understanding
clustered-system recovery codes 156
error codes 132, 156
event log 126
fields for the node vital product data 90
fields for the system vital product data 94
node rescue codes 155
uninterruptible power supply
2145 UPS-1U
controls and indicators 52
environment 56
operation 51
overview 51
front panel MAP 263
operation 51
overview 51

uninterruptible power supply (continued)
preparing environment 56
United Kingdom electronic emission notice 320
unused ports
2145 UPS-1U 54
SAN Volume Controller 2145-8A4 24
SAN Volume Controller 2145-8F4 28
SAN Volume Controller 2145-8G4 26
SAN Volume Controller 2145-CF8 24
SAN Volume Controller 2145-CG8 21
using
CLI 77
error code tables 132
GUI interfaces 67
management GUI 67
service assistant 74

V
validating
volume copies 77
VDisks (volumes)
recovering from offline
using CLI 218
viewing
event log 127
vital product data (VPD)
displaying 87
overview 87
understanding the fields for the node 90
understanding the fields for the system 94
viewing
nodes 87
volume copies
validating 77
volumes
recovering from offline
using CLI 79
volumes (VDisks)
recovering from offline
using CLI 218
VPD (vital product data)
displaying 87
overview 87
understanding the fields for the node 90
understanding the fields for the system 94

W
when to use
cluster (system) CLI 75
management GUI interface 68
service assistant 74
service CLI 76
worldwide node names
change 120
choose 101
display 106
node, front panel display 106, 120
validate, front panel display 101
worldwide port names (WWPNs)
description 35





Part Number: 31P1670

Printed in USA


GC27-2284-03
