Dyk 5 27
wes5tools
1/30/2012
Initial Release
6840-1456
6841-2456
6841-6456 6841-7456
4884 4884
4884 4884
4884 4884
588X 588X
Storage Upgrades
6840 (WES 5.x) may be upgraded from SYM 7.15 to SYM 8.37 (WES 5.4)
A software and firmware upgrade
6840 (WES 5.4) is not field upgradeable to 6841 (NS 5.5 or 6.0)
Would require installing a new 2 Gbit drive tray chassis
Storage Co-existence
The following arrays may coexist in a Teradata system, but not within a clique:
WES 3.x (SCSI Quad-array)
Require Raid Manager 5 on nodes.
Note: SYMplicity AWS SW must be at the same level as the latest version of SYMplicity SW in the system.
Figure: drive tray rear view (ESM-B, X10 and X1 tray number switches, FC-AL ports, fan FRUs).
Four Fibre Channel loops connect the four drive trays to the controllers. Each drive tray is connected to two loops, and each loop controls half of the drives.
8 Mini-hubs (4 host side, 4 drive side). The mini-hub number matches the drive loop (channel) number.
Figure: Controller Module drive-side mini-hubs for Drive Loops 4, 3, 2, 1, each with IN and OUT ports.
Figure: drive-side cabling. The ESM A and ESM B port bypass circuits (PBC) of each drive tray connect through SFPs (IN/OUT) to Mini-Hubs CH 1 through CH 4 (SFPs or GBICs). Each mini-hub loop (CH1 through CH4) connects to TachyonTL chips 0 through 3 on Controller A and Controller B. TachyonTL chips 4 and 5 are also shown on each controller.
Figure: host-side cabling. The Controller Module host ports (IN/OUT) connect to HA1 and HA2 on Nodes 1 through 4 and continue to Arrays 2, 3, 4.
Figure: host-side detail. TachyonTL chips on Controller A and Controller B, Comm Board (RS232, ETH), Mini-Hubs A1, A2, B1, B2 with GBIC IN/OUT ports, and two FC HBAs in each of Nodes 1 through 4.
Controller A
0x01
ESM Controller B
0xe4
14
.. .
11
0x1f 0x71
0x2c
0x66 0x4a
12
13
0x23
11
0x6e 0x36
10
ESM
0xcd 0xca
4
Node 1
Node 3
...
0xcc 0xcb
2 3
Drive Tray
1/30/2012 NCR Proprietary & Confidential 16
Loop Initialization
Loop Initialization Primitive (LIP) is the process used to initialize the loop and assign an AL_PA to each device. Loop initialization occurs at power-up, when any device detects a loop failure (loss of signal synchronization at its receiver), or when a device is inserted or removed. A LIP signal can be sent by one or many devices, depending upon the cause. The LIP will propagate around the loop, triggering all other devices to transmit LIP as well. At this point the loop is not usable (for milliseconds). A second series of signals selects a loop master to control the AL_PA assignments. If a fabric port is on the loop it will become the master; otherwise the device with the numerically lowest port name (WWN) will be selected.
The last step builds a list of all devices and their assigned AL_PAs; the complete list is then sent to all devices.
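The loop master selection rule described above can be sketched as follows. This is an illustrative model only, not controller or HBA firmware: the LoopPort class and the port labels are hypothetical, and the WWN values are borrowed from the lsiUtil example later in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopPort:
    label: str               # hypothetical label, for display only
    wwn: int                 # port name (WWN) as an integer
    is_fabric: bool = False  # True for a fabric (switch) port


def select_loop_master(ports):
    """Pick the loop master: a fabric port if present, else the lowest WWN."""
    if not ports:
        raise ValueError("loop has no ports")
    fabric_ports = [p for p in ports if p.is_fabric]
    candidates = fabric_ports or ports
    return min(candidates, key=lambda p: p.wwn)


ports = [
    LoopPort("node-a", 0x200000062B062884),
    LoopPort("node-b", 0x200000062B062154),
    LoopPort("array", 0x200200A0B80CF4CD),
]
print(select_loop_master(ports).label)
```

With no fabric port on the loop, the port with the numerically lowest WWN is chosen; adding a port with is_fabric=True makes that port the master regardless of its WWN.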
Loop Arbitration
When a device is ready to transmit data, it must first arbitrate and gain control of the Loop. It does this by transmitting the Arbitrate (ARBx) signal, where x = the AL_PA of the device. If the device receives its own ARBx signal, it gains control of the Loop. If, however, a higher priority (lower AL_PA) node wishes to use the loop, it discards the lower priority ARBx and replaces it with its own. Since the original node does not see its own signal returning, it cannot win arbitration; instead it passes on the higher priority ARBx signal. After a device wins arbitration it transmits an Open (OPN) signal to a destination device, thus opening a point-to-point communication. Once a device relinquishes control of the Loop, the other devices will again have a chance to arbitrate. An Access Fairness Algorithm prohibits a device from arbitrating again until all other devices have had a chance to arbitrate.
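A minimal sketch of the arbitration priority and fairness rules above, assuming all listed devices want the loop at the same time. The function names are hypothetical and the model ignores signal timing; it only shows the order in which devices would win.

```python
def arbitrate(requesting_alpas):
    """Return the winning AL_PA: the lowest value has the highest priority."""
    if not requesting_alpas:
        return None
    return min(requesting_alpas)


def fairness_window(requesting_alpas):
    """Order in which devices win the loop within one fairness window.

    A device that has won may not arbitrate again until every other
    requesting device has had its turn.
    """
    pending = set(requesting_alpas)
    order = []
    while pending:
        winner = arbitrate(pending)
        order.append(winner)
        pending.discard(winner)
    return order


# AL_PA values taken from the host-side examples in this deck.
print([hex(a) for a in fairness_window([0xE8, 0x01, 0xEF])])
```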
c700t0d1
The virtual RAID controller has a target ID which is the addressing mechanism for identifying the array.
c700t0d2
c700t0d3
c700t1d1
c700t1d2
DAMC 3
c700t0d5
c700t1d3 c700t1d4
Use the Controller > View Associated Components selection to display current ownership.
A1
GBIC
A2
B1
B2 Switch Switch
Node
Node
Node
Failover
If mppd has a problem with a path, it will mark that path as "down". If all paths from a node to a controller are bad and marked as "down", mppd will fail that controller and reassign volume ownership (move all the LUNs over) to the surviving controller. The mppd driver does not hold a failed controller in reset:
With SYMplicity 8 the controller will be Offline.
With SYMplicity 7 the controller will be Online/Passive.
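The failover rule above can be sketched as a small decision function. This is a model of the described behavior, not mppd code: the function names, the dictionary layout, and the path names are assumptions made for illustration.

```python
def controllers_to_fail(path_states):
    """Return controllers whose every path from this node is marked down.

    path_states maps controller -> {path_name: "up" or "down"}.
    """
    failed = []
    for controller, paths in path_states.items():
        if paths and all(state == "down" for state in paths.values()):
            failed.append(controller)
    return sorted(failed)


def reassign_volumes(owners, failed, survivor):
    """Move every LUN owned by the failed controller to the survivor."""
    return {lun: (survivor if ctrl == failed else ctrl)
            for lun, ctrl in owners.items()}


paths = {"A": {"c100t0": "down"}, "B": {"c220t0": "up"}}
owners = {0: "A", 1: "B", 2: "A"}
print(controllers_to_fail(paths))
print(reassign_volumes(owners, "A", "B"))
```

A controller with at least one path still up is never failed, which matches the later statement that single path failures in a switched configuration do not cause a controller failover.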
In addition there is a utility for the LSI Host Bus Adapter (HBA): lsiUtil - General purpose HBA utility and information. The node utilities do not have visibility to the drive side fibre paths in the array; use SYMplicity to debug drive side problems.
mppUtil
Use mppUtil -g [target_ID] to view the driver's internal information about an array.
# /opt/mpp/bin/mppUtil -g1        (c700t1 = DAMC101-3)
MPP Information:
---------------
  ModuleName: DAMC101-3
  VirtualTargetID: 0x001
  ObjectCount: 0x000
  WWN: 600a0b80000f675c000000004095f6eb
  ModuleHandle: none
Controller 'A' Status:
----------------------
  ControllerHandle: none
  UTMLunExists: Y (031)
  NumberOfPaths: 1
  Path #1
  --------
  DirectoryVertex: none
Controller 'B' Status:
----------------------
  ControllerHandle: none
  UTMLunExists: Y (031)
  NumberOfPaths: 1
  Path #1
  --------
  DirectoryVertex: none
Lun Information
N N N N N
Y N N N N
Present: Y Failed: N
Y N N N N
Present: Y Failed: N
mppUpdate
This utility updates the MPP driver configuration file /etc/conf/pack.d/mppd/space.c
mppUpdate
Options
-v  Set verbose output.
-c  Clear all array devices from the file.
-d module_name  Remove the specified array from the file.
Virtual array target IDs (t# in c700t_) are arbitrarily selected by the MPP driver and made persistent by the mppUpdate utility. This utility is automatically run on the next reboot following the installation of the MPPD package. This utility must be manually run whenever arrays are added or removed. New arrays are added to the end of the list and assigned the next available t#.
A kernel rebuild and reboot is required whenever changes (manual or by mppUpdate) are made to the space.c file. The list and order of array names in the space.c file must be identical on all nodes within a clique!
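The ordering rules above (new arrays go to the end of the list, and the list must be identical on every node in a clique) can be modeled as below. This does not read or write space.c; the function names and node names are hypothetical, and the sketch assumes the t# equals the position of the array in the list starting at 0, which is consistent with the c700t0 = DAMC101-2 and c700t1 = DAMC101-3 examples in this deck.

```python
def append_new_arrays(persisted, discovered):
    """Keep existing entries in place and add unknown arrays at the end."""
    updated = list(persisted)
    for name in discovered:
        if name not in updated:
            updated.append(name)
    return updated


def target_names(array_list):
    """Map each array name to its virtual target (assumed: t# = list index)."""
    return {name: f"c700t{index}" for index, name in enumerate(array_list)}


def nodes_out_of_sync(lists_by_node):
    """Return nodes whose array list differs from the first node's list."""
    nodes = sorted(lists_by_node)
    if not nodes:
        return []
    reference = lists_by_node[nodes[0]]
    return [node for node in nodes[1:] if lists_by_node[node] != reference]


arrays = append_new_arrays(["DAMC101-2", "DAMC101-3"], ["DAMC101-3", "DAMC102-2"])
print(target_names(arrays))
print(nodes_out_of_sync({
    "node1": ["DAMC101-2", "DAMC101-3"],
    "node2": ["DAMC101-3", "DAMC101-2"],
}))
```

A check like nodes_out_of_sync is useful before the kernel rebuild, because a different order on one node would give the same array a different t# on that node.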
Array Name
The Array Name is defined through the SYMplicity AMW on each array: Storage Array > Rename
(CLI - set storageArray userLabel=name)
The name is stored on the array and read by MPPD from the array. MPPD does not use the emwdata.bin file to determine array names. The name is cleared by a sysWipe.
mppProbe
This utility runs during node bootup as part of the /etc/rc1.d/S14rdacProbe startup script. This utility probes the physical device addresses for array devices. When it finds an array or data volumes that are not attached to the MPP driver, it will attach them. This utility also creates UTM nodes.
mppProbe [-a] [-u] [-k]
Options
-a  Attach newly discovered arrays and volumes.
-u  Create UTM nodes; existing UTMs are removed.
-k  Keep existing UTM node entries.
UTM LUNs
For each controller there is a UTM LUN (LUN 31). UTM LUNs are used by the SMAgent software to talk to the controller across the fibre path for array management purposes. The UTM LUN device name follows the physical SCSI HBA name, example - /dev/utm/c220t0d1fs0.
c100 = port 0 on the first HBA
c101 = port 1 on the first HBA
c220 = port 0 on the second HBA
c221 = port 1 on the second HBA
The physical name is also seen in the output of many of the utilities to identify a physical port connection to the array.
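A small helper for splitting these device names into their parts is sketched below. It is not part of any NCR or LSI package. Two points are assumptions inferred from the examples in this deck rather than documented facts: the target and LUN fields are treated as hexadecimal (the UTM LUN 31 appears as d1f in /dev/utm/c220t0d1fs0), and c700 is taken to be the mppd device. The HBA lookup covers only the four names listed above.

```python
import re

# c<controller>t<target>d<lun>[s<slice>]; target and LUN assumed hexadecimal.
_DEVICE = re.compile(r"^c(\d+)t([0-9a-f]+?)d([0-9a-f]+)(?:s(\d+))?$")

# Only the four physical names given in this deck.
HBA_PORTS = {
    "c100": ("first HBA", 0),
    "c101": ("first HBA", 1),
    "c220": ("second HBA", 0),
    "c221": ("second HBA", 1),
}


def parse_device(name):
    """Split a device name such as c220t0d1fs0 into its fields."""
    match = _DEVICE.match(name)
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    ctrl, target, lun, slice_no = match.groups()
    controller = f"c{ctrl}"
    return {
        "controller": controller,
        "target": int(target, 16),
        "lun": int(lun, 16),
        "slice": None if slice_no is None else int(slice_no),
        "mppd": controller == "c700",
        "hba_port": HBA_PORTS.get(controller),
    }


print(parse_device("c220t0d1fs0"))
print(parse_device("c700t1d3"))
```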
lsiUtil
The LSI Fibre Channel adapter has a utility that will display various status and configuration information about the adapter.
/opt/lsiUtil/lsiUtil options [device]
Some more common options:
-a = Show All Info
-f = Show FW Info
-l = List Attached Devices
-m = Show Manufacturing Info
-r = Reset FC Link Stats
-s = Show FC Link Stats
-u = Show IO Unit Info
-R = Issue A Hard Chip Reset
-P = Show ALPA Loop Position Map (summary of l)
-V File = Show Version Of Firmware File
device = Device to query (c100tfd0s0); if omitted, all LSI ports are queried
lsiUtil Example
/opt/lsiUtil/lsiUtil -f -m -P -u -s c220
c220:
FW Info (-f)
  Running Firmware Version: 2.00.06
  Flashed Firmware Version: LSIFC929-2.00.06 (2003.06.02)
  BIOS Version: BLANK
  Chip Name: LSIFC929
  Chip Revision: B.0
Manufacturing Info (-m)
  Board Name: 7004G2-LC
  Board Assembly: 03-00010-01A
  Board Tracer Number: 4337174302
ALPA positions (-P)
  Active ALPAs: 3
  0:0xef 1:0xe4 2:0xe8
Mapped paths (-u)
  Mapped Paths: ONE
FC Link Status (-s)
  TimeSinceReset: 608807393314 Microseconds
  Tx Frame Count: 0x7704
  Rx Frame Counts: 0xdb8a
  Tx Word Counts: 0xfd41a
  Rx Word Counts: 0x3b5e80
  LIP Count: 0x1
  NOS Count: 0x0
  Error Frame Counts: 0x0
  Dumped Frame Counts: 0x0
  Link Failure Count: 0x0
  Loss of Sync Count: 0x1a
  Loss of Signal Count: 0x1a
  Primative Seq Err Count: 0x0
  Invalid Tx Word Count: 0x0
  Invalid Crc Count: 0x0
  Initiator IO Count: 0x5758
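The link statistics are reported in hexadecimal, which makes it hard to compare two captures by eye. The sketch below converts the counters to integers and reports which ones changed between captures. It assumes each counter is on its own line in the form "Name: 0x...", as in the example above; lines in any other form are skipped. The function names are hypothetical.

```python
import re

# Matches lines such as "Loss of Sync Count: 0x1a" (assumed layout).
_COUNTER = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?):\s*(0x[0-9a-fA-F]+)\s*$")


def parse_link_stats(text):
    """Return {counter name: integer value} for every hex counter line."""
    stats = {}
    for line in text.splitlines():
        match = _COUNTER.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2), 16)
    return stats


def changed_counters(before, after):
    """Return {name: increase} for counters that differ between captures."""
    return {name: after[name] - before.get(name, 0)
            for name in after if after[name] != before.get(name, 0)}


sample = """\
TimeSinceReset: 608807393314 Microseconds
LIP Count: 0x1
Link Failure Count: 0x0
Loss of Sync Count: 0x1a
Invalid Crc Count: 0x0
"""
stats = parse_link_stats(sample)
print(stats["Loss of Sync Count"])
```

Taking one capture after lsiUtil -r and a second one some time later, then passing both to changed_counters, shows only the counters that are still incrementing.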
X X X X X X
X X X X X X X X * X
rdac
X X X *
To run SYMplicity on a Windows AWS: Start > Programs > SYMplicity Storage Manager Client
SM Client
Directly Managed
Private LAN
SM Client
MPPD
Array Controllers
Ctrl. Firmware
Ctrl. Firmware
The configuration is contained in the two data files, emwdata.bin and emwback.bin.
For MP-RAS they are in the /var/opt/SM directory. For Windows they are in the \Program Files\SM8\client\data folder.
EMW Tips
Use Tools > Rescan on a Host that has unresponsive arrays.
The Tools > Update Monitor selection is available only when the Event Monitor is NOT synchronized with the management software.
Disconnect the customer LAN before running Auto Discovery; this prevents a lengthy scan and keeps CLAN node names from appearing in the tree. Alternatively, use Add Devices to specify the arrays manually.
Volume: Change - modification priority media scan settings Properties Drive: Locate Hot Spare - assign, unassign Fail Reconstruct Properties
Advanced: Help: Download - ESM or Drive FW Contents Capture State Information About Reset Controller
SMdevices
SMdevices (UNIX only, part of the SMutil pkg) displays the association between mppd physical device (c700) and SYMplicity volume name.
# SMdevices
SYMplicity Storage Manager for NCR Devices, Version 08.37.53.01
Built Wed Mar 12 08:12:39 CST 2003
Copyright (C) LSI Logic Corp 2002. All rights reserved.
/dev/rdsk/c700t0d0s0
/dev/rdsk/c700t0d1s0
/dev/rdsk/c700t0d2s0
/dev/rdsk/c700t0d3s0
/dev/rdsk/c700t0d4s0
/dev/rdsk/c700t0d5s0
/dev/rdsk/c700t0d6s0
. .
/dev/rdsk/c700t1d0s0
/dev/rdsk/c700t1d1s0
/dev/rdsk/c700t1d2s0
/dev/rdsk/c700t1d3s0
/dev/rdsk/c700t1d4s0
/dev/rdsk/c700t1d5s0
/dev/rdsk/c700t1d6s0
/dev/rdsk/c700t1d7s0
. .
0, 1, 2, 3, 4, 5, 6,
0, 1, 2, 3, 4, 5, 6,
0, 1, 2, 3, 4, 5, 6, 7,
0, 1, 2, 3, 4, 5, 6, 7,
Retrieve Event Log (MEL) Redistribute volumes back to preferred paths Retrieve and set RLS error counts Add devices or perform automatic discovery of arrays
Controller States
There is NO Passive state with SYMplicity 8.xx.
The valid states for a controller are: Online (active) or Offline.
The selections for changing the state of the controller are under the Controller menu.
Remote Volume Mirroring - Disabled Snapshot Volume - Disabled Storage Partitioning - Disabled & 0/0 Allowed/Used
SMcli
SMcli is a command line interface utility that provides access to the SYMplicity Script Engine commands. SMcli functions the same on MP-RAS and Windows. Commands are sent to the desired array by specifying either both controllers' IP addresses or the name of the array.
SMcli <ctrl_A IP> <ctrl_B IP> -c <command>;
SMcli -n <array_name> -c <command>;
The arrays, AWS and all nodes are connected to the PLAN. Therefore, SMcli can be executed from the AWS or from any node.
SMcli Bugs
There are some bugs in JRE 1.2.2 (within the SMruntime package) that can cause the SMcli command to hang or fail with error code 12.
Bug 1 - SMcli command may fail if only one IP address is specified to an array.
Example: SMcli <ctrl_A IP> -c <command>;
Bug 2 - SMcli command may fail if access is directed through a host agent.
Example: SMcli <host_agent IP> -n <array_name> -c <command>;
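When SMcli is driven from a script, the two supported addressing forms can be enforced before the command is run. The helper below is a hypothetical sketch that only builds the argument list (it does not execute SMcli); it rejects a single IP address so that the Bug 1 form is never generated.

```python
def smcli_args(command, controller_ips=None, array_name=None):
    """Build an SMcli argument list using both IPs or the array name."""
    if not command.endswith(";"):
        command += ";"  # script engine commands end with a semicolon
    if controller_ips is not None:
        if len(controller_ips) != 2:
            raise ValueError("specify both controller IP addresses")
        return ["SMcli", controller_ips[0], controller_ips[1], "-c", command]
    if array_name:
        return ["SMcli", "-n", array_name, "-c", command]
    raise ValueError("specify controller_ips or array_name")


print(smcli_args("reset storageArray volumeDistribution",
                 array_name="DAMC101-2"))
print(smcli_args("reset storageArray volumeDistribution",
                 controller_ips=["10.5.102.21", "10.5.102.22"]))
```

The returned list can be passed to a process launcher without shell quoting; the IP addresses and array name here are the example values used elsewhere in this deck.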
To add a new array to a Host Agent Attached path (FC) specify the node name:
SMcli -A SMP004-7
Rename an Array
Use caution. Use set storageArray userLabel to change the name of the array; this is the same as Storage Array > Rename from the GUI.
SMcli -n DAMC102-2 -c set storageArray userLabel=\"MDA102-2\";
Note: If you change the array name using SMcli, the emwdata.bin file will not be updated. You must also use SMcli -A to add the array to the configuration.
If you are using SMcli to repair a problem and rename the array back to an original name, use the IP addresses to bypass the configuration file.
SMcli 10.5.102.21 10.5.102.22 -c set storageArray userLabel=\"DAMC102-2\";
Note: the backslashes and quotes are required - \"array_name\"
*Important: The start drive reconstruct command will be available with AP 5.43.xx.xx (Sonoran 4.3) NS 6.1 release in 2H04. Do NOT use set drive operationalState=optimal because it will REVIVE the drive. Current release (AP 5.37) requires you to use the GUI to properly replace a drive and begin reconstruction (normally it should automatically start after drive replacement).
Redistribute Volumes
Syntax:
reset storageArray volumeDistribution
Example:
SMcli -n DAMC101-2 -c reset storageArray volumeDistribution;
wes5tools
(MP-RAS only)
GSC, Engineering and LSI have worked together to enhance the SMcli command set to encompass the wes5tools functionality. Thus, for the next release (NS 6.1), some wes5tools scripts will still work and others will not.
Controller Shell
AWS GUI - use the Connect function.
Telnet - from the AWS, telnet to the CMIC using the port # for the controller.
Example: telnet CMIC001-1 12021
Where: DAMC 21 = 12021 DAMC 22 = 12022 DAMC 31 = 12031 DAMC 32 = 12032
This works only if Server Management is cabled properly. Use the portShow command from the CMIC to verify.
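The four port numbers listed above follow a simple pattern: 12000 plus the DAMC number. The helper below encodes that pattern. Only the four values shown on this slide are confirmed; applying the same rule to other DAMC numbers is an assumption, and the function name is hypothetical.

```python
def cmic_telnet_port(damc_number):
    """Return the CMIC telnet port for a controller, e.g. DAMC 21 -> 12021."""
    if not 1 <= damc_number <= 99:
        raise ValueError("expected a two-digit DAMC controller number")
    return 12000 + damc_number


for damc in (21, 22, 31, 32):
    print(f"DAMC {damc} = telnet port {cmic_telnet_port(damc)}")
```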
rlogin [IP address or name of controller]
From an MP-RAS node using rshfa, example:
# rshfa DAMC101-21 fcDevs 4
Note: rlogin or rshfa will cause any previously open sessions via SLAN/RS-232 to close. Always exit shell by typing ~
<cmd> The command can be a simple command with no arguments like "fcAll" or it can have multiple arguments like "fcDevs 4".
-> arrayPrintSummary
05/12/04-16:16:14 (GMT) (tShell): NOTE: RDAC Mode is Dual-Active. Controllers synchronized.
05/12/04-16:16:14 (GMT) (tShell): NOTE: Controller Mode Active. SCSI IDs (host 60, drive 7).
05/12/04-16:16:14 (GMT) (tShell): NOTE: 8 Volumes Owned = {0, 2, 4, 6, 8, 10, 12, 14}
05/12/04-16:16:14 (GMT) (tShell): NOTE: Alt. Ctrl. Mode Active (Present/Not Failed). SCSI IDs (host 60, drive 6).
05/12/04-16:16:14 (GMT) (tShell): NOTE: 8 Volumes Owned = {1, 3, 5, 7, 9, 11, 13, 15}
The example above executed from a node using rshfa: # rshfa DAMC101-21 arrayPrintSummary
The fc 12 command will display the output of the moduleList and arrayPrintSummary together.
fcAll (Tick 0025561080) ==> 05/12/04-14:44:01 (GMT)
4884-A
                 Our   Num   ::...Exchange Counts...::  Num   ..Link Up..
Chip   LinkStat  Port  Port  ::                     ::  Link  Bad   Bad
                 ID    Logi  ::Open  Total   Errors ::  Down  Char  Frame
0-Dst  Up-Loop   1     20    :: 1    928630  8      ::  2     0     0
1-Dst  Up-Loop   1     20    :: 1    916945  8      ::  2     0     0
2-Dst  Up-Loop   1     20    :: 0    832819  8      ::  2     0     0
3-Dst  Up-Loop   1     20    :: 0    837783  8      ::  2     0     0
4-Src  Up-Loop   1     3     :: 0    17402   7      ::  5     0     4
5-Src  Up-Loop   e8    3     :: 0    17081   7      ::  5     0     4
Errors (Exchange Counts) = # of abnormally terminated exchanges
Loop = private loop
Fab = FC Switch connect
NoHub = No mini-hub installed
Dst = channels to drive trays
Src = channels to nodes
Link Down = # of times the link had to be initialized (always a minimum of 1)
Bad Char / Bad Frame = errors while the link was up; if a port shows 100s more, or multiples more, than the other ports, there is a problem
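The "100s or multiples more than the other ports" rule can be applied mechanically once the counts for each chip are written down. The sketch below is one possible reading of that rule, not an NCR or LSI algorithm: the thresholds (a factor of 10 and an excess of 100) and the function name are assumptions chosen for illustration.

```python
from statistics import median


def suspect_channels(error_counts, factor=10, min_excess=100):
    """Return channels whose error count stands far above the others.

    error_counts maps channel -> count (for example Bad Frame per chip).
    A channel is flagged when it is at least min_excess above the median
    of the other channels and at least factor times that median.
    """
    suspects = []
    for channel, count in error_counts.items():
        others = [c for ch, c in error_counts.items() if ch != channel]
        if not others:
            continue
        baseline = median(others)
        if count - baseline >= min_excess and count >= factor * max(baseline, 1):
            suspects.append(channel)
    return sorted(suspects)


# Bad Frame counts from the fcAll example above: nothing stands out.
print(suspect_channels({0: 0, 1: 0, 2: 0, 3: 0, 4: 4, 5: 4}))
# A made-up case in which channel 1 is clearly worse than its peers.
print(suspect_channels({0: 0, 1: 350, 2: 0, 3: 1}))
```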
<view>, <devNum (0=all)> All views by view type (active) Inquiry view Names view Path view Common names view Buf view Detail view (All luns) Detail view (Active luns only) All views by lun device All views by view type (detailed) Rls view Devices with any errors Devices with Rw errors returned to VDD Devices with major errors
Tray Slot 1,1 1,2 1,3 1,4 2,1 2,2 2,3 2,4 3,1 3,2 3,3 3,4 4,1 4,2 4,3 4,4 3,0 4,0 1,0 2,0
Cur Alpa d9 d6 d5 d4 cd cc cb ca c3 bc ba b9 b2 b1 ae ad 1e 1d 23 1f 02 e8
Path Channels Cur:0 Alt:3 Cur:3 Alt:0 Cur:0 Alt:3 Cur:3 Alt:0 Cur:0 Cur:2 Cur:0 Cur:2 Cur:1 Cur:2 Cur:1 Cur:2 Cur:1 Cur:3 Cur:1 Cur:3 Cur:1 Cur:1 Cur:0 Cur:0 Cur:0 Cur:5 Alt:2 Alt:0 Alt:2 Alt:0 Alt:2 Alt:1 Alt:2 Alt:1 Alt:3 Alt:1 Alt:3 Alt:1 Alt:2 Alt:3 Alt:3 Alt:2 Alt:1 Alt:4
The Current and Alternate paths for the drives must alternate; if they do not, there is a path failure.
Current = active port
Encl = ESM
Lmir = local mirror, used for cache mirroring with alternate controller
This = This controller
Alt:2 Alt:0
Alt:3 Alt:1
Alt:2
Alt:3
Press enter to keep current value and continue to next field. Reboot the controller to apply changes. The names under the network settings are NOT the same as the array name under SYMplicity (not associated with emwdata.bin).
General Debugging
Windows AWS Logs
Fault Viewer AWS Console Application ICONs
Physical Inspection
LSI HBA LEDs Mini-Hub LEDs ESM LEDs Controller LEDs Drive LEDs
SYMplicity GUI
SMclient Recovery Guru and ICONs Storage Array Profile Capture State Information (perform only on a quiesced array) Major Event Log (MEL) (stored on the array in DACstore) Run Link Status (RLS)
Forwarding of Faults
The LSI Fibre Channel arrays report additional events that are not received by the UMB and thus are not reported to the AWS through the Service Subsystem. For these types of events to be reported to the AWS / CSF, special configuration is required, depending upon the type of AWS:
Windows AWS - The NCR MEL package is installed on the AWS.
UNIX AWS - Forwarding of SNMP traps is enabled and the traps are sent to the AWS.
When configured, a green check mark will appear next to the AWS in the EMW.
SYMplicity GUI
Clique Configurations
In an FC_AL loop configuration, a node has only 1 path to each controller. A system with FC Switches can have up to 4 paths from a node to each controller.
FC_AL Configuration Switched Configuration Disk Array 1 A B1 B Disk Array 2 A B Disk Array 3 A B Disk Array 4 A B
Controller B
A1
GBIC
A2
B2
Switch
Switch
Node
Node
Node
Place the controller back Online which should cause the bypass LED of the failed port to come on. The only condition that should be reported by the Recovery GURU or a Health Status should be Volume not on Preferred Path. With SYMplicity 8 you will not be able to redistribute volumes back to the preferred controller until the path is fixed.
The 2Gb Link Speed LED should be illuminated.
Every Mini-hub with a cable connected should have its Bypass LED off.
Every Mini-hub should have its Loop Good LED illuminated.
The LSI HBA LEDs should be green if a cable is connected.
N N N N N
With switches this will be 2 or 4, and each path will display Present & Failed status
Y Y N N N
Present: Y Failed: Y
Y N N N
Busy: N
down
c700t1
up
up
Controller A
Controller B
C100t0
PCI Bus # Controller # Port # on HBA
Green Yellow
Down
Yellow Yellow
==== Failed: Failed: Failed: Failed: ==== Failed: Failed: Failed: Failed: ==== Failed: Failed: Failed: Failed: ==== Failed: Failed: Failed: Failed:
N N N N N N N N N N N Y N N N N
Controller A Controller B
X
1
2 to 4 physical paths from every host to each controller.
All single path and most multiple path failures will not result in a controller failover.
mppd uses the 4 paths in a round-robin fashion.
mppd will mark a bad path as down.
mppCheck will mark a fixed path as up.
ripm -p shows mppd's perspective of path states.
# ripm -p (lsiUtil -S)
c700t0
active  c100t2d0s0  c101t2d0s0  c220t2d0s0  c221t2d0s0
        down        up          down        up
active  c100t0d0s0  c101t0d0s0  c220t0d0s0  c221t0d0s0
        up          up          up          up
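The round-robin behavior described above can be illustrated with a small model: I/O rotates over the paths currently marked up, and paths marked down are skipped. This is a sketch of the described behavior, not the mppd implementation; the function names are hypothetical and the path states are taken from the first controller in the ripm -p example.

```python
from itertools import cycle


def usable_paths(path_states):
    """Return the paths marked up, in their listed order."""
    return [path for path, state in path_states.items() if state == "up"]


def round_robin(path_states, io_count):
    """Return the path chosen for each of io_count consecutive I/Os."""
    paths = usable_paths(path_states)
    if not paths:
        raise RuntimeError("no path is up; the controller would be failed")
    rotation = cycle(paths)
    return [next(rotation) for _ in range(io_count)]


states = {
    "c100t2d0s0": "down",
    "c101t2d0s0": "up",
    "c220t2d0s0": "down",
    "c221t2d0s0": "up",
}
print(round_robin(states, 4))
```

With two of the four paths down, the controller stays in service and I/O alternates between the two remaining paths.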
Switch
0
1215
Switch
Node 1
Node 4
FC Switch shell command show port. 2 paths to ctrl A are still up to all Nodes
Operational State ----------Online Offline Online Online Login Status -----LoggedIn NotLoggedIn LoggedIn LoggedIn Config Type -----G G G G Running Type ------F Unknown F F Link Link State Speed --------Active 2Gb/s InActive Auto Active Active 2Gb/s 2Gb/s
# ripm -p (lsiUtil -S) c700t0 active c100t2d0s0 c101t2d0s0 c220t2d0s0 c221t2d0s0 active c100t0d0s0 c101t0d0s0 c220t0d0s0 c221t0d0s0 down up up up active c100t0d0s0 c101t0d0s0 c220t0d0s0 c221t0d0s0 active c100t4d0s0 c101t4d0s0 c220t4d0s0 c221t4d0s0 down up up up
Switch
0
1215
Switch
c700t1
down up up up
down up up up
(same indication for c700t2 and t3 on this node only) Node 1 Node 2 Port Number -----0 1 . . 8 9 . . Node 3 Admin State ----Online Online Online Online Node 4
Disk Array 1 A B
Disk Array 2 A B
Disk Array 3 A B
Disk Array 4 A B
Switch
Switch
Node 1
Node 2
Node 3
Node 4
/opt/mpp/bin/mppUtil -g0 |pg
MPP Information:
---------------
  ModuleName: DAMC101-2
  VirtualTargetID: 0x000
  ObjectCount: 0x000
  WWN: 600a0b80000f69420000000040b1b251
  ModuleHandle: none
Controller 'B' Status:
----------------------
  ControllerHandle: none
  UTMLunExists: Y (031)
  NumberOfPaths: 1
  Path #1
  --------
  DirectoryVertex: none
Lun Information
Y N N N N
Y N N N N
Present: Y Failed: N
1. Run mppProbe -auk on all nodes connected to the unconfigured path. UTM LUNs will be created.
2. Run ripm -p (mppUtil -S) and verify the path is up on all nodes.
3. Run mppCheck to force mppd to use restored paths.
4. Redistribute volumes back to preferred controllers.
Check for failed loops reported by each controller using either the SMcli command or the controller's shell command:
There should be only two loops listed for each drive. None of the paths should be listed as failed.
NOTE: Typically the Preferred loops should alternate between drives in the tray starting with the lowest loop, but this assignment is only set up on a per-drive basis when that controller accesses a particular drive.
Tray,Slot  Cur Alpa  Path Channels
2,1        cd        Pref:0 Cur:0
2,2        cc        Pref:2 Cur:2
2,3        cb        Pref:0 Cur:0
2,4        ca        Pref:2 Cur:2
2,5        c9        Pref:0 Cur:0
2,6        c7        Pref:2 Cur:2
2,7        c6        Pref:0 Cur:0
2,8        c5        Pref:2 Cur:2
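The two checks described for this listing (current channel equals preferred channel, and channels alternate from one drive to the next) can be sketched as below. The function name and tuple layout are hypothetical, and because the alternation is described as typical rather than guaranteed, the results should be treated as warnings to investigate rather than confirmed faults.

```python
def path_warnings(drives):
    """Check a tray listing given as (tray_slot, preferred, current) tuples."""
    warnings = []
    previous_current = None
    for slot, preferred, current in drives:
        if current != preferred:
            warnings.append(
                f"{slot}: current channel {current} is not preferred channel {preferred}")
        if previous_current is not None and current == previous_current:
            warnings.append(
                f"{slot}: same current channel {current} as the previous slot")
        previous_current = current
    return warnings


# Tray 2 from the listing above.
tray2 = [("2,1", 0, 0), ("2,2", 2, 2), ("2,3", 0, 0), ("2,4", 2, 2),
         ("2,5", 0, 0), ("2,6", 2, 2), ("2,7", 0, 0), ("2,8", 2, 2)]
print(path_warnings(tray2))
```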
RLS using SMcli:
set storageArray RLSBaseline = currentTime
upload storageArray file=rls.txt content=RLSCounts
ITW = Invalid Transmission Word
LF = Link Failure
LOS = Loss of Synchronization
LOSG = Loss of Signal
PSP = Primitive Sequence Protocol Error
ICRC = Invalid CRC
Get the location of the first device.
Get the location of its upstream device.
Use the Possible Candidate Table to find the possible candidates for the bad component.
Using the example on the previous page, the faulty component could be:
ESM in tray 2
Transmitter of drive 2,5
Receiver of drive 2,6
ESM of Tray X, Device B (drive), Device A (drive)* Cable or SFPs between Tray X and Y, ESM of Tray Y, Device B (drive), ESM of Tray X*, Device A (drive)* Any cable or SFP in the channel, Minihub, ESM of Tray X, Device B (drive), Device A (controller)*, Controller Chassis* Cable or SFPs between Tray X and Controller Module, Minihub, Device B (controller), ESM of Tray X*, Device A (drive)*, Controller Chassis* MiniHub, Device B (controller), Device A (controller)*, Controller Chassis*
Tray X
Tray Y
Tray X
Controller Module
Controller Module
Tray X
Controller Module
Controller Module
Figure: drive-side FC loop components. Drive Modules (drives, ESMs A and B with IN/OUT ports), SFPs and cables, Loops 4 through 1.
Identify the suspect components on the faulty segment of the FC loop, then either replace one at a time or fanout to further isolate the faulty component.
Figure: Controller Module components. SFPs, Mini-Hubs, and channels CH0 through CH5 on Controllers A and B.
FC Loop
- Use fcAll on the controller and lsiUtil -a on the nodes.
- The device that has the high error counts is the receiving device on the loop.
- The faulty segment of the FC loop is from the transmitting device to the receiving device.
- The order of the devices on the loop must be determined from the lsiUtil -a command.
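The isolation rule above (the device with high error counts is the receiver, and the faulty segment runs from its upstream transmitter to it) can be sketched as follows. The function name and the threshold are hypothetical, and the sketch assumes the loop position list is in transmission order, so that each device receives from the one listed before it and the first device receives from the last.

```python
def faulty_segments(loop_order, error_counts, threshold):
    """Return (transmitter, receiver) pairs for receivers with high errors.

    loop_order lists the devices in loop position order; error_counts
    maps a device to its receive-side error count.
    """
    segments = []
    for index, receiver in enumerate(loop_order):
        if error_counts.get(receiver, 0) >= threshold:
            transmitter = loop_order[index - 1]  # index 0 wraps to the last device
            segments.append((transmitter, receiver))
    return segments


# Loop positions in the style of the lsiUtil ALPA position map.
loop = ["0xef", "0xe8", "0x01"]
print(faulty_segments(loop, {"0xe8": 500, "0x01": 2}, threshold=100))
```

Every component on the returned segment (SFPs, mini-hub, cables, and the two end devices) is then a suspect, as listed in the slides that follow.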
(resets the HBA FC Link statistics) (displays the HBA FC Link statistics)
Link Statistics
1:0xe8
2:0x01
0x1 TARGET 0/0 0x200200a0b80cf4cd / 0x200200a0b80cf4ce 0xe8 INITIATOR INITIATOR is the other 0x200000062b062884 / 0x100000062b062884 0xef blank is this Node 0x200000062b062154 / 0x100000062b062154
Order of the devices on the bus TARGET is the Disk Array Node
Drive Side
Host Side
Figure: TachyonTL chips on Controller A and Controller B.
Identify the suspect components on the faulty segment of the FC loop, then either replace one at a time or fan-out to further isolate the faulty component.
- If the problem is between the controller and the node, the suspect components are: 1-SFP, 1-Mini-hub, 1-cable, 1-controller, and 1-HBA.
- If the problem is between the two nodes on the loop, the suspect components are: 2-SFPs, 1-Mini-hub, 2-cables, or 2-HBAs.
Figure: host-side FC loop components. Mini-Hubs A1, A2, B1, B2 with GBIC IN/OUT ports, cabled to the QLA2204 HBAs in Nodes 1 through 4 (SMP).
UNIX AWS
/var/opt/SM/monitor/RLS.<arrayWWN>.csv
ADEPT
Tallies based on MP-RAS streams error log & Windows System Log
Tallies are calculated in real time as errors occur
Adept always running on each node monitoring logs
Adept runs on AWS and consolidates Adept events sent from the nodes
Runs on SMP and MPP systems with MP-RAS or Windows nodes
Requires Windows AWS
Provides drive serial number
Restart Adept:
W2K = Adept Service
UNIX = /etc/init.d/adept [start | stop]
fnd === yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes . .
Volume Name       Serial Number
----------------  ----------------
Scsi0:9:0:0:20    3ET0QRZG000073
Scsi0:8:0:0:13    3ET0E71R000072
Scsi0:9:0:0:2d    3HX0X9LG000073
Note: a similar report for a Windows node can be generated by executing ADEPTReport.bat from the node. A file ADEPTReport.txt is created in the ADEPT directory on that node.
Search the Array Profile to match the serial number to its Tray/Slot.
Scsi0:9:0:0:2d P9P0I0
DAMC101-3
References
Technical Publications
http://infocentral.daytonoh.ncr.com/tsd-library/