AIX SSP ServiceEduDec
Service Education
Vasu Vallabhaneni
Jacob Rosales
[Figures: architecture diagrams. Labels shown: LPAR, VIOS, NG, PHYP, Power HyperVisor, SAN, Storage Pool]
Create SSP
SSP can be created from CLI or CFGASSIST interface
Requirements
One Disk for Cluster Repository [size > 1 GB]
One Disk for Pool [size >= 10 GB]
SSP Cluster Name
Storage Pool Name
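The requirements above can be checked before attempting a create. The following is a minimal sketch only: the SspCreateRequest type, its field names, and check_requirements are hypothetical helpers for illustration and are not part of the VIOS CLI or cfgassist.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # bytes per gigabyte (assumption: sizes are compared in binary GB)


@dataclass
class SspCreateRequest:
    """Hypothetical container for the four inputs listed on the slide."""
    cluster_name: str
    pool_name: str
    repository_disk_bytes: int
    pool_disk_bytes: int


def check_requirements(req):
    """Return a list of unmet requirements; an empty list means all are met."""
    problems = []
    if not req.cluster_name:
        problems.append("SSP cluster name is required")
    if not req.pool_name:
        problems.append("storage pool name is required")
    # Slide: one disk for the cluster repository, size > 1 GB
    if req.repository_disk_bytes <= 1 * GB:
        problems.append("repository disk must be larger than 1 GB")
    # Slide: one disk for the pool, size >= 10 GB
    if req.pool_disk_bytes < 10 * GB:
        problems.append("pool disk must be at least 10 GB")
    return problems
```

Note the asymmetry taken from the slide: the repository disk must be strictly larger than 1 GB, while a pool disk of exactly 10 GB is acceptable.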
State: OK

MTM                Partition Num  State  Pool State
8233-E8B02108F9BP  2              OK     OK
8233-E8B02108F9BP  3              OK     OK
8233-E8B0210BBE8P  2              OK     OK
a405d026e60f11e0b317fe6a2accf50b
/home/ios/logs/viod.log
0
0
0
0.0.0.0
3
4
/home/ios/socks/vioke_unix
6
/home/ios/socks/api_eve_unix
mycluster
ea93b8c4e5ff11e0a394fe6a2accf50b
00000000000000000000000000000000
ea87514ce5ff11e0a394fe6a2accf50b
mypool
D_E_F_A_U_L_T_061310
0000000009035C7B000000004E7CB1D6
UP
lscluster -m
lssrc -ls vio_daemon
cluster -status -clustername ssp1
From root shell
clcmd lssrc -ls vio_daemon
If adding the new disks fails, clean up and fail the command
If removing the old disks fails
Return an error to the command
The DBN will take over the cleanup
TotalLUSize(mb)
40960
LUs
2
Type
CLPOOL
PoolID
Example
cluster -rmnode -clustername ssp1 -hostname node2.austin.ibm.com
Pre Phase
Verify that the DBN role has been relinquished by the vio_daemon
Move all SSP VTDs to Defined state [rmdev -l <vtd name>]
Stop vio_daemon
Stop Pool
Undo Pre Phase
Start Pool
Start vio_daemon
Move all SSP VTDs to Available state
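The Undo Pre Phase list is the Pre Phase list reversed, step by step. The sketch below illustrates that pairing under stated assumptions: run_pre_phase, undo_pre_phase, and the callables passed in are hypothetical and stand in for whatever the real commands do; nothing here calls VIOS.

```python
def undo_pre_phase(completed):
    """Undo completed steps in reverse order; steps without an undo are skipped."""
    for _name, undo in reversed(completed):
        if undo is not None:
            undo()


def run_pre_phase(steps):
    """Run (name, do, undo) steps in order.

    If a step raises, every step that already completed is undone in
    reverse order and the exception is re-raised. On success, the list of
    completed (name, undo) pairs is returned so the caller can undo later.
    """
    completed = []
    try:
        for name, do, undo in steps:
            do()
            completed.append((name, undo))
    except Exception:
        undo_pre_phase(completed)
        raise
    return completed


def example_steps(log):
    """Hypothetical steps mirroring the slide; each one only records its name."""
    return [
        ("verify DBN relinquished", lambda: log.append("verify dbn"), None),
        ("VTDs to Defined", lambda: log.append("vtds defined"),
         lambda: log.append("vtds available")),
        ("stop vio_daemon", lambda: log.append("stop vio_daemon"),
         lambda: log.append("start vio_daemon")),
        ("stop pool", lambda: log.append("stop pool"),
         lambda: log.append("start pool")),
    ]
```

Keeping each undo next to its matching step means the reverse ordering (pool, then daemon, then VTDs) falls out of the data rather than being maintained by hand.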
Delete SSP
Delete the SSP using the CLI or the SMIT interface
Requirements
All SSP objects must be removed before initiating the SSP delete
LUs
Clones
Images
Logical Units
Overview
Base building blocks for device virtualization and advanced functionality
Snapshot/Rollback
Thin/Thick
IM
Collection of files within one or more filesets
SSP DB is final arbiter for LU management
/var/vio/SSP//D_E_F_A_U_L_T...
/var/vio/SSP//D_E_F_A_U_L_T/
VOL1 ... VOLX
Logical Units
Creation
MKBDSP command enhanced for clustering
API Validation
Device creation within SSP DB
Object creation within pool
Rollback on failure
Debugging
ioscli_global.*
viod.log
Flow: CLI (Create Request) -> API (Validation) -> API (DB Inserts) -> API (Pool OBJ Creation) -> API/DAEMON (LCE)
Logical Units
Removal
RMBDSP command enhanced for clustering
API Validation
Object removal within pool
Device removal within SSP DB
Debugging
ioscli_global.*
viod.log
Flow: CLI (Remove Request) -> API (Validation) -> API (Pool OBJ Removal) -> API (DB Delete) -> API/DAEMON (LCE)
Logical Units
Provisioning
MKBDSP command enhanced for clustering
API Validation
Device creation within SSP DB
Device creation within local system (ODM)
Rollback on failure
Debugging
ioscli_global.*
viosCmd.log
cfglog
viod.log
Flow: CLI (Map Request) -> API (Validation) -> API (DB Inserts) -> API (MKDEV) -> CFG (Add Child) -> API (DB Update)
SSP Database
Configuration
Database artifacts
/var/vio/SSP/<CLUSTER>/D_E_F_A_U_L_T_061310/VIOSCFG/DB
Catalog file (DB)
Transaction logs
Backup
Startup and FFDC
/var/vio/SSP/<CLUSTER>/local/VIOSCFG/DB
solid.ini
Error logs (solmsg/solerror/soltrace).out
Checkpoints 5 Transactions/5 minutes
DB Backup once a day at 9pm
Debug
Pool full, 64MB minimum space
Error Report
viod.log
SSP Database
LIBVIO
Provides database services and abstraction utilizing ODBC
Debug
ioscli_global.*
vioCmd.log
viod.log
cfglog
Election Framework
[Figure: election state diagram]
State and transition labels shown: Init; Detect No Primary; Elect Start; Elect Self; Elect Self Success; Primary; Heartbeat/Validate Success; Error/Node no longer meets reqs; Error/Validate failure; Relinquish; Detect New Primary; No new Primary; Node does not meet reqs; Call Next Cand; Elect Self Request Rcvd BUSY; Node is elector; Exhausted Node list; Elect Fail; Start timer; Elect Wait Timer
Legend: Transitional State, Final State
2011 IBM Corporation
[Figure: election walk-through with Node A, Node B, Node C, Node D]
ECF before the election
    # cat dbn.ecf
    ecf_version:1
    cluster_id:0
    primary_node_id:0
    primary_node_ipaddr:0.0.0.0
    elector_node_id:0
    election_result:1
Node A starts Election
    # cat dbn.ecf
    ecf_version:1
    cluster_id:Cluster ID
    primary_node_id:0
    primary_node_ipaddr:0.0.0.0
    elector_node_id:Node A ID
    election_result:1
Node A fails to elect self, calls next node
Node B fails to elect self, calls next node
Node C attempts to elect self
Node C elects self, updates ECF
    # cat dbn.ecf
    ecf_version:1
    cluster_id:Cluster ID
    primary_node_id:Node C ID
    primary_node_ipaddr:9.3.2.14
    elector_node_id:0
    election_result:2
Node C is Primary
[Figure: failed election walk-through with Node A and Node B]
Node A starts Election
    # cat dbn.ecf
    ecf_version:1
    cluster_id:Cluster ID
    primary_node_id:0
    primary_node_ipaddr:0.0.0.0
    elector_node_id:Node A ID
    election_result:1
Node B attempts to elect self
Node B fails to elect self, calls next node
Node A detects it started election
Node A fails election, updates ECF
    # cat dbn.ecf
    ecf_version:1
    cluster_id:Cluster ID
    primary_node_id:0
    primary_node_ipaddr:0.0.0.0
    elector_node_id:0
    election_result:4
Node A is in Wait Timer state
- Database Node
pnn.ecf
Contents of ECF
# cat dbn.ecf
ecf_version:1                                       - File version
cluster_id:7a454cd2e95f11e084b736867718cf0b         - Unique identifier for cluster
primary_node_id:7a39063ee95f11e084b736867718cf0b    - Unique ID for the primary node*
primary_node_ipaddr:9.3.92.142                      - IP addr for the primary node
elector_node_id:0                                   - Unique identifier for the elector
election_result:2                                   - Result of previous elections**
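Because the ECF is a flat list of key:value lines, it is easy to read programmatically when triaging an election problem. The following is a minimal sketch: parse_ecf and has_primary are hypothetical helpers, and treating primary_node_id:0 as "no primary elected" is an assumption drawn from the example files in this deck, not a documented rule. The numeric election_result codes are returned as-is and not interpreted.

```python
def parse_ecf(text):
    """Parse the key:value lines of an ECF file into a dict of strings."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and shell prompt lines such as "# cat dbn.ecf"
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key:value line
        fields[key.strip()] = value.strip()
    return fields


def has_primary(fields):
    """Assumption: a primary_node_id of 0 means no primary is recorded."""
    return fields.get("primary_node_id", "0") != "0"


SAMPLE = """\
# cat dbn.ecf
ecf_version:1
cluster_id:7a454cd2e95f11e084b736867718cf0b
primary_node_id:7a39063ee95f11e084b736867718cf0b
primary_node_ipaddr:9.3.92.142
elector_node_id:0
election_result:2
"""
```

With the sample above, parse_ecf(SAMPLE)["primary_node_ipaddr"] gives the address of the node to inspect next.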
EAs store the primary node ID of the elected node and a sequence number
The sequence number is increased every time a non-zero value is set for the EA node ID
7A39063EE95F11E084B736867718CF0B - 0000000000000001
(Primary Node ID - Sequence Number)
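The sequence rule can be illustrated with a small sketch. Everything here is an assumption made for illustration: the "ID - sequence" text layout is simply how the value is displayed on the slide, the sequence number is assumed to be a 16-digit hexadecimal counter, and parse_ea and set_ea_node_id are hypothetical helpers rather than real interfaces.

```python
def parse_ea(value):
    """Split a displayed EA value into (node_id, sequence_number)."""
    node_id, _sep, seq = value.partition(" - ")
    # Assumption: the sequence number is printed in hexadecimal
    return node_id.strip(), int(seq.strip(), 16)


def set_ea_node_id(current_seq, new_node_id):
    """Return the displayed EA value after setting a new node ID.

    Per the slide, the sequence number increases only when a non-zero
    node ID is set; setting an all-zero ID leaves it unchanged.
    """
    is_zero = set(new_node_id) <= {"0"}
    next_seq = current_seq if is_zero else current_seq + 1
    return f"{new_node_id} - {next_seq:016X}"
```

Comparing sequence numbers taken at two points in time then shows how many times a primary was set in between, independent of which node won.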
Primary Roles
VKE Services
Interface Abstraction Services
Emulator <-> Cluster Communication
Emulator -> Daemon
Emulator <-> Database
Daemon <-> Cluster Communication
VKE Architecture
Module Breakdown by Interface
Separate anchor block by interface with lock protection independent of each other
VKE RAS
Component Trace
vke component
Traces all interfaces except the syscall interface
vke.sc component
Traces all syscalls with viodHead information (wraps quickly)
Error Log
KDB
https://w3.tap.ibm.com/w3ki03/display/vio/VKE+Debug
ESTART/ECASSOC/ECOMPLETE
Normal election sequence. ECASSOC not required in all cases.
ERELINQH
Election relinquish handle not found. Common trace if 1st data word (the handle) is 0.
FMHBTOUT
Daemon heartbeat timed out. VKE will kill daemon PID
ECFTIMER
[<address> | main
vke pcb
[<address> | main
Build String in root anchor block shows date/time and build cycle
Mapping Problems
Configuring SSP VTDs is more complicated than other types of
VTDs, due to interactions with the SSP database and storage pool
Check the error log. If there is no vhost error log, then the CLI never got far enough to call mkdev;
check the CLI traces.
If the error log has Detail Data like this:
Detail Data
ADDITIONAL INFORMATION
module: ngdisk_init_lun 1.67 rc: 0000000000000016 location: 00000583
data: 13D49BA2AD9F11E08AF76A6C1249000D888100000000000000
this indicates that the problem was not with the device driver, but with the
configuration method. Check the config log:
MS 8061154 8913046 cfg_vt_ngdisk -l vtscsi1
M0 8061154 cfg_vtdev_ngdisk.c 94 >Enter configure_device for cfg_vt_ngdisk
M0 8061154 cfg_vtdev_ngdisk.c 260 error accessing database for vtscsi1 rc 1.
M0 8061154 cfg_vtdev_ngdisk.c 136 ioctl(VSCSI_HOST_ADD_NGDISK) failed. vhost=vhost0, vtdev=vtscsi1, errno=22
M0 8061154 cfg_vtdev_ngdisk.c 146 exit with ERROR, rc=33, rc2=47, cfg_failed=0x21. vhost=vhost0, vtdev=vtscsi1
cfg_failed not being 0 confirms that the config method encountered an error,
in this case with accessing the database.
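When many config log entries have to be scanned, the cfg_failed check can be automated. This is a minimal sketch, assuming the exit line carries name=value pairs as in the sample above; parse_exit_line and config_method_failed are hypothetical helpers, not part of any VIOS tool.

```python
import re

# Matches name=value pairs such as rc=33 or cfg_failed=0x21
PAIR = re.compile(r"(\w+)=(0x[0-9A-Fa-f]+|\w+)")


def parse_exit_line(line):
    """Return the name=value pairs found on one config log line as a dict."""
    return dict(PAIR.findall(line))


def config_method_failed(line):
    """True when the line reports a non-zero cfg_failed value."""
    fields = parse_exit_line(line)
    # int(..., 0) accepts both decimal ("0") and hex ("0x21") text
    return int(fields.get("cfg_failed", "0"), 0) != 0


EXIT_LINE = ("M0 8061154 cfg_vtdev_ngdisk.c 146 exit with ERROR, rc=33, rc2=47, "
             "cfg_failed=0x21. vhost=vhost0, vtdev=vtscsi1")
```

For the sample line, the parsed fields also identify the adapter and device involved (vhost and vtdev), which is usually the next thing needed when following up in the CLI traces.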
Hung I/O
In kdb, you can use the svvtd vhostX command to list information
about all VTDs mapped to the specified vhost adapter, including any
commands on their active queues:
Command Element at F1000A0400AFC348
cmd_list.next: F1000A0400AFC348  cmd_list.prev: F1000A0400AFC348
delay_devstrat.next: 0  delay_devstrat.prev: 0
working_area: 0  tag: F1000A0015B91548
srp_id: 61300  time0: 1E8116F20683BF
start_time: 1E8116F20699C6  proc_time1: 1607
proc_time2: 0  wait_time: 0
lua: 8100000000000000  lun: F1000A0400CFBC00
num_bufs: 1  iodones_rcvd: 0
task_attrib: 00  flags: 00  status_qualifier: 00
status: 00  non_scsi_status: 00  iu_len: 40  resid: 0
add_len_cdb: 0  sense_size: 0
CDB 2A000006B09000000804000000000000
[...]
The upper portion of the tid is the thread slot number. (Note that 0x2AB = 683.)
KDB(0)> f 683
pvthread+02AB00 STACK:
[000093C4] .unlock_enable_mem+0000B8 ()
[000D6DBC] e_block_thread+00049C ()
[00014F50] .kernel_add_gate_cstack+000030 ()
[F1000000C066EF4C] osCondBlock+00000C ()
[F1000000C066B104] ioWaitPager+000284 (??)
[F1000000C08384F4] cfsDIOWrite+0011F4 (??,??,??,??,??,??,??,??)
[F1000000C083AED4] cfsRioMove+0004F4 (??,??,??,??,??,??)
[F1000000C07DC164] cfsDataWrite+000724 (??,??,??,??,??,??,??)
[F1000000C07BED10] cfsRdwrAttr+0008F0 (??,??,??,??,??,??,??,??)
[005984A8] vnop_rdwr+0001A8 (??,??,??,??,??,??,??,??)
[005B0314] vno_rw+0000B4 (??,??,??,??,??)
[0055EDDC] fp_rwuio+00029C (??,??,??,??)
[00014F50] .kernel_add_gate_cstack+000030 ()
[F1000000C025E180] ngdisk_io_proc+000A00 (??,??)
[F1000000C0267204] ngdisk_thread+0003E4 (??)
[00014D70] .hkey_legacy_gate+00004C ()
[00254234] threadentry+000094 (??,??,??,??)
I/O Errors
Any errors attempting to write to the backing SP file will generate an error log
Detail Data
ADDITIONAL INFORMATION
module: ngdisk_io_proc 1.64 rc: 0000000000000034 location: 00004563
data: 2370BF6E94411E0B7336A6C1249000D21B08D4E94411E0B7336A6C1249000D
Exceptions:
VIOSBR Backup
VIOSBR has been updated to support backup of SSP
VIOSBR backs up the following data
Backup of all the SSP and classic mappings for all the nodes in UP state in the cluster
VIOSBR - View
VIOSBR has been updated to view SSP backup files
viosbr -view -file FileName -clustername clusterName [-type devType][-detail | -mapping]
Example: viosbr -view -file test.ITL_UPT.tar.gz -clustername ITL_UPT | more
Files in the cluster Backup
===========================
ITL_UPTDB
ITL_UPTMTM8233-E8B02061AAFPP1.xml
ITL_UPTMTM8233-E8B02061AAFPP2.xml
ITL_UPTMTM8233-E8B02061AAFPP3.xml
ITL_UPTMTM8233-E8B02061AAFPP4.xml
===========================
Details in: /home/ios/ITL_UPT.8716414/ITL_UPTMTM8233-E8B02061AAFPP1.xml
===============================================================
Controllers:
============
Name      Phys Loc
----      --------
iscsi0
pager0    U8233.E8B.061AAFP-V1-C32769-L0-L0
vasi0     U8233.E8B.061AAFP-V1-C32769
vbsd0     U8233.E8B.061AAFP-V1-C32769-L0
fcs0      U5877.001.0080617-P1-C1-T1
Creates Cluster
Start pool
Start vio_daemon
Using clcmd, run restore on all the nodes with the -skipcluster option
Nodes are added one at a time. If adding a node fails, viosbr returns with an error. Do the following:
Verify whether the node is in the cluster using lscluster -m
Verify the mappings on the node using lsmap -all
Run the restore command again
Limitations
viosbr does not restore the network configuration if the SSP is configured on the system
Restore the network configuration before restoring the SSP on all the nodes using
viosbr -restore -clustername ITL_UPT -file test.ITL_UPT.tar.gz -type net
Command
viosbr -restore -file test.ITL_UPT.tar.gz -clustername ITL_UPT -subfile ITL_UPTMTM8233-E8B02061AAFPP1.xml
Command to restore
viosbr -restore -file test.ITL_UPT.tar.gz -clustername ITL_UPT -subfile ITL_UPTMTM8233-E8B02061AAFPP1.xml -xmlvtds
Questions?