Sun Cluster Cheat Sheet
(source: http://www.datadisk.co.uk/html_docs/sun/sun_cluster_cs.htm)

This cheat sheet contains common commands and information for both Sun Cluster 3.1 and 3.2. Some information is still missing (e.g. zones, NAS
devices), which I hope to complete over time.

Both versions of Sun Cluster also have a text-based menu interface, so don't be afraid to use it, especially if the task is a simple one:

scsetup (3.1)
clsetup (3.2)

All of the version 3.1 commands are also available in version 3.2.

Daemons and Processes

At the bottom of the installation guide I listed the daemons and processes running after a fresh install; now is the time to explain what they do. I have
managed to obtain information on most of them but am still looking into the others. A quick way to check that they are running is shown after the
version 3.2 list below.

Versions 3.1 and 3.2


clexecd
    This is used by cluster kernel threads to execute userland commands (such as the run_reserve and
    dofsck commands). It is also used to run cluster commands remotely (such as the cluster shutdown
    command). This daemon registers with failfastd so that a failfast device driver will panic the
    kernel if this daemon is killed and not restarted in 30 seconds.

cl_ccrad
    This daemon provides access from userland management applications to the CCR. It is automatically
    restarted if it is stopped.

cl_eventd
    The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving
    the cluster). There is also a protocol whereby user applications can register themselves to receive
    cluster events. The daemon is automatically respawned if it is killed.

cl_eventlogd
    The cluster event log daemon logs cluster events into a binary log file. At the time of writing
    there is no published interface to this log. It is automatically restarted if it is stopped.

failfastd
    This daemon is the failfast proxy server. The failfast daemon allows the kernel to panic if certain
    essential daemons have failed.

rgmd
    The resource group management daemon, which manages the state of all cluster-unaware applications.
    A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.fed
    This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific
    data services. A failfast driver panics the kernel if this daemon is killed and not restarted in
    30 seconds.

rpc.pmfd
    This is the process monitoring facility. It is used as a general mechanism to initiate restarts and
    failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most
    application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver
    panics the kernel if this daemon is stopped and not restarted in 30 seconds.

pnmd
    The public network management service daemon manages network status information received from the
    local IPMP daemon running on each node and facilitates application failovers caused by complete
    public network failures on nodes. It is automatically restarted if it is stopped.

scdpmd
    The disk path monitoring (DPM) daemon monitors the status of disk paths, so that they can be
    reported in the output of the cldev status command. This multi-threaded daemon runs on each node
    and is automatically started by an rc script when a node boots. It monitors the availability of
    the logical paths that are visible through the various multipath drivers (MPxIO, HDLM, PowerPath,
    etc.) and is automatically restarted by rpc.pmfd if it dies.

Version 3.2 only


qd_userd
    This daemon serves as a proxy whenever any quorum device activity requires execution of a userland
    command (e.g. for a NAS quorum device).

cl_execd
    (no description yet)

ifconfig_proxy_serverd
    (no description yet)

rtreg_proxy_serverd
    (no description yet)

cl_pnmd
    This is the daemon for the public network management (PNM) module. It is started at boot time and
    starts the PNM service. It keeps track of the local host's IPMP state and facilitates inter-node
    failover for all IPMP groups.

scprivipd
    This daemon provisions IP addresses on the clprivnet0 interface, on behalf of zones.

sc_zonesd
    This daemon monitors the state of Solaris 10 non-global zones so that applications designed to
    fail over between zones can react appropriately to zone booting failure.

cznetd
    This daemon is used for reconfiguring and plumbing the private IP address in a local zone after a
    virtual cluster is created; see also the cznetd.xml file.

rpc.fed
    This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific
    data services. A failfast driver panics the kernel if this daemon is killed and not restarted in
    30 seconds.

scqdmd
    The quorum server daemon (possibly previously called "scqsd").

pnm mod serverd
    (no description yet)
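
As a quick sanity check that the main cluster daemons are up on a node, something like the following
can be used (a rough sketch using the standard Solaris pgrep; trim the list to the daemons present in
your release):

## List the cluster framework daemons running on this node
pgrep -l -f "clexecd|cl_ccrad|cl_eventd|rgmd|rpc.fed|rpc.pmfd|pnmd|scdpmd|failfastd"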

File locations

Both Versions (3.1 and 3.2)


man pages                                    /usr/cluster/man
log files                                    /var/cluster/logs
                                             /var/adm/messages
Configuration files (CCR, eventlog, etc)     /etc/cluster/
Cluster and other commands                   /usr/cluster/lib/sc

Version 3.1 Only

sccheck logs                                 /var/cluster/sccheck/report.<date>
Cluster infrastructure file                  /etc/cluster/ccr/infrastructure

Version 3.2 Only

sccheck logs                                 /var/cluster/logs/cluster_check/remote.<date>
Cluster infrastructure file                  /etc/cluster/ccr/global/infrastructure
Command log                                  /var/cluster/logs/commandlog
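
For example, to review recent cluster activity on a 3.2 node (a trivial illustration using the default
locations above):

## Review the commands that have been run against the cluster (3.2 only)
more /var/cluster/logs/commandlog

## Watch the system log for cluster messages
tail -f /var/adm/messages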

SCSI Reservations

Display reservation keys:

scsi2:
/usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
scsi3:
/usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

Determine the device owner:

scsi2:
/usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
scsi3:
/usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2

Command shortcuts

Version 3.2 provides a number of shortcut command names, which are detailed below. I have used the full command names in the rest of this document so it
is obvious what is being performed; all of the commands live in /usr/cluster/bin. A quick example of a shortcut in use follows the table.

Full command               Shortcut
cldevice                   cldev
cldevicegroup              cldg
clinterconnect             clintr
clnasdevice                clnas
clquorum                   clq
clresource                 clrs
clresourcegroup            clrg
clreslogicalhostname       clrslh
clresourcetype             clrt
clressharedaddress         clrssa
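
For example, the following two commands are interchangeable:

/usr/cluster/bin/cldevice status
/usr/cluster/bin/cldev status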

Shutting down and Booting a Cluster

shutdown entire cluster
    3.1:  ## other nodes in the cluster
          scswitch -S -h <host>
          shutdown -i5 -g0 -y
          ## last remaining node
          scshutdown -g0 -y
    3.2:  cluster shutdown -g0 -y

shutdown a single node
    3.1:  scswitch -S -h <host>
          shutdown -i5 -g0 -y
    3.2:  clnode evacuate <node>
          shutdown -i5 -g0 -y

reboot a node into non-cluster mode
    3.1:  ok> boot -x
    3.2:  ok> boot -x
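
For example, to take a single node out of a 3.2 cluster for maintenance (a minimal sketch; node2 is a
placeholder hostname):

## Move all resource groups and device groups off this node
clnode evacuate node2

## Shut the node down
shutdown -i5 -g0 -y

## From the OBP, boot back into non-cluster mode if maintenance work is required
ok> boot -x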

Cluster information

Cluster
    3.1:  scstat -pv
    3.2:  cluster list -v
          cluster show
          cluster status

Nodes
    3.1:  scstat -n
    3.2:  clnode list -v
          clnode show
          clnode status

Devices
    3.1:  scstat -D
    3.2:  cldevice list
          cldevice show
          cldevice status

Quorum
    3.1:  scstat -q
    3.2:  clquorum list -v
          clquorum show
          clquorum status

Transport info
    3.1:  scstat -W
    3.2:  clinterconnect show
          clinterconnect status

Resources
    3.1:  scstat -g
    3.2:  clresource list -v
          clresource show
          clresource status

Resource Groups
    3.1:  scstat -g
          scrgadm -pv
    3.2:  clresourcegroup list -v
          clresourcegroup show
          clresourcegroup status

Resource Types
    3.2:  clresourcetype list -v
          clresourcetype list-props -v
          clresourcetype show

IP Networking Multipathing
    3.1:  scstat -i
    3.2:  clnode status -m

Installation info (prints packages and version)
    3.1:  scinstall -pv
    3.2:  clnode show-rev -v

Cluster Configuration

Release
          cat /etc/cluster/release

Integrity check
    3.1:  sccheck
    3.2:  cluster check

Configure the cluster (add nodes, add data services, etc)
    3.1:  scinstall
    3.2:  scinstall

Cluster configuration utility (quorum, data services, resource groups, etc)
    3.1:  scsetup
    3.2:  clsetup

Rename
    3.2:  cluster rename -c <cluster_name>

Set a property
    3.2:  cluster set -p <name>=<value>

List
    3.2:  ## List cluster commands
          cluster list-cmds
          ## Display the name of the cluster
          cluster list
          ## List the checks
          cluster list-checks
          ## Detailed configuration
          cluster show -t global

Status
    3.2:  cluster status
Reset the cluster private network settings
    3.2:  cluster restore-netprops <cluster_name>

Place the cluster into install mode
    3.2:  cluster set -p installmode=enabled

Add a node
    3.1:  scconf -a -T node=<host>
    3.2:  clnode add -c <clustername> -n <nodename> -e endpoint1,endpoint2 -e endpoint3,endpoint4

Remove a node
    3.1:  scconf -r -T node=<host>
    3.2:  clnode remove

Prevent new nodes from entering
    3.1:  scconf -a -T node=.

Put a node into maintenance state
    3.1:  scconf -c -q node=<node>,maintstate
          Note: use the scstat -q command to verify that the node is in maintenance mode;
          the vote count should be zero for that node.

Get a node out of maintenance state
    3.1:  scconf -c -q node=<node>,reset
          Note: use the scstat -q command to verify that the node is out of maintenance mode;
          the vote count should be one for that node.

Node Configuration

Add a node to the cluster
    3.2:  clnode add [-c <cluster>] [-n <sponsornode>] -e <endpoint> -e <endpoint> <node>

Remove a node from the cluster
    3.2:  ## Make sure you are on the node you wish to remove
          clnode remove

Evacuate a node from the cluster
    3.1:  scswitch -S -h <node>
    3.2:  clnode evacuate <node>

Clean up the cluster configuration (used after removing nodes)
    3.2:  clnode clear <node>

List nodes
    3.2:  ## Standard list
          clnode list [+|<node>]
          ## Detailed list
          clnode show [+|<node>]

Change a node's property
    3.2:  clnode set -p <name>=<value> [+|<node>]

Status of nodes
    3.2:  clnode status [+|<node>]

Admin Quorum Device

Quorum devices are nodes and disk devices, so the total quorum is all nodes and devices added together. You can use the scsetup (3.1) / clsetup (3.2)
interface to add or remove quorum devices, or use the commands below; a short worked example follows the table.

Adding a SCSI device to the quorum
    3.1:  scconf -a -q globaldev=d11
          Note: if you get the error message "unable to scrub device", use scgdevs to add
          the device to the global device namespace.
    3.2:  clquorum add [-t <type>] [-p <name>=<value>] [+|<devicename>]

Adding a NAS device to the quorum
    3.1:  n/a
    3.2:  clquorum add -t netapp_nas -p filer=<nasdevice>,lun_id=<IDnum> <nasdevice>

Adding a quorum server
    3.1:  n/a
    3.2:  clquorum add -t quorumserver -p qshost=<IPaddress>,port=<portnumber> <quorumserver>

Removing a device from the quorum
    3.1:  scconf -r -q globaldev=d11
    3.2:  clquorum remove [-t <type>] [+|<devicename>]

Remove the last quorum device
    3.1:  ## Evacuate all nodes
          ## Put the cluster into install mode
          scconf -c -q installmode
          ## Remove the quorum device
          scconf -r -q globaldev=d11
          ## Check the quorum devices
          scstat -q
    3.2:  ## Place the cluster in install mode
          cluster set -p installmode=enabled
          ## Remove the quorum device
          clquorum remove <device>
          ## Verify the device has been removed
          clquorum list -v

List
    3.2:  ## Standard list
          clquorum list -v [-t <type>] [-n <node>] [+|<devicename>]
          ## Detailed list
          clquorum show [-t <type>] [-n <node>] [+|<devicename>]
          ## Status
          clquorum status [-t <type>] [-n <node>] [+|<devicename>]

Resetting quorum info
    3.1:  scconf -c -q reset
    3.2:  clquorum reset
          Note: this will bring all offline quorum devices online

Bring a quorum device into maintenance mode (known as "disabled" in 3.2)
    3.1:  ## Obtain the device number
          scdidadm -L
          scconf -c -q globaldev=<device>,maintstate
    3.2:  clquorum disable [-t <type>] [+|<devicename>]

Bring a quorum device out of maintenance mode (known as "enabled" in 3.2)
    3.1:  scconf -c -q globaldev=<device>,reset
    3.2:  clquorum enable [-t <type>] [+|<devicename>]
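
As a short worked example (a sketch only; d11 is simply the DID device used elsewhere on this page, so
substitute your own shared disk), adding a SCSI quorum device on a 3.2 cluster and checking the result
might look like this:

## Identify the DID device to use as the quorum device
cldevice list -v

## Add the device to the quorum and verify
clquorum add d11
clquorum status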

Device Configuration

Check device
    3.2:  cldevice check [-n <node>] [+]

Remove all devices from node
    3.2:  cldevice clear [-n <node>]

Monitoring
    3.2:  ## Turn on monitoring
          cldevice monitor [-n <node>] [+|<device>]
          ## Turn off monitoring
          cldevice unmonitor [-n <node>] [+|<device>]

Rename
    3.2:  cldevice rename -d <destination_device_name>

Replicate
    3.2:  cldevice replicate [-S <source-node>] -D <destination-node> [+]

Set properties of a device
    3.2:  cldevice set -p default_fencing={global|pathcount|scsi3} [-n <node>] <device>

Status
    3.2:  ## Standard display
          cldevice status [-s <state>] [-n <node>] [+|<device>]
          ## Display failed disk paths
          cldevice status -s fail

List all the configured devices, including paths, across all nodes
    3.1:  scdidadm -L
    3.2:  ## Standard list
          cldevice list [-n <node>] [+|<device>]
          ## Detailed list
          cldevice show [-n <node>] [+|<device>]

List all the configured devices, including paths, on the local node only
    3.1:  scdidadm -l
    3.2:  see above

Reconfigure the device database, creating new instance numbers if required
    3.1:  scdidadm -r
    3.2:  cldevice populate
          cldevice refresh [-n <node>] [+]

Perform the repair procedure for a particular path (use when a disk gets replaced)
    3.1:  scdidadm -R <c0t0d0s0>   (device)
          scdidadm -R 2            (device id)
    3.2:  cldevice repair [-n <node>] [+|<device>]
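
For example, after physically replacing a failed disk behind DID device d4 on a 3.2 node (a minimal
sketch; d4 is just the device used in the SCSI reservation examples above, so adapt it to your own
environment):

## Rebuild the DID instance for the replaced disk and update the device database
cldevice repair d4
cldevice populate

## Confirm that no disk paths are still reported as failed
cldevice status -s fail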

Disk Groups

Create a device group
    3.1:  n/a
    3.2:  cldevicegroup create -t vxvm -n <node-list> -p failback=true <devgrp>

Remove a device group
    3.1:  n/a
    3.2:  cldevicegroup delete <devgrp>

Adding
    3.1:  scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true
    3.2:  cldevicegroup add-device -d <device> <devgrp>

Removing
    3.1:  scconf -r -D name=<disk group>
    3.2:  cldevicegroup remove-device -d <device> <devgrp>

Set a property
    3.2:  cldevicegroup set [-p <name>=<value>] [+|<devgrp>]

List
    3.1:  scstat
    3.2:  ## Standard list
          cldevicegroup list [-n <node>] [-t <type>] [+|<devgrp>]
          ## Detailed configuration report
          cldevicegroup show [-n <node>] [-t <type>] [+|<devgrp>]

Status
    3.1:  scstat
    3.2:  cldevicegroup status [-n <node>] [-t <type>] [+|<devgrp>]

Adding a single node
    3.1:  scconf -a -D type=vxvm,name=appdg,nodelist=<host>
    3.2:  cldevicegroup add-node [-n <node>] [-t <type>] [+|<devgrp>]

Removing a single node
    3.1:  scconf -r -D name=<disk group>,nodelist=<host>
    3.2:  cldevicegroup remove-node [-n <node>] [-t <type>] [+|<devgrp>]

Switch
    3.1:  scswitch -z -D <disk group> -h <host>
    3.2:  cldevicegroup switch -n <nodename> <devgrp>

Put into maintenance mode
    3.1:  scswitch -m -D <disk group>
    3.2:  n/a

Take out of maintenance mode
    3.1:  scswitch -z -D <disk group> -h <host>
    3.2:  n/a

Onlining a disk group
    3.1:  scswitch -z -D <disk group> -h <host>
    3.2:  cldevicegroup online <devgrp>

Offlining a disk group
    3.1:  scswitch -F -D <disk group>
    3.2:  cldevicegroup offline <devgrp>

Resync a disk group
    3.1:  scconf -c -D name=appdg,sync
    3.2:  cldevicegroup sync [-t <type>] [+|<devgrp>]

Transport Cable

Add
    3.2:  clinterconnect add <endpoint>,<endpoint>

Remove
    3.2:  clinterconnect remove <endpoint>,<endpoint>

Enable
    3.1:  scconf -c -m endpoint=<host>:qfe1,state=enabled
    3.2:  clinterconnect enable [-n <node>] [+|<endpoint>,<endpoint>]

Disable
    3.1:  scconf -c -m endpoint=<host>:qfe1,state=disabled
          Note: it gets deleted
    3.2:  clinterconnect disable [-n <node>] [+|<endpoint>,<endpoint>]

List
    3.1:  scstat
    3.2:  ## Standard and detailed list
          clinterconnect show [-n <node>] [+|<endpoint>,<endpoint>]

Status
    3.1:  scstat
    3.2:  clinterconnect status [-n <node>] [+|<endpoint>,<endpoint>]

Resource Groups

Adding (failover)
    3.1:  scrgadm -a -g <res_group> -h <host>,<host>
    3.2:  clresourcegroup create <res_group>

Adding (scalable)
    3.2:  clresourcegroup create -S <res_group>

Adding a node to a resource group
    3.2:  clresourcegroup add-node -n <node> <res_group>

Removing
    3.1:  scrgadm -r -g <group>
    3.2:  ## Remove a resource group
          clresourcegroup delete <res_group>
          ## Remove a resource group and all its resources
          clresourcegroup delete -F <res_group>

Removing a node from a resource group
    3.2:  clresourcegroup remove-node -n <node> <res_group>

Changing properties
    3.1:  scrgadm -c -g <resource group> -y <property>=<value>
    3.2:  clresourcegroup set -p <name>=<value> [+|<res_group>]
          (e.g. clresourcegroup set -p Failback=true +)

Status
    3.1:  scstat -g
    3.2:  clresourcegroup status [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>] [+|<res_group>]

Listing
    3.1:  scstat -g
    3.2:  clresourcegroup list [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>] [+|<res_group>]

Detailed list
    3.1:  scrgadm -pv -g <res_group>
    3.2:  clresourcegroup show [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>] [+|<res_group>]

Display mode type (failover or scalable)
    3.1:  scrgadm -pv -g <res_group> | grep 'Res Group mode'

Offlining
    3.1:  scswitch -F -g <res_group>
    3.2:  ## All resource groups
          clresourcegroup offline +
          ## Individual group
          clresourcegroup offline [-n <node>] <res_group>

Onlining
    3.1:  scswitch -Z -g <res_group>
    3.2:  ## All resource groups
          clresourcegroup online +
          ## Individual groups
          clresourcegroup online [-n <node>] <res_group>
Evacuate all resource groups from a node (used when shutting down a node)
    3.2:  clresourcegroup evacuate [+|-n <node>]

Unmanaging
    3.1:  scswitch -u -g <res_group>
          Note: all resources in the group must be disabled first
    3.2:  clresourcegroup unmanage <res_group>

Managing
    3.1:  scswitch -o -g <res_group>
    3.2:  clresourcegroup manage <res_group>

Switching
    3.1:  scswitch -z -g <res_group> -h <host>
    3.2:  clresourcegroup switch -n <node> <res_group>

Suspend
    3.1:  n/a
    3.2:  clresourcegroup suspend [+|<res_group>]

Resume
    3.1:  n/a
    3.2:  clresourcegroup resume [+|<res_group>]

Remaster (move the resource group(s) to their preferred node)
    3.1:  n/a
    3.2:  clresourcegroup remaster [+|<res_group>]

Restart a resource group (bring offline then online)
    3.1:  n/a
    3.2:  clresourcegroup restart [-n <node>] [+|<res_group>]
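
Putting a few of the 3.2 commands above together (a minimal sketch; app-rg, node1 and node2 are
placeholder names):

## Create a failover resource group
clresourcegroup create app-rg

## Bring the group under RGM control and online it on node1
clresourcegroup manage app-rg
clresourcegroup online -n node1 app-rg

## Move it to the other node and check the result
clresourcegroup switch -n node2 app-rg
clresourcegroup status app-rg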

Resources

Adding a failover network resource
    3.1:  scrgadm -a -L -g <res_group> -l <logicalhost>
    3.2:  clreslogicalhostname create -g <res_group> <lh-resource>

Adding a shared network resource
    3.1:  scrgadm -a -S -g <res_group> -l <logicalhost>
    3.2:  clressharedaddress create -g <res_group> <sa-resource>

Adding a failover apache application and attaching the network resource
    3.1:  scrgadm -a -j apache_res -g <res_group> \
          -t SUNW.apache -y Network_resources_used=<logicalhost> \
          -y Scalable=False -y Port_list=80/tcp \
          -x Bin_dir=/usr/apache/bin

Adding a shared apache application and attaching the network resource
    3.1:  scrgadm -a -j apache_res -g <res_group> \
          -t SUNW.apache -y Network_resources_used=<logicalhost> \
          -y Scalable=True -y Port_list=80/tcp \
          -x Bin_dir=/usr/apache/bin

Create a HAStoragePlus failover resource
    3.1:  scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
          -x FileSystemMountPoints=/oracle/data01 \
          -x Affinityon=true
    3.2:  clresource create -t SUNW.HAStoragePlus -g <res_group> \
          -p FileSystemMountPoints=<mount-point-list> \
          -p Affinityon=true <rs-hasp>

Removing
    3.1:  scrgadm -r -j res-ip
          Note: the resource must be disabled first
    3.2:  clresource delete [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Changing or adding properties
    3.1:  scrgadm -c -j <resource> -y <property>=<value>
    3.2:  ## Changing
          clresource set -t <type> -p <name>=<value> +
          ## Adding
          clresource set -p <name>+=<value> <resource>

List
    3.1:  scstat -g
    3.2:  clresource list [-g <res_group>] [-t <resourcetype>] [+|<resource>]
          ## List properties
          clresource list-props [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Detailed list
    3.1:  scrgadm -pv -j res-ip
          scrgadm -pvv -j res-ip
    3.2:  clresource show [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Status
    3.1:  scstat -g
    3.2:  clresource status [-s <state>] [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Disable resource monitor
    3.1:  scrgadm -n -M -j res-ip
    3.2:  clresource unmonitor [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Enable resource monitor
    3.1:  scrgadm -e -M -j res-ip
    3.2:  clresource monitor [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]

Disabling
    3.1:  scswitch -n -j res-ip
    3.2:  clresource disable <resource>

Enabling
    3.1:  scswitch -e -j res-ip
    3.2:  clresource enable <resource>

Clearing a failed resource
    3.1:  scswitch -c -h <host>,<host> -j <resource> -f STOP_FAILED
    3.2:  clresource clear -f STOP_FAILED <resource>

Find the network of a resource
    3.1:  scrgadm -pvv -j <resource> | grep -i network

Removing a resource and resource group
    3.1:  ## offline the group
          scswitch -F -g rgroup-1
          ## remove the resource
          scrgadm -r -j res-ip
          ## remove the resource group
          scrgadm -r -g rgroup-1
    3.2:  ## offline the group
          clresourcegroup offline <res_group>
          ## remove the resource
          clresource delete [-g <res_group>] [-t <resourcetype>] [+|<resource>]
          ## remove the resource group
          clresourcegroup delete <res_group>
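
The 3.2 column above has no apache example, so here is a rough 3.2 equivalent of the failover apache
entry (a sketch only, not from the original page; apache-rg, apache-lh and apache-rs are placeholder
names, and it assumes the SUNW.apache agent is installed):

## Create the failover resource group
clresourcegroup create apache-rg

## Register the resource type and add a logical hostname resource
## (the resource name doubles as the hostname here)
clresourcetype register SUNW.apache
clreslogicalhostname create -g apache-rg apache-lh

## Create the apache resource, attaching it to the logical hostname
clresource create -t SUNW.apache -g apache-rg \
    -p Network_resources_used=apache-lh \
    -p Scalable=false -p Port_list=80/tcp \
    -p Bin_dir=/usr/apache/bin apache-rs

## Manage and online the group
clresourcegroup manage apache-rg
clresourcegroup online apache-rg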

Resource Types

Adding (register in 3.2)
    3.1:  scrgadm -a -t <resource type>   e.g. SUNW.HAStoragePlus
    3.2:  clresourcetype register <type>

Register a resource type to a node
    3.1:  n/a
    3.2:  clresourcetype add-node -n <node> <type>

Deleting (remove in 3.2)
    3.1:  scrgadm -r -t <resource type>
    3.2:  clresourcetype unregister <type>

Deregistering a resource type from a node
    3.1:  n/a
    3.2:  clresourcetype remove-node -n <node> <type>

Listing
    3.1:  scrgadm -pv | grep 'Res Type name'
    3.2:  clresourcetype list [<type>]

Listing resource type properties
    3.2:  clresourcetype list-props [<type>]

Show resource types
    3.2:  clresourcetype show [<type>]

Set properties of a resource type
    3.2:  clresourcetype set [-p <name>=<value>] <type>
